Low-latency protocols like CMAF are wreaking havoc with traditional ABR algorithms, and we’re having to come up with new ways of assessing whether we’re running out of bandwidth. Traditionally, this is done by looking at how long a video chunk takes to download and comparing that with its playback duration. If a chunk takes about as long to download as it does to play, it’s time to consider switching to a lower-bandwidth stream.
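As a rough illustration of that traditional check, here is a minimal Python sketch. The function names and the `safety_margin` value are assumptions made for the example, not something taken from the talk or from any particular player.

```python
def download_ratio(chunk_duration_s: float, download_time_s: float) -> float:
    """Seconds of media received per second spent downloading."""
    if download_time_s <= 0:
        raise ValueError("download_time_s must be positive")
    return chunk_duration_s / download_time_s


def should_switch_down(chunk_duration_s: float, download_time_s: float,
                       safety_margin: float = 1.2) -> bool:
    """True when the chunk arrived barely faster than it plays back.

    safety_margin is a hypothetical headroom factor: a ratio close to 1.0
    means the download is only just keeping up with playback.
    """
    return download_ratio(chunk_duration_s, download_time_s) < safety_margin
```

A 4-second chunk that takes 1 second to download leaves plenty of headroom, while one that takes 3.9 seconds suggests a lower rendition is needed.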
As latencies have come down, servers now start sending data from the beginning of a chunk while it’s still being written, which means it can’t be downloaded any quicker than real time. To learn more about this, look at our article on ISO BMFF and this streaming primer. Because the download always appears to run at playback speed, we can’t tell whether there is spare bandwidth to move up to a better-quality stream. So while we can switch down if we start running out of bandwidth, we can’t find a time to go up.
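To see why the measurement stops being useful, consider a deliberately simplified model in Python. It assumes the request starts just as the encoder begins writing the segment, so the download can never finish before the segment has been fully produced. The function names and figures are illustrative assumptions.

```python
def observed_download_time(segment_duration_s: float, segment_bits: float,
                           bandwidth_bps: float, chunked_live: bool = True) -> float:
    """Time the player sees between request start and last byte."""
    transfer_time_s = segment_bits / bandwidth_bps
    if chunked_live:
        # Data cannot arrive before it has been encoded, so the request
        # lasts at least as long as the segment itself.
        return max(segment_duration_s, transfer_time_s)
    return transfer_time_s


def measured_throughput_bps(segment_bits: float, download_time_s: float) -> float:
    """Naive throughput estimate: bits received divided by elapsed time."""
    return segment_bits / download_time_s
```

For a 2-second segment encoded at 2 Mbps (4,000,000 bits), this model reports a throughput of 2 Mbps whether the link can carry 4 Mbps or 40 Mbps, so the player never sees the headroom. On a 1 Mbps link the download takes 4 seconds, so a shortage of bandwidth is still visible.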
Ali’s algorithm uses the time the last chunk finished downloading in place of the missing start timestamp, figuring that the new chunk is going to start loading pretty soon after the old one finishes. Looking at the data, though, we see that the gap between one chunk finishing and the next one starting does vary. This led Ali’s team to move to a sliding-window moving average which takes the last 3 download durations into consideration. This is assumed to be enough to smooth out some of those variances, and it provides the data they need to predict future bandwidth and decide whether to change bitrate. There have been a number of alternative suggestions over the last year or so, all of which perform worse than this technique, called ACTE.
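The two ideas in this paragraph, substituting the previous chunk’s end time for the unknown start time and averaging over the last three durations, can be sketched in Python as follows. This is only an illustration of the description above, not the published ACTE implementation; the class and method names are hypothetical.

```python
from collections import deque


class ChunkBandwidthEstimator:
    """Sliding-window throughput estimate from chunk end times only."""

    def __init__(self, window: int = 3):
        self._durations = deque(maxlen=window)  # seconds per chunk
        self._sizes = deque(maxlen=window)      # bits per chunk
        self._prev_end = None

    def on_chunk_end(self, end_time_s: float, size_bits: float) -> None:
        # The real start time is unknown, so assume this chunk started
        # when the previous one finished.
        if self._prev_end is not None:
            duration = end_time_s - self._prev_end
            if duration > 0:
                self._durations.append(duration)
                self._sizes.append(size_bits)
        self._prev_end = end_time_s

    def estimate_bps(self):
        """Mean chunk size over mean duration, or None with no samples."""
        if not self._durations:
            return None
        mean_duration = sum(self._durations) / len(self._durations)
        mean_size = sum(self._sizes) / len(self._sizes)
        return mean_size / mean_duration
```

The first chunk only sets the reference time, so an estimate becomes available from the second chunk onwards. Averaging over the window damps the effect of a single unusually long or short gap between chunks.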
In the last section of this talk, Ali explores the entry he was part of in a Twitch-sponsored competition to keep playback latency close to a second in test conditions with varying bitrate. Adjusting playback speed is key to much work in low-latency streaming: playing slightly faster is the best way to trim off a little bit of latency when things are going well, and playing slightly slower buys time if you’re waiting for data. The big challenge is doing it without the viewer noticing. The entry used both heuristics and a machine learning approach, which worked so well that they were runners-up in the contest.
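A very simple playback-speed rule of this kind might look like the Python sketch below. It is not the contest entry: the function name, the thresholds and the 5% speed adjustment are all assumptions chosen for illustration.

```python
def choose_playback_rate(latency_s: float, buffer_s: float,
                         target_latency_s: float = 1.0,
                         min_buffer_s: float = 0.3,
                         deadband_s: float = 0.1,
                         max_adjust: float = 0.05) -> float:
    """Pick a playback rate from current latency and buffer level."""
    if buffer_s < min_buffer_s:
        # Nearly out of data: slow down slightly to buy time.
        return 1.0 - max_adjust
    if latency_s > target_latency_s + deadband_s:
        # Healthy buffer but too far behind live: speed up to catch up.
        return 1.0 + max_adjust
    # Close enough to target: play at normal speed.
    return 1.0
```

Keeping the adjustment small is what stops the viewer noticing, and the deadband avoids flipping between rates when latency hovers around the target.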
It’s one of the most common visual artefacts affecting both video and images. The scourge of the beautiful sunset and the enemy of natural skin tones, banding is very noticeable as it’s not seen in nature. Banding happens when there is not enough bit depth to allow for a smooth gradient of colour or brightness which leads to strips of one shade and an abrupt change to a strip of the next, clearly different, shade.
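The effect of bit depth on a gentle gradient can be shown with a small Python sketch. The ramp values (a subtle change from 0.40 to 0.45 across a 1920-pixel row) and the function names are assumptions made purely for illustration.

```python
def gradient(start: float, end: float, width: int):
    """A smooth horizontal ramp of normalised values in [0.0, 1.0]."""
    return [start + (end - start) * x / (width - 1) for x in range(width)]


def quantise(value: float, bits: int) -> int:
    """Map a normalised value to the nearest of 2**bits integer codes."""
    levels = (1 << bits) - 1
    return round(value * levels)


def count_bands(codes) -> int:
    """Number of runs of identical adjacent codes along the row."""
    return 1 + sum(1 for a, b in zip(codes, codes[1:]) if a != b)


row = gradient(0.40, 0.45, 1920)
bands_8bit = count_bands([quantise(v, 8) for v in row])
bands_10bit = count_bands([quantise(v, 10) for v in row])
```

At 8 bits the ramp collapses into 14 strips, each over a hundred pixels wide, whereas 10 bits gives 52 narrower strips with smaller steps between them.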
In this Video Tech talk, SSIMWAVE’s Dr. Hojat Yeganeh explains what can be done to reduce or eliminate banding. He starts by explaining how banding is created during compression, where the quantiser has reduced the accuracy of otherwise unique pixels to very similar numbers leaving them looking the same.
Dr. Hojat explains why we see these edges so clearly, both by looking at how contrast is defined and by referencing Dolby’s famous graph of contrast steps against luminance. The graph plots 10-bit HDR against 12-bit HDR and shows that the 12-bit PQ image is always below the ‘Barten limit’, the threshold below which contrast steps are not visible. It also shows that a 10-bit HDR image is always susceptible to showing quantised, i.e. banded, steps.
Why do we deliver 10-bit HDR video if it can still show banding? This is because in real footage, camera noise and film grain serve to break up the bands. Dr. Hojat explains that this random noise amounts to ‘dithering’. Dithering is well known in both audio and video: when you add random noise which changes over time, humans stop being able to see the bands. TV manufacturers also apply dithering to the picture before displaying it, which can further break up banding at the cost of more noise in the image.
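A minimal Python sketch of dithering follows, assuming uniform noise of plus or minus half a code value added before rounding. The noise shape, the ramp and the function names are assumptions for the example. Real footage or a display would use fresh noise on every frame, whereas this shows a single row.

```python
import random


def quantise_row(values, bits: int, rng=None):
    """Quantise normalised values, optionally adding dither noise first."""
    levels = (1 << bits) - 1
    codes = []
    for v in values:
        scaled = v * levels
        if rng is not None:
            # Dither: up to half a code value of random noise either way.
            scaled += rng.uniform(-0.5, 0.5)
        codes.append(min(levels, max(0, round(scaled))))
    return codes


def longest_run(codes) -> int:
    """Length of the longest run of identical adjacent codes."""
    best = current = 1
    for a, b in zip(codes, codes[1:]):
        current = current + 1 if a == b else 1
        best = max(best, current)
    return best


row = [0.40 + 0.05 * x / 1919 for x in range(1920)]
plain = quantise_row(row, 8)
dithered = quantise_row(row, 8, random.Random(0))
```

Comparing `longest_run(plain)` with `longest_run(dithered)` shows the wide flat strips being broken into much shorter runs, which is the trade the paragraph describes: fewer visible bands in exchange for more noise.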
How can you automatically detect banding? We hear that typical metrics like VMAF and SSIM aren’t usefully sensitive to banding. SSIMWAVE’s SSIMPLUS metric, on the other hand, has been designed to also produce a banding detection map, which helps with the automatic identification of banding.
The video finishes with questions covering, among other topics, when banding is part of artistic intention, types of artefacts not identifiable by typical metrics and consumer display limitations.
Automatic assessment of video quality is essential for creating encoders, selecting vendors, choosing operating points and, for online streaming services, in ongoing service improvement. But getting a computer to understand what looks good and what looks bad to humans is not trivial. When the computer doesn’t have the source video to compare against, it’s even harder.
In this talk, Dr. Ahmed Badr from SSIMWAVE looks at how video quality assessment (VQA) works and goes into detail on No-Reference (NR) techniques. He starts by stating the case for VQA, which is an extension of, and often a replacement for, subjective scoring by people. Subjective scoring is clearly time-consuming, can be more expensive due to the involvement of people (and their time), and requires specific viewing conditions. When done well, a whole, carefully decorated room is required. So when it comes to analysing all the video created by a TV station or automating per-title encoding optimisation, we know we have to remove the human element.
Ahmed moves on to discuss the challenges of No-Reference VQA, such as identifying intended blur or noise. NR VQA is a two-step process, with the first step being extracting features from the video. These features are then mapped to a quality model, which can be done with a machine learning/AI process, the technique Ahmed analyses next. The first task is to come up with a dataset of videos, which should be carefully chosen. Then it’s important to choose a metric to use for the training, for instance MS-SSIM or VMAF, so that the learning algorithm can get the feedback it needs to improve. The last two elements are choosing what you are optimising for, technically called a loss function, and then choosing an AI model to use.
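The shape of that two-step pipeline can be sketched in a few lines of Python. Everything here is a toy stand-in chosen for illustration: the two features, the linear model, the mean-squared-error loss and the function names are assumptions, and have nothing to do with how SSIMPLUS or any production metric is built. The training targets stand in for scores from the chosen metric, normalised to the range 0 to 1.

```python
def extract_features(frame):
    """Step 1: two toy features from a greyscale frame (rows of 0-255 values)."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    # Mean absolute horizontal difference: a crude sharpness proxy.
    diffs = [abs(a - b) for row in frame for a, b in zip(row, row[1:])]
    sharpness = sum(diffs) / len(diffs)
    # Standard deviation of pixel values: a crude contrast proxy.
    contrast = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
    return [sharpness / 255.0, contrast / 255.0]


def predict(weights, bias, features):
    """Step 2: map features to a quality score with a linear model."""
    return bias + sum(w * f for w, f in zip(weights, features))


def mse(weights, bias, samples):
    """Loss function: mean squared error against the target scores."""
    return sum((predict(weights, bias, f) - y) ** 2 for f, y in samples) / len(samples)


def train(samples, epochs: int = 500, lr: float = 0.1):
    """Fit the linear model by batch gradient descent on the MSE loss."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for features, target in samples:
            err = predict(weights, bias, features) - target
            grad_b += 2 * err / len(samples)
            for i, f in enumerate(features):
                grad_w[i] += 2 * err * f / len(samples)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias
```

Each training sample is a pair of a feature vector and its target score. Swapping the loss function or replacing the linear model with a neural network changes the last two choices in the list above without altering the overall structure.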
The dataset you create needs to be aimed at exploring a certain aspect, or range of aspects, of video. You might want to optimise for sports, for instance, but if you need to cover a broad array of genres, the main theme of the dataset may instead be reducing compression or scaling artefacts. Ahmed talks about the millions of video samples that they have collated and how they’ve used them to create their metric, called SSIMPLUS, which can work both with a reference and without.
Views and opinions expressed on this website are those of the author(s) and do not necessarily reflect those of SMPTE or SMPTE Members.
This website is presented for informational purposes only. Any reference to specific companies, products or services does not represent promotion, recommendation, or endorsement by SMPTE.