Video: Optimal Design of Encoding Profiles for Web Streaming

With us since 1998, ABR (Adaptive Bitrate) has been allowing streaming players to select a stream appropriate for their computer and bandwidth. But in this video, we hear that over 20 years on, we’re still developing ways to understand and optimise the performance of ABRs for delivery, finding the best balance of size and quality.

Brightcove’s Yuriy Reznik takes us deep into the theory, but starts with the basics of what ABR is and why we use it. He covers how it delivers a whole series of separate streams at different resolutions and bitrates. Whilst that works well, he quickly starts to show the downsides of ‘static’ ABR profiles. These are where a provider decides that all assets will be encoded at the same set of 6 or 7 bitrates, even though some titles, such as cartoons, will require less bandwidth than sports programmes. This is where per-title and other encoding techniques come in.

Netflix coined the term ‘per-title encoding’, which has since also been called content-aware encoding. This takes the content itself into consideration when determining the bitrate to encode at. By using automatic processes to measure the objective quality of sample encodes, it is able to determine the optimum bitrate.
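To make the selection step concrete, here is a minimal sketch that assumes we already have objective quality scores (VMAF-like, on a 0-100 scale) for a few trial encodes of each title. The function name, the target score and all the numbers are hypothetical; production per-title systems typically build a full rate-quality curve across several resolutions rather than doing a single threshold check.

```python
from typing import List, Optional, Tuple

def pick_per_title_bitrate(trials: List[Tuple[int, float]],
                           target_quality: float) -> Optional[int]:
    """Return the lowest trial bitrate (kbps) whose quality meets the target, or None."""
    for bitrate, quality in sorted(trials):
        if quality >= target_quality:
            return bitrate
    return None  # no trial encode reached the target

# Hypothetical trial encodes: (bitrate in kbps, objective quality score).
cartoon = [(1000, 88.0), (2000, 94.0), (4000, 96.5), (6000, 97.0)]
sport = [(1000, 70.0), (2000, 81.0), (4000, 90.0), (6000, 93.5)]

print(pick_per_title_bitrate(cartoon, 93.0))  # 2000
print(pick_per_title_bitrate(sport, 93.0))    # 6000
```

The same target score yields 2000kbps for the hypothetical cartoon and 6000kbps for the sports clip, which is exactly the difference a static ladder ignores.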

Content- and network-aware encoding takes the network delivery into account as well as the quality of the final video itself. It is able to estimate the likelihood of a stream being selected for playback based upon its bitrate. The trick is to combine these two factors simultaneously to find the optimum balance of bitrate versus quality.
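A minimal sample-based sketch of how the two factors combine, assuming a simplified player that picks the highest rendition whose bitrate fits the measured bandwidth (and the lowest one if nothing fits). The ladder, quality scores and bandwidth samples are hypothetical, and the talk models this with equations rather than samples; this only illustrates how selection probabilities and expected quality fall out of the combination.

```python
from collections import Counter
from typing import Dict, List, Tuple

Ladder = List[Tuple[int, float]]  # (bitrate in kbps, quality score)

def select(ladder: Ladder, bandwidth_kbps: float) -> Tuple[int, float]:
    """Simplified player rule: highest rendition that fits the bandwidth, else the lowest."""
    fitting = [r for r in ladder if r[0] <= bandwidth_kbps]
    return max(fitting) if fitting else min(ladder)

def selection_probabilities(ladder: Ladder, bandwidths: List[float]) -> Dict[int, float]:
    """Fraction of bandwidth samples in which each rendition would be played."""
    counts = Counter(select(ladder, b)[0] for b in bandwidths)
    return {bitrate: counts[bitrate] / len(bandwidths) for bitrate, _ in ladder}

def expected_quality(ladder: Ladder, bandwidths: List[float]) -> float:
    """Average delivered quality across the audience's bandwidth samples."""
    return sum(select(ladder, b)[1] for b in bandwidths) / len(bandwidths)

ladder = [(800, 70.0), (2500, 85.0), (6000, 93.0)]
# Hypothetical bandwidth measurements (kbps) for the audience.
bandwidths = [600, 1500, 3000, 3500, 4000, 5000, 7000, 9000]

print(selection_probabilities(ladder, bandwidths))  # {800: 0.25, 2500: 0.5, 6000: 0.25}
print(expected_quality(ladder, bandwidths))         # 83.25
```

Moving a rung up or down and re-running `expected_quality` is a crude way to explore the bitrate-versus-quality trade-off the talk optimises analytically.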

The last element needed to make this ABR optimisation as realistic as practical is to take into account the way people actually view the content. Looking at a real example from the US Open, we see how on PCs the viewing window can be many different sizes, and how you can calculate the probability of each size being used. Furthermore, we know that players have some intelligence built in, so they won’t select a stream whose resolution is much bigger than the browser viewport.
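A minimal sketch of viewport-aware capping, similar in spirit to player options such as hls.js's `capLevelToPlayerSize`. The 1.25 "slack" factor, the ladder heights and the viewport-height distribution are all hypothetical; the point is that, once capping is applied, some rungs may hardly ever be used.

```python
from typing import Dict, List

def cap_to_viewport(heights: List[int], viewport_height: int, slack: float = 1.25) -> int:
    """Largest rendition height not exceeding the viewport by more than `slack`; else the smallest."""
    allowed = [h for h in heights if h <= viewport_height * slack]
    return max(allowed) if allowed else min(heights)

def rendition_usage(heights: List[int], viewport_probs: Dict[int, float]) -> Dict[int, float]:
    """Probability that each rendition is the viewport-capped choice."""
    usage = {h: 0.0 for h in heights}
    for viewport, p in viewport_probs.items():
        usage[cap_to_viewport(heights, viewport)] += p
    return usage

ladder_heights = [360, 540, 720, 1080]
# Hypothetical distribution of browser viewport heights for desktop viewers.
viewports = {400: 0.2, 600: 0.3, 800: 0.3, 1000: 0.2}
print(rendition_usage(ladder_heights, viewports))
# {360: 0.2, 540: 0.0, 720: 0.6, 1080: 0.2}
```

In this made-up example the 540-line rung is never chosen, hinting at why the talk concludes that only a few resolutions are needed.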

Yuriy starts the final section of his talk by explaining that he brought in another quality metric, from Westerink & Roufs, which allows him to estimate how people perceive video that has been encoded at a certain resolution, scaled to a fixed interim resolution for decoding, and then scaled again to the correct size for the browser window.

The result of adding this further check is that only a few points on the ladder are needed to get a good average quality. Going much beyond 3 resolutions is typically not useful for the website, as adding more brings little extra benefit.

Yuriy finishes by introducing SSIM-based modelling of the coding noise an encoder introduces at different bitrates. Bringing together all of these factors, modelled as equations, allows him to suggest optimal ABR ladders.

Yuriy Reznik
Technology Fellow and Head of Research, Brightcove

Video: No-Reference QoE Assessment: Knowledge-based vs. Learning-based

Automatic assessment of video quality is essential for creating encoders, selecting vendors, choosing operating points and, for online streaming services, in ongoing service improvement. But getting a computer to understand what looks good and what looks bad to humans is not trivial. When the computer doesn’t have the source video to compare against, it’s even harder.

In this talk, Dr. Ahmed Badr from SSIMWAVE looks at how video quality assessment (VQA) works and goes into detail on No-Reference (NR) techniques. He starts by stating the case for VQA, which is an extension of, and often a replacement for, subjective scoring by people. Subjective scoring is time-consuming, can be expensive due to the people and time involved, and requires specific viewing conditions; when done well, a whole, carefully prepared room is required. So when it comes to analysing all the video created by a TV station or automating per-title encoding optimisation, we know we have to remove the human element.

Ahmed moves on to discuss the challenges of No-Reference VQA, such as identifying whether blur or noise is intentional. NR VQA is a two-step process: first, features are extracted from the video; then these features are mapped to a quality score by a model, which can be built with a machine learning/AI process. This is the technique Ahmed analyses next. The first task is to come up with a carefully chosen dataset of videos. Then it’s important to choose a metric to use for the training, for instance MS-SSIM or VMAF, so that the learning algorithm gets the feedback it needs to improve. The last two elements are choosing what you are optimising for, technically called a loss function, and then choosing an AI model to use.
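A deliberately tiny sketch of the two-step, learning-based approach, assuming two hand-picked features (RMS horizontal gradient as a sharpness proxy, and contrast), a linear model and a mean-squared-error loss. The frames are synthetic stripes blurred by increasing amounts, and the "training metric" scores are made-up stand-ins for full-reference results such as VMAF; real NR models typically use far richer features and models.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]  # grayscale luma values in 0..255

def extract_features(frame: Frame) -> List[float]:
    """Step 1 (hypothetical features): RMS horizontal gradient and contrast, scaled to 0..1."""
    grads = [row[x + 1] - row[x] for row in frame for x in range(len(row) - 1)]
    rms_gradient = math.sqrt(sum(g * g for g in grads) / len(grads)) / 255.0
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    contrast = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels)) / 255.0
    return [rms_gradient, contrast]

def predict(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mse_loss(w: List[float], b: float, X: List[List[float]], y: List[float]) -> float:
    """The loss function: mean squared error against the training-metric scores."""
    return sum((predict(w, b, x) - t) ** 2 for x, t in zip(X, y)) / len(X)

def train_linear(X: List[List[float]], y: List[float],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Step 2: fit a linear quality model by gradient descent on the MSE loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [predict(w, b, x) - t for x, t in zip(X, y)]
        w = [wi - lr * 2 * sum(e * x[i] for e, x in zip(errs, X)) / n
             for i, wi in enumerate(w)]
        b -= lr * 2 * sum(errs) / n
    return w, b

def stripes(size: int = 16, period: int = 4) -> Frame:
    """Synthetic test pattern of vertical black and white stripes."""
    return [[255.0 if (x // period) % 2 == 0 else 0.0 for x in range(size)]
            for _ in range(size)]

def box_blur(frame: Frame, radius: int) -> Frame:
    """Horizontal box blur; a larger radius stands in for a more degraded encode."""
    out = []
    for row in frame:
        n = len(row)
        out.append([
            sum(row[max(0, x - radius):min(n, x + radius + 1)])
            / (min(n, x + radius + 1) - max(0, x - radius))
            for x in range(n)
        ])
    return out

# Hypothetical training set: increasingly blurred frames with made-up scores.
frames = [box_blur(stripes(), r) for r in range(4)]
scores = [95.0, 80.0, 65.0, 50.0]
X = [extract_features(f) for f in frames]

w, b = train_linear(X, scores)
print(mse_loss([0.0, 0.0], 0.0, X, scores), mse_loss(w, b, X, scores))
# At inference time no reference is needed: only the frame under test.
print(predict(w, b, extract_features(box_blur(stripes(period=8), 1))))
```

With four training frames this is a toy, but it shows where each of Ahmed's ingredients sits: the dataset (`frames`), the training metric (`scores`), the loss function (`mse_loss`) and the model (`train_linear`).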

The dataset you create needs to be aimed at exploring a certain aspect, or range of aspects, of video. You may want to optimise for sports, for example, but if you need to cover a broad array of genres, reducing compression or scaling artefacts may be the main theme of the video dataset. Ahmed talks about the millions of video samples that they have collated and how they’ve used them to create their metric, SSIMPLUS, which can work both with and without a reference.

Dr. Ahmed Badr

Video: Reducing peak bandwidth for OTT

‘Flattening the curve’ isn’t just about dealing with viruses, we learn from Will Law. It is also one way to deal with the network congestion brought on by the rise in broadband use during the global lockdown. This, along with other key techniques such as per-title encoding and removing the top tier, is explored in this video from Akamai and Bitmovin.

Will Law starts the talk by explaining why congestion happens in a world where ABR (adaptive bitrate streaming) is supposed to deal with it. With Akamai’s traffic up by around 300%, it’s perhaps not a surprise there’s a contest for bandwidth. As not all traffic is video, congestion will still happen when streams compete with other, static, data transfers. Going deeper, even with two ABR streams, the congestion control algorithm in use has a big impact, as Will shows with a graph comparing Akamai’s FastTCP and BBR in which BBR takes all the bandwidth rather than ‘playing fair’.

Using a webpage constructed for the video, Will shows us a baseline video playback and its associated metrics, such as data transferred and bitrate, which he uses to demonstrate the benefits of different bitrate-reduction techniques. The first is covered by Bitmovin’s Sean McCarthy, who explains Bitmovin’s per-title encoding technology. This approach ensures that each asset has encoder settings tuned to get the best out of the content whilst reducing bandwidth, as opposed to simply setting your encoder to a fairly high, safe, static bitrate for all content no matter how complex it is. Will’s demo shows the bitrate reducing by over 50%.

Swapping codecs is an obvious way to reduce bandwidth. Unlike per-title encoding, which is transparent to the end user, using AV1, VP9 or HEVC requires support on the playback device. Whilst you could offer multiple versions of your assets to make sure you still cover all your players despite fragmentation, this has the downside of extra encoding costs and time.

Will then looks at three ways to reduce bandwidth by stopping the highest-bitrate rendition from being used. Method one is to manually modify the manifest file, method two uses the Bitmovin player API, and method three uses the CDN itself to manipulate the manifests. The advantage of doing this in the CDN is flexibility: you can, for example, use geolocation rules to deliver different manifests to different locations.
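The talk doesn't show the manifest itself, so as an assumption, here is a sketch of method one for an HLS master playlist: it drops the variant with the highest `BANDWIDTH` attribute. The function name is hypothetical, and the sketch assumes each `#EXT-X-STREAM-INF` tag is immediately followed by its URI line, as in typical playlists. In principle, the same transformation could run at the CDN edge for method three.

```python
import re

def drop_top_rendition(master_playlist: str) -> str:
    """Remove the #EXT-X-STREAM-INF variant with the highest BANDWIDTH, plus its URI line."""
    lines = master_playlist.strip().splitlines()
    variants = []  # (bandwidth, index of the #EXT-X-STREAM-INF line)
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            # The lookbehind skips AVERAGE-BANDWIDTH=, matching only the BANDWIDTH attribute.
            m = re.search(r"(?<![-A-Z])BANDWIDTH=(\d+)", line)
            if m:
                variants.append((int(m.group(1)), i))
    if len(variants) < 2:
        return master_playlist  # never remove the only rendition
    _, top = max(variants)
    # Assumption: the variant's URI is on the line right after its tag.
    kept = [l for j, l in enumerate(lines) if j not in (top, top + 1)]
    return "\n".join(kept) + "\n"

master = """
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,AVERAGE-BANDWIDTH=2200000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p.m3u8
"""
print(drop_top_rendition(master))
```

The player never sees the 1080p variant, so it cannot request it, regardless of how much bandwidth it measures.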

The final method to reduce peak bandwidth is to use the CDN to throttle the download speed of the stream chunks. This means that while you may, if you are lucky, be able to download at 100Mbps, the CDN only delivers at 3 or 5 times the real-time bitrate. This goes a long way to smoothing out the peaks, which is better for the end user’s equipment and for the CDN. Seen in isolation, this does very little, as the video bitrate and the data transferred remain the same. However, delivering the video in this much more co-operative way is much less likely to cause knock-on problems for other traffic. It can, of course, be used in conjunction with the other techniques. The video concludes with a Q&A.
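The arithmetic behind throttling is simple: the data per segment doesn't change, only the rate at which it is sent. This sketch assumes a hypothetical 6Mbps rendition, 4-second segments and a 100Mbps last-mile link, with the 3× and 5× multiples taken from the talk; the function names are illustrative.

```python
from typing import Optional

def delivery_rate(bitrate_mbps: float, link_mbps: float,
                  pace_multiple: Optional[float] = None) -> float:
    """Peak rate the CDN sends at: the full link, or a multiple of the real-time bitrate."""
    if pace_multiple is None:
        return link_mbps
    return min(link_mbps, pace_multiple * bitrate_mbps)

def download_seconds(bitrate_mbps: float, segment_s: float, link_mbps: float,
                     pace_multiple: Optional[float] = None) -> float:
    """Time to fetch one segment; the segment size is unchanged by pacing."""
    size_mbit = bitrate_mbps * segment_s
    return size_mbit / delivery_rate(bitrate_mbps, link_mbps, pace_multiple)

for multiple in (None, 5, 3):
    rate = delivery_rate(6.0, 100.0, multiple)
    t = download_seconds(6.0, 4.0, 100.0, multiple)
    print(f"pace={multiple}: peak {rate:.0f} Mbps, {t:.2f} s per 4 s segment")
```

Even at 3×, each 4-second segment arrives in about 1.33 seconds, so the player's buffer still fills faster than it drains, while the peak drops from 100Mbps to 18Mbps.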

Will Law
Chief Architect, Akamai
Sean McCarthy
Technical Product Marketing Manager, Bitmovin

Video: AV1 at Netflix

Netflix have continually been pushing forward video compression and analysis because their assets are played so many times that every bit saved is real money saved. VMAF is a great example of Netflix’s desire to push the state of the art forward. Developed by Netflix and two universities, this new objective metric allowed them to better evaluate the quality of videos using computer analysis and has continued to be the foundation of their work since.

One use of VMAF has been to verify the results of Netflix’s Per-Shot Encoding method which alters encoding parameters for each shot of the film rather than using a fixed set of parameters for the whole film. The Broadcast Knowledge has featured talks on their previous technique, per-title encoding (among others).

AV1, however, must be the most famous innovation Netflix is behind. A founding member of the Alliance for Open Media (AOMedia), Netflix saw the need for a better codec. Making it open, which also suited other internet giants such as Google, was a good way to create a vibrant community around it, driving contributions to the codec itself and, it is hoped, its implementation and adoption.

In this two-part talk, LiWei Guo starts off by explaining the ways in which AV1 will be used by Netflix. Since this talk took place, Netflix has started streaming AV1 to Android clients. LiWei points out that AV1 supports 10-bit video in its Main profile, a notable difference from codecs like AVC and HEVC, whose main profiles are 8-bit only. This allows Netflix to use 10-bit without worrying about decoder compatibility, and he shows examples of skies and water which are significantly improved by the use of 10-bit.
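To see why 10-bit helps on skies and water, count how many distinct code values a smooth gradient gets after quantisation. This sketch assumes a hypothetical gradient spanning 5% of the signal range across a 1920-pixel-wide frame, and ignores the encoder's own quantisation and any dithering; fewer code values means wider, more visible bands.

```python
def distinct_levels(start: float, end: float, bit_depth: int, samples: int = 1920) -> int:
    """Count distinct code values when a smooth ramp (normalised 0..1) is quantised."""
    max_code = (1 << bit_depth) - 1
    codes = {round((start + (end - start) * i / (samples - 1)) * max_code)
             for i in range(samples)}
    return len(codes)

# A gentle, sky-like gradient spanning 5% of the signal range.
print(distinct_levels(0.40, 0.45, 8))   # 14
print(distinct_levels(0.40, 0.45, 10))  # 52
```

With 14 levels across 1920 pixels, each band is roughly 137 pixels wide at 8-bit, versus roughly 37 pixels with 52 levels at 10-bit.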

Another feature of AV1 is film grain synthesis, which seeks to improve encoding efficiency by removing the random film grain of the source during the encode process, then having the decoder add similar random noise back on top to recreate the same look and feel. As anything random can’t be predicted, noise such as this is very wasteful for a codec to try to encode, so it’s no surprise that this can result in as much as a 30% reduction in bitrate. Before concluding, LiWei briefly explains per-shot encoding and then presents data on the overall improvements.
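As a toy illustration of the idea, and not AV1's actual tool (which models grain with an autoregressive filter and an intensity-dependent scaling function signalled in the bitstream), the sketch below "denoises" a 1-D signal with a moving average, keeps only a single grain-strength parameter, and lets the decoder add fresh Gaussian noise of that strength. The zlib-based `coded_size` is a hypothetical, very rough proxy for coding cost, and the signal itself is made up.

```python
import random
import statistics
import zlib
from typing import List, Tuple

def smooth(signal: List[float], radius: int = 2) -> List[float]:
    """Crude denoiser: a moving average over 2 * radius + 1 samples."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def coded_size(signal: List[float]) -> int:
    """Rough coding-cost proxy: zlib size of the rounded, delta-coded samples."""
    q = [int(round(v)) for v in signal]
    deltas = [q[0]] + [b - a for a, b in zip(q, q[1:])]
    return len(zlib.compress(bytes(d & 0xFF for d in deltas), 9))

def encoder_side(source: List[float]) -> Tuple[List[float], float]:
    """Remove the grain and keep only one grain-strength parameter."""
    clean = smooth(source)
    residual = [s - c for s, c in zip(source, clean)]
    return clean, statistics.pstdev(residual)

def decoder_side(clean: List[float], grain_sigma: float, seed: int = 1) -> List[float]:
    """Add freshly generated noise of the signalled strength back on top."""
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, grain_sigma) for c in clean]

rng = random.Random(42)
n = 10_000
scene = [100 + 50 * i / n for i in range(n)]          # smooth ramp (hypothetical content)
source = [v + rng.gauss(0.0, 8.0) for v in scene]       # plus film-like grain
clean, sigma = encoder_side(source)
output = decoder_side(clean, sigma)

print(coded_size(source), coded_size(clean))  # the grainy source costs more to code
print(round(sigma, 2))                        # estimated grain strength
```

The decoder's grain is not the original grain sample-for-sample, only statistically similar, which is why the look and feel is preserved while the unpredictable detail never has to be coded.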

Andrey Norkin, also from Netflix, explains their work with Intel on the SVT-AV1 software video encoder, which leverages Intel’s SVT (Scalable Video Technology), a framework optimised for Xeon chips for video encoding and analysis. Netflix’s motivations are to further increase adoption by delivering a data-centre-ready, optimised encoder and to create an AV1 encoder they can use to support their own internal research activities (did someone say AV2?). SVT allows for parallelisation, which is important given how many cores modern processors have.

Finishing up, Andrey points us to the GitHub repository, gives us the development status (as of November 2019) and looks at the speed increases that have been achieved, comparing SVT-AV1 against the reference libaom encoder.

Andrey Norkin
Senior Research Scientist, Netflix
LiWei Guo
Senior Software Engineer, Netflix