VMAF Archives – The Broadcast Knowledge

Video: The early days of Netflix Streaming Days and Perspective

Posted on 9th July 2021 by Russell Trafford-Jones

David Ronca has had a long history in the industry and is best known for his time at Netflix where he was pivotal in the inception and implementation of many technologies. Because Netflix was one of the first companies streaming video on the internet, and at a global scale, they are responsible for many innovations that the industry as a whole benefits from today and are the recipient of 7 technical Emmys. David is often pictured holding an Emmy awarded to Netflix for their role in the standardisation and promotion of Japanese subtitles, one of the less-talked-about innovations in contrast to VMAF, per-title encoding and per-shot encoding.

In this video, talking to John Porterfield, David talks about the early days at Netflix when it was pivoting from mailing DVDs to streaming. He describes the move from Windows-based applications to cross-platform technologies, at the time Microsoft Silverlight, which was a big shift in direction for Netflix and for him. The first Silverlight implementation within Netflix was also the first adaptive bitrate (ABR) version of Netflix, which is where David found his next calling within Netflix, writing code to synchronise the segments after DRM.

The PS3, David recalls, was the world’s most powerful Blu-ray player and part of the Blu-ray spec is a Java implementation. David recounts the six months he spent in a team of three working to implement a full adaptive bitrate streaming application within Blu-ray’s Java implementation. This was done in order to get around some contractual issues and worked by extending the features which were built into Blu-ray for downloading new trailers to show instead of those on disc. This YouTube review from 2009 shows a slick interface slowed down by the speed of the internet connection.

David also talks about his close work with and respect for Netflix colleague Anne Aaron who has been featured previously on The Broadcast Knowledge. He goes on to talk about the inception of VMAF, a metric for computationally determining the quality of video, which Netflix developed because they didn’t feel that any of the existing metrics such as PSNR and MS-SSIM captured the human opinion of video well enough. It’s widely understood that PSNR has its place but can give very different results to subjective evaluations. And, indeed, VMAF also is not perfect as David mentions. However, using VMAF well and understanding its limits results in a much more accurate description of quality than with many other metrics and, unlike competing metrics such as SSIMWAVE’s SSIMPLUS, it is open source and royalty-free.

David concludes his talk with John saying that high-quality, well-delivered streaming is now everywhere. The struggles of the early years have resulted in a lot of well-learned lessons by the industry at large. This commoditisation is welcome and shows a maturity in the industry that raises the question of where the puck is going next. David sees environmental sustainability as one of the key goals. For both environmental and financial reasons, he says that streaming providers will now want to maximise the output-per-watt of their data centres. Data centre power is currently 3% of all global power consumption and is forecast to reach up to 20%. Looking to newer codecs is one way to achieve a reduction in power consumption. The last time he spoke with John, David discussed AV1, which delivers lower bitrates at the cost of high computation requirements. At hyperscale, using dedicated ASIC chips to do the encoding is one way to drive down power consumption. An alternative route is the new MPEG codec LCEVC which delivers better-than-AVC performance in software at much-reduced power consumption. With the prevalence of video – both for entertainment and outside, for example, body cams – moving to more power-efficient codecs and codec implementations seems the obvious and moral move.

Watch now!
Speakers

	David Ronca Director, Video Encoding, Facebook
	John Porterfield Freelance Video Webcast Producer and Tech Evangelist JP’sChalkTalks YouTube Channel

Video: Per-Title Encoding in the Wild

Posted on 5th March 2021 by Russell Trafford-Jones

How deep do you want to go to make sure viewers get the absolute best quality streamed video? Over the past few years it’s become common not just to choose 7 bitrates for a streamed service and encode everything to them, but to at least vary the bitrates for each video. In this talk we examine why stopping there still leaves bitrate savings on the table, savings which, in turn, mean lower data rates for your viewers, faster time-to-play and an overall better experience.

Jan Ozer starts with a look at the evolution of bitrate optimisation. It started with Beamr and, everyone’s favourite, FFmpeg, both of which adjust the encode frame by frame to reach a target quality rather than a target bitrate. FFmpeg’s CRF mode will change the quantizer parameter for each frame to maintain the same quality throughout the whole file, though with a variable bitrate. Beamr would encode each frame repeatedly, reducing the bitrate until it got the desired quality. These worked well but missed out on a big trick…
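As a rough illustration of the CRF approach described above, the sketch below builds an FFmpeg command line for a constant-quality x264 encode. The file names and the CRF value of 23 are placeholders chosen for this example, not values from the talk; `-c:v libx264` and `-crf` are standard FFmpeg options.

```python
def crf_encode_command(source, output, crf=23):
    """Build an FFmpeg command for a constant-quality (CRF) H.264 encode.

    The file names passed in are placeholders; 8-bit x264 accepts
    CRF values from 0 (lossless) to 51 (lowest quality).
    """
    if not 0 <= crf <= 51:
        raise ValueError("8-bit x264 CRF is expected to be between 0 and 51")
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264",   # encode video with x264
        "-crf", str(crf),    # hold quality constant, let the bitrate vary
        output,
    ]


command = crf_encode_command("input.mp4", "output.mp4")
print(" ".join(command))
```

Passing the list to `subprocess.run` would start the encode; lowering the CRF value raises both quality and bitrate.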

Over the years, it’s been clear that sometimes 720p at 1Mbps looks better than 1080p at 1Mbps. This isn’t always the case and depends on the source footage. Much rolling news will be different from premium sports content in terms of sharpness and temporal content. So, really, the resolution needs to be assessed alongside data rate. This idea was brought into Netflix’s idea of per-title encoding. By re-encoding a title hundreds of times with different resolutions and data rates, they were able to determine the ‘convex hull’ which is a graph showing the optimum balance between quality, bitrate and resolution. That was back in 2015. Moving beyond that, we’ve started to consider more factors.
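The convex hull idea can be sketched in a few lines. The example below is a simplified illustration and not Netflix's implementation: the trial encodes, their bitrates and their quality scores are invented, and the function names are hypothetical. It keeps the best-scoring resolution at each bitrate and then discards any point that falls below the upper convex hull of the quality-versus-bitrate curve.

```python
def _cross(a, b, c):
    """Cross product of (b - a) and (c - a) using bitrate and quality only."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def convex_hull_ladder(encodes):
    """Return the (bitrate, quality, resolution) points on the upper convex hull."""
    # Keep only the best-quality resolution at each bitrate.
    best = {}
    for bitrate, quality, resolution in encodes:
        if bitrate not in best or quality > best[bitrate][1]:
            best[bitrate] = (bitrate, quality, resolution)

    hull = []
    for point in sorted(best.values()):
        # Drop the previous point if it lies on or below the line to the new one.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], point) >= 0:
            hull.pop()
        hull.append(point)

    # Discard any rungs past the peak, where extra bitrate no longer adds quality.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[: peak + 1]


# Hypothetical trial encodes: (bitrate in kbps, quality score, resolution).
trial_encodes = [
    (500, 68, "540p"), (1000, 75, "540p"), (1500, 77, "540p"), (2000, 78, "540p"), (4000, 79, "540p"),
    (500, 60, "720p"), (1000, 80, "720p"), (1500, 84, "720p"), (2000, 86, "720p"), (4000, 88, "720p"),
    (500, 45, "1080p"), (1000, 72, "1080p"), (1500, 83, "1080p"), (2000, 89, "1080p"), (4000, 95, "1080p"),
]

for bitrate, quality, resolution in convex_hull_ladder(trial_encodes):
    print(f"{bitrate} kbps -> {resolution} (quality {quality})")
```

With this invented data, 720p wins at 1 Mbps while 1080p wins at 2 Mbps and above, and the 1500 kbps rung is dropped because it sits below the line joining its neighbours.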

The next evolution is fairly obvious really, and that’s to make these evaluations not for each video, but for each shot. Doing this, Jan explains, offers bitrate improvements of 28% for AVC and more for other codecs. This is more complex than per-title because the stream itself changes, for instance, GOP sizes, so whilst we know this is something Netflix is using, there are no available commercial implementations currently.

Pushing these ideas further, perhaps the streaming service should take into account the device on which you are viewing. Some TVs typically only ever take the top two rungs on the ladder, whereas many mobile devices have low-resolution screens and never get around to pulling the higher bitrates. So profiling a device based on either its model or historic activity can allow you to offer different ABR ladders to allow for a better experience.
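A minimal sketch of device-aware ladders, assuming a device profile can be reduced to the range of picture heights it usefully plays. The ladder values, the profile limits and the function name are all hypothetical and only illustrate the idea of offering different rungs to different devices.

```python
# Hypothetical ABR ladder: (picture height in lines, bitrate in kbps).
LADDER = [(240, 300), (360, 700), (540, 1500), (720, 3000), (1080, 6000), (2160, 16000)]


def ladder_for_device(ladder, max_height, min_height=0):
    """Keep only the rungs whose height falls inside the device's useful range."""
    return [(height, kbps) for height, kbps in ladder if min_height <= height <= max_height]


# A phone with a 720-line screen never needs the 1080p or UHD rungs.
phone_ladder = ladder_for_device(LADDER, max_height=720)

# A UHD TV which historically only pulls the top rungs can skip the lowest ones.
tv_ladder = ladder_for_device(LADDER, max_height=2160, min_height=1080)

print(phone_ladder)
print(tv_ladder)
```

In practice the limits would come from the device model or its playback history rather than being hard-coded.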

All of this needs to be enabled by automatic, objective metrics so the metrics need to look out for the right aspects of the video. Jan explains that PSNR and MS-SSIM, though tried and trusted in the industry, only measure spatial information. Jan gives an overview of the alternatives. VMAF, he says, adds a detail loss metric, but it’s not until we start using PW-SSIM from Brightcove that aspects such as device information are taken into account. SSIMPLUS does this and also considers wide colour gamut HDR and frame rates. Similarly ATEME’s ‘Quality Vector’ considers frame rate and HDR.

Dr. Abdul Rehman follows Jan with his introduction to SSIMWAVE’s technologies and focuses on their ability to understand what quality the viewer will see. This allows a provider to choose whether to deliver a quality of ‘70’ or, say, ‘80’. Each service is different and different demographics will expect different things. It’s important to meet viewer expectations to avoid churn, but it’s in everyone’s interest to keep the data rate as low as possible.

Abdul gives the example of banding which is something that is not easily picked up by many metrics and so can be introduced as the encode optimiser continues to reduce the bitrate oblivious to the obvious banding. He says that since SSIMPLUS is not referenced to a source, this can give an accurate viewer score no matter the source material. Remember that if you use PSNR, you are comparing against your source. If the source is poor, your PSNR score might end up close to the maximum. The trouble is, your viewers will still see the poor video you send them, not caring if this is due to encoding or a bad source.
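The point about PSNR being relative to the source can be shown with the standard formula, PSNR = 10·log10(MAX² / MSE). In the sketch below the pixel values are invented for illustration: a detailed 'pristine' row, a blurred 'poor source' made from it, and an encode of that poor source.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must be the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Invented 8-bit pixel rows for illustration only.
pristine = [10, 200, 10, 200, 10, 200, 10, 200]           # sharp detail
poor_source = [105] * 8                                    # the same row blurred flat
encoded = [105, 106, 105, 104, 105, 106, 105, 104]         # encode of the poor source

print(f"encode vs. its poor source: {psnr(poor_source, encoded):.1f} dB")
print(f"poor source vs. pristine:   {psnr(pristine, poor_source):.1f} dB")
```

The encode scores very highly against the source it was given, yet that source is itself far from the pristine picture, which is the gap a full-reference metric cannot see.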

The video ends with a Q&A.

Watch now!
Speakers

	Jan Ozer Principal, Streaming Learning Center Contributing Editor, Streaming Media
	Abdul Rehman CEO, SSIMWAVE

Video: Measuring Video Quality with VMAF – Why You Should Care

Posted on 19th November 2020 by Russell Trafford-Jones

VMAF, from Netflix, has become a popular tool for evaluating video quality since its launch as an Open Source project in 2017. Coming out of research from the University of Southern California and The University of Texas at Austin, it’s seen as one of the leading ways to automate video assessment.

Netflix’s Christos Bampis gives us a brief overview of VMAF’s origins and its aims. VMAF came about because other metrics such as MS-SSIM and, in particular, PSNR aren’t close enough indicators of quality. Indeed, Christos shows that when it comes to animated content (i.e. anime and cartoons) subjective scores can be very high, but if we look at the PSNR score it can be the same as the PSNR score of another live-action video clip which humans rate a lot lower, subjectively. Moreover, in less extreme examples, Christos explains, PSNR is often 5% or so away from the actual subjective score in either direction.

Source: Netflix/Alliance for Open Media

To a simple approximation, VMAF is a method of bringing out the spatial and temporal information from a video frame in a way which emphasises the types of things humans are attuned to such as contrast masking. Christos shows an example of a picture where artefacts in the trees are much harder to see than similar artefacts on a colour gradient such as a sky or still water. These extraction methods take account of situations like this and their outputs are then fed into a trained model which maps them to the scores that humans would have given. The idea is that, when trained on many examples, the model can correctly predict a human’s score given a set of data extracted from a picture. Christos shows examples of how well VMAF out-performs PSNR in gauging video quality.
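The 'features in, predicted human score out' structure can be sketched with a toy model. This is not VMAF's model: VMAF fuses its features with a support vector regressor trained on real subjective data, whereas the sketch below uses plain least squares as a stand-in, with two invented features and invented viewer scores. All names here are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(features, scores):
    """Least-squares weights (plus a bias term) mapping features to scores."""
    rows = [list(f) + [1.0] for f in features]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(n)]
    return solve(ata, atb)


def predict(weights, feature_vector):
    """Predicted subjective score for one set of extracted features."""
    return sum(w * f for w, f in zip(weights, list(feature_vector) + [1.0]))


# Invented training data: (fidelity feature, detail feature) -> viewer score.
training_features = [(0.2, 0.3), (0.5, 0.4), (0.9, 0.8), (0.7, 0.2), (0.4, 0.9)]
training_scores = [25.0, 42.0, 88.0, 50.0, 61.0]

weights = fit(training_features, training_scores)
print(f"predicted score: {predict(weights, (0.6, 0.6)):.1f}")
```

The real model differs in its features, its regressor and its training data, but the workflow of fitting to human scores and then predicting for unseen content is the same shape.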

Challenges are in focus in the second half of the talk. What are the things which still need working on to improve VMAF? Christos zooms in on two: design dimensionality and noise. By design dimensionality, he means how can VMAF be extended to be more general, delivering a number which has a consistent meaning in different scenarios? As the VMAF model has been trained on AVC, how can we deal with different artefacts which are seen with different codecs? Do we need a new model for HDR content instead of SDR and how should viewing conditions, whether ambient light or resolution and size of the display device, be brought into the metric? The second challenge Christos highlights is noise as he reveals VMAF tends to give lower scores than it should to noisy sources. Codecs like AV1 have film-grain synthesis tools and these need to be evaluated, so behaving correctly in the presence of video noise is important.

The talk finishes with Christos outlining that VMAF’s applicability to the industry is only increasing with new codecs coming out such as LCEVC, VVC, AV1 and more – such diversity in the codec ecosystem wasn’t an obvious prediction in 2014 when the initial research work was started. Christos underlines the fact that VMAF is a continually evolving metric which is Open Source and open to contributions. The Q&A covers failure cases, super-resolution and how to interpret close-call results which are only 1% different.

Watch now!
Download the presentation
Speaker

Christos Bampis
Senior Software Engineer,
Netflix

Video: Super Resolution – The scaler of tomorrow, here today!

Posted on 31st March 2020 by Russell Trafford-Jones

If we ever had a time when most displays were the same resolution, those days are long gone with smartphones and tablets with extremely high pixel density nestled in with laptop screens of various resolutions and 1080-line TVs which are gradually being replaced with UHD variants. This means that HD videos are nearly always being upscaled which makes ‘getting upscaling right’ a really worthwhile topic. The well-known basic up/downscaling algorithms have been around for a while, and even the best-performing Lanczos is well over 20 years old. The ‘new kid on the block’ isn’t another algorithm, it’s a whole technique of inferring better upscaling using machine learning called ‘super resolution’.

Nick Chadwick from Mux has been running the code and the numbers to see how well super resolution works. Taking to the stage at Demuxed SF, he starts by looking at where scaling is used and what type it is. The most common algorithms are nearest neighbour, bi-cubic, bi-linear and Lanczos, with nearest neighbour being the most basic and the worst performing. Using VMAF to score these algorithms for up and downscaling, Nick shows that the traditional opinions of how well they perform are valid. He then introduces some test videos which are designed to let you see whether your video path is using bi-linear or bi-cubic upscaling, presenting his results of when bi-cubic can be seen (Safari on a MacBook Pro) as opposed to bi-linear (Chrome on a MacBook Pro). The test videos are available here.
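To make the difference between the simplest scalers concrete, here is a one-dimensional sketch of nearest-neighbour and linear upscaling of a single row of pixels. It is a simplified illustration: real scalers work in two dimensions and align pixel centres differently, and the function names and sample values are invented for this example.

```python
def upscale_nearest(samples, factor):
    """Repeat each sample `factor` times (nearest neighbour)."""
    return [samples[i // factor] for i in range(len(samples) * factor)]


def upscale_linear(samples, factor):
    """Blend between neighbouring samples (the 1D analogue of bi-linear)."""
    last = len(samples) - 1
    out = []
    for i in range(len(samples) * factor):
        position = min(i / factor, last)   # clamp at the final sample
        left = int(position)
        right = min(left + 1, last)
        fraction = position - left
        out.append(samples[left] * (1 - fraction) + samples[right] * fraction)
    return out


row = [0, 100, 0]  # a single bright pixel between two dark ones
print(upscale_nearest(row, 2))  # blocky: hard edges are duplicated
print(upscale_linear(row, 2))   # smoother: new pixels are interpolated
```

Nearest neighbour keeps hard, blocky edges while linear interpolation softens them; bi-cubic and Lanczos extend the same idea by weighting more neighbouring pixels.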

In the next part of the talk, Nick digs a little deeper into how super resolution works and how he tested FFmpeg’s implementation of super resolution. Though he hit some difficulties in using this young filter, he is able to present some videos and shows that they are, indeed, “better to view” meaning that the text looks sharper and is easier to see with details being easier to pick out. It’s certainly possible to see some extra speckling introduced by the process, but the VMAF score is around 10 points higher, matching the subjective experience.

The downsides are a very significant increase in computational power needed which limits its use in live applications plus there is a need for good, if not very good, understanding of ML concepts and coding. And, of course, it wouldn’t be the online streaming community if clients weren’t already being developed to do super-resolution on the decode despite most devices not being practically capable of it. So Nick finishes off his talk discussing what’s in progress and papers relating to the implementation of super resolution and what it can borrow from other developing technologies.

Watch now!
Speaker

Nick Chadwick
Software Engineer,
Mux
