Video: Per-Title Encoding in the Wild

How deep do you want to go to make sure viewers get the absolute best quality streamed video? Over the past few years it's become common not just to choose, say, seven bitrates for a streaming service and encode everything to those, but to vary the bitrate ladder for each video. In this talk we examine why even that leaves bitrate savings on the table which, in turn, means bandwidth savings for your viewers, faster time-to-play and an overall better experience.

Jan Ozer starts with a look at the evolution of bitrate optimisation, which began with Beamr and, everyone's favourite, FFmpeg, both of which re-encode frames until they reach the best quality. FFmpeg's CRF mode changes the quantiser parameter for each frame to maintain consistent quality throughout the whole file, albeit with a variable bitrate. Beamr would encode each frame repeatedly, reducing the bitrate until it hit the desired quality. These approaches worked well but missed a big trick…
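As a rough illustration of CRF mode, here's a minimal Python sketch that shells out to FFmpeg with libx264. The file names and CRF value are hypothetical, and a real pipeline would add audio handling and error reporting.

```python
import subprocess

def encode_crf(src: str, dst: str, crf: int = 23) -> None:
    """Encode with x264 in CRF mode: the quantiser varies per frame
    to hold perceptual quality roughly constant, so bitrate floats."""
    subprocess.run([
        "ffmpeg", "-y",
        "-i", src,               # hypothetical input path
        "-c:v", "libx264",
        "-crf", str(crf),        # lower CRF = higher quality, bigger file
        "-preset", "medium",
        "-an",                   # ignore audio for this sketch
        dst,
    ], check=True)

encode_crf("source.mp4", "out_crf23.mp4")
```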

Over the years, it's become clear that 720p at 1Mbps sometimes looks better than 1080p at 1Mbps. This isn't always the case and depends on the source footage: rolling news differs from premium sports content in sharpness and temporal detail. So, really, resolution needs to be assessed alongside data rate. This thinking fed into Netflix's per-title encoding. By re-encoding a title hundreds of times at different resolutions and data rates, they could determine the 'convex hull', a curve showing the optimum balance between quality, bitrate and resolution. That was back in 2015; since then, we've started to consider more factors.
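To make the idea concrete, here's a toy sketch of how a per-title 'hull' can be pulled out of a cloud of trial encodes. The measurements are invented, and strictly speaking this computes the Pareto frontier (the upper envelope of the rate/quality cloud) rather than a true geometric convex hull, which would additionally drop points sitting below the line joining their neighbours.

```python
# (resolution, bitrate_kbps, vmaf) -- hypothetical trial-encode results
trial_encodes = [
    ("1080p", 1000, 62), ("1080p", 3000, 88), ("1080p", 6000, 95),
    ("720p",  1000, 70), ("720p",  3000, 90), ("720p",  6000, 92),
    ("540p",  1000, 74), ("540p",  3000, 84),
]

def convex_hull(points):
    """Keep the best resolution at each bitrate, then keep only points
    that improve on every cheaper point -- the upper envelope across
    all resolutions."""
    best_at_rate = {}
    for res, rate, q in points:
        if rate not in best_at_rate or q > best_at_rate[rate][2]:
            best_at_rate[rate] = (res, rate, q)
    hull, best_q = [], float("-inf")
    for rate in sorted(best_at_rate):
        res, _, q = best_at_rate[rate]
        if q > best_q:
            hull.append((res, rate, q))
            best_q = q
    return hull

for res, rate, q in convex_hull(trial_encodes):
    print(f"{rate:>5} kbps -> {res} (VMAF {q})")
# Note how 540p wins at 1000 kbps but 1080p wins at 6000 kbps.
```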

The next evolution is fairly obvious really: make these evaluations not for each video, but for each shot. Doing this, Jan explains, offers bitrate improvements of around 28% for AVC and more for other codecs. It's more complex than per-title because the stream itself changes, for instance in GOP sizes, so whilst we know Netflix is using it, there are currently no commercial implementations available.
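Per-shot evaluation first needs shot boundaries. One common approach, sketched below, uses FFmpeg's scene-change score via the select and showinfo filters; the 0.4 threshold and the idea of handing each span its own mini ladder are illustrative assumptions, not the method from the talk.

```python
import re
import subprocess

def shot_boundaries(src: str, threshold: float = 0.4) -> list[float]:
    """Rough shot detection: select frames whose scene-change score
    exceeds the threshold and read their timestamps back from
    showinfo's log output (which FFmpeg writes to stderr)."""
    proc = subprocess.run(
        ["ffmpeg", "-i", src,
         "-vf", f"select='gt(scene,{threshold})',showinfo",
         "-f", "null", "-"],
        stderr=subprocess.PIPE, text=True)
    return [float(m.group(1))
            for m in re.finditer(r"pts_time:([0-9.]+)", proc.stderr)]

# Each span between boundaries could then get its own trial-encode hull.
print(shot_boundaries("source.mp4"))
```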

Pushing these ideas further, perhaps the streaming service should take into account the device you're viewing on. Some TVs typically only ever pull the top two rungs of the ladder, yet many mobile devices have low-resolution screens and never get around to pulling the higher bitrates. Profiling a device, based on either its model or its historic activity, lets you offer different ABR ladders for a better experience.
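A minimal sketch of that idea, with entirely invented rungs and device classes: the server simply hands out a trimmed ladder per device profile.

```python
# Hypothetical per-device ladder trimming. Rung values are illustrative.
FULL_LADDER = [(6000, "1080p"), (3000, "720p"), (1600, "540p"),
               (800, "432p"), (400, "270p")]

DEVICE_LADDERS = {
    "tv":      FULL_LADDER[:2],   # TVs mostly only pull the top rungs
    "mobile":  FULL_LADDER[2:],   # small screens never benefit from 1080p
    "default": FULL_LADDER,
}

def ladder_for(device_class: str):
    """Return the ABR ladder to advertise to this class of device."""
    return DEVICE_LADDERS.get(device_class, DEVICE_LADDERS["default"])

print(ladder_for("mobile"))  # [(1600, '540p'), (800, '432p'), (400, '270p')]
```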

All of this needs to be enabled by automatic, objective metrics, so the metrics need to look out for the right aspects of the video. Jan explains that PSNR and MS-SSIM, though tried and trusted in the industry, only measure spatial information, and he gives an overview of the alternatives. VMAF, he says, adds a detail loss metric, but it's not until we get to Brightcove's PW-SSIM that aspects such as device information are taken into account. SSIMPLUS does this too, and also considers wide colour gamut, HDR and frame rates. Similarly, ATEME's 'Quality Vector' considers frame rate and HDR.
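Of the metrics Jan mentions, VMAF is the easiest to try yourself. A sketch, assuming an FFmpeg build with --enable-libvmaf and the filter's usual input order of distorted first, reference second (this has varied between libvmaf versions, so check your build):

```python
import subprocess

def vmaf_score(distorted: str, reference: str) -> None:
    """Run FFmpeg's libvmaf filter; the pooled VMAF score is printed
    in FFmpeg's log. Add e.g. log_path options for machine-readable
    output if your build supports them."""
    subprocess.run([
        "ffmpeg",
        "-i", distorted,       # the encode under test
        "-i", reference,       # the pristine source
        "-lavfi", "libvmaf",
        "-f", "null", "-",
    ], check=True)

vmaf_score("out_crf23.mp4", "source.mp4")
```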

Dr. Abdul Rehman follows Jan with an introduction to SSIMWAVE's technologies, focusing on their ability to understand what quality the viewer will actually see. This lets a provider choose whether to deliver a quality of, say, '70' or '80'. Each service is different and its demographics will expect different things; it's important to meet viewer expectations to avoid churn, but it's in everyone's interest to keep the data rate as low as possible.

Abdul gives the example of banding, which many metrics don't easily pick up and which can therefore be introduced as an encode optimiser keeps reducing the bitrate, oblivious to the obvious banding. He says that because SSIMPLUS is not referenced to a source, it can give an accurate viewer score whatever the source material. Remember that if you use PSNR, you are comparing against your source: if the source is poor, your PSNR score might end up close to the maximum, but your viewers will still see the poor video you send them, not caring whether that's down to encoding or a bad source.
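A tiny numerical demonstration of that PSNR caveat, with synthetic frames standing in for video: an encode that faithfully reproduces a banded, low-detail source scores near-perfect PSNR even though viewers would still see the banding.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two 8-bit frames."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
# A deliberately awful "source" frame: heavy banding, no detail.
poor_source = np.repeat(
    np.arange(0, 256, 32, dtype=np.uint8), 8)[None, :].repeat(64, axis=0)
# An encode that reproduces the poor source almost perfectly.
encode = np.clip(
    poor_source.astype(int) + rng.integers(-1, 2, poor_source.shape),
    0, 255).astype(np.uint8)

# ~50 dB: "faithful to the source", which says nothing about how it looks.
print(f"PSNR vs poor source: {psnr(poor_source, encode):.1f} dB")
```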

The video ends with a Q&A.

Watch now!
Speakers

Jan Ozer
Principal, Streaming Learning Center
Contributing Editor, Streaming Media
Abdul Rehman
CEO,
SSIMWAVE

Video: Machine Learning for Per-title Encoding

AI continues its march into streaming with this new approach to optimising encoder settings to keep down the bitrate and improve quality for viewers. Under its more accurate name, 'machine learning', computers learn to characterise video and so avoid the hundreds of encodes otherwise needed to determine the best way to encode a video asset.

Daniel Silhavy from Fraunhofer FOKUS takes the stand at Mile High Video 2020 to detail the latest techniques in per-title and per-scene encoding. Daniel starts by outlining the problem with fixed ABR ladders: they forgo the efficiencies gained by being flexible with both resolution and bitrate.

Netflix were the best-known pioneers of per-title encoding, where many, many encodes are done for each video asset to determine the best overall bitrates to choose. This is great because it allows animation to be treated differently from action films or sports, and efficiency is gained.

However, per-title delivers an average benefit. There are still parts of the video which are simple and could take a reduced bitrate, and parts whose complexity isn't accounted for. When the bitrate is higher than necessary to achieve a certain VMAF score, Daniel calls this 'wasted quality': bitrate was spent making the quality better than it needed to be. Whilst better quality sounds like a boon, it often can't actually be seen, hence targeting VMAF at a lower level.
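A sketch of the 'wasted quality' arithmetic with invented numbers: choose the cheapest trial encode that hits the VMAF target, and count any bitrate a fixed rung spends above that as waste.

```python
# Hypothetical trial encodes for one scene: (bitrate_kbps, vmaf).
trials = [(1000, 78), (2000, 87), (3000, 93), (4500, 95)]
TARGET_VMAF = 93

# Cheapest encode that still reaches the target quality.
eligible = [(rate, q) for rate, q in trials if q >= TARGET_VMAF]
chosen_rate, chosen_q = min(eligible)

fixed_ladder_rate = 4500  # what a static ladder rung might have used
print(f"choose {chosen_rate} kbps (VMAF {chosen_q}); "
      f"a fixed {fixed_ladder_rate} kbps rung wastes "
      f"{fixed_ladder_rate - chosen_rate} kbps on quality nobody sees")
```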

Naturally, rather than varying the resolution mix and bitrate for each file, it would be better to do it for each scene. Working this way, variations in complexity can be quickly accounted for. This can also be done without machine learning, but more encodes are needed. The rest of the talk looks at using machine learning to take a short-cut through some of that complexity.

The standard workflow is to perform a complexity analysis on the video, working out a VMAF score at various bitrate and resolution combinations. This produces a 'convex hull estimation', allowing determination of the best parameters, which then feed into the production encoding stage.

Machine learning can replace the section which predicts the best bitrate-resolution pairs. Fed with some details of the content's complexity, it can avoid multiple encodes and deliver a list of parameters to the encoding stage. Moreover, it can receive feedback from the player, allowing further optimisation of the prediction module.
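Neither the talk's model nor its features are public here, so the following is a hypothetical sketch of the shape of such a prediction module: a scikit-learn regressor mapping cheap complexity features to a hull bitrate, trained on titles that did receive the full brute-force analysis. All feature and target values are invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Training data from titles given the exhaustive convex-hull treatment.
# features: [spatial_info, temporal_info, avg_motion] (hypothetical)
X_train = np.array([[60, 15, 0.2], [90, 45, 0.8],
                    [40, 10, 0.1], [80, 35, 0.6]])
# target: optimal 1080p-rung bitrate (kbps) found by brute force
y_train = np.array([3200, 5800, 2400, 5100])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# A new title needs only one cheap analysis pass, not hundreds of encodes.
new_title = np.array([[70, 25, 0.4]])
print(f"predicted 1080p bitrate: {model.predict(new_title)[0]:.0f} kbps")
```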

Daniel shows a demo of this working, where we see that the end result has fewer rungs on the ABR ladder, a lower-resolution top rung and fewer resolutions in general, some repeated at different bitrates. This chimes with the findings of Facebook, covered here last week, who found that removing their 'one bitrate per resolution' rule improved viewers' experience. In total, for an example Fraunhofer received from a customer, they saw a 53% reduction in the storage needed.

Watch now!
Download the slides
Speakers

Daniel Silhavy
Scientist & Project Manager,
Fraunhofer FOKUS

Video: What to do after per-title encoding

Per-title encoding is a common method of optimising quality and compression by changing the encoding options on a file-by-file basis. Although some would say the arrival of per-scene encoding is the death knell for per-title encoding, either is much better than the more traditional plan of applying exactly the same settings to every video.

This talk with Mux's Nick Chadwick and Ben Dodson looks at what per-title encoding is and how to go about doing it. The initial work involves doing many encodes of the same video and analysing each for quality. This allows you to work out which resolutions and bitrates to encode at and how to deliver the best video.

Ben Dodson explains the way they implemented this at Mux using machine learning. This was done by getting computers to ‘watch’ videos and extract metadata. That metadata can then be used to inform the encoding parameters without the computer watching the whole of a new video.

Nick takes some time to explain Mux's 'convex hulls', which give a shape to the content's performance at different bitrates and help visualise the optimum encoding parameters for the content. Moreover, we see that using this technique we can explore how changing resolution creates the best encode. This doesn't always mean reducing the resolution; there are some surprising circumstances where it makes sense to stick with high resolutions, even for low bitrates.

The next stage after per-title encoding is to segment the video and encode each segment differently. Nick explores this and explains how to deliver different resolutions throughout the stream, seamlessly switching between them. Ben takes over and explains how this can be implemented and how to choose the segment boundaries correctly, again using a machine learning approach to the analysis and decision making.
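The talk's own switching mechanism isn't reproduced here, but one widely used prerequisite is easy to show: every rendition must cut segments at identical timestamps, which FFmpeg can enforce with -force_key_frames on a shared grid. A sketch, with illustrative rung values and a 2-second grid:

```python
import subprocess

def encode_rung(src: str, dst: str, height: int, bitrate_k: int) -> None:
    """Encode one ladder rung with keyframes pinned to a 2-second grid,
    so segment boundaries line up across every rung and the player can
    switch resolution cleanly between segments."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-vf", f"scale=-2:{height}",        # keep width divisible by 2
        "-b:v", f"{bitrate_k}k",
        "-force_key_frames", "expr:gte(t,n_forced*2)",
        dst,
    ], check=True)

for height, rate in [(1080, 5000), (720, 3000), (540, 1600)]:
    encode_rung("source.mp4", f"rung_{height}.mp4", height, rate)
```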

Watch now!
Speakers

Nick Chadwick
Software Engineer,
Mux
Ben Dodson
Data Scientist,
Mux

Video: Per-Title Encoding, @Scale Conference

Per-title encoding with machine learning is the topic of this video from Mux.

Nick Chadwick explains that rather than using the same set of parameters to encode every video, the smart money is on finding the best balance of bitrate and resolution for each video. By analysing a large number of combinations of bitrate and resolution, Nick shows, you can build what he calls a 'convex hull' when graphing them against quality, which allows you to find the optimal settings.

Doing this en masse is difficult, and Nick spends some time looking at the different ways of implementing it. In the end, Nick and data scientist Ben Dodson built a system which optimises the bitrate for each title using neural nets trained on their data sets. This resulted in 84% of videos looking better with this method than with a static ladder.

Watch now!
Speaker

Nick Chadwick
Software Engineer,
Mux