Video: Decoder Complexity Aware AV1 Encoding Optimization

AV1 has been famous for very low encoding speed but, as we’ve seen from panels like this, AV1 encoding times have dropped into a practical range and it’s starting to gain traction. Zoe Liu, CEO of Visionular, is here to talk at Mile High Video 2020 about how careful use of encoding parameters can deliver faster encodes and smooth decodes, yet balance that with codec efficiency.

Zoe starts by outlining the good work that’s been done with the SVT-AV1 encoder which leaves it ready for deployment, as we heard previously from David Ronca of Facebook. Similarly, the dav1d decoder has recently made many speed improvements and can now easily decode 24fps on mobiles using between 1.5 and 3 Snapdragon cores depending on resolution. Power consumption has been measured as higher than AVC decoding but lower than HEVC. Further to that, hardware support is arriving in many devices like TVs.

Zoe then continues to show ways in which encoding can be sped up by reducing the calculations done which, in turn, increases decoder speed. Zoe’s work has exposed settings that significantly speed up decoding but have very little effect on the compression efficiency of the codec. This opens up use cases where decoding was the blocker and a 5% reduction in the ability to compress is a price worth paying. One example cited is ignoring partition sizes of less than 8×8. These small partitions can be numerous and bog down calculations, but their overall contribution to bitrate reduction is very low.
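To get a feel for why small partitions are so numerous, here is a rough sketch. It assumes a simplified model in which a 64×64 superblock is only ever split into square blocks (real AV1 also has rectangular partitions), and the function names are made up for this illustration rather than taken from any encoder.

```python
# Simplified, hypothetical model: square-only splits of one superblock.

def candidate_partition_sizes(superblock=64, min_size=4):
    """Square block sizes tried, from the superblock down to min_size."""
    sizes = []
    size = superblock
    while size >= min_size:
        sizes.append(size)
        size //= 2
    return sizes


def blocks_per_superblock(superblock, block_size):
    """How many block_size x block_size blocks tile one superblock."""
    per_side = superblock // block_size
    return per_side * per_side


def total_blocks_considered(superblock, min_size):
    """Upper bound on blocks visited if every size is evaluated everywhere."""
    return sum(
        blocks_per_superblock(superblock, size)
        for size in candidate_partition_sizes(superblock, min_size)
    )


full_search = total_blocks_considered(64, min_size=4)
restricted_search = total_blocks_considered(64, min_size=8)
print(full_search, restricted_search)
```

In this toy model, the 4×4 level alone accounts for 256 of the blocks in the full search, which is why dropping it removes so much work. How much bitrate that costs is something only real encodes, such as those Zoe presents, can tell you.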

All of these techniques are brought together under the heading of Decoder Complexity Aware AV1 Encoding Optimization which, Zoe explains, can result in an encoding speed-up of over two times the original framerate i.e. twice real-time on an Intel i5. Zoe concludes that this creates a great opportunity to apply AV1 to VOD use cases.

Watch now!
Speaker

Zoe Liu
CEO,
Visionular

Video: Video Vectorisation

Yesterday we learnt about machine learning improving VVC. But VVC has a fundamental property which limits its ability to compress: it’s raster-based. Vector graphics are infinitely scalable with no loss of quality and are very efficient. Instead of describing 100 individual pixels, you can just define a line 100 pixels long. This video introduces a vector-based video codec which dramatically reduces bitrate.
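The ‘line 100 pixels long’ idea can be sketched in a few lines. This is only an illustration: it compares an uncompressed list of RGB pixels with a single SVG `<line>` element, whereas a real raster codec would of course compress those pixels. The helper names are invented for the example.

```python
def raster_line(length, colour=(0, 0, 0)):
    """Describe a horizontal line pixel by pixel: one RGB triple per pixel."""
    return [colour for _ in range(length)]


def svg_line(length, stroke="black"):
    """Describe the same line once, as an SVG <line> element."""
    return f'<line x1="0" y1="0" x2="{length}" y2="0" stroke="{stroke}"/>'


# 3 bytes per uncompressed RGB pixel versus the length of the SVG text.
raster_bytes = len(raster_line(100)) * 3
vector_bytes = len(svg_line(100).encode("utf-8"))
print(raster_bytes, vector_bytes)

# Making the line ten times longer multiplies the raster data by ten,
# but only adds one character to the vector description.
print(len(raster_line(1000)) * 3, len(svg_line(1000).encode("utf-8")))
```

The point is the scaling behaviour rather than the exact byte counts: the vector description barely changes as the line grows.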

Sam Bhattacharyya from Vectorly introduces this technique which uses SVG graphics, a well-established graphics standard available in all major web browsers. It describes shapes with XML and is similar to WebGL. The once universal Adobe Flash was able to animate SVG shapes as part of its distinctive ‘flash animations’. The new aspect here is not to start with SVG shapes and animate them, but to create those shapes from video footage and recreate that same video but with vectors.

Sam isn’t shy about acknowledging that video vectorisation is a technique which works well on animation with solid colours; Peppa Pig being the example shown. But on more complex imagery without solid colours and sharp lines, this technique doesn’t result in useful compression. To deal with shaded animation, he explains a technique of using mesh gradients and diffusion curves to represent gradually changing colours and shades. Sam is interested in exploring a hybrid mode whereby traditional video has graphics overlaid using this low-bandwidth vector-based codec.

The technique uses machine learning/AI techniques to identify the shapes, track them and put them into keyframes. The codec plays this back by interpolating the motion. This can produce files which play back at HD resolution at only 100kbps. For the right content, this can be a great option given it’s based on established standards, is low bitrate and can be hardware accelerated.
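As a minimal sketch of what ‘interpolating the motion’ between keyframes could look like, the snippet below linearly interpolates the vertices of one shape. Linear interpolation, and the function names, are assumptions made for this example; the talk summary doesn’t say which interpolation the codec actually uses.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def interpolate_shape(key_a, key_b, t):
    """Blend two keyframes of the same shape.

    key_a and key_b are lists of (x, y) vertices in matching order.
    """
    if len(key_a) != len(key_b):
        raise ValueError("keyframes must have the same number of vertices")
    return [
        (lerp(xa, xb, t), lerp(ya, yb, t))
        for (xa, ya), (xb, yb) in zip(key_a, key_b)
    ]


def to_svg_points(vertices):
    """Format vertices for the points attribute of an SVG <polygon>."""
    return " ".join(f"{x:g},{y:g}" for x, y in vertices)


# A triangle that slides 20 pixels to the right between two keyframes.
start = [(0, 0), (10, 0), (10, 10)]
end = [(20, 0), (30, 0), (30, 10)]
halfway = interpolate_shape(start, end, 0.5)
print(to_svg_points(halfway))
```

Only the two keyframes need to be stored; every in-between frame is computed at playback, which is where the bitrate saving comes from.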

Sam’s looking for interest from the community at large to help move this work forward.

Watch now!
Speaker

Sam Bhattacharyya
CEO, Co-founder
Vectorly

Video: S-Frame in AV1: Enabling better compression for low latency live streaming.

Streaming is such a success because it manages to deliver video even as your network capacity varies while you are watching. This is called ABR (Adaptive Bitrate), and this short talk asks how we can allow low-latency streams to nimbly adapt to network conditions whilst keeping the bitrate low in the new AV1 codec.

Tarek Amara from Twitch explains the idea in AV1 of introducing S-Frames, sometimes called ‘switch frames’, which take the role of the more traditional I or IDR frames. If a frame is marked as an IDR frame, this means the decoder knows it can start decoding from this frame without worrying that it’s referencing some data that came before this frame. By doing this, you can allow frequent points at which a decoder can enter a stream. IDR frames are typically I frames which are the highest bandwidth frames, by a large proportion. This is because they are a complete rendition of a frame without any of the predictions you find in P and B frames.

Because IDR frames are so large, if you want to keep overall bandwidth down, you should reduce the number of them. However, reducing the number of IDR frames reduces the number of ‘in points’ for the stream, meaning a decoder then has to wait longer before it can start displaying the stream to the viewer. An S-Frame brings the benefits of an IDR in that it still marks a place in the stream where the decoder can join, free of dependencies on data previously sent. But the S-Frame takes up much less space.
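The trade-off between entry-point spacing and bitrate is easy to put into numbers. The sketch below is back-of-the-envelope arithmetic only: every frame size in it is a made-up figure chosen for illustration, not a measurement from the talk, and the function name is hypothetical.

```python
def average_bitrate_kbps(fps, entry_interval_s, entry_frame_kbit, inter_frame_kbit):
    """Average bitrate when one entry-point frame starts each interval.

    Every other frame in the interval is assumed to be the same size.
    """
    frames_per_interval = round(fps * entry_interval_s)
    interval_kbit = entry_frame_kbit + (frames_per_interval - 1) * inter_frame_kbit
    return interval_kbit / entry_interval_s


FPS = 30
IDR_KBIT = 400      # hypothetical size of an IDR frame
SWITCH_KBIT = 150   # hypothetical size of a smaller entry-point frame
INTER_KBIT = 40     # hypothetical size of an ordinary predicted frame

# Fewer IDR frames: lower bitrate, but up to 4 seconds to wait for an entry point.
sparse_idr = average_bitrate_kbps(FPS, 4, IDR_KBIT, INTER_KBIT)

# An IDR frame every second: quick to join, but the bitrate climbs.
dense_idr = average_bitrate_kbps(FPS, 1, IDR_KBIT, INTER_KBIT)

# A smaller entry-point frame every second: quick to join at a lower cost.
dense_switch = average_bitrate_kbps(FPS, 1, SWITCH_KBIT, INTER_KBIT)

print(sparse_idr, dense_idr, dense_switch)
```

With these invented sizes, moving from a 4-second to a 1-second IDR interval raises the average bitrate noticeably, while a smaller entry-point frame at the same 1-second spacing claws most of that back. The real savings depend entirely on the content and encoder, which is what Tarek’s tests measure.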

Tarek looks at how an S-Frame is created and the parameters it needs to obey, and explains how the frames are signalled. To finish off, he presents test results showing the bitrate improvements achieved.

Watch now!
Speaker

Tarek Amara
Engineering Manager, Video Encoding,
Twitch

Video: Super Resolution – The scaler of tomorrow, here today!

If we ever had a time when most displays were the same resolution, those days are long gone, with smartphones and tablets with extremely high pixel density nestled in with laptop screens of various resolutions and 1080-line TVs which are gradually being replaced with UHD variants. This means that HD videos are nearly always being upscaled, which makes ‘getting upscaling right’ a really worthwhile topic. The well-known basic up/downscaling algorithms have been around for a while, and even the best-performing Lanczos is well over 20 years old. The ‘new kid on the block’ isn’t another algorithm, it’s a whole technique of inferring better upscaling using machine learning called ‘super resolution’.

Nick Chadwick from Mux has been running the code and the numbers to see how well super resolution works. Taking to the stage at Demuxed SF, he starts by looking at where scaling is used and what type it is. The most common algorithms are nearest neighbour, bi-cubic, bi-linear and Lanczos, with nearest neighbour being the most basic and the worst performing. Using VMAF to measure both up and downscaling, Nick shows that the traditional opinions of how well these algorithms perform are valid. He then introduces some test videos which are designed to let you see whether your video path is using bi-linear or bi-cubic upscaling, presenting his results of when bi-cubic can be seen (Safari on a MacBook Pro) as opposed to bi-linear (Chrome on a MacBook Pro). The test videos are available here.
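To see why nearest neighbour is the most basic of these, here is a small sketch that upscales a single row of pixel values two ways. It works in one dimension only, so the second function is the 1-D case of bi-linear scaling, and it is written for clarity rather than to match any particular scaler’s implementation.

```python
def upscale_nearest(row, factor):
    """Repeat each source sample `factor` times."""
    return [row[i // factor] for i in range(len(row) * factor)]


def upscale_linear(row, factor):
    """Linearly interpolate between neighbouring samples (1-D bi-linear)."""
    n = len(row)
    out = []
    for i in range(n * factor):
        # Map the centre of the output sample back into source coordinates,
        # clamping at the edges of the row.
        pos = (i + 0.5) / factor - 0.5
        pos = min(max(pos, 0.0), n - 1)
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append(row[left] * (1 - frac) + row[right] * frac)
    return out


edge = [0, 100]  # a hard edge from black to a brighter value
print(upscale_nearest(edge, 2))
print(upscale_linear(edge, 2))
```

Nearest neighbour keeps the hard step and simply makes it blockier, while the linear version invents intermediate values to smooth the transition. Bi-cubic and Lanczos follow the same idea but weigh more neighbouring samples.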

In the next part of the talk, Nick digs a little deeper into how super resolution works and how he tested ffmpeg’s implementation of super resolution. Though he hit some difficulties in using this young filter, he is able to present some videos and shows that they are, indeed, “better to view”, meaning that the text looks sharper and is easier to see, with details being easier to pick out. It’s certainly possible to see some extra speckling introduced by the process, but the VMAF score is around 10 points higher, matching the subjective experience.

The downsides are a very significant increase in computational power needed which limits its use in live applications plus there is a need for good, if not very good, understanding of ML concepts and coding. And, of course, it wouldn’t be the online streaming community if clients weren’t already being developed to do super-resolution on the decode despite most devices not being practically capable of it. So Nick finishes off his talk discussing what’s in progress and papers relating to the implementation of super resolution and what it can borrow from other developing technologies.

Watch now!
Speaker

Nick Chadwick
Software Engineer,
Mux