Video: Real-time AV1 in WebRTC

AV1 seems to be shaking off its reputation for slow encoding, now only 2x slower than HEVC. How practical, then, is it to use AV1 in a real-time system aiming for sub-second latency? This is exactly what the Alliance for Open Media are working on, as parts of AV1 are perfectly suited to the use case.

Dr Alex from CoSMo Software took the podium at the Alliance for Open Media Research Symposium to lay out the whys and wherefores of updating WebRTC to deliver AV1. He started by outlining the differing requirements of real-time and VoD. With non-live content, encoding time is often unrestricted, allowing complex encoding methods to achieve lower bitrates. Even live CMAF streams aiming for a relatively low 3-second latency have enough time for much more complex encoding than real-time allows. Encoding, ingest, storage and delivery can all be separated into different parts of the workflow for VoD, whereas real-time is forced to collapse logical blocks together as much as possible. Unsurprisingly, Dr Alex identifies latency as the most important driver in the WebRTC use case.

When streaming in real time, ABR isn’t as simple as with chunked formats. The different bitrate renditions need to be generated at the encoder to avoid any transcoding delays, and there are two ways of delivering them. One is to deliver them as separate streams; the other is to deliver a single, layered stream. The latter method is known as Scalable Video Coding (SVC), which sends a base layer containing a low-resolution version of the video that can be decoded on its own. Within that stream is also the information which builds on top of that base to create a higher-resolution version of the same video. You can have multiple layers and hence provide information for 3, 4 or more renditions within one stream.
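The layering idea can be sketched in a few lines. This is a toy model, not the AV1 spec: the layer resolutions and bitrates below are illustrative, and the key point is that each enhancement layer only makes sense together with every layer beneath it.

```python
# Toy sketch of SVC layering: each enhancement layer depends on all
# layers below it, so a receiver decodes layers 0..target and the
# cost of a rendition is the sum of the layers it needs.
# All resolutions and bitrates here are illustrative.

LAYERS = [
    {"id": 0, "resolution": (640, 360),   "bitrate_kbps": 500},   # base layer
    {"id": 1, "resolution": (1280, 720),  "bitrate_kbps": 1500},  # enhancement 1
    {"id": 2, "resolution": (1920, 1080), "bitrate_kbps": 3000},  # enhancement 2
]

def decodable_layers(target_layer):
    """An SVC decoder needs the base layer plus every enhancement
    layer up to and including the target."""
    return [l for l in LAYERS if l["id"] <= target_layer]

def total_bitrate(target_layer):
    return sum(l["bitrate_kbps"] for l in decodable_layers(target_layer))

print(total_bitrate(0))  # base layer only
print(total_bitrate(2))  # full quality: sum of all layers
```

The trade-off this illustrates is that the full-quality rendition costs the sum of all layers, slightly more than an equivalent single-layer encode, in exchange for every lower rendition being available inside the one stream.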

Managing which streams reach each decoder is done through an SFU (Selective Forwarding Unit), a server to which WebRTC clients connect to receive just the stream, or the parts of a stream, they need for their current bandwidth capability. It’s important to remember that, compared to video-conferencing solutions based on WebRTC, streaming with WebRTC scales linearly. Whilst it’s difficult to hold a meeting with 50 people in a room, it’s possible to optimise what video is sent to everyone by showing only the last 5 speakers in full resolution and the others as thumbnails. Such optimisations are not available for video distribution; instead, SFUs and media servers need to be scaled and cascaded. This should be simple, but testing can be difficult; it is nonetheless necessary to ensure quality and network resilience at scale.
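The core selection logic of an SFU can be sketched as follows. This is a hypothetical simplification, with made-up per-layer bitrates: a real SFU also reacts to packet loss, RTCP feedback and decoder capabilities, but the essential decision is picking the highest layer whose cumulative cost fits each client's estimated bandwidth.

```python
# Hypothetical SFU layer selection: forward the highest SVC layer
# whose cumulative bitrate fits the client's estimated bandwidth.
# Per-layer bitrates are illustrative.

LAYER_BITRATES_KBPS = [500, 1500, 3000]  # base, enhancement 1, enhancement 2

def select_layer(client_bandwidth_kbps):
    """Return the highest layer index the client can receive,
    or None if even the base layer does not fit."""
    chosen = None
    cumulative = 0
    for layer, rate in enumerate(LAYER_BITRATES_KBPS):
        cumulative += rate
        if cumulative <= client_bandwidth_kbps:
            chosen = layer
    return chosen

clients = {"mobile": 800, "laptop": 2500, "fibre": 10000}
for name, bandwidth in clients.items():
    print(name, select_layer(bandwidth))
```

Because the SFU only forwards packets and never transcodes, this decision is cheap per client, which is what makes cascading SFUs for large audiences feasible.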

Cisco have already demonstrated the first real-time AV1-based WebRTC system, though without SVC support. Work is ongoing to deliver improvements to the RTP encapsulation of AV1 in WebRTC, for instance by providing Decoding Target Information which embeds information about frames without needing to decode the video itself. This metadata explains how important each frame is and how it relates to the other frames, and can be used by the SFU or the decoder to decide which frames to drop and which to send or decode.
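A sketch of why that metadata is useful: with per-frame dependency information available outside the bitstream, a forwarder can drop frames safely without parsing the video. The field names and frame structure below are illustrative, not the actual RTP header extension layout.

```python
# Sketch of dependency-aware frame dropping, akin to what AV1's
# Decoding Target Information enables for an SFU. Field names are
# illustrative, not the real wire format.

frames = [
    {"id": 0, "deps": [],  "temporal_layer": 0},  # key frame
    {"id": 1, "deps": [0], "temporal_layer": 1},
    {"id": 2, "deps": [0], "temporal_layer": 0},
    {"id": 3, "deps": [2], "temporal_layer": 1},
]

def forwardable(frames, max_temporal_layer):
    """Keep only frames at or below the target temporal layer whose
    dependencies are themselves being forwarded."""
    kept = set()
    out = []
    for f in frames:
        if f["temporal_layer"] <= max_temporal_layer and all(d in kept for d in f["deps"]):
            kept.add(f["id"])
            out.append(f["id"])
    return out

print(forwardable(frames, 0))  # the temporal-layer-1 frames are dropped
print(forwardable(frames, 1))  # everything is forwarded
```

The key property is that no dropped frame is ever a dependency of a forwarded frame, so the receiving decoder never sees a broken reference chain.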

Watch now!
Download the slides
Speaker

Dr Alex Gouaillard
Video Codec Working Group – Real-time subgroup, Alliance for Open Media
Founder, Director & CEO, CoSMo Software Consulting Pte. Ltd.
Co-founder & CTO, Millicast

Video: AV1 – A Reality Check

Released in 2018, AV1 had been a little over two years in the making at the Alliance for Open Media, founded by industry giants including Google, Amazon, Mozilla and Netflix. Since then, work has continued to optimise the toolset to bring both encoding and decoding times down to real-world levels.

This talk brings together AOM members Mozilla, Netflix, Vimeo and Bitmovin to discuss where AV1 is up to and to answer questions from the audience. After some introductions, the conversation turns to 8K. The Olympics are the broadcast industry’s main driver for 8K at the moment, though it’s clear that Japan and other territories aim to follow through with further deployments and uses.

“AV1 is the 8K codec of choice” 

Paul MacDougall, Bitmovin
CES 2020 saw a number of announcements like this from Samsung regarding AV1-enabled 8K TVs. Matt Frost from Google Chrome Media explains how YouTube has found that viewer retention is higher with VP9-delivered videos, which he attributes to VP9’s improved compression over AVC leading to quicker start times, less buffering and, often, a higher resolution being delivered to the user. AV1 is seen as providing these same benefits over AVC without the patent problems that come with HEVC.

It’s not all about resolution, however, points out Paul MacDougall from Bitmovin. For animated content, resolution is worth having because it accentuates the lines which add intelligibility to the picture. For other content with many similar textures, grass, for instance, quality through bitrate may be more useful than added resolution. Vittorio Giovara from Vimeo agrees, pointing out that viewer experience is a combination of many factors. Though it’s trivial to say that a high-resolution screen of unintended black makes for a bad experience, it’s a good reminder of the things that matter. Less obviously, Vittorio highlights the three pillars of spatial, temporal and spectral quality: spatial is, indeed, the resolution; temporal refers to the frame rate; and spectral refers to bit depth and colour depth, known as HDR and Wide Colour Gamut (WCG).

Nathan Egge from Mozilla acknowledges that their 2018 code release at NAB, with an unoptimised encoder claimed by some to be 3,000 times slower than HEVC, was ’embarrassing’, but this is the price of developing in the open. The panel discusses the fact that the idea of developing compression is to try out approaches until you find a combination that works well; while you are doing that, it would be a false economy to be constantly optimising. Moreover, Netflix’s Anush Moorthy points out, it’s a different set of skills and, therefore, a different set of people who optimise the algorithms.

Questions fielded by the panel cover whether there are any attempts to put AV1 encoding or decoding on GPUs, power consumption, whether TVs will have hardware or software AV1 decoding, current in-production uses of AV1, and AVC vs VVC (compression benefit vs royalty payments).

Watch now!
Speakers

Vittorio Giovara
Manager, Engineering – Video Technology
Vimeo
Nathan Egge
Video Codec Engineer,
Mozilla
Paul MacDougall
Principal Sales Engineer,
Bitmovin
Anush Moorthy
Manager, Video and Image Encoding
Netflix
Tim Siglin
Founding Executive Director
Help Me Stream, USA

Video: S-Frame in AV1: Enabling better compression for low latency live streaming.

Streaming is such a success because it manages to deliver video even as your network capacity varies while you are watching, a technique called ABR (Adaptive Bitrate). This short talk asks how we can allow low-latency streams to nimbly adapt to network conditions whilst keeping the bitrate low in the new AV1 codec.

Tarek Amara from Twitch explains AV1’s idea of introducing S-Frames, sometimes called ‘switch frames’, which take on the role of the more traditional I or IDR frames. If a frame is marked as an IDR frame, the decoder knows it can start decoding from that frame without worrying that it references data sent earlier. Marking frames this way provides frequent points at which a decoder can enter a stream. IDR frames are typically I frames, which are the highest-bandwidth frames by a large margin, because they are a complete rendition of a frame without any of the predictions you find in P and B frames.

Because IDR frames are so large, if you want to keep overall bandwidth down you should reduce their number. However, reducing the number of these frames reduces the number of ‘in points’ for the stream, meaning a decoder has to wait longer before it can start displaying the stream to the viewer. An S-Frame brings the benefits of an IDR in that it still marks a place in the stream where the decoder can join, free of dependencies on previously sent data, but it takes up much less space.
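A back-of-envelope calculation shows why this matters. The frame sizes below are made up purely for illustration: an entry-point frame every 30 frames dominates the average bitrate when it is a full IDR, and a smaller switch frame in the same position keeps the same join density at a lower average cost.

```python
# Illustrative arithmetic: how the size of the periodic entry-point
# frame (IDR vs a smaller S-frame) affects the average frame size.
# All frame sizes (in kilobits) are made up for the example.

def avg_frame_size(entry_interval, entry_size, p_size=20):
    """Average frame size of a stream whose entry points recur every
    `entry_interval` frames, with predicted frames of `p_size`
    kilobits in between."""
    return (entry_size + (entry_interval - 1) * p_size) / entry_interval

idr    = avg_frame_size(entry_interval=30, entry_size=200)  # classic IDR GOP
sframe = avg_frame_size(entry_interval=30, entry_size=60)   # S-frame entry point

print(round(idr, 1), round(sframe, 1))  # the S-frame stream averages lower
```

With these (hypothetical) numbers the saving is around 18% of average bitrate for an identical number of 'in points', which is the trade S-Frames are designed to win.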

Tarek looks at how an S-Frame is created and the parameters it needs to obey, and explains how the frames are signalled. To finish, he presents test results showing the bitrate improvements achieved.
Watch now!
Speaker

Tarek Amara
Engineering Manager, Video Encoding,
Twitch

Video: Let’s be hAV1ng you

AV1 is now in use for some YouTube feeds, and Netflix can deliver AV1 to Android devices, so we are no longer talking about “if AV1 happens” or “when AV1’s finished”. AV1 is here to stay, but in a landscape of three new MPEG codecs, VVC, EVC and LCEVC, the question moves to “when is AV1 the right move?”

In this talk from Derek Buitenhuis, we delve behind the scenes of AV1 to see which AV1 terms can be, more or less, mapped to which MPEG terms. AV1 is promoted as a royalty-free codec, although notably a patent pool has appeared to try and claim money from its users. Because AV1 avoids reusing patented ideas from other technologies, the names and specific functions of parts of the spec are not identical to those of other codecs, but they are often similar in function.

Derek starts by outlining some of the terms we need to understand before delving in further, such as “Temporal Unit”, which of course is called a TU and is analogous to a GOP. He then moves on to highlight the many ways in which previous DCT-style work has been extended: the sizes and types of transform have been increased and the prediction modes have changed. All of this is possible, but it increases computation.

Derek then highlights several major tools which have been added. One is the prediction of chroma from the luma signal. Another is the Constrained Directional Enhancement Filter, which improves the look of diagonal hard edges. The third is ‘switch frames’, which are similar to IDR frames or, as Derek puts it, ‘a fancy P-frame.’ There is also multi-symbol arithmetic coding, a method which uses the probabilities of upcoming symbols to encode just enough information to ensure the decoder can reconstruct the full value.
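The core idea of multi-symbol arithmetic coding can be shown with a toy coder. This is a teaching sketch using exact fractions, not AV1's actual adaptive integer range coder, and the symbol probabilities are made up: each symbol narrows the current interval in proportion to its probability, so likely symbols cost fewer bits.

```python
# Toy multi-symbol arithmetic coder using exact fractions. Each symbol
# narrows the working interval to its probability sub-range; any value
# inside the final interval identifies the whole sequence. This is a
# simplified illustration, not AV1's integer range coder.
from fractions import Fraction

# Cumulative probability ranges for three symbols (illustrative).
CDF = {"a": (Fraction(0), Fraction(1, 2)),
       "b": (Fraction(1, 2), Fraction(3, 4)),
       "c": (Fraction(3, 4), Fraction(1))}

def encode(symbols):
    low, high = Fraction(0), Fraction(1)
    for s in symbols:
        span = high - low
        lo_p, hi_p = CDF[s]
        low, high = low + span * lo_p, low + span * hi_p
    return low  # any value in [low, high) identifies the sequence

def decode(value, n):
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        span = high - low
        for s, (lo_p, hi_p) in CDF.items():
            if low + span * lo_p <= value < low + span * hi_p:
                out.append(s)
                low, high = low + span * lo_p, low + span * hi_p
                break
    return "".join(out)

message = "abacab"
print(decode(encode(message), len(message)))
```

A real codec works with fixed-precision integers, renormalises the interval as it goes, and adapts the probability tables to the data; the interval-narrowing principle shown here is the same.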

After talking about the Loop Restoration Filter, Derek critiques a BBC article which, it seems, drew incorrect conclusions because the functions needed for good compression were not enabled, and which did not provide enough information for anyone else to replicate the experiment. Derek then finishes with MS-SSIM plots of different encoders.

Watch now!
Download the slides.
Speaker

Derek Buitenhuis
Senior Video Encoding Engineer,
Vimeo