Video: AV1 Commercial Readiness Panel

With two years of development and deployments under its belt, AV1 is still emerging onto the codec scene. That's not to say it isn't in use billions of times a year, but compared to the incumbents, there's still some distance to go. Long regarded as very slow to encode and computationally impractical, today's panel is here to say that's old news: AV1 is now a real-time codec.

Brought together by Jill Boyce of Intel, we hear from Amazon, Facebook, Google, Twitch, Netflix and Tencent in this panel. Intel and Netflix have been collaborating on the SVT-AV1 encoder and decoder framework for two years. SVT-AV1's goal is to be a high-performance, scalable encoder and decoder, using parallelisation to achieve this aim.

Yueshi Shen from Amazon and Twitch is first to present, explaining that for them, AV1 is a key technology in the 5G era. They have put together a 1440p, 120fps gaming demo which has been enabled by AV1. They feel that this resolution and framerate will be a critical feature for Twitch in the next two years as computer games increasingly extend beyond typical broadcast boundaries. Another key goal is achieving an end-to-end latency of 1.5 seconds which, he says, will partly be achieved using AV1. His company has been working with SoC vendors to accelerate the adoption of AV1 decoders, as their proliferation is key to a successful transition to AV1 across the board. Simultaneously, AWS has been adding AV1 capability to MediaConvert and is planning to continue AV1 integration in other turnkey content solutions.

David Ronca from Facebook says that AV1 gives them the opportunity to reduce video egress bandwidth whilst also helping increase quality. For them, SVT-AV1 has brought AV1 into the practical domain: they are able to run AV1 payloads in production as well as launch a large-scale decoder test across a large set of mobile devices.

Matt Frost represents Google Chrome and Android's point of view on AV1. Early adopters, having been streaming partly using AV1 since 2018 at resolutions small and large, they have recently added support in Duo, their Android video-conferencing application. As with all such services, the pandemic has shown how important they can be and how important it is that they can scale. Their move to AV1 streaming has had favourable results, which is the start of the return on their investment in the technology.

Google’s involvement with the Alliance for Open Media (AOM), along with the other founding companies, was born out of a belief that in order to achieve the scales needed for video applications, the only sensible future was with cheap-to-deploy codecs, so it made a lot of sense to invest time in the royalty-free AV1.

Andrey Norkin from Netflix explains that they believe AV1 will bring a better experience to their members. Netflix has been using AV1 in streaming since February 2020 on Android devices using a software decoder, which has allowed them to get better quality at lower bitrates than VP9; they are now testing AV1 on other platforms. Intent on only using 10-bit encodes across all devices, Andrey explains that this mode gives the best efficiency. As well as being founding members of AOM, Netflix has also developed AVIF, an image format based on AV1. According to Andrey, they see better performance than most other formats out there. As AVIF works better with text on pictures than other formats, Netflix are intending to use it in their UI.

Tencent's Shan Liu explains that they are part of AOM because video compression is key for most Tencent businesses in their vast empire. Tencent Cloud has already launched an AV1 transcoding service and supports AV1 in VoD.

The panel discusses low-latency use of AV1, with David Ronca explaining that, with the performance improvements of the encoders and decoders, alongside the ability to tune the decode speed of AV1 by turning certain tools on and off, real-time AV1 is now possible. Amazon is paying attention to low-end, sub-$300 handsets, according to Yueshi, as they believe this is where most 5G growth will occur, and he cites recent tests showing AV1 decoding on only 3.5 cores of a mobile SoC as encouraging, given that 8 or more cores are now standard. They have now moved on to researching battery life.

The panel finishes with a Q&A touching on encoding speed, the VVC and LCEVC codecs, the Sisvel AV1 patent pool, the next ramp-up in deployments and the roadmap for SVT-AV1.

Watch now!
Please note: After free registration, this video is located towards the bottom of the page
Speakers

Yueshi Shen
Principal Engineer,
AWS & Twitch
David Ronca
Video Infrastructure Team,
Facebook
Matt Frost
Product Manager, Chrome Media Technologies,
Google
Andrey Norkin
Emerging Technologies Team,
Netflix
Dr Shan Liu
Chief Scientist & General Manager,
Tencent Media Lab
Jill Boyce
Intel

Video: Encoding Vs Compute Efficiency in Video Coding

Ioannis Katsavounidis from Facebook joins us to talk us through his work finding the best balance between computation and encoding. He explains how encoding has moved from real-time, hardware-based encoding in the late 80s and 1990s through to file encoding, chunk-based encoding and now shot-based encoding. Each of these stages has brought opportunities to speed up encoding, but there has always been a fundamental reason why encoding can’t simply be sped up by the advance of IT.

Moore's law posits that roughly every two years, the number of transistors in chips doubles. Whilst this has continued to hold until recent years, transistor count has always been a proxy for processing power. For many years now, the way to keep the computational ability of CPUs rising has not been to increase clock speed, as it was twenty years ago, but to add cores to the chip. As each core acts as its own CPU, this gives the ability to execute code in parallel, with a thread of code running separately on each core. Whilst 12-20 cores are typical for servers, there are CPUs which deliver up to 128 cores.

Ioannis explains why DCT-based codecs are resistant to multi-threaded encoding by showing how some encoding decisions are based on the previously decoded video frame, so the encoder needs to decode the video before it has the information it needs to make the next encoding decisions. An example of this is motion estimation, where you need to understand what a macroblock looks like in order to determine if and how it can be moved to form part of the macroblock currently being encoded.
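
To make that dependency concrete, here is a minimal Python sketch of the serial loop Ioannis describes. The encode function is a stand-in, not a real codec: the point is that each frame's motion estimation needs the reconstruction of the previous frame, so the loop cannot simply be spread across cores.

```python
# A toy illustration of the serial dependency in inter-frame coding.
# encode_frame() is a stand-in for a real DCT codec: it "predicts"
# each frame from the reconstruction of the previous one.

def encode_frame(frame, reference):
    """Return (packet, reconstruction). Pretend-lossless for brevity."""
    residual = frame if reference is None else frame - reference
    return residual, frame

def encode_sequence(frames):
    bitstream = []
    reference = None  # reconstruction of the previous frame
    for frame in frames:
        # Motion estimation must search `reference`, which only exists
        # once the previous frame has been encoded *and* decoded, so
        # frame N+1 cannot start before frame N finishes.
        packet, reference = encode_frame(frame, reference)
        bitstream.append(packet)
    return bitstream

print(encode_sequence([10, 12, 15, 15]))  # -> [10, 2, 3, 0]
```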

It turns out that some of the information you need can be calculated from the original video. Whilst this doesn't provide full parallelisation, it does help by freeing some of the computation to be done in parallel, thus reducing the time spent in the linear encoding stage. As the design of the codec itself limits its ability to be parallelised, the best way to speed up encoding has been to split up the original video and encode these now-separate sections independently.

Speeding up video encoding has therefore focused on splitting the video into different sections and encoding those in parallel, rather than trying to parallelise the encoding itself. Encoding each frame separately is one way to do this, but it sacrifices encoding efficiency. Splitting each frame into sections (tiles or slices) is another way, though this also sacrifices either quality or bitrate. The most successful encoding parallelisation has been chunked encoding. As streaming applications use chunks, typically around 2 seconds nowadays, there's no reason not to cut your video into small sections and encode those separately; the whole of this talk focuses on non-live video.
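
As a rough illustration, chunk-parallel encoding can be as simple as the following sketch. It assumes ffmpeg with libx264 is on the PATH; the filenames, chunk length and preset are illustrative, not taken from the talk.

```python
# A minimal sketch of chunk-parallel encoding using ffmpeg + libx264.
import subprocess
from concurrent.futures import ProcessPoolExecutor

CHUNK_SECONDS = 2  # the typical streaming chunk length mentioned above

def encode_chunk(source, start, index):
    out = f"chunk_{index:04d}.mp4"
    subprocess.run([
        "ffmpeg", "-y",
        "-ss", str(start), "-t", str(CHUNK_SECONDS),  # cut one chunk
        "-i", source,
        "-c:v", "libx264", "-preset", "slow",
        out,
    ], check=True)
    return out

def encode_parallel(source, duration_seconds):
    starts = range(0, duration_seconds, CHUNK_SECONDS)
    with ProcessPoolExecutor() as pool:  # one encode per CPU core
        jobs = [pool.submit(encode_chunk, source, s, i)
                for i, s in enumerate(starts)]
        return [job.result() for job in jobs]
```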

If there's a shot change in the middle of your chunk, this is likely to look very bad since the motion estimation will fail to produce good results and there may not be enough bitrate budget to compensate. It's therefore best to drop in an IDR frame at the shot change, or to align your video chunks with shot changes. Simply encoding these chunks in parallel would speed up the encoding; however, it misses an opportunity to optimise quality vs bitrate.
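
A sketch of that boundary logic, assuming shot-change timestamps have already come from a detection pass (the numbers below are invented):

```python
# Sketch: align chunk boundaries to shot changes so no chunk straddles
# a cut; a real pipeline would get the timestamps from shot detection.
def chunk_boundaries(duration, shot_changes, target=2.0):
    """Return sorted boundary timestamps: a cut every ~target seconds,
    plus a boundary at every shot change."""
    bounds = set(shot_changes)
    t = 0.0
    while t < duration:
        bounds.add(round(t, 3))
        t += target
    return sorted(b for b in bounds if b < duration)

print(chunk_boundaries(10.0, shot_changes=[3.2, 7.5]))
# -> [0.0, 2.0, 3.2, 4.0, 6.0, 7.5, 8.0]
```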

Ioannis explains an experiment to determine the best operating point for chunks. He starts by reminding us that all encoders have certain 'speed' settings which control how much computation, and therefore time, is required for each encode. The 'veryfast' setting in x264 will encode at the highest speed possible, but the quality will be worse for a given bitrate compared to the 'veryslow' setting. Ioannis' experiment encoded each chunk at every speed setting for a variety of resolutions and bitrates. Each encode was then analysed for quality using PSNR, MS-SSIM and VMAF.
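
A sweep like that can be scripted along these lines; this sketch assumes an ffmpeg build with libvmaf and leaves out parsing the score from the log.

```python
# Sketch of a preset sweep: encode at each x264 speed setting, time it,
# then measure quality against the source with libvmaf.
import subprocess
import time

PRESETS = ["veryfast", "faster", "fast", "medium",
           "slow", "slower", "veryslow"]

def encode_and_measure(source, preset, bitrate="3M"):
    out = f"{preset}.mp4"
    t0 = time.time()
    subprocess.run(["ffmpeg", "-y", "-i", source,
                    "-c:v", "libx264", "-preset", preset,
                    "-b:v", bitrate, out], check=True)
    encode_seconds = time.time() - t0
    # libvmaf prints a pooled VMAF score to the log; parsing elided.
    subprocess.run(["ffmpeg", "-i", out, "-i", source,
                    "-lavfi", "libvmaf", "-f", "null", "-"],
                   check=True)
    return preset, encode_seconds

results = [encode_and_measure("chunk_0000.mp4", p) for p in PRESETS]
```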

From Ioannis' work, we can see how the speed setting affects both the encode time and the quality, and we can observe that the slower speeds tend to offer minimal quality advantages for the significant extra encoding time involved. Each curve has a steep part and a shallow section, with the transition between them known as the 'convex hull'. Choosing a setting on the convex hull portion of the line is the optimal balance between quality and encoding time and is where, says Ioannis, most people should aim to operate.
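
One way to automate that choice: given the (encode time, quality) points from a sweep like the one above, keep moving to slower settings only while the marginal quality gain per extra second stays worthwhile. The numbers here are invented for illustration.

```python
# Pick an operating point where the curve turns from steep to shallow.
def operating_point(points, min_gain=0.1):
    """points: (encode_seconds, vmaf) pairs sorted by time. Return the
    last point whose marginal quality-per-second exceeds min_gain."""
    best = points[0]
    for (t0, q0), (t1, q1) in zip(points, points[1:]):
        if (q1 - q0) / (t1 - t0) >= min_gain:
            best = (t1, q1)   # still on the steep part: keep going
        else:
            break             # shallow region: extra time buys little
    return best

curve = [(10, 80.0), (20, 88.0), (40, 92.0), (80, 93.0), (160, 93.5)]
print(operating_point(curve))  # -> (40, 92.0)
```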

The talk finishes with a summary of the conclusions which can be drawn from this work: the use of the convex hull we've just discussed, the best type of parallel processing, whether oversubscription of CPU cores is helpful or not, and an interesting observation that it's often the quality metrics which put a significant burden on encoding, rather than the video encoding itself, particularly at lower resolutions.

Watch now!
Speakers

Ioannis Katsavounidis
Research Scientist,
Facebook

Video: TV Sport Innovation – Staying Ahead of the Game

Sport has always led innovation in many areas of broadcast, but during COVID, not only did sports broadcasters have to adapt nearly every workflow and redeploy staff, they then had to brace themselves to deliver 100 games in 40 days. Gordon Roxburgh sums it up: “I’ve been at Sky twenty years, and I think [these have] been the most challenging six months…we’ve faced.”

In this session from the DTG’s Future Vision 2020 conference, Carl Hibbert from Futuresource Consulting talks to Sky, Arsenal TV and Facebook to find out how their businesses have adapted. Melissa Lawton from Facebook explains how their live streaming, both for user-generated footage and produced sport, has adapted to changing needs. When COVID hit, Facebook lost some very valuable content. Their response was to double down on fan engagement, with challenges for fans to create content, and by staging events which were produced and commentated like real sports events, but where all the shots were of people exercising at home while being brought into the narrative of an Ironman competition. Facebook have also invested in their user-facing tools and dashboards to help expose and monitor contribution via live streaming.

Gordon Roxburgh from Sky explains the sea change he’s seen in production. “The first thing was to keep channels on air…and keep staff safe.” They moved rapidly from a fully staffed office to just three or four people on-site and a presenter. In order to mix, they created a Virtual Production suite which allowed people to create content in the cloud.

For content, Gordon says that watch-alongs proved very popular, where key sports personalities talk through what they were thinking during key sporting moments. This was just one of many content ideas that kept programming going until “Project Restart” commenced, when the whole sports ecosystem asked itself ‘How can we deliver 100 games in 40 days?’ Once they knew the season would start, Gordon says, this opened up a 3-week build period during which BT Media and Broadcast, NEP, NEP Connect and multiple internal departments collaborated to produce rapid turnarounds.

“As an industry, we came together.” The working practices developed at Sky were shared with other major broadcasters, who also shared their best practice – always putting staff first. Sky even went to the extent of building a technical space on a large studio floor to keep people apart, and co-opted a set of training rooms to become a self-contained graphics unit. These ideas kept graphics operators together but not mixing with the rest of the production.

The view from Arsenal TV is explained by John Dollin. They moved quickly very early on and were able to be back in the office from February. Whilst Arsenal TV doesn’t have the rights to stream live, they produce their programmes live for transmission later. This used to be done in a crowded room but was soon transferred to a virtual mixer in the cloud with remote editors. John highlights the challenge of bringing freelancers into the system and providing them with appropriate supervision. More importantly, he feels that their current ability to maintain pre-COVID production quality is due to the continued dedication of certain personnel who are putting in long hours, which is not a sustainable situation to be in.

Watch now!
Free Registration
Speakers

Gordon Roxburgh
Technical Manager,
Sky Sports
Melissa Lawton
Live Sports Production Strategy,
Facebook
John Dollin
Senior Product & Engineering Manager,
Arsenal Football Club
Carl Hibbert
Head of Consumer Media & Technology,
Futuresource Consulting

Video: Doing Better Congestion Control with BBR & Copa

In networking there are many possible bottlenecks, but the most pervasive is congestion caused by links operating at capacity and saturating their buffers. Full buffers are unable to adapt to incoming traffic, increasing the chance of dropped packets, and the extra latency added by full buffer after full buffer quickly adds up, further degrading the quality of the connection for the data that does make it through.

It’s no surprise, then, that a lot of work goes into finding the best congestion control algorithms to allow data senders to back off when a link stops responding well. This talk, from Facebook engineer Nitin Garg, examines old and new approaches to keeping streams fast and responsive by running a 4-million-data-point test of three contenders: Cubic, BBR and Copa.


Nitin starts by introducing what we mean by ‘congestion’ and how and why it occurs. The simple example is that your computer can typically send data at up to 1Gbps, yet your uplink to the internet is likely below this number. Congestion control is therefore a feedback mechanism which lets your computer realise that sending at 1Gbps isn’t working and allows it to throttle back to a speed which fits within your upload bandwidth. The same is true further down the pipe: if you have a 50Mbps uplink to the internet but are sending to a server which only has 10Mbps spare, your computer needs to throttle not just below 50Mbps, but below 10Mbps.
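
The classic form of that feedback loop is AIMD (additive increase, multiplicative decrease), the family Cubic belongs to. A toy Python version, with invented numbers:

```python
# A toy AIMD loop: probe gently for spare capacity, back off hard
# when congestion (packet loss) is signalled. Units are arbitrary.
def aimd(events, initial_rate=1.0, increase=0.1, decrease=0.5):
    """events: iterable of booleans, True = loss seen this RTT.
    Yields the send rate after each RTT."""
    rate = initial_rate
    for loss in events:
        if loss:
            rate *= decrease   # multiplicative decrease on congestion
        else:
            rate += increase   # additive increase while all is well
        yield rate

# Rate climbs until loss signals the bottleneck, then halves.
print(list(aimd([False, False, False, True, False])))
# -> [1.1, 1.2, 1.3, 0.65, 0.75]
```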

We then walk through how Cubic, BBR and Copa work, with Nitin explaining the differences. Copa (https://web.mit.edu/copa/) is the newest of the protocols; it comes from MIT and has the unique ability to be tuned to your need: throughput or low latency. As discussed above, keeping latency down means keeping buffer occupancy to a minimum, which stops you being aggressive in loading up links, so latency and throughput sit at opposite ends of a see-saw.
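
Copa’s central rule, per the MIT paper linked above, is to target a send rate inversely proportional to the measured queueing delay, with delta as the latency-vs-throughput knob. A minimal sketch of just that rule (it omits Copa’s mode switching and rate-update dynamics):

```python
# Copa's target rate: 1 / (delta * queueing_delay).
# Larger delta -> lower target rate -> shorter queues, lower latency;
# smaller delta -> more aggressive, higher throughput.
def copa_target_rate(rtt_standing, rtt_min, delta=0.5):
    """Return a target send rate in packets/sec."""
    queueing_delay = max(rtt_standing - rtt_min, 1e-6)  # seconds
    return 1.0 / (delta * queueing_delay)

# 10ms of standing queue with delta=0.5 -> 200 pkts/s target
print(copa_target_rate(rtt_standing=0.060, rtt_min=0.050))
```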

Nitin’s test ran on mobile phones using Facebook’s Live streaming app on Android and iOS, streaming with ABR where the app itself adapts to ensure it streams at as high a quality as possible, but reduces the bitrate when needed. Testing across global markets, they measured round-trip times and the amount of delivered data. Nitin walks through the results for both latency and throughput, showing that when Copa is optimised for latency, it leads the other two protocols in latency reduction under the worst conditions.

Watch now!
Speakers

Nitin Garg
Software Engineer, Videos Infra,
Facebook