A nuanced look at AV1. If we’ve learnt one thing about codecs over the last year or more, it’s that in the modern world pure bitrate efficiency isn’t the only game in town. JPEG 2000 and, now, JPEG XS, have always been excused their high bitrate compared to MPEG codecs because they deliver low latency and high fidelity. Now, it’s clear that we also need to consider the computational demand of codec when evaluating which to use in any one situation.
John Porterfield welcomes Facebook’s David Ronca to understand how AV1’s arriving on the market. David’s the director of Facebook’s video processing team, so is in pole position to understand how useful AV1 is in delivering video to viewers and how well it achieves its goals. The conversation looks at how to encode, the unexpected ways in which AV1 performs better than other codecs and the state of the hardware and software decoder ecosystem.
David starts by looking at the convex hull, explaining that it’s a way of encoding content multiple times at different resolutions and bitrates and graphing the results. This graph allows you to find the best combination of bitrate and resolution for a target quality. This works well, but the multiple encodes burdens the decision with a lot of extra computation to get the best set of encoding parameters. As proof of its effectiveness, David cites a time when a 200kbps max target was given for and encoder of video plus audio. The convex hull method gave a good experience for small screens despite the compromises made in encoding fidelity. The important part is being flexible on which resolution you choose to encode because by allowing the resolution to drift up or down as well as the bitrate, higher fidelity combinations can be found over keeping the resolution fixed. This is called per-title encoding and was pioneered by Netflix as discussed in the linked talk, where David previously worked and authored this blog post on the topic.
It’s an accepted fact that encoder complexity increases for every generation. Whilst this makes sense, particularly in the standard MPEG line where MPEG 2 gave way to AVC which gave way to HEVC which is now being superseded by VVC all of which achieved an approximately 50% compression improvement at the cost of a ten-fold computation increase. But David contends that this buries the lede. Whilst it’s true that the best (read: slowest) compression improves by 50% and has a 10% complexity increase, it’s often missed that at the other end of the curve, one of the fastest settings of the newer codec can now match the best of the old codec with a 90% reduction in computation. For companies working in the software world encoding, this is big news. David demonstrates this by graphing the SVT-AV1 encoder against the x265 HEVC encoder and that against x264.
David touches on an important point, that there is so much video encoding going on in the tech giants and distributed around the world, that it’s important for us to keep reducing the complexity year on year. As it is now, with the complexity increasing with each generation of encoder, something has to give in the future otherwise complexity will go off the scale. The Alliance for Open Media’s AV1 has something to say on the topic as it’s improved on HEVC with only a 5% increase in complexity. Other codecs such as MPEG’s LCEVC also deliver improved bitrate but at lower complexity. There is a clear environmental impact from video encoding and David is focused on reducing this.
AOM is also fighting the commercial problem that codecs have. Companies don’t mind paying for codecs, but they do mind uncertainty. After all, what’s the point in paying for a codec if you still might be approached for more money. Whilst MPEG’s implementation of VVC and EVC aims to give more control to companies to help them control their risk, AOM’s royalty-free codec with a defence fund against legal attacks, arguably, gives the most predictable risk of all. AOM’s aim, David explains, is to allow the web to expand without having to worry about royalty fees.
Next is some disappointing news for AV1 fans. Hardware decoder deployments have been delayed until 2023/24 which probably means no meaningful mobile penetration until 2026/27. In the meantime the very good dav1d decoder and also gav1 are expected to fill the gap. Already quite fast, the aim is for them to be able to do 720p60 decoding for average android devices by 2024.
In the penultimate look back at the top articles of 2020, we recognise the continued focus on new codecs. Let’s not shy away from saying 2020 was generous giving us VVC, LCEVC and EVC from MPEG. AV1 was actually delivered in 2018 with an update (Errata 1) in 2019. However, the industry has avidly tracked the improved speeds of the encoder and decoder implementations.
Lastly, no codec discussion has much relevance without comparing to AV1, HEVC and VP9.
So with all these codecs spinning around it’s no surprise that one of the top views of 2020 was a video entitled “VVC, EVC, LCEVC, WTF? – An update on the next hot codecs from MPEG”. This video was from 2019 and since these have all been published now, this extensive roundup from SMPTE is a much better resource to understand these codecs in detail and in context with their predecessors.
The article explains many of the features of the new codecs: both how they work and also why there are three. Afterall, if VVC is so good, why release EVC? We learn that they optimise for different features such as computation, bitrate and patent licensing among other aspects.
Director, Video Strategy and Standards,
Director, Image Technologies,
The codec arena is a lot more complex than before. Gone is the world of 5 years ago with AVC doing nearly everything. Whilst AVC is still a major force, we now have AV1 and VP9 being used globally with billions of uses a year, HEVC is not the force majeure it was once expected to be, but is now seeing significant use on iPhones and overall adoption continues to grow. And now, in 2020 we see three new codecs on the scene, VVC, EVC and LCEVC.
To help us make sense of this SMPTE has invited Walt Husak and Sean McCarthy to take us through what the current codecs are, what makes them different, how well they work, how to compare them and what the future roadmaps hold.
Sean starts by explaining which codecs are maintained by which bodies, with the IEC, ITU and MPEG being involved, not to mention the corporate codecs (VP8, and VP9 from Google) and the Chinese AVS series of codecs. Sean explains that these share major common elements and are each evolutions of each other. But why are all these codecs needed? Next, we see the use-cases that have brought these codecs into existence. Granted, AVC and HEVC entered the scene to reduce bit rate in an effort to make HD and UHD practical, respectively, but EVC and LC-EVC have different aims.
Sean gives a brief overview of the basics of encoding starting with partitioning the image, predicting parts of it, applying transformations, refining it (also known as applying ‘loop filters) and finishing with entropy codings. All of these blocks are briefly explained and exist in all the codecs covered in this talk. The evolutions which make the newer codecs better are therefore evolutions of each of these elements. For instance, explains Sean, splitting the image into different sections, known as partitioning, has become more sophisticated in recent codecs allowing for larger sections to be considered at once but, at the same time, smaller partitions created within each.
All codecs have profiles whereby the tools in use, or the complexity of their implementation, is standardised for certain types of video: 8-bit, 10-bit, HDR etc. This allows hardware implementers to understand the upper bounds of computation so they don’t end up over-provisioning hardware resources and increasing the cost. Sean looks at how VVC uses the same tools throughout all of its four profiles with only a few exceptions. Screen content sees two extra tools come for 4:2:2 formats and above. AV1 has the same tools throughout all the profiles but, deliberately, EVC doesn’t. Essential Video Coding has a royalty-free base layer which uses techniques that are not subject to any use payments. Using this layer gives you AVC-quality encoding, approximately. Using the main profile, however, gets you similar to HEVC encoding albeit with royalty payments.
The next part of the talk examines two main reasons for the increase in compression over recent codec generation, block size and partitioning, before highlighting some new tools in VVC and AV1. Block size refers to the size of the blocks that an image is split up into for processing. By using a larger block, the algorithms can spot patterns more efficiently so the continued increase from 16×16 in AVC to 128×128 now in VVC drives an increase in computation but also in compression. Once you have your block, splitting it up following the features of the images is the next stage. Called partitioning, we see the number of ways that the codecs can mathematically split a block has grown significantly. VVC can also partition chroma separately to luma. VVC and AV1 also include 64 and 16 ways, respectively, to diagonally partition rather than the typical vertical and horizontal partitioning modes.
Screen content coding tools are increasingly important, pandemics aside, there has long been growth in the amount of computer-generated content being shared online whether that’s through esports, video conference screen sharing or elsewhere. Truth be told, HEVC has support for screen-content encoding but it’s not in the main profile so many implementations don’t support it. VVC not only evolves the screen-content tools, but it also makes it present as default. AV1, also, was designed to work well with screen content. Sean takes some time to look at the IBC tool, intra-block copy, which allows the encoder to relate parts of the current frame to other sections. Working at the prediction stage, with screen content which contains, for instance, lots of text, parts of that text will look similar and to a first approximation, one part of the image can be duplicated in another. This is similar to motion compensation where a macroblock is ‘copied’ to another frame in a different position, but all the work is done on the present frame for Intra BC. Palette mode is another screen content tool which allows the colour of a section of the image to be described as a palette of colours rather than using the full RGB value for each and every pixel.
Sean covers the scaled prediction between resolutions in VVC and super-resolution in AV1, VVC’s 360-degree video optimisations and luma mapping before handing over to Walt Husak who goes into more detail on how the newer codecs work, starting with LCEVC.
LCEVC is a codec which improves the performance of already-deployed codecs, typically used to enhance the spatial resolution. If you wanted to encode HD, the codec would downsample the HD to an SD resolution and encode that with AVC, HEVC or another codec. At the same time, it would upsample that encoded video again and generate to correction layers which correct for artefacts and add sharpness. This information is added into to the base codec and sent to the decoder. This can allow a software-only enhancement to a hardware deployment fully utilising the hardware which has already been deployed. Walt notes that the enhancement layers are much the same technology as has already been standardised by SMPTE as VC6 (ST 2117). LCEVC has been found to be computationally efficient allowing it to address markets such as embedded devices where hardware restrictions would otherwise prohibit use of higher resolutions than for which it was originally designed. Very low bitrate performance is also very good.
Sean introduces us to his “Dos and Don’ts” of codec comparisons. The theme running through them is to take care that you are comparing like for like. Codecs can be set to run ‘fast’ or ‘slow’ each of which holds its own compromises in terms of encoding time and resulting quality. Similarly, there are some implementations which are made simply to implement the standard as rigorously as possible which is an invaluable tool when developing the codec or an implementation. Such a reference implementation for codec X, clearly, shouldn’t be compared to production implementations of a codec Y as the times are guaranteed to be very different and you will not learn anything from the process. Similarly, there are different tools which give codecs much more time to optimise known as single- and double-pass which shouldn’t be cross-compared.
The talk draws to a close with a look at codec performance. Sean shows a number of graphs showing how VVC performs against HEVC. Interestingly the metrics clearly show a 40% increase in efficiency of VVC over HEVC, but when seen in subjective tests, the ratings show a 50% improvement. VVC’s encoder is approximately 10x as complex as HEVC’s.
HEVC and AV1 perform similarly for the same bit rate. Overall, Sean says, AV1 is a little blurrier in regions of spatial detail and can have some temporal flickering. HEVC is more likely to have blocking and ringing artefacts. EVC’s main profile is up to 29% better than HEVC. LCEVC performs up to 8% better than AVC when using an AVC base layer and also slightly better than HEVC when using an HEVC base codec. Sean makes the point that the AVC has been continually updated since its initial release and is now on version 27, so it’s not strictly true to simply say it’s an ‘old’ codec. HEVC similarly is on version 7. Sean runs down part of the roadmap for AVC which leads on to the use of AI in codecs.
Finishing the video, Walt looks at the use of Deep Learning in codecs. Deep learning is also known as machine learning and referred to as AI (Artificial Intelligence). For most people, these terms are interchangeable and refer to the ability of a signal to be manipulated not by a fixed equation or algorithm (such as Lanczos scaling) but by a computer that has been trained through many millions of examples to recognise what looks ‘right’ and to replicate that effect in new scenarios.
Walt talks about JPEG’s AI learning research on still images who are aiming to complete an ‘end-to-end’ study of compression with AI tools. There’s also MPEG’s Deep Neural Network-based Video Coding which is looking at which tools within codecs can be replaced with AI. Also, recently we have seen the foundation of the MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) organisation by Leonardo Chiariglione, an industry body devoted to the use of AI in compression. With all this activity, it’s clear that future advances in compression will be driven by the increasing use of these techniques.
If the tyranny of frame buffers is let to continue, line-latency I/O is rendered impossible without increasing frame-rate to 60fps or, preferably, beyond. In SDI, hardware was able to process video line-by-line. Now, with uncompressed SDI, is the same possible with IT hardware?
Kieran Kunhya from Open Broadcast Systems explains how he has been able to develop line-latency video I/O with SMPTE 2110, how he’s coupled that with low-latency AVC and HEVC encoding and the challenges his company has had to overcome.
The commercial drivers are fairly well known for reducing the latency. Firstly, for standard 1080i50, typically treated as 25fps, if you have a single frame buffer, you are treated to a 40ms delay. If you need multiple buffers for a workflow, this soon stacks up so whatever the latency of your codec – uncompressed or JPEG XS, for example – the latency will be far above it. In today’s covid world, companies are looking for cutting the latency so people can work remotely. This has only intensified the interest that was already there for the purposes of remote production (REMIs) in having low-latency feeds. In the Covid world, low latency allows full engagement in conversations which is vital for news anchors to conduct interviews as well as they would in person.
IP, itself, has come into its own during recent times where there has been no-one around to move an SDI cable, being able to log in and scale up, or down, SMPTE ST 2110 infrastructure remotely is a major benefit. IT equipment has been shown to be fairly resilient to supply chain disruption during the pandemic, says Kieran, due to the industry being larger and being used to scaling up.
Kieran’s approach to receiving ST 2110 deals in chunks of 5 to 10 lines. This gives you time to process the last few lines whilst you are waiting for the next to arrive. This processing can be de-encapsulation, processing the pixel values to translate to another format or to modify the values to key on graphics.
As the world is focussed on delivering in and out of unusual and residential places, low-bitrate is the name of the game. So Kieran looks at low-latency HEVC/AVC encoding as part of an example workflow which takes in ST 2110 video at the broadcaster and encodes to MPEG to deliver to the home. In the home, the video is likely to be decoded natively on a computer, but Kieran shows an SDI card which can be used to deliver in traditional baseband if necessary.
Kieran talks about the dos and don’ts of encoding and decoding with AVC and HEVC with low latency targetting an end-to-end budget of 100ms. The name of the game is to avoid waiting for whole frames, so refreshing the screen with I-frame information in small slices, is one way of keeping the decoder supplied with fresh information without having to take the full-frame hit of 40ms (for 1080i50). Audio is best sent uncompressed to ensure its latency is lower than that of the video.
Decoding requires carefully handling the slice boundaries, ensuring deblocking i used so there are no artefacts seen. Compressed video is often not PTP locked which does mean that delivery into most ST 2110 infrastructures requires frame synchronising and resampling audio.
Kieran foresees increasing use of 2110 to MPEG Transport Stream back to 2110 workflows during the pandemic and finishes by discussing the tradeoffs in delivering during Covid.
CEO & Founder, Open Broadcast Systems
Subscribe to get daily updates
Views and opinions expressed on this website are those of the author(s) and do not necessarily reflect those of SMPTE or SMPTE Members.
This website is presented for informational purposes only. Any reference to specific companies, products or services does not represent promotion, recommendation, or endorsement by SMPTE