AV1’s royalty-free status continues to be very appealing, but in raw compression is it now losing ground to newer codecs such as VVC? EVC has also introduced a royalty-free mode, which could detract from AV1’s appeal and is certainly an improvement over HEVC’s patent debacle. We have very much moved into an ecosystem of patents rather than the MPEG2/AVC ‘monoculture’ of the 90s within broadcast. What better way to get a feel for the codecs than to put them to the test?
Dan Grois from Comcast has been looking at the new codecs VVC and EVC against AV1 and HEVC. VVC and EVC were both released last year and join LCEVC as the three most recent video codecs from MPEG (VVC was a collaboration between MPEG and the ITU). In the same way that HEVC is known as H.265, VVC can be called H.266, and it draws its heritage from HEVC too. EVC, on the other hand, is a new beast whose roots are shared with many of MPEG’s previous DCT-based codecs, but uniquely it has a mode that is totally royalty-free. Moreover, its higher-performance mode, which does include patented technology, can be configured to exclude any individual patents that you don’t wish to use, adding some confidence that businesses remain in control of their liabilities.
Dan starts by outlining the main features of the four codecs, discussing their partitioning methods and prediction capabilities, which range from inter-picture and intra-picture prediction to predicting chroma from the luma picture. Some of these techniques have been tackled in previous talks such as this one, also from Mile High Video, this EVC overview and, finally, this excellent deep dive from SMPTE into all of the codecs discussed today plus LCEVC.
Dan explains the testing he did, which was based on the reference encoder models. These are encoders that implement all of the features of a codec but are not necessarily optimised for speed like a real-world implementation would be. Part of the work delivering real-world implementations is using sophisticated optimisations to get the maths done quickly and some is choosing which parts of the standard to implement. A reference encoder doesn’t skimp on implementation complexity, and there is seldom much time to optimise speed. However, they are well known and can be used to benchmark codecs against each other. AV1 was tested in two configurations since it needs special treatment in this comparison. Dan explains that AV1 doesn’t have the same approach to GOPs as MPEG, so it’s well known that fixing its QP makes it inefficient. However, a fixed QP is what’s necessary for a fair comparison, so AV1 was also run in VBR mode, which allows it to use its GOP structure to the full, including AV1’s invisible frames, which carry data that can be referenced by other frames but are never actually displayed.
The videos tested range from 4K 10-bit down to low-resolution 8-bit. As expected, VVC outperforms all other codecs. Against HEVC, it’s around 40% better, though carrying with it a factor of 10 increase in encoding complexity. Note that these objective metrics tend to understate the subjective gains by 5-10%. EVC consistently achieved 25 to 30% improvements over HEVC with only 4.5x the encoder complexity. As expected, AV1’s fixed QP mode underperformed and increased the data rate on anything which wasn’t UHD material, but when run in VBR mode it managed 20% over HEVC with only a 3x increase in complexity.
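To make these percentages concrete, here is a minimal Python sketch that turns the quoted savings into the bitrate each codec would need to match HEVC's objective quality. The 10,000 kbps HEVC starting point is a made-up figure for illustration, the function names are hypothetical, and EVC is taken at the lower end of its quoted 25 to 30% range.

```python
# Figures quoted in the summary above: bitrate saving versus HEVC (percent)
# and encoder complexity relative to HEVC. EVC uses the low end of 25-30%.
RESULTS = {
    "VVC": {"saving_pct": 40, "complexity_x": 10.0},
    "EVC": {"saving_pct": 25, "complexity_x": 4.5},
    "AV1 (VBR)": {"saving_pct": 20, "complexity_x": 3.0},
}


def equivalent_bitrate_kbps(hevc_kbps, saving_pct):
    """Bitrate needed to match the objective quality of HEVC at hevc_kbps."""
    return hevc_kbps * (100 - saving_pct) / 100


def report(hevc_kbps):
    """Map each codec to its equivalent bitrate for a given HEVC bitrate."""
    return {
        codec: equivalent_bitrate_kbps(hevc_kbps, r["saving_pct"])
        for codec, r in RESULTS.items()
    }


if __name__ == "__main__":
    # 10,000 kbps is a hypothetical HEVC bitrate chosen for illustration.
    for codec, kbps in report(10_000).items():
        cost = RESULTS[codec]["complexity_x"]
        print(f"{codec}: {kbps:.0f} kbps at {cost}x HEVC encoder complexity")
```

Read this way, the trade-off is easier to see: each extra step of bitrate saving is bought with a disproportionately larger step in encoder complexity.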
A nuanced look at AV1. If we’ve learnt one thing about codecs over the last year or more, it’s that in the modern world pure bitrate efficiency isn’t the only game in town. JPEG 2000 and, now, JPEG XS, have always been excused their high bitrate compared to MPEG codecs because they deliver low latency and high fidelity. Now, it’s clear that we also need to consider the computational demand of a codec when evaluating which to use in any one situation.
John Porterfield welcomes Facebook’s David Ronca to understand how AV1’s arriving on the market. David’s the director of Facebook’s video processing team, so is in pole position to understand how useful AV1 is in delivering video to viewers and how well it achieves its goals. The conversation looks at how to encode, the unexpected ways in which AV1 performs better than other codecs and the state of the hardware and software decoder ecosystem.
David starts by looking at the convex hull, explaining that it’s a way of encoding content multiple times at different resolutions and bitrates and graphing the results. This graph allows you to find the best combination of bitrate and resolution for a target quality. This works well, but the multiple encodes burden the decision with a lot of extra computation to get the best set of encoding parameters. As proof of its effectiveness, David cites a time when a 200kbps maximum target was given for an encode of video plus audio. The convex hull method gave a good experience for small screens despite the compromises made in encoding fidelity. The important part is being flexible on which resolution you choose to encode, because by allowing the resolution to drift up or down as well as the bitrate, higher fidelity combinations can be found than by keeping the resolution fixed. This is called per-title encoding and was pioneered by Netflix, where David previously worked and authored this blog post on the topic, as discussed in the linked talk.
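The selection step can be sketched in a few lines of Python. This is an illustration only, not Facebook’s or Netflix’s implementation: the trial encodes, their quality scores and the function names are all invented, and the hull is found with a standard monotone-chain pass over (bitrate, quality) points.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o) using bitrate as x and quality as y."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(encodes):
    """Return the encodes on the upper convex hull of the bitrate/quality plot.

    Each encode is a (bitrate_kbps, quality, resolution) tuple.
    """
    # Sort by bitrate; at equal bitrate the higher quality comes first.
    pts = sorted(encodes, key=lambda e: (e[0], -e[1]))
    hull = []
    for p in pts:
        # Pop points that would make the curve bend upwards (not concave).
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    # Keep only the rising part: more bits must buy more quality.
    rising = [hull[0]]
    for p in hull[1:]:
        if p[1] > rising[-1][1]:
            rising.append(p)
    return rising


def best_under(hull, max_kbps):
    """Highest-quality hull point that fits within the bitrate budget."""
    fitting = [p for p in hull if p[0] <= max_kbps]
    return fitting[-1] if fitting else None


# Hypothetical trial encodes of one title: (kbps, quality score, resolution).
TRIALS = [
    (200, 60, "360p"), (400, 68, "360p"), (800, 72, "360p"),
    (400, 62, "720p"), (800, 78, "720p"), (1600, 86, "720p"),
    (800, 70, "1080p"), (1600, 88, "1080p"), (3200, 94, "1080p"),
]

if __name__ == "__main__":
    for point in convex_hull(TRIALS):
        print(point)
    print("budget 1000 kbps ->", best_under(convex_hull(TRIALS), 1000))
```

With this invented data the hull moves from 360p to 720p to 1080p as the bitrate rises, which is the resolution drift described above. The cost is equally visible: nine trial encodes were needed to choose five ladder rungs.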
It’s an accepted fact that encoder complexity increases with every generation. This makes sense, particularly in the standard MPEG line, where MPEG 2 gave way to AVC, which gave way to HEVC, which is now being superseded by VVC, each achieving an approximately 50% compression improvement at the cost of a ten-fold increase in computation. But David contends that this buries the lede. Whilst it’s true that the best (read: slowest) compression improves by 50% at ten times the complexity, it’s often missed that at the other end of the curve, one of the fastest settings of the newer codec can now match the best of the old codec with a 90% reduction in computation. For companies encoding in software, this is big news. David demonstrates this by graphing the SVT-AV1 encoder against the x265 HEVC encoder, and x265 against x264.
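As a rough worked example, compounding the headline figures across the MPEG line shows why they cause concern. The sketch below simply assumes each generation halves the bitrate and multiplies encoder computation by ten, as quoted above. Real codecs will not follow such round numbers, and the function name is hypothetical.

```python
GENERATIONS = ["MPEG-2", "AVC", "HEVC", "VVC"]


def compound(generations, bitrate_factor=0.5, complexity_factor=10):
    """Relative bitrate and encoder complexity, normalised to the first codec."""
    rows = []
    bitrate, complexity = 1.0, 1
    for name in generations:
        rows.append((name, bitrate, complexity))
        bitrate *= bitrate_factor        # ~50% saving per generation
        complexity *= complexity_factor  # ~10x computation per generation
    return rows


if __name__ == "__main__":
    for name, bitrate, complexity in compound(GENERATIONS):
        print(f"{name}: {bitrate:.3f}x bitrate, {complexity}x computation")
```

Three generations on, the slowest settings deliver one eighth of the bitrate for a thousand times the computation, which is why the fast end of the curve matters so much to software encoders.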
David touches on an important point, that there is so much video encoding going on in the tech giants and distributed around the world, that it’s important for us to keep reducing the complexity year on year. As it is now, with the complexity increasing with each generation of encoder, something has to give in the future otherwise complexity will go off the scale. The Alliance for Open Media’s AV1 has something to say on the topic as it’s improved on HEVC with only a 5% increase in complexity. Other codecs such as MPEG’s LCEVC also deliver improved bitrate but at lower complexity. There is a clear environmental impact from video encoding and David is focused on reducing this.
AOM is also fighting the commercial problem that codecs have. Companies don’t mind paying for codecs, but they do mind uncertainty. After all, what’s the point in paying for a codec if you still might be approached for more money? Whilst MPEG’s approach with VVC and EVC aims to give companies more control over their risk, AOM’s royalty-free codec, with a defence fund against legal attacks, arguably gives the most predictable risk of all. AOM’s aim, David explains, is to allow the web to expand without having to worry about royalty fees.
Next is some disappointing news for AV1 fans. Hardware decoder deployments have been delayed until 2023/24, which probably means no meaningful mobile penetration until 2026/27. In the meantime, the very good dav1d decoder and also gav1 are expected to fill the gap. Already quite fast, the aim is for them to be able to do 720p60 decoding on average Android devices by 2024.
In the penultimate look back at the top articles of 2020, we recognise the continued focus on new codecs. Let’s not shy away from saying 2020 was generous giving us VVC, LCEVC and EVC from MPEG. AV1 was actually delivered in 2018 with an update (Errata 1) in 2019. However, the industry has avidly tracked the improved speeds of the encoder and decoder implementations.
Lastly, no codec discussion has much relevance without comparing to AV1, HEVC and VP9.
So with all these codecs spinning around it’s no surprise that one of the top views of 2020 was a video entitled “VVC, EVC, LCEVC, WTF? – An update on the next hot codecs from MPEG”. This video was from 2019 and since these have all been published now, this extensive roundup from SMPTE is a much better resource to understand these codecs in detail and in context with their predecessors.
The article explains many of the features of the new codecs: both how they work and also why there are three. After all, if VVC is so good, why release EVC? We learn that they optimise for different features such as computation, bitrate and patent licensing, among other aspects.
Director, Video Strategy and Standards,
Director, Image Technologies,
The codec arena is a lot more complex than before. Gone is the world of 5 years ago with AVC doing nearly everything. Whilst AVC is still a major force, we now have AV1 and VP9 being used globally with billions of uses a year. HEVC is not the dominant force it was once expected to be, but it is now seeing significant use on iPhones and overall adoption continues to grow. And now, in 2020, we see three new codecs on the scene: VVC, EVC and LCEVC.
To help us make sense of this SMPTE has invited Walt Husak and Sean McCarthy to take us through what the current codecs are, what makes them different, how well they work, how to compare them and what the future roadmaps hold.
Sean starts by explaining which codecs are maintained by which bodies, with the IEC, ITU and MPEG being involved, not to mention the corporate codecs (VP8 and VP9 from Google) and the Chinese AVS series of codecs. Sean explains that these share major common elements and are each evolutions of one another. But why are all these codecs needed? Next, we see the use-cases that have brought these codecs into existence. Granted, AVC and HEVC entered the scene to reduce bitrate in an effort to make HD and UHD practical, respectively, but EVC and LCEVC have different aims.
Sean gives a brief overview of the basics of encoding, starting with partitioning the image, predicting parts of it, applying transformations, refining it (also known as applying ‘loop filters’) and finishing with entropy coding. All of these blocks are briefly explained and exist in all the codecs covered in this talk. The evolutions which make the newer codecs better are therefore evolutions of each of these elements. For instance, explains Sean, splitting the image into different sections, known as partitioning, has become more sophisticated in recent codecs, allowing for larger sections to be considered at once but, at the same time, smaller partitions to be created within each.
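The middle of that pipeline can be shown with a toy example. The Python sketch below takes a single row of eight samples, subtracts a prediction, applies a one-dimensional DCT and quantises the result, then reverses the process. It is a teaching aid under heavy assumptions rather than any codec’s real transform: partitioning, loop filtering and entropy coding are left out, the prediction is simply supplied by the caller, and all function names are hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for coefficient k of n."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(samples):
    """Orthonormal 1-D DCT-II."""
    n = len(samples)
    return [
        _scale(k, n) * sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(samples)
        )
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def encode_block(block, prediction, step):
    """Predict, transform the residual, then quantise to integer levels."""
    residual = [b - p for b, p in zip(block, prediction)]
    return [round(c / step) for c in dct(residual)]


def decode_block(levels, prediction, step):
    """Dequantise, inverse transform, then add the prediction back."""
    residual = idct([level * step for level in levels])
    return [p + r for p, r in zip(prediction, residual)]


if __name__ == "__main__":
    block = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up sample values
    prediction = [60] * 8                     # a flat, hypothetical prediction
    levels = encode_block(block, prediction, step=4)
    print("levels:", levels)
    print("decoded:", [round(v) for v in decode_block(levels, prediction, 4)])
```

The integer levels are what an entropy coder would go on to compress. A good prediction leaves a small residual, so most levels quantise to zero, and that is where the bitrate saving comes from.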
All codecs have profiles whereby the tools in use, or the complexity of their implementation, is standardised for certain types of video: 8-bit, 10-bit, HDR etc. This allows hardware implementers to understand the upper bounds of computation so they don’t end up over-provisioning hardware resources and increasing the cost. Sean looks at how VVC uses the same tools throughout all of its four profiles with only a few exceptions. Screen content sees two extra tools come for 4:2:2 formats and above. AV1 has the same tools throughout all the profiles but, deliberately, EVC doesn’t. Essential Video Coding has a royalty-free base layer that uses techniques that are not subject to any use payments. Using this layer gives you AVC-quality encoding, approximately. Using the main profile, however, gets you similar to HEVC encoding albeit with royalty payments.
The next part of the talk examines two main reasons for the increase in compression over recent codec generations, block size and partitioning, before highlighting some new tools in VVC and AV1. Block size refers to the size of the blocks that an image is split up into for processing. By using a larger block, the algorithms can spot patterns more efficiently, so the continued increase from 16×16 in AVC to 128×128 now in VVC drives an increase in computation but also in compression. Once you have your block, splitting it up following the features of the image is the next stage. Called partitioning, the number of ways that the codecs can mathematically split a block has grown significantly. VVC can also partition chroma separately from luma. VVC and AV1 also include 64 and 16 ways, respectively, to partition diagonally rather than using the typical vertical and horizontal partitioning modes.
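As an illustration of the idea, the Python sketch below recursively splits a block into quadrants wherever the picture content is busy and leaves flat areas as large blocks. It is a deliberate simplification: real encoders choose splits by rate-distortion cost and offer many more split shapes than a plain quadtree, and the variance threshold, the frame contents and the function names here are all invented.

```python
def variance(frame, y, x, size):
    """Pixel variance of the size x size block whose top-left is (y, x)."""
    vals = [frame[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def partition(frame, y, x, size, min_size, threshold):
    """Return a list of (y, x, size) leaf blocks covering the given block."""
    if size <= min_size or variance(frame, y, x, size) <= threshold:
        return [(y, x, size)]  # flat enough (or too small): keep as one block
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition(frame, y + dy, x + dx, half, min_size, threshold)
    return leaves


def make_test_frame():
    """8x8 frame: a busy checkerboard in the top-left 4x4, flat elsewhere."""
    return [
        [255 if (r < 4 and c < 4 and (r + c) % 2) else 0 for c in range(8)]
        for r in range(8)
    ]


if __name__ == "__main__":
    for leaf in partition(make_test_frame(), 0, 0, 8, min_size=2, threshold=10):
        print(leaf)
```

The detailed corner ends up as four small blocks while each flat quadrant stays whole, mirroring how larger top-level blocks and finer sub-partitions work together.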
Screen content coding tools are increasingly important. Pandemics aside, there has long been growth in the amount of computer-generated content being shared online, whether that’s through esports, video conference screen sharing or elsewhere. Truth be told, HEVC has support for screen-content encoding but it’s not in the main profile, so many implementations don’t support it. VVC not only evolves the screen-content tools, but also makes them available by default. AV1, also, was designed to work well with screen content. Sean takes some time to look at the IBC tool, intra-block copy, which allows the encoder to relate parts of the current frame to other sections. Working at the prediction stage, with screen content that contains, for instance, lots of text, parts of that text will look similar and, to a first approximation, one part of the image can be duplicated in another. This is similar to motion compensation, where a macroblock is ‘copied’ to another frame in a different position, but all the work is done on the present frame for Intra BC. Palette mode is another screen content tool that allows the colour of a section of the image to be described as a palette of colours rather than using the full RGB value for each and every pixel.
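The intra-block copy search can be sketched as follows. This Python example is a simplification made for illustration: it does an exhaustive search using the sum of absolute differences, only considers candidates lying wholly above the current block or wholly to its left in the same rows, and runs on an invented frame of repeated ‘glyphs’. The function names are hypothetical and real encoders constrain and accelerate the search quite differently.

```python
def sad(frame, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame[ay + r][ax + c] - frame[by + r][bx + c])
        for r in range(size)
        for c in range(size)
    )


def find_block_vector(frame, y, x, size):
    """Best match for the block at (y, x) within the already-coded area.

    Returns (dy, dx, cost) for the best candidate, or None if nothing
    has been coded yet.
    """
    width = len(frame[0])
    best = None
    for ry in range(0, y + 1):
        for rx in range(0, width - size + 1):
            fully_above = ry + size <= y
            fully_left = ry == y and rx + size <= x
            if not (fully_above or fully_left):
                continue  # candidate overlaps pixels that are not coded yet
            cost = sad(frame, y, x, ry, rx, size)
            if best is None or cost < best[2]:
                best = (ry - y, rx - x, cost)
    return best


# Two made-up 4x4 glyphs, laid out side by side as A, B, A.
GLYPH_A = [[0, 9, 9, 0], [9, 0, 0, 9], [9, 9, 9, 9], [9, 0, 0, 9]]
GLYPH_B = [[9, 9, 9, 0], [9, 0, 0, 9], [9, 9, 9, 0], [9, 9, 9, 9]]
FRAME = [GLYPH_A[r] + GLYPH_B[r] + GLYPH_A[r] for r in range(4)]

if __name__ == "__main__":
    # The third glyph repeats the first, so a perfect match sits 8 pixels left.
    print(find_block_vector(FRAME, 0, 8, 4))
```

When the cost is zero the encoder only needs to send the block vector. Otherwise it sends the vector plus a small residual, just as it would for motion compensation between frames.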
Sean covers the scaled prediction between resolutions in VVC and super-resolution in AV1, VVC’s 360-degree video optimisations and luma mapping before handing over to Walt Husak who goes into more detail on how the newer codecs work, starting with LCEVC.
LCEVC is a codec that improves the performance of already-deployed codecs and is typically used to enhance spatial resolution. If you wanted to encode HD, the codec would downsample the HD to an SD resolution and encode that with AVC, HEVC or another codec. At the same time, it would upsample that encoded video again and generate two correction layers that correct for artefacts and add sharpness. This information is added into the base codec and sent to the decoder. This can allow a software-only enhancement to a hardware deployment, fully utilising the hardware which has already been deployed. Walt notes that the enhancement layers are much the same technology as has already been standardised by SMPTE as VC-6 (ST 2117). LCEVC has been found to be computationally efficient, allowing it to address markets such as embedded devices where hardware restrictions would otherwise prohibit the use of resolutions higher than those the hardware was originally designed for. Very low bitrate performance is also very good.
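The layering idea can be shown with a one-dimensional toy in Python. This is only a sketch of the principle and not the LCEVC algorithm: coarse quantisation stands in for the base codec, the enhancement is collapsed into a single residual layer that is kept lossless, and the scaling filters and function names are invented for the example.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs of samples."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each sample (nearest neighbour)."""
    out = []
    for v in signal:
        out += [v, v]
    return out


def base_codec(signal, step):
    """Stand-in for AVC/HEVC: lossy coding modelled as coarse quantisation."""
    return [round(v / step) * step for v in signal]


def encode(signal, step):
    """Return the low-resolution base layer and a full-resolution residual."""
    base = base_codec(downsample(signal), step)
    predicted = upsample(base)
    residual = [s - p for s, p in zip(signal, predicted)]
    return base, residual


def decode(base, residual):
    """Upsample the decoded base layer and apply the correction."""
    return [p + r for p, r in zip(upsample(base), residual)]


if __name__ == "__main__":
    source = [10, 12, 20, 24, 30, 30, 41, 43]  # made-up full-resolution samples
    base, residual = encode(source, step=8)
    print("base layer:", base)
    print("residual:  ", residual)
    print("decoded:   ", decode(base, residual))
```

The base layer is what existing hardware would decode. The residual is small and cheap to apply in software, which is why this kind of enhancement suits devices that cannot be upgraded.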
Sean introduces us to his “Dos and Don’ts” of codec comparisons. The theme running through them is to take care that you are comparing like for like. Codecs can be set to run ‘fast’ or ‘slow’ each of which holds its own compromises in terms of encoding time and resulting quality. Similarly, there are some implementations that are made simply to implement the standard as rigorously as possible which is an invaluable tool when developing the codec or an implementation. Such a reference implementation for codec X, clearly, shouldn’t be compared to production implementations of a codec Y as the times are guaranteed to be very different and you will not learn anything from the process. Similarly, there are different tools that give codecs much more time to optimise known as single- and double-pass which shouldn’t be cross-compared.
The talk draws to a close with a look at codec performance. Sean shows a number of graphs showing how VVC performs against HEVC. Interestingly the metrics clearly show a 40% increase in efficiency of VVC over HEVC, but when seen in subjective tests, the ratings show a 50% improvement. VVC’s encoder is approximately 10x as complex as HEVC’s.
HEVC and AV1 perform similarly for the same bit rate. Overall, Sean says, AV1 is a little blurrier in regions of spatial detail and can have some temporal flickering. HEVC is more likely to have blocking and ringing artefacts. EVC’s main profile is up to 29% better than HEVC. LCEVC performs up to 8% better than AVC when using an AVC base layer and also slightly better than HEVC when using an HEVC base layer. Sean makes the point that AVC has been continually updated since its initial release and is now on version 27, so it’s not strictly true to simply say it’s an ‘old’ codec. HEVC similarly is on version 7. Sean runs down part of the roadmap for AVC which leads on to the use of AI in codecs.
Finishing the video, Walt looks at the use of Deep Learning in codecs. Deep learning is also known as machine learning and referred to as AI (Artificial Intelligence). For most people, these terms are interchangeable and refer to the ability of a signal to be manipulated not by a fixed equation or algorithm (such as Lanczos scaling) but by a computer that has been trained through many millions of examples to recognise what looks ‘right’ and to replicate that effect in new scenarios.
Walt talks about JPEG’s AI learning research on still images, which aims to complete an ‘end-to-end’ study of compression with AI tools. There’s also MPEG’s Deep Neural Network-based Video Coding, which is looking at which tools within codecs can be replaced with AI. Also, recently we have seen the foundation of the MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) organisation by Leonardo Chiariglione, an industry body devoted to the use of AI in compression. With all this activity, it’s clear that future advances in compression will be driven by the increasing use of these techniques.