AV1’s royalty-free status continues to be very appealing, but in raw compression is it losing ground now to the newer codecs such as VVC? EVC has also introduced a royalty-free model which could also detract from AV1’s appeal and certainly is an improvement over HEVC’s patent debacle. We have very much moved into an ecosystem of patents rather than the MPEG2/AVC ‘monoculture’ of the 90s within broadcast. What better way to get a feel for the codecs but to put them to the test?
Dan Grois from Comcast has been looking at the new codecs VVC and EVC against AV1 and HEVC. VVC and EVC were both released last year and join LCEVC as the three most recent video codecs from MPEG (VVC was a collaboration between MPEG and IRU). In the same way, HEVC is known as H.265, VVC can be called H.266 and it draws its heritage from the HEVC too. EVC, on the other hand, is a new beast whose roots are absolutely shared with much of MPEG’s previous DCT-based codecs, but uniquely it has a mode that is totally royalty-free. Moreover, its high-performant mode which does include patented technology can be configured to exclude any individual patents that you don’t wish to use thus adding some confidence that businesses remain in control of their liabilities.
Dan starts by outlining the main features of the four codecs discussing their partitioning methods and prediction capabilities which range from inter-picture, intra-picture and predicting chroma from the luma picture. Some of these techniques have been tackled in previous talks such as this one, also from Mile High Video and this EVC overview and, finally, this excellent deep dive from SMPTE in to all of the codecs discussed today plus LCEVC.
Dan explains the testing he did which was based on the reference encoder models. These are encoders that implement all of the features of a codec but are not necessarily optimised for speed like a real-world implementation would be. Part of the work delivering real-world implementations is using sophisticated optimisations to get the maths done quickly and some is choosing which parts of the standard to implement. A reference encoder doesn’t skimp on implementation complexity, and there is seldom much time to optimise speed. However, they are well known and can be used to benchmark codecs against each other. AV1 was tested in two configurations since
AV1 needs special treatment in this comparison. Dan explains that AV1 doesn’t have the same approach to GOPs as MEPG so it’s well known that fixing it’s QP will make it inefficient, however, this is what’s necessary for a fair comparison so, in addition to this, it’s also run in VBR mode which allows it to use its GOP structure to the full such as AV1’s invisible frames which carry data which can be referenced by other frames but which are never actually displayed.
The videos tested range from 4K 10bit down to low resolution 8 bit. As expected VVC outperforms all other codecs. Against HEVC, it’s around 40% better though carrying with it a factor of 10 increase in encoding complexity. Note that these objective metrics tend to underrepresent subjective metrics by 5-10%. EVC consistently achieved 25 to 30% improvements over HEVC with only 4.5x the encoder complexity. As expected AV1’s fixed QP mode underperformed and increased data rate on anything which wasn’t UHD material but when run in VBR mode managed 20% over HEVC with only a 3x increase in complexity.
The codec arena is a lot more complex than before. Gone is the world of 5 years ago with AVC doing nearly everything. Whilst AVC is still a major force, we now have AV1 and VP9 being used globally with billions of uses a year, HEVC is not the force majeure it was once expected to be, but is now seeing significant use on iPhones and overall adoption continues to grow. And now, in 2020 we see three new codecs on the scene, VVC, EVC and LCEVC.
To help us make sense of this SMPTE has invited Walt Husak and Sean McCarthy to take us through what the current codecs are, what makes them different, how well they work, how to compare them and what the future roadmaps hold.
Sean starts by explaining which codecs are maintained by which bodies, with the IEC, ITU and MPEG being involved, not to mention the corporate codecs (VP8, and VP9 from Google) and the Chinese AVS series of codecs. Sean explains that these share major common elements and are each evolutions of each other. But why are all these codecs needed? Next, we see the use-cases that have brought these codecs into existence. Granted, AVC and HEVC entered the scene to reduce bitrate in an effort to make HD and UHD practical, respectively, but EVC and LC-EVC have different aims.
Sean gives a brief overview of the basics of encoding starting with partitioning the image, predicting parts of it, applying transformations, refining it (also known as applying ‘loop filters) and finishing with entropy codings. All of these blocks are briefly explained and exist in all the codecs covered in this talk. The evolutions which make the newer codecs better are therefore evolutions of each of these elements. For instance, explains Sean, splitting the image into different sections, known as partitioning, has become more sophisticated in recent codecs allowing for larger sections to be considered at once but, at the same time, smaller partitions created within each.
All codecs have profiles whereby the tools in use, or the complexity of their implementation, is standardised for certain types of video: 8-bit, 10-bit, HDR etc. This allows hardware implementers to understand the upper bounds of computation so they don’t end up over-provisioning hardware resources and increasing the cost. Sean looks at how VVC uses the same tools throughout all of its four profiles with only a few exceptions. Screen content sees two extra tools come for 4:2:2 formats and above. AV1 has the same tools throughout all the profiles but, deliberately, EVC doesn’t. Essential Video Coding has a royalty-free base layer that uses techniques that are not subject to any use payments. Using this layer gives you AVC-quality encoding, approximately. Using the main profile, however, gets you similar to HEVC encoding albeit with royalty payments.
The next part of the talk examines two main reasons for the increase in compression over recent codec generation, block size and partitioning, before highlighting some new tools in VVC and AV1. Block size refers to the size of the blocks that an image is split up into for processing. By using a larger block, the algorithms can spot patterns more efficiently so the continued increase from 16×16 in AVC to 128×128 now in VVC drives an increase in computation but also in compression. Once you have your block, splitting it up following the features of the images is the next stage. Called partitioning, we see the number of ways that the codecs can mathematically split a block has grown significantly. VVC can also partition chroma separately to luma. VVC and AV1 also include 64 and 16 ways, respectively, to diagonally partition rather than the typical vertical and horizontal partitioning modes.
Screen content coding tools are increasingly important, pandemics aside, there has long been growth in the amount of computer-generated content being shared online whether that’s through esports, video conference screen sharing or elsewhere. Truth be told, HEVC has support for screen-content encoding but it’s not in the main profile so many implementations don’t support it. VVC not only evolves the screen-content tools, but it also makes it present as default. AV1, also, was designed to work well with screen content. Sean takes some time to look at the IBC tool, intra-block copy, which allows the encoder to relate parts of the current frame to other sections. Working at the prediction stage, with screen content that contains, for instance, lots of text, parts of that text will look similar and to a first approximation, one part of the image can be duplicated in another. This is similar to motion compensation where a macroblock is ‘copied’ to another frame in a different position, but all the work is done on the present frame for Intra BC. Palette mode is another screen content tool that allows the colour of a section of the image to be described as a palette of colours rather than using the full RGB value for each and every pixel.
Sean covers the scaled prediction between resolutions in VVC and super-resolution in AV1, VVC’s 360-degree video optimisations and luma mapping before handing over to Walt Husak who goes into more detail on how the newer codecs work, starting with LCEVC.
LCEVC is a codec that improves the performance of already-deployed codecs, typically used to enhance spatial resolution. If you wanted to encode HD, the codec would downsample the HD to an SD resolution and encode that with AVC, HEVC or another codec. At the same time, it would upsample that encoded video again and generate to correction layers that correct for artefacts and add sharpness. This information is added into the base codec and sent to the decoder. This can allow a software-only enhancement to a hardware deployment fully utilising the hardware which has already been deployed. Walt notes that the enhancement layers are much the same technology as has already been standardised by SMPTE as VC6 (ST 2117). LCEVC has been found to be computationally efficient allowing it to address markets such as embedded devices where hardware restrictions would otherwise prohibit the use of higher resolutions than for which it was originally designed. Very low bitrate performance is also very good.
Sean introduces us to his “Dos and Don’ts” of codec comparisons. The theme running through them is to take care that you are comparing like for like. Codecs can be set to run ‘fast’ or ‘slow’ each of which holds its own compromises in terms of encoding time and resulting quality. Similarly, there are some implementations that are made simply to implement the standard as rigorously as possible which is an invaluable tool when developing the codec or an implementation. Such a reference implementation for codec X, clearly, shouldn’t be compared to production implementations of a codec Y as the times are guaranteed to be very different and you will not learn anything from the process. Similarly, there are different tools that give codecs much more time to optimise known as single- and double-pass which shouldn’t be cross-compared.
The talk draws to a close with a look at codec performance. Sean shows a number of graphs showing how VVC performs against HEVC. Interestingly the metrics clearly show a 40% increase in efficiency of VVC over HEVC, but when seen in subjective tests, the ratings show a 50% improvement. VVC’s encoder is approximately 10x as complex as HEVC’s.
HEVC and AV1 perform similarly for the same bit rate. Overall, Sean says, AV1 is a little blurrier in regions of spatial detail and can have some temporal flickering. HEVC is more likely to have blocking and ringing artefacts. EVC’s main profile is up to 29% better than HEVC. LCEVC performs up to 8% better than AVC when using an AVC base layer and also slightly better than HEVC when using an HEVC based codec. Sean makes the point that the AVC has been continually updated since its initial release and is now on version 27, so it’s not strictly true to simply say it’s an ‘old’ codec. HEVC similarly is on version 7. Sean runs down part of the roadmap for AVC which leads on to the use of AI in codecs.
Finishing the video, Walt looks at the use of Deep Learning in codecs. Deep learning is also known as machine learning and referred to as AI (Artificial Intelligence). For most people, these terms are interchangeable and refer to the ability of a signal to be manipulated not by a fixed equation or algorithm (such as Lanczos scaling) but by a computer that has been trained through many millions of examples to recognise what looks ‘right’ and to replicate that effect in new scenarios.
Walt talks about JPEG’s AI learning research on still images who are aiming to complete an ‘end-to-end’ study of compression with AI tools. There’s also MPEG’s Deep Neural Network-based Video Coding which is looking at which tools within codecs can be replaced with AI. Also, recently we have seen the foundation of the MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) organisation by Leonardo Chiariglione, an industry body devoted to the use of AI in compression. With all this activity, it’s clear that future advances in compression will be driven by the increasing use of these techniques.
These next 12 months are going to see 3 new MPEG standards being released. What does this mean for the industry? How useful will they be and when can we start using them? MPEG’s coming to the market with a range of commercial models to show it’s learning from the mistakes of the past so it should be interesting to see the adoption levels in the year after their release. This is part of the second session of the Vienna Video Tech Meetup and delves into startup time for streaming services.
In the first talk, Dr. Christian Feldmann explains the current codec landscape highlighting the ubiquitous AVC (H.264), UHD’s friend, HEVC (H.265), and the newer VP9 & AV1. The latter two differentiate themselves by being free to used and are open, particularly AV1. Whilst slow, both the latter are seeing increasing adoption in streaming, but no one’s suggesting that AVC isn’t still the go-to codec for most online streaming.
Christian then introduces the three new codecs, EVC (Essential Video Coding), LCEVC (Low-Complexity Enhancement Video Coding) and VVC (Versatile Video Coding) all of which have different aims. We start by looking at EVC whose aim is too replicate the encoding efficiency of HEVC, but importantly to produce a royalty-free baseline profile as well as a main profile which improves efficiency further but with royalties. This will be the first time that you’ve been able to use an MPEG codec in this way to eliminate your liability for royalty payments. There is further protection in that if any of the tools is found to have patent problems, it can be individually turned off, the idea being that companies can have more confidence in deploying the new technology.
The next codec in the spotlight is LCEVC which uses an enhancement technique to encode video. The aim of this codec is to enable lower-end hardware to access high resolutions and/or lower bitrates. This can be useful in set-top boxes and for online streaming, but also for non-broadcast applications like small embedded recorders. It can achieve a light improvement in compression over HEVC, but it’s well known that HEVC is very computationally heavy.
LCEVC reduces computational needs by only encoding a lower resolution version (say, SD) of the video in a codec of your choice, whether that be AVC, HEVC or otherwise. The decoder will then decode this and upscale the video back to the original resolution, HD in this example. This would look soft, normally, but LCEVC also sends enhancement data to add back in the edges and detail that would have otherwise been lost. This can be done in CPU whilst the other decoding could be done by the dedicated AVC/HEVC hardware and naturally encoding/decoding a quarter-resolution image is much easier than the full resolution.
Lastly, VVC goes under the spotlight. This is the direct successor to HEVC and is also known as H.266. VVC naturally has the aim of improving compression over HEVC by the traditional 50% target but also has important optimisations for more types of content such as 360 degree video and screen content such as video games.
To finish this first Vienna Video Tech Meetup, Christoph Prager lays out the reasons he thinks that everyone involved in online streaming should obsess about Video Startup Time. After defining that he means the time between pressing play and seeing the first frame of video. The longer that delay, the assumption is that the longer the wait, the more users won’t bother watching. To understand what video streaming should be like, he examines Spotify’s example who have always had the goal of bringing the audio start time down to 200ms. Christophe points to this podcast for more details on what Spotify has done to optimise this metric which includes activating GUI elements before, strictly speaking, they can do anything because the audio still hasn’t loaded. This, however, has an impact of immediacy with perception being half the battle.
“for every additional second of startup delay, an additional 5.8% of your viewership leaves”
Christophe also draws on Akamai’s 2012 white paper which, among other things, investigated how startup time puts viewers off. Christophe also cites research from Snap who found that within 2 seconds, the entirety of the audience for that video would have gone. Snap, of course, to specialise in very short videos, but taken with the right caveats, this could indicate that Akamai’s numbers, if the research was repeated today, may be higher for 2020. Christophe finishes up by looking at the individual components which go towards adding latency to the user experience: Player startup time, DRM load time, Ad load time, Ad tag load time.
Whilst there are plenty of videos explaining the basics streaming, few of them talk you through the basics of actually implementing a video player on your website. The principles taught in this hands-on Bitmovin webinar are transferable to many players, but importantly at the end of this talk you’ll have your own implementation of a video player which you can make in real time using their remix project at glitch.com which allows you to edit code and run it immediately in the browser to see your changes.
Ahead of the tutorial, the talk both explains the basics of compression and OTT led by Kieran Farr, Bitmovin’s VP of marketing and Andrea Fassina, Developer Evangelist. Andrea outlines a simplified OTT architecture where he looks at the ‘ingest’ stage which, in this example, is getting the videos from Instagram either via the API or manually. It then looks at the encoding step which compresses the input further and creates a range of different bitrates. Andrea explains that MPEG standards such as H.264, H.265 are commonly used to do this making the point that MPEG standards typically require royalty payments. This year, we are expecting to see VVC released by MPEG (H.266).
Andrea then explains the relationship between resolution, frame rate and file sizes. Clearly smaller files are better as they require less time to download leading to quicker downloads so faster startup times. Andrea discusses how the resolutions match the display resolutions with TVs having 1920×1080 resolution or 2160×3840 resolution. Given that higher resolutions have more picture detail, there is more information to be sent leading to larger file sizes.
Source: Bitmovin https://bit.ly/2VwStwC
When you come to set up your transcoder and player, there are a number of options you need to set. These are determined by these basics, so before launching into the code, Andrea looks further into the fundamental concepts. He next looks at video compression to explain the ways in which compression is achieved and the compromises within. Andrea starts from the first MJPEG codecs where each frame was its own JPEG image and they simply animated from one JPEG to another to show the video – not unlike animated GIFs used on the internet. However by treating each frame on its own ignores a lot of compression opportunity. When looking at one frame to the next, there are a lot of parts of the image which are the same or very similar. This allowed MPEG to step up their efforts and look across a number of frames to spot the similarities. This is typically referred to as temporal compression as is it uses time as part of the process.
In order to achieve this, MPEG splits all frames into blocks, squares in AVC, which are called macro blocks which be compared between frames. They then have 3 types of frame called ‘I’, ‘P’ and ‘B’ frames. The I frames have a complete description of that frame, similar to a JPEG photograph. P frames don’t have a complete description of the frame, rather they some blocks which have new information and some information saying that ‘this block is the same as this block in this other frame. B frames have no complete new image parts, but create the frame purely out of frames from the recent future and recent past; the B stands for ‘bi-directional’.
Ahead of launching into the code, we then look at the different video codecs available. He talks about AVC (discussed in detail here), HEVC (detailed in this talk) and compares the two. One difference is HEVC uses much more flexible macro block sizes. Whilst this increases computational complexity, it reduces the need to send redundant information so is an important part of the achieving the 50% bitrate reduction that HEVC typically shows over AVC. VP9 and AV1 complete the line-up as Andrea gives an overview of which platforms can support these different codecs.
Source: Bitmovin https://bit.ly/2VwStwC
Andrea then introduces the topic of Adaptive bitrate, ABR. This is vital in the effective delivery of video to the home or mobile phones where bandwidth varies over time. It requires creating several different renditions of your content at various bitrates, resolutions and even frame rate. Whilst these multiple encodes put a computational burden on the transcode stage, it’s not acceptable to allow a viewer’s player to go black, so it’s important to keep the low bitrate version. However there is a lot of work which can go into optimising the number and range of bitrates you choose.
Lastly we look at container formats such as MP4 used in both HLS and MPEG-DASH and is based on the file format ISO BMFF. Streaming MP4 is usually called fragmented MP4 (fMP4) as it is split up into chunks. Similarly MPEG2 Transport Streams (TS files) can be used as a wrapper around video and audio codecs. Andrea explains how the TS file is built up and the video, audio and other data such as captions are multiplexed together.
The last half of the video is the hands-on section during which Andrea talks us through how to implement a video player in realtime on the glitch project allowing you to follow along and do the same edits, seeing the results in your browser as you go. He explains how to create a list of source files, get the player working and styled correctly.
Views and opinions expressed on this website are those of the author(s) and do not necessarily reflect those of SMPTE or SMPTE Members.
This website is presented for informational purposes only. Any reference to specific companies, products or services does not represent promotion, recommendation, or endorsement by SMPTE