Video: Let’s be hAV1ng you

AV1 is now in use for some YouTube feeds, and Netflix can already deliver AV1 to Android devices, so we are no longer talking about “if AV1 happens” or “when AV1’s finished”. AV1 is here to stay, but in a landscape of three new MPEG codecs, VVC, EVC and LCEVC, the question moves to “when is AV1 the right move?”

In this talk from Derek Buitenhuis, we delve behind the scenes of AV1 to see which AV1 terms can be, more-or-less, mapped to which MPEG terms. AV1 is promoted as a royalty-free codec, although notably a patent pool has appeared seeking money from its users. Because it doesn’t directly reuse ideas from other technologies, parts of the spec have names and specific functions which are not identical to those of other codecs, though many are similar in function.

Derek starts by outlining some of the terms we need to understand before delving in further, such as “Temporal Unit”, known as a TU, which is analogous to a GOP. He then moves on to highlight the many ways in which previous DCT-style work has been extended: the sizes and types of DCT have been increased, and the prediction modes have changed. All of this is possible, but it increases the computation required.

Derek then highlights several major tools which have been added. One is the prediction of chroma from the luma signal. Another is the ‘Constrained Directional Enhancement Filter’ which improves the look of diagonal hard edges. The third is ‘switch frames’ which are similar to IDR frames or, as Derek puts it, ‘a fancy P-frame.’ There is also multi-symbol arithmetic coding, a method of predicting upcoming symbols which, based on probability, lets the encoder write only a subset of the bits – just enough to ensure that the decoder can reconstruct the full value.
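
To make that idea concrete, here is a minimal sketch of multi-symbol arithmetic coding in Python. It uses a toy floating-point interval and a made-up static probability table; real codecs, AV1’s included, use fixed-point integer ranges and adaptive probabilities, so treat this as an illustration of the principle rather than the algorithm in the spec.

```python
# Toy multi-symbol arithmetic coder: each symbol narrows a probability
# interval, and any single number inside the final interval is enough
# to reconstruct the whole sequence.

PROBS = {"a": 0.6, "b": 0.3, "c": 0.1}  # hypothetical static model

def cumulative(probs):
    """Map each symbol to its (low, high) slice of [0, 1)."""
    ranges, low = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def encode(message, probs):
    ranges = cumulative(probs)
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        s_low, s_high = ranges[sym]
        low, high = low + span * s_low, low + span * s_high
    return (low + high) / 2  # any value in [low, high) identifies the message

def decode(code, length, probs):
    ranges = cumulative(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= code < s_high:
                out.append(sym)
                code = (code - s_low) / (s_high - s_low)  # rescale and repeat
                break
    return "".join(out)

msg = "aabac"
code = encode(msg, PROBS)
assert decode(code, len(msg), PROBS) == msg
print(f"{msg!r} -> {code:.10f}")
```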

After talking about the Loop Restoration Filter, Derek critiques a BBC article which, it seems, drew incorrect conclusions because it didn’t enable the encoder options needed for good compression, and which also provided too little information for anyone else to replicate the experiment. Derek then finishes with MS-SSIM plots comparing different encoders.

Watch now!
Download the slides.
Speaker

Derek Buitenhuis
Senior Video Encoding Engineer,
Vimeo

Video: Video Compression Basics

Video compression is used everywhere we look. It is so rarely practical to use uncompressed video that everything in the consumer space is delivered compressed, so it pays to understand how this works, particularly if part of your job involves using video formats such as AVC, also known as H.264, or HEVC, AKA H.265.

Gisle Sælensminde from Vizrt takes us on this journey of creating compressed video. He starts by explaining why we need to compress video and then talks about containers such as MPEG-2 Transport Streams, MP4, MOV and others. He explains that the container’s job is partly to hold metadata such as the framerate, resolution and timestamps, among a long list of other things.

Gisle takes some time to look at the timeline of past codecs in order to understand where we’re going in light of what went before. As many codecs use the same principles, Gisle looks at the different types of frames inside most compressed formats – I, P and B frames – which are used in set patterns known as GOPs, Group(s) of Pictures. A GOP defines how long the gap is between I frames. In the talk we learn that I frames are required for a decoder to be able to tune in part-way through a feed and still start seeing some pictures. This is because it’s the I frame which holds a whole picture, rather than the other types of frame, which don’t.
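
As a simple illustration of those fixed GOP patterns, here is a short Python sketch that prints the display-order frame types for one group. The lengths are illustrative choices rather than anything from the talk, and real encoders choose structures adaptively.

```python
# A minimal sketch of a fixed GOP pattern: an I frame starts each group,
# with P frames at regular intervals and B frames between them.

def gop_pattern(gop_length=12, b_frames=2):
    """Return display-order frame types for one GOP, e.g. IBBPBBP..."""
    types = []
    for i in range(gop_length):
        if i == 0:
            types.append("I")          # full picture: decoders can tune in here
        elif i % (b_frames + 1) == 0:
            types.append("P")          # predicted from earlier frames
        else:
            types.append("B")          # predicted from earlier and later frames
    return "".join(types)

print(gop_pattern())  # IBBPBBPBBPBB
```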

Colours are important, so Gisle looks at the way that colours are represented. Many people know about defining colours by the values of red, green and blue, but fewer know about YUV. This is all covered in the talk, so we learn about conversion between the two representations.
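
For a flavour of what that conversion involves, here is a minimal sketch using the BT.601 luma coefficients. This is one of several standard coefficient sets (BT.709, for instance, differs) and is my illustration rather than anything taken from the talk.

```python
# RGB -> YUV conversion using the BT.601 matrix.

def rgb_to_yuv(r, g, b):
    """Convert normalised [0, 1] RGB to Y'CbCr-style YUV (BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted brightness
    u = 0.5 * (b - y) / (1 - 0.114)          # blue-difference chroma
    v = 0.5 * (r - y) / (1 - 0.299)          # red-difference chroma
    return y, u, v

print(rgb_to_yuv(1.0, 0.0, 0.0))  # pure red: high V, negative U
```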

Almost synonymous with codecs such as HEVC and AVC are macroblocks. This is the name given to the parts of the raster which have been split up into squares, each of which will be analysed independently. We look at how these macroblocks are used, but Gisle also spends some time looking to the future, as HEVC, VP9 and now AV1 all use variable-size macroblock analysis.
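
The splitting itself is straightforward. A minimal NumPy sketch, assuming a fixed 16x16 block size as in AVC (HEVC, VP9 and AV1 allow larger, variable sizes), might look like this:

```python
# Split a raster into fixed-size macroblocks for independent analysis.

import numpy as np

def split_into_blocks(frame, size=16):
    """Yield (row, col, block) for each size x size block of a frame."""
    h, w = frame.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            yield y, x, frame[y:y + size, x:x + size]

frame = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # toy luma plane
blocks = list(split_into_blocks(frame))
print(len(blocks))  # 16 blocks for a 64x64 frame
```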

A process which happens throughout broadcast is chroma subsampling. This topic, whereby we keep more of the luminance channel than of the colour channels, is explored ahead of looking at DCTs – Discrete Cosine Transforms – which are foundational to most video codecs. We see that by analysing these macroblocks with DCTs, we can express the image in a different way and even cut down on some of the detail the DCTs give us in order to reduce the bitrate.
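
To make that concrete, here is a small sketch of the 2-D DCT-II on an 8x8 block, with a crude form of detail reduction: zeroing the high-frequency coefficients before inverting. Real codecs quantise coefficients rather than zeroing them outright, so this only illustrates the principle.

```python
# 2-D DCT-II on an 8x8 block, plus crude detail reduction: dropping
# high-frequency coefficients loses fine detail but keeps the block
# recognisable, which is the trade-off quantisation exploits.

import numpy as np

N = 8
k = np.arange(N)
# Orthonormal DCT-II basis matrix: row = frequency, column = sample
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = np.sqrt(1 / N)

def dct2(block):
    return C @ block @ C.T

def idct2(coeffs):
    return C.T @ coeffs @ C

block = np.random.randint(0, 256, (N, N)).astype(float)
coeffs = dct2(block)

mask = np.add.outer(k, k) < 8           # keep only the low-frequency corner
approx = idct2(coeffs * mask)

print(np.abs(block - idct2(coeffs)).max())  # ~0: the transform itself is lossless
print(np.abs(block - approx).max())          # >0: detail was thrown away
```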

Before wrapping up, there are some very useful demos looking at the result of varying quantisation across a picture, the difference signal between the source and the encoded picture, plus the deblocking technology used to hide some of the artefacts which can arise from DCT-based codecs when they are pushed for bandwidth.

Gisle finishes this talk at Media City Bergen by taking a number of questions from the floor.

Watch now!
Speaker

Gisle Sælensminde
Senior Software Engineer,
Vizrt

Video: LCEVC – The Latest MPEG Standard

Video is so pervasive in our world that we need to move past thinking of codecs and compression as being only about reducing bitrate. That will always be a major consideration, but the speed of compression and the computation needed can also be deal breakers. Millions of embedded devices which don’t have the grunt of live AV1-encoding clusters in the cloud still need to encode video. Furthermore, the structure of the final data itself can be important for later processing and decoding. So we can see how use cases arise out of the needs of various industries, far beyond broadcast, which mean that codecs need to do more than make files small.

This year LCEVC from MPEG will be standardised. Called Low Complexity Enhancement Video Coding, this codec provides compression both where computing power is constrained and where it is plentiful. Guido Meardi, CEO of V-Nova, talks us through what LCEVC is, starting with a chart showing how computation has increased vastly as compression has improved. It’s this trend that the codec intends to put an end to by adding, Guido explains, an enhancement layer over some lower-resolution video. By encoding at a lower resolution, computational processing is minimised. When displayed, an enhancement layer allows this low-resolution video to be sharpened again to bring it back to the original.

After demonstrating the business benefits, we see the block diagram of the encoder and decoder, which helps visualise how this enhancement might be calculated and applied. Guido then shows us what the enhancement layer looks like – a fairly flat image with lots of thin edges on it – but, importantly, it also captures a lot of almost-random detail which can’t be guessed by upsamplers. This, of course, is the point. If it were possible to upscale the low-resolution video and infer all the data, then we would always do that. Rather, downscaling and upscaling is a lossy process. Here, that loss is worth it because of the computational gains and because the enhancement layer will put back much of what was lost.
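
Here is a minimal sketch of that base-plus-residual idea: downscale, upscale back, and keep what upscaling could not recover as the enhancement layer. The crude area/nearest-neighbour resampling is my simplification; LCEVC’s actual filters and its coding of the residual are considerably more sophisticated.

```python
# Enhancement-layer sketch: the residual holds exactly what the
# upsampler could not guess, so base + residual restores the original.

import numpy as np

def downscale2x(img):
    """Average each 2x2 neighbourhood (area resampling)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upscale2x(img):
    """Nearest-neighbour upsampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

original = np.random.rand(64, 64)            # toy luma plane
base = downscale2x(original)                  # cheap-to-encode base layer
predicted = upscale2x(base)                   # decoder-side upsample
enhancement = original - predicted            # mostly flat: edges + fine detail

reconstructed = predicted + enhancement
print(np.allclose(reconstructed, original))  # True: residual restores the detail
```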

In order to demonstrate LCEVC’s ability, Guido shows graphs comparing LCEVC against x264 at UHD, showing improvements of between 20 and 45%, and image examples of artefacts which are avoided by using LCEVC. We then see that, when applied to AVC, HEVC and VVC, it speeds up encodes at least twofold. Guido finishes the presentation by showing how you can test out the encoder and decoder yourself.

In the last segment of this video, Tarek Amara from Twitch sits down to talk with Guido about the codec and the background behind it. Their talk covers V-Nova’s approach to open source, licensing, and LCEVC’s gradual improvements as it went through the proving process as part of MPEG standardisation, plus questions from the floor.

Watch now!
Speakers

Guido Meardi
CEO & Co-Founder,
V-Nova
Tarek Amara
Principal Video Specialist,
Twitch

Video: AV1 at Netflix

Netflix have continually been pushing forward video compression and analysis because their assets are played so many times that every bit saved is real money saved. VMAF is a great example of Netflix’s desire to push the state of the art forward. Developed by Netflix and two universities, this new objective metric allowed them to better evaluate the quality of videos using computer analysis and has continued to be the foundation of their work since.

One use of VMAF has been to verify the results of Netflix’s per-shot encoding method, which alters encoding parameters for each shot of the film rather than using a fixed set of parameters for the whole film. The Broadcast Knowledge has featured talks on their previous technique, per-title encoding (among others).
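
As a hedged sketch of the per-shot idea only: split the film into shots and, for each one, pick the cheapest parameters that still hit a quality target. The encode() and vmaf_score() helpers below are hypothetical stand-ins, not Netflix’s pipeline, and the CRF and VMAF numbers are purely illustrative.

```python
# Hypothetical per-shot encoding sketch: choose parameters shot by shot
# instead of one fixed set for the whole film.

def encode(shot, crf):
    """Hypothetical encoder call; returns a stand-in rendition."""
    return {"shot": shot, "crf": crf}

def vmaf_score(reference, encoded):
    """Hypothetical quality metric; here lower CRF simply scores higher."""
    return 100 - encoded["crf"]

def per_shot_encode(shots, candidate_crfs, target_vmaf=80):
    """Pick, per shot, the highest (cheapest) CRF that meets the target."""
    renditions = []
    for shot in shots:
        chosen = min(candidate_crfs)                      # fallback: best quality
        for crf in sorted(candidate_crfs, reverse=True):  # fewest bits first
            if vmaf_score(shot, encode(shot, crf)) >= target_vmaf:
                chosen = crf
                break
        renditions.append(encode(shot, chosen))
    return renditions

print(per_shot_encode(["shot1", "shot2"], [18, 23, 28]))
```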

AV1, however, must be the most famous innovation that Netflix is behind. A founding member of the Alliance for Open Media (AoM), Netflix saw the need for a better codec and decided that making an open one, which also played to the needs of other internet giants such as Google, was a good way to create a vibrant community around it – driving contributions to the codec itself but also, it is hoped, to its implementation and adoption.

In this two-part talk, LiWei Guo starts off by explaining the ways in which AV1 will be used by Netflix. Since this talk took place, Netflix has started streaming in AV1 to Android clients. LiWei points out that AV1 supports 10-bit video as standard – a notable difference from other codecs like AVC and HEVC. This allows Netflix to use 10-bit without worrying about decoder compatibility, and he shows examples of skies and water which are significantly improved by the use of 10-bit.

Another feature of AV1 is film grain synthesis, which seeks to improve encoding efficiency by removing the random film grain of the source during the encode process and then inserting similar random noise on top at playback to recreate the same look and feel. As anything random can’t be predicted, noise such as this is very wasteful for a codec to try to encode, so it’s no surprise that this can result in as much as a 30% reduction in bitrate. Before concluding, LiWei briefly explains per-shot encoding, then shows data on the overall improvements.
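
A minimal sketch of that denoise-then-resynthesise idea, assuming a simple box blur as the denoiser and Gaussian noise as the grain; AV1’s actual tool estimates grain parameters from the source and synthesises grain with an autoregressive model, so this only illustrates the principle.

```python
# Film-grain synthesis sketch: remove grain before encoding (random
# noise is expensive to code), then add statistically similar grain
# back at playback.

import numpy as np

def box_blur(img, radius=1):
    """Crude denoiser: average over a (2r+1)^2 neighbourhood."""
    out = np.zeros_like(img)
    count = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            count += 1
    return out / count

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 64), (64, 1))       # toy grain-free scene
grainy_source = clean + rng.normal(0, 0.05, clean.shape)

to_encode = box_blur(grainy_source)                    # smooth: cheap to compress
grain_strength = (grainy_source - to_encode).std()     # parameter sent in the stream

# Decoder side: synthesise grain with the signalled strength
displayed = to_encode + rng.normal(0, grain_strength, to_encode.shape)
print(round(grain_strength, 4))
```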

Andrey Norkin, also from Netflix, explains their work with Intel on the SVT-AV1 software video encoder, which leverages Intel’s SVT technology, a framework optimised for Xeon chips for video encoding and analysis. Netflix’s motivations are to further increase adoption by delivering a data-centre-ready, optimised encoder, and to create an AV1 encoder they can use to support their own internal research activities (did someone say AV2?). SVT allows for parallelisation, important for any computer nowadays with so many cores available.

Finishing up, Andrey points us to the GitHub repository, lets us know the development status (as of November 2019) and looks at the speed increases that have been achieved, comparing SVT-AV1 against the reference libaom encoder.

Watch now!
Speakers

Andrey Norkin
Senior Research Scientist,
Netflix
LiWei Guo
Senior Software Engineer,
Netflix