Video: Versatile Video Coding (VVC)

MPEG’s VVC is the next iteration along from HEVC (H.265). Whilst there are other codecs being finalised such as EVC and LCEVC, this talk looks at how VVC builds on HEVC, but also lends its hand to screen content and VR becoming a more versatile codec than HEVC, meeting the world’s changing needs. For an overview of these emerging codecs, this interview covers them all.

VVC is a joint project between ITU-T and MPEG (AKA ISO/IEC). Its aim is to create a 50% encoding efficiency in bitrate for the same quality picture, with the emphasis on higher resolutions, HDR and 10-bit video. At the same time, acknowledging that optimising codecs on natural video is no longer the core requirement for a lot of people. Its versatility comes from being able to encode screen content, independent sub-picture encoding, scalable encoding among others.

Gary Sullivan from Microsoft Technology & Research talks us through what all this means. He starts by outlining the case for a new codec, particularly the reach for another 50% bitrate saving which may come at further computational cost. Gary points out that, as video use continues to increase, anything that can be done to significantly reduce bitrates will either drive down costs or allow people to use video in better ways.

Any codec is a set of tools all working together to create the final product. Some tools are not always needed, say if you are running on a lower-power system, allowing the codec to be tuned for the situation. Gary puts up a list of some of the tools in VVC, many of which are an evolution of the same tool in HEVC, and highlights a few to give an insight into the improvements under the hood.

Gary’s pick of the big hitters in the tool-set are the Adaptive Loop Filter which reduces artefacts and prediction errors, affine motion compensation which provides better motion compensation, triangle partitioning mode which is a high-computation improvement in intra prediction, bi-directional optical flow (BIO) for motion prediction, intra-block copy which is useful for screen content where an identical block is found elsewhere in the same frame.

Gary highlights SCC, Screen Content Coding, which was in HEVC but not in the base profile, this has changed for VVC so all VVC implementations will have SCC whereas very few HEVC implementations do. Reference Picture Resampling (RPR) allows changing resolution from picture to picture where pictures can be stored at a different resolution from the current picture. And independent sub-pictures which allow parts of the video frame to be re-arranged or only for only one region to be decoded. This works well for VR, video conferencing and allows the creation of composite videos without intermediate decoding.

As usual, doing more thinking about how to compress a picture brings further computational demands. MPEG’s LCEVC is the standards body’s way of fighting against this, as notable bitrate improvements are possible even for low-power devices. With VVC, versatility is the aim, however. Decoders see a 60% increase in decode complexity. Whilst MPEG specifications are all about the decoder – hence allowing a lot of ongoing innovation in encoding techniques – current examples are about 8 or 9 times slower. Performance is better for screen content and on higher resolutions. Whilst the coding part of VVC is mature, versatility is still being worked on but the aim is to publishing within about 2 months.

The video finishes with a Q&A that covers implementing DASH into a low-latency video workflow. How CMAF will be specified to use VVC. Live workflows which Gary explains always come after the initial file-based work and is best understood after the first attempts at encoder implementations, noting that hardware lags by 2 years. He goes on to explain that chipmakers need to see the demand. At the moment, there is a lot of focus from implementors on AV1 by implementors, not to mention EVC, so the question is how much demand can be generated.

This talk is based on talk from Benjamin Bross originally given to an ITU workshop (PDF), then presented at Mile High Video by Benjamin and was updated by Gary for this conversation with the Seattle Video Tech community.

Bitmovin has an article highlighting many of the improvements in VVC written by Christian Feldmann who has given many talks on both AV1 and VVC.

Watch now!

Speakers

Gary Sullivan Gary Sullivan
Microsoft Technology & Research

Video: Advanced Video Coding Standards AVC

Whilst the encoding landscape is shifting, AVC (AKA H.264) still dominates many areas of video distribution so, for many, understanding what’s under the hood opens up a whole realm of diagnostics and fault finding that wouldn’t be possible without. Whilst many understand that MPEG video is built around I, B and P frames, this short talk offers deeper details which helps how it behaves both when it’s working well and otherwise.

Christian Timmerer, co-founder of Bitmovin, starts his lesson on AVC with the summary of improvements in AVC over the basic MPEG 2 model people tend to learn as a foundation. Improvements such as variable block size motion compensation, multiple reference frames and improved adaptive entropy coding. We see that, as we would expect the input can use 4:2:0 or 4:2:2 chroma sub-sampling as well as full 4:4:4 representation with 16×16 macroblocks for luminance (8×8 for chroma in 4:2:0). AVC can handle Pictures split into several slices which are self-contained sequences of macroblocks. Slices themselves can then be grouped.

Intra-prediction is the next topic where by an algorithm uses the information within the slice to predict a macroblock. This prediction is then subtracted from the actual block and coded thereby reducing the amount of data that needs to be transferred. The decoder can make the same prediction and reconstruct the full block from the data provided.

The next sections talk about motion prediction and the different sizes of macroblocks. A macroblock is a fixed area on the picture which can be described by a mixture of some basic patterns but the more complex the texture in the block, the more patterns need to be combined to recreate it. By splitting up the 16×16 block, we can often find a simpler way to describe the 8×8 or 8×16 shapes than if they had to encompass a whole 16×16 block.

 

B-frames are fairly well understood by many, but even if they are unfamiliar to you, Christian explains the concept whereby B-frames provide solely motion information of macroblocks both from frames before and after. This allows macroblocks which barely change to be ‘moved around the screen’ so to speak with minimal changes other than location. Whilst P and I frames provide new macroblocks, B-frames are intended just to provide this directional information. Christian explains some of the nuances of B-frame encoding including weighted prediction.

Quantisation is one of the most important parts of the MPEG process since quantisation is the process by which information is removed and the codec becomes lossy. Thus the way this happens, and the optimisations possible are key so Christian covers the way this happens before explaining the deblocking filter available. After splitting the picture up into so many macroblocks which are independently processed, edges between the blocks can become apparent so this filter helps smooth any artefacts to make them more pleasing to the eye. Christian finishes talking about AVC by exploring entropy encoding and thinking about how AVC encoding can and can’t be improved by adding more memory and computation to the encoder.

Watch now!
Speaker

Christian Timmerer Christian Timmerer
CIO & Cofounder, Bitmovin
Associate Professor, Universität Klagenfurt

Video: Let’s be hAV1ng you

AV1 is now in use for some YouTube feeds, Netflix also can deliver AV1 to Android devices so we are no longer talking about “if AV1 happens” or “when AV1’s finished”. AV1 is here to stay, but in a landscape of 3 new MPEG codecs, VVC, EVC and LCEVC, the question moves to “when is AV1 the right move?”

In this talk from Derek Buitenhuis, we delve behind the scenes of AV1 to see which AV1 terms can be, more-or-less, mapped to which MPEG terms. AV1 is promoted as a royalty-free codec, although notably a patent pool has appeared to try and claim money from users. Because it’s not reusing ideas from other technologies, the names and specific functions of parts of the spec are both not identical to other codecs, but are similar in function.

Derek starts by outlining some of the terms we need to understand before delving in further such as “Temporal Unit” which of course is called a TU and is analogous to a GOP. Then he moves on to highlight the many ways in which previous DCT-style work has been extended meaning the sizes and types of DCT have been increased, and the prediction modes have changed. All of this is possible but increases computation.

Derek then highlights several major tools which have been added. One is the prediction of the Chroma from the Luma signal. Another is the ‘Constrained Direction Enhancement Filter’ which improves the look of diagonal hard edges. The third is ‘switch frames’ which are similar to IDR frames or, as Derek puts it ‘a fancy P-frame.’ There is also a Multi-Symbolic Arithmetic Codec which is a method of guessing a future binary digit which, based on probability, allows you to encode a subset of the number but just enough to ensure that the algorithm will come out with the full number,

After talking about the Loop Restoration Filter Derek then critiques a BBC article which drew, it seems, incorrect conclusions based on not enabling the appropriate functions needed for good compression and also suggesting that there was not enough information provided for anyone else to replicate the experiment. Derek then finishes with MS-SIM plots of different encoders.

Watch now!
Download the slides.
Speaker

Derek Buitenhuis Derek Buitenhuis
Senior Video Encoding Engineer,
Vimeo

Video: Video Compression Basics

Video compression is used everywhere we look. So often is it not practical to use uncompressed video, that everything in the consumer space video is delivered compressed so it pays to understand how this works, particularly if part of your job involves using video formats such as AVC, also known as H.264 or HEVC, AKA H.265.

Gisle Sælensminde from Vizrt takes us on this journey of creating compressed video. He starts by explaining why we need uncompressed video and then talks about containers such as MPEG-2 Transport Streams, mp4, MOV and others. He explains that the container’s job is partly to hold metadata such as the framerate, resolution and timestamps among a long list of other things.

Gisle takes some time to look at the past timeline of codecs in order to understand where we’re going from what went before. As many use the same principles, Gisle looks at the different type of frames inside most compressed formats – I, P and B frames which are used in set patterns known as GOPs – Group(s) of Pictures. A GOP defines how long is between I frames. In the talk we learn that I frames are required for a decoder to be able to tune in part way through a feed and still start seeing some pictures. This is because it’s the I frame which holds a whole picture rather than the other types o frame which don’t.

Colours are important, so Gisle looks at the way that colours are represented. Many people know about defining colours by looking at the values of Red, Green and Blue, but fewer about YUV. This is all covered in the talk so we know about conversion between the two types.

Almost synonymous with codecs such as HEVC and AVC are Macroblocks. This is the name given to the parts of the raster which have been spit up into squares, each of which will be analysed independently. We’ll look at who these macro blocks are used, but Gisle also spends some time looking to the future as both HEVC, VP9 and now AV1 use variable-size macro block analysis.

A process which happens throughout broadcast is chroma subsampling. This topic, whereby we keep more of the luminance channel than colours, is explored ahead of looking at DCTs – Discrete Cosine Transforms – which are foundational to most video codecs. We see that by analysing these macro blocks with DCTs. we can express the image in a different way and even cut down on some of the detail we get from DCTs in order to reduce the bitrate.

Before some very useful demos looking at the result of varying quantisation across a picture, the difference signal between the source and encoded picture plus deblocking technology to hide some of the artefacts which can arise from DCT-based codecs when they are pushed for bandwidth.

Gisle finishes this talk at Media City Bergen by taking a number of questions from the floor.

Watch now!
Speaker

Gisle Sælensminde Gisle Sælensminde
Senior Software Engineer,
Vizrt