Video: High-Efficiency Video Coding (HEVC) Primer

HEVC continues to gain adoption thanks to its bitrate savings over AVC (H.264), though much hangs in the balance this year as AV1 continues to gain momentum and MPEG’s VVC is released, both of which promise greater compression. Compression, however, is a compromise between encoding complexity (computation), quality and speed. HEVC stands on the shoulders of AVC and this video explains the techniques it uses to do better.

Christian Timmerer, co-founder of Bitmovin, builds on his previous video about AVC as he details the tools and capabilities of HEVC (also known as H.265). He summarises the performance of HEVC as providing twice as much compression for the same video quality (or better quality for the same number of bits). Whilst its decoder requirements have gone up by 50%, it provides better parallelisation opportunities. Amongst the features that deliver this are variable block-size motion compensation, an improved interpolation method and more directions for spatial prediction. Most of the improvements are expansions of the abilities laid out in AVC, for instance making sizes or directions variable, or simply providing more options.

After outlining some of the details behind the new capabilities, we look at the performance improvements of some HEVC implementations over AVC implementations, showing bitrate savings of up to 65% and averaging around 50%. Christian finishes by looking at the newer codecs coming soon, such as VVC and LCEVC.

Watch now!
Speakers

Christian Timmerer
CIO & Cofounder, Bitmovin
Associate Professor, Universität Klagenfurt

Video: Advanced Video Coding Standards AVC

Whilst the encoding landscape is shifting, AVC (AKA H.264) still dominates many areas of video distribution so, for many, understanding what’s under the hood opens up a whole realm of diagnostics and fault finding that wouldn’t otherwise be possible. Whilst many understand that MPEG video is built around I, B and P frames, this short talk offers deeper detail which helps explain how it behaves both when it’s working well and otherwise.

Christian Timmerer, co-founder of Bitmovin, starts his lesson on AVC with a summary of the improvements in AVC over the basic MPEG-2 model people tend to learn as a foundation: improvements such as variable block-size motion compensation, multiple reference frames and improved adaptive entropy coding. We see that, as we would expect, the input can use 4:2:0 or 4:2:2 chroma sub-sampling as well as full 4:4:4 representation, with 16×16 macroblocks for luminance (8×8 for chroma in 4:2:0). AVC can handle pictures split into several slices, which are self-contained sequences of macroblocks. Slices themselves can then be grouped.

Intra-prediction is the next topic, whereby an algorithm uses the information within the slice to predict a macroblock. This prediction is then subtracted from the actual block and the difference coded, thereby reducing the amount of data that needs to be transferred. The decoder can make the same prediction and reconstruct the full block from the data provided.
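
To make that concrete, here is a minimal Python/NumPy sketch of the predict–subtract–reconstruct loop, using a simplified DC-style prediction in which the block is predicted as the mean of its already-decoded neighbours. The block values and neighbour pixels are invented for illustration; this is the principle the talk describes, not AVC’s actual mode logic.

```python
import numpy as np

def dc_intra_predict(left_col, top_row):
    """Simplified DC intra prediction: predict the whole block as the mean
    of the already-decoded neighbouring pixels."""
    mean = np.round(np.mean(np.concatenate([left_col, top_row])))
    return np.full((4, 4), mean)

# Hypothetical 4x4 luma block and its already-decoded neighbours
block    = np.array([[52, 55, 61, 66],
                     [63, 59, 55, 90],
                     [62, 59, 68, 113],
                     [63, 58, 71, 122]], dtype=float)
left_col = np.array([60, 61, 62, 63], dtype=float)   # pixels to the left
top_row  = np.array([58, 60, 63, 65], dtype=float)   # pixels above

prediction = dc_intra_predict(left_col, top_row)
residual   = block - prediction            # only this (smaller) signal is coded
reconstructed = prediction + residual      # the decoder forms the same prediction
assert np.array_equal(reconstructed, block)
```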

The next sections talk about motion prediction and the different sizes of macroblocks. A macroblock is a fixed area of the picture which can be described by a mixture of some basic patterns, but the more complex the texture in the block, the more patterns need to be combined to recreate it. By splitting up the 16×16 block, we can often find a simpler way to describe the resulting 8×8 or 8×16 shapes than if they had to encompass the whole 16×16 block.
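
As a toy illustration of why splitting can pay off, the sketch below compares the residual energy of describing an invented 16×16 block with one flat prediction against describing its four 8×8 quarters separately. The cost measure is deliberately crude and merely stands in for the rate–distortion decision a real encoder makes.

```python
import numpy as np

def flat_cost(block):
    """Residual energy if the block is approximated by a single flat value
    (its mean) -- a crude stand-in for how well one simple prediction fits."""
    return float(np.sum((block - block.mean()) ** 2))

# Hypothetical 16x16 block: perfectly flat on the left, busy texture on the right
rng = np.random.default_rng(0)
block = np.zeros((16, 16))
block[:, 8:] = rng.integers(0, 256, (16, 8))

whole = flat_cost(block)
split = sum(flat_cost(block[r:r + 8, c:c + 8]) for r in (0, 8) for c in (0, 8))

print(f"cost as one 16x16 block : {whole:,.0f}")
print(f"cost as four 8x8 blocks : {split:,.0f}")
# The split is cheaper because two of the quarters are perfectly flat.  A real
# encoder also has to pay to signal the split and the extra motion vectors,
# so it only splits when the residual savings outweigh that overhead.
```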

 

B-frames are fairly well understood by many, but even if they are unfamiliar to you, Christian explains the concept whereby B-frames carry only motion information for macroblocks, referencing frames both before and after. This allows macroblocks which barely change to be ‘moved around the screen’, so to speak, with minimal changes other than location. Whilst P and I frames provide new macroblocks, B-frames are intended just to provide this directional information. Christian explains some of the nuances of B-frame encoding including weighted prediction.
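
A rough sketch of that bi-directional blending, including the weighted-prediction idea Christian mentions, might look like this. The blocks and weights are invented purely to show the arithmetic and are not taken from the AVC specification.

```python
import numpy as np

def bi_predict(past_block, future_block, w0=0.5, w1=0.5):
    """Bi-prediction: blend a motion-compensated block from a past reference
    with one from a future reference.  Unequal weights (weighted prediction)
    help during fades, where brightness changes over time."""
    return np.clip(np.round(w0 * past_block + w1 * future_block), 0, 255)

past   = np.full((4, 4), 100.0)   # hypothetical block from the previous reference
future = np.full((4, 4), 120.0)   # same content, brighter, from the next reference

print(bi_predict(past, future))              # plain average -> 110
print(bi_predict(past, future, 0.75, 0.25))  # weighted towards the past -> 105
```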

Quantisation is one of the most important parts of the MPEG process, since it is the point at which information is removed and the codec becomes lossy. The way this happens, and the optimisations possible, are therefore key, so Christian covers it before explaining the deblocking filter. After splitting the picture into so many independently processed macroblocks, edges between the blocks can become apparent, so this filter helps smooth out any artefacts to make them more pleasing to the eye. Christian finishes his look at AVC by exploring entropy encoding and considering how AVC encoding can and can’t be improved by adding more memory and computation to the encoder.
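
The core of quantisation can be sketched in a few lines: divide the transform coefficients by a step size, round, and scale back up at the decoder. The coefficient values and step sizes below are arbitrary examples, not AVC’s actual quantisation tables.

```python
import numpy as np

def quantise(coeffs, step):
    """Uniform quantisation of transform coefficients: divide by the step
    size and round.  The rounding is where information is permanently lost,
    which is what makes the codec lossy."""
    return np.round(coeffs / step)

def dequantise(levels, step):
    """The decoder can only scale the levels back up; the rounding error
    cannot be recovered."""
    return levels * step

coeffs = np.array([312.0, -45.3, 18.7, -4.2, 1.1, -0.6])  # hypothetical coefficients
for step in (4, 16, 64):
    rec = dequantise(quantise(coeffs, step), step)
    print(f"step {step:>2}: {rec}")
# Larger steps give fewer distinct levels (better compression) but a larger
# reconstruction error -- the quality/bitrate trade-off.
```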

Watch now!
Speaker

Christian Timmerer
CIO & Cofounder, Bitmovin
Associate Professor, Universität Klagenfurt

Video: Video Compression Basics

Video compression is used everywhere we look. Uncompressed video is so rarely practical that virtually everything in the consumer space is delivered compressed, so it pays to understand how this works, particularly if part of your job involves using video formats such as AVC, also known as H.264, or HEVC, AKA H.265.

Gisle Sælensminde from Vizrt takes us on this journey of creating compressed video. He starts by explaining why uncompressed video isn’t practical to deliver and then talks about containers such as MPEG-2 Transport Streams, MP4, MOV and others. He explains that the container’s job is partly to hold metadata such as the framerate, resolution and timestamps, among a long list of other things.

Gisle takes some time to look at the timeline of past codecs in order to understand where we’re going based on what went before. As many use the same principles, Gisle looks at the different types of frame inside most compressed formats – I, P and B frames – which are used in set patterns known as GOPs, or Group(s) of Pictures. A GOP defines the spacing between I frames. In the talk we learn that I frames are required for a decoder to be able to tune in part way through a feed and still start showing pictures. This is because it’s the I frame which holds a whole picture, rather than the other types of frame which don’t.
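
A small sketch of that ‘wait for the next I frame’ behaviour is shown below, using a hypothetical 12-frame GOP pattern; the real pattern is an encoder choice.

```python
# A hypothetical repeating GOP pattern with an I frame every 12 frames
gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "P", "B", "B"]

def frames_until_decodable(join_position, pattern):
    """A decoder tuning in mid-stream can only start producing pictures at
    the next I frame, since only I frames carry a complete picture."""
    stream = pattern * 2                      # look into the following GOP too
    for offset, frame in enumerate(stream[join_position:]):
        if frame == "I":
            return offset
    return None

for pos in (0, 1, 6, 11):
    print(f"join at frame {pos:>2}: wait {frames_until_decodable(pos, gop)} frames for an I frame")
```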

Colours are important, so Gisle looks at the way colours are represented. Many people know about defining colours by the values of Red, Green and Blue, but fewer know about YUV. This is all covered in the talk so we learn about conversion between the two representations.
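
For those who want to see the conversion itself, here is a minimal sketch using the BT.601 coefficients in their full-range (JPEG-style) form; broadcast video typically uses limited range and, for HD, BT.709 coefficients, so treat this as illustrative rather than production-ready.

```python
import numpy as np

# BT.601 RGB -> YCbCr, full-range (JPEG-style) variant for simplicity
M = np.array([[ 0.299,     0.587,     0.114   ],
              [-0.168736, -0.331264,  0.5     ],
              [ 0.5,      -0.418688, -0.081312]])

def rgb_to_ycbcr(rgb):
    ycbcr = M @ rgb
    ycbcr[1:] += 128              # centre the colour-difference channels
    return ycbcr

def ycbcr_to_rgb(ycbcr):
    ycbcr = ycbcr.copy()
    ycbcr[1:] -= 128
    return np.linalg.inv(M) @ ycbcr

pixel = np.array([200.0, 120.0, 40.0])      # an orange-ish RGB pixel
print(rgb_to_ycbcr(pixel))                  # one luma plus two chroma values
print(ycbcr_to_rgb(rgb_to_ycbcr(pixel)))    # round-trips back to the RGB values
```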

Almost synonymous with codecs such as HEVC and AVC are macroblocks. This is the name given to the parts of the raster which have been split up into squares, each of which will be analysed independently. We look at how these macroblocks are used, but Gisle also spends some time looking to the future, as HEVC, VP9 and now AV1 all use variable-size macroblock analysis.

A process which happens throughout broadcast is chroma subsampling. This topic, whereby we keep more of the luminance channel than the colours, is explored ahead of looking at DCTs – Discrete Cosine Transforms – which are foundational to most video codecs. We see that by analysing these macroblocks with DCTs, we can express the image in a different way and even cut down on some of the detail the DCT gives us in order to reduce the bitrate.
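
A quick way to see this in practice is to transform a hypothetical 8×8 block and throw away the high-frequency coefficients, which is roughly what coarse quantisation does to fine texture. The block contents and the 4×4 ‘keep’ region below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(1)
# A hypothetical 8x8 luma block: a smooth gradient plus a little noise
block = np.add.outer(np.arange(8) * 8.0, np.arange(8) * 4.0) + rng.normal(0, 2, (8, 8))

coeffs = dctn(block, norm="ortho")          # energy concentrates in the top-left
# Discard the high-frequency detail (everything outside a 4x4 corner)
kept = np.zeros_like(coeffs)
kept[:4, :4] = coeffs[:4, :4]

approx = idctn(kept, norm="ortho")
print(f"kept {np.count_nonzero(kept)} of 64 coefficients, "
      f"max pixel error {np.abs(block - approx).max():.1f}")
```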

Before finishing, there are some very useful demos looking at the result of varying quantisation across a picture and the difference signal between the source and the encoded picture, plus deblocking technology to hide some of the artefacts which can arise from DCT-based codecs when they are pushed for bandwidth.

Gisle finishes this talk at Media City Bergen by taking a number of questions from the floor.

Watch now!
Speaker

Gisle Sælensminde
Senior Software Engineer,
Vizrt

Video: LCEVC – The Latest MPEG Standard

Video is so pervasive in our world that we need to move past thinking of codecs and compression as being only about reducing bitrate. That will always be a major consideration, but speed of compression and the computation needed can also be deal breakers. Millions of embedded devices need to encode video but don’t have the grunt available to live AV1-encoding clusters in the cloud. Furthermore, the structure of the final data itself can be important for later processing and decoding. So we can see how use cases arise out of the needs of various industries, far beyond broadcast, which mean that codecs need to do more than make files small.

This year LCEVC from MPEG will be standardised. Called Low Complexity Enhancement Video Coding, this codec provides compression both where computing is constrained and where it is plentiful. Guido Meardi, CEO of V-Nova, talks us through what LCEVC is, starting with a chart showing how computation has increased vastly as compression has improved. It’s this trend that the codec intends to put an end to by adding, Guido explains, an enhancement layer over lower-resolution video. By encoding at a lower resolution, computational processing is minimised. When displayed, an enhancement layer allows this low-resolution video to be sharpened again to bring it back towards the original.

After demonstrating the business benefits, we see the block diagram of the encoder and decoder, which helps visualise how this enhancement might be calculated and applied. Guido then shows us what the enhancement layer looks like – a fairly flat image with lots of thin edges on it – but, importantly, it also captures a lot of almost random detail which can’t be guessed by upsamplers. This, of course, is the point. If it were possible to upscale the low-resolution video and guess/infer all the data, then we would always do that. Rather, downscaling and upscaling is a lossy process. Here, that loss is worth it because of the computational gains and because the enhancement layer will put back much of what was lost.
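
The base-plus-residual idea can be sketched conceptually as follows. This is emphatically not the LCEVC bitstream or toolset, just the downscale/upscale/residual principle described above, and it omits the lossy encoding of both layers.

```python
import numpy as np

def downscale(img):
    """Halve the resolution by averaging 2x2 blocks."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upscale(img):
    """Nearest-neighbour upsample back to the original size."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(2)
source = rng.integers(0, 256, (8, 8)).astype(float)   # hypothetical frame

base = downscale(source)              # the small picture the base codec would carry
enhancement = source - upscale(base)  # the detail an upsampler cannot guess

# The "decoder" upsamples the base layer and adds the enhancement back.
# In the real scheme the base is lossy-coded, so reconstruction is close
# rather than exact; here it round-trips perfectly for clarity.
reconstructed = upscale(base) + enhancement
assert np.allclose(reconstructed, source)
```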

In order to demonstrate LCEVC’s ability, Guido shows graphs comparing LCEVC with x264 at UHD, showing bitrate improvements of between 20 and 45%, and image examples of artefacts which are avoided using LCEVC. We then see that when applied to AVC, HEVC and VVC it speeds up encodes at least twofold. Guido finishes this presentation by showing how you can test out the encoder and decoder yourself.

In the last segment of this video, Tarek Amara from Twitch sits down to talk with Guido about the codec and the background behind it. Their talk covers V-Nova’s approach to open source, licensing, and LCEVC’s gradual improvements as it went through the proving process as part of MPEG standardisation, plus questions from the floor.

Watch now!
Speakers

Guido Meardi
CEO & Co-Founder,
V-Nova
Tarek Amara
Principal Video Specialist,
Twitch