HEVC continues to gain adoption thanks to its bitrate savings over AVC (H.264), though much hangs in the balance this year as AV1 continues to gain momentum and MPEG’s VVC is released, both of which promise greater compression. Compression, however, is a compromise between encoding complexity (computation), quality and speed. HEVC stands on the shoulders of AVC and this video explains the techniques it uses to be better.
Christian Timmerer, co-founder of Bitmovin, builds on his previous video about AVC as he details the tools and capabilities of HEVC (also known as H.265). He summarises the performance of HEVC as providing twice as much compression for the same video quality (or better quality for the same number of bits). Whilst its decoder requirements have gone up by 50%, it provides better parallelisation opportunities. Amongst the features behind these gains are variable block-size motion compensation, an improved interpolation method and more directions for spatial prediction. Most of the improvements are specifically an expansion of the abilities laid out in AVC, for instance making size or direction variable or providing more options.
After outlining some of the details behind the new capabilities, we look at the performance of some HEVC implementations compared with AVC implementations, showing bitrate savings of up to 65% and averaging around 50%. Christian finishes by looking at the newer codecs coming out soon such as VVC, LCEVC
Whilst the encoding landscape is shifting, AVC (AKA H.264) still dominates many areas of video distribution so, for many, understanding what’s under the hood opens up a whole realm of diagnostics and fault finding that wouldn’t otherwise be possible. Whilst many understand that MPEG video is built around I, B and P frames, this short talk offers deeper details which help explain how it behaves both when it’s working well and when it isn’t.
Christian Timmerer, co-founder of Bitmovin, starts his lesson on AVC with a summary of the improvements in AVC over the basic MPEG-2 model people tend to learn as a foundation: improvements such as variable block-size motion compensation, multiple reference frames and improved adaptive entropy coding. We see that, as we would expect, the input can use 4:2:0 or 4:2:2 chroma sub-sampling as well as full 4:4:4 representation, with 16×16 macroblocks for luminance (8×8 for chroma in 4:2:0). AVC can handle pictures split into several slices, which are self-contained sequences of macroblocks. Slices themselves can then be grouped.
Intra-prediction is the next topic, whereby an algorithm uses the information within the slice to predict a macroblock. This prediction is then subtracted from the actual block and the difference is coded, thereby reducing the amount of data that needs to be transferred. The decoder can make the same prediction and reconstruct the full block from the data provided.
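To make the predict, subtract and reconstruct loop concrete, here is a minimal sketch in Python. It assumes a single, simplified "DC" style prediction in which every sample of a 4×4 block is predicted as the mean of the already-decoded neighbours above and to the left. The function names and sample values are invented for illustration, and the transform and quantisation steps a real encoder applies to the residual are left out, so the round trip here is lossless.

```python
def dc_predict(top, left):
    """Predict one value for the whole block: the mean of the neighbouring samples."""
    neighbours = list(top) + list(left)
    return round(sum(neighbours) / len(neighbours))


def encode_block(block, top, left):
    """Encoder side: subtract the prediction, leaving only a small residual to code."""
    pred = dc_predict(top, left)
    return [[sample - pred for sample in row] for row in block]


def decode_block(residual, top, left):
    """Decoder side: make the same prediction and add the residual back."""
    pred = dc_predict(top, left)
    return [[r + pred for r in row] for row in residual]


# Illustrative values only: neighbours already decoded, and the block to code.
top = [100, 102, 101, 97]
left = [98, 100, 101, 101]
block = [
    [100, 101, 102, 100],
    [99, 100, 101, 101],
    [100, 100, 102, 103],
    [101, 102, 103, 104],
]

residual = encode_block(block, top, left)
reconstructed = decode_block(residual, top, left)
```

The residual values are much smaller than the original samples, which is what makes them cheaper to code, and the decoder never needs to be sent the prediction itself.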
The next sections talk about motion prediction and the different sizes of macroblocks. A macroblock is a fixed area on the picture which can be described by a mixture of some basic patterns but the more complex the texture in the block, the more patterns need to be combined to recreate it. By splitting up the 16×16 block, we can often find a simpler way to describe the 8×8 or 8×16 shapes than if they had to encompass a whole 16×16 block.
B-frames are fairly well understood by many, but even if they are unfamiliar to you, Christian explains the concept whereby B-frames provide solely motion information of macroblocks both from frames before and after. This allows macroblocks which barely change to be ‘moved around the screen’ so to speak with minimal changes other than location. Whilst P and I frames provide new macroblocks, B-frames are intended just to provide this directional information. Christian explains some of the nuances of B-frame encoding including weighted prediction.
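As a rough illustration of weighted prediction, the sketch below builds a prediction for a block from two references, one from an earlier frame and one from a later frame. It assumes the motion search has already located the two reference blocks, and it uses simple floating-point weights; the function name and sample values are made up for the example and are not taken from the talk or the AVC specification.

```python
def bi_predict(past_block, future_block, w_past=0.5, w_future=0.5):
    """Blend two reference blocks sample by sample using the given weights."""
    return [round(w_past * p + w_future * f)
            for p, f in zip(past_block, future_block)]


# One row of samples from the matching block in the past and future frames.
past_row = [100, 104, 108, 112]
future_row = [120, 124, 128, 132]

# Equal weights: a plain average of the two references.
average = bi_predict(past_row, future_row)

# Unequal weights, as might suit a fade where the current frame
# is closer in brightness to the earlier reference.
fade = bi_predict(past_row, future_row, w_past=0.75, w_future=0.25)
```

With equal weights the prediction sits halfway between the references; shifting the weights lets the prediction follow brightness changes such as fades.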
Quantisation is one of the most important parts of the MPEG process since it is the step at which information is removed and the codec becomes lossy. How it is done, and the optimisations possible, are therefore key, so Christian covers this before explaining the deblocking filter available. After splitting the picture up into so many macroblocks which are independently processed, edges between the blocks can become apparent, so this filter helps smooth any artefacts to make them more pleasing to the eye. Christian finishes talking about AVC by exploring entropy encoding and thinking about how AVC encoding can and can’t be improved by adding more memory and computation to the encoder.
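A small sketch shows why quantisation is the lossy step. It assumes a plain uniform quantiser with a single step size applied to a handful of made-up coefficient values; a real AVC encoder derives its step size from a quantisation parameter and works in integer arithmetic, so this is only the principle, not the codec's actual method.

```python
def quantise(coeffs, step):
    """Divide by the step size and round: small coefficients collapse to zero."""
    return [round(c / step) for c in coeffs]


def dequantise(levels, step):
    """Decoder side: multiply back up. The rounding error cannot be recovered."""
    return [level * step for level in levels]


# Illustrative transform coefficients: one large value, then small detail.
coeffs = [220, -37, 14, 6, -3, 1]

levels = quantise(coeffs, step=10)
rebuilt = dequantise(levels, step=10)
errors = [c - r for c, r in zip(coeffs, rebuilt)]

# A coarser step throws away more detail and leaves more zeros to code.
coarse_levels = quantise(coeffs, step=20)
```

The larger the step, the more coefficients become zero and the fewer bits are needed, at the cost of a larger difference between the original and rebuilt values.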
How can we overcome one of the last big problems in making CMAF generally available: making ABR work properly?
ABR, or Adaptive Bitrate, is a technique which allows a video player to choose which bitrate of video to download from a menu of several options. Typically, the highest bitrate will have the highest quality and/or resolution, with the smallest files being low resolution.
A player needs the flexibility to choose the bitrate of the video mainly because network conditions change. If someone else on your network starts watching some video, you may no longer be able to download video quickly enough to keep watching in full-quality HD and you may need to switch down. If they stop, then you want your player to switch up again to make the most of the bitrate available.
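The switching decision can be sketched in a few lines of Python. This is a deliberately simple rule, not the logic of any particular player: it assumes the player already has a bandwidth estimate, and it picks the highest rendition that fits within a safety margin. The bitrate ladder, the `safety` factor and the function name are all hypothetical.

```python
def choose_rendition(ladder_bps, estimated_bps, safety=0.8):
    """Pick the highest bitrate that fits within a fraction of the estimate.

    Falls back to the lowest rendition when nothing fits.
    """
    budget = estimated_bps * safety
    affordable = [bitrate for bitrate in ladder_bps if bitrate <= budget]
    return max(affordable) if affordable else min(ladder_bps)


# Hypothetical bitrate ladder, in bits per second.
ladder = [500_000, 1_500_000, 3_000_000, 6_000_000]

alone_on_network = choose_rendition(ladder, estimated_bps=8_000_000)
sharing_network = choose_rendition(ladder, estimated_bps=4_000_000)
very_congested = choose_rendition(ladder, estimated_bps=300_000)
```

When the estimate halves because someone else starts streaming, the player steps down a rendition; when the estimate recovers, the same rule steps it back up.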
Traditionally this is done fairly simply by measuring how long each chunk of the video takes to download. Simply put, if you download a file, it will come to you as quickly as it can. So measuring how long each video chunk takes to get to you gives you an idea of how much bandwidth is available; if it arrives very slowly, you know you are close to running out of bandwidth. But in low-latency streaming, you are receiving video as quickly as it is produced, so it’s very hard to see any difference in download times and this breaks the ABR estimation.
Ali C. Begen starts by explaining how players currently behave with low-latency ABR, showing how they miss out on changing to higher or lower renditions. Then he looks at the differences on the server and for the player between non-low-latency and low-latency streams. This lays the foundation to discuss ACTE (ABR for Chunked Transfer Encoding).
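The arithmetic behind this is worth seeing with numbers. The sketch below assumes a 4-second segment encoded at 2 Mbps (so 1,000,000 bytes) and a link capable of 8 Mbps; these figures and the function name are invented for illustration.

```python
def throughput_bps(size_bytes, download_seconds):
    """Classic estimate: bits received divided by the time taken to receive them."""
    return size_bytes * 8 / download_seconds


SEGMENT_BYTES = 1_000_000  # 4 seconds of video at 2 Mbps

# Regular streaming: the whole segment already exists on the server,
# so it arrives as fast as the 8 Mbps link allows, in about 1 second.
regular_estimate = throughput_bps(SEGMENT_BYTES, download_seconds=1.0)

# Low-latency streaming: the segment is sent while it is still being
# produced, so it takes about 4 seconds to arrive however fast the link is.
low_latency_estimate = throughput_bps(SEGMENT_BYTES, download_seconds=4.0)
```

In the regular case the estimate reflects the 8 Mbps link. In the low-latency case it comes out at 2 Mbps, which is just the encoding bitrate of the video, so the player learns nothing about the spare capacity and has no evidence that it could switch up.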
ACTE is a method of analysing bandwidth with the assumption that some chunks will be delivered as fast as the network allows and some won’t be. The trick is detecting which chunks actually show the network speed and Ali explains how this is done and shows the results of their evaluation.
Ali C. Begen
Technical Consultant and
Computer Science Professor