Video: Advanced Video Coding Standards AVC

Whilst the encoding landscape is shifting, AVC (AKA H.264) still dominates many areas of video distribution so, for many, understanding what’s under the hood opens up a whole realm of diagnostics and fault finding that wouldn’t otherwise be possible. Whilst many understand that MPEG video is built around I, B and P frames, this short talk offers deeper details which help explain how it behaves, both when it’s working well and otherwise.

Christian Timmerer, co-founder of Bitmovin, starts his lesson on AVC with a summary of the improvements in AVC over the basic MPEG-2 model people tend to learn as a foundation, such as variable block size motion compensation, multiple reference frames and improved adaptive entropy coding. We see that, as we would expect, the input can use 4:2:0 or 4:2:2 chroma sub-sampling as well as full 4:4:4 representation, with 16×16 macroblocks for luminance (8×8 for chroma in 4:2:0). AVC can handle pictures split into several slices, which are self-contained sequences of macroblocks. Slices themselves can then be grouped.
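
To put numbers on those block sizes, here is a minimal Python sketch (not taken from the talk) which counts the 16×16 macroblocks needed to cover a picture and gives the chroma block size for each sub-sampling scheme. The function names are made up for illustration.

```python
import math

def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows, total) of macroblocks needed to cover a picture."""
    cols = math.ceil(width / mb_size)
    rows = math.ceil(height / mb_size)
    return cols, rows, cols * rows

def chroma_block_size(subsampling, mb_size=16):
    """Chroma samples (width, height) per macroblock for a sub-sampling scheme."""
    divisors = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}
    dx, dy = divisors[subsampling]
    return mb_size // dx, mb_size // dy

print(macroblock_grid(1920, 1080))  # (120, 68, 8160)
print(chroma_block_size("4:2:0"))   # (8, 8)
```

Note that 1080 is not a multiple of 16, so the last row of macroblocks extends past the bottom of the picture.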

Intra-prediction is the next topic, whereby an algorithm uses the information within the slice to predict a macroblock. This prediction is then subtracted from the actual block and the difference is coded, thereby reducing the amount of data that needs to be transferred. The decoder can make the same prediction and reconstruct the full block from the data provided.
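
The predict, subtract and reconstruct loop can be sketched in a few lines of Python. This is a simplified illustration using only a DC-style prediction (the mean of the neighbouring samples) on a 4×4 block; the function names and sample values are invented, and a real encoder would also transform and quantise the residual, so its reconstruction would only be approximate.

```python
def dc_predict(top, left, size=4):
    """Predict every sample as the rounded mean of the neighbouring samples."""
    neighbours = list(top) + list(left)
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * size for _ in range(size)]

def residual(block, prediction):
    """What the encoder sends: actual block minus prediction."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]

def reconstruct(res, prediction):
    """What the decoder does: same prediction plus the received residual."""
    return [[r + p for r, p in zip(rrow, prow)]
            for rrow, prow in zip(res, prediction)]

top = [100, 102, 101, 99]    # already-decoded samples above the block
left = [98, 100, 103, 101]   # already-decoded samples to the left
block = [[100, 101, 102, 100],
         [99, 100, 101, 100],
         [101, 102, 100, 99],
         [100, 100, 101, 102]]

pred = dc_predict(top, left)
res = residual(block, pred)
print(pred[0][0])                       # 101
print(reconstruct(res, pred) == block)  # True
```

The residual values are all close to zero, which is what makes them cheaper to code than the original samples.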

The next sections talk about motion prediction and the different sizes of macroblocks. A macroblock is a fixed area on the picture which can be described by a mixture of some basic patterns but the more complex the texture in the block, the more patterns need to be combined to recreate it. By splitting up the 16×16 block, we can often find a simpler way to describe the 8×8 or 8×16 shapes than if they had to encompass a whole 16×16 block.
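
As a rough illustration of the motion search this relies on, the Python sketch below does an exhaustive block match using the sum of absolute differences (SAD). It is an assumption-laden toy rather than what any real encoder does: the names, the tiny search range and the synthetic frames are all invented. The block width and height are parameters, so the same search can be run for a 16×16 block or for smaller partitions.

```python
def sad(cur, ref, cx, cy, rx, ry, w, h):
    """Sum of absolute differences between a w x h block in cur and one in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(h) for i in range(w))

def motion_search(cur, ref, cx, cy, w, h, search=2):
    """Try every offset within +/-search samples; return (cost, dx, dy)."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates which fall outside the reference frame.
            if rx < 0 or ry < 0 or ry + h > len(ref) or rx + w > len(ref[0]):
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, w, h)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic 8x8 reference frame, and a current frame shifted right by one sample.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]

print(motion_search(cur, ref, cx=3, cy=2, w=4, h=4))  # (0, -1, 0)
```

The result says the block is found, with no error at all, one sample to the left in the reference frame.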


B-frames are fairly well understood by many, but even if they are unfamiliar to you, Christian explains the concept whereby B-frames provide solely motion information of macroblocks both from frames before and after. This allows macroblocks which barely change to be ‘moved around the screen’ so to speak with minimal changes other than location. Whilst P and I frames provide new macroblocks, B-frames are intended just to provide this directional information. Christian explains some of the nuances of B-frame encoding including weighted prediction.
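
The idea of predicting from both directions, with or without weighting, can be sketched as below. This is a deliberately simplified Python illustration and not the exact formula from the H.264 specification; the function name, weights and sample values are assumptions.

```python
def bi_predict(past, future, w_past=1, w_future=1):
    """Blend a block from a past frame with one from a future frame.

    Equal weights give a plain average; unequal weights favour one reference,
    which is useful in fades where brightness changes over time.
    """
    total = w_past + w_future
    return [[(w_past * p + w_future * f + total // 2) // total
             for p, f in zip(prow, frow)]
            for prow, frow in zip(past, future)]

past = [[100, 100], [100, 100]]   # block from the earlier reference
future = [[50, 50], [50, 50]]     # block from the later reference (fading out)

print(bi_predict(past, future))        # [[75, 75], [75, 75]]
print(bi_predict(past, future, 3, 1))  # [[88, 88], [88, 88]]
```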

Quantisation is one of the most important parts of the MPEG process since it is the step by which information is removed and the codec becomes lossy. The way this happens, and the optimisations possible, are therefore key, so Christian covers them before explaining the deblocking filter available. After splitting the picture up into so many macroblocks which are independently processed, edges between the blocks can become apparent so this filter helps smooth any artefacts to make them more pleasing to the eye. Christian finishes talking about AVC by exploring entropy encoding and thinking about how AVC encoding can and can’t be improved by adding more memory and computation to the encoder.
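
To see why quantisation is the lossy step, here is a minimal Python sketch using a single uniform step size. The coefficient values are invented, and real AVC quantisation is more involved than one division, so treat this as an illustration of the principle only.

```python
def quantise(coeffs, step):
    """Divide each coefficient by the step size, rounding half away from zero."""
    def q(c):
        magnitude = (abs(c) + step // 2) // step
        return magnitude if c >= 0 else -magnitude
    return [[q(c) for c in row] for row in coeffs]

def dequantise(levels, step):
    """The decoder can only multiply back; the rounding error is gone for good."""
    return [[level * step for level in row] for row in levels]

coeffs = [[620, -31, 12, 3],
          [-45, 18, -4, 1],
          [9, -6, 2, 0],
          [4, 1, 0, -1]]

levels = quantise(coeffs, 10)
print(levels[0])                            # [62, -3, 1, 0]
print(dequantise(levels, 10)[0])            # [620, -30, 10, 0]
print(sum(row.count(0) for row in levels))  # 9
```

More of the small coefficients become zero as the step size grows, which is where the bitrate saving comes from and also where the detail is lost.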

Watch now!

Christian Timmerer
CIO & Cofounder, Bitmovin
Associate Professor, Universität Klagenfurt

Video: Let’s be hAV1ng you

AV1 is now in use for some YouTube feeds, Netflix also can deliver AV1 to Android devices so we are no longer talking about “if AV1 happens” or “when AV1’s finished”. AV1 is here to stay, but in a landscape of 3 new MPEG codecs, VVC, EVC and LCEVC, the question moves to “when is AV1 the right move?”

In this talk from Derek Buitenhuis, we delve behind the scenes of AV1 to see which AV1 terms can be, more-or-less, mapped to which MPEG terms. AV1 is promoted as a royalty-free codec, although notably a patent pool has appeared to try and claim money from users. Because it avoids reusing ideas from other technologies, the names and specific details of parts of the spec are not identical to those in other codecs, though they are similar in function.

Derek starts by outlining some of the terms we need to understand before delving in further such as “Temporal Unit” which of course is called a TU and is analogous to a GOP. Then he moves on to highlighting the many ways in which previous DCT-style work has been extended meaning the sizes and types of DCT have been increased, and the prediction modes have changed. All of this is possible but increases computation.

Derek then highlights several major tools which have been added. One is the prediction of the Chroma from the Luma signal. Another is the ‘Constrained Directional Enhancement Filter’ which improves the look of diagonal hard edges. The third is ‘switch frames’ which are similar to IDR frames or, as Derek puts it, ‘a fancy P-frame.’ There is also a Multi-Symbol Arithmetic Coder which is a method of guessing a future binary digit which, based on probability, allows you to encode a sub-set of the number but just enough to ensure that the algorithm will come out with the full number.

After talking about the Loop Restoration Filter, Derek critiques a BBC article which, it seems, drew incorrect conclusions because the functions needed for good compression were not enabled, and which did not provide enough information for anyone else to replicate the experiment. Derek then finishes with MS-SSIM plots of different encoders.

Watch now!
Download the slides.

Derek Buitenhuis
Senior Video Encoding Engineer,

Video: Video Compression Basics

Video compression is used everywhere we look. It is so often impractical to use uncompressed video that everything in the consumer space is delivered compressed, so it pays to understand how this works, particularly if part of your job involves using video formats such as AVC, also known as H.264, or HEVC, AKA H.265.

Gisle Sælensminde from Vizrt takes us on this journey of creating compressed video. He starts by explaining why we need uncompressed video and then talks about containers such as MPEG-2 Transport Streams, mp4, MOV and others. He explains that the container’s job is partly to hold metadata such as the framerate, resolution and timestamps among a long list of other things.

Gisle takes some time to look at the past timeline of codecs in order to understand where we’re going from what went before. As many use the same principles, Gisle looks at the different types of frames inside most compressed formats: I, P and B frames, which are used in set patterns known as GOPs (Groups of Pictures). The GOP length defines how long there is between I frames. In the talk we learn that I frames are required for a decoder to be able to tune in part way through a feed and still start seeing some pictures. This is because it’s the I frame which holds a whole picture, unlike the other types of frame.
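
The tune-in behaviour can be illustrated with a small Python sketch. It assumes a made-up 12-frame GOP pattern and ignores the difference between decode order and display order, so it is only a rough model of what a real decoder has to wait for.

```python
def frames_until_decodable(frame_types, join_index):
    """Count the frames a decoder must skip before it meets the next I frame."""
    for offset, frame_type in enumerate(frame_types[join_index:]):
        if frame_type == "I":
            return offset
    return None  # no I frame arrived in the data we have

stream = "IBBPBBPBBPBB" * 3  # hypothetical 12-frame GOP, repeated three times

print(frames_until_decodable(stream, 0))  # 0
print(frames_until_decodable(stream, 5))  # 7
```

Joining at frame 5 means waiting for frame 12, the start of the next GOP, which is why a longer GOP leads to a slower channel change.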

Colours are important, so Gisle looks at the way that colours are represented. Many people know about defining colours by looking at the values of Red, Green and Blue, but fewer about YUV. This is all covered in the talk so we know about conversion between the two types.
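
As a sketch of the conversion between the two, the Python below uses the BT.709 luma coefficients with values normalised to the range 0 to 1. The choice of BT.709 is an assumption as the talk may use a different matrix, and what is loosely called YUV here is, strictly, Y'CbCr in digital video.

```python
KR, KB = 0.2126, 0.0722   # BT.709 luma coefficients for red and blue
KG = 1.0 - KR - KB

def rgb_to_ycbcr(r, g, b):
    """Convert normalised R'G'B' (0..1) to Y' (0..1) and Cb, Cr (-0.5..0.5)."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB))
    cr = (r - y) / (2 * (1 - KR))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Invert rgb_to_ycbcr."""
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b

y, cb, cr = rgb_to_ycbcr(1.0, 0.0, 0.0)  # pure red
print(round(y, 4), round(cb, 4), round(cr, 4))
```

Pure red gives a luma equal to KR and a Cr at its maximum of 0.5, which shows how the brightness and colour information are separated.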

Almost synonymous with codecs such as HEVC and AVC are macroblocks. This is the name given to the parts of the raster which have been split up into squares, each of which will be analysed independently. We’ll look at how these macroblocks are used, but Gisle also spends some time looking to the future as HEVC, VP9 and now AV1 all use variable-size macroblock analysis.

A process which happens throughout broadcast is chroma subsampling. This topic, whereby we keep more of the luminance channel than colours, is explored ahead of looking at DCTs (Discrete Cosine Transforms), which are foundational to most video codecs. We see that by analysing these macroblocks with DCTs, we can express the image in a different way and even cut down on some of the detail we get from DCTs in order to reduce the bitrate.
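
Both ideas can be sketched briefly in Python. The first function averages each 2×2 group of chroma samples, which is one simple way of getting to 4:2:0, and the others apply a textbook orthonormal DCT-II to a small block. The names and sample values are illustrative, and real codecs typically use fast integer approximations rather than this floating-point form.

```python
import math

def subsample_420(plane):
    """Average each 2x2 group of chroma samples into a single sample."""
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]

def dct_1d(values):
    """Orthonormal DCT-II of a list of samples."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values)))
    return out

def dct_2d(block):
    """Apply the 1D transform to every row, then to every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

print(subsample_420([[10, 12, 20, 20], [14, 12, 20, 20]]))  # [[12, 20]]

flat = [[10] * 4 for _ in range(4)]   # a block with no detail at all
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))         # 40.0
```

A completely flat block ends up with a single non-zero coefficient, which is the "different way" of expressing the image that makes it easier to throw detail away selectively.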

There are then some very useful demos looking at the result of varying quantisation across a picture and the difference signal between the source and encoded picture, plus deblocking technology which hides some of the artefacts that can arise from DCT-based codecs when they are pushed for bandwidth.

Gisle finishes this talk at Media City Bergen by taking a number of questions from the floor.

Watch now!

Gisle Sælensminde
Senior Software Engineer,

Video: A Forensic Approach to Video

Unplayable media is everyone’s nightmare, made all the worse if it could be key evidence in a criminal case. This is the daily fight that Gareth Harbord from the Metropolitan Police has as he tries to render old CCTV footage and files from crashed dash cams playable, make files from damaged SD cards and hard drives readable and recover video from old tape formats which have been obsolete for years.

In terms of data recovery, there are two main elements: getting the data off the device and then fixing the data to make it playable. Getting the data off a device tends to be difficult because either the device is damaged and/or connecting to the device requires some proprietary hardware/software which simply isn’t available any more. Pioneers in a field often have to come up with their own way of interfacing which, when the market becomes bigger, is often then improved by a standard way of doing things. Take, as an example, mobile phone cables. They used to be all sorts of shapes and sizes but are now much more uniform with 3 main types. The same was initially true with hard drives, however the first hard drives were so long ago that obsolescence is much more of an issue.

Once you have the data on your own system, it’s then time to start analysing it to see why it won’t play. It may not play because the data itself is of an old or proprietary format, which Gareth says is very common with CCTV manufacturers. While there are some popular formats, there are many variations from different companies including putting all, say, 4 cameras onto one image or into one file, running the data for the four cameras in parallel. After a while, you start to be able to get a feel for the formats but not without many hours of previous trial and error.

Gareth starts his talk explaining that he works in the download and data recovery function which is different from the people who make the evidence ready for presentation in a trial. Their job is to find the best way to show the relevant parts, both in terms of presentation and technically, making sure it is easy to play for the technically uninitiated in court and that it is robust and reliable. Presentation covers the effort behind combining multiple sources of video evidence into one timeline and ensuring the correct chronology. Other teams also deal with enhancing the video and Gareth shows examples of deblurring an image and also using frame averaging to enhance the intelligibility of the picture.

Gareth spends some time discussing CCTV where he calls the result of the lack of standardisation “a myriad of madness.” He says it’s not uncommon to have 15-year-old systems which are brought in but, since the hard drives have been spinning for one and a half decades, don’t start again when they are repowered. On the other hand, the newer IP cameras are more complicated whereby each camera is generating its own time-stamped video going into a networked video recorder which also has a timestamp. What happens when all of the timestamps disagree?

Mobile devices cause problems due to variable frame rates which are used to deal with dim scenes, non-conformance with standards and, who can forget, the fun of CMOS videos where the CMOS sensors lead to wobbling of the image when the phone is panned left or right. Gareth highlights a few of the tools he and his colleagues use such as the ever-informative MediaInfo and ffprobe before discussing the formats that they transcode to in order to share the videos internally.
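
As an illustration of the kind of check ffprobe makes possible, the Python sketch below asks ffprobe for a JSON report and compares each video stream’s r_frame_rate with its avg_frame_rate. Treating a mismatch as a sign of variable frame rate is only a heuristic and an assumption of this sketch, not a method described in the talk, and the function names are invented.

```python
import json
import subprocess
from fractions import Fraction

def probe(path):
    """Run ffprobe (which must be on the PATH) and return its JSON report."""
    cmd = ["ffprobe", "-v", "error", "-print_format", "json",
           "-show_format", "-show_streams", path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

def looks_variable_frame_rate(report):
    """Heuristic: flag video streams whose two reported frame rates differ."""
    for stream in report.get("streams", []):
        if stream.get("codec_type") != "video":
            continue
        r_rate = stream.get("r_frame_rate", "0/0")
        avg_rate = stream.get("avg_frame_rate", "0/0")
        if r_rate.endswith("/0") or avg_rate.endswith("/0"):
            continue  # rate unknown, nothing to compare
        if Fraction(r_rate) != Fraction(avg_rate):
            return True
    return False

# A hand-written report standing in for real ffprobe output.
sample = {"streams": [{"codec_type": "video",
                       "r_frame_rate": "30/1",
                       "avg_frame_rate": "2997/125"}]}
print(looks_variable_frame_rate(sample))  # True
```

With a real file, the report would come from probe("clip.mp4"), where the file name is a placeholder.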

Gareth walks us through an example file looking at how the data can be lined up to start understanding the structure and start to decode it. This can lead to the need to write some simple code in C#, or similar, to rework the data. When it’s not possible to get hold of the data in a particular format to be playable in VLC, or similar, a proprietary player may be the only way forward. When this is the case, often a capture of the computer screen is the only way to excerpt the clip. Gareth looks at the pros and cons of this method.
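
The sort of simple code involved might look like the sketch below, written here in Python rather than the C# mentioned in the talk. It searches a blob of bytes for a repeating marker and reports the spacing between hits, which is one way of lining data up to reveal its structure. The marker used is the three-byte start code found in H.264 byte streams; the sample data and function names are invented, and a proprietary CCTV format may well use something else entirely.

```python
def find_marker(data, marker):
    """Return every offset at which marker occurs in data."""
    offsets = []
    pos = data.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(marker, pos + 1)
    return offsets

def gaps(offsets):
    """Distances between consecutive markers, a hint at the record sizes."""
    return [b - a for a, b in zip(offsets, offsets[1:])]

START_CODE = bytes([0x00, 0x00, 0x01])

# Made-up data: three units, each a start code, a header byte and a payload.
data = (START_CODE + bytes([0x67]) + b"abc" +
        START_CODE + bytes([0x68]) + b"de" +
        START_CODE + bytes([0x65]) + b"fghij")

offsets = find_marker(data, START_CODE)
print(offsets)        # [0, 7, 13]
print(gaps(offsets))  # [7, 6]

# In H.264 the low five bits of the byte after the start code give the unit type.
print([data[o + 3] & 0x1F for o in offsets])  # [7, 8, 5]
```

Types 7, 8 and 5 are a sequence parameter set, a picture parameter set and an IDR slice, which is the order a playable H.264 stream would be expected to start with.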

Watch now!

Gareth Harbord
Senior Digital Forensic Specialist (Video)
Metropolitan Police Service