Dolby Archives – The Broadcast Knowledge

Video: The New Video Codec Landscape – VVC, EVC, HEVC, LC-EVC, AV1 and more

Posted on 25th November 2020 by Russell Trafford-Jones

The codec arena is a lot more complex than before. Gone is the world of 5 years ago with AVC doing nearly everything. Whilst AVC is still a major force, we now have AV1 and VP9 being used globally with billions of uses a year. HEVC is not the dominant force it was once expected to be, but it is now seeing significant use on iPhones and overall adoption continues to grow. And now, in 2020, we see three new codecs on the scene: VVC, EVC and LCEVC.

To help us make sense of this SMPTE has invited Walt Husak and Sean McCarthy to take us through what the current codecs are, what makes them different, how well they work, how to compare them and what the future roadmaps hold.

Sean starts by explaining which codecs are maintained by which bodies, with the IEC, ITU and MPEG being involved, not to mention the corporate codecs (VP8 and VP9 from Google) and the Chinese AVS series of codecs. Sean explains that these share major common elements and that each is an evolution of earlier codecs. But why are all these codecs needed? Next, we see the use-cases that have brought these codecs into existence. Granted, AVC and HEVC entered the scene to reduce bitrate in an effort to make HD and UHD practical, respectively, but EVC and LCEVC have different aims.

Sean gives a brief overview of the basics of encoding, starting with partitioning the image, predicting parts of it, applying transformations, refining it (also known as applying ‘loop filters’) and finishing with entropy coding. All of these blocks are briefly explained and exist in all the codecs covered in this talk. The evolutions which make the newer codecs better are therefore evolutions of each of these elements. For instance, explains Sean, splitting the image into different sections, known as partitioning, has become more sophisticated in recent codecs, allowing for larger sections to be considered at once but, at the same time, smaller partitions to be created within each.

All codecs have profiles whereby the tools in use, or the complexity of their implementation, is standardised for certain types of video: 8-bit, 10-bit, HDR etc. This allows hardware implementers to understand the upper bounds of computation so they don’t end up over-provisioning hardware resources and increasing the cost. Sean looks at how VVC uses the same tools throughout all of its four profiles with only a few exceptions. Screen content sees two extra tools come for 4:2:2 formats and above. AV1 has the same tools throughout all the profiles but, deliberately, EVC doesn’t. Essential Video Coding has a royalty-free base layer that uses techniques that are not subject to any use payments. Using this layer gives you AVC-quality encoding, approximately. Using the main profile, however, gets you similar to HEVC encoding albeit with royalty payments.

The next part of the talk examines two main reasons for the increase in compression over recent codec generations, block size and partitioning, before highlighting some new tools in VVC and AV1. Block size refers to the size of the blocks that an image is split up into for processing. By using a larger block, the algorithms can spot patterns more efficiently, so the continued increase from 16×16 in AVC to 128×128 now in VVC drives an increase in computation but also in compression. Once you have your block, splitting it up following the features of the image is the next stage. This is called partitioning, and the number of ways that the codecs can mathematically split a block has grown significantly. VVC can also partition chroma separately to luma. VVC and AV1 also include 64 and 16 ways, respectively, to diagonally partition rather than the typical vertical and horizontal partitioning modes.
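The idea of keeping flat areas as large blocks while splitting detailed areas can be sketched with a toy quadtree. This is only an illustration and not how any of the codecs above decide their splits: the variance threshold, the function names and the test image are all assumptions made for the example, and real encoders choose splits by weighing bitrate against distortion and support more split shapes than a plain four-way split.

```python
from statistics import pvariance


def quadtree_partition(image, x, y, size, min_size=4, threshold=10.0):
    """Return a list of (x, y, size) leaf blocks covering the square at (x, y).

    Hypothetical split rule: a block is split into four quadrants while its
    pixel variance is above `threshold` and it is larger than `min_size`.
    """
    pixels = [image[row][col]
              for row in range(y, y + size)
              for col in range(x, x + size)]
    # Flat areas stay as one large block; detailed areas are split further.
    if size <= min_size or pvariance(pixels) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_partition(image, x + dx, y + dy, half,
                                         min_size, threshold)
    return leaves


def make_test_image(size=16):
    """Flat grey image with one bright 4x4 square in the top-left corner."""
    image = [[50] * size for _ in range(size)]
    for row in range(4):
        for col in range(4):
            image[row][col] = 200
    return image


blocks = quadtree_partition(make_test_image(), 0, 0, 16)
for block in blocks:
    print(block)
```

Only the quadrant containing the bright square is broken down to the smallest size; the three flat quadrants each remain a single block.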

Screen content coding tools are increasingly important. Pandemics aside, there has long been growth in the amount of computer-generated content being shared online, whether that’s through esports, video conference screen sharing or elsewhere. Truth be told, HEVC has support for screen-content encoding but it’s not in the main profile so many implementations don’t support it. VVC not only evolves the screen-content tools, but also makes them present by default. AV1, also, was designed to work well with screen content. Sean takes some time to look at the IBC tool, intra-block copy, which allows the encoder to relate parts of the current frame to other sections of the same frame. IBC works at the prediction stage. With screen content that contains, for instance, lots of text, parts of that text will look similar so, to a first approximation, one part of the image can be duplicated in another. This is similar to motion compensation where a macroblock is ‘copied’ to another frame in a different position, but all the work is done on the present frame for Intra BC. Palette mode is another screen content tool that allows the colour of a section of the image to be described as a palette of colours rather than using the full RGB value for each and every pixel.
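Palette mode is easy to picture with a small sketch. The code below is a simplified illustration rather than the syntax of any particular codec: the function names and the 4×4 block are invented for the example, and a real encoder would also limit the palette size and entropy-code the index map.

```python
def palette_encode(block):
    """Split a block of RGB tuples into (palette, index_map).

    Each distinct colour is stored once in the palette; every pixel is then
    described by a small index instead of a full RGB value.
    """
    palette = []
    lookup = {}
    index_map = []
    for row in block:
        index_row = []
        for pixel in row:
            if pixel not in lookup:
                lookup[pixel] = len(palette)
                palette.append(pixel)
            index_row.append(lookup[pixel])
        index_map.append(index_row)
    return palette, index_map


def palette_decode(palette, index_map):
    """Rebuild the block of RGB tuples from the palette and the index map."""
    return [[palette[index] for index in row] for row in index_map]


WHITE, BLACK = (255, 255, 255), (0, 0, 0)

# A hypothetical 4x4 block of black text on a white background.
text_block = [
    [WHITE, BLACK, BLACK, WHITE],
    [WHITE, BLACK, WHITE, WHITE],
    [WHITE, BLACK, BLACK, WHITE],
    [WHITE, WHITE, WHITE, WHITE],
]

palette, index_map = palette_encode(text_block)
print(palette)
print(index_map)
```

With only two colours in the block, each pixel needs a single bit of index information rather than three colour components, which is why the tool suits text and user-interface content.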

Sean covers the scaled prediction between resolutions in VVC and super-resolution in AV1, VVC’s 360-degree video optimisations and luma mapping before handing over to Walt Husak who goes into more detail on how the newer codecs work, starting with LCEVC.

LCEVC is a codec that improves the performance of already-deployed codecs, typically used to enhance spatial resolution. If you wanted to encode HD, the codec would downsample the HD to an SD resolution and encode that with AVC, HEVC or another codec. At the same time, it would upsample that encoded video again and generate two correction layers that correct for artefacts and add sharpness. This information is added to the base codec’s output and sent to the decoder. This can allow a software-only enhancement to a hardware deployment, fully utilising the hardware which has already been deployed. Walt notes that the enhancement layers are much the same technology as has already been standardised by SMPTE as VC-6 (ST 2117). LCEVC has been found to be computationally efficient, allowing it to address markets such as embedded devices where hardware restrictions would otherwise prohibit the use of resolutions higher than those for which the hardware was originally designed. Very low bitrate performance is also very good.
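The layered structure described above can be sketched on a one-dimensional row of pixel values. This is a toy model under stated assumptions, not the LCEVC algorithm: the base codec is replaced by a crude quantiser, the up- and downsampling are the simplest possible, and the two residual layers are left unquantised so the round trip is exact, whereas a real encoder would transform, quantise and entropy-code them.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) // 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each sample."""
    return [value for value in signal for _ in range(2)]


def lossy_base_codec(signal, step=16):
    """Stand-in for AVC/HEVC: coarse quantisation that loses detail."""
    return [(value // step) * step for value in signal]


def encode(source):
    low_res = downsample(source)
    base = lossy_base_codec(low_res)
    # Layer 1 corrects the artefacts the base codec introduced.
    layer1 = [a - b for a, b in zip(low_res, base)]
    corrected = [b + r for b, r in zip(base, layer1)]
    # Layer 2 restores the detail lost by working at low resolution.
    layer2 = [a - b for a, b in zip(source, upsample(corrected))]
    return base, layer1, layer2


def decode(base, layer1, layer2):
    corrected = [b + r for b, r in zip(base, layer1)]
    return [p + r for p, r in zip(upsample(corrected), layer2)]


source = [10, 12, 40, 44, 200, 180, 90, 94]  # hypothetical full-resolution row
base, layer1, layer2 = encode(source)
print(base, layer1, layer2)
print(decode(base, layer1, layer2))
```

The base layer is half the size of the source and could be decoded by existing hardware on its own; the two residual lists are the extra data a software enhancement decoder would apply on top.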

Sean introduces us to his “Dos and Don’ts” of codec comparisons. The theme running through them is to take care that you are comparing like for like. Codecs can be set to run ‘fast’ or ‘slow’, each of which holds its own compromises in terms of encoding time and resulting quality. Similarly, there are some implementations that are made simply to implement the standard as rigorously as possible, which is an invaluable tool when developing the codec or an implementation. Such a reference implementation for codec X, clearly, shouldn’t be compared to production implementations of a codec Y as the times are guaranteed to be very different and you will not learn anything from the process. Similarly, single- and double-pass encoding modes, the latter giving the codec much more time to optimise, shouldn’t be cross-compared.

The talk draws to a close with a look at codec performance. Sean shows a number of graphs showing how VVC performs against HEVC. Interestingly the metrics clearly show a 40% increase in efficiency of VVC over HEVC, but when seen in subjective tests, the ratings show a 50% improvement. VVC’s encoder is approximately 10x as complex as HEVC’s.

HEVC and AV1 perform similarly for the same bit rate. Overall, Sean says, AV1 is a little blurrier in regions of spatial detail and can have some temporal flickering. HEVC is more likely to have blocking and ringing artefacts. EVC’s main profile is up to 29% better than HEVC. LCEVC performs up to 8% better than AVC when using an AVC base layer and also slightly better than HEVC when using an HEVC base layer. Sean makes the point that AVC has been continually updated since its initial release and is now on version 27, so it’s not strictly true to simply say it’s an ‘old’ codec. HEVC similarly is on version 7. Sean runs down part of the roadmap for AVC which leads on to the use of AI in codecs.

Finishing the video, Walt looks at the use of Deep Learning in codecs. Deep learning is also known as machine learning and referred to as AI (Artificial Intelligence). For most people, these terms are interchangeable and refer to the ability of a signal to be manipulated not by a fixed equation or algorithm (such as Lanczos scaling) but by a computer that has been trained through many millions of examples to recognise what looks ‘right’ and to replicate that effect in new scenarios.

Walt talks about JPEG’s AI research on still images, which aims to complete an ‘end-to-end’ study of compression with AI tools. There’s also MPEG’s Deep Neural Network-based Video Coding which is looking at which tools within codecs can be replaced with AI. Also, recently we have seen the foundation of the MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) organisation by Leonardo Chiariglione, an industry body devoted to the use of AI in compression. With all this activity, it’s clear that future advances in compression will be driven by the increasing use of these techniques.

The video ends with a Q&A session.

Watch now!
Find out more on SMPTE’s site
Speakers

	Sean McCarthy Director, Video Strategy and Standards, Dolby Laboratories
	Walt Husak Director, Image Technologies, Dolby Laboratories

Video: Next-generation audio in the European market – The state of play

Posted on 12th November 2020 by Russell Trafford-Jones

Next-generation audio refers to a range of new technologies which allow for immersive audio like 3D sound, for increased accessibility, for better personalisation and anything which delivers a step-change in the listener experience. NGA technologies can stand on their own but are often part of next-generation broadcast technologies like ATSC 3.0 or UHD/8K transmissions.

This talk from the Sports Video Group and Dolby presents one case study from a few that have happened in 2020 which delivered NGA over the air to homes. First, though, Dolby’s Jason Power brings us up to date on how NGA has been deployed to date and looks at what it is.

Whilst ‘3D sound’ is an easy to understand feature, ‘increased personalisation’ is less so. Jason introduces ideas for personalisation such as choosing which team you’re interested in and getting a different crowd mix dependent on that, or changing where your mics are, on the pitch or in the stands. The possibilities are vast and we’re only just starting to experiment with what’s possible and determine what people actually want.

What can I do if I want to hear next-generation audio? Jason explains that four out of five TVs are now shipping with NGA and all of the five top manufacturers have support for at least one NGA technology. Such technologies are Dolby’s AC-4 and S-ADM. AC-4 allows delivery of Dolby Atmos, an object-based audio format which allows the receiver much more freedom to render the sound correctly based on the current speaker setup. Should you change how many speakers you have, the decoder can render the sound differently to ensure the ‘stereo’ image remains correct.

To find out more about the technologies behind NGA, take a look at this talk from the Telos Alliance.

Next, Matthieu Parmentier talks about the Roland Garros event in 2020 which was delivered using S-ADM plus Dolby AC-4. S-ADM is an open specification for metadata interchange, the aim of which is to help interoperability between vendors. The S-ADM metadata is embedded in the SDI and then transported uncompressed as SMPTE 302M.

ATEME’s Mickaël Raulet completes the picture by explaining their approach which included setting up a full end-to-end system for testing and diagnosis. The event itself, we see, had three transmission paths: an SDR satellite backup and two feeds into the DVB-T2 transmitter at the Eiffel Tower.

The session ends with an extensive Q&A where they discuss the challenges they faced and how they overcame them as well as how their businesses are changing.

Watch now!
Speakers

	Jason Power Senior Director of Commercial Partnerships & Standards, Dolby
	Mickaël Raulet Vice President of Innovation, ATEME
	Matthieu Parmentier Head of Data & Artificial Intelligence, France Television
	Moderator: Roger Charlesworth Charlesworth Media

Video: 5 Myths About Dolby Vision & HDR debunked

Posted on 10th September 2020 by Russell Trafford-Jones

There seems to be no let-up in the number of technologies coming to market and whilst some, like HDR, have been slowly advancing on us for many years, the technologies that enable them such as Dolby Vision, HDR10+ and the metadata handling technologies further upstream are more recent. So it’s no surprise that there is some confusion over what’s possible and what’s not.

In this video, Bitmovin and Dolby reveal the truth behind 5 myths surrounding the implementation and financial impact of Dolby Vision and HDR in general. Bitmovin sets the scene with Sean McCarthy giving an overview of their research into the market. He explains why quality remains important: simply put, to either keep up with competitors or be a differentiator. Sean then gives an overview of the ‘better pixels’ principle, underlining that improving the pixels themselves, with technologies such as wide colour gamut (WCG) and HDR, is often more effective than higher resolution.

David Brooks then explains why HDR looks better, explaining the biology and psychology behind the effect as well as the technology itself. The trick with HDR is that there are no extra brightness values for the pixels; rather, the brightness of each pixel is mapped onto a larger range. It’s this mapping which is the strength of the technology: altering the mapping gives different results, ultimately allowing you to run SDR and HDR workflows in parallel. David explains how HDR can be mapped down to low-brightness displays.
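To make the point about mapping concrete, here is a small sketch that takes the same 10-bit code values and maps them to light output in two ways. It rests on assumptions not taken from the talk: the HDR curve used is the PQ transfer function from SMPTE ST 2084, the SDR curve is a plain 2.4 power law peaking at 100 cd/m², and code values are treated as full range for simplicity. The function names are invented for the example.

```python
# PQ constants as defined in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_to_nits(signal):
    """Map a normalised PQ signal (0.0 to 1.0) to luminance in cd/m2."""
    root = signal ** (1 / M2)
    return 10000 * (max(root - C1, 0.0) / (C2 - C3 * root)) ** (1 / M1)


def sdr_to_nits(signal, peak=100.0, gamma=2.4):
    """Map a normalised SDR signal to luminance with a simple power law.

    The 100 cd/m2 peak and 2.4 gamma are assumptions for illustration.
    """
    return peak * signal ** gamma


def code_to_signal(code, bits=10):
    """Normalise an integer code value to 0.0 .. 1.0 (full range assumed)."""
    return code / (2 ** bits - 1)


# The same code values, interpreted through the two different mappings.
for code in (0, 256, 512, 768, 1023):
    signal = code_to_signal(code)
    print(code, round(sdr_to_nits(signal), 1), round(pq_to_nits(signal), 1))
```

The set of code values is identical in both columns; only the curve that turns them into light differs, which is the sense in which the extra range comes from the mapping rather than from extra pixel values.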

The last half of this video is dedicated to the myths. Each myth has several slides of explanation. For instance, one suggests that the workflows are very complex. Hangen Last walks through a number of scenarios showing how dual (or even three-way) workflows can be achieved. The other myths, and the questions at the end, talk about resolution, licensing cost, metadata, managing dual SDR/HDR assets and live workflows with Dolby Vision.

Watch now!
Speakers

	David Brooks Senior Director, Professional Solutions, Dolby Laboratories
	Hangen Last Technology Manager, Content Distribution, Dolby Laboratories
	Sean McCarthy Senior Technical Product Marketing Manager, Bitmovin
	Moderator: Kieran Farr VP Marketing, Bitmovin

Video: Audio Metadata over IP

Posted on 6th August 2020 by Russell Trafford-Jones

Next-Generation Audio is gradually becoming this generation’s audio as new technologies seep into the mainstream. Dolby Atmos is one example of a technology which is being added to more and more services and which goes way beyond stereo and even 5.1 surround sound. But these technologies don’t just rely on audio; they need data, too, to allow the decoders to understand the sound so they can apply the needed processing. It’s essential that this data, called metadata, keeps in step with the audio and, indeed, that it gets there in the first place.

Dolby have long used metadata along with surround sound to maintain the context in which the recording was mastered. There’s no way for the receiver to know what maximum audio level the recording was mixed to without being told, for instance. With NGA, the metadata needed can be much more complex. With Dolby Atmos, for example, the audio objects need position information along with the mastering information needed for surround sound.

Kent Terry from Dolby Laboratories joins us to discuss the methods, both current and future, that we can use to convey metadata from point to point in the broadcast chain. He starts by looking at the tried and trusted methods of carrying data within the audio of SDI. This is the way that Dolby E and Dolby D are carried, as data within what appears to be an AES3 stream. There are two SMPTE standards for doing this in a sample-accurate fashion: ST 2109 and ST 2116.

SMPTE ST 2109 allows for metadata to be carried over an AES3 channel using SMPTE ST 337, 337 being the standard which defines how to put compressed audio over AES3, which would normally expect PCM audio data. This allows for any metadata at all to be carried. SMPTE ST 2116, similarly, defines metadata transport over AES3 but specifically for ITU-R BS.2125 and BS.2076 which define how to carry the Audio Definition Model.

The motivation for these standards is to enable live workflows which don’t have a great way of delivering live metadata. There are a few types of metadata which are worth considering. Static metadata, such as the number of channels or the sample rate, doesn’t change during the programme. Dynamic metadata, such as spatial location and dialogue levels, does. Importantly, there is also a distinction between optional metadata and required metadata, the latter being essential for the functioning of the technology.
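One way to picture these categories is as two independent flags on each piece of metadata. The sketch below is purely illustrative: the class, the field names, the example values and, in particular, which items are marked as required are assumptions made for the example and are not taken from any of the standards mentioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataItem:
    name: str
    value: object
    dynamic: bool   # True if the value can change during the programme
    required: bool  # True if the decoder cannot function without it


def split_by_timing(items):
    """Separate items sent once (static) from those that track the audio."""
    static = [item for item in items if not item.dynamic]
    dynamic = [item for item in items if item.dynamic]
    return static, dynamic


def missing_required(received_names, items):
    """List required items that did not arrive at the receiver."""
    return [item.name for item in items
            if item.required and item.name not in received_names]


# Hypothetical programme metadata; values and flags are illustrative only.
programme = [
    MetadataItem("channel_count", 6, dynamic=False, required=True),
    MetadataItem("sample_rate_hz", 48000, dynamic=False, required=True),
    MetadataItem("object_position", (0.2, -0.5, 0.0), dynamic=True, required=True),
    MetadataItem("dialogue_level_db", -24, dynamic=True, required=False),
]

static, dynamic = split_by_timing(programme)
lost = missing_required({"channel_count", "object_position"}, programme)
print([item.name for item in static])
print([item.name for item in dynamic])
print(lost)
```

Dynamic items are the ones that need sample-accurate carriage alongside the audio, while a missing required item is the case a live workflow most needs to detect.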

Kent says that live productions are held back in their choice of NGA technologies by the limitations of metadata carriage and this is one reason that work is being done in the IP space to create similar standards for all-IP programme production.

For IP there are two approaches. The first is to define a way to send metadata separately from the AES67 audio found within SMPTE ST 2110-30; this is being done in the new AES standard AES-X242. The other way being developed uses SMPTE ST 2110-41, which allows any metadata (not solely ST 291) to be carried in a time-synchronised way with the other 2110 essences. Both of these methods, Kent explains, are actively being developed and are open to input from users.

Watch now!
Speaker

	Kent Terry Snr. Manager, Sound Technology, Office of the CTO, Dolby Laboratories
