Video: Encoding Vs Compute Efficiency in Video Coding

Ioannis Katsavounidis from Facebook joins us to talk through his work on finding the best balance between computation and encoding efficiency. He explains how encoding has moved from real-time, hardware-based encoding in the late 80s and 1990s through to file encoding, chunk-based encoding and now shot-based encoding. Each of these stages has brought opportunities to speed up encoding, but there has always been a fundamental reason why encoding can’t simply be sped up by general advances in computing.

Moore’s law posits that the number of transistors in a chip doubles roughly every two years. Whilst this has largely held true, transistor count has only ever been a proxy for processing power. For many years now, the way to keep increasing the computational ability of CPUs has been not to raise clock speeds, as it was twenty years ago, but to add cores to the chip. As each core acts as its own CPU, this gives the ability to execute code in parallel, with a thread of code running separately on each core. Whilst 12-20 cores are typical for servers, there are CPUs which deliver up to 128 cores.

Ioannis explains why DCT-based codecs are resistant to multi-threaded encoding by showing how some encoding decisions depend on the previously decoded video frame, so the encoder needs to decode the video before it has the information it needs to make the next encoding decisions. An example of this is motion estimation, where you need to know what a macroblock looks like in the reference frame in order to determine if and how it can be moved to form part of the macroblock currently being encoded.
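
To make that dependency concrete, here’s a minimal, hypothetical sketch of full-search block matching, the simplest form of motion estimation. Everything here is illustrative rather than taken from any real encoder; the point is that `ref` must be the previously decoded frame, so this search can’t begin until that frame has been encoded and reconstructed.

```python
# A minimal sketch (not any production encoder) of full-search block
# matching: find the motion vector that best predicts one macroblock.
import numpy as np

def best_motion_vector(ref, cur, bx, by, block=16, search=8):
    """Search ref (the previously *decoded* frame) for the offset that best
    predicts the block x block macroblock of cur at (bx, by), minimising the
    sum of absolute differences (SAD) over a +/- search pixel window."""
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate lies outside the reference frame
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```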

It turns out that some of the information you need can be calculated from the original video. Whilst this doesn’t provide full parallelisation, it does free some of the computation to be done in parallel, thus reducing the length of time spent in the serial encoding stage. As the design of the codec itself limits how far it can be parallelised, the best way to speed up encoding has been to split up the original video and encode these, now separate, sections independently.
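
This is Amdahl’s law at work: the serial portion of the encode caps the overall speed-up no matter how many cores are available. A quick illustration, with the 60% parallel fraction invented purely for the example:

```python
# Amdahl's law: the serial fraction of a job caps its parallel speed-up.
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# With only 60% of the encode parallelisable, even 128 cores manage ~2.5x:
for n in (2, 8, 32, 128):
    print(f"{n:>3} cores -> {amdahl_speedup(0.6, n):.2f}x speed-up")
```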

Speeding up video encoding has therefore focused on splitting the video into different sections and encoding those in parallel, rather than trying to parallelise the encoding itself. Encoding each frame separately is one way to do this, but it sacrifices encoding efficiency. Splitting each frame into sections (tiles or slices) is another way, though this also sacrifices either quality or bitrate. The most successful encoding parallelisation has been chunked encoding. As streaming applications already use chunks, typically around 2 seconds nowadays, there’s no reason not to cut your video into small sections and encode those separately; note that the whole of this video focuses on non-live video.
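
As a rough sketch of what chunk-parallel encoding looks like in practice, the following assumes the source has already been split into chunk files and that the ffmpeg CLI is available; the file names and encoder settings are illustrative:

```python
# A minimal sketch of chunk-parallel encoding. File names and settings are
# illustrative, and the chunks are assumed to have been split already;
# requires the ffmpeg CLI on the PATH.
import subprocess
from concurrent.futures import ProcessPoolExecutor

def encode_chunk(chunk_path: str) -> str:
    out_path = chunk_path.replace(".mp4", "_enc.mp4")
    subprocess.run(
        ["ffmpeg", "-y", "-i", chunk_path,
         "-c:v", "libx264", "-preset", "slow", "-crf", "22", out_path],
        check=True, capture_output=True)
    return out_path

if __name__ == "__main__":
    chunks = [f"chunk_{i:04d}.mp4" for i in range(120)]  # ~2s chunks
    with ProcessPoolExecutor() as pool:  # one encode per CPU core
        encoded = list(pool.map(encode_chunk, chunks))
    # The encoded chunks can then be stitched back together, for example
    # with ffmpeg's concat demuxer.
```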

If there’s a shot change in the middle of a chunk, the result is likely to look very bad: the motion estimation will fail to produce good results and there may not be enough bitrate budget to compensate. It’s therefore best to drop in an IDR frame at the shot change, or to align your chunk boundaries with the shot changes themselves. Simply encoding these chunks in parallel would speed up the encoding, but it misses an opportunity to optimise quality vs bitrate.
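
One hedged way to do this with standard tools is to detect scene cuts using ffmpeg’s scene-score filter and then force keyframes at those timestamps. The threshold and file names below are illustrative:

```python
# A sketch of aligning IDR frames with shot changes: detect cuts with
# ffmpeg's scene-score filter, then force a keyframe at each cut. The
# threshold and file names are illustrative; requires ffmpeg on the PATH.
import re
import subprocess

def detect_shot_changes(src: str, threshold: float = 0.3) -> list:
    """Return timestamps (seconds) where the scene score exceeds the
    threshold, parsed from the showinfo filter's log on stderr."""
    result = subprocess.run(
        ["ffmpeg", "-i", src,
         "-vf", f"select='gt(scene,{threshold})',showinfo",
         "-f", "null", "-"],
        capture_output=True, text=True)
    return [float(t) for t in re.findall(r"pts_time:([\d.]+)", result.stderr)]

cuts = detect_shot_changes("input.mp4")
if cuts:
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.mp4", "-c:v", "libx264",
         "-force_key_frames", ",".join(f"{t:.3f}" for t in cuts),
         "output.mp4"],
        check=True)
```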

Ioannis explains an experiment to determine the best operating point for chunks. He starts by reminding us that all encoders have ‘speed’ settings which control how much computation, and therefore time, is spent on each encode. The ‘veryfast’ preset in x264 encodes very quickly, but the quality will be worse for a given bitrate than with the ‘veryslow’ preset. Ioannis’s experiment encoded each chunk at every speed setting for a variety of resolutions and bitrates. Each encode was then analysed for quality using PSNR, MS-SSIM and VMAF.
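
A simplified version of that sweep might look like the sketch below, which times an x264 encode at each preset; the quality side of each data point can then be scored with ffmpeg’s libvmaf filter (assuming a build that includes it). The chunk name and CRF value are illustrative:

```python
# A sketch of the preset-sweep experiment: encode the same chunk at every
# x264 speed preset, timing each encode; quality can then be scored with
# ffmpeg's libvmaf filter. Assumes an ffmpeg build with libx264 and libvmaf.
import subprocess
import time

PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
           "medium", "slow", "slower", "veryslow"]

for preset in PRESETS:
    out = f"chunk_{preset}.mp4"
    start = time.perf_counter()
    subprocess.run(
        ["ffmpeg", "-y", "-i", "chunk.mp4",
         "-c:v", "libx264", "-preset", preset, "-crf", "23", out],
        check=True, capture_output=True)
    print(f"{preset:>10}: {time.perf_counter() - start:.1f}s")
    # Scoring, e.g. with VMAF (the score is printed in ffmpeg's log):
    #   ffmpeg -i chunk_<preset>.mp4 -i chunk.mp4 -lavfi libvmaf -f null -
```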

From Ioannis’ work, we can see how the speed setting affects both the encode time and the quality, and we can observe that the slower settings tend to offer minimal quality advantages for the significant extra encoding time involved. Each curve has a steep part and a shallow part, and the efficient points around the transition form what’s known as the ‘convex hull’. Choosing a setting on the convex hull portion of the line gives the optimal balance between quality and encoding time and is where, says Ioannis, most people should aim to operate.
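
Finding those operating points programmatically amounts to keeping only the encodes that no other encode beats on both time and quality. A sketch with invented numbers:

```python
# A sketch of selecting the efficient operating points from (encode time,
# quality) pairs: keep only the points no other point beats on both axes.
# The numbers are invented purely to illustrate the shape of the curve.
points = [  # (preset, encode_seconds, vmaf)
    ("ultrafast", 10, 88.1),
    ("veryfast", 25, 91.0),
    ("faster", 40, 90.5),   # dominated: slower AND worse than veryfast
    ("medium", 60, 93.2),
    ("slow", 120, 93.9),
    ("veryslow", 480, 94.1),
]

def efficient_points(points):
    return [(name, t, q) for name, t, q in points
            if not any(t2 <= t and q2 >= q and (t2, q2) != (t, q)
                       for _, t2, q2 in points)]

for name, t, q in efficient_points(points):
    print(f"{name:>10}: {t:>4}s, VMAF {q}")
# With numbers like these, 'veryslow' buys 0.2 VMAF for 4x the encode time
# of 'slow' - the shallow end of the curve the talk warns against.
```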

The talk finishes with a summary of the conclusions which can be drawn from this work: the use of the convex hull we’ve just discussed, the best type of parallel processing, whether oversubscription of CPU cores is helpful or not, and an interesting observation that it’s often the quality metrics which put a significant burden on the computation, rather than the video encoding itself, particularly at lower resolutions.

Watch now!
Speakers

Ioannis Katsavounidis
Research Scientist,
Facebook

Video: Outlook on the future codec landscape

VVC, MPEG’s successor to HEVC, has now been released. But what is it? And whilst it brings 50% bitrate savings over HEVC, how does it compare to other codecs like AV1 and the other new MPEG standards? This primer answers these questions and more.

Christian Feldmann from Bitmovin starts by looking at four of the current codecs: AVC, HEVC, VP9 and AV1. VP9 isn’t often heard about in traditional broadcast circles, but it’s relatively well used online as it’s supported on Android phones and brings bitrate savings over AVC. Google uses VP9 on YouTube for compatible players and sees a higher retention rate. Netflix and Twitch also use it. AV1 is also in use by the tech giants, though its use outside of those who built it (Netflix, Facebook etc.) is not yet apparent. Christian looks at the compatibility of the codecs, hardware decoding, efficiency and cost.

Looking now at the other upcoming MPEG codecs, Christian examines MPEG-5 Essential Video Coding (EVC), which has two profiles: Baseline and Main. The Baseline profile only uses technologies old enough to be outside of patent claims, which allows you to use the codec without the concern that you may be asked for a fee by a patent holder who comes out of the woodwork. The Main profile, however, does contain patented technology and performs better. Businesses which wish to use this profile can pay the licences but, if an unexpected patent holder appears, each individual tool in the codec can be disabled, allowing you to continue using the codec, albeit without that technology. Whilst it is a shame that patents are so difficult to account for, this shows MPEG has taken seriously the situation with HEVC, which famously has hundreds of licensable patents with over a third of eligible companies not part of a patent pool. EVC performs 32% better than AVC using the Baseline profile and 25% better than HEVC with the Main profile.

Next under the magnifying glass is Low Complexity Enhancement Video Coding (LCEVC). We’ve already heard about this on The Broadcast Knowledge from Guido, CEO of V-Nova, who gave a deeper look at Demuxed 2019 and more recently at Streaming Media West. Whilst those are detailed talks, this is a great overview of the technology, which is actually a hybrid approach to encoding. It allows you to take any existing codec, such as AVC or AV1, and put LCEVC on top of it. Using both together allows you to run your base encoder at a lower resolution (say HD instead of UHD) and then deliver to the decoder this low-resolution encode plus a small stream of enhancement information which the decoder uses to bring the video back up to size and add back in the missing detail. The big win, as the name indicates, is that this method is very flexible and can take advantage of all sorts of available computing power, both in embedded technology and in servers. In set-top boxes, parts of the SoC which aren’t otherwise used can be put to work. In phones, both the onboard HEVC decoding chip and the CPU can be used. It’s also useful for automated workflows, as the base codec stream is smaller and hence easier to decode, and the enhancement information concentrates on the edges of objects so can be used on its own by AI/machine learning algorithms to more readily analyse video footage. Encoding time drops by over a third for both AVC and EVC bases.

Now, Christian looks at the codec-du-jour, Versatile Video Coding (VVC), explaining that its enhancements over HEVC come not just from bitrate improvements but also from techniques which better encode screen content (e.g. computer games), allow for better 360-degree video and reduce delay. Subjective results show up to 50% improvement. For more detail on VVC, check out this talk from Microsoft’s Gary Sullivan.

The talk finishes with answers to audience questions: which codec will be the winner, what future device and hardware support will look like, and which is best for real-time streaming.

Watch now!
Speakers

Christian Feldmann
Team lead, Encoding,
Bitmovin

Video: AV1 – A Reality Check

Released in 2018, AV1 was a little over two years in the making at the Alliance for Open Media, founded by industry giants including Google, Amazon, Mozilla and Netflix. Since then, work has continued to optimise the toolset to bring both encoding and decoding down to real-world levels.

This talk brings together AOM members Mozilla, Netflix, Vimeo and Bitmovin to discuss where AV1 is up to and to answer questions from the audience. After some introductions, the conversation turns to 8K. The Olympics are the broadcast industry’s main driver for 8K at the moment, though it’s clear that Japan and other territories aim to follow through with further deployments and uses.

“AV1 is the 8K codec of choice” 

Paul MacDougall, Bitmovin
CES 2020 saw a number of announcements like this from Samsung regarding AV1-enabled 8K TVs. In this talk, Matt Frost from Google Chrome Media explains how YouTube has found that viewer retention is higher with VP9-delivered videos, which he attributes to VP9’s improved compression over AVC leading to quicker start times, less buffering and, often, a higher resolution being delivered to the user. AV1 is seen as providing these same benefits over AVC without the patent problems that come with HEVC.

It’s not all about resolution, however, points out Paul MacDougall from Bitmovin. Resolution can be useful, for instance with animated content, where it accentuates the lines which add intelligibility to the picture. For other content with many similar textures, grass for instance, quality through bitrate may be more useful than added resolution. Vittorio Giovara from Vimeo agrees, pointing out that viewer experience is a combination of many factors. Though it’s trivial to say that a high-resolution screen showing nothing but black makes for a bad experience, it’s a good reminder of what matters. Less obviously, Vittorio highlights the three pillars of spatial, temporal and spectral quality: temporal is the frame rate, spatial is, indeed, the resolution, and spectral refers to bit depth and colour, known as HDR and Wide Colour Gamut (WCG).

Nathan Egge from Mozilla acknowledges that at their 2018 code release at NAB, the unoptimised encoder, claimed by some to be 3000 times slower than HEVC, was ‘embarrassing’, but this is the price of developing in the open. The panel discusses the fact that the way to develop compression is to try out approaches until you find a combination that works well; while you are doing that, it would be a false economy to be constantly optimising. Moreover, Netflix’s Anush Moorthy points out, optimising the algorithms calls for a different set of skills and, therefore, a different set of people.

Questions fielded by the panel cover whether there are any attempts to put AV1 encoding or decoding onto GPUs, power consumption, whether TVs will have hardware or software AV1 decoding, current in-production uses of AV1, and AVC vs VVC (compression benefit vs royalty payments).

Watch now!
Speakers

Vittorio Giovara
Manager, Engineering – Video Technology,
Vimeo
Nathan Egge
Video Codec Engineer,
Mozilla
Paul MacDougall
Principal Sales Engineer,
Bitmovin
Anush Moorthy
Manager, Video and Image Encoding,
Netflix
Tim Siglin
Founding Executive Director,
Help Me Stream, USA

Video: Futuristic Codecs and a Healthy Obsession with Video Startup Time

The next 12 months are going to see three new MPEG standards released. What does this mean for the industry? How useful will they be and when can we start using them? MPEG is coming to market with a range of commercial models to show it’s learning from the mistakes of the past, so it will be interesting to see adoption levels in the year after their release. This video is from the second session of the Vienna Video Tech Meetup and also delves into startup time for streaming services.

In the first talk, Dr. Christian Feldmann explains the current codec landscape, highlighting the ubiquitous AVC (H.264), UHD’s friend HEVC (H.265), and the newer VP9 and AV1. The latter two differentiate themselves by being free to use and open, particularly AV1. Whilst encoding with them is slow, both are seeing increasing adoption in streaming, but no one’s suggesting that AVC isn’t still the go-to codec for most online streaming.

Christian then introduces the three new codecs, EVC (Essential Video Coding), LCEVC (Low Complexity Enhancement Video Coding) and VVC (Versatile Video Coding), all of which have different aims. We start by looking at EVC, whose aim is to replicate the encoding efficiency of HEVC but, importantly, to provide a royalty-free Baseline profile as well as a Main profile which improves efficiency further but carries royalties. This is the first time an MPEG codec has been usable in this way to eliminate liability for royalty payments. There is further protection in that if any of the tools is found to have patent problems, it can be individually turned off, the idea being that companies can have more confidence in deploying the new technology.

The next codec in the spotlight is LCEVC, which uses an enhancement technique to encode video. The aim of this codec is to enable lower-end hardware to access higher resolutions and/or lower bitrates. This can be useful in set-top boxes and for online streaming, but also for non-broadcast applications like small embedded recorders. It can achieve a slight improvement in compression over HEVC at a much lower computational cost; HEVC is well known to be very computationally heavy.

LCEVC reduces computational needs by encoding only a lower-resolution version (say, SD) of the video in a codec of your choice, whether that be AVC, HEVC or another. The decoder decodes this and upscales the video back to the original resolution, HD in this example. Normally this would look soft, but LCEVC also sends enhancement data to add back in the edges and detail that would otherwise have been lost. The enhancement can be decoded on the CPU whilst the base decoding is done by dedicated AVC/HEVC hardware, and naturally encoding and decoding a quarter-resolution image is much easier than the full resolution.
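
As a toy illustration of the hybrid idea, and emphatically not the actual LCEVC bitstream, which codes the residuals far more cleverly, the sketch below builds a half-resolution base layer, upscales it decoder-side, and carries the lost detail as a separate enhancement layer:

```python
# A toy illustration of the hybrid idea only - NOT the LCEVC spec, which
# codes the residuals far more cleverly. Base = downscaled encode;
# enhancement = the detail the upscaled base is missing.
import numpy as np

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (1080, 1920)).astype(np.int16)  # stand-in frame

# "Base layer": naive 2x downscale by averaging 2x2 blocks
base = frame.reshape(540, 2, 960, 2).mean(axis=(1, 3)).astype(np.int16)

# Decoder side: naive 2x upscale by pixel repetition
upscaled = np.repeat(np.repeat(base, 2, axis=0), 2, axis=1)

# Enhancement layer: the detail the base lost (mostly edges and texture)
enhancement = frame - upscaled

# Reconstruction is exact here; a real codec quantises the residuals
assert np.array_equal(upscaled + enhancement, frame)
```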

Lastly, VVC goes under the spotlight. This is the direct successor to HEVC and is also known as H.266. VVC naturally has the aim of improving compression over HEVC by the traditional 50% target, but it also has important optimisations for more types of content, such as 360-degree video and screen content like video games.

To finish the meetup, Christoph Prager lays out the reasons he thinks everyone involved in online streaming should obsess about video startup time, which he defines as the time between pressing play and seeing the first frame of video. The assumption is that the longer the wait, the more users won’t bother watching. To understand what video streaming should be like, he examines the example of Spotify, which has always had the goal of bringing audio start time down to 200ms. Christoph points to this podcast for more details on what Spotify has done to optimise this metric, which includes activating GUI elements before, strictly speaking, they can do anything because the audio still hasn’t loaded. This creates an impression of immediacy, and perception is half the battle.

“for every additional second of startup delay, an additional 5.8% of your viewership leaves”

Christoph draws on Akamai’s 2012 white paper which, among other things, investigated how startup time puts viewers off, and also cites research from Snap which found that within 2 seconds, the entire audience for a video would have gone. Snap, of course, specialises in very short videos but, taken with the right caveats, this could indicate that Akamai’s numbers would be higher if the research were repeated in 2020. Christoph finishes up by looking at the individual components which add latency to the user experience: player startup time, DRM load time, ad load time and ad tag load time.
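
Taken at face value, the quoted 5.8%-per-second figure adds up quickly; a purely illustrative linear extrapolation:

```python
# Purely illustrative: extrapolating the quoted 5.8%-per-second figure
# linearly to see how startup delay compounds into lost viewers.
RATE = 0.058  # extra share of viewers lost per additional second of delay

for delay in (1, 2, 5, 10):
    print(f"{delay:>2}s of startup delay -> ~{delay * RATE:.0%} of viewers lost")
```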

Watch now!
Speakers

Dr. Christian Feldmann
Team Lead, Encoding,
Bitmovin
Christoph Prager
Product Manager, Analytics,
Bitmovin
Markus Hafellner
Product Manager, Encoding,
Bitmovin