Video: Futuristic Codecs and a Healthy Obsession with Video Startup Time

The next 12 months will see three new MPEG standards released. What does this mean for the industry? How useful will they be, and when can we start using them? MPEG is coming to market with a range of commercial models to show it has learnt from the mistakes of the past, so it will be interesting to see adoption levels in the year after their release. This video is part of the second session of the Vienna Video Tech Meetup and also delves into startup time for streaming services.

In the first talk, Dr. Christian Feldmann explains the current codec landscape, highlighting the ubiquitous AVC (H.264), UHD’s friend HEVC (H.265), and the newer VP9 and AV1. The latter two differentiate themselves by being free to use and open, particularly AV1. Whilst slow, both are seeing increasing adoption in streaming, but no one’s suggesting that AVC isn’t still the go-to codec for most online streaming.

Christian then introduces the three new codecs, EVC (Essential Video Coding), LCEVC (Low-Complexity Enhancement Video Coding) and VVC (Versatile Video Coding), all of which have different aims. We start by looking at EVC, whose aim is to replicate the encoding efficiency of HEVC but, importantly, to offer a royalty-free baseline profile as well as a main profile which improves efficiency further but carries royalties. This is the first time it has been possible to use an MPEG codec in this way and eliminate your liability for royalty payments. There is further protection in that if any of the tools is found to have patent problems, it can be individually turned off, the idea being that companies can have more confidence in deploying the new technology.

The next codec in the spotlight is LCEVC, which uses an enhancement technique to encode video. The aim of this codec is to enable lower-end hardware to access high resolutions and/or lower bitrates. This can be useful in set-top boxes and for online streaming, but also for non-broadcast applications like small embedded recorders. It can achieve a slight improvement in compression over HEVC, which is itself well known for being very computationally heavy.

LCEVC reduces computational needs by encoding only a lower-resolution version (say, SD) of the video in a codec of your choice, whether that be AVC, HEVC or another. The decoder then decodes this and upscales the video back to the original resolution, HD in this example. Normally this would look soft, but LCEVC also sends enhancement data to add back the edges and detail that would otherwise have been lost. The enhancement can be handled on the CPU whilst the base decoding is done by dedicated AVC/HEVC hardware, and naturally encoding and decoding a quarter-resolution image is much easier than the full resolution.
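To make the idea concrete, here is a minimal sketch of that enhancement principle in TypeScript. The nearest-neighbour scaling and the identity "base codec" below are stand-ins of our own, not LCEVC's actual tools or bitstream syntax; the point is only the split into a low-resolution base plus a residual enhancement layer.

```typescript
// Conceptual sketch of LCEVC-style enhancement coding, not the real bitstream or filters.
// The base codec is an identity stand-in for AVC/HEVC hardware encoding, and scaling is
// nearest-neighbour just to keep the example self-contained.

type Frame = Float32Array; // one luma plane, row-major, values 0..255

const downscale = (f: Frame, w: number, h: number): Frame => {
  const out = new Float32Array((w / 2) * (h / 2));
  for (let y = 0; y < h / 2; y++)
    for (let x = 0; x < w / 2; x++) out[y * (w / 2) + x] = f[2 * y * w + 2 * x];
  return out;
};

const upscale = (f: Frame, w: number, h: number): Frame => {
  const out = new Float32Array(w * h); // w, h are the target (full) dimensions
  for (let y = 0; y < h; y++)
    for (let x = 0; x < w; x++) out[y * w + x] = f[(y >> 1) * (w / 2) + (x >> 1)];
  return out;
};

// Stand-ins for the real base codec (AVC, HEVC, ...) working at quarter resolution.
const encodeBase = (f: Frame): Frame => f;
const decodeBase = (b: Frame): Frame => b;

function lcevcEncode(frame: Frame, w: number, h: number) {
  const base = encodeBase(downscale(frame, w, h));                // cheap, hardware-friendly
  const reconstructed = upscale(decodeBase(base), w, h);
  const enhancement = frame.map((v, i) => v - reconstructed[i]);  // detail lost by scaling
  return { base, enhancement };
}

function lcevcDecode(base: Frame, enhancement: Frame, w: number, h: number): Frame {
  const upscaled = upscale(decodeBase(base), w, h);               // hardware decode + upscale
  return upscaled.map((v, i) => v + enhancement[i]);              // CPU adds the detail back
}
```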

Lastly, VVC goes under the spotlight. This is the direct successor to HEVC and is also known as H.266. VVC naturally has the aim of improving compression over HEVC by the traditional 50% target but also has important optimisations for more types of content such as 360 degree video and screen content such as video games.

To finish the session, Christoph Prager lays out the reasons he thinks everyone involved in online streaming should obsess about video startup time, which he defines as the time between pressing play and seeing the first frame of video. The assumption is that the longer that delay, the more users won’t bother watching. To understand what video streaming should aspire to, he examines the example of Spotify, which has always had the goal of bringing audio start time down to 200ms. Christoph points to this podcast for more details on what Spotify has done to optimise this metric, which includes activating GUI elements before, strictly speaking, they can do anything because the audio still hasn’t loaded. This, however, creates an impression of immediacy, and perception is half the battle.

“for every additional second of startup delay, an additional 5.8% of your viewership leaves”

Christoph also draws on Akamai’s 2012 white paper which, among other things, investigated how startup time puts viewers off. He also cites research from Snap which found that within 2 seconds the entire audience for a given video would have gone. Snap, of course, specialises in very short videos, but taken with the right caveats this could indicate that, were the research repeated today, Akamai’s numbers might be even higher for 2020. Christoph finishes up by looking at the individual components which add latency to the user experience: player startup time, DRM load time, ad load time and ad tag load time.
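As a rough illustration of what the quoted figure implies, here is a tiny calculation assuming, for simplicity, that the 5.8% loss compounds per second of delay (the white paper may not state it exactly this way):

```typescript
// Rough illustration of the quoted Akamai figure: ~5.8% of viewers abandon per
// additional second of startup delay, treated here as compounding per second.
const retainedShare = (startupSeconds: number) => Math.pow(1 - 0.058, startupSeconds);

console.log(retainedShare(2).toFixed(3)); // ≈ 0.887 -> roughly 11% of viewers lost
console.log(retainedShare(5).toFixed(3)); // ≈ 0.742 -> roughly a quarter gone
```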

Watch now!
Speakers

Dr. Christian Feldmann
Team Lead Encoding,
Bitmovin
Christoph Prager
Product Manager, Analytics
Bitmovin
Markus Hafellner
Product Manager, Encoding
Bitmovin

Video: OTT Fundamentals & hands-on video player lab

Whilst there are plenty of videos explaining the basics of streaming, few talk you through the basics of actually implementing a video player on your website. The principles taught in this hands-on Bitmovin webinar are transferable to many players but, importantly, by the end of this talk you’ll have your own implementation of a video player, built in real time using their remix project at glitch.com, which lets you edit code and run it immediately in the browser to see your changes.

Ahead of the tutorial, the talk explains the basics of compression and OTT, led by Kieran Farr, Bitmovin’s VP of Marketing, and Andrea Fassina, Developer Evangelist. Andrea outlines a simplified OTT architecture, starting with the ‘ingest’ stage which, in this example, is getting the videos from Instagram either via the API or manually. He then looks at the encoding step, which compresses the input further and creates a range of different bitrates. Andrea explains that MPEG standards such as H.264 and H.265 are commonly used to do this, making the point that MPEG standards typically require royalty payments. This year we are expecting MPEG to release VVC (H.266).

Andrea then explains the relationship between resolution, frame rate and file size. Smaller files are better as they take less time to download, which means faster startup. Andrea discusses how video resolutions match display resolutions, with TVs typically being 1920×1080 or 3840×2160. Given that higher resolutions carry more picture detail, there is more information to send, leading to larger file sizes.
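A quick back-of-envelope calculation makes the point: before any compression, the raw data rate grows directly with resolution and frame rate. The 8-bit 4:2:0 assumption below is ours, chosen as the most common delivery format.

```typescript
// Back-of-envelope raw (uncompressed) data rates, showing why higher resolutions and
// frame rates mean more data before any codec is involved. 8-bit 4:2:0 = 1.5 bytes/pixel.
const rawMbps = (width: number, height: number, fps: number) =>
  (width * height * 1.5 * 8 * fps) / 1_000_000;

console.log(rawMbps(1920, 1080, 25).toFixed(0)); // ~622 Mbit/s for HD at 25fps
console.log(rawMbps(3840, 2160, 50).toFixed(0)); // ~4977 Mbit/s for UHD at 50fps
```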

Source: Bitmovin https://bit.ly/2VwStwC

When you come to set up your transcoder and player, there are a number of options you need to set. These are determined by the fundamentals, so before launching into the code Andrea looks further into the underlying concepts. He next looks at video compression to explain the ways in which compression is achieved and the compromises involved. Andrea starts with the early MJPEG codecs, where each frame was its own JPEG image and the video was simply played as one JPEG after another, not unlike animated GIFs on the internet. However, treating each frame on its own ignores a lot of compression opportunity: from one frame to the next, large parts of the image are the same or very similar. This allowed MPEG to step up its efforts and look across a number of frames to spot the similarities. This is typically referred to as temporal compression as it uses time as part of the process.
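A toy sketch of that idea: code only the difference from the previous frame rather than each frame whole. Real codecs use motion-compensated prediction rather than a plain subtraction, so this is illustrative only.

```typescript
// Toy illustration of temporal redundancy: instead of coding each frame whole
// (MJPEG-style), code only the difference from the previous frame.
type Plane = Uint8ClampedArray; // one plane of pixel values, 0..255

function frameDelta(previous: Plane, current: Plane): Int16Array {
  const delta = new Int16Array(current.length);
  for (let i = 0; i < current.length; i++) delta[i] = current[i] - previous[i];
  return delta; // mostly zeros for static content, so it compresses far better
}

function reconstruct(previous: Plane, delta: Int16Array): Plane {
  const out = new Uint8ClampedArray(previous.length);
  for (let i = 0; i < previous.length; i++) out[i] = previous[i] + delta[i];
  return out;
}
```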

In order to achieve this, MPEG splits each frame into blocks, squares in AVC, called macroblocks, which can be compared between frames. There are then three types of frame, called ‘I’, ‘P’ and ‘B’ frames. I frames carry a complete description of the frame, similar to a JPEG photograph. P frames don’t carry a complete description; rather, they contain some blocks with new information and some information saying ‘this block is the same as this block in this other frame’. B frames carry no completely new image parts but build the frame purely out of frames from the recent future and recent past; the B stands for ‘bi-directional’.
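For illustration, here is what a short, hypothetical group of pictures might look like in display order versus the order the frames are actually transmitted and decoded, since a B frame needs its future reference to arrive first:

```typescript
// Hypothetical GOP: B frames reference past AND future frames, so the encoder
// transmits frames out of display order, sending the referenced I/P frame first.
const displayOrder = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"];
const decodeOrder  = ["I0", "P3", "B1", "B2", "P6", "B4", "B5"];
```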

Ahead of launching into the code, we then look at the different video codecs available. Andrea talks about AVC (discussed in detail here) and HEVC (detailed in this talk) and compares the two. One difference is that HEVC uses much more flexible macroblock sizes. Whilst this increases computational complexity, it reduces the need to send redundant information and so is an important part of achieving the 50% bitrate reduction HEVC typically shows over AVC. VP9 and AV1 complete the line-up as Andrea gives an overview of which platforms support these different codecs.
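If you want to check what your own browser can play, one common approach is to probe Media Source Extensions support. The codec strings below are typical examples of ours, not ones taken from the talk:

```typescript
// Probe which of the codecs mentioned the current browser can decode via MSE.
// The codec strings are illustrative profile/level examples.
const candidates: Record<string, string> = {
  "AVC (H.264)": 'video/mp4; codecs="avc1.640028"',
  "HEVC (H.265)": 'video/mp4; codecs="hvc1.1.6.L120.90"',
  "VP9": 'video/webm; codecs="vp9"',
  "AV1": 'video/mp4; codecs="av01.0.05M.08"',
};

for (const [name, mime] of Object.entries(candidates)) {
  console.log(name, MediaSource.isTypeSupported(mime) ? "supported" : "not supported");
}
```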

Source: Bitmovin https://bit.ly/2VwStwC

Andrea then introduces the topic of adaptive bitrate (ABR) streaming. This is vital in delivering video effectively to homes and mobile phones, where bandwidth varies over time. It requires creating several renditions of your content at various bitrates, resolutions and even frame rates. Whilst these multiple encodes put a computational burden on the transcode stage, it’s not acceptable to let a viewer’s player go black, so it’s important to keep the low-bitrate version. A lot of work can go into optimising the number and range of bitrates you choose.
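The player-side half of ABR can be sketched in a few lines: measure throughput, then choose the highest rendition that fits with some headroom. The ladder and the 0.8 headroom factor below are made-up examples; real players also consider buffer level, screen size and switching stability.

```typescript
// Minimal sketch of throughput-based rendition selection; the ladder is invented.
interface Rendition { height: number; bitrateKbps: number; }

const ladder: Rendition[] = [
  { height: 240, bitrateKbps: 400 },
  { height: 480, bitrateKbps: 1200 },
  { height: 720, bitrateKbps: 2800 },
  { height: 1080, bitrateKbps: 5000 },
];

function pickRendition(measuredKbps: number, headroom = 0.8): Rendition {
  const affordable = ladder.filter(r => r.bitrateKbps <= measuredKbps * headroom);
  return affordable.length ? affordable[affordable.length - 1] : ladder[0]; // never go black
}

console.log(pickRendition(3500)); // -> the 720p / 2800 kbps rendition
```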

Lastly we look at container formats such as MP4, which is used in both HLS and MPEG-DASH and is based on the ISO BMFF file format. MP4 prepared for streaming is usually called fragmented MP4 (fMP4) as it is split into chunks. Similarly, MPEG-2 Transport Streams (TS files) can be used as a wrapper around video and audio codecs. Andrea explains how a TS file is built up and how the video, audio and other data, such as captions, are multiplexed together.
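As a small illustration of how regular the TS container is, each packet is 188 bytes, starts with the sync byte 0x47 and carries a 13-bit PID identifying the stream it belongs to. A minimal header parse might look like this (a sketch, not a full demuxer):

```typescript
// Parse the 4-byte header of a single 188-byte MPEG-2 Transport Stream packet.
function parseTsHeader(packet: Uint8Array) {
  if (packet.length !== 188 || packet[0] !== 0x47) throw new Error("not a TS packet");
  return {
    payloadUnitStart: (packet[1] & 0x40) !== 0,  // start of a PES packet or section
    pid: ((packet[1] & 0x1f) << 8) | packet[2],  // which elementary stream this carries
    adaptationField: (packet[3] & 0x20) !== 0,   // timing/stuffing data present
    continuityCounter: packet[3] & 0x0f,         // detects lost packets per PID
  };
}
```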

The last half of the video is the hands-on section, during which Andrea talks us through implementing a video player in real time on the Glitch project, allowing you to follow along, make the same edits and see the results in your browser as you go. He explains how to create a list of source files, get the player working and style it correctly.
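For a flavour of what that setup looks like, here is a rough sketch assuming the Bitmovin web player v8 API loaded from its script include; the licence key, stream URLs and exact config fields are placeholders and may differ from the lab’s Glitch remix.

```typescript
// Rough shape of a web player setup, assuming the Bitmovin player v8 script is included.
declare const bitmovin: any; // provided globally by the bitmovinplayer.js script tag

const player = new bitmovin.player.Player(document.getElementById("player"), {
  key: "YOUR-PLAYER-LICENCE-KEY", // placeholder
});

player
  .load({
    dash: "https://example.com/stream/manifest.mpd", // DASH and HLS variants of the
    hls: "https://example.com/stream/playlist.m3u8", // same ABR-encoded content
    poster: "https://example.com/stream/poster.jpg",
  })
  .then(() => console.log("source loaded, ready to play"));
```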

Watch now!
Download the presentation
Speakers

Kieran Farr
VP of Marketing,
Bitmovin
Andrea Fassina
Developer Evangelist,
Bitmovin

Video: Advanced Video Coding Standards AVC

Whilst the encoding landscape is shifting, AVC (AKA H.264) still dominates many areas of video distribution, so for many, understanding what’s under the hood opens up a whole realm of diagnostics and fault-finding that wouldn’t otherwise be possible. Whilst many understand that MPEG video is built around I, B and P frames, this short talk offers deeper detail which helps explain how it behaves, both when it’s working well and when it isn’t.

Christian Timmerer, co-founder of Bitmovin, starts his lesson on AVC with a summary of its improvements over the basic MPEG-2 model people tend to learn as a foundation: variable block-size motion compensation, multiple reference frames and improved adaptive entropy coding, among others. We see that, as we would expect, the input can use 4:2:0 or 4:2:2 chroma sub-sampling as well as the full 4:4:4 representation, with 16×16 macroblocks for luminance (8×8 for chroma in 4:2:0). AVC can split pictures into several slices, which are self-contained sequences of macroblocks, and the slices themselves can then be grouped.
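As a small aside, that 16×16 luma macroblock size means a 1080p picture is padded up to 1088 lines and covered by 8,160 macroblocks, which is easy to verify:

```typescript
// Count 16×16 luma macroblocks covering a picture; dimensions are rounded up
// to whole macroblocks (1080 lines -> 68 rows of blocks, i.e. 1088 padded lines).
const macroblocks = (w: number, h: number) => Math.ceil(w / 16) * Math.ceil(h / 16);

console.log(macroblocks(1920, 1080)); // 120 * 68 = 8160
```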

Intra-prediction is the next topic, whereby an algorithm uses the information within the slice to predict a macroblock. This prediction is then subtracted from the actual block and the difference coded, reducing the amount of data that needs to be transferred. The decoder makes the same prediction and reconstructs the full block from the data provided.
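Reduced to its simplest form, the mechanism looks like this. The DC-style prediction below (averaging the pixels above the block) is just one of AVC’s intra modes, picked here to keep the sketch short:

```typescript
// Intra prediction in miniature: predict a block from already-decoded neighbours,
// code only the residual, and have the decoder repeat the identical prediction.
function dcPredict(above: number[]): number {
  return Math.round(above.reduce((a, b) => a + b, 0) / above.length);
}

function intraResidual(block: number[], above: number[]): number[] {
  const pred = dcPredict(above);
  return block.map(v => v - pred);   // small numbers, cheap to entropy-code
}

function intraReconstruct(residual: number[], above: number[]): number[] {
  const pred = dcPredict(above);     // decoder makes the same prediction
  return residual.map(v => v + pred);
}
```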

The next sections cover motion prediction and the different macroblock sizes. A macroblock is a fixed area of the picture which can be described by a mixture of basic patterns, but the more complex the texture in the block, the more patterns need to be combined to recreate it. By splitting up the 16×16 block, we can often find a simpler way to describe the resulting 8×8 or 8×16 shapes than if they had to be encompassed within a whole 16×16 block.


B-frames are fairly well understood by many, but even if they are unfamiliar to you, Christian explains the concept: B-frames provide only motion information for macroblocks, drawn from frames both before and after. This allows macroblocks which barely change to be ‘moved around the screen’, so to speak, with minimal changes other than location. Whilst P and I frames provide new macroblocks, B-frames are intended just to provide this directional information. Christian explains some of the nuances of B-frame encoding, including weighted prediction.

Quantisation is one of the most important parts of the MPEG process, since it is the point at which information is removed and the codec becomes lossy. The way this happens, and the optimisations possible, are therefore key, so Christian covers them before explaining the deblocking filter. After the picture has been split into so many independently processed macroblocks, edges between the blocks can become apparent, so this filter helps smooth such artefacts to make them more pleasing to the eye. Christian finishes talking about AVC by exploring entropy encoding and considering how AVC encoding can and can’t be improved by adding more memory and computation to the encoder.
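The core of quantisation can be shown in a couple of lines: divide transform coefficients by a step size, round, and multiply back up at the decoder. The coefficient values and step size below are invented purely for illustration:

```typescript
// Quantisation is where the loss happens: anything smaller than the step size is
// gone for good, and runs of trailing zeros are what entropy coding thrives on.
const quantise   = (coeffs: number[], step: number) => coeffs.map(c => Math.round(c / step));
const dequantise = (levels: number[], step: number) => levels.map(l => l * step);

const coeffs = [104.0, -37.5, 12.2, 3.1, 0.8, 0.4]; // illustrative transform coefficients
const levels = quantise(coeffs, 8);                 // [13, -5, 2, 0, 0, 0]
console.log(dequantise(levels, 8));                 // [104, -40, 16, 0, 0, 0] - close, not exact
```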

Watch now!
Speaker

Christian Timmerer
CIO & Cofounder, Bitmovin
Associate Professor, Universität Klagenfurt

Video: Non-standard Codecs with Standard WebRTC

WebRTC is used on a massive scale thanks to Facebook Messenger and Google, but when it comes to video streaming services, some find its open-source codec, VP8, too restrictive. WebRTC is actively evolving to become codec agnostic, though this work is ongoing. In the meantime, Comcast is here to show us there is a way to inject the codec of your choice into WebRTC.

Finding that many of their video capture devices, CCTV cameras and the like, had hardware AVC encoders, Bryan Meissner explains that Comcast didn’t feel it had much choice of codec, so the team looked for a way to make WebRTC carry AVC.

While forcing an unsupported codec into a protocol wasn’t ideal, they were able to leave much of WebRTC unchanged. The RTP and data channels were established as normal and peering continued to work as ever. With control of both the send and receive sides, the team found they could pick the data out of the WebRTC stack ahead of the normal decoder and feed it into ExoPlayer via its API, enabling playback on Android devices. Bryan goes on to explain the approach for iOS and web browsers; as WebRTC is ‘baked in’ to browsers, there are very few ways to change the signal flow.

At the end of the day, Comcast made this work and has used it in production for many years, Jeff Cardillo explains as he wraps up the video. But he also takes time to talk through some of the problems. Having to bypass parts of one program with parts of another library does increase complexity: not only does the code become more complex, it also becomes platform specific, you need control over the source, and keeping the individual parts up to date in step with each other can be a balancing act.

Jeff finishes this talk from Demuxed SF 2019 by elaborating on the mobile and browser tradeoffs at play.

Watch now!
Speakers

Bryan Meissner
Sr. Director Software Development and Engineering,
Comcast
Jeff Cardillo
Principal Software Engineer,
Comcast Interactive Media