Video: Deep Neural Networks for Video Coding

We know AI is going to stick around. Whether it’s AI, Machine Learning, Deep Learning or goes by another name, it all stacks up to the same thing: we’re breaking away from fixed algorithms where one equation ‘does it all’ to a much more nuanced approach with a better result. This is true across all industries. Within the Broadcast industry, one way it can be used is in video and audio compression. Want to make an image smaller? Downsample it with a Convolutional Neural Network and it will look better than Lanczos. No surprise, then, that this is coming in full force to a compression technology near you.

In this talk from Comcast’s Dan Grois, we hear about the ongoing work to super-charge the recently released VVC by replacing functional blocks with neural-network-based technologies. VVC has already achieved 40-50% improvements over HEVC. From the work Dan’s involved with, we hear that neural networks look set to deliver further gains.

Dan explains that deep neural networks recognise images in layers. The brain does the same thing, with one area sensitive to lines and edges, another to objects, another to faces and so on. A deep neural network works in a similar way.

During the development of VVC, Dan explains, neural network techniques were considered but deemed too memory- or computationally-intensive. Now, 6 years on from the inception of VVC, these techniques are practical and are likely to result in a VVC version 2 with further compression improvements.

Dan enumerates the tests so far, swapping out each of the functional blocks in turn: intra- and inter-frame prediction, up- and down-scaling, in-loop filtering etc. He even shows what it would look like in the encoder. Some blocks show improvements of less than 5% but, added together, there are significant gains to be had. Whilst this update to VVC is still in the early stages, it seems clear that it will provide real benefits for those who can implement these improvements which, Dan highlights at the end, are likely to require more memory and computation than the current version of VVC. For some, the savings will be well worth it.

Watch now!
Speaker

Dan Grois
Principal Researcher,
Comcast

Video: Player Optimisations

If you’ve ever tried to implement your own player, you’ll know there’s a big gap between understanding the HLS/DASH spec and getting an all-round great player. Finding the best, most elegant ways of dealing with problems like buffer exhaustion takes thought and experience. The same is true for low-latency playback.

Fortunately, Akamai’s Will Law is here to give us the benefit of his experience implementing his own player and helping customers monitor the performance of theirs. At the end of the day, the player is the ‘kingpin’ of streaming, comments Will. Without it, you have no streaming experience. All other aspects of the stream can be worked around or mitigated, but if the player’s not working, no one watches anything.

Will’s first tip is to implement ‘segment abandonment’. This is when a video player foresees that downloading the current segment is taking too long; if it continues, it will run out of video to play before the segment has arrived. A well-programmed player will spot this and try to continue the download of this segment from another server or CDN. However, Will says that many will simply continue to wait for the download and, in the meantime, the download will fail.
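As a rough illustration of the decision a player has to make here, the sketch below estimates whether the in-flight segment will finish before the buffer runs dry. The function name, its parameters and the half-second safety margin are assumptions made for this example, not something taken from the talk.

```python
def should_abandon(bytes_received: int, bytes_total: int,
                   elapsed_s: float, buffer_s: float,
                   safety_margin_s: float = 0.5) -> bool:
    """Return True if the current download looks unlikely to finish
    before the playback buffer (buffer_s seconds) is exhausted."""
    if bytes_received <= 0 or elapsed_s <= 0:
        # No throughput estimate yet: only give up if the buffer is nearly empty.
        return buffer_s <= safety_margin_s
    throughput = bytes_received / elapsed_s              # bytes per second so far
    remaining_s = (bytes_total - bytes_received) / throughput
    # Abandon when the estimated time to finish, plus a margin, exceeds the buffer.
    return remaining_s + safety_margin_s > buffer_s
```

A player would call something like this periodically during each download and, when it returns True, cancel the request and retry elsewhere.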

Tip two is about ABR switching in low-latency, chunked transfer streams. The playback buffer needs to be longer than the chunk duration. Without this precaution, there will not be enough time for the player to make the decision to switch down layers. Will shows a diagram of how a 3-second playback buffer can recover as long as it uses 2-second segments.

Will’s next two suggestions concern startup. The first is to put your initial chunk in the manifest by base64-encoding it. This makes the manifest larger but removes the round-trip which would otherwise be used to request the chunk. This can significantly improve startup performance, as the RTT could be a quarter of a second, which is a big deal for low-latency streams and anyone who wants a short time-to-play. Similarly, advises Will, make those initial requests in parallel. Don’t wait for the init file to be downloaded before requesting the media segment.
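Both startup ideas can be sketched in a few lines. The example assumes the manifest format and the player accept a data: URI wherever a URL to the init segment would normally go. The `fetch` argument is a hypothetical caller-supplied function that downloads one URL; both helper names are invented for this sketch.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

def init_data_uri(init_bytes: bytes, mime: str = "video/mp4") -> str:
    """Encode an init segment as a data: URI that can be embedded in a manifest."""
    encoded = base64.b64encode(init_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def fetch_in_parallel(fetch, urls):
    """Issue every request at once; return the results in the order of `urls`.

    `fetch` is any callable taking a URL and returning the response body.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(fetch, urls))
```

If the init segment is not embedded, calling `fetch_in_parallel(fetch, [init_url, first_segment_url])` requests the init file and the first media segment together rather than one after the other.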

Whilst many of the points in this talk focus on the player itself, Will says it’s wise for the player to provide metrics back to the CDN, hidden in the request headers or query args. This data can help the CDN serve media smarter. For instance, the player could send over the segment duration to the CDN. Knowing how long the segment is, the CDN can compare this to the download time to understand if it’s serving the data too slowly. Perhaps the simplest idea is for the player to pass back a GUID which the CDN can put in the logs. This helps identify which of the millions of lines of logs are relevant to your player so you can run your own analysis on a player-by-player level.
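A minimal sketch of the query-args approach follows. The parameter names `sid` and `dur_ms` and the example URL are made up for illustration; a real deployment would use whatever names the player and CDN have agreed on.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def tag_request(url: str, session_id: str, segment_duration_ms: int) -> str:
    """Append a per-player GUID and the segment duration to a segment URL,
    keeping any query args that are already present."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("sid", session_id), ("dur_ms", str(segment_duration_ms))]
    return urlunsplit(parts._replace(query=urlencode(query)))

# One GUID per playback session, reused on every request the player makes.
session_id = str(uuid.uuid4())
print(tag_request("https://cdn.example.com/v/seg_42.m4s?token=abc", session_id, 2000))
```

Because the GUID appears in every request line, filtering the CDN logs for it pulls out just that player’s session.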

Will’s other points include advice on how to avoid starting playback at the lowest bandwidth and working up. This doesn’t look great and is often unnecessary. The player could run its own speed test or the CDN could advise based on the initial requests. He also advises never trusting the system clock; use an external clock instead.
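One common way to use an external clock is to measure the offset between it and the local clock once, then apply that offset to every local reading. The sketch below assumes the player already has a timestamp from a trusted time source and that the network delay is roughly symmetric; the function names are invented for this example.

```python
import time

def clock_offset_s(server_epoch_s: float, request_sent_s: float,
                   response_received_s: float) -> float:
    """Estimate (external clock - local clock) in seconds.

    Assumes the server stamped its time halfway through the round trip.
    """
    midpoint = (request_sent_s + response_received_s) / 2
    return server_epoch_s - midpoint

def corrected_now_s(offset_s: float, local_now_s=None) -> float:
    """Return the local time adjusted by a previously measured offset."""
    if local_now_s is None:
        local_now_s = time.time()
    return local_now_s + offset_s
```

Any calculation that depends on wall-clock time, such as working out where the live edge is, would then use `corrected_now_s` rather than the raw system clock.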

Regarding playback latency, it pays to be wise when starting out. If you blindly start an HLS stream, then your latency will be variable within the duration of a segment. Will advocates HEAD requests to try to see when the next chunk is available and only then starting playback. Another technique is to vary your playback rate so you can ‘catch up’. The benefit of using rate adjustment is that you can ask all your players to be at a certain latency behind realtime so they can be close to synchronous.
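Rate adjustment can be sketched as a simple proportional controller that nudges the playback rate towards a target latency. The gain, the 5% limit on speed change and the dead band below are illustrative values chosen for this example, not figures from the talk.

```python
def catch_up_rate(current_latency_s: float, target_latency_s: float,
                  gain: float = 0.1, max_deviation: float = 0.05,
                  dead_band_s: float = 0.1) -> float:
    """Return a playback rate that moves the player towards the target latency.

    Above 1.0 the player catches up with live; below 1.0 it falls back.
    """
    error = current_latency_s - target_latency_s
    if abs(error) <= dead_band_s:
        return 1.0                      # close enough: play at normal speed
    # Scale the correction with the error, but cap it so the change stays subtle.
    deviation = max(-max_deviation, min(max_deviation, gain * error))
    return 1.0 + deviation
```

If every player is given the same target latency, each one converges on the same distance behind realtime, which is what brings them close to synchronous.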

Two great tips which are often overlooked: request multiple GOPs at once. This helps open up the TCP windows, giving you a more efficient download. For mobile, it can also help the battery by allowing you to cycle the radio on and off more efficiently. Will mentions that when it comes to GOPs, for some applications it’s important to look at exactly how long your GOP should be. Usually aligning it with an integer number of audio frames is the way to choose your segment duration.
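The alignment can be worked out exactly with rational arithmetic. The sketch below finds the shortest duration that holds a whole number of both video and audio frames. It assumes 1024 samples per audio frame, as used by AAC-LC, which is an assumption of this example rather than something stated in the talk.

```python
from fractions import Fraction
from math import gcd, lcm

def aligned_duration(video_fps: Fraction, audio_sample_rate: int,
                     samples_per_audio_frame: int = 1024) -> Fraction:
    """Shortest duration (seconds) containing a whole number of video frames
    and a whole number of audio frames."""
    video_frame = 1 / Fraction(video_fps)
    audio_frame = Fraction(samples_per_audio_frame, audio_sample_rate)
    # For reduced fractions a/b and c/d, the least common multiple
    # is lcm(a, c) / gcd(b, d).
    return Fraction(lcm(video_frame.numerator, audio_frame.numerator),
                    gcd(video_frame.denominator, audio_frame.denominator))

unit = aligned_duration(Fraction(25), 48000)
print(unit, float(unit))   # 8/25 0.32
```

At 25 fps with 48 kHz audio the unit is 0.32 seconds, or 8 video frames and 15 audio frames, so a GOP or segment duration that is a multiple of 0.32 seconds keeps audio and video boundaries lined up.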

The talk finishes with an appeal to move to using CMAF containers for streaming as they allow you to deliver HLS and DASH streams from the same media segments and move to a common DRM. Will says that CBCS-encrypted content is now becoming nearly all-pervasive. Finally, Will gives some tips on how players can best decide which CDN to use in multi-CDN environments.

Watch now!
Speaker

Will Law
Chief Architect,
Akamai

Video: The Future of SSAI on OTT Devices

Server-Side Ad Insertion sounds like a sure-fire way to insert ads without ad-blockers noticing, but it’s not without problems – particularly on OTT devices plugged into the living room TV. As people are used to watching broadcast television on the TV, some of those expectations of broadcast TV are associated with whatever they watch on TV. The quick channel changing, low latency and constant quality are expected even if the viewer is watching a mini OTT streaming device plugged into HDMI input 2.

Phil Cluff from Mux looks at the challenges that devices other than computers throw up when using SSAI in this talk from Mile High Video. In general, OTT devices don’t have much memory or CPU power which renders Client-Side ad insertion impractical. SSAI can be achieved by manipulating the manifest or by rewriting timestamps on video segments. The latter damages the ability to cache chunks, so Phil explores the challenges of the former technique. On the surface, just swapping out some chunks by changing the manifest sounds simple, but Phil looks at how well it works in practice on games consoles, smart TVs, streaming boxes and set-top boxes. Unsurprisingly, streaming boxes like Apple TV and Roku support the features needed to pull off SSAI fairly well. TVs fare less well, but those relying on Android tend to have workable solutions, explains Phil. The biggest hurdle is getting things working on set-top boxes, of which there are thousands of variations, few of which support DRM and DASH well.
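To make the manifest-manipulation approach concrete, here is a minimal sketch that swaps a run of content segments in an HLS media playlist for ad segments, marking each boundary with a discontinuity tag. The function name and the segment URIs are hypothetical, and a production SSAI service has far more to deal with, such as live playlists, DRM and beaconing.

```python
import math

def splice_ad(content, ad, start, count):
    """Build an HLS media playlist in which `count` content segments,
    beginning at index `start`, are replaced by the ad segments.

    `content` and `ad` are lists of (duration_seconds, uri) tuples.
    """
    before = content[:start]
    after = content[start + count:]
    groups = [group for group in (before, ad, after) if group]
    if not groups:
        raise ValueError("playlist would contain no segments")
    target = math.ceil(max(d for group in groups for d, _ in group))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for index, group in enumerate(groups):
        if index > 0:
            # Timestamps and encoding parameters may change at each boundary.
            lines.append("#EXT-X-DISCONTINUITY")
        for duration, uri in group:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"

print(splice_ad([(6.0, "c0.ts"), (6.0, "c1.ts"), (6.0, "c2.ts")],
                [(5.0, "ad0.ts")], start=1, count=1))
```

The media segments themselves are untouched, which is why this approach keeps chunks cacheable; the cost is that the device has to handle the discontinuities correctly.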

Phil examines the rollout of smart TVs finding that most are more than 3 years old which typically means they are on old firmware supporting features that existed when the TV was released but nothing more recent…such as supporting manifest manipulation. With this bleak picture, Phil attempts to ground us saying that we don’t need to deliver ads on all devices. Most services are able to find a core set of devices which form 80% or more of their viewership which means that supporting ads on devices outside of that core is unlikely ever to be profitable. And if it’s not profitable, is there any need to ever show ads on that device? Initially, it doesn’t feel right to deliver without ads to some devices, but if you look at the numbers, you may well find that your development time will never be paid back. An alternative solution is to deliver ads to these people by getting them to watch on Chromecasts you provide instead of on their STB which is a more common option than you may expect.

Phil finishes his talk looking at the future, which includes an HbbTV spec specifically aimed at SSAI and a continued battle to find a reliable way of delivering and recording beacons for SSAI.

Watch now!
Speaker

Phil Cluff
Streaming Architect,
Mux

Video: Codec Comparison from TCO and Compression Efficiency Perspective

AVC, now 16 years old, is long in the tooth but supported by billions of devices. The impetus to replace it comes from the drive to serve customers with a lower cost base and a more capable platform. Cue the new contenders VVC and AV1 – not to mention HEVC. It’s no surprise they compress better than AVC (also known as MPEG-4 Part 10 and H.264), but do they deliver a cost-efficient, legally safe codec on which to build a business?

Thierry Fautier has done the measurements and presents them in this talk. Thierry explains that the tests were done using reference code which, though unoptimised for speed, should represent the best quality possible from each codec. The tests compared 1080p video, and the details are reproduced in the IBC conference paper.

Licensing is one important topic as HEVC is seen by some as a failed codec, not in terms of its compression but rather in the reticence of many companies to deploy it. This has been due to the business risk of uncertain licensing costs and/or the expense of the known licensing costs. VVC faces the challenge of entering the market while avoiding these concerns, which MPEG is determined to do.

Thierry concludes by comparing AVC against HEVC, AV1 and VVC in terms of deployment dates, deployed devices and the deployment environment. He looks at the challenge of moving large video libraries over to high-complexity codecs due to the cost and time required to re-compress. The session ends with questions from the audience.

Watch now!
Speaker

Thierry Fautier
President-Chair at Ultra HD Forum,
VP Video Strategy, Harmonic