Video: Bandwidth Prediction for Multi-Bitrate Streaming at Low Latency

Low latency protocols like CMAF are wreaking havoc with traditional ABR algorithms. We’re having to come up with new ways of assessing if we’re running out of bandwidth. Traditionally, this is done by looking at how long a video chunk takes to download and comparing that with its playback duration. If you’re downloading at the same speed it’s playing, it’s time consider changing stream to a lower-bandwidth one.

As latencies have come down, servers will now start sending data from the beginning of a chunk as it’s being written which means it’s can’t be downloaded any quicker. To learn more about this, look at our article on ISO BMFF and this streaming primer. Since the file can’t be downloaded any quicker, we can’t ascertain if we should move up in bitrate to a better quality stream, so while we can switch down if we start running out of bandwidth, we can’t find a time to go up.

Ali C. Begen and team have been working on a way around this. The problem is that with the newer protocols, you pre-request files which start getting sent when they are ready. As such you don’t actually know the time the chunk starts downloading to you. Whilst you know when it’s finished, you don’t have access, via javascript, to when the file started being sent to you robbing you of a way of determining the download time.

Ali’s algorithm uses the time the last chunk finished downloading in place of the missing timestamp figuring that the new chunk is going to load pretty soon after the old. Now, looking at the data, we see that the gap between one chunk finishing and the next one starting does vary. This lead Ali’s team to move to a sliding window moving average taking the last 3 download durations into consideration. This is assumed to be enough to smooth out some of those variances and provides the data to allow them to predict future bandwidth and make a decision to change bitrate or not. There have been a number of alternative suggestions over the last year or so, all of which perform worse than this technique called ACTE.

In the last section of this talk, Ali explores the entry he was part of into a Twitch-sponsored competition to keep playback latency close to a second in test conditions with varying bitrate. Playback speed is key to much work in low-latency streaming as it’s the best way to trim off a little bit of latency when things are going well and allows you to buy time if you’re waiting for data; the big challenge is doing it without the viewer noticing. The entry used a heuristics and a machine learning approach which worked so well, they were runners up in the contest.

Watch now!
Speaker

Ali C. Begen
Ali C. Begen,
Technical Consultant, Comcast
Professor, Computer Science, Özyeğin University

Video: HLS and DASH Multi-Codec Encoding & Packaging

As we saw yesterday, there’s an increasingly buoyant market for video codecs and whilst this is a breath of fresh air after AVC’s multi-decade dominance, we will likely never again see a market which isn’t fragmented with several dominant players, say AV1, AVC, VVC and VP9, each sharing 85% market share relatively equally and then ‘the rest’ bringing up the rear. So multi-codec distribution to home viewers is going to have to deal with delivering different codecs to different people.

fuboTV do this today and Nick Krzemienski is here to tell us how. Starting with an overview of fuboTV primarily streams both live and on VOD. Nick shows us the workflow they use and then explains how their AVC & HEVC combined workflow is set up. Starting with the ideal case where a single fmp4 is encoded into both AVD and HEVC, he proposes you would simply package both into an HLS and DASH manifest and let players work out the rest. Depending on your players, you may have to split out your manifests into single-codec files.

DRM’s very important for a sports broadcaster so Nick looks at how this might be achieved. CMAF allows you to deliver m3u8 and mpd files using CENC (Common ENCryption). This promises a single DRM process ahead of packaging, but the reality, we hear from Nick, is that you’ll need two sets of media for HLS and DASH if you’re going to use CENC.

When you’re delivering multiple manifest and, hence, multiple sources, how do you manage this? Nick outlines, and shows the code, of how he achieves this at the edge. Using Lamda, he’s able to look at the incoming requests and existing files at the CDN to deliver the right asset with the logic done close to the viewer. Nick closes by with his thoughts on the future for streaming and answering questions from the audience.

Watch now!
Download the presentation
Speakers

Nick Krzemienski Nick Krzemienski
Engineering Lead, VOD Encoding & Operations, fuboTV
Maintainer & Editor, awesome.video
Dom Robinson Host: Dom Robinson
Director and Creative Firestarter, id3as
Contributing Editor, StreamingMedia.com, UK

Video: Tech Talks: Low-Latency Live Streaming

There are a number of techniques for achieving low-latency streaming. This talk is one of the few which introduces them in easy to understand ways and then puts them in context briefly showing the manifests or javascript examples of how these would be seen in the wild. Whilst there are plenty of companies who don’t need low-latency streaming, for many it’s a key part of their offering or it’s part of the business model itself. Knowing the techniques in play is to better understand internet streaming in general.

Jameson Steiner from Bitmovin starts by explaining why there is a motivation to cut the latency. One big motivation, aside from the standard live sports examples, is user-generated content like on Twitch where it’s very clear to the streamer, and quite off-putting, when there is large amounts of delay. Whilst delay can be adapted to, the more there is the less interaction is possible. In this situation, it’s the ‘handwaving’ latency that comes in to play. You want the hand on the screen to wave pretty much at the same time as your hand waves in front of the camera. Jameson places different types of distribution on a chart showing latency and we see that low-latency of 5 seconds or less will not only match traditional TV broadcasts, but also work well for live streamers.

Naturally, to fix a problem you need to understand the problem, so Jameson breaks down the legacy methods of delivery to show why the latency exists. The issue comes down to how video is split into sections, say 6 seconds, so that the player downloads a section at a time, reassembles and plays them. Looking from the player’s perspective, if the network suddenly broke or reduced its throughput, it makes sense to have several chunks in reserve. Having three 6-second chunks, a sensible precaution, makes you 18 seconds behind the curve from the off.

Clearly reducing the segement size is a winner in this scenario. Three 3 second segments will give you just 9 seconds latency; why not go to 1 second? Well encoding inefficiency is one reason. If you reduce the amount of time a temporal codec has of a video, its efficiency will drop and bitrate will increase to maintain quality. Jameson explains the other knock-on effects such as CDN inefficiencies and network requests. The standardised way to avoid these problems is to use CMAF (Common Media Application Format) which is based on MPEG DASH and ISO BMFF. CMAF, and DASH in general, has the benefit of coming from a standards body whose aim was to remove vendor lock-in that may be felt with HLS and was certainly felt with RTMP. Check out MPEG’s short white paper on the topic (zipped .docx file)

CMAF uses chunked transfer meaning that as the encoder writes the data to the disk, the web server sends it to the client. This is different to the default where a file is only sent after it’s been completely written. This has the effect of the not having to wait up to 6 seconds to a 6-second chunk to start being sent; the download time also needs to be counted. Rather, almost as soon as the chunk has been finished by the encoder, it’s arrived at the destination. This is a feature of HTTP 1.1 and after so is not new, but it still needs to be enabled and considered as part of the delivery.

CMAF goes beyond simple HTTP 1.1 chunked transfer which is a technique used in low-latency HLS, covered later, by creating extra structure within the 6-second segment (until now, called a chunk in this article). This extra structure allows the segment to be downloaded in smaller chunks decoupling the segment length from the player latency. Chunked transfer does cause a notable problem however which has not yet been conclusively solved. Jameson explains how traditionally each large segment typically arrives faster than realtime. By measuring how fast it arrives, given the player knows the duration, it can estimate the bandwidth available at that time on the network. With chunked transfer, as we saw, we are receiving data as it’s being created. By definition, we are now getting it in realtime so there is no opportunity to receive it any quicker. The bandwidth estimation element, as shown the presentation, is used to work out if the player needs to go down or could go up to another stream at a different bitrate – part of standard ABR. So the catastrophe here is the going down in latency has hampered our ability to switch bitrates and whilst the viewer can see the video close to real-time, who’s to say if they are seeing it at the best quality?

Low-Latency HLS/DASH is a way of extending DASH and HLS without using CMAF. Jameson explains some techniques such as advertising segments in advance to allow players to pre-request. It also relies on finding the compromise point of encoding inefficiency vs segment length, typically held to be around 2 seconds, to minimise the latency. At this point we start seeing examples of the techniques in manifests and javascript allowing us to understand how this is actually signalled and implemented.

Apple is on its second major revision of LL-HLS which has responded to many of the initial complaints from the community. Whilst it can use HTTP/2 to help push segments out, this caused problems in practice so it can now preload hints, as Jameson explains in order to remove round-trip times from requests. Jameson looks at the other of Apple’s techniques and shows how they look in manifest files.

The final section looks at problems in implementing these features such as chunks being fragmented across TCP packets, the bandwidth estimation question and dealing with playback speed in order to adjust the players position in time – speed-ups and slow-downs of 5 to 10% can be possible depending on content.

Watch now!
Download the presentation
Speaker

Jameson Steiner Jameson Steiner
Software Engineer,
Bitmovin

Video: Versatile Video Coding (VVC)

MPEG’s VVC is the next iteration along from HEVC (H.265). Whilst there are other codecs being finalised such as EVC and LCEVC, this talk looks at how VVC builds on HEVC, but also lends its hand to screen content and VR becoming a more versatile codec than HEVC, meeting the world’s changing needs. For an overview of these emerging codecs, this interview covers them all.

VVC is a joint project between ITU-T and MPEG (AKA ISO/IEC). Its aim is to create a 50% encoding efficiency in bitrate for the same quality picture, with the emphasis on higher resolutions, HDR and 10-bit video. At the same time, acknowledging that optimising codecs on natural video is no longer the core requirement for a lot of people. Its versatility comes from being able to encode screen content, independent sub-picture encoding, scalable encoding among others.

Gary Sullivan from Microsoft Technology & Research talks us through what all this means. He starts by outlining the case for a new codec, particularly the reach for another 50% bitrate saving which may come at further computational cost. Gary points out that, as video use continues to increase, anything that can be done to significantly reduce bitrates will either drive down costs or allow people to use video in better ways.

Any codec is a set of tools all working together to create the final product. Some tools are not always needed, say if you are running on a lower-power system, allowing the codec to be tuned for the situation. Gary puts up a list of some of the tools in VVC, many of which are an evolution of the same tool in HEVC, and highlights a few to give an insight into the improvements under the hood.

Gary’s pick of the big hitters in the tool-set are the Adaptive Loop Filter which reduces artefacts and prediction errors, affine motion compensation which provides better motion compensation, triangle partitioning mode which is a high-computation improvement in intra prediction, bi-directional optical flow (BIO) for motion prediction, intra-block copy which is useful for screen content where an identical block is found elsewhere in the same frame.

Gary highlights SCC, Screen Content Coding, which was in HEVC but not in the base profile, this has changed for VVC so all VVC implementations will have SCC whereas very few HEVC implementations do. Reference Picture Resampling (RPR) allows changing resolution from picture to picture where pictures can be stored at a different resolution from the current picture. And independent sub-pictures which allow parts of the video frame to be re-arranged or only for only one region to be decoded. This works well for VR, video conferencing and allows the creation of composite videos without intermediate decoding.

As usual, doing more thinking about how to compress a picture brings further computational demands. MPEG’s LCEVC is the standards body’s way of fighting against this, as notable bitrate improvements are possible even for low-power devices. With VVC, versatility is the aim, however. Decoders see a 60% increase in decode complexity. Whilst MPEG specifications are all about the decoder – hence allowing a lot of ongoing innovation in encoding techniques – current examples are about 8 or 9 times slower. Performance is better for screen content and on higher resolutions. Whilst the coding part of VVC is mature, versatility is still being worked on but the aim is to publishing within about 2 months.

The video finishes with a Q&A that covers implementing DASH into a low-latency video workflow. How CMAF will be specified to use VVC. Live workflows which Gary explains always come after the initial file-based work and is best understood after the first attempts at encoder implementations, noting that hardware lags by 2 years. He goes on to explain that chipmakers need to see the demand. At the moment, there is a lot of focus from implementors on AV1 by implementors, not to mention EVC, so the question is how much demand can be generated.

This talk is based on talk from Benjamin Bross originally given to an ITU workshop (PDF), then presented at Mile High Video by Benjamin and was updated by Gary for this conversation with the Seattle Video Tech community.

Bitmovin has an article highlighting many of the improvements in VVC written by Christian Feldmann who has given many talks on both AV1 and VVC.

Watch now!

Speakers

Gary Sullivan Gary Sullivan
Microsoft Technology & Research