Video: Tech Talks: Low-Latency Live Streaming

There are a number of techniques for achieving low-latency streaming. This talk is one of the few which introduces them in easy to understand ways and then puts them in context briefly showing the manifests or javascript examples of how these would be seen in the wild. Whilst there are plenty of companies who don’t need low-latency streaming, for many it’s a key part of their offering or it’s part of the business model itself. Knowing the techniques in play is to better understand internet streaming in general.

Jameson Steiner from Bitmovin starts by explaining why there is a motivation to cut the latency. One big motivation, aside from the standard live sports examples, is user-generated content like on Twitch where it’s very clear to the streamer, and quite off-putting, when there is large amounts of delay. Whilst delay can be adapted to, the more there is the less interaction is possible. In this situation, it’s the ‘handwaving’ latency that comes in to play. You want the hand on the screen to wave pretty much at the same time as your hand waves in front of the camera. Jameson places different types of distribution on a chart showing latency and we see that low-latency of 5 seconds or less will not only match traditional TV broadcasts, but also work well for live streamers.

Naturally, to fix a problem you need to understand the problem, so Jameson breaks down the legacy methods of delivery to show why the latency exists. The issue comes down to how video is split into sections, say 6 seconds, so that the player downloads a section at a time, reassembles and plays them. Looking from the player’s perspective, if the network suddenly broke or reduced its throughput, it makes sense to have several chunks in reserve. Having three 6-second chunks, a sensible precaution, makes you 18 seconds behind the curve from the off.

Clearly reducing the segement size is a winner in this scenario. Three 3 second segments will give you just 9 seconds latency; why not go to 1 second? Well encoding inefficiency is one reason. If you reduce the amount of time a temporal codec has of a video, its efficiency will drop and bitrate will increase to maintain quality. Jameson explains the other knock-on effects such as CDN inefficiencies and network requests. The standardised way to avoid these problems is to use CMAF (Common Media Application Format) which is based on MPEG DASH and ISO BMFF. CMAF, and DASH in general, has the benefit of coming from a standards body whose aim was to remove vendor lock-in that may be felt with HLS and was certainly felt with RTMP. Check out MPEG’s short white paper on the topic (zipped .docx file)

CMAF uses chunked transfer meaning that as the encoder writes the data to the disk, the web server sends it to the client. This is different to the default where a file is only sent after it’s been completely written. This has the effect of the not having to wait up to 6 seconds to a 6-second chunk to start being sent; the download time also needs to be counted. Rather, almost as soon as the chunk has been finished by the encoder, it’s arrived at the destination. This is a feature of HTTP 1.1 and after so is not new, but it still needs to be enabled and considered as part of the delivery.

CMAF goes beyond simple HTTP 1.1 chunked transfer which is a technique used in low-latency HLS, covered later, by creating extra structure within the 6-second segment (until now, called a chunk in this article). This extra structure allows the segment to be downloaded in smaller chunks decoupling the segment length from the player latency. Chunked transfer does cause a notable problem however which has not yet been conclusively solved. Jameson explains how traditionally each large segment typically arrives faster than realtime. By measuring how fast it arrives, given the player knows the duration, it can estimate the bandwidth available at that time on the network. With chunked transfer, as we saw, we are receiving data as it’s being created. By definition, we are now getting it in realtime so there is no opportunity to receive it any quicker. The bandwidth estimation element, as shown the presentation, is used to work out if the player needs to go down or could go up to another stream at a different bitrate – part of standard ABR. So the catastrophe here is the going down in latency has hampered our ability to switch bitrates and whilst the viewer can see the video close to real-time, who’s to say if they are seeing it at the best quality?

Low-Latency HLS/DASH is a way of extending DASH and HLS without using CMAF. Jameson explains some techniques such as advertising segments in advance to allow players to pre-request. It also relies on finding the compromise point of encoding inefficiency vs segment length, typically held to be around 2 seconds, to minimise the latency. At this point we start seeing examples of the techniques in manifests and javascript allowing us to understand how this is actually signalled and implemented.

Apple is on its second major revision of LL-HLS which has responded to many of the initial complaints from the community. Whilst it can use HTTP/2 to help push segments out, this caused problems in practice so it can now preload hints, as Jameson explains in order to remove round-trip times from requests. Jameson looks at the other of Apple’s techniques and shows how they look in manifest files.

The final section looks at problems in implementing these features such as chunks being fragmented across TCP packets, the bandwidth estimation question and dealing with playback speed in order to adjust the players position in time – speed-ups and slow-downs of 5 to 10% can be possible depending on content.

Watch now!
Download the presentation
Speaker

Jameson Steiner Jameson Steiner
Software Engineer,
Bitmovin

Video: S-Frame in AV1: Enabling better compression for low latency live streaming.

Streaming is such a success because it manages to deliver video even as your network capacity varies while you are watching. Called ABR (Adaptive Bitrate), this short talk asks how we can allow low-latency streams to nimbly adapt to network conditions whilst keeping the bitrate low in the new AV1 codec.

Tarek Amara from Twitch explains the idea in AV1 of introducing S-Frames, sometimes called ‘switch frames’, which take the role of the more traditional I or IDR frames. If a frame is marked as an IDR frame, this means the decoder knows it can start decoding from this frame without worrying that it’s referencing some data that came before this frame. By doing this, you can allow frequent points at which a decoder can enter a stream. IDR frames are typically I frames which are the highest bandwidth frames, by a large proportion. This is because they are a complete rendition of a frame without any of the predictions you find in P and B frames.

Because IDR frames are so large, if you want to keep overall bandwidth down, you should reduce the number of them. However, reducing the number of frames reduces the number if ‘in points’ for for the stream meaning a decoder then has to wait longer before it can start displaying the stream to the viewer. An S-Frame brings the benefits of an IDR in that it still marks a place in the stream where the decoder can join, free of dependencies on data previously sent. But the S-Frame is takes up much less space.

Tarek looks at how an S-Frame is created, the parameters it needs to obey and explains how the frames are signalled. To finish off he presents tests run showing the bitrate improvements that were demonstrated.
Watch now!
Speaker

Tarek Amara Tarek Amara
Engineering Manager, Video Encoding,
Twitch

Video: DASH: from on-demand to large scale live for premium services

A bumper video here with 7 short talks from VideoLAN, Will Law and Hulu among others, all exploring the state of MPEG DASH today, the latest developments and the hot topics such as low latency, ad insertion, bandwidth prediction and one red letter feature of DASH – multi-DRM.

The first 10 minutes sets the scene introducing the DASH Industry Forum (DASH IF) and explaining who takes part and what it does. Thomas Stockhammer, who is chair of the Interoperability Working Group explains that DASH IF is made of companies, headline members including Google, Ericsson, Comcast and Thomas’ employer Qualcomm who are working to promote the adoption MPEG-DASH by working to imrove the specification, advise on how to put it into practice in real life, promote interoperability, and being a liaison point for other standards bodies. The remaining talks in this video exemplify the work which is being done by the group to push the technology forward.

Meeting Live Broadcast Requirements – the latest on DASH low latency!
Akamai’s Will Law takes to the mic next to look at the continuing push to make low-latency streaming available as a mainstream option for services to use. Will Law has spoken about about low latency at Demuxed 2019 when he discussed the three main file-based to deliver low latency DASH, LHLS and LL-HLS as well as his famous ‘Chunky Monkey’ talk where he explains how CMAF, an implementation of MPEG-DASH, works in light-hearted detail.

In today’s talk, Will sets out what ‘low latency’ is and revises how CMAF allows latencies of below 10 seconds to be achieved. A lot of people focus on the duration of the chunks in reducing latency and while it’s true that it’s hard to get low latency with 10 second chunk sizes, Will puts much more emphasis on the player buffer rather than the chunk size themselves in producing a low-latency stream. This is because even when you have very small chunk sizes, choosing when to start playing (immediately or waiting for the next chunk) can be an important part of keeping the latency down between live and your playback position. A common technique to manage that latency is to slightly increase and decrease playback speed in order to manage the gap without, hopefully, without the viewer noticing.

Chunk-based streaming protocols like HLS make Adaptive Bitrate (ABR) relatively easy whereby the player monitors the download of each chunk. If the, say, 5 second chunk arrives within 0.25 seconds, it knows it could safely choose a higher-bitrate chunk next time. If, however the chunk arrives in 4.8 seconds, it can choose to the next chunk to be lower-bitrate so as to receive the chunk with more headroom. With CMAF this is not easy to do since the segments all arrive in near real-time since the transferred files represent very small sections and are sent as soon as they are created. This problem is addressed in a later talk in this talk.

To finish off, Will talks about ‘Resync Elements’ which are a way of signalling mid-chunk IDRs. These help players find all the points which they can join a stream or switch bitrate which is important when some are not at the start of chunks. For live streams these are noted in the manifest file which Will walks through on screen.

Ad Insertion in Live Content:Pre-, Mid- and Post-rolling
Whilst not always a hit with viewers, ads are important to many services in terms of generating the revenue needed to continue delivering content to viewers. In order to provide targeted ads, to ensure they are available and to ensure that there is a record of which ads were played when, the ad-serving infrastructure is complex. Hulu’s Zachary Cava walks us through the parts of the infrastructure that are defined within DASH such as exchanging information on ‘Ad Decision Parameters’ and ad metadata.

In chunked streams, ads are inserted at chunk boundaries. This presents challenges in terms of making sure that certain parameters are maintained during this swap which is given the general name of ‘Content Splice Conditioning.’ This conditioning can align the first segment aligned with the period start time, for example. Zachary lays out the three options provided for this splice conditioning before finishing his talk covering prepared content recommendations, ad metadata and tracking.

Bandwidth Prediction for Multi-bitrate Streaming at Low Latency
Next up is Comcast’s Ali C. Begen who follows on from Will Law’s talk to cover bandwidth prediction when operating at low-latency. As an example of the problem, let’s look at HTTP/1.1 which allows us to download a file before it’s finished being written. This allows us to receive a 10-second chunk as it’s being written which means we’ll receive it at the same rate the live video is being encoded. As a consequence the time each chunk takes to arrive will be the same as the real-time chunk duration (in this example, 10 seconds.) When you are dealing with already-written chunks, your download time will be dependent on your bandwidth and therefore the time can be an indicator of whether your player should increase or decrease the bitrate of the stream it’s pulling. Getting back this indicator for low-latency streams is what Ali presents in this talk.

Based on this paper Ali co-authored with Christian Timmerer, he explains a way of looking at the idle time between consecutive chunks and using a sliding window to generate a bandwidth prediction.

Implementing DASH low latency in FFmpeg
Open-source developer Jean-Baptiste Kempf who is well known for his work on VLC discusses his work writing an MPEG-DASH implementation for FFmpeg called the DASH-LL. He explains how it works and who to use it with examples. You can copy and paste the examples from the pdf of his talk.

Managing multi-DRM with DASH
The final talk, ahead of Q&A is from NAGRA discussing the use of DRM within MPEG-DASH. MPEG-DASH uses Common Encryption (CENC) which allows the DASH protocol to use more than one DRM scheme and is typically seen to allow the use of ‘FairPlay’, ‘Widevine’ and ‘PlayReady’ encryption schemes on a single stream dependent on the OS of the receiver. There is complexity in having a single server which can talk to and negotiate signing licences with multiple DRM services which is the difficulty that Lauren Piron discusses in this final talk before the Q&A led by Ericsson’s VP of international standards, Per Fröjdh.

Watch now!
Speakers

Thomas Stockhammer Thomas Stockhammer
Director of Technical Standards,
Qualcomm
Will Law Will Law
Chief Architect,
Akamai
Zachary Cava Zachary Cava
Software Architect,
Hulu
Ali C. Begen Ali C. Begen
Technical Consultant, Video Architecture, Strategy and Technology group,
Comcast
Jean-Baptiste Kempf Jean-Baptiste Kempf
President & Lead VLC Developer
VideoLAN
Laurent Piron Laurent Piron
Principal Solution Architect
NAGRA
Per Fröjdh Moderator: Per Fröjdh
VP International Standards,
Ericsson

Video: Encoding and packaging for DVB-I services

There are many ways of achieving a hybrid of OTT-delivered and broadcast-delivered content, but they are not necessarily interoperable. DVB aims to solve the interoperability issue, along with the problem of service discovery with DVB-I. This specification was developed to bring linear TV over the internet up to the standard of traditional broadcast in terms of both video quality and user experience.

DVB-I supports any device with a suitable internet connection and media player, including TV sets, smartphones, tablets and media streaming devices. The medium itself can still be satellite, cable or DTT, but services are encapsulated in IP. Where both broadband and broadcast connections are available, devices can present an integrated list of services and content, combining both streamed and broadcast services.

DVB-I standard relies on three components developed separately within DVB: the low latency operation, multicast streaming and advanced service discovery. In this webinar, Rufael Mekuria from Unified Streaming focuses on low latency distributed workflow for encoding and packaging.

 

The process starts with an ABR (adaptive bit rate) encoder responsible for producing streams with multiple bit rates and clear segmentation – this allows clients to automatically choose the best video quality depending on available bandwidth. Next step is packaging where streaming manifests are added and content encryption is applied, then data is distributed through origin servers and CDNs.

Rufael explains that low latency mode is based on an enhancement to the DVB-DASH streaming specification known as DVB Bluebook A168. This incorporates the chunked transfer encoding of the MPEG CMAF (Common Media Application Format), developed to enable co-existence between the two principle flavors of adaptive bit rate streaming: HLS and DASH. Chunked transfer encoding is a compromise between segment size and encoding efficiency (shorter segments make it harder for encoders to work efficiently). The encoder splits the segments into groups of frames none of which requires a frame from a later group to enable decoding. The DASH packager then puts each group of frames into a CMAF chunk and pushes it to the CDN. DVB claims this approach can cut end-to-end stream latency from a typical 20-30 seconds down to 3-4 seconds.

The other topics covered are: encryption (exhanging key parameters using CPIX), content insertion, metadata, supplemental descriptors, TTML subitles and MPD proxy.

Watch now!

Download the slides.

Speaker

Rufael Mekuria Rufael Mekuria
Head of Research & Standardization
Unified Streaming