Video: Tech Talks: Low-Latency Live Streaming

There are a number of techniques for achieving low-latency streaming. This talk is one of the few which introduces them in easy-to-understand ways and then puts them in context, briefly showing the manifests or JavaScript examples of how these would be seen in the wild. Whilst there are plenty of companies who don’t need low-latency streaming, for many it’s a key part of their offering or part of the business model itself. Knowing the techniques in play helps you better understand internet streaming in general.

Jameson Steiner from Bitmovin starts by explaining the motivation for cutting latency. One big motivation, aside from the standard live sports examples, is user-generated content like that on Twitch, where it’s very clear to the streamer, and quite off-putting, when there is a large amount of delay. Whilst delay can be adapted to, the more there is, the less interaction is possible. In this situation, it’s the ‘handwaving’ latency that comes into play: you want the hand on the screen to wave at pretty much the same time as your hand waves in front of the camera. Jameson places different types of distribution on a chart of latency and we see that a low latency of 5 seconds or less will not only match traditional TV broadcasts, but also work well for live streamers.

Naturally, to fix a problem you need to understand it, so Jameson breaks down the legacy methods of delivery to show why the latency exists. The issue comes down to how video is split into sections of, say, 6 seconds, so that the player downloads a section at a time, reassembles them and plays them. Looking from the player’s perspective, if the network suddenly broke or reduced its throughput, it makes sense to have several chunks in reserve. Having three 6-second chunks, a sensible precaution, puts you 18 seconds behind the curve from the off.

Clearly reducing the segment size is a winner in this scenario. Three 3-second segments will give you just 9 seconds of latency; why not go to 1 second? Well, encoding inefficiency is one reason. If you reduce the amount of video a temporal codec has to work with, its efficiency will drop and the bitrate will increase to maintain quality. Jameson explains the other knock-on effects such as CDN inefficiencies and the extra network requests. The standardised way to avoid these problems is to use CMAF (Common Media Application Format), which is based on the ISO BMFF and is used by both MPEG DASH and HLS. CMAF, and DASH in general, has the benefit of coming from a standards body whose aim was to remove the vendor lock-in that may be felt with HLS and was certainly felt with RTMP. Check out MPEG’s short white paper on the topic (zipped .docx file)
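
To make the arithmetic concrete, here’s a minimal sketch in Python of how the player’s segment reserve turns into latency for different segment durations; the three-segment buffer is taken from the example above, and everything else is illustrative:

```python
# Rough illustration of the buffer arithmetic: the player's reserve of whole
# segments translates directly into how far behind live it sits.
def buffer_latency(segment_duration_s: float, buffered_segments: int = 3) -> float:
    return segment_duration_s * buffered_segments

for duration in (6, 3, 1):
    print(f"{duration}s segments, 3 in reserve -> {buffer_latency(duration):.0f}s behind live")
```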

CMAF uses chunked transfer, meaning that as the encoder writes the data to disk, the web server sends it to the client. This is different from the default, where a file is only sent after it has been completely written. The effect is that you no longer have to wait up to 6 seconds for a 6-second chunk to even start being sent, on top of which the download time would also have to be counted. Rather, almost as soon as a chunk has been finished by the encoder, it has arrived at the destination. This is a feature of HTTP/1.1 and later, so it is not new, but it still needs to be enabled and considered as part of the delivery.
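
For a feel of what chunked transfer actually looks like on the wire, here’s a minimal sketch of HTTP/1.1 chunked framing in Python. The ‘CMAF chunks’ are stand-in byte strings, and a real origin would normally leave this framing to its web server rather than hand-roll it:

```python
# Minimal sketch of HTTP/1.1 chunked transfer framing (RFC 7230): each piece is
# sent as soon as the encoder produces it, prefixed with its length in hex and
# terminated by CRLF; a zero-length chunk marks the end of the segment.
from typing import Iterable, Iterator

def chunked_encode(pieces: Iterable[bytes]) -> Iterator[bytes]:
    for piece in pieces:
        if piece:   # an empty piece would end the body prematurely
            yield f"{len(piece):X}\r\n".encode() + piece + b"\r\n"
    yield b"0\r\n\r\n"   # final chunk: tells the client the segment is complete

# Illustrative only: pretend these are CMAF chunks arriving from the encoder.
encoder_output = [b"moof+mdat#1", b"moof+mdat#2", b"moof+mdat#3"]
print(b"".join(chunked_encode(encoder_output)))
```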

CMAF goes beyond the simple HTTP/1.1 chunked transfer used in low-latency HLS, covered later, by creating extra structure within the 6-second segment (until now called a chunk in this article). This extra structure allows the segment to be downloaded in smaller chunks, decoupling the segment length from the player latency. Chunked transfer does, however, cause a notable problem which has not yet been conclusively solved. Jameson explains how each large segment traditionally arrives faster than real-time. By measuring how fast it arrives, given that the player knows the duration, the player can estimate the bandwidth available on the network at that moment. With chunked transfer, as we saw, we are receiving data as it’s being created. By definition, we are now getting it in real-time, so there is no opportunity to receive it any quicker. The bandwidth estimate, as shown in the presentation, is used to work out whether the player needs to go down, or could go up, to another stream at a different bitrate – part of standard ABR. So the catch here is that driving down the latency has hampered our ability to switch bitrates, and whilst the viewer can see the video close to real-time, who’s to say whether they are seeing it at the best quality?
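
A small sketch, with made-up numbers, shows why the classic estimate stops working once delivery becomes real-time:

```python
# Classic segment-based bandwidth estimate: bits received divided by time taken.
def estimate_kbps(segment_bytes: int, download_time_s: float) -> float:
    return segment_bytes * 8 / download_time_s / 1000

segment_bytes = 3_000_000   # a 6-second segment at ~4 Mbps (made-up numbers)

# Whole-segment download on a fast network: the segment arrives in 1.5s, well
# under its 6s duration, so the estimate (~16,000 kbps) reflects the network.
print(estimate_kbps(segment_bytes, download_time_s=1.5))

# Chunked transfer: the data only exists as it is encoded, so the "download"
# takes roughly the full 6 seconds and the estimate collapses to ~4,000 kbps,
# i.e. the stream's own bitrate, regardless of what the network could carry.
print(estimate_kbps(segment_bytes, download_time_s=6.0))
```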

Low-Latency HLS/DASH is a way of extending DASH and HLS without using CMAF. Jameson explains some of the techniques, such as advertising segments in advance to allow players to pre-request them. It also relies on finding the compromise point of encoding inefficiency vs segment length, typically held to be around 2 seconds, to minimise the latency. At this point we start seeing examples of the techniques in manifests and JavaScript, allowing us to understand how this is actually signalled and implemented.
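
As a rough, hypothetical sketch of the pre-request idea (the URL template and segment numbering are invented for illustration, not taken from the talk), a player that already knows an upcoming segment’s address can ask for it before it has finished being produced and consume it as it arrives:

```python
# Sketch of pre-requesting an advertised segment: work out the next segment's
# URL from a (hypothetical) template and ask for it before it has finished
# being written, consuming the body as the origin produces it.
import urllib.request

SEGMENT_TEMPLATE = "https://example.com/live/seg_{number}.m4s"   # made-up URL

def prefetch_segment(number: int, read_size: int = 16_384) -> bytes:
    data = bytearray()
    with urllib.request.urlopen(SEGMENT_TEMPLATE.format(number=number)) as resp:
        while piece := resp.read(read_size):   # body arrives as it is encoded
            data.extend(piece)                 # a real player would feed the decoder here
    return bytes(data)

# next_segment = prefetch_segment(1042)   # issue as soon as segment 1042 is advertised
```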

Apple is on its second major revision of LL-HLS, which has responded to many of the initial complaints from the community. Whilst it could use HTTP/2 push to help send segments out, this caused problems in practice, so it now uses preload hints, as Jameson explains, in order to remove round-trip times from requests. Jameson looks at the rest of Apple’s techniques and shows how they look in manifest files.

The final section looks at problems in implementing these features, such as chunks being fragmented across TCP packets, the bandwidth estimation question, and adjusting playback speed in order to move the player’s position in time – speed-ups and slow-downs of 5 to 10% can be possible depending on the content.
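
That playback-speed adjustment is commonly implemented as a small controller around the player’s distance from the live edge; the thresholds and rates below are illustrative rather than values from the talk:

```python
# Sketch of latency-driven playback-rate control: drift too far behind the
# live edge and play slightly faster; get too close (risking a stall) and
# play slightly slower. Thresholds and rates are illustrative only.
def playback_rate(current_latency_s: float,
                  target_latency_s: float = 3.0,
                  tolerance_s: float = 0.5,
                  max_adjustment: float = 0.10) -> float:
    if current_latency_s > target_latency_s + tolerance_s:
        return 1.0 + max_adjustment      # e.g. 1.10x to catch up
    if current_latency_s < target_latency_s - tolerance_s:
        return 1.0 - max_adjustment      # e.g. 0.90x to rebuild buffer
    return 1.0

print(playback_rate(4.2))   # -> 1.1, speed up to close the gap
print(playback_rate(2.1))   # -> 0.9, slow down to avoid running dry
print(playback_rate(3.2))   # -> 1.0, within tolerance
```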

Watch now!
Download the presentation
Speaker

Jameson Steiner
Software Engineer,
Bitmovin

Video: Bandwidth Prediction in Low-Latency Chunked Streaming

How can we overcome one of the last big problems in making CMAF generally available: making ABR work properly?

ABR, or Adaptive Bitrate, is a technique which allows a video player to choose which bitrate of video to download from a menu of several options. Typically, the highest bitrate will have the highest quality and/or resolution, with the smallest files being low resolution.

The reason a player needs the flexibility to choose the bitrate of the video is mainly changing network conditions. If someone else on your network starts watching some video, you may no longer be able to download video quickly enough to keep watching in full-quality HD and you may need to switch down. If they stop, you want your player to switch up again to make the most of the bitrate available.
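
A toy version of that decision, with a made-up bitrate ladder and a simple safety margin, might look like this:

```python
# Toy ABR decision: pick the highest rendition whose bitrate fits within the
# estimated bandwidth, leaving some headroom. Ladder and margin are made up.
LADDER_KBPS = [400, 1200, 2500, 4500, 8000]   # hypothetical renditions

def choose_rendition(estimated_kbps: float, safety_factor: float = 0.8) -> int:
    usable = estimated_kbps * safety_factor
    candidates = [rate for rate in LADDER_KBPS if rate <= usable]
    return max(candidates) if candidates else min(LADDER_KBPS)

print(choose_rendition(6000))   # -> 4500: room to go up
print(choose_rendition(1000))   # -> 400: switch down when bandwidth drops
```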

Traditionally this is done fairly simply by measuring how long each chunk of the video takes to download. Simply put, if you download a file, it will come to you as quickly as the network can carry it. So measuring how long each video chunk takes to reach you gives you an idea of how much bandwidth is available; if it arrives very slowly, you know you are close to running out of bandwidth. But in low-latency streaming, you are receiving video as quickly as it is produced, so it’s very hard to see any difference in download times, and this breaks the ABR estimation.

Making ABR work for low-latency is the topic covered by Ali in this talk at Mile High Video 2019 where he presents some of the findings from his recently published paper which he co-authored with, among others, Bitmovin’s Christian Timmerer and which won the DASH-IF Excellence in DASH award.

He starts by explaining how players currently behave with low-latency ABR showing how they miss out on changing to higher/lower renditions. Then he looks at the differences on the server and for the player between non-low-latency and low-latency streams. This lays the foundation to discuss ACTE – ABR for Chunked Transfer Encoding.

ACTE is a method of analysing bandwidth with the assumption that some chunks will be delivered as fast as the network allows and some won’t be. The trick is detecting which chunks actually show the network speed and Ali explains how this is done and shows the results of their evaluation.
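
ACTE itself is more sophisticated than this, but the underlying intuition of only trusting chunks that arrived faster than real-time can be sketched in a few lines; this is a simplification for illustration, not the algorithm from the paper:

```python
# Simplified sketch of the intuition only: treat a chunk as an informative
# bandwidth sample if it arrived clearly faster than real-time, i.e. it was
# limited by the network rather than by the encoder still producing it.
def informative_samples(chunks, realtime_ratio=0.8):
    """chunks: list of (bytes_received, download_time_s, media_duration_s)."""
    samples = []
    for size_bytes, dl_time, duration in chunks:
        if dl_time < duration * realtime_ratio:       # burst, not paced delivery
            samples.append(size_bytes * 8 / dl_time)  # bits per second
    return samples

chunks = [
    (250_000, 0.48, 0.50),   # paced at ~real-time: tells us little
    (250_000, 0.12, 0.50),   # delivered in a burst: reflects network speed
    (250_000, 0.50, 0.50),
]
print([f"{bps/1e6:.1f} Mbps" for bps in informative_samples(chunks)])
# Only the burst chunk contributes a usable bandwidth sample.
```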

Watch now!

Speaker

Ali C. Begen
Technical Consultant and
Computer Science Professor

Video: What’s the Deal with LL-HLS?

Low latency streaming was moving forward without Apple’s help – but they’ve published their specification now, so what does that mean for the community efforts that were already underway and, in some places, in use?

Apple is responsible for HLS, the most prevalent protocol for streaming video online today. In itself, it’s a great success story as HLS was ideal for its time. It relied on HTTP, a tried and trusted technology of the day, and the fact that it was file-based rather than a stream pushed from the origin was a key factor in its wide adoption.

As life has moved on and demands have shifted from “I’d love to see some video – any video – on the internet!” to “Why is my HD stream arriving after my flatmate’s TV?”, we see that HLS isn’t quite up to the task of low-latency delivery. Using pure HLS as originally specified, a latency of less than 20 seconds was an achievement.

Various methods were, therefore, employed to improve HLS. These ideas included cutting the duration of each piece of the video, introducing HTTP 1.1’s Chunked Transfer Encoding, early announcement of chunks and many others. Using these, and other, techniques, Low Latency HLS (LHLS) was able to deliver streams of 9 down to 4 seconds.

Come WWDC this year, Apple announced their specification for achieving low-latency streaming, which the community is calling ALHLS (Apple Low-latency HLS). There are notable differences between Apple’s approach and that already adopted by the community at large. Given the estimated 1.4 billion active iOS devices and the fact that Apple will use adherence to this specification to certify apps as ‘low latency’, this is something the community can’t ignore.

Zac Shenker from Comcast explains some of this backstory and helps us unravel what it means for us all. Zac first explains what LHLS is and then goes into detail on Apple’s version, which includes interesting, mandatory, elements like using HTTP/2. Using HTTP/2 and the newer QUIC (which will effectively become HTTP/3) is very tempting for streaming applications, but it requires work on both the server and the player side. Recent tests using QUIC have been, taken as a whole, inconclusive as to whether it has a positive or a negative impact on streaming performance; experiments have shown both results.

The talk is a detailed look at the large array of requirements in this specification. The conclusion is general surprise at the number of ‘moving parts’, given there is significant work to be done on the server as well as in the player. The server will have to remember state and, due to the use of HTTP/2, it’s not clear that the very small playlist .m3u8 files can be served from a playlist-optimised CDN separately from the video, as is often the case today.

There’s a whole heap of difference between serving a flood of large files and delivering a small, though continually updated, file to thousands of endpoints. As such, CDNs are currently optimised separately for the text playlists and the media files they serve; they may even be delivered by totally separate infrastructures.

Zac explains why this changes with LL-HLS, both in terms of that separation and in the frequency of updating the playlist files. He goes on to explore other open questions, like how easy it will be to integrate Server-Side Ad Insertion (SSAI) and even the appetite for adopting HTTP/2.
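
To give a feel for how a client asks for playlist updates under Apple’s scheme, here’s a hedged sketch of a Blocking Playlist Reload request using the _HLS_msn and _HLS_part query parameters from Apple’s specification; the host and numbers are made up:

```python
# Sketch of an LL-HLS Blocking Playlist Reload request: the client asks the
# server to hold the response until media sequence number 273, part 2 exists,
# rather than repeatedly polling the playlist. Host and numbers are made up.
import urllib.parse

def blocking_playlist_url(base: str, next_msn: int, next_part: int) -> str:
    query = urllib.parse.urlencode({"_HLS_msn": next_msn, "_HLS_part": next_part})
    return f"{base}?{query}"

print(blocking_playlist_url("https://example.com/live/playlist.m3u8", 273, 2))
# A real player would GET this URL; the origin (or CDN) blocks the response
# until that part exists, then returns the freshly updated playlist.
```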

Watch now!
Speaker

Zac Shenker
Director of Engineering, Video Experience & Optimization,
CBS Interactive

Video: Using CMAF to Cut Costs, Simplify Workflows & Reduce Latency

There are two ways to stream video online: either push from the server to the device, as WebRTC, MPEG transport streams and similar technologies do, or let the receiving device request chunks of the stream, which is how the majority of internet streaming is done – using HLS and similar formats.

Chunk-based streaming is generally seen as the more scalable of the two methods, but it suffers extra latency due to buffering several chunks, each of which can represent between 1 and, typically, 10 seconds of video.

CMAF is one technology here to change that by allowing players to buffer less video. How does it achieve this? And, perhaps more importantly, can it really cut costs? Iraj Sodagar from NexTreams is here to explain how in this talk from Streaming Media West 2018.

Iraj covers:

  • A brief history of CMAF (Common Media Application Format)
  • The core technologies (ISO BMFF, Codecs, captions etc.)
  • Media Data Object (Chunks, Fragments, Segments) – see the sketch after this list
  • Different ways of video delivery
  • Switching Sets (for ABR)
  • Content Protection
  • CTA WAVE project
  • WAVE content specifications
  • Live Linear Content with WAVE & CMAF
  • Low-latency CMAF usage
  • HTTP 1.1 Chunked Transfer Encoding
  • MPEG DASH
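
To give a feel for the chunk/fragment/segment structure mentioned in the list above, here’s a hedged sketch that walks the top-level ISO BMFF boxes of a CMAF segment on disk; the file name is invented and real files can contain box types this doesn’t handle:

```python
# Sketch: walk the top-level ISO BMFF boxes of a CMAF segment. Each CMAF chunk
# appears as a moof box (fragment metadata) followed by an mdat box (media data);
# a CMAF fragment or segment is simply a run of such pairs. File name is made up.
import struct

def list_boxes(path: str):
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)   # 32-bit size + fourcc
            if size < 8:
                break   # size 0 (to end of file) or 1 (64-bit size) not handled here
            boxes.append((box_type.decode("ascii", "replace"), size))
            f.seek(size - 8, 1)   # size includes the 8-byte header; skip the body
    return boxes

print(list_boxes("segment_0001.m4s"))   # hypothetical file name
# Illustrative output for a chunked segment:
# [('styp', 24), ('moof', 612), ('mdat', 95432), ('moof', 604), ('mdat', 91208), ...]
```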

Watch now!

Speaker

Iraj Sodagar
Independent Consultant
Multimedia System Architect, NexTreams