Low latency can be a differentiator for a live streaming service, or just a way to ensure you’re not beaten to the punch by social media or broadcast TV. Either way, it’s seen as increasingly important for live streaming to be punctual, breaking from the past where latencies of thirty to sixty seconds were not uncommon. As the industry has matured and connectivity has gained enough capacity for video, simply getting motion on the screen isn’t enough anymore.
Steve Heffernan from Mux takes us through the thinking about how we can deliver low latency video both into the cloud and out to the viewers. He starts by talking about the use cases for sub-second latency – anything with interaction/conversations – and how that’s different from low-latency streaming which is one-to-many, potentially very large scale distribution. If you’re on a video call with ten people, then you need sub-second latency or else the conversation will suffer. But when distributing to thousands or millions of people, the extra risk of rebuffering that comes with operating at sub-second latency isn’t worth it, and usually 3 seconds is perfectly fine.
Steve talks through the low-latency delivery chain starting with the camera and encoder, then looking at the contribution protocol. RTMP is still often the only option, but increasingly it’s possible to use WebRTC or SRT, the latter usually being the best for streaming contribution. Once the video has hit the streaming infrastructure, be that in the cloud or otherwise, it’s time to look at how to build the manifest and send the video out. Steve talks us through the options of Low-Latency HLS (LHLS), CMAF DASH and Apple’s LL-HLS. Do note that since the talk, Apple removed the requirement for HTTP/2 push.
The talk finishes off with Steve looking at the players. If you don’t get the player’s logic right, you can start off much farther behind than necessary. This is becoming less of a problem now as players are starting to ‘bend time’ by speeding up and slowing down to bring their latency within a certain target range. But this only underlines the importance of the quality of your player implementation.
Jameson Steiner from Bitmovin starts by explaining why there is a motivation to cut the latency. One big motivation, aside from the standard live sports examples, is user-generated content like on Twitch where it’s very clear to the streamer, and quite off-putting, when there are large amounts of delay. Whilst delay can be adapted to, the more there is, the less interaction is possible. In this situation, it’s the ‘handwaving’ latency that comes into play. You want the hand on the screen to wave pretty much at the same time as your hand waves in front of the camera. Jameson places different types of distribution on a chart showing latency and we see that low latency of 5 seconds or less will not only match traditional TV broadcasts, but also work well for live streamers.
Naturally, to fix a problem you need to understand the problem, so Jameson breaks down the legacy methods of delivery to show why the latency exists. The issue comes down to how video is split into sections, say 6 seconds, so that the player downloads a section at a time, reassembles and plays them. Looking from the player’s perspective, if the network suddenly broke or reduced its throughput, it makes sense to have several chunks in reserve. Having three 6-second chunks, a sensible precaution, makes you 18 seconds behind the curve from the off.
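The arithmetic behind that figure can be sketched in a few lines. This is a deliberately simplified model which assumes the player holds a fixed number of whole segments before starting and ignores encoding, packaging and network delays; the function name is illustrative only.

```python
def buffer_latency(segment_seconds: float, buffered_segments: int = 3) -> float:
    """Latency contributed by the player's segment buffer alone, in seconds."""
    if segment_seconds <= 0 or buffered_segments < 0:
        raise ValueError("segment duration must be positive and buffer count non-negative")
    return segment_seconds * buffered_segments


# Compare a few segment durations with the same three-segment buffer
for duration in (6, 3, 1):
    print(f"{duration}s segments -> {buffer_latency(duration):g}s behind live")
```

The model makes the lever obvious: with the buffer depth fixed, latency scales directly with segment duration.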
Clearly reducing the segment size is a winner in this scenario. Three 3-second segments will give you just 9 seconds latency; why not go to 1 second? Well, encoding inefficiency is one reason. If you reduce the length of video a temporal codec has to work with, its efficiency will drop and the bitrate will increase to maintain quality. Jameson explains the other knock-on effects such as CDN inefficiencies and network requests. The standardised way to avoid these problems is to use CMAF (Common Media Application Format) which is based on MPEG DASH and ISO BMFF. CMAF, and DASH in general, has the benefit of coming from a standards body whose aim was to remove vendor lock-in that may be felt with HLS and was certainly felt with RTMP. Check out MPEG’s short white paper on the topic (zipped .docx file).
CMAF uses chunked transfer, meaning that as the encoder writes the data to the disk, the web server sends it to the client. This is different to the default where a file is only sent after it’s been completely written. The effect is that there is no longer a wait of up to 6 seconds before a 6-second chunk starts being sent, a wait to which the download time would also need to be added. Rather, almost as soon as the chunk has been finished by the encoder, it’s arrived at the destination. This is a feature of HTTP/1.1 and later, so is not new, but it still needs to be enabled and considered as part of the delivery.
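On the wire, HTTP/1.1 chunked transfer encoding frames each piece of the body with its length in hexadecimal, which is what lets a server start sending before it knows the total size. The following is a minimal sketch of that framing only, not a web server; the function name and the sample payloads are made up for illustration.

```python
from typing import Iterable, Iterator


def chunked_encode(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    for piece in pieces:
        if piece:  # skip empty pieces: a zero-length chunk would end the body early
            yield f"{len(piece):x}".encode("ascii") + b"\r\n" + piece + b"\r\n"
    yield b"0\r\n\r\n"  # zero-length chunk terminates the body


# Hypothetical encoder output arriving piece by piece
body = b"".join(chunked_encode([b"moof+mdat #1", b"moof+mdat #2"]))
print(body)
```

Because the function is a generator, each frame can be handed to the socket as soon as the encoder produces the corresponding piece.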
CMAF goes beyond simple HTTP/1.1 chunked transfer, which is a technique used in low-latency HLS, covered later, by creating extra structure within the 6-second segment (until now, called a chunk in this article). This extra structure allows the segment to be downloaded in smaller chunks, decoupling the segment length from the player latency. Chunked transfer does, however, cause a notable problem which has not yet been conclusively solved. Jameson explains how, traditionally, each large segment arrives faster than realtime. By measuring how fast it arrives, given the player knows the duration, it can estimate the bandwidth available at that time on the network. With chunked transfer, as we saw, we are receiving data as it’s being created. By definition, we are now getting it in realtime so there is no opportunity to receive it any quicker. The bandwidth estimation element, as shown in the presentation, is used to work out if the player needs to go down or could go up to another stream at a different bitrate – part of standard ABR. So the catastrophe here is that the reduction in latency has hampered our ability to switch bitrates and, whilst the viewer can see the video close to real-time, who’s to say if they are seeing it at the best quality?
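A small numerical sketch shows why the naive estimate breaks down. The bitrate and download time below are invented for illustration, and the estimator is the simplest possible one (bits received over wall-clock time) rather than what any particular player implements.

```python
def estimate_bandwidth_bps(size_bytes: int, download_seconds: float) -> float:
    """Naive throughput estimate: bits received divided by wall-clock download time."""
    if download_seconds <= 0:
        raise ValueError("download time must be positive")
    return size_bytes * 8 / download_seconds


SEGMENT_SECONDS = 6
ENCODED_BPS = 3_000_000                             # assumed stream bitrate
SEGMENT_BYTES = ENCODED_BPS * SEGMENT_SECONDS // 8  # size of one 6-second segment

# Whole-segment download: the file already exists, so it arrives as fast as the network allows
# (1.5 seconds is an assumed figure).
whole = estimate_bandwidth_bps(SEGMENT_BYTES, download_seconds=1.5)

# Chunked transfer: bytes arrive as they are encoded, so the download lasts the segment duration.
chunked = estimate_bandwidth_bps(SEGMENT_BYTES, download_seconds=SEGMENT_SECONDS)

print(f"whole-segment estimate: {whole / 1e6:g} Mbps")
print(f"chunked estimate:       {chunked / 1e6:g} Mbps")
```

With chunked transfer the estimate collapses to the encoded bitrate itself, so it says nothing about whether there is headroom to switch up.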
Apple is on its second major revision of LL-HLS, which has responded to many of the initial complaints from the community. Whilst it originally used HTTP/2 to help push segments out, this caused problems in practice, so it now uses preload hints, as Jameson explains, in order to remove round-trip times from requests. Jameson looks at the rest of Apple’s techniques and shows how they look in manifest files.
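To give a feel for how this appears in a manifest, here is an abbreviated, made-up media playlist using the LL-HLS tags for partial segments (`EXT-X-PART`) and preload hints (`EXT-X-PRELOAD-HINT`), with a small helper that pulls out the URIs. The file names and durations are hypothetical and the playlist omits tags a real one would carry; treat it as a sketch rather than a conformant example.

```python
import re
from typing import List

# Abbreviated, hypothetical LL-HLS media playlist
PLAYLIST = """#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXT-X-MEDIA-SEQUENCE:266
#EXTINF:4.00008,
fileSequence266.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart267.0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart267.1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart267.2.mp4"
"""

URI_ATTRIBUTE = re.compile(r'URI="([^"]*)"')


def uris_for(tag: str, playlist: str) -> List[str]:
    """Collect the URI attribute from every line that starts with '#<tag>:'."""
    uris = []
    for line in playlist.splitlines():
        if line.startswith(f"#{tag}:"):
            match = URI_ATTRIBUTE.search(line)
            if match:
                uris.append(match.group(1))
    return uris


parts = uris_for("EXT-X-PART", PLAYLIST)          # parts already available
hints = uris_for("EXT-X-PRELOAD-HINT", PLAYLIST)  # part the player may request early
print(parts, hints)
```

The hinted part does not exist yet when the playlist is published; a player can request it straight away and the server responds once the data is ready, which removes a round trip.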
The final section looks at problems in implementing these features such as chunks being fragmented across TCP packets, the bandwidth estimation question and dealing with playback speed in order to adjust the player’s position in time – speed-ups and slow-downs of 5 to 10% can be possible depending on content.
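The playback-speed adjustment can be sketched as a simple controller. Everything here is an assumption made for illustration: the 3-second target, the half-second tolerance, the gain and the 5% cap are not taken from the talk or from any specific player.

```python
def catch_up_rate(latency_s: float, target_s: float = 3.0, tolerance_s: float = 0.5,
                  gain: float = 0.02, max_change: float = 0.05) -> float:
    """Return a playback rate that nudges the measured latency back towards the target."""
    error = latency_s - target_s
    if abs(error) <= tolerance_s:
        return 1.0  # inside the target range: play at normal speed
    # Adjustment proportional to the error, capped at +/- max_change
    adjustment = max(-max_change, min(max_change, error * gain))
    return 1.0 + adjustment


for latency in (3.2, 8.0, 1.0):
    print(f"latency {latency}s -> playback rate {catch_up_rate(latency):.2f}")
```

A player too far behind live speeds up slightly, one too close to the live edge slows down to rebuild its buffer, and the cap keeps the change small enough to go unnoticed on most content.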
With his usual entertaining vigour, Will Law explains the differences between the three approaches to low-latency streaming: DASH, LHLS and LL-HLS from Apple. Likening them partly to religions that all get you to the same end, we see how they differ and some of the reasons for that.
Please note: Since this video was recorded, Apple has released a new draft of LL-HLS. As described in this great article from Mux, the update’s changes are:
“Delivering shorter sub-segments of the video stream (Apple call these parts) more frequently (every 0.3 – 0.5s)
Using HTTP/2 PUSH to deliver these smaller parts, pushed in response to a blocking playlist request
Blocking playlist requests, eliminating the current speculative manifest request polling behaviour in HLS
Smaller, delta rendition playlists, which reduces playlist size, which is important since playlists are requested more frequently
Faster rendition switching, enabled by rendition reports, which allows clients to see what is happening in another playlist without requesting it in its entirety”
Read the full article for the details and implications, some of which address some points made in the talk.
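As an illustration of the blocking playlist request in the list above, the client tells the server which media sequence number and part it is waiting for using the `_HLS_msn` and `_HLS_part` query parameters, and the server holds the response until a playlist containing that part exists. A minimal sketch of building such a URL follows; the helper name and example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def blocking_reload_url(playlist_url: str, next_msn: int, next_part: int) -> str:
    """Append the LL-HLS blocking-reload parameters to a media playlist URL."""
    pieces = urlsplit(playlist_url)
    query = parse_qsl(pieces.query, keep_blank_values=True)  # keep any existing parameters
    query += [("_HLS_msn", str(next_msn)), ("_HLS_part", str(next_part))]
    return urlunsplit(pieces._replace(query=urlencode(query)))


print(blocking_reload_url("https://example.com/live/video.m3u8", 267, 2))
```

This replaces repeated polling with a single request that completes when there is something new to fetch.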
Anyone who saw last year’s Chunky Monkey video will recognise Will’s near-Oscar-winning animation style as he sets the scene, explaining the contenders for the low-latency streaming crown.
We then look at a bullet list of features across each of the three low latency technologies (note Apple’s recent update) which leads on to a discussion on chunked transfer delivery and the challenges of line-rate delivery. A simple view of the universe would say that the ideal way to have a live stream, encoded at a constant bitrate, would be to stream it constantly at that bitrate to the receiver. Whilst this is, indeed, the best way to go, when we stream we’re also keeping one eye on whether we need to change the bitrate. If we get more bandwidth available it might be best to upgrade to a better quality and if we suddenly have contested, slow wifi, it might be time for an emergency drop down to the lowest bitrate stream.
When you are delivered a stream as individual files, you can measure how long they take to download to estimate your available bandwidth. If a file can be downloaded at 1Gbps, then it should always arrive at 1Gbps. Therefore if it arrives at less than 1Gbps we know that there is a bandwidth restriction and can make adjustments. Will explains that for streams delivered with chunked transfer or in real time such as in LL-HLS, this estimation no longer works as the files simply are never available at 1Gbps. He then explains some of the work that has been undertaken to develop more nuanced ways of estimating available bandwidth. It’s well worth noting that the smaller the files you transfer, the less accurate the bandwidth estimation as TCP takes time to speed up to line rate so small 320ms-length video segments are not ideal for maximising throughput.
Continuing to look at the differences, we next look at request rates with DASH at 20 requests per second compared to LL-HLS at 720. This leads naturally to an analysis of the benefits of HTTP/2 PUSH technology used in LL-HLS and the savings that can offer. Will explores the implications, and some of the problems, with last year’s version of the LL-HLS spec, some of which have been mitigated since.
The talk concludes with some work Akamai has done to try and establish a single, common workflow with examples and a GitHub repository. Will shows how this works and the limitations of the approach and finishes with a look at the commonalities in approaches.
Low latency streaming was moving forward without Apple’s help – but they’ve published their specification now, so what does that mean for the community efforts that were already underway and, in some places, in use?
Apple is responsible for HLS, the most prevalent protocol for streaming video online today. In itself, it’s a great success story as HLS was ideal for its time. It relied on HTTP, which was a tried and trusted technology of the day, and the fact it was file-based instead of a stream pushed from the origin was a key factor in its wide adoption.
As life has moved on and demands have moved from “I’d love to see some video – any video – on the internet!” to “Why is my HD stream arriving after my flat mate’s TV’s?” we see that HLS isn’t quite up to the task of low-latency delivery. Using pure HLS as originally specified, a latency of less than 20 seconds was an achievement.
Various methods were, therefore, employed to improve HLS. These ideas included cutting the duration of each piece of the video, introducing HTTP/1.1’s Chunked Transfer Encoding, early announcement of chunks and many others. Using these, and other, techniques, Low Latency HLS (LHLS) was able to deliver streams with latencies of 9 down to 4 seconds.
Come WWDC this year, Apple announced their specification for achieving low latency streaming, which the community is calling ALHLS (Apple Low-latency HLS). There are notable differences between Apple’s approach and that already adopted by the community at large. Given the estimated 1.4 billion active iOS devices and the fact that Apple will use adherence to this specification to certify apps as ‘low latency’, this is something that the community can’t ignore.
Zac Shenker from Comcast explains some of this backstory and helps us unravel what this means for us all. Zac first explains what LHLS is and then goes into detail on Apple’s version, which includes interesting, mandatory elements like using HTTP/2. Using HTTP/2 and the newer QUIC (which will become effectively HTTP/3) is very tempting for streaming applications but it requires work both on the server and the player side. Recent tests using QUIC have been, when taken as a whole, inconclusive in terms of working out whether it has a positive or a negative impact on streaming performance; experiments have shown both results.
The talk is a detailed look at the large array of requirements in this specification. The conclusion is a general surprise at the number of ‘moving parts’ given there is significant work to be done on both the server and the player. The server will have to remember state and, due to the use of HTTP/2, it’s not clear that the very small playlist.m3u8 files can be served from a playlist-optimised CDN separately from the video as is often the case today.
There’s a whole heap of difference between serving a flood of large files and delivering a small, though continually updated, file to thousands of endpoints. As such, CDNs are currently optimised separately for the text playlists and the media files they serve. They may even be delivered by totally separate infrastructures.
Zac explains why this changes with LL-HLS both in terms of separation but also in the frequency of updating the playlist files. He goes on to explore the other open questions like how easy it will be to integrate Server-Side Ad Insertion (SSAI) and even the appetite for adoption of HTTP/2.