Video: Tech Talks: Low-Latency Live Streaming

There are a number of techniques for achieving low-latency streaming. This talk is one of the few which introduces them in easy-to-understand ways and then puts them in context, briefly showing the manifests or javascript examples of how these would be seen in the wild. Whilst there are plenty of companies who don't need low-latency streaming, for many it's a key part of their offering or of the business model itself. Knowing the techniques in play helps you better understand internet streaming in general.

Jameson Steiner from Bitmovin starts by explaining the motivation to cut latency. One big motivation, aside from the standard live sports examples, is user-generated content like that on Twitch, where it's very clear to the streamer, and quite off-putting, when there is a large amount of delay. Whilst delay can be adapted to, the more there is, the less interaction is possible. In this situation, it's the 'handwaving' latency that comes into play: you want the hand on the screen to wave at pretty much the same time as your hand waves in front of the camera. Jameson places different types of distribution on a chart of latency, showing that low latency of 5 seconds or less will not only match traditional TV broadcasts but also work well for live streamers.

Naturally, to fix a problem you need to understand it, so Jameson breaks down the legacy methods of delivery to show why the latency exists. The issue comes down to how video is split into sections, say 6 seconds each, so that the player downloads a section at a time, reassembles and plays them. Looking from the player's perspective, if the network suddenly fails or its throughput drops, it makes sense to have several chunks in reserve. Holding three 6-second chunks, a sensible precaution, makes you 18 seconds behind the curve from the off.
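
As a back-of-the-envelope illustration (a hypothetical calculation, not code from the talk), the buffer-induced part of the latency is simply the segment duration multiplied by the number of segments held in reserve:

```typescript
// Rough baseline latency from player buffering alone (illustrative figures).
// Real players add encode, packaging, CDN and decode delay on top of this.
function bufferLatencySec(segmentDurationSec: number, segmentsBuffered: number): number {
  return segmentDurationSec * segmentsBuffered;
}

console.log(bufferLatencySec(6, 3)); // 18s behind the live edge from the off
```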

Clearly, reducing the segment size is a winner in this scenario: three 3-second segments give you just 9 seconds of latency, so why not go to 1 second? Encoding inefficiency is one reason: if you reduce the amount of video a temporal codec has to work with, its efficiency drops and the bitrate increases to maintain quality. Jameson explains the other knock-on effects, such as CDN inefficiencies and the increased number of network requests. The standardised way to avoid these problems is to use CMAF (Common Media Application Format), which is based on MPEG DASH and ISO BMFF. CMAF, and DASH in general, has the benefit of coming from a standards body whose aim was to remove the vendor lock-in that may be felt with HLS and was certainly felt with RTMP. Check out MPEG's short white paper on the topic (zipped .docx file).

CMAF uses chunked transfer, meaning that as the encoder writes the data to disk, the web server sends it to the client. This differs from the default, where a file is only sent after it has been completely written. The effect is that you no longer wait up to 6 seconds for a 6-second chunk to even start being sent, with the download time added on top; rather, almost as soon as the encoder finishes a chunk, it has arrived at the destination. This is a feature of HTTP 1.1 onwards, so it's not new, but it still needs to be enabled and considered as part of the delivery.
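
To make the idea concrete, here's a minimal sketch of a chunked-transfer origin, assuming a Node.js environment (the file name is made up, and a production origin would also keep tailing the file as the encoder appends to it):

```typescript
import { createServer } from "node:http";
import { createReadStream } from "node:fs";

// Minimal sketch: start sending a segment while the encoder is still
// writing it. With no Content-Length set, Node falls back to
// Transfer-Encoding: chunked, so bytes flow as soon as they exist.
createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "video/mp4" });
  // 'growing-segment.m4s' is hypothetical; this reads up to the current
  // end of file, whereas a real origin would follow subsequent writes too.
  createReadStream("growing-segment.m4s").pipe(res);
}).listen(8080);
```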

CMAF goes beyond the simple HTTP 1.1 chunked transfer used in low-latency HLS, covered later, by creating extra structure within the 6-second segment (until now called a chunk in this article). This extra structure allows the segment to be downloaded in smaller chunks, decoupling the segment length from the player latency. Chunked transfer does, however, cause a notable problem which has not yet been conclusively solved. Jameson explains how, traditionally, each large segment typically arrives faster than real-time; by measuring how fast it arrives, given the player knows the duration, the player can estimate the bandwidth available on the network at that moment. With chunked transfer, as we saw, we receive data as it's being created. By definition, we are now getting it in real-time, so there is no opportunity to receive it any quicker. That bandwidth estimate, as shown in the presentation, is what the player uses to work out whether it needs to move down, or could move up, to a stream at a different bitrate – part of standard ABR. So the catch here is that driving down the latency has hampered our ability to switch bitrates, and whilst the viewer can see the video close to real-time, who's to say whether they are seeing it at the best quality?
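
To see why chunked delivery breaks the measurement, consider a simplified version of the classic throughput estimate (an illustrative sketch, not the player code from the presentation):

```typescript
// Classic ABR throughput estimate: a whole segment arrives in one burst,
// so download time reflects the network's spare capacity.
function estimateKbps(segmentBytes: number, downloadSeconds: number): number {
  return (segmentBytes * 8) / downloadSeconds / 1000;
}

// A 6-second, 3 MB segment fetched in 1.5s implies ~16 Mbps available.
console.log(estimateKbps(3_000_000, 1.5)); // 16000 kbps

// With chunked transfer the same segment trickles in over ~6s of
// wall-clock time because it's produced in real time, so the formula
// only ever reports the encoded bitrate, however fast the network is.
console.log(estimateKbps(3_000_000, 6)); // 4000 kbps
```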

Low-Latency HLS/DASH is a way of extending DASH and HLS without using CMAF. Jameson explains techniques such as advertising segments in advance to allow players to pre-request them. It also relies on finding the compromise point between encoding inefficiency and segment length, typically held to be around 2 seconds, to minimise the latency. At this point we start seeing examples of the techniques in manifests and javascript, allowing us to understand how this is actually signalled and implemented.
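
As a rough sketch of the pre-request idea (the URL template is hypothetical, modelled on DASH-style $Number$ addressing), a player can derive the next segment's URL from the numbering scheme and ask for it before the segment is complete:

```typescript
// Sketch: with template addressing, the player can compute future
// segment URLs instead of waiting for a manifest update.
const template = "https://example.com/live/seg_$Number$.m4s"; // hypothetical
let nextSegment = 101; // segment number at the live edge

async function preRequestNext(): Promise<ArrayBuffer> {
  const url = template.replace("$Number$", String(nextSegment++));
  // The server holds the request open and streams the segment as it
  // is produced, rather than returning a 404 for a not-yet-ready file.
  const response = await fetch(url);
  return response.arrayBuffer();
}
```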

Apple is on its second major revision of LL-HLS, which has responded to many of the initial complaints from the community. Whilst the original relied on HTTP/2 to push segments out, this caused problems in practice, so it now uses preload hints which, as Jameson explains, remove round-trip times from requests. Jameson looks at the rest of Apple's techniques and shows how they appear in manifest files.
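
For a flavour of what a preload hint looks like, here's a sketch of a player acting on one (the tags follow Apple's published LL-HLS spec; the playlist excerpt and URLs are made up):

```typescript
// Abbreviated, hypothetical LL-HLS playlist excerpt.
const playlist = `
#EXT-X-PART:DURATION=0.333,URI="seg100.part3.m4s"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="seg100.part4.m4s"
`;

// Issue the hinted request immediately: the server answers the moment
// the next partial segment exists, saving a full round-trip time.
const hint = playlist.match(/#EXT-X-PRELOAD-HINT:.*URI="([^"]+)"/);
if (hint) {
  fetch(new URL(hint[1], "https://example.com/live/")); // hypothetical base URL
}
```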

The final section looks at problems in implementing these features, such as chunks being fragmented across TCP packets, the bandwidth estimation question, and adjusting playback speed to control the player's position in time – speed-ups and slow-downs of 5 to 10% can be possible depending on content.
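
The speed adjustment itself is typically a small nudge on the media element; a minimal sketch in the browser (the target latency and thresholds are invented for illustration):

```typescript
// Sketch: nudge playback rate to hold a target live latency.
// Changes of 5-10% are usually imperceptible, depending on content.
function adjustRate(video: HTMLVideoElement, currentLatencySec: number): void {
  const targetSec = 3;      // hypothetical target live latency
  const toleranceSec = 0.5; // dead zone to avoid constant rate changes

  if (currentLatencySec > targetSec + toleranceSec) {
    video.playbackRate = 1.05; // speed up to catch the live edge
  } else if (currentLatencySec < targetSec - toleranceSec) {
    video.playbackRate = 0.95; // slow down to rebuild buffer
  } else {
    video.playbackRate = 1.0;
  }
}
```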

Watch now!
Download the presentation
Speaker

Jameson Steiner
Software Engineer,
Bitmovin

Video: Reducing peak bandwidth for OTT

‘Flattening the curve’ isn't just about dealing with viruses, we learn from Will Law. Rather, it's one way to deal with the network congestion brought on by the rise in broadband use during the global lockdown. This, along with other key techniques such as per-title encoding and removing the top tier, is explored in this video from Akamai and Bitmovin.

Will starts the talk by explaining why congestion happens in a world where ABR (adaptive bitrate streaming) is supposed to deal with it. With Akamai's traffic up by around 300%, it's perhaps no surprise there's a contest for bandwidth. As not all traffic is a video stream, congestion will still happen when fighting with other, static, data transfers. Deeper than that, though, even with two ABR streams the congestion protocol in use has a big impact, as Will shows with a graph comparing Akamai's FastTCP and BBR, where BBR steals all the bandwidth rather than 'playing fair'.

Using a webpage constructed for the video, Will shows us a baseline video playback and the metrics associated with it, such as data transferred and bitrate, which he uses to demonstrate the benefits of the different bitrate-reduction techniques. The first is covered by Bitmovin's Sean McCarthy, who explains Bitmovin's per-title encoding technology. This approach ensures that each asset has encoder settings tuned to get the best out of the content whilst reducing bandwidth, as opposed to simply setting your encoder to a fairly high, safe, static bitrate for all content no matter how complex it is. Will shows on the demo that the bitrate reduces by over 50%.
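
In spirit, per-title encoding replaces one static ladder with a per-asset decision. A toy sketch (the complexity score and the bitrate numbers are invented purely for illustration):

```typescript
// Toy sketch of the per-title idea: easy-to-encode content gets a lower
// top bitrate than a one-size-fits-all "safe" static ladder would.
interface Rendition { height: number; kbps: number; }

function ladderFor(complexity: number): Rendition[] {
  // complexity: 0 (static slides) .. 1 (confetti in the rain) - hypothetical
  const topKbps = 2000 + complexity * 6000; // vs a flat, safe 8000 kbps
  return [1080, 720, 480].map((height, i) => ({
    height,
    kbps: Math.round(topKbps / 2 ** i),
  }));
}

console.log(ladderFor(0.2)); // a talking head tops out around 3200 kbps
```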

Swapping codecs is an obvious way to reduce bandwidth. Unlike per-title encoding, which is transparent to the end user, using AV1, VP9 or HEVC requires support on the final device. Whilst you could offer multiple versions of your assets to make sure you still cover all your players despite fragmentation, this has the downside of extra encoding costs and time.
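
Device support is usually probed at startup; a minimal sketch in the browser (the codec strings follow RFC 6381, and the preference order is just one possible design choice):

```typescript
// Sketch: pick the most efficient codec the device can actually decode,
// falling back to H.264, which is supported near-universally.
const candidates = [
  'video/mp4; codecs="av01.0.05M.08"',   // AV1
  'video/mp4; codecs="hvc1.1.6.L93.B0"', // HEVC
  'video/webm; codecs="vp9"',            // VP9
  'video/mp4; codecs="avc1.64001f"',     // H.264 fallback
];

const playable = candidates.find((c) => MediaSource.isTypeSupported(c));
console.log(`Selecting: ${playable}`);
```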

Will then looks at three ways to reduce bandwidth by stopping the highest-bitrate rendition from being used. Method one is to manually modify the manifest file (sketched below). Method two demonstrates how to do the same using the Bitmovin player API, and method three uses the CDN itself to manipulate the manifests. The advantage of doing this in the CDN is the extra flexibility: you can use geolocation rules, for example, to deliver different manifests to different locations.
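
A sketch of method one against an HLS master playlist (the attribute layout follows the HLS spec, but this is an illustration rather than Will's code):

```typescript
// Sketch: drop the highest-bandwidth variant from an HLS master playlist.
function removeTopRendition(master: string): string {
  const lines = master.split("\n");
  // Find the #EXT-X-STREAM-INF tag with the largest BANDWIDTH attribute.
  let topIdx = -1;
  let topBw = -1;
  lines.forEach((line, i) => {
    const m = line.match(/#EXT-X-STREAM-INF:.*BANDWIDTH=(\d+)/);
    if (m && Number(m[1]) > topBw) {
      topBw = Number(m[1]);
      topIdx = i;
    }
  });
  // Remove both the tag line and the variant URI line that follows it.
  return topIdx < 0
    ? master
    : lines.filter((_, i) => i !== topIdx && i !== topIdx + 1).join("\n");
}
```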

The final method to reduce peak bandwidth is to use the CDN to throttle the download speed of the stream chunks. This means that whilst you may – if you are lucky – have the ability to download at 100Mbps, the CDN only delivers at 3 or 5 times the real-time bitrate. This goes a long way to smoothing out the peaks, which is better for the end user's equipment and for the CDN. Seen in isolation, this does very little, as the video bitrate and the data transferred remain the same. However, delivering the video in this much more co-operative way is much less likely to cause knock-on problems for other traffic. It can, of course, be used in conjunction with the other techniques. The video concludes with a Q&A.
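
The throttle is expressed as a multiple of the encoded bitrate rather than an absolute cap; a hypothetical calculation shows the scale of the effect:

```typescript
// Sketch: cap delivery at a multiple of real time rather than line speed.
const renditionKbps = 5000; // encoded bitrate of the chunk being served
const throttleFactor = 3;   // deliver at 3x real time (5x is also used)

const capKbps = renditionKbps * throttleFactor; // 15,000 kbps ceiling
// A 6-second chunk (30,000 kbits) now takes ~2s to arrive instead of
// a ~0.3s burst on a 100 Mbps line, flattening the peaks the CDN and
// the viewer's equipment have to absorb.
console.log(`Delivery cap: ${capKbps} kbps`);
```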

Watch now!
Speakers

Will Law
Chief Architect,
Akamai
Sean McCarthy
Technical Product Marketing Manager,
Bitmovin

Video: How CBS Sports Digital streams live events at scale

Delivering at high scale in streaming really exposes the weaknesses of every point of your workflow, so even for those of us who are not streaming at maximum scale, there are many lessons to be learnt. CBS Sports Digital delivered the Super Bowl using the principles of 'practice, practice, practice', keeping the solution as simple as possible, and prioritising mitigation of problems over solving them.

Taylor Busch walks us through their solution, explaining how it supported these key principles and highlighting the technology used. Starting with acquisition, he covers the SDI fibre delivery to a backup facility as well as the AWS Direct Connect links for their Elemental Live encoders. The origin servers were in two different regions and both received data from both sets of encoders.

CBS used ‘output locking’, which ensures the TS segments are aligned even across different encoders; this is done by respecting the timecode in the SDI and helps in encoder failover situations. QVBR encoding is a method of encoding up to a quality level rather than simply saying ‘7000 kbps’. QVBR provides a more efficient use of bandwidth since, in scenes which don't require a lot of bandwidth, less is sent. This variability, even if you run in capped mode to limit the bandwidth of particularly complex scenes, can look like a failing encoder to some systems, so the fact the feed is now effectively VBR needs to be understood by all the departments and companies who monitor it.
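
For context, capped QVBR is a rate-control setting on the encoder; a sketch of what it might look like in an AWS Elemental-style configuration (the field names follow MediaLive's H.264 settings, but the values here are assumptions, not CBS's actual figures):

```typescript
// Sketch of capped QVBR in a MediaLive-style H.264 configuration.
const h264Settings = {
  RateControlMode: "QVBR", // encode to a quality target, not a fixed rate
  QvbrQualityLevel: 8,     // 1-10: the quality level to maintain
  MaxBitrate: 7_000_000,   // bits/sec cap for very complex scenes
};
// Easy scenes now use far less than the cap, which downstream
// monitoring must not mistake for a failing encoder.
```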

Advertising is famously important for the Super Bowl, so Taylor gives an overview of how they used the CableLabs ESAM protocol and SCTE messaging to receive information about, and trigger, the adverts. This combined SCTE-104, ESAM and SCTE-35, as well as allowing clients to use VAST for tracking. Extra caching was provided by Fastly's Media Shield, which tests for problems with manifests, origin servers and encoders. This fed a multi-CDN setup using four CDNs which could be switched between, with a decision point to determine which CDN should answer each request.
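
A decision point can be as simple as a per-request scoring function; here's an illustrative sketch (the CDN names, weights and health flags are invented):

```typescript
// Toy sketch of a multi-CDN decision point: weighted random choice
// among the CDNs currently considered healthy.
interface Cdn { name: string; healthy: boolean; weight: number; }

const cdns: Cdn[] = [ // hypothetical pool of four
  { name: "cdn-a", healthy: true,  weight: 40 },
  { name: "cdn-b", healthy: true,  weight: 30 },
  { name: "cdn-c", healthy: false, weight: 20 },
  { name: "cdn-d", healthy: true,  weight: 10 },
];

function pickCdn(pool: Cdn[]): Cdn {
  const live = pool.filter((c) => c.healthy);
  const total = live.reduce((sum, c) => sum + c.weight, 0);
  let roll = Math.random() * total;
  for (const c of live) {
    roll -= c.weight;
    if (roll <= 0) return c;
  }
  return live[live.length - 1];
}
```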

Taylor then looks at the tools, such as Mux's dashboard, which they used to spot problems in the system: both NOC-style tools and multiviewers. They set up three war rooms, each looking at a different aspect of the system such as connectivity or APIs. This allowed them to focus on what should be communicated, keeping 'noise' down to give people the space they needed to do their work whilst still providing the information required. Taylor then opens up to questions from the floor.

Watch now!
Speaker

Taylor Busch
Senior Director Engineering,
CBS Sports Digital

Video: Edge Compute

Delivering personalised video at scale, live or otherwise, is a tradeoff between speed and complexity. In this lightning talk at Demuxed 2019, Kyle Boutette from Cloudflare explains the benefits of running code on the ‘edge’.

Kyle starts by highlighting the reason to use CDNs: they take the management of a whole fleet of servers off your hands, allowing you to concentrate on delivering a video service and deploying the technology to do just that. This works really well, and CDNs are the backbone of most large sites on the internet. Some companies build their own, whilst others use Cloudflare or Amazon CloudFront among the many CDNs out there. Apart from dealing with the admin of the servers, CDNs are careful to provide servers as close to your users as practical, which helps reduce latency.

The problem that Kyle exposes is that any personalisation needs to be done on the player itself or on the server. The former requires implementing the same features on many platforms; the latter destroys the value of the CDN, since the central server(s) must calculate the new information and send it through the CDN, bringing us back to square one.

The solution that Cloudflare has developed allows javascript to run on the CDN's computers, referred to as the 'edge'. This allows much of the logic to be done close to the consumer, giving the highest chance of reusing CDN assets whilst also reducing the latency of requests compared to talking to the central server infrastructure. Doing this in javascript provides a well-understood environment for web developers. Kyle provides examples to show how this can be done with relatively simple code.
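
A minimal Workers-style sketch gives the shape of this (the personalisation logic, placeholder token and URLs are invented for illustration; this is not code from the talk):

```typescript
// Sketch: rewrite a manifest per viewer at the edge, so the origin
// serves one cacheable template and personalisation happens nearby.
addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(handle(event.request));
});

async function handle(request: Request): Promise<Response> {
  // Served (and cached) at the edge; the origin is only hit on a miss.
  const response = await fetch(request);
  const manifest = await response.text();

  // Hypothetical personalisation: substitute an ad region based on the
  // country code the edge node already knows about the request.
  const country = request.headers.get("cf-ipcountry") ?? "US";
  const personalised = manifest.replaceAll("{{AD_REGION}}", country);

  return new Response(personalised, { headers: response.headers });
}
```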

Watch now!
Speaker

Kyle Boutette
Systems Engineer,
Cloudflare