Video: What is HESP Ultra-Low-Latency Streaming?

Is it possible to improve on CMAF’s offer of an ultra-low-latency, scalable protocol with good viewer experience? This is what HESP, the High-Efficiency Streaming Protocol, promises. With almost instant channel change times and sub-second latency, it’s worth taking a look at those protocol created by THEOPlayer to understand where it might work in your workflows.

Presented by Pieter-Jan Speelmans and Johan Vounckx from THEO, we hear some more detail surrounding HESP’s inception. Quality, latency and bitrate are often referred to as a triangle where if you improve one or even two, the remaining factor will get worse to compensate. HESP plays in the triangle connecting ‘viewer experience’, ‘low latency’ and ‘scalability’. If you compare WebRTC with CMAF, you see that WebRTC prioritises low-latency streaming but suffers in terms of scalability. CMAF, being 2-5 seconds higher latency, has much better scalability but the channel zapping times are high which affects viewer experience as well as overall latency. HESP, contests Pieter-Jan, actually improves all three. It’s able to do this because it’s not extending existing protocols which weren’t designed to meet all these requirements, rather it’s bringing in new techniques which shift the whole equation.

THEOPlayer has created the HESP Alliance which is devoted to standardising the HESP technology through the IETF or other avenue, promoting adoption through marketing and the creation of tools, certification and management of intellectual property. The talk outlines the decoder royalties which can be payable by subscriber, per subscriber per hour, or per device.

Source: THEOPlayer

Looking at the technical details, we find out that you can actually start playing an HESP stream without downloading the manifest. While HESP does have manifest files, they change very infrequently. If a new one is changed at short notice, the server can ask players to download one by embedding a message in the stream. The channel zapping speed is achieved using two streams, an initialisation stream and a continuation stream. The initialisation stream just I and P frames allowing you to start playing immediately. The continuation stream is intended to be the low-bitrate stream used after the establishment of the stream.

HESP uses two modes: Maximal Gain and Maximal Compatability. Maximal gain aims to have the lowest latency, lowest bandwidth and lowest zapping times. It has long segments with 1 frame chunks containing one I or P frame. The Maximal Compatability mode, however, allows you to reuse Low-Latency DASH and LLHLS streams and uses 6-second segments with 200msec chunks including B frames.

THEOPlayer claim 7x less delivery delay, 20x lower zapping times and a 20% bandwidth saving over CMAF with broad compatibility with many TVs, android, iOS, Web, streaming devices.

Watch now!

Pieter-Jan Speelmans Pieter-Jan Speelmans
CTO & Founder,
Johan Vounckx Johan Vounckx
Vice President, Innovation,

Video: Real-time AV1 in WebRTC

AV1 seems to be shaking off its reputation for slow encoding, now only 2x slower than HEVC. How practical, then is it to put AV1 into a real-time codec aiming for sub-second latency? This is exactly what the Alliance for Open Media are working on as parts of AV1 are perfectly suited for the use case.

Dr Alex from CoSMo Software took the podium at the Alliance for Open Media Research Symposium to lay out the whys and wherefores of updating WebRTC to deliver AV1. He started by outlining the different requirements of real-time vs VoD. With non-live content, encoding time is often unrestricted allowing for complex encoding methods to achieve lower bitrates. Even live CMAF streams aiming to achieve a relatively low 3-second latency have time enough for much more complex encoding than real-time. Encoding, ingest, storage and delivery can all be separated into different parts of the workflow for VoD, whereas real-time is forced to collapse logical blocks down as much as possible. Unsurprisingly, Dr Alex outlines latency as the most important driver in the WebRTC use case.

When streaming, ABR isn’t quite as simple as with chunked formats. The different bit rate streams need to be generated at the encoder to save any transcoding delays. There are two ways of delivering these streams. One is to deliver them as separate streams, the other is to deliver only one, layered stream. The latter method is known as Scalable Video Coding (SVC) which sends a base layer of a low-resolution version of the video which can be decoded on its own. Within that stream, is also the information which builds on top of that video to create a higher-resolution version of the same stream. You can have multiple layers and hence provide information for 3, 4 or more streams.

Managing which streams get to the decoder is done through an SFU (Selective Forwarding Unit) which is a server to which WebRTC clients connect to receive just the stream, or parts of a stream, they need for their current bandwidth capability. It’s important to remember that compared to video conferencing solutions based on WebRTC, that streaming using WebRTC scales linearly. Whilst it’s difficult to hold a meeting with 50 people in a room, it’s possible to optimise what video is sent to everyone by only showing the last 5 speakers in full resolution, the others as thumbnails. Such optimisations are not available for video distribution, rather SFUs and media servers need to be scaled and cascaded. This should be simple, but testing can be difficult but it’s necessary to ensure quality and network resilience at scale.

Cisco have already demonstrated the first real-time AV1-based WebRTC system, though without SVC support. Work is ongoing to deliver improvements to RTP encapsulation of AV1 in WebRTC. For instance, providing Decoding Target Information which embeds information about frames without needing to decode the video itself. This information explains how important each frame is and how it relates to the other video. Such metadata can be used by the SFU or the decoder to understand which frames to drop and send/decode.

Watch now!
Download the slides

Alex Gouaillard Dr Alex Gouaillard
Video Codec Working Group – Real-time subgroup, Allience for Open Media
Founder, Directory & CEO, CoSMo Software Consulting Pte. Ltd.
Co-founder & CTO, Millicast

Video: Line by Line Processing of Video on IT Hardware

If the tyranny of frame buffers is let to continue, line-latency I/O is rendered impossible without increasing frame-rate to 60fps or, preferably, beyond. In SDI, hardware was able to process video line-by-line. Now, with uncompressed SDI, is the same possible with IT hardware?

Kieran Kunhya from Open Broadcast Systems explains how he has been able to develop line-latency video I/O with SMPTE 2110, how he’s coupled that with low-latency AVC and HEVC encoding and the challenges his company has had to overcome.

The commercial drivers are fairly well known for reducing the latency. Firstly, for standard 1080i50, typically treated as 25fps, if you have a single frame buffer, you are treated to a 40ms delay. If you need multiple buffers for a workflow, this soon stacks up so whatever the latency of your codec – uncompressed or JPEG XS, for example – the latency will be far above it. In today’s covid world, companies are looking for cutting the latency so people can work remotely. This has only intensified the interest that was already there for the purposes of remote production (REMIs) in having low-latency feeds. In the Covid world, low latency allows full engagement in conversations which is vital for news anchors to conduct interviews as well as they would in person.

IP, itself, has come into its own during recent times where there has been no-one around to move an SDI cable, being able to log in and scale up, or down, SMPTE ST 2110 infrastructure remotely is a major benefit. IT equipment has been shown to be fairly resilient to supply chain disruption during the pandemic, says Kieran, due to the industry being larger and being used to scaling up.

Kieran’s approach to receiving ST 2110 deals in chunks of 5 to 10 lines. This gives you time to process the last few lines whilst you are waiting for the next to arrive. This processing can be de-encapsulation, processing the pixel values to translate to another format or to modify the values to key on graphics.

As the world is focussed on delivering in and out of unusual and residential places, low-bitrate is the name of the game. So Kieran looks at low-latency HEVC/AVC encoding as part of an example workflow which takes in ST 2110 video at the broadcaster and encodes to MPEG to deliver to the home. In the home, the video is likely to be decoded natively on a computer, but Kieran shows an SDI card which can be used to deliver in traditional baseband if necessary.

Kieran talks about the dos and don’ts of encoding and decoding with AVC and HEVC with low latency targetting an end-to-end budget of 100ms. The name of the game is to avoid waiting for whole frames, so refreshing the screen with I-frame information in small slices, is one way of keeping the decoder supplied with fresh information without having to take the full-frame hit of 40ms (for 1080i50). Audio is best sent uncompressed to ensure its latency is lower than that of the video.

Decoding requires carefully handling the slice boundaries, ensuring deblocking i used so there are no artefacts seen. Compressed video is often not PTP locked which does mean that delivery into most ST 2110 infrastructures requires frame synchronising and resampling audio.

Kieran foresees increasing use of 2110 to MPEG Transport Stream back to 2110 workflows during the pandemic and finishes by discussing the tradeoffs in delivering during Covid.

Watch now!

Kieran Kunhya Kieran Kunhya
CEO & Founder, Open Broadcast Systems

Video: CMAF And The Future Of OTT

Why is CMAF still ‘the future’ of OTT? Published in 2018, CMAF’s been around for a while now so what are the challenges and hurdles holding up implementation? Are there reasons not to use it at all? CMAF is a way of encoding and packaging media which then can be sent using MPEG DASH and HLS, the latter being the path Disney+ has chosen, for instance.

This panel from Streaming Media West Connect, moderated by Jan Ozer, discusses CMAF use within Akami, Netflix, Disney+, and Hulu. Peter Chave from Akamai starts off making the point that CMAF is important to CDNs because if companies are able to use just one CMAF file as the source for different delivery formats, this reduces storage costs for consumers and makes each individual file more popular thus increasing the chance of having a file available in the CDN (particularly at the edge) and reducing cache misses. They’ve had to do some work to ensure that CMAF is carried throughout the CDN efficiently and ensuring the manifests are correctly checked.

Disney+, explains Bill Zurat, is 100% HLS CMAF. Benefiting from the long experience of the Disney Streaming Services teams (formerly BAMTECH), but also from setting up a new service, Disney were able to bring in CMAF from the start. There are issues ensuring end-device support, but as part of the launch, a number were sunsetted which didn’t have the requirements necessary to support either the protocol or the DRM needed.

Hulu is an aggregator so they have strong motivation to normalise inputs, we hear from Hulu’s Nick Brookins. But they also originate programming along with live streaming so CMAF has an important to play on the way in and the way out. Hulu dynamically regenerates their manifests so can iterate as they roll out easily. They are currently part the way through the rollout and will achieve full CMAF compatibility within the next 18 months.

The conversation turns to DRM. CMAF supports two methods of DRM known as CTR (adopted by Apple) and CBC (also known as CBCS) which has been adopted by others. AV1 supports both, but the recommendation has been to use CBC which appears have been universally followed to date explains Netflix’s Cyril Concolato. Netflix have been using AV1 since it was finalised and are aiming to have most titles transitioned by 2021 to CMAF.

Peter comments from Akamai’s position that they see a number of customers who, like Disney+ and Peacock, have been able to enter the market recently and move straight into CMAF, but there is a whole continuum of companies who are restricted by their workflows and viewer’s devices in moving to CMAF.

Low latency streaming is one topic which invigorates minds and debates for many in the industry. Netflix, being purely video on demand, they are not interested in low-latency streaming. However, Hulu is as is Disney Streaming Services, but Bill cautions us on rushing to the bottom in terms of latency. Quality of experience is improved with extra latency both in terms of reduced rebuffering and, in some cases, picture quality. Much of Disney Streaming Services’ output needs to match cable, rather than meeting over-the-air latencies or less.

The panel session finishes with a quick-fire round of questions from Jan and the audience covering codec strategy, whether their workflows have changed to incorporate CMAF, just-in-time vs static packaging, and what customers get out of CMAF.

Watch now!

Cyril Concolato Cyril Concolato
Senior Software Engineer,
Peter Chave Peter Chave
Principal Architect,
Nick Brookins Nick Brookins
VP, Platform Services Group,
Bill Zurat Bill Zurat
VP, Core Technology
Disney Streaming Services
Jan Ozer Moderator: Jan Ozer
Contributing Editor, Streaming Media