Video: Real-time AV1 in WebRTC

AV1 seems to be shaking off its reputation for slow encoding, now encoding only around 2x slower than HEVC. How practical, then, is it to put AV1 into a real-time codec aiming for sub-second latency? This is exactly what the Alliance for Open Media are working on, as parts of AV1 are perfectly suited to the use case.

Dr Alex from CoSMo Software took the podium at the Alliance for Open Media Research Symposium to lay out the whys and wherefores of updating WebRTC to deliver AV1. He started by outlining the different requirements of real-time vs VoD. With non-live content, encoding time is often unrestricted, allowing for complex encoding methods to achieve lower bitrates. Even live CMAF streams aiming for a relatively low 3-second latency have time enough for much more complex encoding than real-time allows. Encoding, ingest, storage and delivery can all be separated into different parts of the workflow for VoD, whereas real-time is forced to collapse these logical blocks down as much as possible. Unsurprisingly, Dr Alex identifies latency as the most important driver in the WebRTC use case.

When streaming, ABR isn’t quite as simple as with chunked formats. The different bit rate streams need to be generated at the encoder to avoid any transcoding delays. There are two ways of delivering these streams: one is to deliver them as separate streams; the other is to deliver a single, layered stream. The latter method is known as Scalable Video Coding (SVC), which sends a base layer, a low-resolution version of the video which can be decoded on its own. Within that stream is also the information which builds on top of that base to create a higher-resolution version of the same video. You can have multiple layers and hence provide the information for 3, 4 or more streams.
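
To make the layering concrete, here is a minimal TypeScript sketch of picking an SVC layer for a bandwidth budget. The layer bitrates and the `SvcLayer` shape are illustrative assumptions, not how AV1 actually signals its layers.

```typescript
// A minimal sketch of SVC layer selection, assuming illustrative
// per-layer bitrates; real AV1 SVC signalling is far richer than this.
interface SvcLayer {
  id: number;          // 0 = base layer, decodable on its own
  kbps: number;        // incremental bitrate this layer adds
  height: number;      // resolution this layer unlocks
}

const layers: SvcLayer[] = [
  { id: 0, kbps: 500,  height: 360 },  // base layer
  { id: 1, kbps: 800,  height: 720 },  // builds on layer 0
  { id: 2, kbps: 1500, height: 1080 }, // builds on layers 0 and 1
];

// Pick the highest layer whose cumulative bitrate fits the budget.
function selectLayer(budgetKbps: number): SvcLayer {
  let chosen = layers[0];
  let cumulative = 0;
  for (const layer of layers) {
    cumulative += layer.kbps;
    if (cumulative > budgetKbps) break;
    chosen = layer;
  }
  return chosen;
}

console.log(selectLayer(1400).height); // 720: base plus one enhancement layer
```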

Managing which streams get to the decoder is done through an SFU (Selective Forwarding Unit), a server to which WebRTC clients connect to receive just the stream, or the parts of a stream, they need for their current bandwidth capability. It’s important to remember that, compared to video conferencing solutions based on WebRTC, streaming using WebRTC scales linearly. Whilst it’s difficult to hold a meeting with 50 people in a room, it’s possible to optimise what video is sent to everyone by only showing the last 5 speakers in full resolution and the others as thumbnails. Such optimisations are not available for video distribution; rather, SFUs and media servers need to be scaled and cascaded. This should be simple, but testing can be difficult, and it’s necessary to ensure quality and network resilience at scale.
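
As a rough illustration of the ‘selective forwarding’ in an SFU, here’s a short sketch assuming each RTP packet is tagged with the SVC layer it carries; real SFUs read this from RTP header extensions rather than a neat `layerId` field.

```typescript
// An illustrative sketch of per-subscriber forwarding in an SFU: packets
// are relayed, never transcoded, and each subscriber only receives the
// layers its bandwidth can sustain.
interface RtpPacket { layerId: number; payload: Uint8Array; }
interface Subscriber { id: string; maxLayerId: number; send(p: RtpPacket): void; }

function forward(packet: RtpPacket, subscribers: Subscriber[]): void {
  for (const sub of subscribers) {
    if (packet.layerId <= sub.maxLayerId) {
      sub.send(packet); // base layer goes to everyone, enhancements to those who can take them
    }
  }
}
```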

Cisco have already demonstrated the first real-time AV1-based WebRTC system, though without SVC support. Work is ongoing to deliver improvements to the RTP encapsulation of AV1 in WebRTC, for instance providing Decoding Target Information which embeds information about frames without needing to decode the video itself. This metadata explains how important each frame is and how it relates to the rest of the video, allowing the SFU or the decoder to understand which frames it can drop and which it must send or decode.
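
The value of that metadata can be sketched as below: the SFU decides whether a frame is needed without touching the video payload. The field names are hypothetical stand-ins, not the actual RTP wire syntax.

```typescript
// A hedged sketch of decode-target metadata: field names are illustrative,
// not the real AV1 RTP dependency descriptor syntax.
interface FrameInfo {
  frameNumber: number;
  decodeTargets: number[]; // decode targets (e.g. resolutions) that need this frame
}

// Purely from metadata, an SFU can tell whether a frame matters to the
// decode target a given subscriber is receiving, with no video decoding.
function canDrop(frame: FrameInfo, activeTarget: number): boolean {
  return !frame.decodeTargets.includes(activeTarget);
}
```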

Watch now!
Download the slides
Speaker

Dr Alex Gouaillard
Video Codec Working Group – Real-time subgroup, Alliance for Open Media
Founder, Director & CEO, CoSMo Software Consulting Pte. Ltd.
Co-founder & CTO, Millicast

Video: Preparing for 5G Video Streaming

Will streaming really be any better with 5G? What problems won’t 5G solve? These are just a couple of the questions tackled in this panel from the Streaming Video Alliance. There are so many aspects of 5G which are improvements that it can be very hard to articulate clearly, for a given use case, which are the main ones that matter. In this webinar, the use case is clear: streaming to the consumer.

Moderating the session, Dom Robinson kicks off the conversation by asking the panellists to dig below the hype and talk about what 5G means for streaming right now. Brian Stevenson is first up, explaining that the low-band 5G option is really useful as it allows operators to roll out 5G offerings with the spectrum they already have and, given its low frequency, get a decent propagation distance. In the low frequencies, 5G can still give a 20% improvement in bandwidth. Whilst this is a good start, he continues, it’s really delivering in the mid-band – where bandwidth is 6x – that we can really start enabling the applications discussed in the rest of the talk.

Humberto La Roche from Cisco says that, in his opinion, the focus needs to be on low latency. Latency at the network level is reduced around 10x when working in the millimetre wavelengths. This is important even for video on demand. He points out, though, that delay happens within the IP network fabric as well as in the 5G protocol itself and the wavelength it’s working on. Adding buffers into the network drives down the cost of that infrastructure, so it’s important to look at ways of delivering the overall latency needed at a reasonable cost. We also hear from Sanjay Mishra, who explains that some telcos are already deploying millimetre wavelengths and focussing on advancing edge compute in high-density areas as their differentiator.

The panel discusses the current technical challenges for operators. Thierry Fautier draws on his experience of watching sports in the US on his mobile devices. Mobile operators in the US offer zero-rating deals, he explains, where the operator waives all data charges when you use a certain service but only delivers the video at SD resolution at 1.5 Mbps. Whilst the benefits of this are obvious, it means that as people buy new, often larger, phones with better screens, they expect to reap the benefits. At SD, Thierry says, you can’t see the ball in tennis, so 5G will offer the over-the-air bandwidth needed to allow the telcos to offer HD as part of these deals.


The panel discusses the problems seen so far in delivering MBMS – multicast for mobile networks. MBMS has been deployed sporadically around the world in current LTE networks (as eMBMS) but has faced a typical chicken-and-egg problem: both cell towers and mobile devices need to support the technology, so it hasn’t been worth the upgrade cost for the telcos while eMBMS remains unsupported by many chipsets, including Apple’s. Thierry says there is hope for a 5G version of MBMS since Apple is now part of the 3GPP.

CMAF had a similar chicken-and-egg situation when it was finalised: there was hesitance in using it because Apple didn’t support it. Now, with iOS 14 supporting HLS in CMAF, there is much more interest in deploying such services. This is just as well, cautions Thierry, as all the talk of reduced latency in 5G or in the network itself won’t solve the main problem with streaming latency, which exists at the application layer. If services don’t abandon HLS/DASH and move to LL-HLS and LL-DASH/CMAF, then the improvements in latency lower down the stack will deliver only minimal benefits to the viewer.

Sanjay discusses coverage and penetration, which will forever be a challenge. “All cell towers are not created equal.” The question of how far and wide coverage extends will remain.

The panel finishes by looking at what’s to come, suggesting more ‘federations’ of companies working together, both commercially and technically, to deliver video to users in better ways. Thierry sums up the near future as providing higher-quality experiences, making in-stadia experiences great and enabling immersive video.

Watch now!
Speakers

Brian Stevenson
SME,
Streaming Video Alliance
Humberto La Roche
Principal Engineer,
Cisco
Sanjay Mishra
Associate Fellow,
Verizon
Thierry Fautier
President-Chair, Ultra HD Forum
VP Video Strategy, Harmonic
Moderator: Dom Robinson
Co-Founder, Director, and Creative Firestarter
id3as

Video: Web Media Standards

The internet has been a continuing story of proprietary technologies being overtaken by open ones: from the precursors to TCP/IP, to Flash/RTMP video delivery giving way to HLS. Understanding the history of why these technologies appear, why they are subsumed by open standards and the boost in popularity that happens at that transition is important to help us make decisions now and foresee how the technology landscape may look in five or ten years’ time.

This talk, by John Simmons, is a talk of two halves. Looking first at the history of how our standards coalesced into what we have today fills in many blanks and makes the purpose of current technologies like MPEG DASH & CMAF clearer. He then looks at how we can understand what we have today in light of similar situations in the past, answering the question of whether we are at an inflexion point in technology.

John first looks at the importance of making DRM-protected content as portable as non-protected content, which has always been easy to move between computers and systems. This was in response to a WIPO analysis which, as many would agree, concluded that this was essential to enable legal video use on the internet. In 2008, Microsoft analysed all the elements needed, beyond the simple encryption, to allow such media to be portable. It would require HTML extensions for delivery, DRM signalling, authentication, a standard protocol for Adaptive Delivery (also known as ABR) and an adaptive container format. We then take a walk through the timeline from 2009 through to 2018, seeing the beginnings and published availability of such technologies as Common Encryption, MPEG DASH and CMAF.

Milestones for Web Media Portability

John then walks through these key technologies, starting with the importance of Common Encryption (also known as CENC). Previously, all the DRM methods had their own container formats. Harmonisation of DRM is likely never going to happen, so we’ll always have Apple’s own, Google’s own, Microsoft’s and plenty of others. For streaming providers, delivering all the different formats is a major problem and makes for messy, duplicative workflows. Common Encryption allows for one container format which can carry any DRM information, allowing for a single workflow with different inputs. On the player side, the player can now simply accept a single stream of DRM information, authenticate with the appropriate service and decode the video.
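
The idea reduces to something like the sketch below: one set of encrypted segments, several DRM signalling entries alongside them, and the player picks whichever system it supports. The two UUIDs are the well-known Widevine and PlayReady system IDs, but the structure is simplified and no real ‘pssh’ box parsing is shown.

```typescript
// A toy model of Common Encryption: the media is encrypted once, and the
// container carries signalling for several DRM systems side by side.
interface DrmSignalling { systemId: string; initData: Uint8Array; }

const WIDEVINE = "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed";  // Widevine system ID
const PLAYREADY = "9a04f079-9840-4286-ab92-e65be0885f95"; // PlayReady system ID

// The player picks the first DRM system it supports; the encrypted
// segments themselves are identical for every player.
function pickDrm(available: DrmSignalling[], supported: string[]): DrmSignalling | undefined {
  return available.find(d => supported.includes(d.systemId));
}
```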

CMAF is another key technology called out by John as enabling portability of media. It was co-developed with Apple to enable a common media format for HLS and DASH. We’ve covered this before on The Broadcast Knowledge, starting with the ISO BMFF format on which DASH and CMAF are based, Will Law’s famous ‘Chunky Monkey’ talk and many more. We recently covered FuboTV’s talk on how they handle multi-codec encoding and packaging for HLS & DASH.

Also highlighted by John are the JavaScript Media Source Extensions and Encrypted Media Extensions, which allow browsers/JavaScript to interact with ABR/Adaptive Streaming and DRM respectively. He then talks about CTA WAVE, a project that specifically aims to improve streamed media experiences on consumer devices, the CTA being the Consumer Technology Association who are behind the annual CES exhibition in Las Vegas.
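
For a flavour of what MSE looks like in practice, here is a minimal sketch that feeds a fetched fMP4 initialisation segment into a video element; the segment URL and codec string are assumptions for illustration.

```typescript
// A minimal Media Source Extensions sketch: JavaScript, not the browser,
// fetches the media and appends it to the <video> element's pipeline.
const video = document.querySelector("video") as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", async () => {
  // The codec string and URL are illustrative assumptions.
  const buffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');
  const init = await fetch("/segments/init.mp4").then(r => r.arrayBuffer());
  buffer.appendBuffer(init); // media segments are appended the same way
});
```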

What is often less apparent is the current work happening to develop new standards and specifications. John calls out a number of different projects within W3C and MPEG, such as low-latency support for CMAF and MSE, and codec switching in MSE. Work on ad signalling, period boundaries and SCTE-35 is making its debut in JavaScript, with ongoing work to create the link between ad markers and JS applications. He also calls out VVC and AV1 mappings into CMAF.

In the second part of the presentation, John asks ‘where will we end up?’, drawing upon two examples. One is the number of TCP/IP hosts between 1980 and 1992: once TCP/IP was publicly available, there was an exponential increase in adoption, moving on from the proprietary network interfaces of the years before. Similarly with websites between 1990 and 1997: exponential growth happened after 1993, when the standard was set for web clients. This did take a few years to have a marked effect, but the number of websites moved from a flat ‘less than 100’ to 600, then 10,000 in 1994, increasing to a quarter of a million by 1995 and over one million in 1996. This shows the difference in power between ‘walled garden’ environments and the open internet.

John sees media technology today as still having a number of ‘traditional’ walled gardens such as DISH and Sky TV. He sees people self-serving from multiple walled gardens to create their own larger pool of media options, becoming what are typically known as ‘cord cutters’. He therefore sees two options for the future. One is ever-larger walled gardens where large companies aggregate the content of smaller content owners/providers. The other is cloud services that act as a one-stop shop for your media but dynamically authenticate against whichever service is needed. This is a much more open environment, without the need to subscribe separately to each and every outlet in the traditional sense.

Watch now!
Speakers

John Simmons
W3C Evangelist, Media & Entertainment
W3C

Video: LL-HLS Discussion with THEO, Wowza & Fastly

Roundtable discussion with Fastly, THEO and Wowza

iOS 14 has finally started to roll out and, with it, LL-HLS is now available on millions of devices. Low-Latency HLS is Apple’s latest evolution of HLS, a streaming protocol which has been widely used for over a decade. Its typical latency has gradually come down from 60 seconds to between 6 and 15 seconds now. There are still a lot of companies that want to bring that down further, and LL-HLS is Apple’s answer for people who want to operate at around 2-4 seconds total latency, which matches or beats traditional broadcast.

LL-HLS was introduced last year and had a rocky reception. It came after a community-driven low-latency scheme called LHLS and after MPEG DASH announced CMAF’s ability to hit the same 2-4 second window. Famously, this original context, as well as the technical questions over the new proposal, was summed up well in Phil Cluff’s blog post, which was soon followed by a series of talks trying to make sense of LL-HLS ahead of implementation. This is the Apple video introducing LL-HLS in its first form, and these are the reactions from Al Shenker of CBS Interactive, Marina Kalkanis of M2A Media and Akamai’s Will Law, whose talk also nicely sums up the other two contenders. Apple have now changed some of the spec in response to their own further research and external feedback; the changes were received positively and summed up in THEO CTO Pieter-Jan Speelmans’ recent webinar bringing us the updates.

In this panel, Pieter-Jan is joined by Chris Buckley from Fastly Inc. and Wowza’s Jamie Sherry to discuss pressing LL-HLS into action. Moderator Alison Kolodny hosts the talk, which covers a wide variety of points.

“Wide adoption” is seen as the day-1 benefit: if you support LL-HLS, you know you’ll be able to hit a large number of iPads, iPhones and Macs. Typically, Apple sees a high percentage of its userbase upgrade fairly swiftly, easily seeing more than 75% of devices updated within four months of release. The panel then discusses how implementation has become easier given the change in the protocol whereby the use of HTTP/2’s push technology was dropped, which would have made typical CDN techniques, like hosting the playlists separately to the media, impossible. Overall, CDN implementation has become more practical since, with preload hints, a CDN can accept many, many connections all waiting for a certain chunk and collapse them down to a single link to the origin.
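
Request collapsing is simple to sketch: many edge requests for the same hinted part share a single origin fetch. This is an illustrative model, not any particular CDN’s implementation.

```typescript
// An illustrative model of request collapsing for LL-HLS preload hints:
// the first request triggers the origin fetch (which the origin holds
// open until the hinted part exists); later requests share its promise.
const pending = new Map<string, Promise<ArrayBuffer>>();

function fetchPart(url: string): Promise<ArrayBuffer> {
  let inflight = pending.get(url);
  if (!inflight) {
    inflight = fetch(url)
      .then(r => r.arrayBuffer())
      .finally(() => pending.delete(url));
    pending.set(url, inflight);
  }
  return inflight; // N client connections, one link to the origin
}
```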

One aspect of implementation which has improved, we hear from Pieter-Jan, is building effective Adaptive Bit Rate (ABR) switching. With low-latency protocols, you are so close to live that it becomes very hard to download a chunk of video ahead of time and measure the download speed to see if it arrived quicker than real time; if it did, you’d infer there was spare bit rate. LL-HLS’s use of rendition reports, however, makes that a lot easier. Pieter-Jan also points out that SSAI is easier with rendition reports.
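
As a sketch of why rendition reports help, consider a switch to a higher bit rate: the report in the current playlist already tells the player the newest segment and part of the target rendition, so it can request them directly. The interface and URL template below are assumptions modelled on LL-HLS’s LAST-MSN/LAST-PART attributes and _HLS_msn/_HLS_part query parameters.

```typescript
// A hedged sketch of acting on an LL-HLS rendition report when switching
// renditions; field names mirror the playlist attributes, and the URL
// template is an assumption for illustration.
interface RenditionReport {
  uri: string;      // the other rendition's media playlist
  lastMsn: number;  // newest media sequence number in that rendition
  lastPart: number; // newest part within that segment
}

// Jump straight to the freshest part of the new rendition with a
// blocking playlist request, instead of polling from scratch.
function nextRequestAfterSwitch(report: RenditionReport): string {
  return `${report.uri}?_HLS_msn=${report.lastMsn}&_HLS_part=${report.lastPart}`;
}
```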

The rest of the discussion covers device support for LL-HLS, subtitles workflows, the benefits of TLS 1.3 being recommended, and low-latency business cases.

Watch now!
The webinar is free to watch, on demand, in exchange for your email details. The link is emailed to you immediately.
Speaker

Chris Buckley
Senior Sales Engineer,
Fastly Inc.
Pieter-Jan Speelmans
CTO,
THEO Technologies
Jamie Sherry
Senior Product Manager,
Wowza
Moderator: Alison Kolodny
Senior Product Manager of Media Services,
Frame.io