Who better to dig below the surface of WebRTC, which delivers sub-second latency, than Sean DuBois, creator of the Pion WebRTC library? This video takes a different look at WebRTC from others that focus on latency or scaling. Rather, Sean looks at congestion control and how to manage its impact, noting that people remember how bad the video got, not how nice your sign-up page was.
Congestion is inevitable in large ‘unmanaged’ networks such as the internet, as well as on Wi-Fi and cellular networks. Sean points out that MPEG codecs, which add dependencies between frames, magnify the effect of lost packets. With frame-by-frame codecs, dropping a frame and repeating the last one is barely noticeable, but with MPEG codecs, every frame that depends on the lost data can be damaged too. WebRTC was implemented over UDP so that it could use its own congestion control.
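To make the difference concrete, here is a simplified Python sketch. It assumes a stream made only of I-frames (self-contained) and P-frames (each depending on the previous frame); B-frames and error concealment are ignored, and the `damaged_frames` helper is hypothetical. It lists which frames are affected when a single frame is lost.

```python
def damaged_frames(frame_types: str, lost_index: int) -> list:
    """Indices of frames that cannot be decoded cleanly after one frame is lost.

    frame_types is a string such as "IPPPIPPP": 'I' frames are self-contained
    keyframes and each 'P' frame depends on the frame before it.
    B-frames and error concealment are deliberately ignored in this sketch.
    """
    damaged = [lost_index]
    for i in range(lost_index + 1, len(frame_types)):
        if frame_types[i] == "I":
            break  # the next keyframe restarts the dependency chain
        damaged.append(i)
    return damaged


# Intra-only (frame-by-frame) stream: losing frame 2 affects only frame 2.
print(damaged_frames("IIIIIIII", 2))
# Long-GOP stream: losing frame 1 corrupts every frame until the next I-frame.
print(damaged_frames("IPPPPIPP", 1))
```

The longer the gap between keyframes, the more frames a single loss can spoil, which is why loss recovery matters so much for WebRTC.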
RTP and RTCP are the key to WebRTC’s congestion control. RTP is well known for carrying real-time media: it’s used by AES67 audio, SMPTE ST 2110 and ST 2022-6, to name just a few standards. RTCP, the RTP Control Protocol, is RTP’s sidekick. Whilst RTP does the legwork of carrying the media, RTCP passes messages that control the flow. In this case, Sean explains, the RTCP channel is used to tell the sender that it’s sending too much video or which packets have been lost. To mitigate congestion, the source can adjust the bitrate directly or bring it down indirectly by reducing the resolution or the frame rate of the video.
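The talk doesn’t specify a particular rate-control algorithm. As one published example, the loss-based part of Google Congestion Control (draft-ietf-rmcat-gcc) cuts the target bitrate when reported loss exceeds 10% and raises it by 5% when loss is under 2%. The Python sketch below assumes the loss figure comes from the 8-bit ‘fraction lost’ field of an RTCP Receiver Report (RFC 3550); the function names and the minimum/maximum limits are hypothetical.

```python
def fraction_lost_from_rr(fraction_lost_byte: int) -> float:
    """RTCP Receiver Reports carry 'fraction lost' as an 8-bit fixed-point value."""
    return fraction_lost_byte / 256.0


def next_bitrate(current_bps: int, fraction_lost_byte: int,
                 min_bps: int = 100_000, max_bps: int = 5_000_000) -> int:
    """Loss-based rate update modelled on the loss controller in draft-ietf-rmcat-gcc.

    The 2% / 10% thresholds follow that draft; min_bps and max_bps are
    hypothetical limits chosen for this illustration.
    """
    loss = fraction_lost_from_rr(fraction_lost_byte)
    if loss > 0.10:
        new_bps = current_bps * (1 - 0.5 * loss)  # heavy loss: back off
    elif loss < 0.02:
        new_bps = current_bps * 1.05               # little loss: probe upwards
    else:
        new_bps = current_bps                      # in between: hold steady
    return round(max(min_bps, min(max_bps, new_bps)))


rate = 2_000_000
for fraction_lost in (0, 0, 64, 10):  # hypothetical successive RTCP reports
    rate = next_bitrate(rate, fraction_lost)
    print(fraction_lost, rate)
```

In GCC this loss-based estimate is combined with a delay-based one; a sender could also respond by lowering resolution or frame rate rather than bitrate alone.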
Sean shows a summary diagram of the congestion controller’s flow, which is built to handle jitter and out-of-order packets. Buffers are the normal way of fixing out-of-order packets, but their big downside is that they add latency and can exacerbate timing problems. Because the timing information in each packet is only relative, WebRTC uses RTCP Sender Reports to map packet timing to NTP wall-clock time. When packet loss is spotted, NACKs (negative acknowledgements) are sent via RTCP or, if things are worse, a Picture Loss Indication (PLI) is sent to request a new keyframe. Any impairments that do get through can be fixed with FEC (forward error correction) or concealed with some form of masking which, nowadays, may be based on machine learning.
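Two of the receiver-side mechanisms above can be sketched in a few lines of Python. `NackTracker` (a hypothetical helper) spots gaps in the 16-bit RTP sequence numbers, allowing for wrap-around and for late, reordered packets that fill a gap. `rtp_to_wallclock` maps a relative RTP timestamp to wall-clock time using the NTP/RTP timestamp pair from the most recent Sender Report; it assumes the NTP timestamp has already been converted to seconds and a 90 kHz video clock.

```python
from typing import Optional, Set

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around
TS_MOD = 1 << 32   # RTP timestamps are 32 bits and wrap around


class NackTracker:
    """Records gaps in RTP sequence numbers that a receiver could NACK."""

    def __init__(self) -> None:
        self.highest: Optional[int] = None
        self.missing: Set[int] = set()

    def on_packet(self, seq: int) -> None:
        if self.highest is None:
            self.highest = seq
            return
        delta = (seq - self.highest) % SEQ_MOD
        if 0 < delta < SEQ_MOD // 2:
            # Newer than anything seen: every skipped number is (so far) missing.
            for step in range(1, delta):
                self.missing.add((self.highest + step) % SEQ_MOD)
            self.highest = seq
        else:
            # Late (out-of-order) or duplicate packet: stop asking for it.
            self.missing.discard(seq)


def rtp_to_wallclock(rtp_ts: int, sr_ntp_seconds: float, sr_rtp_ts: int,
                     clock_rate: int = 90_000) -> float:
    """Map an RTP timestamp to wall-clock time using the latest Sender Report."""
    diff = (rtp_ts - sr_rtp_ts) % TS_MOD
    if diff >= TS_MOD // 2:
        diff -= TS_MOD  # the packet was sampled before the Sender Report
    return sr_ntp_seconds + diff / clock_rate


tracker = NackTracker()
for seq in (10, 11, 14, 12):  # 12 and 13 skipped, then 12 arrives late
    tracker.on_packet(seq)
print(sorted(tracker.missing))
print(rtp_to_wallclock(91_000, 100.0, 1_000))
```

A receiver would typically also wait briefly before NACKing (a gap may just be reordering), limit how often it asks for each packet, and fall back to a PLI when too much is missing; those policies are left out here.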
The talk finishes with a look at a number of innovative projects which use WebRTC in one way or another, including for file transfer.
WebRTC continues to live two lives: one of massive daily use in video conferencing apps from Google, Facebook and many others, and one as a side-lined streaming protocol in the broadcast and streaming industry. WebRTC is now an IETF/W3C standard, is a decade old and continues to see work and innovation from Google, other large companies and smaller specialists pushing it forward.
In this extended Streaming Media Connect video with Millicast’s Ryan Jespersen, we explore where WebRTC stands today, how it can replace RTMP, and how real-time AV1 not only shows the innovation within the technology but also enables several use cases and upcoming technologies, such as end-to-end encryption for streaming workflows. The video is in three sections: product demos, technology discussion and overviews of use cases.
A clear first question is why bother with WebRTC at all. Ryan’s quick to point out that WebRTC is in daily use not only in many of the big video call apps but also in Clubhouse, the high-scale interactive audio platform. He also establishes that it’s commonly used on CDNs such as Limelight and Millicast to deliver ultra-low-latency streams to end-users for auctions, gambling and interactive streams, as well as within broadcast workflows. The NFL, for instance, used WebRTC for low-latency monitoring of 122 cameras at the Super Bowl. As far as end-users are concerned, Ryan sees the as-yet-untapped ‘interactivity’ market as a way to release value in many verticals and expects it to be the fastest-growing sector of the streaming industry over the next few years.
Looking back at Flash, Ryan explains that we came from a point where we had a low-latency protocol in the form of RTMP. Its latency was in the region of 1 to 3 seconds, and it had end-to-end security, encoder control and interactivity. RTMP was displaced by three main factors: security concerns, rejection of the protocol’s proprietary nature and the move to HLS, which provided improved scalability and was enthusiastically adopted by CDNs.
WebRTC, Ryan contends, learns from the mistakes of RTMP. WebRTC has ways to recover lost packets, is content agnostic, has a solution for NAT traversal, is non-proprietary and needs no plugins. These last two points address many of the security concerns of RTMP. Now that WebRTC is a standard, the W3C has documented many upcoming use cases for this free, open-source technology.
Why, then, is WebRTC not much more prevalent in video streaming services such as Netflix or Peacock? This is a question that Russell Trafford-Jones discussed in this IBC panel with nanocosmos, M2A and VisualOn. One view from that panel is that sub-second latency is lower than some services need. For instance, a public broadcaster may not wish to deliver online faster than it does over the air. There’s also a quality issue to contend with. One strength of WebRTC is that it always prioritises latency over quality. This is great for face-to-face communication, but tier-1 broadcasters want people to see video at the same quality that left their encoders, and if that means waiting for packets to be recovered instead of showing an impaired signal, that’s what they will do. As ever, therefore, this is a business decision that has to pay careful attention to the needs of the viewers, the quality aspirations of both viewers and broadcaster/provider, and the technical pros and cons of each approach.
Ryan talks about real-time AV1 in WebRTC, which is also covered in this talk.
Moving on to AV1, Ryan explains that this royalty-free codec has been sped up significantly since the early days, when it required thousands of CPUs for real-time encoding. AV1 is a boon for WebRTC for two reasons: screen content coding and scalable video coding. Screen Content Coding is a set of techniques that adapt encoding specifically for screen content, meaning computer graphics, whether in games or simply a shared computer desktop. Because screen content has straighter lines, and many parts of the screen can be identical or nearly identical to other parts, it’s possible to encode it much more efficiently if you can detect it and optimise for it.
Ryan moves on to AV1’s use in shoring up security. Although AV1 is a codec and not a security measure in and of itself, its ability to send multiple resolutions in one stream is a big deal for securing communications. Scalable video coding (SVC) is not a new technology, but AV1 is one of the first mainstream, modern codecs to include it by default. This enables an encoder to encode to, say, sub-SD, SD and HD resolutions and send them all at once in one stream. These are not simply three encodes squeezed down the same pipe, but layers that build on top of each other. The sub-SD layer provides a foundation to which the SD layer adds enhancement information. You need both the sub-SD and SD layers to get SD. Adding the HD layer to those two gives you full-resolution HD. By delivering only the extra information needed for HD, rather than all the underlying data again, a lot of bitrate can be saved. Importantly, because all the encoding is generated at the source, you can encrypt at the source for an end-to-end encrypted workflow and still deliver multiple bitrates. Ryan explains that the move to ABR streaming, whether HLS, DASH or otherwise, breaks the end-to-end security model because transcoding the media requires being able to see it. Using AV1’s SVC is one way around the need for mid-workflow transcoding.
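The layering can be pictured with a small Python sketch. The layer names follow Ryan’s example, but the heights and per-layer bitrates are hypothetical numbers chosen only for illustration. `layers_for` (a hypothetical helper) returns every layer a receiver needs for a given resolution, so a relay that can tell which packets belong to which layer could serve an SD viewer simply by not forwarding the HD layer, without decoding anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SvcLayer:
    name: str
    height: int  # picture height available once this layer is decoded
    kbps: int    # bitrate of this layer's own (enhancement) data


# Hypothetical three-layer stream; the heights and bitrates are illustrative only.
LAYERS = [
    SvcLayer("sub-SD", 270, 300),
    SvcLayer("SD", 540, 500),
    SvcLayer("HD", 1080, 1200),
]


def layers_for(target_height: int, layers: List[SvcLayer] = LAYERS) -> List[SvcLayer]:
    """Every layer up to and including the first one that reaches target_height.

    Layers must be ordered from base to top, because each layer only adds
    enhancement data on top of the ones below it.
    """
    selected = []
    for layer in layers:
        selected.append(layer)
        if layer.height >= target_height:
            break
    return selected


for target in (270, 540, 1080):
    chosen = layers_for(target)
    total = sum(layer.kbps for layer in chosen)
    print(target, [layer.name for layer in chosen], total, "kbps")
```

The HD viewer receives the sum of all three layers, not three separate encodes, which is where the bitrate saving comes from.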
One aspect is missing, though, for modern streaming workflows. If you don’t want to do peer-to-peer networking, some form of traffic manipulation will be needed in your CDN and/or delivery infrastructure. This is why Ryan says that Millicast has proposed adding ‘secure frames’ to the WebRTC spec. Whilst this talk doesn’t detail how they work, they add a way of encrypting data twice, so that the media can be encrypted end to end while each hop is also encrypted separately. This provides just enough access to the stream’s metadata for traffic manipulation, without allowing access to the underlying media.
As the video comes to an end, Ryan gives us a glimpse of one other upcoming technology that may be added to WebRTC, called WHIP. The RFC explains the intention of WHIP:
The WebRTC-HTTP ingest protocol (WHIP) uses an HTTP POST request to perform a single shot SDP offer/answer so an ICE/DTLS session can be established between the encoder/media producer and the broadcasting ingestion endpoint. Once the ICE/DTLS session is set up, the media will flow unidirectionally from the encoder/media producer to the broadcasting ingestion endpoint. In order to reduce complexity, no SDP renegotiation is supported, so no tracks or streams can be added or removed once the initial SDP O/A over HTTP is completed.
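As an illustration of that single-shot exchange, this Python sketch builds the WHIP requests with the standard library’s `urllib.request` without sending them. The endpoint URL, token and SDP are placeholders, and the helper names are hypothetical. It assumes these details of the WHIP specification: the offer is POSTed with `Content-Type: application/sdp`, optionally with a Bearer token, and the session is later torn down with an HTTP DELETE to the URL returned in the response’s `Location` header.

```python
import urllib.request
from typing import Optional
from urllib.parse import urljoin


def whip_offer_request(endpoint_url: str, sdp_offer: str,
                       bearer_token: Optional[str] = None) -> urllib.request.Request:
    """Build the single HTTP POST that carries the SDP offer to a WHIP endpoint."""
    headers = {"Content-Type": "application/sdp"}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return urllib.request.Request(endpoint_url, data=sdp_offer.encode("utf-8"),
                                  headers=headers, method="POST")


def whip_teardown_request(endpoint_url: str, location: str) -> urllib.request.Request:
    """Build the HTTP DELETE that ends the session; Location may be relative."""
    return urllib.request.Request(urljoin(endpoint_url, location), method="DELETE")


# Placeholder endpoint and a truncated SDP offer, for illustration only.
offer = whip_offer_request("https://ingest.example.com/whip/endpoint", "v=0\r\n", "secret")
print(offer.get_method(), offer.full_url, offer.get_header("Content-type"))
```

If the POST succeeds, `urllib.request.urlopen(offer)` would return a `201 Created` response whose body is the SDP answer and whose `Location` header identifies the session to pass to `whip_teardown_request`. Because there is no renegotiation, that one round trip is the whole signalling exchange.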
Ryan closes his video with a demonstration of the Millicast platform and looks at how other use cases, such as watch parties, might be architected.
WebRTC is now a W3C standard providing sub-second peer-to-peer video and audio streaming with NAT traversal. Widely used for video conferencing, WebRTC has also attracted video streaming companies such as Millicast and Limelight (to name but two) because of its sub-second latency. They aim to deliver this otherwise peer-to-peer technology to thousands or millions of people in under a second, enabling interactive video, gamified streams, auctions and ultra-low-latency sports.
Addressing people who use other streaming protocols, Pion creator Sean DuBois spoke at SF Video Tech about what WebRTC brings over and above protocols like RTMP, SRT and RIST. At heart, WebRTC, like SRT and RIST, creates a connection over which it can send a variety of data. Whilst we expect media to be sent, file transfer is just as easy to achieve; let’s not forget that SRT is built upon UDT, a protocol designed for bulk data transfer. Where file transfer can be achieved, so can real-time data and metadata transfer.
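Here is a minimal Python sketch of file transfer over a data channel, assuming a reliable, ordered channel (the WebRTC default) represented by a plain `send` callable. The JSON header, the helper names and the 16 KiB chunk size are assumptions for this example: 16 KiB is a commonly used conservative message size for cross-browser compatibility, not a protocol requirement.

```python
import hashlib
import json
from typing import Callable, List, Optional

CHUNK_SIZE = 16 * 1024  # conservative per-message size (an assumption, see above)


def send_file(name: str, data: bytes, send: Callable[[bytes], None]) -> None:
    """Send a header message, then the file in CHUNK_SIZE pieces.

    `send` stands in for a reliable, ordered data channel's send method; the
    JSON header format is made up for this sketch.
    """
    header = {"name": name, "size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
    send(json.dumps(header).encode("utf-8"))
    for offset in range(0, len(data), CHUNK_SIZE):
        send(data[offset:offset + CHUNK_SIZE])


class FileReceiver:
    """Reassembles a file sent by send_file and checks its hash."""

    def __init__(self) -> None:
        self.header: Optional[dict] = None
        self.chunks: List[bytes] = []
        self.received = 0

    def on_message(self, message: bytes) -> Optional[bytes]:
        if self.header is None:
            self.header = json.loads(message)
        else:
            self.chunks.append(message)
            self.received += len(message)
        if self.received < self.header["size"]:
            return None  # still waiting for more chunks
        data = b"".join(self.chunks)
        if hashlib.sha256(data).hexdigest() != self.header["sha256"]:
            raise ValueError("file hash mismatch")
        return data


# Loopback demo: the "channel" is just a list collecting messages.
messages: List[bytes] = []
payload = bytes(range(256)) * 200  # 51,200 bytes, so four chunks
send_file("demo.bin", payload, messages.append)
receiver = FileReceiver()
for message in messages:
    result = receiver.on_message(message)
print(len(messages), result == payload)
```

Because the channel is assumed to be ordered and reliable, the sketch needs no sequence numbers; an unordered or partially reliable channel would need them.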
Sean quickly summarises WebRTC as a protocol between (typically) browsers: a secure peer-to-peer connection over which multiple audio and video streams can flow. In common with RIST and other recent protocols, it’s based on many pre-existing technologies, such as SRTP, DTLS, ICE and SDP, to deliver signalling, connection management, encryption and communication.
The list of improvements over RTMP is very long. They’re spelt out concisely in the video, so we will highlight just a few here. Low latency is key. RTMP was low-latency for its time, but not by today’s standards; Google’s Stadia can boast 125ms video latency for a keypress, explains Sean. DTLS and SRTP are essential for security, and both are well-understood, trusted methods of securing your data. DTLS is pretty much the same TLS that secures your bank transfers, just moved onto UDP instead of TCP. However, WebRTC can work by exchanging certificate ‘fingerprints’ (DTLS-SRTP) instead of relying on the full trusted certificate infrastructure that underpins TLS on the web. Removing the need for CA-issued certificates is a big boost for flexibility and agility, as long as you are confident you can exchange fingerprints securely ahead of time.
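To show how fingerprints stand in for certificate authorities, here is a Python sketch using only `hashlib`. Each peer advertises the SHA-256 hash of its (usually self-signed) certificate in SDP as an `a=fingerprint:sha-256` line, formatted as colon-separated hex; after the DTLS handshake, the other side hashes the certificate it actually received and checks that the two match. The `cert_der` bytes here are a placeholder for a real DER-encoded certificate, the helper names are hypothetical, and only SHA-256 is handled.

```python
import hashlib


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format a certificate's SHA-256 hash as an SDP fingerprint attribute."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    pairs = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
    return f"a=fingerprint:sha-256 {pairs}"


def matches_fingerprint(cert_der: bytes, sdp_line: str) -> bool:
    """Check the certificate received in the DTLS handshake against the SDP line."""
    prefix = "a=fingerprint:"
    if not sdp_line.startswith(prefix):
        return False
    algorithm, _, value = sdp_line[len(prefix):].strip().partition(" ")
    if algorithm.lower() != "sha-256":
        return False  # this sketch only handles SHA-256
    expected = hashlib.sha256(cert_der).hexdigest()
    return value.replace(":", "").lower() == expected


cert_der = b"placeholder for a DER-encoded certificate"
line = sdp_fingerprint(cert_der)
print(line)
print(matches_fingerprint(cert_der, line))
```

Because the fingerprint travels in the signalling, the check is only as trustworthy as the channel that carried the SDP, which is why exchanging fingerprints securely matters.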
NAT traversal is also a big boon: even with both endpoints behind firewalls, they can usually find a way to communicate, although this does mean that ICE servers (STUN and, where a direct path fails, TURN) are needed to facilitate connectivity. Within broadcasting, however, it’s more likely that you’ll control one end, so this is less needed. Sean also highlights WebRTC’s ‘simulcast’ ability to send multiple quality levels over the same connection.
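The Python sketch below shows, under stated assumptions, what simulcast looks like at the SDP level and how a forwarding server might use it. The sender announces one encoding per RID with `a=rid` lines and groups them in an `a=simulcast` line (RFC 8851 and RFC 8853); `pick_encoding` is a hypothetical server-side policy that forwards the best encoding fitting each receiver’s bandwidth. The RIDs and bitrates are made up. Unlike AV1’s SVC layers, each simulcast encoding is independent.

```python
from typing import Dict, List, Optional


def simulcast_sdp_lines(rids: List[str]) -> List[str]:
    """SDP attributes announcing one outgoing simulcast stream per RID."""
    lines = [f"a=rid:{rid} send" for rid in rids]
    lines.append("a=simulcast:send " + ";".join(rids))
    return lines


def pick_encoding(available_kbps: int, encodings: Dict[str, int]) -> Optional[str]:
    """Choose the highest-bitrate encoding that fits a receiver's estimated bandwidth.

    This mimics what a selective forwarding server might do; it is not a
    WebRTC API.
    """
    fitting = {rid: kbps for rid, kbps in encodings.items() if kbps <= available_kbps}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Hypothetical RIDs and bitrates: high, medium and low quality encodings.
ENCODINGS = {"h": 2500, "m": 800, "l": 250}
print("\n".join(simulcast_sdp_lines(list(ENCODINGS))))
print(pick_encoding(1000, ENCODINGS))
```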
Sean then looks at SRT and RIST. Both are low-latency streaming protocols that can also provide sub-second streaming over good connections with a relatively low RTT. Sean highlights that SRT and RIST cannot negotiate the codec in use and that their security is optional. Being focused more on delivering contribution feeds, they tend to have a more static configuration, often created after a programme of testing to ensure the quality will be acceptable to the broadcaster/streaming provider.
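By contrast, WebRTC settles the codec during the SDP offer/answer exchange. The Python sketch below reduces this to its essence: the answerer keeps only the offered codecs it also supports, here in its own order of preference (one reasonable choice). Real SDP negotiates payload types with `a=rtpmap` and `a=fmtp` parameters (H.264 profiles, for instance), which this sketch ignores; the `answer_codecs` helper and the codec lists are hypothetical.

```python
from typing import List


def answer_codecs(offered: List[str], supported: List[str]) -> List[str]:
    """Codecs for an SDP answer: those the answerer supports that were also offered.

    The result follows the answerer's preference order; SDP encoding names
    are compared case-insensitively.
    """
    offered_lower = {codec.lower() for codec in offered}
    return [codec for codec in supported if codec.lower() in offered_lower]


# Hypothetical capabilities of a browser (offerer) and a media server (answerer).
browser_offer = ["VP8", "VP9", "H264", "AV1"]
server_support = ["AV1", "H264"]
print(answer_codecs(browser_offer, server_support))
```

An empty result would mean the two sides share no codec, and the media section would be rejected rather than silently sending something the receiver can’t decode.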
To finish, Sean highlights a whole series of interesting, innovative uses of WebRTC from informal group streaming to drones to shared online games to file transfers and more.
Continuing our look at the most popular videos of 2020, in common with the previous post on SRT, today we look at replacing RTMP for ingest. This time, WebRTC is demonstrated as an option. With sub-second latency, WebRTC is a compelling replacement for RTMP.
You can read what we said about it the first time in the original article. In short, Nick Chadwick from Mux takes us through how RTMP works and where the gaps are as it’s phased out. He steps through the alternatives, showing how even the low-latency delivery formats don’t fit the bill for contribution, and shows how WebRTC can be a sub-second solution.
RIST and SRT saw significant and continued growth in use throughout 2020 as delivery formats and appear to be more commonly used than WebRTC, though that’s not to say that WebRTC isn’t continuing to grow within the broadcast community. SRT and RIST are both designed for contribution in that they actively manage packet loss, allow any codecs to be used and provide for other data to be sent, too. Overall, this tends to give them the edge, particularly for hardware products. But WebRTC’s wide availability on computers can be a bonus in some circumstances. Have a listen and come to your own conclusion.
Views and opinions expressed on this website are those of the author(s) and do not necessarily reflect those of SMPTE or SMPTE Members.
This website is presented for informational purposes only. Any reference to specific companies, products or services does not represent promotion, recommendation, or endorsement by SMPTE.