Video: WebRTC: Mostly the video bits

Who better to dig below the surface of WebRTC, which delivers sub-second latency, than Sean DuBois, creator of the Pion WebRTC library? This video takes a different look at WebRTC to others that focus on latency or scaling. Rather Sean looks at congestion control and managing the impacts of congestion noting that people remember how bad the video got and not how nice your sign-up page was.

Congestion is inevitable in large ‘unmanaged’ networks such as the internet and on wifi and cellular networks. Sean points out that the use of MPEG codecs which add dependencies between frames magnify the effect of lost packets. With frame-by-frame codecs, dropping a frame and repeating the last one is barely noticeable, but with MPEG, many more could be damaged. WebRTC was implemented over UDP so it could use its own congestion control.

RTP and RTCP are the key to WebRTC’s congestion control. RTP is well known for carrying real-time media as it’s used for AES67 audio, SMPTE ST 2110 and ST 2022-6 to name just a few standards. RTCP is RTP’s sidekick. Whilst RTP does the legwork of carrying the media, the RTP Control Protocol (RTCP) passes messages to control the flow. In this case, Sean explains, the RTCP channel is used to tell the sender that it’s sending too much video or which packets it’s lost. In terms of mitigating congestion, the source can adjust the bitrate directly or change the resolution or the framerate of the video to bring the bitrate down indirectly.



Sean shows a summary diagram of congestion controller flow which is built to handle jitter and out of order packets. Buffers are the normal way of fixing out-of-order packets but they have a big downside of adding latency and exacerbating timing problems. WebRTC has to use the RTCP channel to make sure it can map packet timing with NTP, using Sender Reports, as each packet’s timing information is only relative. When packet loss is spotted NACK (negative acknowledgements) are sent via RTCP or if things are worse, a Picture Loss Indication is sent which request a new keyframe. Fixing any impairments that do occur can be done either with FEC or by concealing the error with some form of masking, nowadays this may be based on machine learning.

The talk finishes with a look at a number of innovative projects which use WebRTC in one way or another, including for file transfer.

Watch now!

Sean DuBois Sean DuBois
Creator, Pion WebRTC
Developer, Apple

Video: UHD and HDR at the BBC – Where Are We Now, and Where Are We Going? –

Has UHD been slow to roll out? Not so, we hear in this talk which explains the work to date in standardising, testing and broadcasting in UHD by the BBC and associated organisations such as the EBU.

Simon Thompson from BBC R&D points out that HD took decades to translate from an IBC demo to an on-air service, whereas UHD channels surfaced only two years after the first IBC demonstration of UHD video. UHD has had a number of updates from the initial resolution focused definition which created UHD-1, 2160p lines high and UHD-2 which is often called 8K. Later, HDR with Wide Colour Gamut (WCG) was added which allowed the image to much better replicate the brightnesses the eye is used to and almost all of the naturally-occurring colours; it turns out that HD TV (using REC.709 colour) can not reproduce many colours commonly seen at football matches.

In fact, the design brief for HDR UHD was specifically to keep images looking natural which would allow better control over the artistic effect. In terms of HDR, the aim was to have a greater range than the human eye for any one adpation state. The human eye can see an incredible range of brightnesses, but it does this by adapting to different brightness levels – for instance by changing the pupil size. When in a fixed state the eye can only access a subset of sensitivity without further adapting. The aim of HDR is to have the eye in one adaptation state due to the ambient brightness, then allow the TV to show any brightness the eye can then hold.

Simon explains the two HDR formats: Dolby’s PQ widely adopted by the film industry and the Hybrid Log-Gamma format which is usually favoured by broadcasters who show live programming. PQ, we hear from Simon, covers the whole range of the human visual system meaning that any PQ stream has the capability to describe images from 1 to 10,000 Nits. In order to make this work properly, the mix needs to know the average brightness level of the video which will not be available until the end of the recording. It also requires sending metadata and is dependent on the ambient light levels in the room.

Hybrid Log-Gamma, by contrast, works on the fly. It doesn’t attempt to send the whole range of human eye and no metadata needed. This lends itself well to delivering HDR for live productions. To learn more about the details of PQ and HLG, check out this video.

Simon outlines the extensive testing and productions done in UHD and looks at the workflows possible. The trick has been finding the best way to produce both an SDR and an HDR production at the same time. The latest version that Simon highlights had all the 70 cameras being racked in HDR by people looking at the SDR down-mix version. The aim here is to ensure that the SDR version looks perfect, as it still serves over 90% of the viewership. However, the aim is to move to a 100% HDR production with SDR being derived off the back of that without any active monitoring. The video ends with a look to the challenges yet to be overcome in UHD and HDR production.

Watch now!

Simon Thompson Simon Thompson
Senior R&D Engineer

Video: A video transport protocol for content that matters

What is RIST and why’s it useful? The Reliable Internet Stream Protocol was seeing as strong uptake by broadcasters and other users wanting to use the internet to get their video from A to B over the internet even before the pandemic hit.

Kieran Kunhya from Open Broadcast Systems explains what RIST is trying to do. It comes from a history of expensive links between businesses, with fixed lines or satellite and recognises the increased use of cloud. With cloud computing increasingly forming a key part of many companies’ workflows, media needs to be sent over the internet to get into the workflow. Cloud technology, he explains, allows broadcasters to get away from the traditional on-prem model where systems need to be created to handle peak workload meaning there could be a lot of underutilised equipment.

Whilst the inclination to use the internet seems only too natural given this backdrop, RIST exists to fix the problems that the internet brings with it. It’s not controversial to say that it loses packets and adds jitter to signals. On top of that, using common file transfer technologies like HTTP on TCP leaves you susceptible to drops and variable latency. For broadcasters, it’s also important to know what your latency will be, and know it won’t change. This isn’t something that typical TCP-based technologies offer. On top of solving these problems, RIST also sets out to provide an authenticated, encrypted link.

Ways of doing this have been done before, with Zixi and VideoFlow being two examples that Kieran cites. RIST was created in order to allow interoperability between equipment in a vendor-neutral way. To underline it’s open nature, Kieran shows a table of the IETF RFCs used as part of the protocol.

RIST has two groups of features, those in the ‘Simple Profile’ such as use of RTP, packet loss recovery, bonding and hitless switching. Whereas the ‘Main Profile’ adds on top of that tunnelling (including the ability to choose which direction you set up your connection), encryption, authentication and null packets removal. Both of these are available as published specifications today. A third group of features is being planned under the ‘enhanced profile’ to be released around the beginning of Q2 2021.

Kieran discusses real-world proof points such as a 10-month link which had lost zero packets, though had needed to correct for millions of lost packets. He discusses deployments and moves on to SRT. SRT, Secure Reliable Transport, is a very popular technology which achieves a lot of what RIST does. Although it is an open-source project, it is controlled by one vendor, Haivision. It’s easy to use and has seen very wide deployment and it has done much to educate the market so people understand why they need a protocol such as RIST and SRT so has left a thirst in the market. Kieran sees benefit in RIST having brought together a whole range of industry experts, including Haivision, to develop this protocol and that it already has multipath support, unlike SRT. Furthermore, at 15% packet loss, SRT doesn’t work effectively whereas RIST can achieve full effectiveness with 40% packet loss, as long as you have enough bandwidth for a 200% overhead.

Watch now!

Kieran Kunhya Kieran Kunhya
Director, RIST Forum
Founder & CEO, Open Broadcast Systems

Video: Let’s be hAV1ng you

AV1 is now in use for some YouTube feeds, Netflix also can deliver AV1 to Android devices so we are no longer talking about “if AV1 happens” or “when AV1’s finished”. AV1 is here to stay, but in a landscape of 3 new MPEG codecs, VVC, EVC and LCEVC, the question moves to “when is AV1 the right move?”

In this talk from Derek Buitenhuis, we delve behind the scenes of AV1 to see which AV1 terms can be, more-or-less, mapped to which MPEG terms. AV1 is promoted as a royalty-free codec, although notably a patent pool has appeared to try and claim money from users. Because it’s not reusing ideas from other technologies, the names and specific functions of parts of the spec are both not identical to other codecs, but are similar in function.

Derek starts by outlining some of the terms we need to understand before delving in further such as “Temporal Unit” which of course is called a TU and is analogous to a GOP. Then he moves on to highlight the many ways in which previous DCT-style work has been extended meaning the sizes and types of DCT have been increased, and the prediction modes have changed. All of this is possible but increases computation.

Derek then highlights several major tools which have been added. One is the prediction of the Chroma from the Luma signal. Another is the ‘Constrained Direction Enhancement Filter’ which improves the look of diagonal hard edges. The third is ‘switch frames’ which are similar to IDR frames or, as Derek puts it ‘a fancy P-frame.’ There is also a Multi-Symbolic Arithmetic Codec which is a method of guessing a future binary digit which, based on probability, allows you to encode a subset of the number but just enough to ensure that the algorithm will come out with the full number,

After talking about the Loop Restoration Filter Derek then critiques a BBC article which drew, it seems, incorrect conclusions based on not enabling the appropriate functions needed for good compression and also suggesting that there was not enough information provided for anyone else to replicate the experiment. Derek then finishes with MS-SIM plots of different encoders.

Watch now!
Download the slides.

Derek Buitenhuis Derek Buitenhuis
Senior Video Encoding Engineer,