Video: DASH: from on-demand to large scale live for premium services

A bumper video here with 7 short talks from VideoLAN, Will Law and Hulu among others, all exploring the state of MPEG DASH today, the latest developments and the hot topics such as low latency, ad insertion, bandwidth prediction and one red-letter feature of DASH – multi-DRM.

The first 10 minutes set the scene, introducing the DASH Industry Forum (DASH IF) and explaining who takes part and what it does. Thomas Stockhammer, chair of the Interoperability Working Group, explains that DASH IF is made up of companies, with headline members including Google, Ericsson, Comcast and Thomas’ employer Qualcomm, which work to promote the adoption of MPEG-DASH by improving the specification, advising on how to put it into practice in real life, promoting interoperability and acting as a liaison point for other standards bodies. The remaining talks in this video exemplify the work the group is doing to push the technology forward.

Meeting Live Broadcast Requirements – the latest on DASH low latency!
Akamai’s Will Law takes to the mic next to look at the continuing push to make low-latency streaming available as a mainstream option for services to use. Will Law spoke about low latency at Demuxed 2019, where he discussed the three main file-based approaches to delivering low latency (DASH, LHLS and LL-HLS), as well as giving his famous ‘Chunky Monkey’ talk, which explains in light-hearted detail how CMAF, the Common Media Application Format delivered using MPEG-DASH, works.

In today’s talk, Will sets out what ‘low latency’ is and revises how CMAF allows latencies of below 10 seconds to be achieved. A lot of people focus on chunk duration when reducing latency and, while it’s true that it’s hard to get low latency with 10-second chunks, Will puts much more emphasis on the player buffer than on the chunks themselves. Even with very small chunks, choosing when to start playing (immediately, or after waiting for the next chunk) is an important part of keeping down the latency between live and your playback position. A common technique to manage that latency is to slightly increase or decrease the playback speed, hopefully without the viewer noticing.
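
To make that idea concrete, here is a minimal sketch (not from Will’s talk, and with illustrative constants) of how a player might nudge its playback rate towards a latency target. Players such as dash.js offer a similar catch-up mechanism:

```python
def catch_up_rate(live_latency_s, target_s, max_drift=0.05):
    """Nudge playback speed to hold latency near the target.

    Returns a playback rate clamped to [1 - max_drift, 1 + max_drift];
    +/-5% is an illustrative value, small enough that viewers
    shouldn't notice the speed change.
    """
    error = live_latency_s - target_s  # positive: we are too far behind live
    rate = 1.0 + max(-max_drift, min(max_drift, error * 0.05))
    return rate

print(catch_up_rate(4.2, 3.0))  # behind target -> 1.05, speed up slightly
print(catch_up_rate(2.5, 3.0))  # ahead of target -> 0.975, slow down
```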

Chunk-based streaming protocols like HLS make Adaptive Bitrate (ABR) relatively easy because the player can time the download of each chunk. If a, say, 5-second chunk arrives within 0.25 seconds, the player knows it can safely choose a higher-bitrate chunk next time. If, however, the chunk arrives in 4.8 seconds, it can choose a lower bitrate for the next chunk so as to receive it with more headroom. With CMAF this is not easy to do since the chunks all arrive in near real-time: each transferred file represents a very small section of media and is sent as soon as it is created. This problem is addressed in a later talk in this video.
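
As a rough illustration of that chunk-timing heuristic, here is a hedged sketch in Python; the bitrate ladder and safety margin are made up for the example:

```python
def pick_next_bitrate(chunk_bytes, download_seconds, ladder_bps, safety=0.8):
    """Pick the highest rung that fits under the measured throughput.

    ladder_bps: available bitrates in bits per second.
    safety: fraction of measured throughput we allow ourselves to use,
            leaving headroom for variation (value is illustrative).
    """
    throughput_bps = chunk_bytes * 8 / download_seconds
    usable = throughput_bps * safety
    candidates = [b for b in ladder_bps if b <= usable]
    return max(candidates) if candidates else min(ladder_bps)

# A 5-second chunk of a 3 Mb/s rendition (~1.9 MB) arriving in 0.25 s
# implies ~60 Mb/s of throughput, so the player can step up to 6 Mb/s.
print(pick_next_bitrate(1_875_000, 0.25, [1_000_000, 3_000_000, 6_000_000]))
```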

To finish off, Will talks about ‘Resync Elements’, a way of signalling mid-chunk IDRs. These help players find all the points at which they can join a stream or switch bitrate, which is important when some of those points are not at the start of chunks. For live streams, these are noted in the manifest file, which Will walks through on screen.

Ad Insertion in Live Content: Pre-, Mid- and Post-rolling
Whilst not always a hit with viewers, ads are important to many services as they generate the revenue needed to continue delivering content. Serving targeted ads, making sure they are available, and keeping a record of which ads were played when all require complex ad-serving infrastructure. Hulu’s Zachary Cava walks us through the parts of that infrastructure defined within DASH, such as exchanging information on ‘Ad Decision Parameters’ and ad metadata.

In chunked streams, ads are inserted at chunk boundaries. This presents challenges in making sure that certain parameters are maintained across the swap, a process given the general name ‘Content Splice Conditioning’. This conditioning can, for example, align the first segment with the period start time. Zachary lays out the three options provided for splice conditioning before finishing his talk with prepared content recommendations, ad metadata and tracking.

Bandwidth Prediction for Multi-bitrate Streaming at Low Latency
Next up is Comcast’s Ali C. Begen, who follows on from Will Law’s talk to cover bandwidth prediction when operating at low latency. As an example of the problem, consider HTTP/1.1, which allows us to download a file before it’s finished being written. We can therefore receive a 10-second chunk as it’s being written, which means we’ll receive it at the same rate the live video is being encoded: the time each chunk takes to arrive will equal the real-time chunk duration (in this example, 10 seconds). With already-written chunks, download time depends on your bandwidth, so it can indicate whether your player should increase or decrease the bitrate of the stream it’s pulling. Getting that indicator back for low-latency streams is what Ali presents in this talk.

Based on this paper, which Ali co-authored with Christian Timmerer, he explains a way of looking at the idle time between consecutive chunks and using a sliding window to generate a bandwidth prediction.
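
The sketch below illustrates the general idea of excluding idle time and averaging over a sliding window; it is a simplification for illustration, not the exact algorithm from the paper:

```python
from collections import deque

class SlidingWindowBandwidth:
    """Keep per-chunk throughput samples, stripping out the idle time the
    player spent waiting for the encoder to produce the next CMAF chunk,
    then average over a short sliding window to predict bandwidth."""

    def __init__(self, window=8):
        self.samples = deque(maxlen=window)

    def add_chunk(self, num_bytes, busy_seconds, idle_seconds):
        # busy_seconds: time the socket was actually receiving data
        # idle_seconds: gap spent waiting at the live edge (deliberately excluded)
        if busy_seconds > 0:
            self.samples.append(num_bytes * 8 / busy_seconds)

    def predict_bps(self):
        return sum(self.samples) / len(self.samples) if self.samples else None

est = SlidingWindowBandwidth()
est.add_chunk(250_000, busy_seconds=0.04, idle_seconds=0.96)
print(est.predict_bps())  # ~50 Mb/s despite the chunk "taking" a second
```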

Implementing DASH low latency in FFmpeg
Open-source developer Jean-Baptiste Kempf, who is well known for his work on VLC, discusses his work writing a low-latency MPEG-DASH implementation for FFmpeg, called DASH-LL. He explains how it works and how to use it with examples, which you can copy and paste from the PDF of his talk.
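
For flavour, here is a hedged example of the kind of command involved; the option names come from recent FFmpeg builds’ dash muxer (check `ffmpeg -h muxer=dash`) and the input and encoder settings are placeholders rather than JB’s exact examples:

```sh
# Illustrative low-latency DASH output with FFmpeg's dash muxer
ffmpeg -re -i input.mp4 \
  -c:v libx264 -tune zerolatency -c:a aac \
  -f dash -ldash 1 -streaming 1 \
  -seg_duration 4 -frag_duration 1 \
  -use_template 1 -use_timeline 0 \
  -utc_timing_url "https://time.akamai.com/?iso" \
  out/manifest.mpd
```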

Managing multi-DRM with DASH
The final talk, ahead of the Q&A, is from NAGRA and discusses the use of DRM within MPEG-DASH. MPEG-DASH uses Common Encryption (CENC), which allows the DASH protocol to use more than one DRM scheme; it is typically seen enabling the ‘FairPlay’, ‘Widevine’ and ‘PlayReady’ encryption schemes on a single stream, dependent on the OS of the receiver. There is complexity in running a single server which can talk to, and negotiate licences with, multiple DRM services, and this is the difficulty Laurent Piron discusses in this final talk before the Q&A led by Ericsson’s VP of International Standards, Per Fröjdh.
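
As a rough illustration of how CENC decouples the encryption from the licensing, here is a hypothetical MPD fragment signalling two DRM systems over one encrypted stream; the key ID is a placeholder, though the Widevine and PlayReady scheme UUIDs are the registered values:

```xml
<!-- Hypothetical MPD fragment: media encrypted once with CENC,
     licences obtainable from more than one DRM system -->
<AdaptationSet mimeType="video/mp4" xmlns:cenc="urn:mpeg:cenc:2013">
  <!-- Common Encryption signalling for the content itself -->
  <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011"
                     value="cenc"
                     cenc:default_KID="00000000-0000-0000-0000-000000000000"/>
  <!-- Widevine (e.g. Android, Chrome) -->
  <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"/>
  <!-- PlayReady (e.g. Windows, Edge) -->
  <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95"/>
</AdaptationSet>
```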

Watch now!
Speakers

Thomas Stockhammer
Director of Technical Standards,
Qualcomm
Will Law
Chief Architect,
Akamai
Zachary Cava
Software Architect,
Hulu
Ali C. Begen
Technical Consultant, Video Architecture, Strategy and Technology group,
Comcast
Jean-Baptiste Kempf
President & Lead VLC Developer
VideoLAN
Laurent Piron
Principal Solution Architect
NAGRA
Moderator: Per Fröjdh
VP International Standards,
Ericsson

Video: IP Fundamentals For Broadcast Seminar IV

“When networking gets real” could, perhaps, have been the title of this last of four talks about IP for broadcast. The session wraps up a number of topics, from the classic ‘TCP vs. UDP’ discussion to IPv6, and examines the switches and routers that make up a network as well as the architecture options. Not only that, but we also look at VPNs and firewalls, finishing with some aspects of network security. Viewed alongside the previous three talks, this session picks up many of the nuances of the topics already covered, bringing in the relevance of ‘real world’ situations.

Wayne Pecena, President of the SBE, starts by discussing subnets and collision domains. The issue with any NIC (Network Interface Controller) is that it has no way of knowing in advance when another NIC is talking on the wire (i.e. sending a message by changing the voltage of the wire). It’s important that NICs detect when other NICs are sending messages and avoid transmitting at the same time. When this doesn’t work out, two messages on the same wire result in a ‘collision’. It’s no surprise that collisions are to be avoided, which is the starting point of Wayne’s discussion.

Moving from Layer 2 to Layer 4, Wayne pits TCP against UDP, looking at the pros and cons of each protocol. Whilst none of this is secret, coming as part of this series of talks it’s just what’s needed to round the topic off ahead of talking about network architecture.

“Building and Securing a Segmented IP Network Infrastructure” is the title of the next talk, which starts to deal with the real-world problems faced when an engineer gets back from a training session and starts to actually specify a network herself. How should the routers and switches be interconnected to deliver the functionality the business requires and, as we shall see, which routers and switches are actually needed? Wayne discusses some of the considerations when purchasing switches (layer 2) and routers (layers 3 and 2), including the differing terms used by HP and Cisco, before talking about how to assign IP addresses, defining an ‘IP space’. Wayne takes us through IP addressing plans, examples of what they would look like in Excel, and a lot of the real-world thinking behind them.
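
As an illustration of the kind of planning involved (not Wayne’s actual plan), here is a short Python sketch using the standard ipaddress module to carve a private block into per-VLAN subnets; the VLAN names and sizes are invented for the example:

```python
import ipaddress

# Carve a campus block out of RFC 1918 private space into /24s, one per VLAN
campus = ipaddress.ip_network("10.10.0.0/16")
vlans = ["management", "playout", "edit-suites", "guest-wifi"]

for name, subnet in zip(vlans, campus.subnets(new_prefix=24)):
    usable = subnet.num_addresses - 2  # minus network and broadcast addresses
    print(f"{name:12s} {subnet}  ({usable} usable hosts)")
```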

Security is next on the list, not just ‘cybersecurity’ in the general sense but best practice, firewalls and VPNs. Wayne takes a good segment of time to discuss the different aspects of firewalls: how they work, ACLs (Access Control Lists) and port security, amongst other topics. He then does the same for VPNs (Virtual Private Networks), making the point that a VPN and a firewall are not the same. A VPN allows you to extend a network out of one building and into another, the typical example being from your workplace into your home. Whilst a VPN is secured so that only certain people can extend the network, a firewall acts more generally to prevent unwanted traffic coming into a network.
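
To make the ACL idea concrete, here is a hypothetical Cisco-style extended ACL; the networks and list number are invented for the example:

```
! Allow the management subnet to reach web interfaces in the media VLAN
access-list 110 permit tcp 10.10.20.0 0.0.0.255 10.10.30.0 0.0.0.255 eq 443
! Block everything else into the media VLAN and log the attempts
access-list 110 deny ip any 10.10.30.0 0.0.0.255 log
```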

As an addendum to this talk, Wayne explains IPv4 address depletion and how IPv6 addressing works. In practice, for broadcasters deploying within their own company in the year 2020, IPv6 is unlikely to be a necessary topic. For people distributing to homes and working closely with CDNs and ISPs, however, there is a chance this information is more relevant on a day-to-day basis. Whilst IPv4 depletion is a real thing, most companies build their internal equipment around an IPv4 address plan, since every company has the private 10.x.x.x address space to play with.
Watch now!
Speaker

Wayne Pecena
Director of Engineering, KAMU TV/FM at Texas A&M University
President, Society of Broadcast Engineers AKA SBE

Video: PTP in Virtualized Media Environment

How do we reconcile the tension between the continual move towards virtualisation, microservices and Docker-like deployments and the requirement of SMPTE ST 2110 for highly precise timing so it can synchronise the video, audio and other essence streams? Virtualisation adds fluidity into computing so a single set of resources can be shared amongst many virtual computers, yet PTP, the Precision Time Protocol, a successor to NTP, requires close to nanosecond precision in its timestamps.

Alex Vainman from Mellanox explains how to make PTP work in these cases and brings along a case study to boot. Starting with a little overview and a glossary, Alex explains the parts of the virtual machine and the environment in which it sits. There’s the physical layer and the hypervisor as well as the virtual machines themselves, each virtual machine being its own self-contained computer sitting on a shared host. Hardware must often be shared between many different computers, yet some devices aren’t intended to be shared. Take, for instance, a dongle that contains a licence for software: this should clearly be owned by only one computer, so there is a ‘direct’ mode in which it is seen by just that one machine. Alex goes on to explain the different virtualisation I/O modes which do allow devices to be shared. A printer, storage or a CPU, for instance, can be shared, but computers may need to wait until they have access to the device. Waiting, of course, is not good for a precision time protocol.

In order to understand the impact that virtualisation might have, Alex details the accuracy and other requirements necessary to have PTP working well enough to support SMPTE ST 2110 workflows. Although PTP is an IEEE standard, it only defines how to establish accurate time. It doesn’t help us phase and bring together media signals without SMPTE ST 2059-1 and -2, which specify the incoming PTP signal and the way by which we can compare timing and media signals (more info here). All-important is understanding how PTP can actually determine the accurate time given that every message suffers an unknown propagation delay. By exchanging messages, Alex shows, it is quite practical to measure the delays involved and bring them into the time calculation.
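
The arithmetic behind that message exchange is standard PTP; here is a minimal sketch with made-up timestamps, assuming the forward and reverse path delays are symmetric:

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Standard PTP delay-request/response calculation.

    t1: master sends Sync          (master clock)
    t2: slave receives Sync        (slave clock)
    t3: slave sends Delay_Req      (slave clock)
    t4: master receives Delay_Req  (master clock)
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2          # slave clock error vs. master
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2  # one-way network delay
    return offset, mean_path_delay

# Example: slave running 5 us ahead of master, one-way delay of 50 us
print(ptp_offset_and_delay(0.0, 55e-6, 100e-6, 145e-6))  # -> (5e-06, 5e-05)
```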

We now have enough information to see why the increased jitter of VM-based systems is going to cause a problem: there are non-deterministic factors such as contention and traffic load to consider, as well as software overhead. Alex takes us through the options for getting PTP well synchronised in a variety of VM architectures. The first takes the host clock and ensures that it is synchronised to PTP; a dedicated PTP library within each VM then speaks to the host clock and synchronises the VM’s OS clock, providing very accurate timing. Another option, for where PTP hardware support isn’t present on the host, is to use NICs with dedicated PTP support which can directly serve the VM OSes, each maintaining its own PTP clock. The major downside here is that each OS has to make its own PTP calls, creating more load on the PTP system. In the previous architecture, the host clock was the only clock synchronising to the system PTP, so there was only ever one set of PTP messages no matter how many VMs were being supported.
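
On Linux hosts, the first architecture is commonly built with the linuxptp tools; a hedged sketch, with the interface name and options being illustrative:

```sh
# Discipline the NIC's hardware clock to the network's PTP grandmaster
ptp4l -i eth0 -m &
# Copy that time onto the host's system clock (-w waits for ptp4l to lock)
phc2sys -s eth0 -c CLOCK_REALTIME -w -m
# Guests then read the already-disciplined host clock (e.g. via kvm-clock)
# rather than each running their own PTP stack.
```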

To finish off, Alex explains how Windows VMs can be supported – for now through third-party software – and summarises the ways in which we can, in fact, create PTP ecosystems that incorporate virtual machines.

Watch now!
Download the slides
Speaker

Alex Vainman
Senior Staff Engineer,
Mellanox Technologies

Video: Super Resolution – The scaler of tomorrow, here today!

If we ever had a time when most displays were the same resolution, those days are long gone: smartphones and tablets with extremely high pixel density sit alongside laptop screens of various resolutions and 1080-line TVs which are gradually being replaced with UHD variants. This means that HD videos are nearly always being upscaled, which makes ‘getting upscaling right’ a really worthwhile topic. The well-known basic up/downscaling algorithms have been around for a while, and even the best-performing, Lanczos, is well over 20 years old. The ‘new kid on the block’ isn’t another algorithm; it’s a whole technique of inferring better upscaling using machine learning, called ‘super resolution’.

Nick Chadwick from Mux has been running the code and the numbers to see how well super resolution works. Taking to the stage at Demuxed SF, he starts by looking at where scaling is used and what type it is. The most common algorithms are nearest neighbour, bi-linear, bi-cubic and Lanczos, with nearest neighbour being the most basic and least-well performing. Using VMAF, Nick shows that for both up- and downscaling the traditional opinions of how well these algorithms perform hold true. He then introduces some test videos designed to let you see whether your video path uses bi-linear or bi-cubic upscaling, presenting his results of where bi-cubic can be seen (Safari on a MacBook Pro) as opposed to bi-linear (Chrome on a MacBook Pro). The test videos are available here.
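
If you want to run a similar comparison yourself, here is a hedged sketch using FFmpeg’s scale filter and libvmaf; this is not Nick’s exact methodology, the file names are placeholders, and it requires an FFmpeg build with --enable-libvmaf:

```sh
# Upscale HD to UHD with two different scalers
ffmpeg -i hd.mp4 -vf "scale=3840:2160:flags=bilinear" up_bilinear.mp4
ffmpeg -i hd.mp4 -vf "scale=3840:2160:flags=lanczos"  up_lanczos.mp4
# Score one result against a native UHD reference with VMAF
ffmpeg -i up_lanczos.mp4 -i uhd_reference.mp4 -lavfi libvmaf -f null -
```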

In the next part of the talk, Nick digs a little deeper into how super resolution works and how he tested FFmpeg’s implementation of it. Though he hit some difficulties using this young filter, he is able to present some videos and show that they are, indeed, “better to view”, meaning that text looks sharper and details are easier to pick out. It’s certainly possible to see some extra speckling introduced by the process, but the VMAF score is around 10 points higher, matching the subjective experience.
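
For reference, FFmpeg exposes super resolution as the ‘sr’ filter; a hedged invocation might look like the following, noting that it needs a build with a DNN backend and a trained model, and the model path here is a placeholder:

```sh
# Illustrative 2x upscale using FFmpeg's super resolution filter
ffmpeg -i hd.mp4 \
  -vf "sr=dnn_backend=tensorflow:scale_factor=2:model=espcn.pb" \
  upscaled.mp4
```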

The downsides are a very significant increase in the computational power needed, which limits its use in live applications, plus the need for a good, if not very good, understanding of ML concepts and coding. And, of course, it wouldn’t be the online streaming community if clients weren’t already being developed to do super resolution on the decode side despite most devices not being practically capable of it. So Nick finishes off his talk discussing work in progress and papers relating to the implementation of super resolution, and what it can borrow from other developing technologies.

Watch now!
Speaker

Nick Chadwick
Software Engineer,
Mux