Errors in streaming often require deep knowledge that system specialsts and developers have, but getting them the data they need is often an uphill struggle. This video shows ways in which we can short circuit this problem showing some approaches that Bitmovin is taking to get the data to the right people. Bitmovin announced, yesterday, €25M of further investment in the company. We’ve featured Bitmovin many times here on The Broadcast Knowledge talking about codecs, low-latency live streaming or super-resolution. Reading through this full list makes it clear that Bitmovin’s interested in the whole chain from encode to delivery.
Christoph Prager sets the scene looking at an analysis of errors showing that only 15% have a clear reason with 65% being ambiguous. If an error’s ambiguous, you need data to drill into it and disambiguate the situation. This is exacerbated by the standard aggregate metrics which make getting to the root cause very difficult. Definitions of ‘buffering percentage’ and ‘startup time’ are very useful to gauge the scale of an issue or to find there’s even a problem to begin with. But for developers, they are like the foreword to the book they need to read to find the problem. This has led Bitmovin to think from the angle that errors are a lot more obvious when you have the data.
Daniel Hölbling-Inzko takes us through Bitmovin’s new features to expose data surrounding errors. Whilst these will be coming to Bitmovin products, they show what a useful set of tools for debugging would be and can inspire the same in your platform if you are able to customise those aspects of it. Daniel points out that the right detailed information can be useful to customer support, but it’s the deeper information that he’s interested it. Bitmovin can collate all the stack traces from problem places but also track segments from the time there was an error.
Segment tracking shows the status, type, download speed, time to first byte and the size of each of 10 segments from around the time the error was collected. Viewing these can help see trends such as diminishing bandwidth or just simply show that a problem happened abruptly. Daniel talks through three errors where segment tracking can help you pinpoint problems: ‘NETWORK_SEGMENT_DOWNLOAD_TIMEOUT’, ‘ANALYTICS_BUFFERING_TIMEOUT’ and ‘DRM: license request failed’. Because the requests are now split out individually it makes it easy to see where the 403 error is that is stopping the DRM or how the internet speed is dropping resulting in an analytics timeout. Daniel highlights that it’s the trends that are usually the most important part.
Part II in this Cisco series on PTP, Precision Time Protocol, focuses on Boundary Clocks and Transparent Clocks. Last week we heard how PTP maintains accurate time by calculating the delay between clocks and the grandmaster clock which is the source of time for the network. This video summarises how to distribute that source of time to all your devices and how to choose between the two methods.
Albert Mitchell from Cisco explains that transparent clocks are just that, they transparently let the timing data flow through. All they do is update the timestamps on the outgoing packets to compensate for the extra time getting through the switch. A boundary clock (BC), however, is a source of time of itself but gets its time from the grandmaster like any other clock. Acting in this dual way, it creates the boundary it’s named after. It’s a boundary because it provides time to other end devices on the network, These devices never see the grandmaster, they only see the BC. Likewise, the grandmaster only sees the BC acting like any ordinary clock sending delay requests. This means that the boundary clock can shield the grandmaster from the rest of the devices on the network. A grandmaster with 10 boundary clocks can deliver time to over a thousand endpoints without a problem. Without the boundary clocks, the grandmaster may not be able to handle the two-way conversations necessary with so many clocks.
For broadcast networks, boundary clocks are preferred as they enable easier diagnosis and can reduce the blast radius of problems. Importantly they can span multiple VLANs. Other benefits are that they filter packet delay variation and shields the downstream/following clocks from any transient changes in the grandmasters. The downside of BCs is that they do add small errors to the timing which can add up if multiple BCs are concatenated.
Transparent clocks, on the other hand, don’t help with scalability like BCs and are limited to single VLANs. On the plus side, they require no configuration and provide faster convergence.
Lastly, Albert looks at the Best Master Clock Algorithm (BMCA) which is the method used to determine which grandmaster is providing timing to the whole network. For a deeper dive into the BMCA, have a look at this Arista video on PTP timing. Albert gives a good starting overview of how the algorithm works, the data it needs to operate and advice on settings to make sure you know which clock will win in each instance.
Quality of Experience (QoE) has a wider meaning than Quality of Service (QoS) even though viewers have a worse time if either are impacted. What’s the difference and how are companies trying to deal with maximising enjoyment of their services? This panel from Streaming Media brings together Akamai’s Will Law, Robert Colantuoni from Disney Streaming Services, CJ Harvey from HBO Max. and Ian Greenblatt from JD Power detail the nuances of Quality of Experience.
The panel starts by outlining some of the differences between QoS and QoE. Ian explains that QoE is about the whole experience of the UI, recommendations, search, rebuffering and much more. QoS can impact QoE but is restricted to the success of the delivery the stream itself. QoS measures impairments such as rebuffering, macroblocking, video quality, time to play etc. Whilst poor QoS will usually reduce QoE, there’s a lot that a well-written player can do to mitigate the effects of QoS. Having good QoE is ensuring the viewer can put trust in each of their ‘clicks’, that they will know what will happen and won’t have to wait.
“Multi-CDN is the new norm” declares Will Law, as the conversation turns to how players should deal with CDN selection. The challenge is to be picking for the CDN which works best for the user. Robert points out that a great CDN in one geography may not perform so well in another. A player making a ping-based choice at the beginning of playback is going to make a much worse choice overall than a player which samples each CDN in turn and continues to pick the best. This needs to be done carefully though, giving each CDN time to warm up and usefully affect its pre-fetch capabilities.
Where QoE raises itself over QoS is in questions of perception. A good player will not simply target high bitrate, but take in to account colour volume depth, resolution and device to name but three.
There are plenty of questions from the audience covering load balancers, jarring changes between sharp, high budget productions and old episodes of 4:3 TV dramas plus a look-ahead to the next two years of streaming.
Who better to dig below the surface of WebRTC, which delivers sub-second latency, than Sean DuBois, creator of the Pion WebRTC library? This video takes a different look at WebRTC to others that focus on latency or scaling. Rather Sean looks at congestion control and managing the impacts of congestion noting that people remember how bad the video got and not how nice your sign-up page was.
Congestion is inevitable in large ‘unmanaged’ networks such as the internet and on wifi and cellular networks. Sean points out that the use of MPEG codecs which add dependencies between frames magnify the effect of lost packets. With frame-by-frame codecs, dropping a frame and repeating the last one is barely noticeable, but with MPEG, many more could be damaged. WebRTC was implemented over UDP so it could use its own congestion control.
RTP and RTCP are the key to WebRTC’s congestion control. RTP is well known for carrying real-time media as it’s used for AES67 audio, SMPTE ST 2110 and ST 2022-6 to name just a few standards. RTCP is RTP’s sidekick. Whilst RTP does the legwork of carrying the media, the RTP Control Protocol (RTCP) passes messages to control the flow. In this case, Sean explains, the RTCP channel is used to tell the sender that it’s sending too much video or which packets it’s lost. In terms of mitigating congestion, the source can adjust the bitrate directly or change the resolution or the framerate of the video to bring the bitrate down indirectly.
Sean shows a summary diagram of congestion controller flow which is built to handle jitter and out of order packets. Buffers are the normal way of fixing out-of-order packets but they have a big downside of adding latency and exacerbating timing problems. WebRTC has to use the RTCP channel to make sure it can map packet timing with NTP, using Sender Reports, as each packet’s timing information is only relative. When packet loss is spotted NACK (negative acknowledgements) are sent via RTCP or if things are worse, a Picture Loss Indication is sent which request a new keyframe. Fixing any impairments that do occur can be done either with FEC or by concealing the error with some form of masking, nowadays this may be based on machine learning.
The talk finishes with a look at a number of innovative projects which use WebRTC in one way or another, including for file transfer.
Views and opinions expressed on this website are those of the author(s) and do not necessarily reflect those of SMPTE or SMPTE Members.
This website is presented for informational purposes only. Any reference to specific companies, products or services does not represent promotion, recommendation, or endorsement by SMPTE