Video: Machine Learning for Per-title Encoding

AI continues its march into streaming with this new approach to optimising encoder settings to keep down the bitrate and improve quality for viewers. By its more accurate name, ‘machine learning’, computers learn how to characterise video, avoiding hundreds of trial encodes when determining the best way to encode video assets.

Daniel Silhavy from Fraunhofer FOKUS takes the stand at Mile High Video 2020 to detail the latest techniques in per-title and per-scene encoding. Daniel starts by outlining the problem with a fixed ABR ladder: efficiencies are only gained by being flexible with both resolution and bitrate.

Netflix were the best-known pioneers of the per-title encoding idea where, for each video asset, many, many encodes are run to determine the best overall bitrates to choose. This is great because it allows animation-based files to be treated differently from action films or sports. Efficiency is gained.

However, per-title delivers an average benefit. There are still parts of the video which are simple and could take a reduced bitrate, and parts whose complexity isn’t accounted for. When the bitrate is higher than necessary to achieve a certain VMAF score, Daniel calls this ‘wasted quality’: bitrate was spent making the quality better than it needed to be. Whilst better quality sounds like a boon, it often can’t actually be perceived, hence targeting a lower VMAF.

Naturally, rather than varying the resolution mix and bitrate for each file, it would be better to do it for each scene. Working this way, variations in complexity can be quickly accounted for. This can also be done without machine learning, but more encodes are needed. The rest of the talk looks at using machine learning to short-cut some of that work.

The standard workflow is to perform a complexity analysis on the video, working out a VMAF score at various bitrate and resolution combinations. This produces a ‘convex hull’ estimation, allowing determination of the best parameters, which then feed into the production encoding stage.
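
To make the idea concrete, here’s a minimal sketch of the selection the convex hull performs: of all the trial encodes, keep only those that no other encode beats on both bitrate and quality. The numbers are invented for illustration, not from the talk.

```python
# Illustrative sketch of convex-hull (Pareto-front) selection over trial encodes.
# Each tuple is (resolution, bitrate in kbps, measured VMAF) -- made-up numbers.
trial_encodes = [
    ("1920x1080", 6000, 95.1), ("1920x1080", 4000, 92.3),
    ("1280x720",  4000, 93.0), ("1280x720",  2500, 89.5),
    ("960x540",   2500, 88.0), ("960x540",   1500, 83.2),
]

def pareto_front(encodes):
    """Keep encodes that no other encode beats on both bitrate and VMAF."""
    front = []
    for res, rate, vmaf in encodes:
        dominated = any(
            o_rate <= rate and o_vmaf >= vmaf and (o_rate, o_vmaf) != (rate, vmaf)
            for _, o_rate, o_vmaf in encodes
        )
        if not dominated:
            front.append((res, rate, vmaf))
    return sorted(front, key=lambda e: e[1])

for res, rate, vmaf in pareto_front(trial_encodes):
    print(f"{res:>9} @ {rate} kbps -> VMAF {vmaf}")
```

Note how 1080p at 4,000kbps drops out in favour of 720p, which scores a higher VMAF at the same bitrate – exactly the resolution crossover the convex hull is there to capture.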

Machine learning can replace the section which predicts the best bitrate-resolution pairs. Fed with some details of the content’s complexity, it can avoid multiple trial encodes and deliver a list of parameters straight to the encoding stage. Moreover, it can also receive feedback from the player, allowing further optimisation of the prediction module.
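
Daniel doesn’t specify the model used, but a sketch along these lines shows the shape of the idea; the feature names, training data and the choice of a random forest here are all assumptions for illustration, not Fraunhofer’s actual design.

```python
# Sketch only: features, data and model are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Training rows: [spatial complexity, temporal complexity, bitrate kbps, height]
X = np.array([
    [0.2, 0.1, 2500, 720], [0.2, 0.1, 4000, 1080],
    [0.8, 0.9, 2500, 720], [0.8, 0.9, 4000, 1080],
])
y = np.array([91.0, 94.0, 78.0, 84.0])  # VMAF measured on past trial encodes

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Rank candidate ladder rungs for a new, unencoded scene -- no trial encodes.
candidates = np.array([[0.5, 0.4, 2500, 720], [0.5, 0.4, 4000, 1080]])
for cand, vmaf in zip(candidates, model.predict(candidates)):
    print(f"{int(cand[3])}p @ {int(cand[2])} kbps -> predicted VMAF {vmaf:.1f}")
```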

Daniel shows a demo of this working, where we see that the end result has fewer rungs on the ABR ladder, a lower-resolution top rung and fewer resolutions in general, some repeated at different bitrates. This chimes with the findings of Facebook, covered here last week, who found that removing their ‘one bitrate per resolution’ rule improved viewers’ experience. In total, for an example Fraunhofer received from a customer, they saw a 53% reduction in the storage needed.

Watch now!
Download the slides
Speaker

Daniel Silhavy
Scientist & Project Manager,
Fraunhofer FOKUS

Video: I know X, what does WebRTC get me?

WebRTC is now a W3C standard providing sub-second, peer-to-peer video and audio streaming with NAT traversal. Widely used for video conferencing, its sub-second latency has also been the focus of video streaming companies such as Millicast and Limelight (to name but two), who aim to deliver this otherwise peer-to-peer technology to thousands or millions of people in under a second, enabling interactive video, gamified streams, auctions and ultra-low-latency sports.

Addressing people using other streaming protocols directly, Pion creator Sean DuBois spoke at SF Video Tech about what WebRTC brings over and above protocols like RTMP, SRT and RIST. At the heart of it, WebRTC, like SRT and RIST, creates a connection over which it can send a variety of data. Whilst we expect media to be sent, file transfer can actually be achieved quite easily – let’s not forget that the whole of SRT is built upon UDT, which is specifically a file-delivery utility. And where files can be transferred, so can real-time data and metadata.
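
As an illustration of that data-carrying ability, here’s a minimal sketch using the aiortc Python library. The STUN server is a common public one, and the signalling step – delivering this SDP offer to the far end and receiving an answer back – is deployment-specific and omitted.

```python
# Minimal aiortc sketch: a WebRTC peer connection carrying a data channel
# for arbitrary data or metadata, not just audio and video.
import asyncio
from aiortc import RTCConfiguration, RTCIceServer, RTCPeerConnection

async def main():
    pc = RTCPeerConnection(RTCConfiguration(
        iceServers=[RTCIceServer(urls="stun:stun.l.google.com:19302")]))
    channel = pc.createDataChannel("metadata")

    @channel.on("open")
    def on_open():
        channel.send("any text or bytes, delivered over SCTP/DTLS")

    await pc.setLocalDescription(await pc.createOffer())
    print(pc.localDescription.sdp)  # hand this SDP offer to the remote peer
    await pc.close()

asyncio.run(main())
```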

Sean quickly summarises WebRTC as a protocol between (typically) browsers: a secure, peer-to-peer connection over which multiple audio and video streams can flow. In common with RIST and other recent protocols, it’s built on many pre-existing technologies such as SRTP, DTLS, ICE and SDP to deliver signalling, connection management, encryption and communication.

The list of improvements over RTMP is very long. They’re spelt out concisely in the video, so we will highlight just a few here. Importantly, low latency is key. RTMP was low-latency for its time, but not by today’s standards; Google’s Stadia can boast 125ms from keypress to video, explains Sean. DTLS and SRTP are essential for security, but they are well-understood, trusted methods of securing your data. DTLS is pretty much exactly the same as the TLS which secures your bank transfers, just moved onto UDP instead of TCP. However, WebRTC can work by exchanging ‘fingerprints’ (DTLS-SRTP) instead of relying on the full trusted-certificate infrastructure that underpins TLS on the web. Removing the requirement for certificates is a big boost for flexibility and agility, as long as you are confident you can exchange fingerprints securely ahead of time.
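
What gets exchanged is simply an `a=fingerprint` attribute in the SDP: a hash of the peer’s self-signed DTLS certificate. A tiny sketch of pulling it out of an offer (the SDP itself is made up):

```python
# Sketch: extract the DTLS certificate fingerprint from an SDP offer. The
# remote peer checks that the certificate seen in the DTLS handshake
# hashes to this value -- replacing CA-based trust with the fingerprint.
import re

sdp = """v=0
a=fingerprint:sha-256 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB:3E:4B:65:2E:7D:46:3F:54:42:CD:54:F1
a=setup:actpass
"""

match = re.search(r"^a=fingerprint:(\S+)\s+(\S+)", sdp, re.MULTILINE)
if match:
    algo, digest = match.groups()
    print(f"DTLS certificate fingerprint ({algo}): {digest}")
```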

NAT traversal is also a big boon: even with both endpoints behind firewalls, a way to communicate can almost always be found, although this does mean STUN/TURN servers, coordinated via ICE, are needed to facilitate connectivity. Within broadcasting, however, it’s more likely that you’ll have control of one end, so this is less needed. Sean also highlights the ability to send multiple quality levels within the same stream using the ‘simulcast’ ability of WebRTC.

Sean then looks at SRT and RIST. Both are low-latency streaming protocols which can also provide sub-second streaming over good connections with a relatively low RTT. Sean highlights that, unlike WebRTC, SRT and RIST can’t negotiate the codec in use, and their security is optional. Being focused more on delivering contribution feeds, they tend to have a more static configuration, often created after a programme of testing to ensure the quality will be acceptable to the broadcaster or streaming provider.

To finish, Sean highlights a whole series of interesting, innovative uses of WebRTC from informal group streaming to drones to shared online games to file transfers and more.

Watch now!
Speaker

Sean DuBois
Developer, Apple
Creator of Pion WebRTC

Video: The QUIC-ematic universe season 2020-2021 preview

QUIC is an encrypted transport protocol offering better performance than the TCP-based stack beneath HTTP/1.1 and HTTP/2. While young, it’s already seeing some use in the larger internet companies, who are learning how best to harness its optimisations. One of the stark differences is that it’s built on top of UDP rather than TCP, and this is one of the main ways it increases efficiency. Freed from TCP’s rigid acknowledgement of packets, QUIC still ensures reliable delivery, but on its own terms, which allows it to prioritise speedy delivery over acknowledgement admin. We’ve covered QUIC before, so if it’s new to you, check out this explainer, as this talk is an update on what happened in 2020 and the plans for 2021 as QUIC aims to be standardised and much more widely available.

Lucas Pardue from Cloudflare co-chairs the IETF working group devoted to QUIC and spoke at Demuxed 2020. “The IETF are standardisers”, he says, with QUIC on its 31st draft and a move during 2021 to standardise what is called ‘IETF QUIC’, differentiating it from a slightly different version of QUIC from Google. IETF QUIC, Lucas outlines, delivers secure, reliable stream multiplexing.
QUIC actually forms a base layer for other applications, like HTTP/3, which run their semantics on top. Like most modern standards, ‘QUIC’ is really a name for more than one document: there is a transport layer, header compression, a TLS handshake description and a document for recovery and loss protection. QUIC itself lives on UDP datagrams, which is why one of the new options coming is the ability to turn off some of the reliability that QUIC builds on top of UDP, for data which doesn’t really need it. One possibility here is a QUIC tunnel, where one QUIC connection carries many QUIC streams within it. In that circumstance, you only want any one piece of data protected by one reliable-transmission mechanism, so you’d turn off reliable transmission for the inner QUIC streams, as they are already protected by the outer QUIC layer. A project called MASQUE is working on this.
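
A minimal sketch using the aioquic Python library shows that layering in practice: one QUIC connection over UDP on which reliable streams and unreliable DATAGRAM frames can coexist. The host, port and ALPN value below are placeholders, not a real deployment.

```python
# Sketch with aioquic: reliable streams and unreliable datagrams on one
# QUIC connection. Endpoint details are hypothetical.
import asyncio
from aioquic.asyncio import connect
from aioquic.quic.configuration import QuicConfiguration

async def main():
    config = QuicConfiguration(is_client=True, alpn_protocols=["hq-interop"])
    config.max_datagram_frame_size = 65536  # opt in to unreliable datagrams

    async with connect("example.com", 4433, configuration=config) as client:
        reader, writer = await client.create_stream()  # reliable and ordered
        writer.write(b"stream data: delivered reliably, in order")
        client.transmit()  # flush pending QUIC packets onto the UDP socket

asyncio.run(main())
```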

As with anything arriving on the market, it’s important to establish interoperability – we see this with the JT-NM and SRT plugfests. Lucas shows us the QUIC Interop Runner, which automatically tests the latest implementations against each other, shows the results in a matrix and gives access to logs and packet traces.

Lucas reminds us that QUIC streams are a first-class transport primitive providing reliable delivery. Within a stream, data will be delivered in order, but QUIC doesn’t specify how to schedule multiplexed streams. HTTP/3 initially borrowed HTTP/2’s prioritisation scheme, but a better way to prioritise has been found and is currently being discussed and finalised. Lucas has been working on quiche, Cloudflare’s own implementation of QUIC, and shows a three-step process for getting quiche up and running.
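
As a small illustration of streams being first-class, RFC 9000 encodes who opened a stream and whether it is one- or two-way directly in the two low bits of the stream ID:

```python
# Per RFC 9000, the two least-significant bits of a QUIC stream ID encode
# the initiator (bit 0) and the directionality (bit 1).
def describe_stream(stream_id: int) -> str:
    initiator = "server" if stream_id & 0x1 else "client"
    direction = "unidirectional" if stream_id & 0x2 else "bidirectional"
    return f"stream {stream_id}: {initiator}-initiated, {direction}"

for sid in (0, 1, 2, 3, 4):
    print(describe_stream(sid))
# streams 0 and 4 are the client's first two bidirectional streams
```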

WebTransport is another offering built on QUIC which complements WebSockets, giving web apps more direct access to QUIC itself. The Chrome Origin Trial explains how this is built into Chrome. Lucas talks about a test project he built on top of existing examples, which is hosted at http3.wtf

Lucas ends by summarising the coming year: the working group is aiming to deliver documents to an IETF last call ahead of publication; the community will continue to gain deployment experience as new users are already working on enabling the technology; and there is still work to be done on other adopted work items, as well as considering new ones. Lucas closes by encouraging viewers to join in with the community.

Watch now!
Speaker

Lucas Pardue
Senior Software Engineer, Cloudflare
Co-Chair of the QUIC working group, IETF

Video: IPMX – Debunking the Myths

2110 for AV? IPMX is an IP specification for interoperable Pro AV equipment. SMPTE’s ST 2110 standard suite is very powerful, but not deployable easily enough to rig for a live event. At the moment there is no open standard in Pro AV that can deliver IP. Whilst there are a number of proprietary alliances which enable widespread use of a single chip or software core, this interoperability comes at a cost and is ultimately underpinned by one company, or a group of companies.

Dave Chiappini from Matrox discusses the work of the AIMS Pro AV working group, which is developing IPMX. Dave underlines the fact that this is a pull to unify the Pro AV industry, helping people avoid investing over and over again in reinventing protocols or reworking their products to interoperate. He feels that ‘open standards help propel markets forward’, adding energy and avoiding vendor lock-in. This is one reason for the inclusion of NMOS, allowing any vendor to build a control system against the same open specification, opening up the market to both small and large companies.
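
To show what that vendor-neutral control looks like in practice, here’s a minimal sketch querying an NMOS IS-04 registry for the senders it knows about. The registry address is hypothetical; the URL path comes from the IS-04 specification.

```python
# Sketch: list senders registered with an NMOS IS-04 Query API.
import requests

REGISTRY = "http://registry.example.net:8080"  # hypothetical registry

resp = requests.get(f"{REGISTRY}/x-nmos/query/v1.3/senders", timeout=5)
resp.raise_for_status()
for sender in resp.json():
    print(sender["id"], sender.get("label", "(no label)"))
```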

The Pro AV market needs more than just swift deployment. HDMI is pervasive and is able to carry more frame rates and resolutions than SDI, so HDMI support is top of the list of features that IPMX will add on top of 2110, NMOS and PTP. HDMI also uses HDCP, so AIMS is now working with the DCP on a method of carrying HDCP over 2110. With TVs already replacing SDI monitors, such interoperability with HDMI should bring down the cost of monitoring for non-picture-critical environments.

Timing can be pricey and complex if PTP and GPS are required. A lot of time and effort goes into making the PTP infrastructure work properly within SMPTE 2110 installations. Having to do this at an event, whilst setting up in a short timespan, is not helpful to anyone and, elaborates Dave, a point-to-point video link simply doesn’t need high-precision timing. Not only does IPMX relax the timing requirements, it will also support asynchronous video streams.

David explains that whilst there are times when zero compression is needed, in both AV and broadcast, a lot of the time we need video that will easily fit into 1Gbps. For this, JPEG XS is being used: a lightweight codec that can run in software, on FPGAs and more, and which supports 4:4:4 video for maximum fidelity. For more about JPEG XS, have a listen to this talk. Some good news for bandwidth fans is that all new Intel chips support 2.5GbE networking over existing cabling, which IPMX will be supporting.
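
Some back-of-the-envelope arithmetic shows why a codec is needed at all to hit that 1Gbps figure. The resolutions below are illustrative examples, not numbers from the talk.

```python
# Back-of-the-envelope: uncompressed video bandwidth against a 1Gbps link.
# 4:2:2 sampling averages two samples per pixel (Y plus half-rate Cb, Cr).
LINK_GBPS = 1.0

def uncompressed_gbps(width, height, fps, bit_depth, samples_per_pixel=2):
    return width * height * fps * samples_per_pixel * bit_depth / 1e9

for name, w, h in [("1080p60", 1920, 1080), ("2160p60", 3840, 2160)]:
    raw = uncompressed_gbps(w, h, 60, 10)
    print(f"{name}: {raw:.2f} Gbps raw, so ~{raw / LINK_GBPS:.1f}:1 "
          f"compression is needed to fit {LINK_GBPS:.0f} Gbps")
```

That works out at roughly 2.5:1 for 10-bit 4:2:2 1080p60 and 10:1 for 2160p60 – low ratios that sit comfortably within the lightweight territory a codec like JPEG XS targets.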

Pro AV also needs the ability to throw some preview video out to an iPad or similar. This isn’t going to work with JPEG XS, the preferred ‘minimal compression’ codec for IPMX, so a system for including H.264 or H.265 is being investigated, which could have knock-on benefits for broadcast.

David finishes by underlining that IPMX will be an open standard that can be implemented in software on a server, on a desktop or on a mobile phone. It’s scalable and ready to support the Pro AV and events industry.

Watch now!
Speaker

David Chiappini
Chair, Pro AV Working Group, AIMS
Executive Vice President, Research & Development,
Matrox Graphics Inc.