Video: The Future of SSAI on OTT Devices

Server-Side Ad Insertion (SSAI) sounds like a sure-fire way to insert ads without ad blockers noticing, but it's not without problems, particularly on OTT devices plugged into the living room TV. People are used to watching broadcast television on that screen, and they bring broadcast expectations to whatever they watch there: quick channel changing, low latency and consistent quality are expected even if the viewer is watching a mini OTT streaming device plugged into HDMI input 2.

In this talk from Mile High Video, Phil Cluff from Mux looks at the challenges that devices other than computers present when using SSAI. In general, OTT devices don't have much memory or CPU power, which renders client-side ad insertion impractical. SSAI can be achieved by manipulating the manifest or by rewriting timestamps on video segments. The latter damages the ability to cache chunks, so Phil explores the challenges of the former technique. On the surface, swapping out some chunks by changing the manifest sounds simple, but the picture varies across games consoles, smart TVs, streaming boxes and set-top boxes. Unsurprisingly, streaming boxes like Apple TV and Roku support the features needed to pull off SSAI fairly well. TVs fare less well, but those running Android tend to have workable solutions, explains Phil. The biggest hurdle is getting things working on set-top boxes, of which there are thousands of variations, few of which support DRM and DASH well.
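As an illustration of the manifest-manipulation technique Phil describes, here's a minimal sketch (not from the talk; the ad URLs and durations are invented) which splices an ad break into an HLS media playlist, using #EXT-X-DISCONTINUITY so the player resets its decoder across the splice points:

```python
# Minimal, illustrative HLS manifest manipulation for SSAI.
# All URLs and durations are hypothetical; a real SSAI service must also
# handle encryption keys, live playlist refreshes, SCTE-35 markers, etc.

AD_SEGMENTS = [("https://ads.example.com/break1/seg0.ts", 6.0),
               ("https://ads.example.com/break1/seg1.ts", 6.0)]

def splice_ad_break(playlist: str, after_segment: int) -> str:
    """Return a new HLS media playlist with an ad break spliced in
    after the given zero-based content segment index."""
    out, seg_index = [], 0
    for line in playlist.splitlines():
        out.append(line)
        if line and not line.startswith("#"):       # this is a segment URI
            if seg_index == after_segment:
                out.append("#EXT-X-DISCONTINUITY")  # decoder reset into the ad
                for uri, duration in AD_SEGMENTS:
                    out.append(f"#EXTINF:{duration:.1f},")
                    out.append(uri)
                out.append("#EXT-X-DISCONTINUITY")  # and back to content
            seg_index += 1
    return "\n".join(out)
```

Even this simple-looking swap relies on the player honouring discontinuities correctly, which is exactly what many TVs and set-top boxes get wrong.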

Phil examines the installed base of smart TVs, finding that most are more than 3 years old, which typically means they run old firmware supporting the features that existed when the TV was released but nothing more recent…such as manifest manipulation. With this bleak picture, Phil attempts to ground us, saying that we don't need to deliver ads on all devices. Most services can identify a core set of devices which forms 80% or more of their viewership, meaning that supporting ads on devices outside that core is unlikely ever to be profitable. And if it's not profitable, is there any need to show ads on that device at all? Initially it doesn't feel right to deliver ad-free streams to some devices, but if you look at the numbers, you may well find that the development time will never be paid back. An alternative is to deliver ads to these viewers by getting them to watch on a Chromecast you provide instead of on their STB, which is a more common option than you may expect.
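As a back-of-the-envelope illustration of that 80% argument (the device shares below are invented), this snippet finds the smallest set of devices covering a target share of viewership:

```python
# Hypothetical viewership shares per device family.
viewership = {"Roku": 0.31, "Apple TV": 0.22, "Fire TV": 0.15,
              "Android TV": 0.12, "Samsung Tizen": 0.08,
              "LG webOS": 0.05, "Legacy STBs": 0.07}

def core_devices(shares: dict, target: float = 0.80) -> list:
    """Greedily pick the most-watched devices until the target is covered."""
    core, covered = [], 0.0
    for device, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        core.append(device)
        covered += share
        if covered >= target:
            break
    return core

print(core_devices(viewership))  # ['Roku', 'Apple TV', 'Fire TV', 'Android TV']
```

In this made-up example, four device families reach the 80% mark; the long tail of legacy STBs is where the SSAI development money would disappear.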

Phil finishes his talk looking at the future, which includes an HbbTV spec aimed specifically at SSAI and a continued battle to find a reliable way of delivering and recording beacons for SSAI.

Watch now!
Speaker

Phil Cluff
Streaming Architect,
Mux

Video: PTP in WAN Applications & Update on PTP v2.1

PTP is evolving as is our ability to use it over WANs. This video explains what’s new in PTP’s second revision and the evolving techniques of using PTP over a wide area network such as the internet. As PTP was built assuming the use of LANs, the longer and more unpredictable latency of WANs throws off the timing calculations, so what can be done to compensate?

In this video from RAVENNA, Andreas Hildebrand from ALC NetworX takes us through PTP v2.1, the 2019 revision of PTP, following on from PTP v2.0 in 2008 and the original v1.0 in 2002. Famously, 2.0 and 1.0 were not compatible with each other, which caused problems with some hardware implementations of Dante first written when 1.0 was the only edition. Importantly, Andreas highlights, version 2.1 is backwards compatible with version 2.0. If you're looking for a PTP primer before digging in, have a look at this intro video from Daniel Boldt of Meinberg.

Andreas explains the use of PTP profiles within both AES67 and SMPTE ST 2110, which standardise some of the parameters for using PTP, such as message frequency. Whilst they do have different defaults, there is an overlap between the two, allowing AES67 streams to be used within both an AES67 ecosystem and a 2110-30 ecosystem. These overlaps are detailed in the joint AES and SMPTE document, AES-R16-2016.

“What’s new in PTP v2.1?” asks Andreas. Apart from clearer language, accuracy, flexibility and robustness have been enhanced.

Flexibility
Flexibility comes from the ability to mix multicast and unicast messages. Announce and Sync messages are still sent as multicast, but queries like Delay_Req and Delay_Resp can now be sent as unicast, which provides better scalability since, in many scenarios, the reply never needs to go to any other computers. Another example of flexibility is modular transparent clocks, i.e. ones implemented in SFPs. It's also now possible to isolate profiles without using different PTP domain numbers, achieved by adding a profile identifier called 'SdoId'.

Robustness and Security
PTP now allows inter-domain interactions, effectively allowing multiple grandmasters to vote on a single 'multi-domain clock'. This becomes a very robust clock as it's created from the insight of three grandmasters. PTP v2.1 also adds source integrity checking to allow identification of false, injected messages, a long-needed improvement since the security even of a LAN can't be assumed nowadays.

Performance and Accuracy
Stats reporting has been added in PTP v2.1, allowing monitoring of the average, minimum, maximum and standard deviation of a number of metrics covering the leader clock, delay measurements and message reception counters. Accuracy has been boosted to sub-nanosecond levels by the CERN-related White Rabbit project, an overall benefit to PTP even where sub-nanosecond timing isn't needed.

Source: ALC NetworX

The second part of the video features Meinberg's Daniel Boldt, who discusses how to transmit PTP over a WAN. This is more challenging than over a LAN because the signal is more likely to be affected by queueing delays, particularly if the WAN in question is the internet. Queueing delays happen for a number of reasons, but it all boils down to the fact that switches and routers often have to hold packets in a buffer, something that tends to happen more under heavy load. This means the delay is variable, leading to varying jitter in the measurements.

Another problem often encountered is path changes, where the network diverts traffic onto a different route. Whilst this is a great way to route around problems, it does mean a sudden change in path length and therefore propagation delay; a conventional ping time may jump from 100ms to 250ms within a second. This could have a big impact on the accuracy of a PTP timing signal if undetected.
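As a sketch of how such a step change might be flagged (an illustration, not Meinberg's algorithm; the window and threshold are arbitrary), each new delay measurement can be compared against the median of recent samples:

```python
from collections import deque

def detect_path_change(delays_ms, window=16, threshold_ms=20.0):
    """Yield indices where the measured delay jumps suddenly relative
    to the median of a recent window of samples."""
    history = deque(maxlen=window)
    for i, delay in enumerate(delays_ms):
        if len(history) == window:
            baseline = sorted(history)[window // 2]   # median of recent samples
            if abs(delay - baseline) > threshold_ms:
                yield i                               # probable path change
        history.append(delay)

samples = [100.0] * 30 + [250.0] * 30                 # a 150 ms route change
print(list(detect_path_change(samples))[:3])          # first flagged indices: 30, 31, 32
```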

Finally, the PTP timing algorithm assumes that it takes just as long, and no longer, to get the timing information from A to B as it does to get the follow-up reply from B to A. When one direction takes longer than the other, for instance when one direction is forced through a 100Mbps link rather than a 1000Mbps one, the PTP time will have a constant error.
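To make this concrete, here's a small worked example using the standard PTP offset formula, offset = ((t2 − t1) − (t4 − t3)) / 2; the delay values are invented and show that half the path asymmetry appears as a constant offset error:

```python
def ptp_offset(t1, t2, t3, t4):
    """Standard PTP offset estimate, which assumes symmetric paths.
    t1 = Sync sent by leader, t2 = Sync received by follower,
    t3 = Delay_Req sent by follower, t4 = Delay_Req received by leader."""
    return ((t2 - t1) - (t4 - t3)) / 2

# The follower is actually perfectly in sync (true offset = 0), but the
# leader->follower path takes 5 ms while the return path takes only 1 ms.
d_fwd, d_rev = 0.005, 0.001        # hypothetical asymmetric one-way delays
t1 = 0.0
t2 = t1 + d_fwd                    # Sync leg
t3 = 1.0
t4 = t3 + d_rev                    # Delay_Req leg

print(ptp_offset(t1, t2, t3, t4))  # 0.002 s: a constant (d_fwd - d_rev) / 2 error
```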

Source: Meinberg

Daniel explains that these issues can be mitigated by more thorough processing of the incoming packets. For instance, a high-quality oscillator which can maintain an accurate frequency for a long time without external input is helpful. A local, GPS-locked grandmaster is another good way to avoid problems, where practical; the WAN PTP then becomes an 'aide' to timing rather than the authority. Finally, the 'lucky packets' technique is demonstrated.

Daniel explains that if you look at the delay of each packet arriving over, say, a two-second period, you can collect the packets that, based on their timestamps, were lucky enough not to be delayed much. By discarding the heavily delayed packets, the accuracy of the PTP signal becomes much higher, and jitter can reduce, as we see from two case studies, by an order of magnitude.
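A minimal sketch of the idea (an illustration under assumed numbers, not Meinberg's implementation): within each window of measurements, keep only the fastest few packets and feed just those to the clock servo:

```python
import random

def lucky_packets(delays_ms, window=100, keep_fraction=0.05):
    """For each window of delay measurements, return only the 'lucky'
    (least-delayed) samples; the rest are assumed to be queue-delayed."""
    lucky = []
    for start in range(0, len(delays_ms), window):
        chunk = sorted(delays_ms[start:start + window])
        keep = max(1, int(len(chunk) * keep_fraction))
        lucky.extend(chunk[:keep])                  # fastest packets only
    return lucky

random.seed(1)
# Base path delay of 10 ms plus random queueing delay of up to 40 ms.
samples = [10 + random.random() * 40 for _ in range(1000)]
print(min(samples), max(lucky_packets(samples)))    # lucky ones cluster near 10 ms
```

The surviving samples sit close to the true propagation delay, which is why the case studies show jitter dropping by an order of magnitude.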

Watch now!
Speakers

Andreas Hildebrand
RAVENNA Evangelist,
ALC NetworX
Daniel Boldt
Head of Software Development,
Meinberg

Video: Live Production Forecast: Cloudy for the Foreseeable Future

Our ability to work remotely during the pandemic is thanks to the hard work of many people who have developed the technologies that made it possible. Even before the pandemic struck, this work was ongoing and gaining momentum to overcome more challenges and more hurdles of working in IP, both within the broadcast facility and in the cloud.

SMPTE's Paul Briscoe moderates the discussion surrounding these ongoing efforts to make the cloud a better place for broadcasters in this series of presentations from the SMPTE Toronto section. First in the order is Peter Wharton from TAG V.S., talking about ways to innovate workflows to better suit the cloud.

Peter first outlines the challenges of live cloud production, namely keeping latency low and signal quality high while managing the high bandwidths needed, all while keeping a handle on costs. There is an increasing number of cloud-native solutions, but how many are truly innovating? Don't just move workflows into the cloud, advocates Peter; rather, take this opportunity to embrace the cloud.

Working in the cloud will be built on new transport protocols like RIST and SRT and on a modular, open architecture. Scalability is the name of the game for 'the cloud', but the real trick is building your workflows and technology so that you can scale during a live event.

Source: TAG V.S.

There are still obstacles to be overcome. Bandwidth for uncompressed video is one, with typical signals up to 3Gbps uncompressed, which drives very high data transfer costs. The lack of PTP in the cloud makes ST 2110 workflows difficult, as does the lack of multicast.
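To put figures like 3Gbps in context, an uncompressed video payload is simply pixels × frame rate × bits per pixel; a quick sketch of the arithmetic, assuming 10-bit 4:2:2 sampling:

```python
def uncompressed_gbps(width, height, fps, bit_depth=10, sampling=2.0):
    """Approximate video payload bitrate in Gbit/s.
    sampling: average samples per pixel - 2.0 for 4:2:2, 1.5 for 4:2:0."""
    return width * height * fps * bit_depth * sampling / 1e9

print(round(uncompressed_gbps(1920, 1080, 60), 2))  # ~2.49 Gbps for 1080p60
```

The familiar ~3Gbps figure of 3G-SDI is a little higher because the SDI signal also carries blanking and embedded audio alongside the active picture.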

Tackling bandwidth, Peter looks at the low-latency ways to compress video such as NDI, NDI|HX, JPEG XS and Amazon’s lossless CDI. Peter talks us through some of the considerations in choosing the right codec for the task in hand.

Finishing his talk, Peter asks if this isn't the time for radical change. Why not rethink the entire process and embrace latency? Peter gives the example of a colour grading workflow which has been able to switch from on-prem grading on very high-spec computers to running the same, incredibly intensive, process in the cloud. The company is able to spin up thousands of CPUs in the cloud and use spot pricing to create temporary, low-cost, extremely powerful computers. This has brought waiting times for jobs down significantly and has reduced the cost of processing by an order of magnitude.

Lastly, Peter looks further into the future, examining how saturating the stadium with cameras could change the way we operate them. With 360-degree coverage of the stadium, the camera position can be changed virtually by AI, allowing camera operators to be remote from the stadium. Canon and Intel are already working to develop this. Whilst it may not be able to replace all camera operators, sport is the home of bleeding-edge technology. How long can it resist the technology to create any camera angle?

Source: intoPIX

Jean-Baptiste Lorent is next, from intoPIX, to explain what JPEG XS is. A new ultra-low-latency codec, it meets the challenges of the industry's move to IP, its increasing desire to move data rather than people, and the continuing trend for COTS servers and cloud infrastructure to be part of the real-time production chain.

As Peter covered, uncompressed data rates are very high. The Tokyo Olympics will be filmed in 8K, which racks up close to 80Gbps for 120fps footage. So, with JPEG XS standing for Xtra Small and Xtra Speed, it's no surprise that this new ISO standard is being leant on to help.
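That 80Gbps figure checks out with the same arithmetic as before, again assuming 10-bit 4:2:2 sampling (20 bits per pixel on average):

```python
# 8K at 120fps, assuming 10-bit 4:2:2 (20 bits per pixel on average).
print(7680 * 4320 * 120 * 20 / 1e9)   # ~79.6 Gbit/s, i.e. close to 80Gbps
```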

Tested as visually lossless over 7 or more encode generations, and with a latency of only a few lines of video, JPEG XS works well in multi-stage live workflows. Jean-Baptiste explains that it's low in complexity and works well on FPGAs and on CPUs.

JPEG XS can support up to 16-bit values, any chroma subsampling and any colour space. It has been standardised for carriage in MPEG transport streams, in SMPTE ST 2110 as 2110-22, over RTP (pending), within HEIF file containers and more. Worst-case bitrates are 200Mbps for 1080i, 390Mbps for 1080p60 and 1.4Gbps for 2160p60.
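Those worst-case figures imply fairly gentle compression ratios against the uncompressed rates computed above, which is part of how JPEG XS keeps latency and complexity so low. A quick check, assuming 10-bit 4:2:2 sources:

```python
# Worst-case JPEG XS ratios against 10-bit 4:2:2 uncompressed payloads.
uncompressed_bps = {"1080p60": 1920 * 1080 * 60 * 20,
                    "2160p60": 3840 * 2160 * 60 * 20}
worst_case_bps = {"1080p60": 390e6, "2160p60": 1.4e9}

for fmt, raw in uncompressed_bps.items():
    print(fmt, round(raw / worst_case_bps[fmt], 1), ": 1")  # ~6.4:1 and ~7.1:1
```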

Evolution of Standards-Based IP Workflows Ground-To-Cloud

Last in the presentations is John Mailhot from Imagine Communications, who is also co-chair of an activity group at the VSF working on standardising interfaces for passing media from place to place. Within the data plane, it would be better to avoid vendors repeatedly writing similar drivers. Between ground and cloud, how do we standardise how video arrives, along with the data needed around it? Similarly, standardising around new technologies like Amazon's CDI is important.

John outlines the aim of having an interoperability point within the cloud above the low-level data transfer, closer to layer 7 than layer 1 in the OSI model. This work is being done within AIMS, VSF, SMPTE and other organisations, based on existing technologies.

Q&A
The video finishes with a Q&A and includes comments from AWS's Evan Statton, whose talk on CDI that evening is not part of this video. The questions cover comparing NDI with JPEG XS, how CDI uses networking to achieve high bandwidth and high reliability, the balance between minimising network use and minimising CPU load depending on the workflow, the increasingly agile nature of broadcast infrastructure, the need for PTP in the cloud, plus the pros and cons of standards versus specifications.

Watch now!
Speakers

Peter Wharton
Director Corporate Strategy, TAG V.S.
President, Happy Robotz
Vice President of Membership, SMPTE
Jean-Baptiste Lorent
Director Marketing & Sales,
intoPIX
John Mailhot
Co-Chair, Ground-Cloud-Cloud-Ground Activity Group, VSF
Director & NMOS Steering Member, AMWA
Systems Architect for IP Convergence, Imagine Communications
Moderator: Paul Briscoe
Canadian Regional Governor, SMPTE
Consultant, Televisionary Consulting
Evan Statton
Principal Architect, Media & Entertainment
Amazon Web Services

Video: Get it in the mixer! Achieving better audio immersiveness


Immersive audio is pretty much standard for premium sports coverage and can take many forms. Typically, immersive audio is explained as 'better than surround sound' and is often delivered to the listener as object audio such as AC-4. Delivering audio as objects allows the listener's decoder to place the sounds appropriately for their specific setup, whether they have 3 speakers, 7 speakers, a ceiling-bouncing array or just headphones. This video looks at how these objects can be carefully manipulated to maximise the immersiveness of the experience, and it is also available in a binaural version.

This conversation from SVG Europe, hosted by Roger Charlesworth, brings together three academics who are applying their research to live, on-air sports. First to speak is Hyunkook Lee, who discusses how to capture 360-degree sound fields using microphone arrays. In order to capture audio from all around, we need to use multiple microphones but, as Hyunkook explains, any difference in location between microphones leads to a phase difference in the audio. Perceived as a delay between two microphones, this gives us the spatial sound of the scene, just as the spacing of our ears helps us understand the soundscape. This effect can be considered separately in the vertical and horizontal domains, the latter being the important one.

Talking about vertical separation, Hyunkook discusses the 'pitch-height' effect, whereby the pitch of a sound affects our perception of its height rather than any delays between different sound sources; modulating the amplitude, however, can be effective. Bringing together into one mix multiple versions of the same audio which have been slightly delayed produces comb filtering, so a high-mounted microphone used to capture ambience can colour the overall audio. Hyunkook shows that this colouration can be mitigated by reducing the upper sound by 7dB, which can be done by angling the microphone upwards. He finishes by playing binaural recordings made on his microphone arrays. A binaural version of this video is also available.
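To illustrate the comb filtering being described (a toy calculation, not from the talk), summing a signal with a delayed copy of itself creates notches at odd multiples of 1/(2τ); with a 1 ms delay the first notch lands at 500 Hz:

```python
import math

def comb_gain(freq_hz, delay_s):
    """Magnitude of y(t) = x(t) + x(t - delay): |1 + e^{-j*2*pi*f*d}|,
    which simplifies to 2*|cos(pi * f * d)|."""
    return abs(2 * math.cos(math.pi * freq_hz * delay_s))

delay = 0.001                        # 1 ms, e.g. ~34 cm extra path at 343 m/s
for f in (100, 250, 500, 1000, 1500):
    print(f, "Hz ->", round(comb_gain(f, delay), 2))
# 500 Hz and 1500 Hz are fully cancelled; 1000 Hz is doubled.
```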

Second up is Ben Shirley, who talks about supporting the sound supervisor's mix with AI. Ben highlights that a sound supervisor is not just in charge of the main programme mix but also of the comms system. As such, if that breaks, which could endanger the wider production, their attention has to go there rather than to mixing. Whilst this may not be much of an issue on simpler games, producing high-end mixes with object audio is a very skilled job which requires constant attention. Naturally, the more immersive an experience is, the more obvious it is when mistakes happen. The solution created by Ben's company is to use AI to create a pitch-effects mix which can be used as a sustaining feed, covering moments when the sound supervisor can't give the mix the attention it needs, but also allowing them more flexibility to work on the finer points of the mix rather than 'chasing the ball'.

The AI-trained system is able to create a constant-power mix of the on-pitch audio. By analysing the many microphones, it's also able to detect ball kicks which aren't close to any microphone and, indeed, may not be 'heard' by those mics at all. When it detects the vestiges of a ball kick, it can pull from a large range of ball-kick sounds and insert one on the fly in place of the real kick which wasn't usefully recorded by any mic. This comes into its own, says Ben, when used with VR or 360-degree audio.

Part of what makes immersive audio special is the promise of customising the sound to your needs. What does that mean? At its most basic, it means the decoder understands how many speakers you have and where they are, so it can create a downmix which correctly places the sounds for you. Ideally, you would also be able to add your own compression for listening at a 'constant' volume when dynamic range isn't a good thing, for instance listening at night without waking the neighbours. Ben's example is that in-stadium, people don't want to hear the commentary as they don't need to be told what to think about each shot.
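As a hint of what 'constant-power' mixing means in practice (a toy sketch using a standard audio-engineering convention, not Salsa Sound's algorithm), a crossfade between two mics can use sin/cos gains so the summed power stays constant wherever the balance sits:

```python
import math

def constant_power_gains(balance):
    """sin/cos gain pair: gain_a**2 + gain_b**2 == 1 for balance in [0, 1]."""
    return math.cos(balance * math.pi / 2), math.sin(balance * math.pi / 2)

def mix(sample_a, sample_b, balance):
    """Crossfade two mic samples at constant summed power."""
    gain_a, gain_b = constant_power_gains(balance)
    return gain_a * sample_a + gain_b * sample_b

# Power check: the gain power stays at 1.0 wherever the balance sits.
for balance in (0.0, 0.25, 0.5, 0.75, 1.0):
    gain_a, gain_b = constant_power_gains(balance)
    print(balance, round(gain_a**2 + gain_b**2, 6))
```

A naive linear crossfade (gains summing to 1) would instead dip in perceived loudness at the midpoint, which is exactly what a constant-power mix avoids.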

Last in the order is Felix Krückels, who talks about his work in German football to make better use of the tools already available, handling object audio in a more nuanced way and improving the overall mix with existing plugins. Felix starts by showing how the close-ball/field-of-play mic contains a lot of the audio that the crowd mics contain; in fact, Felix says the close-ball mic is 90% crowd sound. When mixing that into stereo and also 5.1, the spill in the close-ball mic can introduce colouration. Some stadia have dedicated left and right crowd mics. Felix then talks about personalisation in sound, giving the example of watching in a pub where there is plenty of local crowd noise, so a mix with lots of in-stadium crowd noise isn't helpful; much better, in that environment, to have clear commentary and ball effects with a lower-than-normal ambience. Felix plays a number of examples to show how using plugins to vary the delays can help produce the mixes needed.

Watch now!
Binaural Version
Speakers

Felix Krückels
Audio Engineer,
Consultant and Academic
Hyunkook Lee
Director of the Centre for Audio and Psychoacoustic Engineering,
University of Huddersfield
Ben Shirley
Director and Co-Founder at Salsa Sound, and Senior Lecturer and Researcher in Audio Technology,
University of Salford
Moderator: Roger Charlesworth
Independent consultant on media production technology