Next Generation Audio Archives – The Broadcast Knowledge

Video: Get it in the mixer! Achieving better audio immersiveness

Posted on 27th November 2020 by Russell Trafford-Jones

Immersive audio is pretty much standard for premium sports coverage and can take many forms. Typically, immersive audio is explained as ‘better than surround sound’ and is often delivered to the listener as object audio such as AC-4. Delivering audio as objects allows the listener’s decoder to place the sounds appropriately for their specific setup, whether they have 3 speakers, 7, a ceiling bouncing array or just headphones. This video looks at how these can be carefully manipulated to maximise the immersiveness of the experience and is available as a binaural version.

This conversation from SVG Europe, hosted by Roger Charlesworth, brings together three academics who are applying their research to live, on-air sports. First to speak is Hyunkook Lee who discusses how to capture 360-degree sound fields using microphone arrays. In order to capture audio from all around, we need to use multiple microphones but, as Hyunkook explains, any difference in location between microphones can lead to a phase difference in the audio. This is perceived as a delay in audio between two microphones, and it is this delay which gives us the spatial sound of the audio, just as the spacing of our ears helps us understand the soundscape. This effect can be considered separately in the vertical and horizontal domain, the latter being the more important.
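As a rough illustration of the point about microphone spacing, the arrival-time difference between two spaced microphones can be estimated from simple geometry. The sketch below is not taken from the talk: it assumes a distant source (so the wavefront is effectively flat), a speed of sound of 343 m/s, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def inter_mic_delay_s(spacing_m, angle_deg):
    """Arrival-time difference between two microphones for a distant source.

    angle_deg is measured from the line perpendicular to the microphone pair,
    so 0 degrees means the sound reaches both microphones at the same time.
    """
    return spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S


def phase_difference_deg(spacing_m, angle_deg, frequency_hz):
    """Phase difference at a single frequency caused by that delay."""
    delay = inter_mic_delay_s(spacing_m, angle_deg)
    return (360.0 * frequency_hz * delay) % 360.0
```

The same delay produces a different phase difference at every frequency, which is why spaced microphones give timing cues rather than one fixed phase offset.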

Talking about vertical separation, Hyunkook discusses the ‘Pitch-Height’ effect whereby the pitch of the sound affects our perception of its height rather than any delays between different sound sources. Modulating the amplitude, however, can be effective. Now, when multiple versions of the same audio which have been slightly delayed are brought together into one mix, this produces comb filtering of the audio. As such, a microphone placed high up to capture ambience can colour the overall audio. Hyunkook shows that this colouration can be mitigated by reducing the upper sound by 7dB which can be done by angling the microphone upwards. He finishes by playing binaural recordings made with his microphone arrays. A binaural version of this video is also available.
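To see why a delayed copy colours the mix, and why turning the delayed copy down helps, here is a minimal sketch of an idealised comb filter: one direct signal summed with one delayed copy. This is a textbook model rather than Hyunkook's own analysis, and the function names are hypothetical.

```python
import cmath
import math


def comb_gain_db(frequency_hz, delay_s, delayed_level_db=0.0):
    """Level change in dB when a delayed copy is mixed with the direct signal."""
    g = 10 ** (delayed_level_db / 20.0)  # linear gain of the delayed copy
    response = 1 + g * cmath.exp(-2j * math.pi * frequency_hz * delay_s)
    magnitude = abs(response)
    return -math.inf if magnitude == 0 else 20 * math.log10(magnitude)


def first_notch_hz(delay_s):
    """Lowest frequency at which the delayed copy arrives in anti-phase."""
    return 1.0 / (2.0 * delay_s)


# Example: a 1 ms delay puts the first notch at 500 Hz.
notch = first_notch_hz(0.001)
equal_level = comb_gain_db(notch, 0.001)          # very deep cancellation
reduced_7db = comb_gain_db(notch, 0.001, -7.0)    # much shallower notch
```

In this idealised model, dropping the delayed copy by 7 dB turns a near-total cancellation into a notch of roughly 5 dB, which is far less audible as colouration.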

Second up is Ben Shirley who talks about supporting the sound supervisor’s mix with AI. Ben highlights that a sound supervisor will not just be in charge of the main programme mix, but also the comms system. As such, if that breaks – which could endanger the wider production – their attention will have to go to that rather than mixing. Whilst this may not be so much of an issue with simpler games, when producing high-end mixes with object audio, this is a very skilled job which requires constant attention. Naturally, the more immersive an experience is, the more obvious it is when mistakes happen. The solution created by Ben’s company is to use AI to create a pitch effects mix which can be used as a sustaining feed which covers moments when the sound supervisor can’t give the attention needed, but also allows them more flexibility to work on the finer points of the mix rather than ‘chasing the ball’.

The AI-trained system is able to create a constant-power mix of the on-pitch audio. By analysing the many microphones, it’s also able to detect ball kicks which aren’t close to any microphones and, indeed, may not be ‘heard’ by those mics at all. When it detects the vestiges of a ball kick, it has the ability to pull from a large range of ball kick sounds and insert one on the fly in place of the real ball kick which wasn’t usefully recorded by any mic. This comes into its own, says Ben, when used with VR or 360-degree audio. Part of what makes immersive audio special is the promise of customising the sound to your needs. What does that mean? The most basic meaning is that it understands how many speakers you have and where they are, meaning that it can create a downmix which will correctly place the sounds for you. Ideally, you would be able to add your own compression to accommodate listening at a ‘constant’ volume when dynamic range isn’t a good thing, for instance, listening at night without waking up the neighbours. Ben’s example is that in-stadium, people don’t want to hear the commentary as they don’t need to be told what to think about each shot.
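One common way to keep a mix at constant power while the balance between microphones changes is to normalise the gains so their squares always sum to one. The sketch below shows only that normalisation step. It is not Salsa Sound's algorithm: it assumes the microphone signals are uncorrelated, and the weights (for example, how relevant each mic is to the current action) are hypothetical inputs.

```python
import math


def constant_power_gains(weights):
    """Scale non-negative microphone weights so the squared gains sum to 1."""
    if not weights:
        return []
    total_power = sum(w * w for w in weights)
    if total_power == 0:
        # No microphone favoured: fall back to an equal-power blend of all mics.
        n = len(weights)
        return [1.0 / math.sqrt(n)] * n
    scale = 1.0 / math.sqrt(total_power)
    return [w * scale for w in weights]


# Example: two equally weighted mics each get a gain of about 0.707 (-3 dB).
gains = constant_power_gains([1.0, 1.0])
```

Because the total power stays fixed, moving the emphasis from one pitch microphone to another should not make the overall effects level pump up and down.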

Last in the order is Felix Krückels who talks about his work in German football to better use the tools already available to deal with object audio in a more nuanced way, improving the overall mix by using existing plugins. Felix starts by showing how the closeball/field of play mic contains a lot of the audio that the crowd mics contain. In fact, Felix says the closeball mic contains 90% of the crowd sound. When mixing that into stereo and also 5.1, we see that the spill in the closeball mic can cause colouration. Some stadia have dedicated left and right crowd mics. Felix then talks about personalisation in sound, giving the example of watching in a pub where there will be lots of local crowd noise so having a mix with lots of in-stadium crowd noise isn’t helpful. Much better, in that environment, to have clear commentary and ball effects with a lower-than-normal ambience. Felix plays a number of examples to show how using plugins to vary the delays can help produce the mixes needed.

Watch now!
Binaural Version
Speakers

	Felix Krückels Audio Engineer, Consultant and Academic
	Hyunkook Lee Director of the Centre for Audio and Psychoacoustic Engineering, University of Huddersfield
	Ben Shirley Director and co-Founder at Salsa Sound and Senior Lecturer and researcher in audio technology, University of Salford
	Moderator: Roger Charlesworth Independent consultant on media production technology

Video: Next-generation audio in the European market – The state of play

Posted on 12th November 2020 by Russell Trafford-Jones

Next-generation audio refers to a range of new technologies which allow for immersive audio like 3D sound, for increased accessibility, for better personalisation and anything which delivers a step-change in the listener experience. NGA technologies can stand on their own but are often part of next-generation broadcast technologies like ATSC 3.0 or UHD/8K transmissions.

This talk from the Sports Video Group and Dolby presents one case study from a few that have happened in 2020 which delivered NGA over the air to homes. First, though, Dolby’s Jason Power brings us up to date on how NGA has been deployed to date and looks at what it is.

Whilst ‘3D sound’ is an easy to understand feature, ‘increased personalisation’ is less so. Jason introduces ideas for personalisation such as choosing which team you’re interested in and getting a different crowd mix dependent on that, or changing where your mics are, on the pitch or in the stands. The possibilities are vast and we’re only just starting to experiment with what’s possible and determine what people actually want.

What can I do if I want to hear next-generation audio? Jason explains that four out of five TVs are now shipping with NGA support and all of the five top manufacturers have support for at least one NGA technology. Such technologies include Dolby’s AC-4 and S-ADM. AC-4 allows delivery of Dolby Atmos which is an object-based audio format which allows the receiver much more freedom to render the sound correctly based on the current speaker setup. Should you change how many speakers you have, the decoder can render the sound differently to ensure the ‘stereo’ image remains correct.

To find out more about the technologies behind NGA, take a look at this talk from the Telos Alliance.

Next, Matthieu Parmentier talks about the Roland Garros event in 2020 which was delivered using S-ADM plus Dolby AC-4. S-ADM is an open specification for metadata interchange, the aim of which is to help interoperability between vendors. The S-ADM metadata is embedded in the SDI and then transported uncompressed as SMPTE 302M.

ATEME’s Mickaël Raulet completes the picture by explaining their approach which included setting up a full end-to-end system for testing and diagnosis. The event itself, we see, had three transmission paths: an SDR satellite backup and two feeds into the DVB-T2 transmitter at the Eiffel Tower.

The session ends with an extensive Q&A session where they discuss the challenges they faced and how they overcame them as well as how their businesses are changing.

Watch now!
Speakers

	Jason Power Senior Director of Commercial Partnerships & Standards, Dolby
	Mickaël Raulet Vice President of Innovation, ATEME
	Matthieu Parmentier Head of Data & Artificial Intelligence France Television
	Moderator: Roger Charlesworth Charlesworth Media

Video: Audio Metadata over IP

Posted on 6th August 2020 by Russell Trafford-Jones

Next-Generation Audio is gradually becoming this generation’s audio as new technologies seep into the mainstream. Dolby Atmos is one example of a technology which is being added to more and more services and which goes way beyond stereo and even 5.1 surround sound. But these technologies don’t just rely on audio, they need data, too, to allow the decoders to understand the sound so they can apply the needed processing. It’s essential that this data, called metadata, keeps in step with the audio and, indeed, that it gets there in the first place.

Dolby have long used metadata along with surround sound to maintain the context in which the recording was mastered. There’s no way for the receiver to know what maximum audio level the recording was mixed to without being told, for instance. With NGA, the metadata needed can be much more complex. With Dolby Atmos, for example, the audio objects need position information along with the mastering information needed for surround sound.

Kent Terry from Dolby Laboratories joins us to discuss the methods, both current and future, that we can use to convey metadata from point to point in the broadcast chain. He starts by looking at the tried and trusted methods of carrying data within the audio of SDI. This is the way that Dolby E and Dolby D are carried, as data within what appears to be an AES3 stream. There are two SMPTE standards for doing this in a sample-accurate fashion, ST 2109 and ST 2116.

SMPTE ST 2109 allows for metadata to be carried over an AES3 channel using SMPTE ST 337, 337 being the standard which defines how to put compressed audio over AES3 which would normally expect PCM audio data. This allows for any metadata at all to be carried. SMPTE ST 2116, similarly, defines metadata transport over AES3 but specifically for ITU-R BS.2125 and BS.2076 which define how to carry the Audio Definition Model.
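To make the idea of 'data within what appears to be an AES3 stream' more concrete, the sketch below scans a list of 16-bit words for the start of a data burst. It assumes the ST 337 16-bit mode preamble begins with the sync words 0xF872 and 0x4E1F, followed by a burst-info word (Pc) and a length word (Pd). These values are quoted from memory and should be checked against the standard before being relied upon. The function name is hypothetical and no attempt is made to decode the payload.

```python
# Assumed ST 337 16-bit mode sync words; verify against the standard.
PA_16BIT = 0xF872
PB_16BIT = 0x4E1F


def find_bursts(words):
    """Return (index, pc, pd) for each Pa/Pb sync pair in a list of 16-bit words.

    pc and pd are returned as raw words; interpreting them (data type,
    payload length) is left to the relevant standard.
    """
    found = []
    for i in range(len(words) - 3):
        if words[i] == PA_16BIT and words[i + 1] == PB_16BIT:
            found.append((i, words[i + 2], words[i + 3]))
    return found
```

A receiver which doesn't look for these preambles will simply treat the words as PCM samples, which is why such data channels must never be processed or level-adjusted as audio.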

The motivation for these standards is to enable live workflows, which currently don’t have a great way of delivering live metadata. There are a few types of metadata which are worth considering: static metadata, which doesn’t change during the programme, such as the number of channels or the sample rate, and dynamic metadata such as spatial location and dialogue levels. Importantly, there is also optional metadata and required metadata, the latter being essential for the functioning of the technology.

Kent says that live productions are held back in their choice of NGA technologies by the limitations of metadata carriage and this is one reason that work is being done in the IP space to create similar standards for all-IP programme production.

For IP there are two approaches. The first is to define a way to send metadata separately to the AES67 audio which is found within SMPTE ST 2110-30, which is done with the new AES standard AES-X242. The other way being developed is using SMPTE ST 2110-41 which allows for any metadata (not solely ST 291) to be carried in a time-synchronised way with the other 2110 essences. Both of these methods, Kent explains, are actively being developed and are open to input from users.
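Whichever transport is used, the receiver ends up needing to answer the same question: which metadata applies to this audio sample? The sketch below shows one simple way to do that when metadata and audio are stamped against the same media clock. It does not represent the wire format of AES-X242 or ST 2110-41; the event list, timestamps and function name are hypothetical.

```python
from bisect import bisect_right


def metadata_for_sample(metadata_events, sample_time):
    """Return the most recent metadata payload at or before sample_time.

    metadata_events is a list of (timestamp, payload) tuples sorted by
    timestamp. Timestamps and sample_time must use the same media clock.
    Returns None if no metadata has arrived yet for that time.
    """
    timestamps = [t for t, _ in metadata_events]
    index = bisect_right(timestamps, sample_time) - 1
    return None if index < 0 else metadata_events[index][1]


# Example: object position changes exactly one second in at 48 kHz.
events = [(0, {"azimuth": -30}), (48000, {"azimuth": 30})]
current = metadata_for_sample(events, 47999)  # still the first position
```

If the metadata stream were delayed relative to the audio, the lookup would return stale values, which is the 'keeping in step' problem these standards aim to solve.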

Watch now!
Speaker

Kent Terry
Snr. Manager, Sound Technology, Office of the CTO,
Dolby Laboratories

Video: Next Generation TV Audio

Posted on 25th June 2020 by Russell Trafford-Jones

Often not discussed, audio is essential to television and film so as the pixels get better, so should the sound. All aspects of audio are moving forward with more processing power at the receiver, better compression at the sender and a seismic shift in how audio is handled, even in the consumer domain. It’s fair to say that Dolby have been busy.

Larry Schindel from Linear Acoustic is here thanks to the SBE to bring us up to date on what’s normally called ‘Next Generation Audio’ (NGA). He starts from the basics looking at how audio has been traditionally delivered by channels. Stereo sound is delivered as two channels, one for each speaker, with the sound engineer choosing how the audio is split between them. With the move to 5.1 and beyond, this continued with the delivery of 6, 8 or even more channels of audio. The trouble is this was always fixed at the time it went through the sound suite. Mixing sound into channels makes assumptions on the layout of your speakers. Sometimes it’s not possible to put your speakers in the ideal position and your sound suffers.

Dolby Atmos has heralded a mainstream move to object-based audio where sounds are delivered with information about their position in the sound field as opposed to the traditional channel approach. Object-based audio leaves the downmixing to the receiver which can be set to take into account its unique room and speaker layout. It represents a change in thinking about audio, a move from thinking about the outputs to the inputs. Larry introduces Dolby Atmos and details the ways it can be delivered and highlights that it can work in a channel or object mode.
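The difference between channels and objects can be sketched in a few lines: an object carries a position, and the receiver works out speaker gains for whatever layout it has. The example below uses simple pairwise constant-power panning around a horizontal ring of speakers. It is an illustration of the principle only, not how Dolby Atmos renders audio, and the function name and angle convention (degrees around the listener) are assumptions.

```python
import math


def render_object(azimuth_deg, speaker_azimuths_deg):
    """Return one gain per speaker for a sound object at azimuth_deg.

    The object is panned between the two speakers either side of it using
    constant-power (sine/cosine) gains; all other speakers get zero gain.
    """
    n = len(speaker_azimuths_deg)
    if n == 0:
        return []
    if n == 1:
        return [1.0]
    # Visit speakers in order of angle around the circle.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360)
    angles = [speaker_azimuths_deg[i] % 360 for i in order]
    src = azimuth_deg % 360
    gains = [0.0] * n
    for k in range(n):
        a, b = angles[k], angles[(k + 1) % n]
        span = (b - a) % 360 or 360.0   # arc from this speaker to the next
        offset = (src - a) % 360        # how far the object is along that arc
        if offset <= span:
            frac = offset / span
            gains[order[k]] = math.cos(frac * math.pi / 2)
            gains[order[(k + 1) % n]] = math.sin(frac * math.pi / 2)
            break
    return gains


# The same object rendered for two different rooms.
stereo = render_object(0, [30, -30])
five = render_object(0, [30, -30, 0, 110, -110])
```

With the stereo layout the centred object is shared equally between left and right, whereas with the five-speaker layout it goes to the centre speaker alone: the content is unchanged, only the rendering differs.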

Larry then looks at where you can get media with Dolby Atmos. Cinemas are an obvious starting point, but there is a long list of streaming and pay-TV services which use it, too. Larry talks about the upcoming high-profile events which will be covered in Dolby Atmos showing that delivering this enhanced experience is something being taken seriously by broadcasters across the board.

Consumers still have the problem of getting the audio in the right place in their awkward, often small, rooms. Larry looks at some of the options for getting great audio in the home which include speakers which bounce sound off the ceiling and soundbars.

One of the key technologies for delivering Dolby Atmos is Dolby AC-4, the improved audio codec taking compression a step further from AC-3. We see that data rates have tumbled, for example, 5.1 surround on AC-3 would be 448kbps, but can now be done in 144kbps with AC-4. Naturally, it supports channel and object modes and Larry explains how it can deliver a base mix with other audio elements over the top for the decoder to place, allowing better customisation. This can include other languages or audio description/video description services. Importantly AC-4, like Dolby E, can be sent so that it doesn’t overlap video frames, allowing it to accompany routed audio. Without this awareness of video, any time a video switch was made, the audio would become corrupted and there would be a click.
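To put the two quoted bitrates in perspective, the arithmetic below works out the relative saving and the data used per hour of programme. The bitrates are the ones quoted above; the helper names are hypothetical and 1 kbps is taken as 1000 bits per second.

```python
def bitrate_saving(old_kbps, new_kbps):
    """Fraction of the original bitrate saved by moving to the new one."""
    return (old_kbps - new_kbps) / old_kbps


def megabytes_per_hour(kbps):
    """Data volume for one hour of audio at a constant bitrate, in MB."""
    bits_per_hour = kbps * 1000 * 3600
    return bits_per_hour / 8 / 1_000_000


saving = bitrate_saving(448, 144)        # a little over two thirds
ac3_hour = megabytes_per_hour(448)       # about 201.6 MB
ac4_hour = megabytes_per_hour(144)       # about 64.8 MB
```

On these figures, AC-4 carries 5.1 in under a third of the bitrate, leaving headroom for the extra languages and description services mentioned above.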

Dolby Atmos and AC-4 stand on their own and are widely applicable to much of the broadcast chain. Larry finishes this presentation by mentioning that Dolby AC-4 will be the audio of choice for ATSC 3.0. We’ve covered ATSC 3.0 extensively here at The Broadcast Knowledge so if you want more detail than there is in this section of the presentation, do dig in further.

Watch now!

Speaker

Larry Schindel
Senior Product Manager,
Linear Acoustic
