Immersive audio is pretty much standard for premium sports coverage and can take many forms. Typically, immersive audio is explained as ‘better than surround sound’ and is often delivered to the listener as object audio such as AC-4. Delivering audio as objects allows the listener’s decoder to place the sounds appropriately for their specific setup, whether they have 3 speakers, 7, a ceiling bouncing array or just headphones. This video looks at how these can be carefully manipulated to maximise the immersiveness of the experience and is available as a binaural version.
This conversation from SVG Europe, hosted by Roger Charlseworth brings together three academics who are applying their research to live, on-air sports. First to speak is Hyunkook Lee who discusses how to capture 360 degree sound fields using microphone arrays. In order to capture audio from all around, we need to use multiple microphones but, as Hyunkook explains, any difference in location between microphones can lead to a phase difference in the audio. This can be perceived as a delay in audio between two microphones gives us the spatial sound of the audio just as the spacing of our ears helps us understand the soundscape. This effect can be considered separately in the vertical and horizontal domain, the latter being important.
Talking about vertical separation, Hyunkook discusses the ‘Pitch-Height’ effect whereby the pitch of the sound affects our perception of its height rather than any delays between different sound sources. Modulating the amplitude, however, can be effective. Now, when bringing together into one mix multiple versions of the same audio which have been slightly delayed, this produces comb filtering of the audio. As such, a high-level microphone used to capture ambience can colour the overall audio. Hyunkook shows that this colouration can be mitigated by reducing the upper sound by 7dB which can be done by angling the audio up. He finished by playing his binaural recordings recorded on his microphone arrays. A binaural version of this video is also available.
Second up, is Ben Shirley who talks about supporting the sound supervisor’s mix with AI. Ben highlights that a sound supervisor will not just be in charge of the main programme mix, but also the comms system. As such, if that breaks – which could endanger the wider production – their attention will have to go to that rather than mixing. Whilst this may not be so much of an issue with simpler games, when producing high-end mixes with object audio, this is very skilled job which requires constant attention. Naturally, the more immersive an experience is, the more obvious it is when mistakes happen. The solution created by Ben’s company is to use AI to create a pitch effects mix which can be used as a sustaining feed which covers moments when the sound supervisor can’t give the attention needed, but also allows them more flexibility to work on the finer points of the mix rather than ‘chasing the ball’.
The AI-trained system is able to create a constant-power mix of the on-pitch audio. By analysing the many microphones, it’s also able to detect ball kicks which aren’t close to any microphones and, indeed, may not be ‘heard’ by those mics at all. When it detects the vestiges of a ball kick, it has the ability to pull from a large range of ball kick sounds and insert on-the-fly in place of the real ball kick which wasn’t usefully recorded by any mic. This comes into its own, says Ben, when used with VR or 360-degree audio. Part of what makes immersive audio special is the promise of customising the sound to your needs. What does that mean? The most basic meaning is that it understands how many speakers you have and where they are meaning that it can create a downmix which will correctly place the sounds for you. Ideally, you would be able to add your own compression to accommodate listening at a ‘constant’ volume when dynamic range isn’t a good thing, for instance, listening at night without waking up the neighbours. Ben’s example is that in-stadium, people don’t want to hear the commentary as they don’t need to be told what to think about each shot.
Last in the order is Felix Krückels who talks about his work in German football to better use the tools already available to deal with object audio in a more nuanced way, improving the overall mix by using existing plugins. Felix starts by showing how the closeball/field of play mic contains a lot of the audio that the crowd mics contain. In fact, Felix says the closeball mic contains 90% of the crowd sound. When mixing that into stereo and also 5.1 we see that the spill in the closeball mic, we can get colouration. Some stadia have dedicated left and right crowd mics. Felix then talks about personalisation in sound giving the example of watching in a pub where there will be lots of local crowd noise so having a mix with lots of in-stadium crowd noise isn’t helpful. Much better, in that environment, to have clear commentary and ball effects with a lower-than-normal ambience. Felix plays a number of examples to show how using plugins to vary the delays can help produce the mixes needed.
Consultant and Academic
Director of the Centre for Audio and Psychoacoustic Engineering,
University of Huddersfield
Director and co-Founder at Salsa Sound and Senior Lecturer and researcher in audio technology,
University of Salford
Moderator: Roger Charlesworth
Independent consultant on media production technology