Video: AES Immersive Audio Academy 2: MPEG-H Audio

MPEG-H 3D Audio is an object-based audio coding standard. Object audio keeps parts of the programme as separate sounds, known as objects, which can be moved around the soundfield, unlike traditional audio where everything is mixed down into a static stereo or surround mix. The advantage of keeping some of the audio separate is that it can be adapted to nearly any set of speakers, whether a single pair or a 25+4 array. This makes it a great cinema and home-theatre format, but one which also works really well in headphones.
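
To make that concrete, here's a minimal Python sketch of the rendering idea: a toy constant-power panner driven by positional metadata. It's purely illustrative and has nothing to do with the actual MPEG-H renderer; the function and parameter names are invented.

```python
import math

def pan_object_stereo(samples, azimuth_deg):
    """Constant-power pan of a mono object into stereo.

    azimuth_deg runs from -30 (full left) to +30 (full right), a toy
    stand-in for the positional metadata a real renderer would consume.
    """
    # Map the azimuth onto a 0..90 degree pan angle
    theta = math.radians((azimuth_deg + 30.0) / 60.0 * 90.0)
    gain_l, gain_r = math.cos(theta), math.sin(theta)
    return [(s * gain_l, s * gain_r) for s in samples]

# The object's samples never change; only the gains derived from its
# metadata do, which is why the same stream can feed a stereo pair,
# a 25+4 array or a binaural headphone render.
stereo = pan_object_stereo([0.0, 0.5, 1.0, 0.5], azimuth_deg=15)
```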

In this video, Yannik Grewe from Fraunhofer IIS gives an overview of the benefits of MPEG-H and the way it's put together internally. The benefit most people will notice is immersive content, as it allows a better representation of the surround soundfield with options for interactivity. Personalisation is another big benefit, where the listener can, for example, select a different language. Under-appreciated but very important is the accessibility functionality, where dialogue-friendly versions of the audio can be selected or an extra audio description track can be added.

Yannik moves on, giving a demo of software that allows you to place audio objects within a room relative to the listener. He then shows how MPEG-H changes the traditional audio workflow only by adding an authoring stage, which checks the audio is correct and adds metadata to it. It's this metadata that does most of the work in defining the MPEG-H audio.

Within the MPEG-H metadata, Yannik explains, there is some overall scene information which includes details about reproduction and setup, loudness and dynamic range control, as well as the number of objects. Under that lie components such as a surround sound 'bed' with a number of separate audio tracks for speech. Each of these components can be made part of an either-or group whereby only one can be chosen at a time, ideal for audio that is not intended to be played simultaneously with another, such as alternative languages. Metadata control means you can offer many versions of the audio with no changes to the audio itself. Yannik concludes by introducing us to the MPEG-H Production Format (MPF).
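
As a rough illustration of how that metadata might hang together, here's a Python sketch; every field name is invented for the example and bears no relation to the real MPEG-H bitstream syntax.

```python
# Toy scene: overall information, components, and an either-or
# (switch) group from which only one member can play at a time.
scene = {
    "loudness": {"target_lufs": -23.0, "drc_profile": "tv"},
    "components": [
        {"id": "bed",    "kind": "channel_bed", "layout": "5.1"},
        {"id": "dlg_en", "kind": "object", "label": "English dialogue"},
        {"id": "dlg_fr", "kind": "object", "label": "French dialogue"},
        {"id": "ad",     "kind": "object", "label": "Audio description"},
    ],
    "switch_groups": [
        {"name": "language", "members": ["dlg_en", "dlg_fr"], "default": "dlg_en"},
    ],
}

def active_components(scene, choices):
    """Resolve the listener's choices against the switch groups."""
    selected = {choices.get(g["name"], g["default"])
                for g in scene["switch_groups"]}
    for comp in scene["components"]:
        grouped = any(comp["id"] in g["members"]
                      for g in scene["switch_groups"])
        if not grouped or comp["id"] in selected:
            yield comp["id"]

# Many "versions" of the programme from one set of audio tracks:
print(list(active_components(scene, {"language": "dlg_fr"})))
# -> ['bed', 'dlg_fr', 'ad']
```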

Finally, Yannik takes us through the open-source software which is available to create, manage and test your MPEG-H audio setup.

Watch now!
Speaker

Yannik Grewe
Senior Engineer – Audio Production Technologies,
Fraunhofer IIS

Video: AES67 Over Wide Area Networks

AES67 is a widely adopted standard for moving PCM audio from place to place. Being a standard, it's ideal for connecting together equipment from different vendors, delivering lossless audio with almost zero latency. This video looks at use cases for moving AES67 from its traditional home on a company's LAN to the WAN.

Discovery's Eurosport Technology Transformation (ETT) project is a great example of a compelling use case for moving operations to the WAN. Eurosport's Olivier Chambin explains that the idea behind the project is to centralise all the processing technology needed for their productions across Europe, feeding their 60 playout channels.

Control surfaces and some interface equipment are still necessary in the European production offices and commentary points throughout Europe, but the processing is done in two data centres, one in the Netherlands, the other in the UK. This means audio does need to travel between countries over Discovery's dual MPLS WAN using IGMPv3 multicast with SSM (source-specific multicast).
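
For the curious, a receiver's SSM join amounts to asking the network for a specific (source, group) pair rather than just a group. A hedged Python sketch, assuming Linux socket options and entirely invented addresses:

```python
import socket
import struct

MCAST_GRP = "232.1.1.10"  # hypothetical group; 232/8 is the SSM range
SOURCE_IP = "10.0.0.50"   # hypothetical sender we ask the network for
PORT = 5004               # illustrative RTP audio port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Not every Python build exposes this constant; it's 39 on Linux.
IP_ADD_SOURCE_MEMBERSHIP = getattr(socket, "IP_ADD_SOURCE_MEMBERSHIP", 39)

# struct ip_mreq_source in Linux field order:
# group address, local interface, source address.
mreq = struct.pack(
    "4s4s4s",
    socket.inet_aton(MCAST_GRP),
    socket.inet_aton("0.0.0.0"),
    socket.inet_aton(SOURCE_IP),
)
sock.setsockopt(socket.IPPROTO_IP, IP_ADD_SOURCE_MEMBERSHIP, mreq)

packet, addr = sock.recvfrom(2048)  # one RTP packet of the audio stream
```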

From a video perspective, the ETT project has adopted ST 2110 for all essences with NMOS control. Over the WAN, video is sent as JPEG XS, but all audio links are ST 2110-30 protected with ST 2022-7, with well over 10,000 audio streams in total. Timing is done using PTP-aware switches with local GNSS-derived PTP, with unicast PTP over the WAN as a fallback. For more on PTP over WAN, have a look at this RTS webinar and this update from Meinberg's Daniel Boldt.

Bolstering the push for standards such as AES67 is self-confessed 'audioholic' Anthony P. Kuzub from Canada's CBC. Chair of the local AES section, he makes the point that broadcast workflows have long used AES standards to ensure vendor interoperability, from microphones to analogue connectors, from grounding to MADI (AES10). This is why AES67 is important: it will ensure that the next generation of equipment can also interoperate.

Surrounding these two case studies is a presentation from Nicolas Sturmel all about the AES SC-02-12-M working group, which aims to define the best ways of working to enable easy use of AES67 on the WAN. The key issue is that AES67 was written expecting short links on a private network that you completely control. Moving to a WAN or the internet, with long-distance links on which your bandwidth or choice of protocols is limited, can make AES67 perform badly if you don't follow best practice.

Nicolas first urges anyone to check they actually need AES67 over the WAN. Only if you need precise timing (for lip-sync, for example) with PCM quality and latencies from 250ms down to as little as 5ms do you really need AES67 instead of other protocols such as ACIP, he explains. The problem is that a ping on the internet, even to somewhere fairly close, can easily take 16 to 40ms for the round trip. Assuming a symmetric path, that guarantees at least 8ms of one-way delay, but any one packet could be as late as 20ms; that 12ms spread is known as the Packet Delay Variation (PDV).
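
A quick back-of-the-envelope makes that arithmetic explicit; it assumes, as a simplification, that the path is symmetric so one-way delay is half the round trip:

```python
# Numbers from the paragraph above: a 16-40 ms round trip.
rtt_min_ms, rtt_max_ms = 16.0, 40.0

one_way_min = rtt_min_ms / 2        # 8 ms: the best any packet can do
one_way_max = rtt_max_ms / 2        # 20 ms: the worst case
pdv = one_way_max - one_way_min     # 12 ms of Packet Delay Variation

# A receive buffer has to absorb the PDV, so the playout delay you
# actually experience is set by the worst case, not the best:
print(f"PDV = {pdv} ms, minimum safe playout delay ≈ {one_way_max} ms")
```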

Not only do we need a way to transmit AES67, but also PTP. The Precision Time Protocol has ways of coping with jitter and delay, but these don't work well on WAN links, where the delay in one direction may differ from the delay in the other. PTP also isn't built to deal with the higher delay and jitter involved. PTP over WAN can be done and is a way to deliver a service, but using a GNSS receiver at each location, as Eurosport does, is a much better solution, hampered only by cost and one's ability to see enough of the sky.
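
The reason asymmetry hurts is visible in PTP's own offset calculation, which averages the two directions and therefore silently assumes they are equal. A small simulation with invented numbers shows the size of the resulting error:

```python
def ptp_offset(t1, t2, t3, t4):
    """Classic PTP delay request-response offset estimate.

    t1: Sync sent by master, t2: Sync received by slave,
    t3: Delay_Req sent by slave, t4: Delay_Req received by master.
    Valid only if forward and reverse path delays are equal.
    """
    return ((t2 - t1) - (t4 - t3)) / 2

# A WAN link with 9 ms forward delay but 5 ms in reverse, and a slave
# clock that is actually perfectly in sync (true offset = 0):
d_fwd, d_rev = 0.009, 0.005
t1 = 100.000000
t2 = t1 + d_fwd      # slave receives Sync
t3 = t2 + 0.001      # slave replies shortly afterwards
t4 = t3 + d_rev      # master receives Delay_Req

err = ptp_offset(t1, t2, t3, t4)
print(f"estimated offset = {err * 1e3:.1f} ms")  # (d_fwd - d_rev)/2 = 2 ms
```

The slave would 'correct' itself by 2ms even though nothing was wrong, which is exactly why a GNSS reference at each site is the safer option.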

The internet can lose packets; given a few hours, it nearly always will. To get around this, Nicolas looks at using FEC, whereby you constantly send redundant data, up to around 25% extra, so that if anything is lost the extra information can be used to reconstruct the missing packets. Whilst this is a solid approach, computing the FEC adds delay and the extra data adds a fixed uplift to your bandwidth needs. For circuits that have very few issues this can seem wasteful, but a fixed percentage can also be an advantage on circuits where a predictable bitrate is more important. Nicolas also highlights that RIST, SRT or ST 2022-7 are other methods that can work well; he talks about these at greater length in his talk with Andreas Hildebrand.
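
The principle is easiest to see with a toy one-dimensional XOR parity scheme: one parity packet protecting four media packets gives exactly the 25% overhead mentioned. Real RTP FEC schemes are more sophisticated, so treat this purely as a sketch:

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """XOR of a group of equal-length packets."""
    parity = packets[0]
    for p in packets[1:]:
        parity = xor_bytes(parity, p)
    return parity

group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]  # 4 media packets
parity = make_parity(group)                   # 1 parity packet = 25% extra

# Packet 2 is lost in transit: XORing the survivors with the parity
# packet reconstructs it exactly.
survivors = [group[0], group[1], group[3]]
recovered = make_parity(survivors + [parity])
assert recovered == group[2]
```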

The video concludes with a Q&A.

Watch now!
Speakers

Nicolas Sturmel
Product Manager – Senior Technologist,
Merging Technologies
Anthony P. Kuzub
Senior Systems Designer,
CBC/Radio Canada
Olivier Chambin
Audio Broadcast Engineer, AoIP and Voice-over-IP
Eurosport Discovery

Video: AES67 Beyond the LAN

It can be tempting to treat a good-quality WAN connection like a LAN, but even if it has a low ping time and doesn't drop packets, when it comes to professional audio like AES67 you can't help but uncover the differences. AES67 was designed for transmission over short distances, where extremely low latency and jitter can be expected. However, there are ways to deal with this.

Nicolas Sturmel from Merging Technologies is working as part of the AES SC-02-12-M working group, which has been defining the best ways of working to enable easy use of AES67 on the WAN since the summer. The aims of the group are to define what you should expect to work with AES67, how you can improve your network connection, and to give guidance to manufacturers on further features needed.

WANs come in a number of flavours. A fully controlled WAN, like those many larger broadcasters run, is completely under the broadcaster's control. Others are operated by third parties under an SLA, which gives less control but may come at a reduced operating cost. The lowest-cost option is the internet.

He starts by outlining the fact that AES67 was written expecting short links on a private network that you completely control, which causes problems when using the WAN or internet with long-distance links on which your bandwidth or choice of protocols can be limited. If you're contributing into the cloud, then you have an extra layer of complication on top of the WAN, as virtualised computers are another place where jitter and uncertain timing can creep in.

The good news is that you may not need AES67 over the WAN at all. Only if you need precise timing (for lip-sync, for example) with PCM quality and latencies from 250ms down to as little as 5ms do you really need AES67 instead of other protocols such as ACIP, he explains. The problem is that a ping on the internet, even to somewhere fairly close, can easily have a varying round-trip time of, say, 16 to 40ms. That guarantees at least 8ms of one-way delay, but any one packet could be as late as 20ms; this variation in timing is known as the Packet Delay Variation (PDV).
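
In practice, the PDV is absorbed with a receive buffer: hold every packet until a fixed time after it was sent, with that time sized to the worst one-way delay you expect. A toy Python sketch with invented numbers:

```python
TARGET_DELAY = 20.0  # ms; must cover the worst expected one-way delay

def playout(received):
    """received: list of (arrival_time_ms, send_time_ms, seq)."""
    out = []
    for arrival, sent, seq in received:
        play_at = sent + TARGET_DELAY
        if arrival <= play_at:
            out.append((play_at, seq))   # arrived in time: schedule playout
        else:
            out.append((None, seq))      # too late: effectively lost
    return out

# Packet 2 took 22 ms one-way, blowing the 20 ms budget:
packets = [(8.0, 0.0, 1), (23.0, 1.0, 2), (10.0, 2.0, 3)]
print(playout(packets))  # [(20.0, 1), (None, 2), (22.0, 3)]
```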

Not only do we need a way to transmit AES67, but also PTP. The Precision Time Protocol has ways of coping with jitter and delay, but these don't work well on WAN links, where the delay in one direction may differ from the delay in the other. PTP also isn't built to deal with the higher delay and jitter involved. PTP over WAN can be done and is a way to deliver a service, but using a GNSS receiver at each location is a much better solution, hampered only by cost and one's ability to see enough of the sky.

The internet can lose packets; given a few hours, it nearly always will. To get around this, Nicolas looks at using FEC, whereby you constantly send redundant data, up to around 25% extra, so that if anything is lost the extra information can be used to reconstruct the missing packets. Whilst this is a solid approach, computing the FEC adds delay and the extra data adds a fixed uplift to your bandwidth needs. For circuits that have very few issues this can seem wasteful, but a fixed percentage can also be an advantage on circuits where a predictable bitrate is more important. Nicolas also highlights that RIST, SRT or ST 2022-7 are other methods that can work well; he talks about these at greater length in his talk with Andreas Hildebrand.

Nicolas finishes by summarising what a solution needs: transport over unicast IP, possibly in a tunnel; each end locked to GNSS; buffers large enough to cope with the jitter; and, perhaps most importantly, the output of a workflow analysis to find out which tools you need to deploy to meet your actual needs.

Watch now!
Speaker

Nicolas Sturmel
Network Specialist,
Merging Technologies

Video: Secrets of Near Field Monitoring

We don't need to be running a recording studio to care about speaker placement. Broadcast facilities are full of audio monitoring rooms for a range of uses, and the principles discussed in this talk by award-winning studio designer Carl Tatz can be put into practice wherever you want to sit in a room and listen to decent, flat audio.

Joining producer Mike Rodriguez, who moderates this webinar for the Audio Engineering Society (AES), Carl focuses the discussion on getting the right sound in audio control rooms. This is done through the 'Null Positioning Ensemble' (NPE), which treats the mixing console, listener and speakers 'as one' unit that can be moved around the room. The ensemble puts the two speakers about 1.71m apart behind the console, firing across it. Their axes intersect 45cm in front of the console, where the listener sits, forming an equilateral triangle. By sitting between the console and the point where the speakers' axes cross, Carl says, you hear the source rather than the speakers, giving the best audio reproduction.
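
If you want to rough out the geometry yourself, the equilateral triangle makes it a one-line calculation; the 1.71m spacing is from the talk, everything else here is just illustration:

```python
import math

speaker_spacing = 1.71  # metres between the two tweeters

# In an equilateral triangle the listener sits as far from each
# speaker as the speakers are from each other, which puts the
# listening position this far in front of the speaker baseline:
listener_to_baseline = speaker_spacing * math.sqrt(3) / 2
print(f"≈ {listener_to_baseline:.2f} m from the line joining the speakers")
# ≈ 1.48 m
```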

This effect works if the tweeters are at the same height as the listener's ears, says Carl, so the speakers should be adjusted to suit the listener. High frequencies are more directional than lower frequencies, so for accurate listening it's important the speakers aren't pointing too far off-axis. Exactly where to place your ensemble can seem daunting, but Carl has a calculator on his website which gives a great start, allowing you to model your room as a rectangle and find out where the null points are going to be. The nulls are where sound cancels out due to reflections, so moving your ensemble away from these nulls is the key to a great sound. Carl details how this is done and how, then, to optimise for the 'real world' room rather than the mathematical model.
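
To get a feel for what such a calculator is doing, here's a deliberately simplified one-dimensional standing-wave model in Python. A real room has modes along all three axes, plus tangential and oblique ones, so treat this only as a starting point:

```python
C = 343.0  # speed of sound in m/s

def axial_modes(length_m, count=3):
    """First few axial mode frequencies and their pressure nulls."""
    for n in range(1, count + 1):
        freq = n * C / (2 * length_m)
        # For mode n, pressure nulls sit at x = (k + 0.5) * L / n
        nulls = [(k + 0.5) * length_m / n for k in range(n)]
        yield freq, nulls

# A 6 m long room, for example:
for freq, nulls in axial_modes(length_m=6.0):
    print(f"{freq:5.1f} Hz  nulls at {[round(x, 2) for x in nulls]} m")
```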

Carl talks about the importance of acoustic treatment to remove reflections and stop the room being too lively, with some specific suggestions. In general, the aim is to remove first reflections, make the back wall stone dead, deaden the ceiling and put bass traps in the corners. You should then be able to clap your hands without hearing any reflections. But you can't fix every problem with treatment, Carl says, bringing up a frequency chart of a typical monitor setup which shows a 10dB dip around 125Hz. This is found in all monitoring setups and comes from sound from the speakers bouncing off the floor under the console. He says this dip needs to be filled in with subwoofers rather than fixed with EQ or acoustic treatment.
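
The floor-bounce mechanism is simple enough to estimate yourself: the reflection travels further than the direct sound, and when the path difference is half a wavelength the two arrivals cancel. A sketch with plausible but invented dimensions lands in the same region as the dip Carl shows:

```python
import math

C = 343.0  # speed of sound in m/s

def floor_bounce_dip(speaker_h, ear_h, distance):
    """First cancellation frequency of the floor reflection."""
    direct = math.hypot(distance, speaker_h - ear_h)
    # The reflection behaves like a mirror-image source below the floor:
    reflected = math.hypot(distance, speaker_h + ear_h)
    return C / (2 * (reflected - direct))

# Tweeters and ears level at ~1.2 m, listener ~1.7 m away:
print(f"dip ≈ {floor_bounce_dip(1.2, 1.2, 1.7):.0f} Hz")
# ≈ 138 Hz here; nudge the geometry and it sits right around 125 Hz
```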

Watch now!
Speakers

Carl Tatz Carl Tatz
Founder,
Carl Tatz Design LLC
Moderator: Mike Rodriguez
Freelance Director & Producer