Next-Generation Audio (NGA) is gradually becoming this generation’s audio as new technologies seep into the mainstream. Dolby Atmos is one example: a technology being added to more and more services, and one which goes well beyond stereo and even 5.1 surround sound. But these technologies don’t rely on audio alone; they need data, too, so that decoders can understand the sound and apply the necessary processing. It’s essential that this data, called metadata, keeps in step with the audio and, indeed, that it gets there in the first place.
Dolby have long used metadata alongside surround sound to preserve the context in which the recording was mastered. There’s no way for the receiver to know, for instance, what maximum audio level the recording was mixed to without being told. With NGA, the metadata needed can be much more complex: with Dolby Atmos, for example, the audio objects need position information in addition to the mastering information used for surround sound.
Kent Terry from Dolby Laboratories joins us to discuss the methods, both current and future, that we can use to convey metadata from point to point in the broadcast chain. He starts by looking at the tried and trusted method of carrying data within the audio of SDI. This is the way that Dolby E and Dolby Digital are carried: as data within what appears to be an AES3 stream. Two SMPTE standards, ST 2109 and ST 2116, define how to do the same for metadata in a sample-accurate fashion.
SMPTE ST 2109 allows metadata to be carried over an AES3 channel using SMPTE ST 337, the standard which defines how to put compressed, non-PCM audio over AES3, which would otherwise expect PCM audio data. This allows any metadata at all to be carried. SMPTE ST 2116 similarly defines metadata transport over AES3, but specifically for ITU-R BS.2076 and BS.2125, which define the Audio Definition Model and its serial representation.
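To make the ST 337 mechanism concrete, here is a minimal sketch of how a generic payload is framed as a data burst so that equipment expecting AES3 audio passes it through as data. The Pa/Pb sync words and the Pc/Pd fields follow the ST 337 burst layout; the `data_type` value used here is a placeholder for illustration, not an assigned code point, and real implementations must follow the full bit-level rules of the standard.

```python
import struct

# 16-bit-mode ST 337 preamble sync words (Pa, Pb).
PA, PB = 0xF872, 0x4E1F

def st337_burst(payload: bytes, data_type: int) -> bytes:
    """Frame a payload as a simplified ST 337-style data burst.

    Pc (burst_info) carries the data_type; Pd (length_code) gives the
    payload length in bits. Other burst_info fields are zeroed here.
    """
    pc = data_type & 0x1F
    pd = len(payload) * 8
    header = struct.pack(">4H", PA, PB, pc, pd)
    return header + payload

burst = st337_burst(b"\x01\x02\x03\x04", data_type=28)
print(len(burst))  # 8 header bytes + 4 payload bytes = 12
```

A receiver scans the AES3 sample stream for the Pa/Pb sync pattern, reads Pd to know how many bits follow, and hands the payload to the appropriate decoder based on the data type.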
The motivation for these standards is to enable live workflows, which don’t currently have a good way of delivering live metadata. A few types of metadata are worth distinguishing: static metadata, which doesn’t change during the programme, such as the number of channels or the sample rate; dynamic metadata, such as spatial location and dialogue levels; and, importantly, optional metadata versus required metadata, the latter being essential for the technology to function at all.
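These categories can be modelled with a toy data structure. This is purely illustrative: the field names below are invented for the example and are not taken from any SMPTE or ITU schema.

```python
from dataclasses import dataclass

@dataclass
class MetadataItem:
    name: str
    value: object
    dynamic: bool    # does it change during the programme?
    required: bool   # essential for the decoder, or merely optional?

# Hypothetical examples of each category from the article.
items = [
    MetadataItem("channel_count", 16, dynamic=False, required=True),
    MetadataItem("sample_rate_hz", 48000, dynamic=False, required=True),
    MetadataItem("object_position", (0.2, -0.5, 0.1), dynamic=True, required=True),
    MetadataItem("dialogue_level_db", -23.0, dynamic=True, required=False),
]

# A carriage mechanism must refresh dynamic items continually, while
# static ones can be sent once (or repeated only for late joiners).
dynamic_items = [m.name for m in items if m.dynamic]
print(dynamic_items)  # ['object_position', 'dialogue_level_db']
```

The distinction matters for carriage design: dynamic, required metadata (an object’s position, say) must stay sample-accurate with the audio, whereas static metadata merely needs to arrive before decoding starts.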
Kent says that live productions are held back in their choice of NGA technologies by the limitations of metadata carriage, and this is one reason work is being done in the IP space to create similar standards for all-IP programme production.
For IP there are two approaches. The first, the new AES standard AES-X242, defines a way to send metadata separately from the AES67 audio found within SMPTE ST 2110-30. The other, SMPTE ST 2110-41, allows any metadata (not limited to the existing ancillary-data formats) to be carried in a time-synchronised way with the other 2110 essences. Both of these methods, Kent explains, are actively being developed and are open to input from users.
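Whatever payload format these standards settle on, both rest on the same principle: stamping metadata with the same media clock used by the audio essence so a receiver can re-associate the two streams. The sketch below illustrates only that shared idea, assuming the 48 kHz RTP clock typical of ST 2110-30 audio; it is not the actual AES-X242 or ST 2110-41 packet format.

```python
# RTP clock rate assumed for ST 2110-30 (AES67) audio at 48 kHz sampling.
AUDIO_CLOCK_HZ = 48_000

def media_timestamp(seconds_since_epoch: float) -> int:
    """Convert a PTP-derived wall-clock time to a 32-bit RTP-style timestamp."""
    return int(seconds_since_epoch * AUDIO_CLOCK_HZ) & 0xFFFFFFFF

# The sender stamps a metadata update at the instant it applies...
ts = media_timestamp(100.0)
# ...and the receiver matches it to the audio packet carrying the same
# timestamp, restoring the alignment the SDI embedding gave for free.
print(ts)  # 4800000
```

Because both essences derive their timestamps from the same PTP reference, the receiver can buffer metadata briefly and apply each update exactly when the corresponding audio samples are rendered.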
Kent Terry, Snr. Manager, Sound Technology, Office of the CTO, Dolby Laboratories.