ST 2110-41 Archives – The Broadcast Knowledge

Video: Audio Metadata over IP

Posted on 6th August 2020 by Russell Trafford-Jones

Next-Generation Audio is gradually becoming this generation’s audio as new technologies seep into the mainstream. Dolby Atmos is one example of a technology which is being added to more and more services and which goes way beyond stereo and even 5.1 surround sound. But these technologies don’t rely on audio alone; they also need data so that decoders can understand the sound and apply the necessary processing. It’s essential that this data, called metadata, keeps in step with the audio and, indeed, that it gets there in the first place.

Dolby have long used metadata along with surround sound to maintain the context in which the recording was mastered. There’s no way for the receiver to know what maximum audio level the recording was mixed to without being told, for instance. With NGA, the metadata needed can be much more complex. With Dolby Atmos, for example, the audio objects need position information along with the mastering information needed for surround sound.

Kent Terry from Dolby Laboratories joins us to discuss the methods, both current and future, that we can use to convey metadata from point to point in the broadcast chain. He starts by looking at the tried and trusted methods of carrying data within the audio of SDI. This is the way that Dolby E and Dolby D are carried, as data within what appears to be an AES3 stream. There are two SMPTE standards for doing this in a sample-accurate fashion, ST 2109 and ST 2116.

SMPTE ST 2109 allows metadata to be carried over an AES3 channel using SMPTE ST 337, the standard which defines how to put compressed audio over AES3, an interface which would normally expect PCM audio data. This allows for any metadata at all to be carried. SMPTE ST 2116, similarly, defines metadata transport over AES3 but specifically for ITU-R BS.2125 and BS.2076, which define how to carry the Audio Definition Model.

The motivation for these standards is to enable live workflows, which currently don’t have a great way of delivering live metadata. There are a few types of metadata worth considering: static metadata, which doesn’t change during the programme, such as the number of channels or the sample rate; and dynamic metadata, such as spatial location and dialogue levels. Just as important is the distinction between optional and required metadata, the latter being essential for the technology to function.

Kent says that live productions are held back in their choice of NGA technologies by the limitations of metadata carriage and this is one reason that work is being done in the IP space to create similar standards for all-IP programme production.

For IP there are two approaches. The first is to define a way to send metadata separately to the AES67 audio which is found within SMPTE ST 2110-30; this is being done with the new AES standard, AES-X242. The other approach being developed uses SMPTE ST 2110-41, which allows any metadata (not solely ST 291) to be carried in a time-synchronised way with the other 2110 essences. Both of these methods, Kent explains, are actively being developed and are open to input from users.

Watch now!
Speaker

Kent Terry
Snr. Manager, Sound Technology, Office of the CTO,
Dolby Laboratories

Video: ST 2110-41 Fast Metadata – Under the Hood and Applications

Posted on 1st July 2020 by Russell Trafford-Jones

Why would you decode 12Gbps of data in order to monitor for an occasional SCTE switching signal or closed captions? SMPTE 2110 separates video, audio and metadata freeing up workflows to be more flexible. But so far, SMPTE 2110 only defines how to take the metadata used in SDI under SMPTE 291. The point of freeing up workflows is to allow innovation, so this talk looks at SMPTE ST 2110-41 which allows for more types of data to be carried, time synchronised with the other essences.

Paul Briscoe joins us at the podium to explain why we need to extend SMPTE ST 2110-40 beyond SDI-compatible metadata. He starts by explaining how ST 2110 was intended to work well with SDI. Since SDI already has a robust system for Ancillary Data (ANC Data), it made sense for 2110 to support that first in a 100% compatible way. One of the key characteristics of metadata is that it’s synchronised to the other media, and we expect decoders to be selective over what they act on, explains Paul. The problem that he sees, however, is that if you wanted to send some XML, JSON or other data which has never been included in the SDI signal, there is no way to send that within 2110 and keep the synchronisation. To prove his point, Paul puts up the structure of ST-291M in 2110-40 which still has references to lines and horizontal offsets. Furthermore, he points out that future 2110 workflows aren’t going to need to be tied to SDI-compatible metadata.

The Fast Metadata (FMX) proposal is to allow arbitrary metadata streams which can be time-aligned with a media stream. It would provide a standardised encoding method, be extensible, minimise IP address use and retain the option of being interoperable with ST 291 if needed.

Having chosen the KLV data structure, which is well known and scalable, SMPTE provides for this to be delivered with a timestamp, and even delivered early so that processing can be done ahead of time. This opens the door to carrying XML, JSON and other data structures.
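
To make the KLV idea concrete, here is a minimal Python sketch of key-length-value packing in the style of SMPTE ST 336, assuming a 16-byte key and a BER-encoded length. The key below is a made-up placeholder rather than a registered label, the function names are hypothetical, and the sketch does not attempt to reproduce the actual ST 2110-41 wire format.

```python
import json

def ber_length(n: int) -> bytes:
    """Encode a length in BER: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def klv_pack(key: bytes, value: bytes) -> bytes:
    """Concatenate key, BER length and value into one KLV triplet."""
    if len(key) != 16:
        raise ValueError("this sketch assumes a 16-byte key")
    return key + ber_length(len(value)) + value

def klv_unpack(buf: bytes, offset: int = 0):
    """Return (key, value, offset of the next triplet)."""
    key = buf[offset:offset + 16]
    pos = offset + 16
    first = buf[pos]
    pos += 1
    if first < 0x80:
        length = first                      # short form: the byte is the length
    else:
        n = first & 0x7F                    # long form: n length bytes follow
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    value = buf[pos:pos + length]
    return key, value, pos + length

# Hypothetical key, purely for illustration (not a registered label).
HYPOTHETICAL_KEY = bytes(range(16))

# Any serialised data can be the value; JSON is used here as an example.
payload = json.dumps({"event": "splice", "id": 7}).encode("utf-8")
packet = klv_pack(HYPOTHETICAL_KEY, payload)
key, value, next_offset = klv_unpack(packet)
```

Because the length is explicit, a receiver that doesn’t recognise a key can skip straight to `next_offset` and carry on, which is what lets a decoder be selective about what it acts on.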

Paul explains how this has been implemented as an RTP stream hence using the RTP timestamps. Within the stream there are blobs of data. Paul explains the structure of a blob and how payloads which are smaller than a blob, as well as those which are larger, can be dealt with. Buffering and SDPs need to be supported, after all, this is a 2110 stream.
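
Since the stream is RTP, the timestamp a receiver uses for alignment sits in the standard 12-byte RTP fixed header described in RFC 3550. The sketch below parses only that header; the blob structure that follows it is not modelled, and the sample packet values (payload type, sequence number, SSRC) are invented for illustration.

```python
import struct
from typing import NamedTuple

class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int

def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte RTP fixed header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )

# Invented example packet: version 2, marker set, dynamic payload type 100.
example = struct.pack("!BBHII", 0x80, 0x80 | 100, 4242, 1_000_000, 0xDEADBEEF)
example += b"payload bytes would follow here"
hdr = parse_rtp_header(example)
```

A monitoring tool could read `hdr.timestamp` from the metadata flow and compare it with the timestamps of the video and audio flows without decoding any of the media itself.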

After going into the detail of 2110-41, Paul explains there is a 2110-42 standard which can carry technical metadata about the stream. It provides in-band messaging of important stream details but is not intended to replace IS-04 or SDPs.

Find out more and Watch now!
Download the presentation
Speakers

Paul Briscoe
Televisionary Consulting

Video: Introduction To AES67 & SMPTE ST 2110

Posted on 1st May 2020 by Russell Trafford-Jones

While standardisation of video and audio over IP is welcome, this does leave us with a plethora of standards numbers to keep track of, along with interoperability edge cases. Audio-over-IP standard AES67 is part of the SMPTE ST-2110 standards suite and was born largely from RAVENNA, which is still in use in its own right. It’s with this backdrop that Andreas Hildebrand from ALC NetworX, who have been developing RAVENNA for 10 years now, takes the mic to explain how this all fits together. Whilst there are many technologies at play, this webinar focusses on AES67 and 2110.

Andreas explains how AES67 started out of a plan to unite the many proprietary audio-over-IP formats. For instance, synchronisation (like ST 2110, as we’ll see later) was based on PTP. Andreas gives an overview of this synchronisation and then shows how they looked at each of the OSI layers and defined a technology that could service everyone. RTP, the Real-time Transport Protocol, has been in use for a long time for transport of video and audio so made a perfect option for the transport layer. Andreas highlights the important timing information in the headers and how it can be delivered by unicast or IGMP multicast.

As for the audio, standard PCM is the audio of choice here. Andreas details the different format options available such as 24-bit with 8 channels and 48 samples per packet. By varying the format permutations, we can increase the sample rate to 96kHz or modify the number of audio tracks. To signal all of this format information, Session Description Protocol messages are sent which are small text files outlining the format of the upcoming audio. These are defined in RFC 4566. For a deeper introduction to IP basics and these topics, have a look at Ed Calverley’s talk.
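
As a rough worked example of how the format options interact, the Python sketch below reads the audio format out of a small SDP and works out the samples and payload bytes per packet. The SDP is a trimmed, made-up illustration (addresses, port and payload type are invented, and a real AES67 SDP carries further attributes such as clock references), and `describe_stream` is a hypothetical helper rather than part of any library.

```python
import re

# Invented, trimmed SDP for illustration only.
EXAMPLE_SDP = """\
v=0
o=- 1 1 IN IP4 192.0.2.10
s=Hypothetical AES67 stream
c=IN IP4 239.0.0.1/32
t=0 0
m=audio 5004 RTP/AVP 97
a=rtpmap:97 L24/48000/8
a=ptime:1
"""

def describe_stream(sdp: str) -> dict:
    """Derive per-packet figures from the rtpmap and ptime attributes."""
    rtpmap = re.search(r"^a=rtpmap:\d+ L(\d+)/(\d+)/(\d+)\s*$", sdp, re.MULTILINE)
    ptime = re.search(r"^a=ptime:([\d.]+)\s*$", sdp, re.MULTILINE)
    if not rtpmap or not ptime:
        raise ValueError("expected linear PCM rtpmap and ptime attributes")
    bit_depth, sample_rate, channels = (int(g) for g in rtpmap.groups())
    ptime_ms = float(ptime.group(1))
    samples_per_packet = round(sample_rate * ptime_ms / 1000)
    payload_bytes = (bit_depth // 8) * channels * samples_per_packet
    return {
        "bit_depth": bit_depth,
        "sample_rate": sample_rate,
        "channels": channels,
        "samples_per_packet": samples_per_packet,
        "payload_bytes": payload_bytes,
    }

info = describe_stream(EXAMPLE_SDP)
info_96k = describe_stream(EXAMPLE_SDP.replace("L24/48000/8", "L24/96000/8"))
```

For the 24-bit, 8-channel, 48 kHz case with a 1 ms packet time this gives 48 samples and 1152 payload bytes per packet. Moving to 96 kHz at the same packet time doubles that to 2304 bytes, more than a typical 1500-byte Ethernet MTU allows, which is one reason the channel count or packet time has to change along with the sample rate.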

The second half of the video is an introduction to ST-2110. A deeper dive can be found elsewhere on the site from Wes Simpson.
Andreas starts from the basis of ST 2022-6 showing how that was an SDI-based format where all the audio, video and metadata were combined together. ST 2110 brings the splitting of media, known as ‘essences’, which allows them to follow separate workflows without requiring lots of de-embedding and embedding processes.

Like most modern standards (ATSC 3.0 is another example), SMPTE ST 2110 is a suite of many standards documents. Andreas takes the time to explain each one, including those currently being worked on. The first standard is ST 2110-10 which defines the use of PTP for timing and synchronisation. This uses SMPTE ST 2059 to relate PTP time to the phase of media essences.
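
One practical consequence of sharing a PTP timebase is that every sender can derive its RTP timestamps from the same clock. The sketch below assumes the commonly described relationship in which the RTP timestamp is the time since the epoch counted in media clock ticks, truncated to 32 bits; the function name is hypothetical, and the 90 kHz and 48 kHz rates are the usual video and audio clock rates rather than values taken from the talk.

```python
RTP_WRAP = 1 << 32  # RTP timestamps are 32-bit and wrap around

def rtp_timestamp(ptp_ns: int, clock_rate_hz: int) -> int:
    """Convert time since the epoch (in nanoseconds) to an RTP timestamp.

    Assumes a zero offset between the media clock and the epoch.
    Integer arithmetic avoids floating-point rounding on large values.
    """
    ticks = ptp_ns * clock_rate_hz // 1_000_000_000
    return ticks % RTP_WRAP

one_second = 1_000_000_000
audio_ts = rtp_timestamp(one_second, 48_000)   # audio media clock
video_ts = rtp_timestamp(one_second, 90_000)   # video media clock

# At 90 kHz the 32-bit counter wraps after roughly 13 hours.
wrapped_ts = rtp_timestamp(47_722 * one_second, 90_000)
```

Two devices that have never exchanged a packet will still stamp the same instant with the same value, which is what allows separately routed essences to be realigned at the receiver.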

2110-20 is up next and is the main standard that defines use of uncompressed video with headline features such as being raster/resolution agnostic, colour sampling and more. 2110-21 defines traffic shaping. Andreas takes time to explain why traffic shaping is necessary and what Narrow, Narrow-Linear, Wide mean in terms of packet timing. Finishing the video theme, 2110-22 defines the carriage of mezzanine-compressed video. Intended for compression like TICO and JPEG XS which have light, fast compression, this is the first time that compressed media has entered the 2110 suite.

2110-30 marks the beginning of the audio standards, describing how AES67 can be used. As Andreas demonstrates, AES67 has some modes which are not compatible, so he spends time explaining the constraints and how to implement this. For more detail on this topic, check out his previous talk on the matter. 2110-31 introduces AES3 audio which, like in SDI, can carry both PCM audio and non-PCM audio like Dolby E and D.

Finishing up the talk, we hear about 2110-40, which governs transport of ancillary metadata, and take a look at the standards still being written: 2110-23 (single video essence over multiple 2110-20 streams), 2110-24 (transport of SD signals) and 2110-41 (transport of extensible, dynamic metadata).

Watch now!
Speaker

Andreas Hildebrand
Senior Product Manager,
ALC NetworX GmbH

Video: Live Closed Captioning and Subtitling in SMPTE 2110 (update)

Posted on 25th February 2020 by Adam K

The SMPTE ST 2110-40 standard specifies the real-time, RTP transport of SMPTE ST 291-1 Ancillary Data packets. It allows creation of IP essence flows carrying the VANC data familiar to us from SDI (like AFD, closed captions or ad triggering), complementing the existing video and audio portions of the SMPTE ST 2110 suite.

This presentation, by Bill McLaughlin from EEG, is an updated tutorial on subtitling, closed captioning, and other ancillary data workflows using the ST 2110-40 standard. Topics include synchronization, merging of data from different sources and standards conversion.

Building on Bill’s previous presentation at the IP Showcase, this talk at NAB 2019 demonstrates a big increase in the number of vendors supporting the ST 2110-40 standard. Previously a generic packet analyser like Wireshark with a dissector was recommended for troubleshooting IP ancillary data. But now most leading multiviewer / analyser products can display captioning, subtitling and timecode from 2110-40 streams. At the recent “JT-NM Tested Program” event 29 products passed 2110-40 Reception Validation. Moreover, 27 products passed 2110-40 Transmitter Validation, which means that their output can be reconstructed into SDI video signals with appropriate timing and then decoded correctly.

Bill points out that ST 2110-40 is not really a new standard at this point; it only defines how to carry ancillary data from the traditional payloads over IP. Special care needs to be taken when different VANC data packets are concatenated in the IP domain. A lot of existing devices are simple ST 2110-40 receivers which would require a kind of VANC funnel to create a combined stream of all the relevant ancillary data, making sure that line numbers and packet types don’t conflict, especially when signals need to be converted back to SDI.
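
The funnel idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular product works: `AncPacket` and `funnel` are hypothetical names, packets are identified only by their DID/SDID pair, and the conflict policy (first source wins, later duplicates are reported) is an arbitrary choice for illustration. The DID/SDID values used are the commonly cited ones for CEA-708 captions (0x61/0x01), AFD (0x41/0x05) and SCTE 104 messages (0x41/0x07).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AncPacket:
    did: int          # Data Identifier
    sdid: int         # Secondary Data Identifier
    line: int         # VANC line the packet is intended for
    source: str       # which upstream flow supplied it
    data: bytes = b""

def funnel(*streams):
    """Merge ANC packets from several flows into one combined list.

    Returns (combined, conflicts). A conflict is recorded when a second
    source supplies a packet type that has already been accepted.
    """
    merged = {}
    conflicts = []
    for stream in streams:
        for pkt in stream:
            key = (pkt.did, pkt.sdid)
            if key in merged:
                conflicts.append((merged[key], pkt))   # keep the first, report the rest
            else:
                merged[key] = pkt
    combined = sorted(merged.values(), key=lambda p: (p.line, p.did, p.sdid))
    return combined, conflicts

captions = [AncPacket(0x61, 0x01, 9, "captioner")]
triggers = [AncPacket(0x41, 0x07, 13, "automation")]
upstream = [AncPacket(0x41, 0x05, 11, "camera"),
            AncPacket(0x61, 0x01, 9, "camera")]      # duplicate caption service

combined, conflicts = funnel(captions, triggers, upstream)
```

Here the captioner’s packets take priority because that flow is listed first, and the duplicate captions arriving from the camera feed are surfaced in `conflicts` rather than silently merged.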

There is a new ST 2110-41 standard being developed for additional ancillary data which does not match up with the ancillary data standardised in ST 291-1. Another idea discussed is to move away from the SDI VANC data format and use a TTML track (Timed Text Markup Language, which is textual information associated with timing information) to carry ancillary information.

Watch now!

Download the slides.

Speakers

Bill McLaughlin
VP of Product Development
EEG
