Video: Everything You Want to Know About Captioning

Legally mandated in many countries, captions or subtitles are a vital part of both broadcast and streaming. But, as the saying goes, the lower the bandwidth, the more complex the signal is to manage. Getting subtitles right in all their nuances is hard whether live or in post-production. And by getting it right, we’re talking about protocol, position, colour, delay, spelling and accuracy. So there’s a whole workflow just for subtitling, which is what this video looks at.

EEG specialise in subtitling solutions, so it’s no surprise that their Sales Associate Matt Mello and VP of Product Development Bill McLaughlin wanted to run a live Q&A session which, unusually, was a pure one-hour Q&A with no initial presentation. All questions are shown on screen and are answered by Matt and Bill, who look at both the technology and specific products.

They start off by defining the terms ‘closed’ and ‘open’ captioning. Open captions are shown in the picture itself, also known as ‘burnt in’. ‘Closed’ indicates they are hidden, which often refers to the closed captions carried in the blanking of TV channels: always sent, but only displayed when the viewer asks their TV to decode and overlay them onto the picture as they watch. Whether closed or open, there is always the task of timing the subtitles correctly and merging them with the video so the words appear at the right moment. As for the term subtitles vs captions, this really depends on where you are from. In the UK, ‘subtitles’ is used instead of ‘captions’, with the term ‘closed captions’ specifically referring to the North American closed captions standard. This is as opposed to Teletext subtitles, which are different but still inserted into the blanking of baseband video and only shown when the decoder is asked to display them. The Broadcast Knowledge uses the term subtitles to mean captions.

The duo next talk about live and pre-recorded subtitles, with live being the tricky one as generating live subtitles with minimal delay is difficult. The predominant method, which has replaced stenography, is to have a person respeak the programme into voice recognition software trained on their voice, which introduces a delay. However, the accuracy is much better than having a computer listen to the raw programme sound, which may have all sorts of accents, loud background noise or overlapping speakers, leaving much to be desired. Automatic solutions, unlike humans, don’t need scheduling, and there are now ways to input specialist vocabulary and indeed scripts ahead of time to help keep accuracy up.

Accuracy is another topic under the spotlight with Matt and Bill. They outline that accuracy is measured in different ways, from a simplistic count of the number of incorrect words to weighted measures which look at how important the incorrect words are and how much the meaning has changed. Looking at YouTube, automated captions are generally less accurate than human-curated subtitles, but they do allow YouTube to meet its legal responsibility to stream with captions. They advise that accuracy of around 98% should be taken as effectively perfect, 95% counts as good, and below 85% there’s a question of whether it’s actually worth doing.
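For a flavour of the simpler end of those measures, here’s a minimal sketch (illustrative only, not any vendor’s scoring tool) that counts word-level errors between a reference transcript and the caption output:

```python
# Minimal word-accuracy sketch: counts substitutions, insertions and
# deletions between a reference transcript and the caption text.
# This is the "simplistic count" style of measure; weighted schemes
# additionally score how much each error changes the meaning.

def word_accuracy(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Standard edit distance (Levenshtein) computed over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    errors = prev[-1]
    return 1.0 - errors / max(len(ref), 1)

print(f"{word_accuracy('the cat sat on the mat', 'the cat sat on a mat'):.0%}")  # 83%
```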

When you investigate services, you’ll inevitably see mention of the EIA-608 and EIA-708 caption formats, which are the North American SD and HD standards for carrying captions. These are also used for delivery to streaming services, so they retain relevance even though they originated for broadcast closed captions. One question asks if these subs can be edited after recording; the response is ‘yes’ as part of a post-production workflow, but directly editing the 608/708 data itself won’t work.
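As a rough illustration of why that is, here’s a simplified sketch of how a CEA-608 character pair is unpacked. The real character set has exceptions and plenty of control codes, which is exactly why editing happens upstream rather than in the 608/708 data itself:

```python
# Simplified sketch of unpacking a CEA-608 caption byte pair.
# Each byte carries odd parity in its top bit; the lower 7 bits are
# (approximately) ASCII for printable characters. Real streams also
# interleave control codes (positioning, colour, roll-up commands, etc.)
# and some character codes differ from ASCII, so treat this as a sketch.

def strip_parity(byte: int) -> int:
    return byte & 0x7F  # drop the odd-parity bit

def decode_pair(b1: int, b2: int) -> str:
    chars = ""
    for c in (strip_parity(b1), strip_parity(b2)):
        if 0x20 <= c <= 0x7F:   # printable range (roughly ASCII)
            chars += chr(c)
        # values below 0x20 in the first byte signal control codes - skipped here
    return chars

print(decode_pair(0xC8, 0xE5))  # "He" (0x48 'H' and 0x65 'e' with parity bits set)
```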

Other questions include subtitling in Zoom and other video conferencing apps, delivery of automated subtitles to a scoreboard, RTMP subtitling latency, switching between languages, mixing pre-captioned and live captioned material and converting to TTML captions for ATSC 3.0.

Watch now!
Speakers

Bill McLaughlin
VP Product Development
EEG Enterprises
Matthew Mello
Sales Associate
EEG Enterprises

Video: Live Closed Captioning and Subtitling in SMPTE 2110 (update)

The SMPTE ST 2110-40 standard specifies the real-time, RTP transport of SMPTE ST 291-1 Ancillary Data packets. It allows creation of IP essence flows carrying the VANC data familiar to us from SDI (like AFD, closed captions or ad triggering), complementing the existing video and audio portions of the SMPTE ST 2110 suite.
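To make the transport concrete, here’s a minimal sketch of what the SDP describing such an ancillary flow can look like. Addresses, ports and the PTP grandmaster are placeholder values; the smpte291/90000 payload mapping comes from RFC 8331, the RTP payload format ST 2110-40 uses:

```python
# Illustrative SDP for a single ST 2110-40 ancillary data flow.
# Multicast address, port and PTP grandmaster ID are made-up placeholders;
# the smpte291 payload name and 90 kHz clock come from RFC 8331.
example_sdp = """\
v=0
o=- 123456 123456 IN IP4 192.168.10.50
s=Example ANC flow
t=0 0
m=video 40000 RTP/AVP 100
c=IN IP4 239.10.10.40/64
a=rtpmap:100 smpte291/90000
a=mediaclk:direct=0
a=ts-refclk:ptp=IEEE1588-2008:00-11-22-33-44-55-66-77:127
"""

# Pull out the payload mapping to confirm this is an ancillary (smpte291) flow.
for line in example_sdp.splitlines():
    if line.startswith("a=rtpmap:"):
        payload_type, encoding = line.split(":", 1)[1].split(" ", 1)
        print(payload_type, encoding)   # 100 smpte291/90000
```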

This presentation, by Bill McLaughlin from EEG, is an updated tutorial on subtitling, closed captioning, and other ancillary data workflows using the ST 2110-40 standard. Topics include synchronization, merging of data from different sources and standards conversion.

Building on Bill’s previous presentation at the IP Showcase, this talk at NAB 2019 demonstrates a big increase in the number of vendors supporting the ST 2110-40 standard. Previously a generic packet analyser like Wireshark, with a suitable dissector, was recommended for troubleshooting IP ancillary data, but now most leading multiviewer / analyser products can display captioning, subtitling and timecode from 2110-40 streams. At the recent “JT-NM Tested Program” event, 29 products passed 2110-40 Reception Validation. Moreover, 27 products passed 2110-40 Transmitter Validation, which means their output can be reconstructed into SDI video signals with appropriate timing and then decoded correctly.

Bill points out that ST 2110-40 is not really a new standard at this point; it only defines how to carry ancillary data from the traditional payloads over IP. Special care needs to be taken when different VANC data packets are concatenated in the IP domain. A lot of existing devices are simple ST 2110-40 receivers, which would require a kind of VANC funnel to create a combined stream of all the relevant ancillary data, making sure that line numbers and packet types don’t conflict, especially when signals need to be converted back to SDI.
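Here’s a rough sketch of the kind of ‘VANC funnel’ logic Bill describes, using hypothetical packet objects carrying the usual DID/SDID identifiers and VANC line numbers, with a simple conflict check when streams are combined (a real funnel would more likely re-map lines than refuse):

```python
# Hypothetical sketch of merging ANC packets from several ST 2110-40
# senders into one combined stream, checking that DID/SDID types and
# VANC line numbers don't collide before the result is mapped back to SDI.
from dataclasses import dataclass

@dataclass
class AncPacket:
    did: int          # Data ID, e.g. 0x61 for CEA-708 caption CDPs (ST 334)
    sdid: int         # Secondary Data ID, e.g. 0x01
    line: int         # VANC line number the packet should occupy
    payload: bytes

def merge_vanc(streams: list[list[AncPacket]]) -> list[AncPacket]:
    merged: list[AncPacket] = []
    used_lines: dict[int, tuple[int, int]] = {}
    for stream in streams:
        for pkt in stream:
            owner = used_lines.get(pkt.line)
            if owner is not None and owner != (pkt.did, pkt.sdid):
                raise ValueError(
                    f"line {pkt.line} already carries DID/SDID {owner}; "
                    f"re-map before inserting {(pkt.did, pkt.sdid)}")
            used_lines[pkt.line] = (pkt.did, pkt.sdid)
            merged.append(pkt)
    return merged

captions = [AncPacket(0x61, 0x01, 9, b"...")]   # CEA-708 captions
triggers = [AncPacket(0x41, 0x07, 9, b"...")]   # SCTE-104 ad trigger on the same line
try:
    merge_vanc([captions, triggers])
except ValueError as err:
    print("Conflict:", err)
```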

There is a new ST 2110-41 standard being developed for additional ancillary data which doesn’t match up with the ancillary data standardised in ST 291-1. Another idea discussed is to move away from the SDI VANC data format and use a TTML track (Timed Text Markup Language – textual information associated with timing information) to carry ancillary information.
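To make the TTML idea concrete, here’s a minimal, hand-written example of a timed-text document carrying a couple of captions with their timing, plus a few lines to read it back:

```python
# Minimal, illustrative TTML document: two captions with begin/end times.
# Real profiles (e.g. IMSC1) add styling, regions and positioning on top.
ttml = """\
<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:05.000" end="00:00:07.500">Hello and welcome back.</p>
      <p begin="00:00:07.500" end="00:00:10.000">Let's look at today's headlines.</p>
    </div>
  </body>
</tt>
"""

import xml.etree.ElementTree as ET

root = ET.fromstring(ttml)
ns = {"tt": "http://www.w3.org/ns/ttml"}
for p in root.findall(".//tt:p", ns):
    print(p.get("begin"), "->", p.get("end"), ":", p.text)
```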

Watch now!

Download the slides.

Speakers

Bill McLaughlin
VP of Product Development
EEG

Video: TR-1001 Replacing Video By Spreadsheet

Here to kill the idea of SDNs – Spreadsheet Defined Networks – is TR-1001, which defines ways to implement IP-based media facilities while avoiding some typical mistakes and easing the support burden.

From the JT-NM (Joint Task Force on Networked Media), TR-1001 promises to be a very useful document for companies implementing ST-2110 or any video-over-IP network. Explaining what’s in it is EEG’s Bill McLaughlin at the VSF’s IP Showcase at NAB.

This isn’t the first time we’ve written about TR-1001 at The Broadcast Knowledge. Previously, Imagine’s John Mailhot dived in deep as part of a SMPTE standards webcast. Here, Bill takes a lighter approach to get across the main aims of the document and adds details about recent testing which took place across several vendors.

Bill looks at the typical issues that people find when initially implementing a system with ST-2110 devices and summarises the ways in which TR-1001 mitigates these problems. The aim here is to enable, at least in theory, many nodes to be configured in an automatic and self-documenting way.

Bill explains that TR-1001 covers timing, discovery and connection of devices, plus some aspects of configuration and monitoring. As we would expect, ST-2110 itself defines the media transport and also some of the timing. Work is still to be done for TR-1001 to address security aspects.
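As an illustration of the ‘discovery’ part, here’s a minimal sketch of browsing for an NMOS registration service over DNS-SD using the python-zeroconf package. The service type name is taken from NMOS IS-04 and should be treated as an assumption for your particular setup; TR-1001 environments typically prefer unicast DNS-SD, but the records are the same:

```python
# Minimal sketch of DNS-SD discovery of an NMOS registry: the kind of
# "plug in and be found" behaviour TR-1001 leans on instead of manual
# spreadsheets of IP addresses. Uses python-zeroconf (mDNS).
# "_nmos-register._tcp" is the IS-04 registration service type
# (assumption here; older IS-04 versions used a different name).
import time
from zeroconf import Zeroconf, ServiceBrowser, ServiceListener

class RegistryListener(ServiceListener):
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info:
            print(f"Found registry: {name} at {info.parsed_addresses()[0]}:{info.port}")

    def remove_service(self, zc, type_, name):
        print(f"Registry gone: {name}")

    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
browser = ServiceBrowser(zc, "_nmos-register._tcp.local.", RegistryListener())
time.sleep(5)   # browse for a few seconds
zc.close()
```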

Speaker

Bill McLaughlin
VP Product Development
EEG Enterprises

Video: Live Closed Captioning and Subtitling in SMPTE 2110-40

The ST 2110-40 standard specifies the real-time, RTP transport of SMPTE ST 291-1 Ancillary Data packets. It allows the creation of IP essence flows carrying the VANC data familiar from SDI (like AFD, closed captions or ad triggering), complementing the existing video and audio portions of the SMPTE ST 2110 suite.

In this video, Bill McLaughlin introduces 2110-40 and shows its advantages for closed captioning. With video, audio and ancillary data broken into separate essence flows, you no longer need full SDI bandwidth to process closed captioning; transcription can be done by subscribing to a single audio stream whose bandwidth is less than 1 Mbps. That allows for a very high processing density, with up to 100 channels of closed captioning in a 1RU server.
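As a back-of-the-envelope check on that density claim (illustrative numbers only):

```python
# Rough arithmetic behind the density claim: subscribing only to audio
# (plus a small -40 ANC flow) instead of a full SDI/2110-20 video stream.
audio_flow_mbps = 1.0   # one audio essence flow (< 1 Mbps quoted in the talk)
anc_flow_mbps   = 0.1   # ST 2110-40 ancillary flow, roughly 50-100 kbps
channels        = 100   # captioning channels in a 1 RU server

total_mbps = channels * (audio_flow_mbps + anc_flow_mbps)
print(f"~{total_mbps:.0f} Mbps for {channels} channels")  # ~110 Mbps, trivial on a 10 GbE NIC
# Versus subscribing to full HD video: 100 x ~1.5 Gbps is ~150 Gbps.
```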

Another benefit is that a single ST 2110-40 multicast containing closed captioning can be associated with multiple videos (e.g. for two different networks, or dirty and clean feeds), typically using NMOS connection management. This translates into additional bandwidth savings and lower cost, as you don’t need separate CC/subtitling encoders working in the SDI domain.
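A minimal sketch of what that connection management step can look like, assuming an NMOS IS-05 style Connection API; the receiver ID, sender ID, host and transport parameters below are all placeholders:

```python
# Hypothetical sketch: pointing a receiver's ancillary-data input at the
# shared ST 2110-40 caption multicast via an IS-05 style Connection API.
# The URL, IDs and transport parameters are placeholders for illustration.
import json
import urllib.request

RECEIVER = "http://node.example/x-nmos/connection/v1.1/single/receivers/0a1b2c3d/staged"

patch = {
    "sender_id": "9f8e7d6c-0000-0000-0000-000000000000",   # the caption sender
    "master_enable": True,
    "activation": {"mode": "activate_immediate"},
    "transport_params": [{
        "multicast_ip": "239.10.10.40",   # same -40 flow shared by several video services
        "destination_port": 40000,
        "rtp_enabled": True,
    }],
}

req = urllib.request.Request(RECEIVER, data=json.dumps(patch).encode(),
                             method="PATCH",
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(resp.status)   # 200 on success
```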

Test and measurement equipment for ST 2110-40 is still under development. However, with data rates of 50-100 kbps per flow, monitoring is very manageable and you can use COTS equipment and a generic packet analyser like Wireshark with the dissector available on GitHub.

Speaker

Bill McLaughlin
VP Product Development
EEG Enterprises