Video: Everything You Want to Know About Captioning

Legally mandated in many countries, captions and subtitles are a vital part of both broadcast and streaming. Yet, as so often happens, one of the lowest-bandwidth signals in the chain turns out to be among the most complex to manage. Getting subtitles right in all their nuances is hard, whether live or in post-production, and getting it right covers protocol, position, colour, delay, spelling and accuracy. So there's a whole workflow just for subtitling, which is what this video looks at.

EEG specialise in subtitling solutions, so it's no surprise that their Sales Associate Matt Mello and VP of Product Development Bill McLaughlin wanted to run a live Q&A session which, unusually, was a pure one-hour Q&A with no initial presentation. All the questions are shown on screen and answered by Matt and Bill, who look at both the technology and specific products.

They start off by defining the terms 'closed' and 'open' captioning: open captions are rendered into the picture itself, also known as 'burnt in', while closed captions are hidden, typically carried in the blanking of TV channels, always transmitted but only displayed when the viewer asks their TV to decode and overlay them onto the picture. Whether closed or open, there is always the task of timing the subtitles correctly and merging them with the video so the words appear at the right moment. As for 'subtitles' vs 'captions', this really depends on where you are from. In the UK, 'subtitles' is used instead of 'captions', with the term 'closed captions' specifically referring to the North American closed caption standard. This is as opposed to Teletext subtitles, which are different but are still inserted into the blanking of baseband video and only shown when the decoder is asked to display them. The Broadcast Knowledge uses the term subtitles to mean captions.

The duo next talk about live and pre-recorded subtitles, with live being the tricky one, since generating live subtitles with minimal delay is difficult. The predominant method, which has largely replaced stenography, is to have a person respeak the programme into a trained voice-recognition engine, which introduces a delay. However, the accuracy is much better than having a computer listen to the raw programme sound, which may contain all sorts of accents, loud background noise or overlapping speakers, leaving much to be desired in the result. Automatic solutions, however, don't need scheduling the way humans do, and there are now ways to load specialist vocabulary, and indeed scripts, ahead of time to help keep accuracy up.

Accuracy is another topic under the spotlight with Matt and Bill. They outline that accuracy is measured in different ways, from a simplistic count of the number of incorrect words to weighted measures which consider how important the incorrect words are and how much the meaning has changed. Looking at videos on YouTube, we see that automated captions are generally less accurate than human-curated subtitles, but they do allow YouTube to meet its legal responsibility to stream with captions. Accuracy of around 98%, they advise, should be taken as effectively perfect, with 95% being good, and below 85% there's a question of whether it's worth doing at all.
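
As a simple illustration of the 'count the incorrect words' approach, here is a minimal word error rate sketch in Python. It is not how EEG or any particular vendor measures accuracy (weighted schemes score errors by how much they damage the meaning), and the function name and example sentences are purely illustrative.

```python
# Minimal word-error-rate (WER) sketch: edits (substitutions, insertions,
# deletions) between reference and hypothesis word lists, divided by the
# reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Illustrative example: 1 error in 8 words => 12.5% error, 87.5% "accuracy".
ref = "the home team scored in the final minute"
hyp = "the home team scored in the final minuet"
print(f"accuracy: {(1 - word_error_rate(ref, hyp)) * 100:.1f}%")
```

On this simple measure, 95% accuracy still means roughly one word in twenty is wrong on screen, which is why the 98% 'effectively perfect' threshold above is set so high.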

When you investigate services, you'll inevitably see mention of the EIA-608 and EIA-708 caption formats, which are the North American SD and HD standards for carrying captions. These are also used for delivery to streaming services, so they retain relevance even though they originated for broadcast closed captions. One question asks whether these captions can be edited after recording; the answer is 'yes', as part of a post-production workflow, though editing the 608/708 data directly won't work.
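
For a feel of why 608 data isn't something you would edit by hand, here is a small sketch of decoding a single CEA-608 byte pair. It only handles plain printable characters by stripping the odd-parity bit and treating the remainder as roughly ASCII; real decoders also handle control codes, the handful of remapped special characters and the two-caption-channel structure, so treat this purely as an illustration.

```python
def parity_ok(b: int) -> bool:
    # CEA-608 bytes carry odd parity in bit 7.
    return bin(b).count("1") % 2 == 1

def decode_608_pair(b1: int, b2: int) -> str:
    """Very rough CEA-608 byte-pair decode: printable characters only."""
    out = []
    for b in (b1, b2):
        if not parity_ok(b):
            continue                 # a real decoder would flag the error
        ch = b & 0x7F                # drop the parity bit
        if 0x20 <= ch <= 0x7F:       # basic printable range, roughly ASCII
            out.append(chr(ch))
    return "".join(out)

# 'H' = 0x48 already has odd parity; 'I' = 0x49 needs bit 7 set (0xC9).
print(decode_608_pair(0x48, 0xC9))  # -> "HI"
```

Real 608 streams interleave these character pairs with control codes for positioning, colour and pop-on/roll-up behaviour, which is why direct editing of the raw data is impractical.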

Other questions include subtitling in Zoom and other video conferencing apps, delivery of automated subtitles to a scoreboard, RTMP subtitling latency, switching between languages, mixing pre-captioned and live captioned material and converting to TTML captions for ATSC 3.0.

Watch now!
Speakers

Bill McLaughlin
VP Product Development
EEG Enterprises
Matthew Mello
Sales Associate
EEG Enterprises

Video: The Case To Caption Everything

To paraphrase a cliché, "you are free to put black and silence to air, but if you do it without captions, you'll go to prison." Captions are useful to the deaf and hard of hearing, as well as to those who aren't. And in many places, not captioning videos is seen as so discriminatory that there is a mandatory quota. The saying at the beginning alludes to the US federal and local laws which lay down fines for non-compliance, though whether it's truly possible to go to prison is not clear.

The case for captioning:
“13.3 Million Americans watch British drama”

In many parts of the world 'subtitles' means the same as 'captions' does in countries such as the US. In this article, I shall use the word captions to match the terms used in the video. As Bill Bennett from ENCO Systems explains, closed captions are sent as data alongside the video, meaning you can ask your receiver to turn display of the text on or off.

In this talk from the Midwest Broadcast Multimedia Technology Conference, we hear not only why you should caption, but get introduced to the techniques for both creating and transmitting them. Bill starts by introducing us to stenography, the technique of typing on special machines to do real-time transcripts. This is to help explain how resource-intensive creating captions is when using humans. It’s a highly specialist skill which, alone, makes it difficult for broadcasters to deliver captions en masse.

The alternative, naturally, is to have computers do the task. Whilst they are cheaper, they have problems understanding audio over noise and with multiple people speaking at once. The compromise often used, for instance by BBC Sport, is to have someone re-speak the audio into the computer. This harnesses the strengths of the human brain together with the speed of computing: the re-speaker can enunciate and emphasise to get around idiosyncrasies in the recognition.

Bill revisits the numerous motivations to caption content. He talks about the legal reasons, particularly within the US, but also mentions the usefulness of captions in situations where you don't want audio from TVs, such as receptions and shop windows, as well as in noisy environments. He also makes the point that once you have this data, the broadcaster can use it for search, sentiment analysis and archive retrieval, among other things.

Watch now!
Download the presentation
Speaker

Bill Bennett
Media Solutions Account Manager
ENCO Systems

Video: Live Closed Captioning and Subtitling in SMPTE 2110 (update)

The SMPTE ST 2110-40 standard specifies the real-time, RTP transport of SMPTE ST 291-1 Ancillary Data packets. It allows creation of IP essence flows carrying the VANC data familiar to us from SDI (like AFD, closed captions or ad triggering), complementing the existing video and audio portions of the SMPTE ST 2110 suite.
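
For orientation, the RTP payload format for these ANC flows is defined in RFC 8331. Below is a minimal sketch of reading its fixed header fields (extended sequence number, length, ANC count, field flag) from a payload in Python; it is an illustration of the header layout only, not a full parser, since the per-packet fields that follow are bit-packed and need a proper bit reader.

```python
import struct

def parse_anc_header(payload: bytes) -> dict:
    """Read the fixed payload header used for ST 2110-40 ANC RTP (RFC 8331).

    Layout: Extended Sequence Number (16 bits), Length (16 bits),
    ANC_Count (8 bits), F (2 bits) plus 22 reserved bits, followed by
    the bit-packed ANC data packets themselves (not parsed here).
    """
    ext_seq, length, anc_count, f_byte = struct.unpack_from(">HHBB", payload, 0)
    return {
        "extended_sequence_number": ext_seq,
        "length": length,            # payload length field
        "anc_count": anc_count,      # number of ANC packets that follow
        "field": f_byte >> 6,        # 0b10 / 0b11 indicate field 1 / field 2
    }
```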

This presentation, by Bill McLaughlin from EEG, is an updated tutorial on subtitling, closed captioning, and other ancillary data workflows using the ST 2110-40 standard. Topics include synchronization, merging of data from different sources and standards conversion.

Building on Bill's previous presentation at the IP Showcase, this talk at NAB 2019 demonstrates a big increase in the number of vendors supporting the ST 2110-40 standard. Previously a generic packet analyser like Wireshark, with a suitable dissector, was recommended for troubleshooting IP ancillary data, but now most leading multiviewer and analyser products can display captioning, subtitling and timecode from 2110-40 streams. At the recent "JT-NM Tested Program" event, 29 products passed 2110-40 Reception Validation. Moreover, 27 products passed 2110-40 Transmitter Validation, which means their output can be reconstructed into SDI video signals with appropriate timing and then decoded correctly.

Bill points out that ST 2110-40 is not really a new standard at this point; it only defines how to carry the ancillary data from traditional payloads over IP. Special care needs to be taken when different VANC data packets are combined in the IP domain. A lot of existing devices are simple ST 2110-40 receivers, which would require a kind of 'VANC funnel' to create a combined stream of all the relevant ancillary data, making sure that line numbers and packet types don't conflict, especially when signals need to be converted back to SDI.
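
As a rough illustration of that 'VANC funnel', here is a small Python sketch that merges ancillary packets from several senders and flags clashes on line number and horizontal offset, as well as the same DID/SDID type arriving from more than one source. The AncPacket fields and conflict rules are simplified assumptions for illustration, not the behaviour of any real product or the standard's full timing model.

```python
from collections.abc import Iterable
from dataclasses import dataclass

@dataclass(frozen=True)
class AncPacket:
    """Simplified view of an ST 291-1 ancillary packet carried over 2110-40."""
    did: int        # Data ID (e.g. 0x61 is used for CEA-708 caption data)
    sdid: int       # Secondary Data ID
    line: int       # video line the packet should occupy when mapped back to SDI
    h_offset: int   # horizontal offset within that line
    payload: bytes

def merge_vanc(streams: Iterable[list[AncPacket]]) -> list[AncPacket]:
    """Funnel per-sender packet lists into one outgoing VANC set.

    Refuses to place two packets at the same line/offset and warns when the
    same DID/SDID type arrives from more than one source, since a downstream
    SDI embedder could otherwise emit conflicting or duplicated data.
    """
    placed: dict[tuple[int, int], AncPacket] = {}
    type_sources: dict[tuple[int, int], set[int]] = {}
    for src_index, stream in enumerate(streams):
        for pkt in stream:
            slot = (pkt.line, pkt.h_offset)
            if slot in placed:
                raise ValueError(f"line/offset clash at {slot}")
            sources = type_sources.setdefault((pkt.did, pkt.sdid), set())
            sources.add(src_index)
            if len(sources) > 1:
                print(f"warning: DID/SDID {(pkt.did, pkt.sdid)} from multiple sources")
            placed[slot] = pkt
    # A gateway converting back to SDI would embed these in line/offset order.
    return sorted(placed.values(), key=lambda p: (p.line, p.h_offset))
```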

There is a new ST 2110-41 standard being developed for additional ancillary data that doesn't correspond to the ancillary data standardised in ST 291-1. Another idea discussed is to move away from the SDI VANC data format altogether and use a TTML track (Timed Text Markup Language: textual information associated with timing information) to carry this information.
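
For context, TTML expresses captions as timed text in XML rather than as VANC byte payloads. The fragment below, assembled with Python's standard library, is a minimal illustrative document (the timings and wording are made up), not the stricter profile a broadcast system such as ATSC 3.0 would mandate in practice.

```python
# Build and print a minimal TTML caption document using the standard library.
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TTML_NS)

tt = ET.Element(f"{{{TTML_NS}}}tt", attrib={f"{{{XML_NS}}}lang": "en"})
body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
div = ET.SubElement(body, f"{{{TTML_NS}}}div")

# Each <p> is one caption together with its display window.
for begin, end, text in [("00:00:01.000", "00:00:03.500", "Welcome back to the studio."),
                         ("00:00:03.500", "00:00:06.000", "Here's tonight's top story.")]:
    p = ET.SubElement(div, f"{{{TTML_NS}}}p", attrib={"begin": begin, "end": end})
    p.text = text

print(ET.tostring(tt, encoding="unicode"))
```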

Watch now!

Download the slides.

Speakers

Bill McLaughlin
VP of Product Development
EEG

Video: What is 525-Line Analog Video?

With an enjoyable retro feel, this accessible video on how analogue video works is useful for anyone who has to deal with SDI rasters, interlaced video, black and burst, subtitles and more. It'll remind those of us who once knew of a few things since forgotten, and it's an enjoyable primer for anyone coming to the topic fresh.

Displaced Gamers is a YouTube channel, and its focus on video games is an enjoyable addition to this video, which starts by explaining why 525-line analogue video is the same as 480i. Using slow-motion footage of a CRT (Cathode Ray Tube) TV, the video explains the interlacing technique and why consoles and computers would often use 240p.

We then move on to timing, looking at the time spent drawing a line of video, 52.7 microseconds, and the need for horizontal and vertical blanking. Blanking periods, the video explains, are there to cover the time the CRT spends moving the electron beam from one side of the screen back to the other. As this was done with electromagnets, the beam had to be turned off (blanked) while their field, and hence the beam's position, was changing.
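
The 52.7 microseconds is the active part of the line; the full line, including horizontal blanking, is a little under 64 microseconds. A quick back-of-the-envelope check in Python, assuming the standard 525-line/29.97 Hz figures and a roughly 10.9 microsecond horizontal blanking interval:

```python
# NTSC 525-line timing, back-of-the-envelope.
lines_per_frame = 525
frame_rate = 30 / 1.001                                # ~29.97 Hz

line_time_us = 1e6 / (lines_per_frame * frame_rate)    # ~63.6 us per line
h_blanking_us = 10.9                                    # approx. horizontal blanking
active_line_us = line_time_us - h_blanking_us           # ~52.7 us of visible picture

print(f"total line: {line_time_us:.1f} us, active: {active_line_us:.1f} us")
```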

The importance of these housekeeping manoeuvres for older computers was that this was time they could use for calculations, free from the task of writing data into the video buffer. Blanking was not just useful for computers, though: broadcasters could use some of it to insert data, and they still do. In this video we see a VHS tape played back with the blanking clearly visible and the data lines flashing away.

For those who work with this technology still, for those who like history, for those who are intellectually curious and for those who like reminiscing, this is an enjoyable video and ideal for sharing with colleagues.

Watch now!
Speaker

Chris Kennedy
Displaced Gamers, YouTube Channel