Legally mandated in many countries, captions or subtitles are a vital part of both broadcast and streaming. But, as is so often the case, the lower the bandwidth of a signal, the more complex it is to manage. Getting subtitles right in all their nuances is hard, whether live or in post-production. And by getting it right, we're talking about protocol, position, colour, delay, spelling and accuracy. So there's a whole workflow just for subtitling, and that's what this video looks at.
EEG specialise in subtitling solutions, so it's no surprise that their Sales Associate, Matt Mello, and VP of Product Development, Bill McLaughlin, wanted to run a live Q&A session which, unusually, was a pure one-hour Q&A with no initial presentation. All questions are shown on screen and are answered by Matt and Bill, who look at both the technology and specific products.
They start off by defining the terms 'closed' and 'open' captioning: open captions are rendered into the picture itself, also known as 'burnt in', while 'closed' indicates they are hidden. This often refers to the closed captions carried in the blanking of TV channels, which are always transmitted but only displayed when the viewer asks their TV to decode them and overlay them on the picture. Whether closed or open, there is always the task of timing the subtitles correctly and merging them into the video so the words appear at the right moment. As for the term subtitles vs captions, this really depends on where you are from. In the UK, 'subtitles' is used instead of 'captions', with the term 'closed captions' referring specifically to the North American closed caption standard. This is as opposed to Teletext subtitles, which are different but still inserted into the blanking of baseband video and only shown when the decoder is asked to display them. The Broadcast Knowledge uses the term subtitles to mean captions.
The duo next talk about live and pre-recorded subtitles, with live being the trickier of the two since generating live subtitles with minimal delay is difficult. The predominant method, which has largely replaced stenography, is to have a person respeak the programme into a voice recognition programme trained to their voice, which introduces a delay. However, the accuracy is much better than having a computer listen to the raw programme sound, which may contain all sorts of accents, loud background noise or overlapping speakers, leaving much to be desired in the result. Automatic solutions, unlike humans, don't need scheduling, and there are now ways to supply specialist vocabulary, and indeed scripts, ahead of time to help keep accuracy up.
Accuracy is another topic under the spotlight with Matt and Bill. They outline that accuracy is measured in different ways, from a simplistic count of the number of incorrect words to weighted measures which consider how important the incorrect words are and how much the meaning has changed. Looking at YouTube, we see that automated captions are generally less accurate than human-curated subtitles, but they do allow YouTube to meet its legal obligation to stream with captions. Accuracy of around 98%, they advise, should be taken as effectively perfect, with 95% being good; below 85%, there's a question of whether it's actually worth doing.
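The simplistic count mentioned above is essentially word error rate (WER): the number of substituted, inserted and deleted words divided by the length of the reference transcript, with accuracy being one minus that. As a minimal sketch (the video doesn't specify any particular metric implementation, so this is illustrative only):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

def accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as the complement of WER, e.g. WER 0.05 ~ 95% accurate."""
    return 1.0 - wer(reference, hypothesis)
```

A weighted measure would go further, scoring an error on a key name or a negation more heavily than a dropped filler word; this sketch treats every word equally.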
When you investigate services, you'll inevitably see mention of the EIA-608 and EIA-708 caption formats, the North American SD and HD standards for carrying captions. These are also used for delivery to streaming services, so they remain relevant even though they originated for broadcast closed captions. One question asks whether these subs can be edited after recording; the answer is 'yes' as part of a post-production workflow, but direct editing of the 608/708 file won't work.
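The 608/708 byte stream itself isn't practical to hand-edit, which is why post-production workflows typically work on a text-based caption file and re-encode afterwards. As a hedged illustration of that kind of edit (the SRT timestamp format is standard, but the `shift_srt` helper here is a hypothetical sketch, not any tool mentioned in the video), shifting every cue by a fixed offset looks like this:

```python
import re

# Matches SRT timestamps of the form HH:MM:SS,mmm
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(text: str, offset_ms: int) -> str:
    """Shift every SRT timestamp in `text` by offset_ms milliseconds
    (negative values shift earlier, clamped at zero)."""
    def bump(m: re.Match) -> str:
        h, mnt, s, ms = map(int, m.groups())
        total = h * 3_600_000 + mnt * 60_000 + s * 1_000 + ms + offset_ms
        total = max(0, total)
        return "{:02d}:{:02d}:{:02d},{:03d}".format(
            total // 3_600_000,
            total % 3_600_000 // 60_000,
            total % 60_000 // 1_000,
            total % 1_000)
    return TIMESTAMP.sub(bump, text)
```

For example, `shift_srt("00:00:01,000 --> 00:00:03,250", 500)` delays that cue by half a second. The corrected file would then be run back through an encoder to produce fresh 608/708 data.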
Other questions include subtitling in Zoom and other video conferencing apps, delivery of automated subtitles to a scoreboard, RTMP subtitling latency, switching between languages, mixing pre-captioned and live captioned material and converting to TTML captions for ATSC 3.0.