Video: Everything You Want to Know About Captioning

Legally mandated in many countries, captions or subtitles are a vital part of both broadcast and streaming. But as is so often the case, the lower the bandwidth of a signal, the more complex it is to manage. Getting subtitles right in all their nuances is hard, whether live or in post-production. And by getting it right, we’re talking about protocol, position, colour, delay, spelling and accuracy. So there’s a whole workflow just for subtitling, which is what this video looks at.

EEG specialise in subtitling solutions, so it’s no surprise their Sales Associate Matt Mello and VP of Product Development, Bill McLaughlin, wanted to run a live Q&A session which, unusually, was a pure one-hour Q&A with no initial presentation. All questions are shown on screen and are answered by Matt & Bill, who look at the technology and specific products.


They start off by defining the terms ‘closed’ and ‘open’ captioning. Open captions are shown in the picture itself, also known as ‘burnt in’. ‘Closed’ indicates they are hidden, often referring to the closed captions carried in the blanking of TV channels, which are always sent but only displayed when the viewer asks their TV to decode and overlay them onto the picture. Whether closed or open, there is always the task of timing the subtitles and merging them into the video so the words appear at the right moment. As for the term subtitles vs captions, this really depends on where you are from. In the UK, ‘subtitles’ is used instead of ‘captions’, with the term ‘closed captions’ specifically referring to the North American closed captions standard. This is as opposed to Teletext subtitles, which are different but still inserted into the blanking of baseband video and only shown when the decoder is asked to display them. The Broadcast Knowledge uses the term subtitles to mean captions.

The duo next talk about live and pre-recorded subtitles, with live being the tricky case as generating live subtitles with minimal delay is difficult. The predominant method, which replaced stenography, is to have a person respeak the programme into voice recognition software trained to their voice, which introduces a delay. However, the accuracy is much better than having a computer listen to the raw programme sound, which may contain all sorts of accents, loud background noise or overlapping speakers, leaving much to be desired in the result. Automatic solutions, however, don’t need scheduling, unlike humans, and there are now ways to supply specialist vocabulary and indeed scripts ahead of time to help keep accuracy up.

Accuracy is another topic under the spotlight with Matt and Bill. They outline that accuracy is measured in different ways, from a simplistic count of the number of incorrect words to weighted measures which look at how important the incorrect words are and how much the meaning has changed. Looking at videos on YouTube, we see that automated captions are generally less accurate than human-curated subtitles, but they do allow YouTube to meet its legal responsibility to stream with captions. Accuracy of around 98%, they advise, should be taken as effectively perfect, with 95% being good and below 85% there being a question as to whether it’s worth doing at all.
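As a rough illustration of the simplest measure mentioned above, word accuracy can be computed from the Levenshtein edit distance over words. This Python sketch deliberately ignores the weighted measures the talk alludes to and assumes naive whitespace tokenisation:

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Simple word-level accuracy: 1 - WER, where WER is the Levenshtein
    edit distance over words divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Dynamic-programming table for edit distance over word sequences
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 1 - d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of six reference words -> accuracy of 5/6 (~83%),
# which by the guidance above is on the borderline of being worthwhile.
acc = word_accuracy("the cat sat on the mat", "the cat sat on a mat")
```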

When you investigate services, you’ll inevitably see mention of the EIA-608 and EIA-708 caption formats, the North American SD and HD standards for carrying captions. These are also used for delivery to streaming services, so they retain relevance even though they originated for broadcast closed captions. One question asks whether these subs can be edited after recording, to which the response is ‘yes’ as part of a post-production workflow, though directly editing the 608/708 file won’t work.

Other questions include subtitling in Zoom and other video conferencing apps, delivery of automated subtitles to a scoreboard, RTMP subtitling latency, switching between languages, mixing pre-captioned and live captioned material and converting to TTML captions for ATSC 3.0.

Watch now!
Speakers

Bill McLaughlin
VP Product Development
EEG Enterprises
Matthew Mello
Sales Associate,
EEG Enterprises

Video: Milan Video Tech on WebAssembly, DRM, Video Monitoring & Error KPIs in IPTV/OTT

WebAssembly, low-latency streaming, DRM and monitoring are the topics of this Milan Video Tech meeting, part of the 24-hour mega meetup. To keep evolving your services, you need to understand the newest technologies and be ready to use them when the time is right. In this video, we look at a basic DRM workflow, experiment with the latest player tech, work out how to distribute your service monitoring to be able to quickly diagnose issues, and see how to use monitoring to your advantage.

Evolution has provided live casino feeds since 2006 as part of a B2B (business to business) offering. With offices in 20 countries and over 800 tables, there’s a lot to do. They offer browser-based playback which already achieves low latencies, down to 1.5 seconds, using current WebSockets and HLS technologies, but Behnam Kakavand explains how they’re improving on that with a move to WebAssembly.

WebAssembly allows you to run pre-compiled code in any browser on any platform, where ‘pre-compiled’ is close to a euphemism for ‘optimised’. The code runs up to 4 times faster than interpreted JavaScript and gives you flexibility in which language to code in, such as C, Rust or Go. Behnam runs through the reasons they chose a WASM player, which revolve around a high level of control over the whole playback experience, a reluctance to use Apple’s LL-HLS as its latency reductions don’t go far enough, and a reluctance to use WebRTC, which is unattractive because of its fixed AVC transport implementation.

Without using WebAssembly, Behnam shows, you get little playback control with native HTML5 elements. With MSE there is a lot more control, but it’s not available on iOS. Using WebAssembly they can use any codec, customise the buffers and reduce battery usage. Behnam explains the workflow they use to compile the code into WebAssembly and talks about their future plans such as bringing SIMD operations into WebAssembly, bringing down battery use, reducing the player bundle size and using WebCodecs.


Andrea Fassina gives a great overview of DRM playback. Talking against a whiteboard, he shows how the workflow checks for user authentication to gain access to the copyrighted content. When the user chooses a video, the request is sent out and the video is fetched from storage. The licence checker is a browser component that safely sends data to the DRM licence server to check whether the user is allowed to play the content. The DRM licence proxy server aggregates service and user information with IDs. If a positive decision is made, a licence is sent back which includes the decryption key.

Akamai’s Luca Moglia shows how to create video monitoring dashboards with near real-time logs and CMCD KPIs. Luca shows how, by adding some extra data to the URL a player uses to access the CDN, this data can be passed back almost immediately to a logging server. Grafana or other tools can then be used to visualise this data, which can give essential insight into what’s working and what’s not.
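The ‘extra data in the URL’ is CMCD (Common Media Client Data, CTA-5004), which packs key=value pairs into a CMCD query parameter. A minimal sketch of building such a URL; the keys br (encoded bitrate in kbps), bl (buffer length in ms) and sid (session id) are real CMCD keys, while the values and URL are invented:

```python
from urllib.parse import quote

def with_cmcd(url: str, data: dict) -> str:
    """Append a CMCD query parameter to a media request URL.
    Per CTA-5004 the payload is comma-separated key=value pairs,
    keys in alphabetical order, string values in double quotes."""
    payload = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(data.items())
    )
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}CMCD={quote(payload)}"

# Illustrative values: 3200 kbps rendition, 21.3 s of buffer
url = with_cmcd("https://cdn.example.com/seg42.m4s",
                {"br": 3200, "bl": 21300, "sid": "6e2fb550"})
```

The CDN logs this parameter alongside its own delivery data, which is what makes the near real-time dashboards Luca describes possible.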

Finishing off the video, Alexey Malikov from Elecard explains how they use a distributed monitoring system to get to the bottom of issues that customers face. The probes, which can sit before and after key pieces of equipment, are important for logical fault finding. Doing all monitoring from a central server would be possible, but this wouldn’t account for problems arising locally at your equipment. With that in place, Alexey presents a number of case studies that become much easier to diagnose with the probes present than without. His examples of issues that could be fixed or mitigated by distributed monitoring include stuttering during ad breaks, streams becoming unavailable, problematic download speeds and the system occasionally failing to detect audio.

Watch now!
Speakers

Behnam Kakavand
Video R&D Engineer
Evolution
Luca Moglia
Senior Solutions Engineer,
Akamai
Alexey Malikov
Business Development Director EMEA,
Elecard
Andrea Fassina
Web Technologies Developer
videodeveloper.io

Video: Pay TV operators OTT D2C strategies

Direct to consumer (D2C) strategies have become all the rage in streaming, producing Paramount+, Disney+ and Discovery+ from a list of many more. What is it that broadcasters are capitalising on by doing this, and how do they get on with their rivals and partners, the telcos? This panel from Dataxis, moderated by Julian Clover, probes to find out more.

Lydia Fairfax, who leads partnerships for Discovery+, starts by saying that the strategy is to maintain the investment in linear channels, which have just seen their strongest Q1 ratings. This is done by working the budgets for linear alongside an incremental budget for Discovery+, which allows them to serve their younger demographic by producing shows for it which can then be trialled on the linear channels to understand what content will carry well. This is all part of a bid to ensure that Discovery+ content can have a life on linear so that the investment is not wasted. Work is ongoing to see whether showing the first episode of new content on a free-to-air (FTA) channel first and driving viewers to Discovery+ is a good way forward, or whether releasing to FTA after an initial Discovery+ exclusivity window is the best way to maximise the value of content.


Antonella Dominici from TIMvision explains the role of TIMvision as, for the most part, an aggregator that works with big names like Discovery, Eurosport, Sky and many others to deliver a sophisticated offering to its Italian audience. Making its own content as well, Antonella explains they aren’t going up against Netflix; rather, they are finding specific niches in Italian TV and filling them with their original content. Another USP over the streaming giants is that they also deliver the major linear channels that Italians watch, such as Sky Italy and RAI.

A different perspective is offered by Bulsatcom CEO Stanislav Georgiev. Now 21 years old, the company is well known in Bulgaria as a DTH platform, and it’s Stanislav’s job, he says, to make their OTT offering a major part of their business. They have the benefit of being a trusted brand, and Stanislav sees their role as almost purely an aggregator. Turning to a question on the continued relevance of STBs, he says that the set-top box brings ‘order to the chaos’.

The STB is still very much present, says Peter Røder Kristensen of 24i, and whilst Android TV is growing both in STBs and on TVs, Peter says it’s not a matter of choosing the best; rather, you need to be on every device or you’re not relevant. STBs have their benefits, Lydia reinforces, allowing broadcasters to push their brand and offer a shortcut from their linear channels direct to the Discovery+ app using the red button. Antonella says that she sees the STB catering to ‘lean back’ viewers who want to be guided as to what to watch. She says that people who know what they want will just go into the app and search for it. Peter adds that creating consistency and integration across all devices is key, including using Google Voice as a starting point.

Watch now!
Speakers

Lydia Fairfax
SVP, Head of Commercial Partnerships, EMEA
Discovery
Antonella Dominici
Vice President TIMVISION & Entertainment Products
TIM
Peter Røder Kristensen
Product Management,
24i
Stanislav Georgiev
CEO
Bulsatcom
Julian Clover
Editor,
Broadband TV News

Video: VVC – The new Versatile Video Coding standard

The codec landscape is a more nuanced place than 5 years ago, but there will always be a place for a traditional codec that cuts file sizes in half while harnessing recent increases in computation. Enter VVC (Versatile Video Coding), the successor to HEVC, created by JVET (Joint Video Experts Team), a collaboration between MPEG and the ITU, which delivers up to 50% compression improvement by evolving the HEVC toolset and adding new features.

In this IEEE BTS webinar, Virginie Drugeon from Panasonic takes us through VVC’s advances, its applications and its performance. VVC aims not only to deliver better compression but places an emphasis on delivering at higher resolutions, with HDR and as 10-bit video. It also acknowledges that natural video isn’t the only content in use nowadays, with much more now comprising computer games and other computer-generated imagery. To achieve all this, VVC has had to up its toolset.


Any codec comprises a whole set of tools that carry out different tasks. How much each of these tools is used to encode the video is controllable, to some extent, and is what gives rise to the different ‘profiles’, ‘levels’ and ‘tiers’ mentioned when dealing with MPEG codecs. These are necessary to make lower-powered decoding possible. Artificially constraining the capabilities of the encoder gives maximum performance guarantees for both the encoder and decoder, which gives manufacturers control over the cost of their software and hardware products. Virginie walks us through many of these tools, explaining what’s been improved.

Most codecs split the image up into blocks; not only MPEG codecs but the Chinese AVS codecs and AV1 do this too. The more ways you have to do this, the better compression you can achieve, but it adds complexity to the encoding, so each generation adds more options to balance compression against the extra computing power available since the last codec. VVC allows rectangles rather than just squares to be used, and sections can now be up to 128×128 pixels, also covered in this Bitmovin video. This can be done separately for the chroma and luma channels.
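For a quick sense of scale, the number of maximum-size blocks (CTUs) in a frame is simple arithmetic; edge blocks that don't divide evenly are counted as partial CTUs:

```python
import math

# A UHD frame split into VVC's largest 128x128 coding tree units
width, height, ctu = 3840, 2160, 128
ctus = math.ceil(width / ctu) * math.ceil(height / ctu)
# 30 columns x 17 rows = 510 CTUs per frame, each of which the
# encoder may then subdivide into smaller squares and rectangles
```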

Virginie explains that encoding is done by predicting the next frame and sending corrections on top of that. This means the encoder needs to have a decoder within it so it can see what is actually decoded and understand the differences. Virginie explains there are three types of prediction: intra prediction, which uses the current frame to predict the content of a block; inter prediction, which uses other frames to predict video data; and a hybrid mode which uses both, new to VVC. There are now 93 directional intra prediction angles and the introduction of matrix-based intra prediction. This is an example of the beginning of the move to AI for codecs, a move which is seen as inevitable by The Broadcast Knowledge as we see more examples of traditional mathematical algorithms being improved upon by AI, machine learning and/or deep learning. A good example of this is super-resolution. In this case, Virginie says that machine learning was used to generate some matrices used for the transform, meaning there’s no neural network within the codec; rather, the matrices were created based on real-world data. It seems likely that as processing power increases, a neural network will be implemented in future codecs (whether from MPEG or otherwise).
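The predict-and-correct idea can be illustrated with the simplest intra mode, DC prediction, where a block is predicted as the mean of its neighbouring reference pixels. This toy sketch is far simpler than VVC's 93 angular modes and matrix-based prediction, but it shows why only a residual needs to be transmitted:

```python
def dc_predict(top: list, left: list) -> int:
    """DC intra mode: predict every pixel in the block as the average
    of the already-reconstructed neighbouring reference pixels."""
    refs = top + left
    return round(sum(refs) / len(refs))

def encode_block(block, top, left):
    """Encoder side: subtract the prediction, transmit only the residual."""
    pred = dc_predict(top, left)
    return [[px - pred for px in row] for row in block]

def decode_block(residual, top, left):
    """Decoder side: rebuild the same prediction and add the residual."""
    pred = dc_predict(top, left)
    return [[r + pred for r in row] for row in residual]

# The residual values are small (cheap to code); the decoder recovers
# the block exactly because it builds the identical prediction.
block = [[100, 102], [101, 99]]
res = encode_block(block, top=[100, 100], left=[100, 100])
```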

For screen encoding, we see that intra block copying (IBC) is carried over from HEVC, explained here from 17:30. IBC allows part of a frame to be copied to another, a great technique for computer-generated content. Whilst this was in HEVC, it was not in HEVC’s basic package of tools, meaning it was much less accessible as support in decoders was often lacking. Two new tools, block differential pulse code modulation and transform skip with adapted residual coding, are each discussed, along with IBC, in this free paper.

Virginie moves on to coding performance, explaining that the JVET reference software, called VTM, has been compared against HEVC’s HM reference and has shown, using PSNR, an average 41% improvement on luminance, rising to 48% for screen content. Fraunhofer HHI’s VVenC software has been shown to deliver 49%.

Along with the ability to be applied to screen content and 360-degree video, the versatility in the codec’s name also refers to the different levels and tiers it has, which stretch from 4:2:0 10-bit video all the way up to 4:4:4 video, including spatial scalability. The main tier is intended for delivery applications and the high tier for contribution applications, with framerates up to 960 fps, up from 300 in HEVC. There are levels defined all the way up to 8K. Virginie spends some time explaining NAL units, which are in common with HEVC and AVC, explained here from slide 22, along with the VCL (Video Coding Layer), which Virginie also covers.

Random access has long been essential for linear broadcast video and now also for streaming. This is done with IDR (Instantaneous Decoding Refresh), CRA (Clean Random Access) and GDR (Gradual Decoding Refresh). IDR is well known already, but GDR is a new addition which seeks to smooth out the bitrate. With a traditional IBBPBBPBBI GOP structure, there will be a periodic peak in bitrate because the I frames are much larger than the B and, indeed, P frames. The idea with GDR is to transmit the intra refresh gradually over a number of frames, spreading out the peak. The disadvantage is that you need to wait longer until a full I frame’s worth of refreshed picture is available.
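The smoothing effect is easy to see with some invented frame sizes (these numbers are purely illustrative and not from any real encoder):

```python
# Invented frame sizes in bytes: one large I frame vs smaller P frames
I_SIZE, P_SIZE, GOP = 91_000, 10_000, 9

# Conventional GOP: a periodic spike at every I frame
idr_gop = [I_SIZE] + [P_SIZE] * (GOP - 1)

# GDR: the intra refresh cost is spread evenly across the GOP, e.g. as
# a column of intra blocks sweeping across the picture frame by frame
extra_per_frame = (I_SIZE - P_SIZE) // GOP
gdr_gop = [P_SIZE + extra_per_frame] * GOP

assert sum(idr_gop) == sum(gdr_gop)  # same total bits per GOP...
assert max(gdr_gop) < max(idr_gop)   # ...without the periodic peak
```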

Virginie introduces subpictures, a major development in VVC allowing separately encoded pictures within the same stream. Effectively creating a multiplexed stream, sections of the picture can be swapped out for other videos. For instance, if you wanted picture-in-picture, you could swap the thumbnail video stream before the decoder, meaning you only need one decoder for the whole picture; to do the same without VVC, you would need two decoders. Subpictures have found use in 360-degree video, allowing reduced bitrate by manipulating the bitstream at the sender end so that only the part being watched is delivered in high quality.

Before finishing by explaining that VVC can be carried in both MPEG’s ISO BMFF and MPEG-2 Transport Streams, Virginie covers Reference Picture Resampling, also covered in this video from Seattle Video Tech, which allows reference frames of one resolution to serve as references for a stream at another resolution. This has applications in adaptive streaming and spatial scalability. Virginie also covers the enhanced timing available with HRD.

Watch now!
Video is free to watch
Speaker

Virginie Drugeon
Senior Engineer Digital TV Standardisation,
Panasonic