Synchronised origins in streaming means that a player can switch from one origin to another without any errors or having to restart decoding allowing a much more seamless viewing experience. Adam Ross, speaking from his experience on the Comcast linear video packing team, takes us through the pros and cons of two approaches to synchronisation. This discussion centres around video going into an encoder, transcoder and then packager. This video is either split from a single source which helps keep the video and audio clocks aligned or the clocks are aligned in the encoder or transcoder through communication site A and B.
Keeping segments aligned isn’t too difficult as we just need to keep naming the same and keep them timed together. Whilst not trivial, manifests have many more layers of metadata to synchronised in the form of short-term metadata like content currently present in the manifest and long-term metadata like the dash period. For DASH streams, the [email protected] and [email protected] need to be the same. SegmentTimelines need to have the same start number mapping to the same content. For HLS, variant playlists need to be the same as well as the sequence numbering.
Adam proposes two methods of doing this. the first is Co-operative Packaging where each site sends metadata between the packagers so that they each make the same, more informed decisions. However, this is complicated to implement and produces a lot of cross-site traffic which can live-point introduce latency. The alternative is a Minimal Synchronisation strategy which relies much more on determinism. Given the same output from the transcoder, the packagers should make the same decisions. Each packager does still need to look at the other’s manifest to ensure it stays in sync and it can resync if not deemed impactful. Overall this second method is much simpler.
AV1’s royalty-free status continues to be very appealing, but in raw compression is it losing ground now to the newer codecs such as VVC? EVC has also introduced a royalty-free model which could also detract from AV1’s appeal and certainly is an improvement over HEVC’s patent debacle. We have very much moved into an ecosystem of patents rather than the MPEG2/AVC ‘monoculture’ of the 90s within broadcast. What better way to get a feel for the codecs but to put them to the test?
Dan Grois from Comcast has been looking at the new codecs VVC and EVC against AV1 and HEVC. VVC and EVC were both released last year and join LCEVC as the three most recent video codecs from MPEG (VVC was a collaboration between MPEG and ITU). In the same way, HEVC is known as H.265, VVC can be called H.266 and it draws its heritage from the HEVC too. EVC, on the other hand, is a new beast whose roots are absolutely shared with much of MPEG’s previous DCT-based codecs, but uniquely it has a mode that is totally royalty-free. Moreover, its high-performant mode which does include patented technology can be configured to exclude any individual patents that you don’t wish to use thus adding some confidence that businesses remain in control of their liabilities.
Dan starts by outlining the main features of the four codecs discussing their partitioning methods and prediction capabilities which range from inter-picture, intra-picture and predicting chroma from the luma picture. Some of these techniques have been tackled in previous talks such as this one, also from Mile High Video and this EVC overview and, finally, this excellent deep dive from SMPTE in to all of the codecs discussed today plus LCEVC.
Dan explains the testing he did which was based on the reference encoder models. These are encoders that implement all of the features of a codec but are not necessarily optimised for speed like a real-world implementation would be. Part of the work delivering real-world implementations is using sophisticated optimisations to get the maths done quickly and some is choosing which parts of the standard to implement. A reference encoder doesn’t skimp on implementation complexity, and there is seldom much time to optimise speed. However, they are well known and can be used to benchmark codecs against each other. AV1 was tested in two configurations since
AV1 needs special treatment in this comparison. Dan explains that AV1 doesn’t have the same approach to GOPs as MPEG so it’s well known that fixing its QP will make it inefficient, however, this is what’s necessary for a fair comparison so, in addition to this, it’s also run in VBR mode which allows it to use its GOP structure to the full such as AV1’s invisible frames which carry data which can be referenced by other frames but which are never actually displayed.
The videos tested range from 4K 10bit down to low resolution 8 bit. As expected VVC outperforms all other codecs. Against HEVC, it’s around 40% better though carrying with it a factor of 10 increase in encoding complexity. Note that these objective metrics tend to underrepresent subjective metrics by 5-10%. EVC consistently achieved 25 to 30% improvements over HEVC with only 4.5x the encoder complexity. As expected AV1’s fixed QP mode underperformed and increased data rate on anything which wasn’t UHD material but when run in VBR mode managed 20% over HEVC with only a 3x increase in complexity.
As complicated as SD to HD conversions seemed at the time, that’s nothing on the plethora of combinations available now. Dealing with BT 601 and 709 colour spaces along with aspect ratios and even conversions from NTSC/PAL kept everyone busy. With frame rates, different HDR formats and wide colour gamut (HDR) being just some of the current options, this talk considers whether it would be better to bring in a ‘house format’ as opposed to simply declaring your company to be a ‘ProRes HQ’ house and accepting any content, HDR or SDR, in ProRes rather than being more specific regarding the lifestyle of your videos.
This talk from Chris Seeger from NBCUniversal and Yasser Syed from Comcast discuss their two-year effort to document common workflow video format combinations talking to companies from content providers to broadcasters to service distributors. The result is a joint ITU-ISO document, now in its second edition, which provides a great resource for new workflows today.
Yasser makes the point that, in recent years, the volume of scripted workflows has increased significantly. This can motivate broadcasters to find quicker and more efficient ways of dealing with media in what can be a high-value set of workflows that are increasingly being formed from a variety of video types.
Discussing signalling is important because it brings workflows together. Looking at videos we see that multiple sources arrive on left, need to identify correctly and then converted. This video talks about keeping separate video codecs and the identifying metadata needed for contribution and distribution which is best done automatically. All combinations are possible, but take advantages o the best content, having everything converted into a single, HDR-friendy mezzanine format is the way forward.
We know AI is going to stick around. Whether it’s AI, Machine Learning, Deep Learning or by another name, it all stacks up to the same thing: we’re breaking away from fixed algorithms where one equation ‘does it all’ to a much more nuanced approached with a better result. This is true across all industries. Within the Broadcast industry, one way it can be used is in video and audio compression. Want to make an image smaller? Downsample it with a Convolutional Neural Network and it will look better than Lanczos. No surprise, then, that this is coming in full force to a compression technology near you.
In this talk from Comcast’s Dan Grois, we hear the ongoing work to super-charge the recently released VVC by replacing functional blocks with neural-networks-based technologies. VVC has already achieved 40-50% improvements over HEVC. From the work Dan’s involved with, we hear that more gains are looking promising by using neural networks.
Dan explains that deep neural networks recognise images in layers. The brain does the same thing having one area sensitive to lines and edges, another to objects, another part of the brain to faces etc. A Deep Neural Network works in a similar way.
During the development of VVC, Dan explains, neural network techniques were considered but deemed too memory- or computationally-intensive. Now, 6 years on from the inception of VVC, these techniques are now practical and are likely to result in a VVC version 2 with further compression improvements.
Dan enumerates the tests so far swapping out each of the functional blocks in turn: intra- and inter-frame prediction, up- and down-scaling, in-loop filtering etc. He even shows what it would look like in the encoder. Some blocks show improvements of less than 5%, but added together, there are significant gains to be had and whilst this update to VVC is still in the early stages, it seems clear that it will provide real benefits for those that can implement these improvements which, Dan highlights at the end, are likely to require more memory and computation than the current version VVC. For some, this will be well worth the savings.
Views and opinions expressed on this website are those of the author(s) and do not necessarily reflect those of SMPTE or SMPTE Members.
This website is presented for informational purposes only. Any reference to specific companies, products or services does not represent promotion, recommendation, or endorsement by SMPTE