Video: AES67 & SMPTE ST 2110 Timing and Synchronization

Good timing is essential in production for AES67 audio and SMPTE ST 2110. Delivering timing is no longer simply a matter of distributing a signal throughout your facility: over IP, timing is bidirectional and forms a system which needs to be monitored and managed. Timing distribution has always required design and architecture, but the detail and understanding needed are now much greater. At the beginning of this talk, Andreas Hildebrand explains why we need to bother with such complexity; after all, we got along very well for many years without it! Non-IP timing signals are distributed on their own cables as part of their own system. Some parts of the chain can get away without timing signals, but where they are needed, they arrive on a separate cable. With IP, running a separate network just to distribute timing doesn’t make sense, so whether your timing signal is analogue or digital, it needs to move into the IP domain. But how much timing accuracy do you need? Network devices already widely use NTP, which can achieve an accuracy of better than a millisecond. Andreas explains that this isn’t enough for professional audio. At 48kHz, AES samples need to be placed to within plus or minus 10 microseconds, with 192kHz bringing that down to around 2.5 microseconds. Since your timing reference has to be more accurate than the tolerance you are working to, this means we need to achieve nanosecond-level precision.
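As a rough back-of-the-envelope illustration of where those figures come from (a sketch of the arithmetic only, not anything taken from the webinar itself), placing a sample unambiguously needs timing better than half a sample period, so the reference itself must be tighter still:

```python
# Half a sample period, in microseconds, for common AES sample rates.
def half_sample_period_us(sample_rate_hz: float) -> float:
    """Half of one sample period, in microseconds."""
    return 0.5 * 1_000_000 / sample_rate_hz

for rate in (48_000, 96_000, 192_000):
    print(f"{rate / 1000:g} kHz -> +/- {half_sample_period_us(rate):.1f} us")
# 48 kHz  -> +/- 10.4 us
# 96 kHz  -> +/- 5.2 us
# 192 kHz -> +/- 2.6 us
```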

Daniel Boldt from timing specialists Meinberg takes the main part of this talk, explaining how we achieve this nanosecond precision. Enter PTP, the Precision Time Protocol. This is a cross-industry standard from the IEEE, used in telecoms, power, finance and many other sectors wherever a network and its devices need to agree on the time. It’s not a static standard, Daniel explains, and it’s just about to see its third revision which, like the last, adds features.

Before finding out about the latest changes, Daniel explains how PTP works in the first place: how is it possible to derive time accurately, down to the nanosecond, over a network with variable propagation times? We see how timestamps are applied in the network interface controller (NIC) at the last possible moment, so they are created in hardware, which removes some of the variable delays that are typical of software. This happens, Daniel shows, in the switches as well as in the server network cards. This article uses the terms primary clock and grandmaster interchangeably. Daniel steps us through the messages exchanged between the primary and secondary clocks, the interaction at the heart of the protocol. The key is that after the primary has sent a timestamp, the secondary sends its own timestamp to the primary, which replies with the time at which it received the secondary’s message. The secondary ends up with four timestamps that it can combine to determine both its offset from the primary’s time and the delay in receiving messages. Applying this information allows it to correct its clock very accurately.

PTP Primary-Secondary Message Exchange.
Source: Meinberg
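As a minimal sketch of the arithmetic the secondary performs, assuming the standard four-timestamp exchange and a symmetric network path, the offset and mean path delay fall out of two simple combinations:

```python
# t1: primary sends Sync          t2: secondary receives Sync
# t3: secondary sends Delay_Req   t4: primary receives Delay_Req
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    offset = ((t2 - t1) - (t4 - t3)) / 2   # secondary clock minus primary clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way mean path delay
    return offset, delay

# Example: secondary running 150 ns fast over a 500 ns path (made-up values)
offset, delay = ptp_offset_and_delay(1000e-9, 1650e-9, 2000e-9, 2350e-9)
print(f"{offset * 1e9:.0f} ns offset, {delay * 1e9:.0f} ns delay")
# 150 ns offset, 500 ns delay
```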

Most broadcasters would prefer to have more than one grandmaster clock, but if there are multiple clocks, how do you choose which to sync to? Timing systems have long used strata whereby clocks are rated based on accuracy, either their internal accuracy and stability or that of the source they are synced to. This is also true for PTP and forms part of the considerations in the ‘Best Master Clock Algorithm’ (BMCA). The BMCA starts by allowing a time source to assess its own accuracy and then search for better options on the network. Clocks announce themselves to the network and, by listening to other announcements, a clock can decide whether it should become a primary clock, for instance if it hears no announce messages at all. Devices which should never act as a grandmaster can be forced never to promote themselves. This is a requisite for audio devices participating in ST 2110-3x.
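A simplified sketch of the dataset comparison behind the BMCA is shown below; the field order follows IEEE 1588, but the candidate clocks and their values are invented purely for illustration:

```python
# Announce messages are ranked field by field, lower values winning.
from dataclasses import dataclass

@dataclass
class AnnounceData:
    name: str
    priority1: int        # operator preference, 0-255
    clock_class: int      # 6 = locked to GPS, 7 = holdover, 248 = default
    clock_accuracy: int   # 0x21 = within 100 ns, 0xFE = unknown
    variance: int         # offsetScaledLogVariance (stability estimate)
    priority2: int
    clock_identity: str   # final tie-breaker

    def rank(self):
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.clock_identity)

candidates = [
    AnnounceData("GM-A (GPS locked)", 128, 6, 0x21, 0x4E5D, 128, "00:0a:00:ff:fe:00:00:01"),
    AnnounceData("GM-B (holdover)",   128, 7, 0x21, 0x4E5D, 128, "00:0a:00:ff:fe:00:00:02"),
    # An end device kept out of contention, e.g. configured with the worst priority
    AnnounceData("Edge device",       255, 248, 0xFE, 0xFFFF, 255, "00:0a:00:ff:fe:00:00:03"),
]

best = min(candidates, key=AnnounceData.rank)
print("Selected grandmaster:", best.name)  # GM-A (GPS locked)
```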

Passing PTP around the network takes some care and is most easily done using switches which understand PTP. These switches either run a ‘boundary clock’ or act as ‘transparent clocks’. Daniel explores both scenarios, explaining how a boundary clock switch runs multiple primary and secondary clocks depending on what is connected to each interface. We also see the work the switches have to do behind the scenes to maintain timing precision in transparent mode. In summary, Daniel characterises boundary clocks as good for hierarchical systems and scaling well, but requiring continuous monitoring, whereas transparent clocks are simpler to deploy and need minimal monitoring. The main issue with transparent clocks is that they don’t scale well, since all your timing messages still go back to one main clock which could become overwhelmed.
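For the transparent-clock case, the essential behind-the-scenes job can be sketched like this; the timestamp values are stand-ins rather than any real switch API:

```python
# An end-to-end transparent clock measures how long a PTP event message
# dwelt inside the switch and accumulates that residence time in the
# message's correctionField, so the secondary can subtract switch queuing
# from its delay measurement.
def forward_through_transparent_clock(correction_field_ns: int,
                                      ingress_ts_ns: int,
                                      egress_ts_ns: int) -> int:
    residence_ns = egress_ts_ns - ingress_ts_ns
    return correction_field_ns + residence_ns

cf = 0
cf = forward_through_transparent_clock(cf, ingress_ts_ns=1_000, egress_ts_ns=9_500)
cf = forward_through_transparent_clock(cf, ingress_ts_ns=40_000, egress_ts_ns=43_200)
print(f"Accumulated correction after two hops: {cf} ns")  # 11700 ns
```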

SMPTE 2022-7 has been a very successful standard as its reliance only on RTP has allowed it to be widely applicable to compressed and uncompressed IP flows. It is often used in 2110 networks, too, where two separate networks are run and brought together at the receiving device. That device, on a packet-by-packet basis, is free to derive its audio/video stream from either network. This requires, however, exactly the same timing on both networks so Daniel looks at an example diagram where this PTP sharing is shown.
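A toy sketch of that packet-by-packet merge, keyed on RTP sequence numbers, is below; sequence-number wrap and realistic reordering windows are deliberately ignored:

```python
# The receiver keeps the first copy of each RTP sequence number it sees,
# regardless of which network leg delivered it.
def merge_2022_7(leg_a, leg_b):
    """leg_a / leg_b: iterables of (rtp_sequence_number, payload)."""
    seen = set()
    output = []
    for seq, payload in sorted(list(leg_a) + list(leg_b), key=lambda p: p[0]):
        if seq not in seen:          # first arrival wins; the duplicate is dropped
            seen.add(seq)
            output.append((seq, payload))
    return output

# Leg A lost packet 3, leg B lost packet 2 -> the merged stream is complete
leg_a = [(1, "A1"), (2, "A2"), (4, "A4")]
leg_b = [(1, "B1"), (3, "B3"), (4, "B4")]
print(merge_2022_7(leg_a, leg_b))
```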

PTP’s still evolving and in this next section, Daniel takes us through some of the coming improvements which are also outlined at Meinberg’s blog. These are profile isolation, multi-domain clocks, security improvements and more.

Andreas takes the final section of the webinar to explain how we use PTP in media networks. All receivers will share the same clock, which could be derived from GPS, removing the need to distribute PTP between sites. 2110 is based on RTP, which requires a timestamp to be added to every packet delivered to the network. RTP wraps the media payload carried in each IP packet and includes a timestamp derived from the media clock counter.
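A minimal sketch of how such a timestamp can be derived, assuming the usual ST 2110 media clock rates and glossing over the details of epoch handling and leap seconds:

```python
# The time elapsed since the PTP epoch is multiplied by the media clock
# rate and truncated to the 32-bit RTP timestamp field.
def rtp_timestamp(ptp_time_ns: int, media_clock_hz: int) -> int:
    """32-bit RTP timestamp for media sampled at ptp_time_ns."""
    return (ptp_time_ns * media_clock_hz // 1_000_000_000) % (1 << 32)

ptp_now_ns = 1_700_000_000_020_000_000     # illustrative time since the PTP epoch
print(rtp_timestamp(ptp_now_ns, 90_000))   # video media clock
print(rtp_timestamp(ptp_now_ns, 48_000))   # typical audio media clock
```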

Andreas looks at how accurate RTP delivery is achieved, dealing with offset values and populating the timestamp from the PTP clock for real-time streams, and he explains how the playout delay is calculated from the link offset. Finally, he shows the relatively simple process of synchronisation at the playout device. With all the timestamps in the system, synchronising playback of audio, video and metadata using buffers can be achieved fairly easily. Unfortunately, timestamps are easily destroyed by secondary processing (for instance loudness adjustment of an audio stream). Clearly, if this happens, synchronisation at the receiver is broken. Whilst this will be addressed by out-of-band messaging in future standards, for now it is managed by a broadcast controller which can take delay information from the processing stages and distribute it to receivers.

Watch now!
Speakers

Daniel Boldt
Head of Software Development,
Meinberg
Andreas Hildebrand
RAVENNA Technology Evangelist,
ALC NetworX

Video: No-Reference QoE Assessment: Knowledge-based vs. Learning-based

Automatic assessment of video quality is essential for creating encoders, selecting vendors, choosing operating points and, for online streaming services, in ongoing service improvement. But getting a computer to understand what looks good and what looks bad to humans is not trivial. When the computer doesn’t have the source video to compare against, it’s even harder.

In this talk, Dr. Ahmed Badr from SSIMWAVE looks at how video quality assessment (VQA) works and goes into detail on No-Reference (NR) techniques. He starts by stating the case for VQA, which is an extension of, and often a replacement for, subjective scoring by people. That is clearly time-consuming, can be expensive because of the people (and time) involved, and requires specific viewing conditions; done properly, a whole, carefully decorated room is required. So when it comes to analysing all the video created by a TV station or automating per-title encoding optimisation, we know we have to remove the human element.

Ahmed moves on to discuss the challenges of No-Reference VQA, such as identifying intended blur or noise. NR VQA is a two-step process: first, features are extracted from the video; these features are then mapped to a quality score by a model, which can be built with a machine learning/AI process, the technique Ahmed analyses next. The first task is to assemble a carefully chosen dataset of videos, then to choose a metric to use for training, for instance MS-SSIM or VMAF. This is needed so that the learning algorithm can get the feedback it needs to improve. The last two elements are choosing what you are optimising for, technically called a loss function, and then choosing an AI model to use.
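As an illustrative sketch of that two-step pipeline, the hand-crafted features and regressor below are deliberately crude stand-ins, not anything SSIMWAVE actually uses:

```python
# Step 1: pull simple features from each frame.
# Step 2: train a regressor to map features to a quality label
#         (e.g. an MS-SSIM or VMAF score computed offline).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def extract_features(frame: np.ndarray) -> np.ndarray:
    """frame: 2-D luma array. Returns a tiny feature vector."""
    gy, gx = np.gradient(frame.astype(np.float64))
    sharpness = np.mean(gx ** 2 + gy ** 2)                         # rough blur indicator
    contrast = frame.std()
    blockiness = np.mean(np.abs(np.diff(frame[:, ::8], axis=1)))   # crude blockiness proxy
    return np.array([sharpness, contrast, blockiness])

# Training: features from labelled frames -> reference-based quality scores
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (64, 64)) for _ in range(32)]
labels = rng.uniform(0, 100, 32)                                   # stand-in quality labels
X = np.stack([extract_features(f) for f in frames])
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, labels)

# Inference: no reference needed, only the decoded frame
print(model.predict(extract_features(frames[0]).reshape(1, -1)))
```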

The dataset you create needs to be aimed at exploring a certain aspect, or range of aspects, of video. It could be that you want to optimise for sports; but if you need a broad array of genres, reducing compression or scaling artefacts may be the main theme of the video dataset. Ahmed talks about the millions of video samples that they have collated and how they’ve used them to create their metric, SSIMPLUS, which can work both with a reference and without.

Watch now!
Speaker

Dr. Ahmed Badr
SSIMWAVE

Video: Futuristic Codecs and a Healthy Obsession with Video Startup Time

These next 12 months are going to see 3 new MPEG standards being released. What does this mean for the industry? How useful will they be and when can we start using them? MPEG’s coming to the market with a range of commercial models to show it’s learning from the mistakes of the past so it should be interesting to see the adoption levels in the year after their release. This is part of the second session of the Vienna Video Tech Meetup and delves into startup time for streaming services.

In the first talk, Dr. Christian Feldmann explains the current codec landscape, highlighting the ubiquitous AVC (H.264), UHD’s friend HEVC (H.265), and the newer VP9 & AV1. The latter two differentiate themselves by being free to use and open, particularly AV1. Whilst adoption has been slow, both are seeing increasing use in streaming, but no one’s suggesting that AVC isn’t still the go-to codec for most online streaming.

Christian then introduces the three new codecs, EVC (Essential Video Coding), LCEVC (Low-Complexity Enhancement Video Coding) and VVC (Versatile Video Coding), all of which have different aims. We start by looking at EVC, whose aim is to replicate the encoding efficiency of HEVC but, importantly, to provide a royalty-free baseline profile as well as a main profile which improves efficiency further, albeit with royalties. This is the first time an MPEG codec has been usable in this way to eliminate your liability for royalty payments. There is further protection in that if any of the tools is found to have patent problems, it can be individually turned off, the idea being that companies can have more confidence in deploying the new technology.

The next codec in the spotlight is LCEVC, which uses an enhancement technique to encode video. The aim of this codec is to enable lower-end hardware to access higher resolutions and/or lower bitrates. This can be useful in set-top boxes and for online streaming, but also for non-broadcast applications like small embedded recorders. It can achieve a slight improvement in compression over HEVC which, as is well known, is very computationally heavy.

LCEVC reduces computational needs by encoding only a lower-resolution version (say, SD) of the video in a codec of your choice, whether that be AVC, HEVC or something else. The decoder then decodes this and upscales the video back to the original resolution, HD in this example. This would normally look soft, but LCEVC also sends enhancement data to add back the edges and detail that would otherwise have been lost. The enhancement can be decoded in CPU whilst the base decoding is handled by dedicated AVC/HEVC hardware, and naturally encoding and decoding a quarter-resolution image is much easier than the full resolution.
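The idea can be sketched with a toy example in which simple averaging and nearest-neighbour scaling stand in for the real LCEVC transforms and entropy coding:

```python
# Encoder: carry a quarter-resolution base plus a residual layer.
# Decoder: upscale the base cheaply and add the residual back.
import numpy as np

def downscale2x(img):            # average 2x2 blocks
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale2x(img):              # nearest-neighbour upscale
    return img.repeat(2, axis=0).repeat(2, axis=1)

original = np.random.default_rng(1).random((8, 8))

# Encoder side: base layer plus enhancement residual
base = downscale2x(original)
residual = original - upscale2x(base)        # the detail the upscale loses

# Decoder side: cheap base decode + residual add recovers the source
reconstructed = upscale2x(base) + residual
print(np.allclose(reconstructed, original))  # True
```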

Lastly, VVC goes under the spotlight. This is the direct successor to HEVC and is also known as H.266. VVC naturally has the aim of improving compression over HEVC by the traditional 50% target, but it also has important optimisations for more types of content, such as 360-degree video and screen content like video games.

To finish this first Vienna Video Tech Meetup, Christoph Prager lays out the reasons he thinks that everyone involved in online streaming should obsess about video startup time, which he defines as the time between pressing play and seeing the first frame of video. The longer that delay, the assumption goes, the more users won’t bother watching. To understand what video streaming should feel like, he examines the example of Spotify, which has always had the goal of bringing audio start time down to 200ms. Christoph points to this podcast for more details on what Spotify has done to optimise this metric, which includes activating GUI elements before, strictly speaking, they can do anything, because the audio still hasn’t loaded. This, however, creates an impression of immediacy, and perception is half the battle.

“for every additional second of startup delay, an additional 5.8% of your viewership leaves”

Christoph also draws on Akamai’s 2012 white paper which, among other things, investigated how startup time puts viewers off. He also cites research from Snap who found that within 2 seconds, the entire audience for a video would have gone. Snap, of course, specialises in very short videos, but taken with the right caveats, this could indicate that Akamai’s numbers, if the research were repeated today, might be higher for 2020. Christoph finishes up by looking at the individual components which add latency to the user experience: player startup time, DRM load time, ad load time and ad tag load time.

Watch now!
Speakers

Dr. Christian Feldmann
Team Lead Encoding,
Bitmovin
Christoph Prager
Product Manager, Analytics
Bitmovin
Markus Hafellner
Product Manager, Encoding
Bitmovin

Video: Colour Theory

Understanding the way colour is recorded and processed in the broadcast chain is vital to ensuring its safe passage. Whilst there are plenty of people who work in parts of the broadcast chain which shouldn’t touch colour, being purely there for transport, the reality is that if you don’t know how colour is dealt with under the hood, it’s not possible to do any technical validation of the signal beyond ‘it looks alright!’. The problem is, if you don’t know what’s involved in displaying it correctly, or how it’s transported, how can you tell?

Ollie Kenchington has dropped into the CPV Common Room for this tutorial on colour which starts at the very basics and works up to four case studies at the end. He starts off by simply talking about how colours mix together. Ollie explains the difference between the world of paints, where mixing is an act of subtracting colours, and the world of mixing light, which is about adding colours together. Whilst this might seem pedantic, it creates profound differences in what colour two mixed colours produce. Pigments such as paints look the way they do because they only reflect the colour(s) you see; they simply don’t reflect the other colours. This is why they are called subtractive: shine a blue light on something that is pure red and you will just see black, because there is no red light to reflect back. Lights, however, look the way they do because they are emitting the light you see, so mixing a red and a blue light will create magenta. This is known as additive colour mixing. Ollie also introduces color.adobe.com, which lets you discover new colour palettes.
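The distinction can be shown with a couple of idealised RGB triples (the values here are assumptions purely for illustration):

```python
red_light = (255, 0, 0)
blue_light = (0, 0, 255)

# Additive: lights emit, so channels add (clipped to the display range)
additive = tuple(min(255, a + b) for a, b in zip(red_light, blue_light))
print(additive)  # (255, 0, 255) -> magenta

# Subtractive: a pure red pigment reflects only the red channel, so under
# pure blue light nothing comes back
red_pigment_reflectance = (1.0, 0.0, 0.0)
seen = tuple(int(r * c) for r, c in zip(red_pigment_reflectance, blue_light))
print(seen)      # (0, 0, 0) -> black
```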

The colour wheel is next on the agenda, which Ollie explains allows you to talk about the amplitude of a colour – the distance of the colour from the centre of the circle – and the angle which defines the colour itself. But as useful as that is for describing a colour in a document, it’s just as important to understand how humans see colour. Ollie lays out the way that rods and cones work in the eye: there is a central area which sees the best detail and holds most of the cones, and the cones, we see, are the cells that let us perceive colour. The fact there aren’t many cones in our periphery is covered up by our brains, which interpolate colour from what they have seen and what they know about our current environment. Everyone is colour blind in their peripheral vision, Ollie explains, but the brain makes up for it from what it knows about what you have seen. Overall, your eye’s sensitivity to blue is far lower than its sensitivity to green and then red. In evolutionary terms, there is much less important information to be gained from seeing detail in blue than in green, the colour of plants; red, of course, helps in distinguishing shades of green and brown, both colours native to plants. The upshot of this, Ollie explains, is that when we come to processing light, we have to do it in a way that takes account of the human sensitivity to different wavelengths. This means we can show three rectangles next to each other, red, green and blue, and see them as similar brightnesses, but then find that, under the hood, we’ve reduced the intensity of the blue by 89 per cent, the red by 70 and the green by only 41. When added together, these give the correct greyscale brightness.
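Those percentages correspond to the classic luma weighting (roughly the Rec. 601 coefficients; Rec. 709 uses slightly different values but the same principle), which can be sketched as:

```python
# Weighted sum reflecting the eye's unequal sensitivity to R, G and B.
def luma(r, g, b, weights=(0.299, 0.587, 0.114)):
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b

# Full-intensity primaries, scaled so each patch reads as a comparable grey
print(f"{luma(255, 0, 0):.1f} {luma(0, 255, 0):.1f} {luma(0, 0, 255):.1f}")
# 76.2 149.7 29.1 -- green contributes most, blue least
```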

The CIE 1931 colour space is the next topic; it charts all the colours the human eye can see. Ollie demonstrates, by overlaying it on the chart, that ITU-R Rec. 709 – broadcast’s most well-known and most widely used colourspace – only covers around 35% of what our eyes can see. This makes the call for Rec. 2020 from the proponents of UHD and ‘better pixels’, which covers around 75%, all the more relevant.

Ollie next focuses on acquisition, talking about the CMOS sensors in cameras, which are monochromatic by nature. As each pixel of a CMOS sensor only records how many photons it received, it is intrinsically monochrome. Therefore, in order to capture colour, you need to put a Bayer colour filter array in front of it. Essentially this is a pattern of red, blue and green filters, one above each pixel. With the filter in place, you know that the value you read from a given pixel represents just that single colour. By spreading red, blue and green filters over the pixels of the sensor, you are able to reconstruct the colour of the incoming scene.
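A toy sketch of that mosaic, assuming an RGGB layout (real sensor patterns vary, and the demosaicing step that reconstructs full colour is omitted here):

```python
# Each photosite sits under one filter in a repeating RGGB mosaic, so the
# raw frame holds one colour sample per pixel; the missing two are later
# interpolated from neighbours.
import numpy as np

def bayer_mosaic(rgb):
    """rgb: H x W x 3 scene. Returns the single-channel raw a sensor sees."""
    h, w, _ = rgb.shape
    raw = np.zeros((h, w))
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R sites
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G sites
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G sites
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B sites
    return raw

scene = np.random.default_rng(2).random((4, 4, 3))
print(bayer_mosaic(scene))   # one value per pixel: only the filtered colour
```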

Ollie then starts to talk about reducing colour data. We can do this at source by recording only 8, rather than 10, bits of colour, but Ollie shows us a clear demonstration of when that doesn’t look good: typically 8-bit video lets itself down on sunsets, flesh tones and similarly subtle gradients. The same principle drives the HDR discussion regarding 10-bit vs. 12-bit. With PQ built for 12-bit, but realistic live production workflows for the next few years being 10-bit, which is what HLG expects, there is plenty of water to go under the bridge before we see whether PQ’s 12-bit advantage really comes into its own outside of cinemas. Colour subsampling also gets a thorough explanation, detailing not only 4:4:4 and 4:2:2 but also the less common variants.
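As a small illustration of what the subsampling labels mean for the amount of data, using made-up planes:

```python
# 4:2:2 keeps the two chroma planes at full vertical resolution but only
# every other sample horizontally; luma is untouched. (4:2:0 would also
# halve the vertical chroma resolution.)
import numpy as np

y = np.random.default_rng(3).random((4, 8))   # luma plane, full resolution
cb = np.random.default_rng(4).random((4, 8))  # chroma planes start at full res
cr = np.random.default_rng(5).random((4, 8))

cb_422, cr_422 = cb[:, ::2], cr[:, ::2]       # keep every other column

print(y.shape, cb_422.shape, cr_422.shape)    # (4, 8) (4, 4) (4, 4)
print("samples: 4:4:4 =", y.size * 3, " 4:2:2 =", y.size + cb_422.size + cr_422.size)
# 96 vs 64 -> a third of the data gone with little visible impact
```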

The next section looks at ‘scopes’, also known as waveform monitors. Ollie starts with the histogram, which shows how much of your picture is at a certain brightness, helping you understand how the picture is exposed overall; its horizontal axis runs from black on the left to white on the right. The waveform, by contrast, plots brightness on the vertical axis, with the horizontal axis showing the position in the picture where a given brightness occurs. This allows you to directly associate brightness values with objects in the scene. It can be done with the luma signal or with separate RGB traces, which then allow you to understand the colour of that area, and Ollie also introduces the vectorscope.
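A minimal sketch of the difference between the two scopes, using a made-up luma frame:

```python
# Histogram: how many pixels sit at each brightness, across the whole picture.
# Waveform: one brightness distribution per column, so horizontal screen
# position is preserved.
import numpy as np

frame = np.random.default_rng(6).integers(0, 256, (90, 160))  # toy luma frame

histogram, _ = np.histogram(frame, bins=256, range=(0, 256))

waveform = np.stack([np.histogram(frame[:, x], bins=64, range=(0, 256))[0]
                     for x in range(frame.shape[1])], axis=1)

print(histogram.shape)  # (256,)    brightness -> pixel count
print(waveform.shape)   # (64, 160) brightness bins x screen position
```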

Ollie then moves on to balancing contrast, looking at lift (raising the black point), gamma (affecting the midtones) and gain (altering the white point), and mixing those with shadows, midtones and highlights controls. He then talks about how the surroundings affect the perceived brightness of a picture, demonstrating it with grey boxes on different surrounds. Ollie shows this very effectively as part of the slides in the presentation and talks about the need for standards to control it. When grading, he discusses the different gamma that screens should be set to for different types of work, and the standard which says that the ambient light in the surrounding room should be about 10% as bright as the screen displaying pure white.
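One common, simplified formulation of those three controls, applied to normalised luma values, looks like this; real grading tools differ in the exact maths:

```python
# Lift sets the black point, gain sets the white point and gamma bends the
# midtones between them.
def lift_gamma_gain(value, lift=0.0, gamma=1.0, gain=1.0):
    value = lift + value * (gain - lift)     # place the black and white points
    value = max(0.0, min(1.0, value))
    return value ** (1.0 / gamma)            # gamma > 1 brightens the midtones

for v in (0.0, 0.5, 1.0):
    print(v, "->", round(lift_gamma_gain(v, lift=0.05, gamma=1.2, gain=0.95), 3))
# 0.0 -> ~0.08, 0.5 -> ~0.56, 1.0 -> ~0.96: blacks lift, whites pull down,
# midtones brighten
```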

The last part of the talk presents case studies of programmes and films, looking at the way they used colour, saturation, costume and lighting to enhance and underwrite the story being told. The takeaway is the need to think of colour as a narrative element, something that can be informed by, and understood through, wardrobe, the intended visual look and lighting. The conversation about colour and grading should start early in the filming process, and a key point Ollie makes is that this conversation doesn’t cost much, but having it early in the production is priceless in terms of its impact on the cost and results of the project.

Watch now!
Speakers

Ollie Kenchington
Owner & Creative Director,
Korro Films, Korro Academy