Video: No-Reference QoE Assessment: Knowledge-based vs. Learning-based

Automatic assessment of video quality is essential for creating encoders, selecting vendors, choosing operating points and, for online streaming services, ongoing service improvement. But getting a computer to understand what looks good and what looks bad to humans is not trivial. When the computer doesn’t have the source video to compare against, it’s even harder.

In this talk, Dr. Ahmed Badr from SSIMWAVE looks at how video quality assessment (VQA) works and goes into detail on No-Reference (NR) techniques. He starts by making the case for VQA as an extension of, and often a replacement for, subjective scoring by people. Subjective testing is time-consuming, expensive because of the people and hours involved, and requires specific viewing conditions; done properly, it needs a dedicated, carefully controlled viewing room. So when it comes to analysing all the video created by a TV station, or automating per-title encoding optimisation, the human element has to be removed.

Ahmed moves on to discuss the challenges of No-Reference VQA, such as identifying intended blur or noise. NR VQA is a two-step process: first, features are extracted from the video; these are then mapped to a quality score by a model, which can be built with a machine learning/AI process, the technique Ahmed analyses next. The first task is to assemble a carefully chosen dataset of videos. Next, a metric must be selected to label the training data, for instance MS-SSIM or VMAF, so that the learning algorithm gets the feedback it needs to improve. The last two elements are choosing what you are optimising for, technically called a loss function, and choosing the AI model itself.
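As a rough illustration of that pipeline, here is a minimal sketch of a learning-based NR VQA model. It assumes per-clip features (blur, blockiness and so on) have already been extracted and that each clip carries a quality label produced by a full-reference metric such as VMAF or MS-SSIM; the features, labels and model choice are illustrative stand-ins, not SSIMWAVE's method.

```python
# A minimal sketch of a learning-based No-Reference VQA pipeline.
# Assumptions: per-clip features (e.g. blur/blockiness estimates) have already
# been extracted, and each clip has a target quality score produced by a
# full-reference metric such as VMAF or MS-SSIM. Names here are illustrative.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in dataset: 500 clips, 4 no-reference features per clip
# (e.g. sharpness, blockiness, noise level, temporal activity).
features = rng.random((500, 4))
# Stand-in labels: scores a full-reference metric would have produced.
labels = 100 * (0.5 * features[:, 0] + 0.3 * (1 - features[:, 1]) +
                0.2 * (1 - features[:, 2])) + rng.normal(0, 2, 500)

X_train, X_test, y_train, y_test = train_test_split(features, labels, random_state=0)

# The "AI model": map features to quality. The loss being minimised here is the
# regressor's default squared error; MAE is reported as a sanity check.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("MAE on held-out clips:", mean_absolute_error(y_test, model.predict(X_test)))
```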

The dataset you create needs to be aimed at exploring a particular aspect, or range of aspects, of video. It could be that you want to optimise for sports; if you need a broad array of genres instead, reducing compression or scaling artefacts may be the main theme of the dataset. Ahmed talks about the millions of video samples SSIMWAVE has collated and how they’ve been used to create its metric, SSIMPLUS, which can work both with and without a reference.

Watch now!
Speaker

Dr. Ahmed Badr
SSIMWAVE

Video: Futuristic Codecs and a Healthy Obsession with Video Startup Time

The next 12 months will see three new MPEG standards released. What does this mean for the industry? How useful will they be, and when can we start using them? MPEG is coming to market with a range of commercial models to show it’s learning from the mistakes of the past, so it will be interesting to see adoption levels in the year after release. This video is from the second session of the Vienna Video Tech Meetup and also delves into startup time for streaming services.

In the first talk, Dr. Christian Feldmann explains the current codec landscape, highlighting the ubiquitous AVC (H.264), UHD’s friend HEVC (H.265), and the newer VP9 & AV1. The latter two differentiate themselves by being free to use and open, particularly AV1. Adoption of both has been slow but is increasing in streaming; that said, no one’s suggesting that AVC isn’t still the go-to codec for most online streaming.

Christian then introduces the three new codecs, EVC (Essential Video Coding), LCEVC (Low-Complexity Enhancement Video Coding) and VVC (Versatile Video Coding), all of which have different aims. We start by looking at EVC, whose aim is to replicate the encoding efficiency of HEVC but, importantly, with a royalty-free baseline profile as well as a main profile which improves efficiency further in exchange for royalties. This will be the first time you’ve been able to use an MPEG codec in this way to eliminate your liability for royalty payments. There is further protection in that if any of the tools is found to have patent problems, it can be individually turned off, the idea being that companies can have more confidence in deploying the new technology.

The next codec in the spotlight is LCEVC, which uses an enhancement technique to encode video. The aim of this codec is to enable lower-end hardware to access high resolutions and/or lower bitrates. This can be useful in set-top boxes and for online streaming, but also for non-broadcast applications like small embedded recorders. It can achieve a slight improvement in compression over HEVC, which is well known for being very computationally heavy.

LCEVC reduces computational needs by encoding only a lower-resolution version (say, SD) of the video in a codec of your choice, whether that be AVC, HEVC or something else. The decoder decodes this base layer and upscales it back to the original resolution, HD in this example. Normally this would look soft, but LCEVC also sends enhancement data to add back the edges and detail that would otherwise have been lost. The enhancement can be applied in the CPU while the base layer is decoded by dedicated AVC/HEVC hardware, and naturally encoding/decoding a quarter-resolution image is much less demanding than the full resolution.
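The principle is easy to sketch in a few lines. The snippet below is not the real LCEVC bitstream or toolset, just an illustration of the base-plus-enhancement idea: downscale the frame, upscale it again at the decoder and carry the difference as enhancement data. A real codec would, of course, compress and quantise both layers.

```python
# Illustration of the enhancement-layer idea behind LCEVC (not the real codec):
# code a low-resolution base, upscale it at the decoder and add a residual
# that restores the detail lost by the downscale/upscale round trip.

import numpy as np
from PIL import Image

def encode(frame: np.ndarray, factor: int = 2):
    """Return (base, residual): a quarter-resolution base plus enhancement data."""
    h, w = frame.shape
    img = Image.fromarray(frame)
    base = img.resize((w // factor, h // factor), Image.BILINEAR)            # base layer
    upscaled = np.asarray(base.resize((w, h), Image.BILINEAR), dtype=np.int16)
    residual = frame.astype(np.int16) - upscaled                             # enhancement layer
    return np.asarray(base), residual

def decode(base: np.ndarray, residual: np.ndarray, size):
    """Upscale the base and add the enhancement data back in."""
    upscaled = np.asarray(Image.fromarray(base).resize(size, Image.BILINEAR), dtype=np.int16)
    return np.clip(upscaled + residual, 0, 255).astype(np.uint8)

frame = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)   # stand-in luma plane
base, residual = encode(frame)
reconstructed = decode(base, residual, (1920, 1080))
assert np.array_equal(reconstructed, frame)   # lossless here; real codecs quantise the residual
```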

Lastly, VVC goes under the spotlight. This is the direct successor to HEVC and is also known as H.266. VVC naturally has the aim of improving compression over HEVC by the traditional 50% target, but also has important optimisations for more types of content, such as 360-degree video and screen content like video games.

To finish this first Vienna Video Tech Meetup, Christoph Prager lays out the reasons he thinks everyone involved in online streaming should obsess about video startup time, which he defines as the time between pressing play and seeing the first frame of video. The assumption is that the longer that delay, the more users won’t bother watching. To understand what video streaming should be like, he examines the example of Spotify, who have always had the goal of bringing audio start time down to 200ms. Christoph points to this podcast for more details on how Spotify has optimised that metric, which includes activating GUI elements before, strictly speaking, they can do anything because the audio still hasn’t loaded. This creates an impression of immediacy, and perception is half the battle.

“for every additional second of startup delay, an additional 5.8% of your viewership leaves”

Christoph also draws on Akamai’s 2012 white paper which, among other things, investigated how startup time puts viewers off. He also cites research from Snap who found that within 2 seconds, the entire audience for a video would have gone. Snap, of course, specialises in very short videos, but taken with the right caveats, this could indicate that Akamai’s numbers, were the research repeated today, might well be higher in 2020. Christoph finishes up by looking at the individual components that add latency to the user experience: player startup time, DRM load time, ad load time and ad tag load time.
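To put the quoted figure into perspective, here is a quick back-of-the-envelope calculation. The write-up doesn't say whether the 5.8% loss per second is applied linearly or compounds with each additional second, so both readings are shown.

```python
# Back-of-the-envelope illustration of the quoted figure: 5.8% of viewership
# lost per additional second of startup delay. Whether the loss is linear or
# compounds per second isn't specified here, so both interpretations are shown.

def remaining_linear(seconds: float, loss_per_second: float = 0.058) -> float:
    return max(0.0, 1.0 - loss_per_second * seconds)

def remaining_compounded(seconds: float, loss_per_second: float = 0.058) -> float:
    return (1.0 - loss_per_second) ** seconds

for delay in (1, 2, 5, 10):
    print(f"{delay:>2}s delay: "
          f"{remaining_linear(delay):.1%} left (linear), "
          f"{remaining_compounded(delay):.1%} left (compounded)")
```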

Watch now!
Speakers

Dr. Christian Feldmann
Team Lead Encoding,
Bitmovin
Christoph Prager
Product Manager, Analytics
Bitmovin
Markus Hafellner
Product Manager, Encoding
Bitmovin

Video: Colour Theory

Understanding the way colour is recorded and processed in the broadcast chain is vital to ensuring its safe passage. There are plenty of people who work in parts of the broadcast chain that shouldn’t touch colour, being purely there for transport, but the reality is that if you don’t know how colour is dealt with under the hood, you can’t do any technical validation of the signal beyond ‘it looks alright!’. If you don’t know what’s involved in displaying it correctly, or how it’s transported, how can you tell?

Ollie Kenchington has dropped into the CPV Common Room for this tutorial on colour, which starts at the very basics and works up to four case studies at the end. He starts off by simply talking about how colours mix together. Ollie explains the difference between the world of paints, where mixing is an act of subtracting colours, and the world of light, which is about adding colours together. Whilst this might seem pedantic, it makes a profound difference to the colour that two mixed colours produce. Pigments such as paints look the way they do because they reflect only the colour(s) you see; they simply don’t reflect the other colours. This is why they are called subtractive: shine a blue light on something that is pure red and you will just see black, because there is no red light to reflect back. Lights, however, look the way they do because they are emitting the light you see, so mixing a red and a blue light creates magenta. This is known as additive colour mixing. Ollie also introduces color.adobe.com, which lets you discover new colour palettes.
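A couple of lines of arithmetic capture the distinction, treating colours as normalised RGB triplets. This is only an illustrative model, with a pigment reduced to a simple reflectance mask.

```python
# Tiny illustration of the two mixing models Ollie describes.
# Additive: lights sum their RGB contributions. Subtractive (pigment-style,
# simplified here as reflectance): a surface can only reflect light it receives.

import numpy as np

red_light  = np.array([1.0, 0.0, 0.0])
blue_light = np.array([0.0, 0.0, 1.0])

# Additive mixing: red + blue light gives magenta.
print("additive mix:", np.clip(red_light + blue_light, 0, 1))   # -> [1, 0, 1]

# A "pure red" surface reflects only the red component of whatever hits it.
red_surface_reflectance = np.array([1.0, 0.0, 0.0])
print("red surface under blue light:", blue_light * red_surface_reflectance)  # -> [0, 0, 0], black
```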

The colour wheel is next on the agenda, which Ollie explains lets you describe a colour by its amplitude – the distance from the centre of the circle – and the angle that defines the colour itself. But as useful as it is to describe a colour in a document, it’s just as important to understand how humans see colour. Ollie lays out the way rods and cones work in the eye: there is a central area that sees the best detail and holds most of the cones, and the cones, we learn, are the cells that let us see colour. The fact that there are few cones in our periphery is covered up by our brains, which interpolate colour from what they have seen and what they know about the current environment. Everyone is colour blind in their peripheral vision, Ollie explains, but the brain makes up for it from what it knows about what you have seen. Overall, your eye’s sensitivity to blue is far lower than its sensitivity to green and then red. In evolutionary terms, there is much less important information to be gained from seeing detail in blue than in green, the colour of plants; red, of course, helps in distinguishing the shades of green and brown native to plants. The upshot, Ollie explains, is that when we process light we have to take human sensitivity to different wavelengths into account. We can show three rectangles next to each other, red, green and blue, and see them as similar brightnesses, yet under the hood the intensity of the blue has been reduced by 89 per cent, the red by 70 per cent and the green by only 41 per cent. Added together, these give the correct greyscale brightness.
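Those percentages correspond, roughly, to weighting red, green and blue at 0.30, 0.59 and 0.11 – the classic BT.601 luma coefficients (Rec.709 uses slightly different values). A quick check that three full-intensity patches weighted this way sum back to full-scale grey:

```python
# The quoted reductions (blue down 89%, red down 70%, green down 41%) match
# the approximate BT.601 luma weights of 0.299 / 0.587 / 0.114.

weights = {"red": 0.299, "green": 0.587, "blue": 0.114}

patches = {name: 1.0 for name in weights}            # full-intensity R, G, B patches
luma = sum(weights[name] * value for name, value in patches.items())

for name, w in weights.items():
    print(f"{name}: intensity reduced by {1 - w:.0%}")
print(f"combined luma: {luma:.3f}")                   # ~1.0, i.e. correct grey brightness
```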

The CIE 1931 colour space is the next topic. The CIE 1931 diagram shows all the colours the human eye can see. By overlaying it on the chart, Ollie demonstrates that ITU-R Rec.709 – broadcast’s best-known and most widely used colourspace – covers only 35% of what our eyes can see. This makes the call for Rec.2020 from the proponents of UHD and ‘better pixels’, which covers 75%, all the more relevant.

Ollie next focuses on acquisition, talking about the CMOS chips in cameras, which are monochromatic by nature. As each pixel of a CMOS sensor only records how many photons it received, it is intrinsically monochrome. Therefore, in order to capture colour, you need to put a Bayer colour filter array in front of it: a pattern of red, green and blue filters above the pixels. With the filter in place, you know that the value you read from a given pixel represents just that single colour, and by spreading red, green and blue filters over the sensor you are able to reconstruct the colour of the incoming scene.
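The sampling pattern itself is easy to show. Below is a minimal sketch of an RGGB Bayer mosaic, assuming a stand-in scene array rather than real sensor data; a real camera pipeline would then demosaic by interpolating the missing samples from neighbouring photosites.

```python
# A minimal sketch of Bayer filtering: each sensor pixel records only one of
# R, G or B according to an RGGB pattern; full colour is later reconstructed
# (demosaiced) by interpolating the missing samples from neighbours.

import numpy as np

def bayer_mosaic(rgb: np.ndarray) -> np.ndarray:
    """Simulate an RGGB colour filter array: keep one channel per pixel."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R at even rows, even cols
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G at even rows, odd cols
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G at odd rows, even cols
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B at odd rows, odd cols
    return mosaic

scene = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)   # stand-in scene
print(bayer_mosaic(scene))   # one recorded value per photosite
```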

Ollie then starts to talk about reducing colour data. We can do this at source by recording only 8, rather than 10, bits of colour, but Ollie gives a clear demonstration of when that doesn’t look good; typically 8-bit video lets itself down on sunsets, flesh tones and similar subtle gradients. The same principle drives the HDR discussion of 10-bit vs. 12-bit. With PQ built for 12-bit, but realistic live production workflows for the next few years being the 10-bit that HLG expects, there is plenty of water to go under the bridge before we see whether PQ’s 12-bit advantage really comes into its own outside of cinemas. Colour subsampling also gets a thorough explanation, covering not only 4:4:4 and 4:2:2 but also the less common variants.
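As a quick sketch of the subsampling idea, here is 4:2:2 reduced to its essentials: luma stays at full resolution while each chroma plane is halved horizontally (the averaging used here is one simple choice; real encoders filter more carefully).

```python
# A sketch of 4:2:2 chroma subsampling: luma (Y) is kept at full resolution
# while the two chroma planes (Cb, Cr) are halved horizontally by averaging
# each pair of samples. 4:2:0 would additionally halve them vertically.

import numpy as np

def subsample_422(chroma: np.ndarray) -> np.ndarray:
    """Average horizontal pairs of chroma samples (width must be even)."""
    return chroma.reshape(chroma.shape[0], -1, 2).mean(axis=2)

cb = np.arange(16, dtype=float).reshape(2, 8)   # stand-in chroma plane, 2x8
print(subsample_422(cb))                         # -> 2x4 plane
```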

The next section looks at ‘scopes’, also known as waveform monitors. Ollie starts with the histogram, which shows how much of your picture sits at a given brightness, helping you understand how well exposed the picture is overall. With the histogram, the horizontal axis shows brightness, with black on the left and white on the right. The waveform, by contrast, plots brightness on the vertical axis while the horizontal axis shows where in the picture that brightness occurs, letting you directly associate brightness values with objects in the scene. This can be done with the luma signal or with the separate RGB channels, which then lets you understand the colour of that area. The vectorscope is also covered.
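The difference between the two scopes comes down to what each axis holds, which a few lines of numpy make concrete. This is only a sketch on a stand-in luma plane, not a real scope implementation.

```python
# The histogram counts how many pixels sit at each brightness, while the
# waveform keeps horizontal picture position and records which brightness
# values occur in each column.

import numpy as np

luma = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)   # stand-in luma plane

# Histogram: brightness on the x axis, pixel count on the y axis.
histogram, _ = np.histogram(luma, bins=256, range=(0, 256))

# Waveform: for each image column, a 256-bin count of the brightnesses in it,
# so column position is preserved and brightness becomes the vertical axis.
waveform = np.stack([np.bincount(col, minlength=256) for col in luma.T])

print(histogram.shape, waveform.shape)   # (256,) and (1920, 256)
```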

Ollie then moves on to balancing contrast, looking at lift (raising the black point), gamma (affecting the midtones) and gain (altering the white point), and mixing that with shadows, midtones and highlights. He then talks about how the surroundings affect the perceived brightness of a picture and demonstrates it, very effectively, with grey boxes shown against different surrounds as part of the slides, before talking about the need for standards to control this. When grading, he discusses the different gamma that screens should be set to for different types of work, and the standard which says that the ambient light in the surrounding room should be about 10% as bright as the screen displaying pure white.
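For reference, here is one common simplified formulation of a lift/gamma/gain adjustment on normalised pixel values. The exact maths varies between grading tools, so treat this as a sketch of the idea rather than any particular product's implementation.

```python
# A simplified lift/gamma/gain adjustment on normalised (0-1) pixel values:
# lift raises the black point, gain alters the white point and gamma bends
# the midtones.

import numpy as np

def lift_gamma_gain(x: np.ndarray, lift=0.0, gamma=1.0, gain=1.0) -> np.ndarray:
    adjusted = np.clip(x * gain + lift, 0.0, 1.0)
    return adjusted ** (1.0 / gamma)

ramp = np.linspace(0.0, 1.0, 5)                       # a simple grey ramp
print(lift_gamma_gain(ramp, lift=0.05, gamma=1.2, gain=0.95))
```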

The last part of the talk presents case studies of programmes and films, looking at the way they used colour, saturation, costume and lighting to enhance and underwrite the story being told. The takeaway is the need to think of colour as a narrative element, something informed by and understood alongside wardrobe, the intended visual look and lighting. The conversation about colour and grading should start early in the filming process, and a key point Ollie makes is that this conversation costs little, yet having it early in the production is priceless in terms of its impact on the cost and results of the project.

Watch now!
Speakers

Ollie Kenchington
Owner & Creative Director,
Korro Films, Korro Academy

Video: 5G Technology

5G seems to offer so much, but there is a lot of nuance under the headlines. Which of the features will telcos actually provide? When will the spectrum become available? How will we cope with the new levels of complexity? Whilst for many 5G will simply ‘work’, when broadcasters look to use it for delivering programming, they need to look a few levels deeper.

In this wide-ranging video from the SMPTE Toronto Section, four speakers take us through the technologies at play and the ways they can be implemented, cutting through the hype to help us understand what could actually be achieved, in time, using 5G technology.

Michael J Martin is first up, covering topics such as spectrum use, modulation, types of cells, beam forming and security. Regarding spectrum, Michael explains that 5G uses three frequency bands: the sub-1GHz spectrum that’s been in use for many years, a 3GHz range and a millimetre-wave range at 26GHz.

“It’s going to be at least a decade until we get 5G as wonderful as 4G is today.”

Michael J Martin
Note that some countries already use other frequencies, such as 1.8GHz, which will also be available. The important issue is that the 26GHz spectrum will typically not be available for over a year, so 5G roll-out starts in some of the existing bands or in the 3.4GHz spectrum. A recurring theme in digital RF is the use of OFDM, which has long been used by DVB and has been adopted by ATSC 3.0 as its modulation, too. OFDM allows different levels of robustness so you can optimise reach and bandwidth.
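For readers new to OFDM, the core idea fits in a short snippet: data symbols are placed on many parallel subcarriers and turned into a time-domain waveform with an inverse FFT. This is a minimal illustration of the principle, not a 5G NR implementation; subcarrier count, modulation and cyclic-prefix length are arbitrary choices here.

```python
# A minimal illustration of the OFDM idea: map bits to QPSK symbols, place one
# symbol per subcarrier, take an inverse FFT to form the time-domain symbol and
# prepend a cyclic prefix to protect against multipath echoes.

import numpy as np

rng = np.random.default_rng(0)
n_subcarriers, cp_len = 64, 16

bits = rng.integers(0, 2, 2 * n_subcarriers)
qpsk = ((1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])) / np.sqrt(2)  # one symbol per subcarrier

time_symbol = np.fft.ifft(qpsk)                           # parallel subcarriers -> time domain
with_cp = np.concatenate([time_symbol[-cp_len:], time_symbol])

# Receiver (ideal channel): strip the cyclic prefix and FFT back to the symbols.
recovered = np.fft.fft(with_cp[cp_len:])
assert np.allclose(recovered, qpsk)
```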

Michael highlights a problem faced in upgrading infrastructure to 5G: the number of towers/sites and the availability of engineers. It’s simply going to take a long time to upgrade them all, even in a small, dense environment. That deals with upgrading existing large sites, but 5G also provides for smaller cells (micro, pico and femto cells). These small cells are very important in delivering the millimetre-wave part of the spectrum.

Network Slicing
Source: Michael J. Martin, MICAN Communications

We look at MIMO and beam forming next. MIMO is an important technology as it, effectively, collects reflected versions of the transmitted signals and processes them to create stronger reception. 5G uses MIMO in combination with beam forming, where the transmitter electronically manipulates the antenna array to focus the transmission and localise it to a specific receiver or group of receivers.
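That "electronic manipulation" is essentially a set of per-element phase shifts. Below is a simplified uniform-linear-array example of the idea, not actual 5G beam management; the element count, spacing and steering angle are arbitrary.

```python
# A simplified look at beam forming with a uniform linear array: applying a
# progressive phase shift across the antenna elements steers the direction in
# which their signals add up constructively.

import numpy as np

n_elements = 8
spacing = 0.5                      # element spacing in wavelengths
steer_angle = np.deg2rad(25)       # direction we want to focus on

element_idx = np.arange(n_elements)
weights = np.exp(-2j * np.pi * spacing * element_idx * np.sin(steer_angle))

# Array response over all directions: the gain peaks at the steered angle.
angles = np.deg2rad(np.linspace(-90, 90, 181))
response = np.abs(weights @ np.exp(2j * np.pi * spacing *
                                   np.outer(element_idx, np.sin(angles)))) / n_elements
print("peak response at", np.rad2deg(angles[np.argmax(response)]), "degrees")   # 25.0
```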

Lastly, Michael talks about network slicing, possibly one of the features of 5G most anticipated by the broadcast community. The idea is that a broadcaster can reserve its own slice of the network so that, even when sharing an environment with 30,000 other devices, it will still have the bandwidth it needs.

Our next speaker, Craig Snow from Huawei, outlines how secondary networks can be created for companies’ private use which, interestingly, partly use separate frequencies from the public network. Network slicing can be used to divide an enterprise 5G network into separate networks for production, IT support and so on. Craig then looks at the whole broadcast chain and shows where 5G can be used, and we quickly see that there are many uses in live production as well as in distribution. This can also make remote production more practical for some use cases.

Craig moves on to look at physical transmitter options, showing a range of sub-1kg transmitters, many of which have built-in Wi-Fi, and then shows how external microwave backhaul might look for a number of buildings in a local area connecting back to a central tower.

Next is Sayan Sivanathan, who works for Bell Mobility and goes into more detail on the wider range of use cases for 5G. Starting by comparing it to 4G and highlighting the increased data rates, improved spectrum efficiency and higher device connection density, he paints a rosy picture of the future. All of these factors support use cases such as remote control and telemetry for automated vehicles (whether in industrial or public settings). Sayan then looks at deployment status in the US, Europe and Korea, shows the timeline for the spectrum auction in Canada and talks through photos of 5G transmitters in the real world.

Global Mobile Data Traffic (Exabytes per month)
Source: Ericsson Mobility Report, Nov 2019

Finishing off the session is Tony Jones from MediaKind, who focuses on which 5G features are going to be useful for media and entertainment. One is ‘better video on mobile’. Tony picks up on a topic mentioned by Michael at the beginning of the video: processing at the edge. Edge processing, meaning having compute power at the point of the network closest to the end user, allows you to deliver customised manifests and deal with rights management with minimal latency.

Tony explains how MediaKind worked with Intel and Ericsson to deliver 5G remote production for the 2018 US Open. 5G is often seen as a great way to make covering golf cheaper, more aesthetically pleasing and also quicker to rig.

The session ends with a Q&A.

Watch now!
Speakers

Michael J Martin
MICAN Communications
Blog: vividcomm.com
Tony Jones
Principal Technologist
MediaKind Global
Craig Snow
Enterprise Accounts Director,
Huawei
Sayan Sivanathan
Senior Manager – IoT, Smart Cities & 5G Business Development
Bell Mobility