Video: Banding Impairment Detection

Banding is one of the most common visual artefacts affecting both video and images. The scourge of the beautiful sunset and the enemy of natural skin tones, banding is very noticeable as it’s not seen in nature. Banding happens when there is not enough bit depth to allow for a smooth gradient of colour or brightness, which leads to strips of one shade and an abrupt change to a strip of the next, clearly different, shade.
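As a rough illustration of the bit-depth point, the sketch below quantises a perfectly smooth ramp and counts how many distinct shades survive. The function names, the 3840-pixel width and the 0.40 to 0.45 'sky' range are illustrative assumptions, not anything taken from the talk.

```python
def quantise(value, bits):
    """Round a brightness in [0.0, 1.0] to the nearest level a `bits`-deep signal can store."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels


def count_bands(width, bits, lo=0.0, hi=1.0):
    """Count the distinct shades left after quantising a smooth ramp from `lo` to `hi`."""
    ramp = [lo + (hi - lo) * x / (width - 1) for x in range(width)]
    return len({quantise(v, bits) for v in ramp})


if __name__ == "__main__":
    # A subtle gradient (e.g. a patch of sky) only spans a small part of the range,
    # so very few code values are available to describe it.
    for bits in (8, 10, 12):
        shades = count_bands(3840, bits, lo=0.40, hi=0.45)
        print(f"{bits}-bit: {shades} shades across 3840 pixels")
```

The fewer shades there are across the width of the picture, the wider each strip becomes and the easier the step between strips is to see.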

In this Video Tech talk, SSIMWAVE’s Dr. Hojat Yeganeh explains what can be done to reduce or eliminate banding. He starts by explaining how banding is created during compression, where the quantiser has reduced the accuracy of otherwise unique pixels to very similar numbers leaving them looking the same.

Dr. Hojat explains why we see these edges so clearly, both by looking at how contrast is defined and by referencing Dolby’s famous graph of contrast steps against luminance. The graph plots 10-bit HDR against 12-bit HDR and shows that the 12-bit PQ curve always stays below the ‘Barten limit’, the threshold below which contrast steps are not visible. It also shows that a 10-bit HDR image is always susceptible to showing quantised, i.e. banded, steps.

Why do we deliver 10-bit HDR video if it can still show banding? This is because in real footage, camera noise and film grain serve to break up the bands. Dr. Hojat explains that this random noise amounts to ‘dithering’. Well known in both audio and video, when you add random noise which changes over time, humans stop being able to see the bands. TV manufacturers also apply dithering to the picture before displaying it, which can further break up banding at the cost of more noise in the image.
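To see why time-varying noise helps, here is a minimal sketch that assumes uniform noise of plus or minus half a quantisation step is added before rounding. It is not the method any particular camera or TV uses, and the names and values are hypothetical.

```python
import random


def quantise(value, bits):
    """Clip to [0.0, 1.0] and round to the nearest level of a `bits`-deep signal."""
    levels = (1 << bits) - 1
    clipped = min(max(value, 0.0), 1.0)
    return round(clipped * levels) / levels


def dithered_average(value, bits, frames, rng):
    """Average `frames` quantised samples, each with fresh noise added before rounding."""
    step = 1.0 / ((1 << bits) - 1)
    total = 0.0
    for _ in range(frames):
        noise = rng.uniform(-0.5, 0.5) * step  # changes on every frame
        total += quantise(value + noise, bits)
    return total / frames


if __name__ == "__main__":
    rng = random.Random(0)
    true_value = 0.4995  # sits between two 8-bit levels
    plain = quantise(true_value, 8)
    dithered = dithered_average(true_value, 8, frames=10000, rng=rng)
    print(f"error without dither:       {abs(plain - true_value):.6f}")
    print(f"error of dithered average:  {abs(dithered - true_value):.6f}")
```

Without noise, every frame lands on the same wrong level. With noise, individual frames flip between the two neighbouring levels, and averaged over time the result sits much closer to the original value, which is the averaging the eye is effectively doing.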

How can you automatically detect banding? We hear that typical metrics like VMAF and SSIM aren’t usefully sensitive to banding. SSIMWAVE’s SSIMPLUS metric, on the other hand, can also produce a banding detection map, which helps with the automatic identification of banding.

The video finishes with questions including when banding is part of artistic intention, types of impairment not identifiable by typical metrics and consumer display limitations, among others.

Watch now!

Dr. Hojat Yeganeh
Senior Member Technical Staff,

Video: Broadcast Fundamentals: High Dynamic Range

Update: Unfortunately CVP chose to take down this video within 12 hours of this article going live. But there’s good news if you’re interested in HDR. Firstly, you can find the outline and some of the basics of the talk explained below. Secondly, at The Broadcast Knowledge there are plenty of talks discussing HDR! Here’s hoping CVP bring the video back.

Why is High Dynamic Range like getting a giraffe on a tube train? HDR continues its ascent. Super Bowl LIV was filmed in HDR this year, Sky in the UK has launched HDR and many of the big streaming services support it including Disney+, Prime and Netflix. So as it slowly takes its place, we look at what it is and how it’s achieved in the camera and in production.

Neil Thompson, a Sony Independent Certified Expert, takes a seat in the CVP Common Room to lead us through HDR from the start and explain how giraffes are part of the equation. Dynamic Range makes up two thirds of HDR, so he starts by explaining what it is with an analogy to audio. When you turn up the speakers so they start to distort, that’s the top of your range. The bottom is silence – or rather what you can hear over the quiet hiss that all audio systems have. Similarly in cameras, the top of your range is the brightest level at which neighbouring pixels can still record different brightnesses, and the bottom is the noisy, dithering blacks. In video, if you go too bright, all pixels become white even if the subject’s brightness varies, which is the equivalent of the audio distortion.

With the basic explanation out of the way, Neil moves on to describing the amount or size of dynamic range (DR), which can be expressed in stops, as a contrast ratio or as a signal-to-noise ratio. He compares ‘stops’ to a bucket of water with some sludge at the bottom where the range is between the top of the sludge and the rim of the bucket. One stop, he explains, is a halving of the range. With the bucket analogy, if you can go halfway down the bucket and still hit clear water, you have 1 stop of dynamic range. If you can then go down to a quarter with clean water, you have 2 stops. By the time you get to 1/32nd you have 5 stops. If going to 1/64th of the height of the bucket means you end up in the sludge, your system has 5 stops of dynamic range. Reducing the sludge so there’s clear water at 1/64th of the height, which in cameras means reducing the noise in the blacks, is one way of increasing the dynamic range of your acquisition.
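Since each stop is a halving, the three ways of quoting dynamic range are related by logarithms. The sketch below shows the arithmetic. The function names are made up for illustration, and the decibel conversion assumes the common 20·log10 convention for signal levels, which the talk summary does not specify.

```python
import math


def stops_from_ratio(contrast_ratio):
    """Number of halvings between the brightest and darkest usable levels."""
    return math.log2(contrast_ratio)


def ratio_from_stops(stops):
    """Contrast ratio (brightest : darkest) for a given number of stops."""
    return 2.0 ** stops


def snr_db_from_stops(stops):
    """Equivalent ratio in dB, assuming the 20*log10 convention (about 6.02 dB per stop)."""
    return 20.0 * math.log10(ratio_from_stops(stops))


if __name__ == "__main__":
    # Clear water down to 1/32 of the bucket height is a 32:1 ratio.
    print("32:1 ratio in stops:", stops_from_ratio(32))
    for stops in (5, 6, 14):
        print(f"{stops} stops = {ratio_from_stops(stops):.0f}:1 "
              f"= {snr_db_from_stops(stops):.1f} dB")
```

In these terms, clearing the sludge from 1/32nd down to 1/64th of the bucket takes the ratio from 32:1 to 64:1, which is one extra stop.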

If you would like to know how lenses fit into the equation of gathering light, check out this talk from Canon’s Larry Thorpe.

Neil looks next at the range of light that we see in real life, from sunlight to looking at the stars at night. Our eye has 14 stops of range, though with our iris, we can see the equivalent of 24 stops. Similarly, cameras use an iris to regulate the incoming light, which helps move the restricted dynamic range of the camera into the right range of brightness for our shot.

Of course, once you have gathered the light, you need to display it again. Displays’ ability to produce light is measured in ‘nits’, where one nit is one candela per square metre. Knowing how many nits a display can produce helps you understand the brightness it can show, with 1000 nits currently being typical for an HDR display. Of course, dynamic range is as much about the blacks as the brightness. OLED screens are fantastic at having low blacks, though their brightness can be quite low. LEDs, conversely, Neil explains, can go very bright but the blacks do suffer. You also have to take into account the location of a display device to understand what range it needs. In a dim gallery you can spend longer caring about the blacks, but many places are so bright that the top end is much more important than the blacks.

With the acquisition side explained, Neil moves on to transmission of HDR and why it’s like getting a giraffe on a tube train. Neil relates this to the already familiar ‘log profiles’. There are two HDR curves, known as transfer functions: PQ from Dolby and HLG (Hybrid Log-Gamma). Neil looks at which profiles are best for each part of the production workflow and then explains how PQ differs from HLG in terms of expressing brightness levels. In HLG, the brightest part of the signal tells the display device to output as brightly as it can. A PQ signal, however, reserves the brightest signal for 10,000 nits – far higher than displays available today. This means that we need to do some work to deal with the situation where your display isn’t as bright as the one used to master the signal. Neil discusses how we do that with metadata.
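To make the ‘absolute brightness’ nature of PQ concrete, the following sketch implements the PQ EOTF using the constants published in SMPTE ST 2084 (also given in ITU-R BT.2100). It converts a normalised signal value between 0 and 1 into nits. The function name and the sample signal values are illustrative choices, and this is only the transfer function, not the metadata-driven mapping Neil describes.

```python
# Constants from SMPTE ST 2084 / ITU-R BT.2100 (PQ).
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0


def pq_eotf(signal):
    """Convert a normalised PQ signal (0.0 to 1.0) to display luminance in nits."""
    e = min(max(signal, 0.0), 1.0) ** (1.0 / M2)
    numerator = max(e - C1, 0.0)
    denominator = C2 - C3 * e
    return PEAK_NITS * (numerator / denominator) ** (1.0 / M1)


if __name__ == "__main__":
    for signal in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"signal {signal:.2f} -> {pq_eotf(signal):9.2f} nits")
```

Running this shows that only the very top of the signal range maps to 10,000 nits, so a display with a lower peak has to decide what to do with any signal levels above what it can physically reproduce.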

Finishing off the talk, Neil takes questions from the audience, but also walks through a long list of questions he brought along including discussing ‘how bright is too bright?’, what to look for in an engineering monitor, lighting for HDR and costs.

Watch now!

Neil Thompson
Freelance Engineer & Trainer

Video: Canon Lenses – A Tale of Three Formats

Lenses are seen by some as a black art, by some as a mass of complex physics equations and by others as their creative window onto the stories that need to be told. Whilst there is an art behind using lenses, and it’s true making them is complex, understanding how to choose lenses doesn’t require a PhD.

SMPTE Fellow Larry Thorpe from Canon is here to make the complex accessible as he kicks off talking about lens specifications. He discusses the 2/3-inch image format, comparing it with Super 35 and full frame. He outlines the specs that are most discussed when purchasing and choosing lenses and shows the balancing act that all lenses are, wanting to maximise sharpness whilst minimising chromatic aberration. On the subject of sharpness, Larry moves on to discussing the way the camera’s ability to sample the video interacts with the lens’s ability to capture optical resolution.

Larry considers a normal 1920×1080 HD raster with reference to the physical size of a TV 2/3-inch sensor. That works out to be approximately 100 line pairs per millimetre. Packing that into 1mm is tricky if you also wish to maintain the quality of the lines. The ability to transfer this resolution is captured by the MTF – the Modulation Transfer Function. This documents the contrast you would see when certain frequencies are viewed through the lens. Larry shows that for a typical lens, these 100 line pairs would have 70% of the original contrast. The higher the frequency, the lower the contrast until it just becomes a flat grey. Larry then looks at a 4K lens, showing that its needs are 200 line pairs per mm and, looking at the MTF, we see that we’re only reaching 50% contrast.
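The line-pair figures follow from simple arithmetic: one line pair needs two pixels, so the requirement is half the horizontal pixel count divided by the sensor width. The sketch below assumes an active width of about 9.6 mm for a 16:9 2/3-inch sensor and about 24.6 mm for Super 35. Both widths are assumptions for illustration rather than figures quoted in the talk, as is the function name.

```python
def line_pairs_per_mm(horizontal_pixels, sensor_width_mm):
    """Optical resolution a lens must deliver so that every pixel pair sees one line pair."""
    return (horizontal_pixels / 2) / sensor_width_mm


if __name__ == "__main__":
    two_thirds_inch_mm = 9.6   # assumed active width of a 16:9 2/3-inch sensor
    super_35_mm = 24.6         # assumed active width of a Super 35 sensor

    print(f"HD on 2/3-inch: {line_pairs_per_mm(1920, two_thirds_inch_mm):.0f} lp/mm")
    print(f"4K on 2/3-inch: {line_pairs_per_mm(3840, two_thirds_inch_mm):.0f} lp/mm")
    print(f"4K on Super 35: {line_pairs_per_mm(3840, super_35_mm):.0f} lp/mm")
```

Under those assumed widths, the same 4K raster asks far less of the lens on the larger sensor, which is why bigger formats make it easier to hold contrast at the finest detail.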

Aberrations are important to understand as every lens suffers from them. Larry walks through the 5 classical aberrations, focus and chromatic. To the beginner, chromatic aberrations are, perhaps, the most obvious, where colours, often purple, are seen on the edges of objects. This is also known as colour fringing. Larry talks about how aperture size can minimise the effect and how keeping your image above the 50% contrast limit in the MTF will keep chromatic aberration from being obvious. As a reality check, we then see the calculated limits beyond which it’s simply not possible to improve. Using these graphs we see why 4K lenses offer less opportunity to stop down than HD lenses.

Sharpness zones are zones in lenses optimised for different levels of sharpness. The centre, unsurprisingly, has the highest sharpness as that’s where most of the action is. There is then a middle and an outer zone which are progressively less sharp. The reason for this is to recognise that it’s not possible to make the whole image sharp to the same degree. By doing this we are able to create a flatter central zone but with a managed decrease at the corners.

Larry moves on to cover HDR and mentions a recent programme on Fox which was shot in 1080p HDR, making the point that HDR is not a ‘4K technology’. He also makes the point that HDR is about the low-lights as well as the specular highlights, so a lens’s ability to be low-noise in the blacks is important. Whilst this is not often a problem for SDR, with HDR we are now seeing this coming up more often. For dramas and similar genres, it’s actually very important to be able to shoot whole scenes in low light, and Larry shows that the large number of glass elements in lenses is responsible for the low-light performance being suboptimal. With up to 50% of light not making it through the lens, this light can be reflected internally and travel around the lens, splashing the blacks. Larry explains that coating elements can correct a lot of this and careful choice of the internal surface of the lens mechanisms is also important in minimising such reflections.

Larry then turns to long telephoto zoom lenses. He shows how Canon developed a lens that can fully frame a 6-foot athlete from 400 metres away on a 2/3″ sensor, yet still offers a wide angle of 60 degrees. With such a long zoom, internal stabilisation is imperative, which is done by a very quick active feedback sensor.

So far, Larry has talked about TV’s standardised 2/3″ image sensor. He now moves on to cover motion picture format sizes. He shows that for Super 35, you only need 78 line pairs per millimetre, which has the knock-on effect of allowing sharper pictures. Next Larry talks about the different versions of ‘full frame’ formats, emphasising the creative benefits of larger formats. One is a larger field of view, which Larry both demonstrates and explains. Another is greater sharpness and, by having a camera which can choose how much of the sensor you actually use, you can put all sorts of different lenses on. Depth of field is a well-known benefit of larger frame formats. The depth of field is much shallower which, creatively, is often much desired, though it should be noted that for entertainment shows in TV that’s much less desirable, whilst in films it is an intrinsic part of the ‘grammar’.

As the talk comes to a conclusion, Larry discusses debayering whereby a single sensor has to record red, green and blue. He explains the process and the disadvantages versus separate sensors in larger cameras. As part of this conversion, he shows how oversampling can improve sharpness and avoid aliasing. The talk finishes with an overview of solid storage options.

Watch now!

Larry Thorpe
National Marketing Executive,
Canon USA Inc.

Video: ATSC 3.0 Seminar Part III

ATSC 3.0 is the US-developed set of transmission standards which is fully embracing IP technology both over the air and for internet-delivered content. This talk follows on from the previous two talks which looked at the physical and transmission layers. Here we’re seeing how IP throughout has benefits in terms of broadening choice and seamlessly moving from on-demand to live channels.

Richard Chernock is back as our Explainer in Chief for this session. He starts by explaining the driver for the all-IP adoption which focusses on the internet being the source of much media and data. The traditional ATSC 1.0 MPEG Transport Stream island worked well for digital broadcasting but has proven tricky to integrate, though not without some success if you consider HbbTV. Realistically, though, ATSC see that as a stepping stone to the inevitable use of IP everywhere and if we look at DVB-I from DVB Project, we see that the other side of the Atlantic also sees the advantages.

But seamlessly mixing together a broadcaster’s on-demand services with their linear channels is only one benefit. Richard highlights multilingual markets where the two main languages can be transmitted (for the US, usually English and Spanish) but other languages can be made available via the internet. This is a win in both directions. With the lower popularity, the internet delivery costs are not overburdening and for the same reason they wouldn’t warrant being included on the main Tx.

Richard introduces ISO BMFF and MPEG DASH which are the foundational technologies for delivering video and audio over ATSC 3.0 and, to Richard’s point, any internet streaming services.

We get an overview of the protocol stack to see where they fit together. Richard explains both MPEG DASH and the ROUTE protocol which, based on FLUTE, allows delivery of data using IP on uni-directional links.

The use of MPEG DASH allows advertising to become more targeted for the broadcaster. Cable companies, Richard points out, have long been able to swap out an advert in a local area for another and increase their revenue. In recent years companies like Sky in the UK (now part of Comcast) have developed technologies like Adsmart which, even with MPEG TS satellite transmissions can receive internet-delivered targeted ads and play them over the top of the transmitted ads – even when the programme is replayed off disk. Any adopter of ATSC 3.0 can achieve the same which could be part of a business case to make the move.

Another part of the business case is that ATSC 3.0 not only supports 4K, unlike ATSC 1.0, but also ‘better pixels’. ‘Better pixels’ has long been the way to remind people that TV isn’t just about resolution. ‘Better pixels’ includes ‘next generation audio’ (NGA), HDR, Wide Colour Gamut (WCG) and even higher frame rates. The choice of HEVC Main 10 Profile should allow all of these technologies to be used. Richard makes the point that if you balance the additional bitrate requirement against the likely impact on the viewers, UHD doesn’t make sense compared to, say, enabling HDR.

Richard moves his focus to audio next, unpacking the term NGA and talking about surround sound and object-oriented sound. He notes that renderers are very advanced now and can analyse a room to deliver a surround sound experience without having to place speakers in the exact spot you would normally need. Options are important for sound: having more than just one 5.1 surround sound track is very important for personalisation, which isn’t just choosing language but also covers commentary, audio description etc. Richard says that audio could be delivered in a separate pipe (PLP – discussed previously) such that even after the video has cut out due to bad reception, the audio continues.

The talk finishes looking at accessibility such as picture-in-picture signing, SMPTE Timed Text captions (IMSC1), security and the ATSC 3.0 standards stack.

Watch now!

Richard Chernock
Former CSO,
Triveni Digital