Video: JPEG XS Interoperability Activity Group Update


JPEG XS is a low-latency, light-compression codec often called a ‘mezzanine’ codec. Encoding within milliseconds, JPEG XS can compress full-bandwidth signals by 4x or more, allowing scope for several generations of compression without significant degradation. This low latency and resilience to generational loss make it ideal for enabling remote production.

John Dale from Media Links joins us to look at what’s being done within the Video Services Forum (VSF) to ensure interoperability. As a new standard, JPEG XS is still being implemented, or has yet to be implemented, in many companies’ products, so this is the perfect time to be looking at how to standardise interconnects.

Running JPEG XS over MPEG TS is one approach, and it’s being written up in ‘VSF TR-07’ (Technical Recommendation 7), which is nearing completion. It defines capabilities for 2K, 4K and 8K video, with and without HDR. The video formats have been split into capability sets, meaning that a vendor can comply with the specification by stating which subset(s) it supports. All formats up to 1080p60 come under capability set ‘A’, with ‘B’ covering UHD resolutions. After this work, the group will look at carrying JPEG XS over ST 2110-22 instead of MPEG TS. This is yet to start but will build on much of the TR-07 work.
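To make the capability-set idea concrete, here is a minimal sketch in Python, assuming a simplified model of the sets described in the talk; the format strings and the data structure are illustrative, not taken from TR-07.

```python
# Illustrative sketch only: a simplified model of TR-07-style capability sets.
# The 'A'/'B' grouping follows the talk (A up to 1080p60, B for UHD);
# the format strings and structure are hypothetical.

CAPABILITY_SETS = {
    "A": {"1080i50", "1080i59.94", "1080p50", "1080p59.94", "1080p60"},
    "B": {"2160p50", "2160p59.94", "2160p60"},
}

def required_set(video_format: str) -> str:
    """Return which capability set a vendor must support for this format."""
    for set_name, formats in CAPABILITY_SETS.items():
        if video_format in formats:
            return set_name
    raise ValueError(f"Format {video_format!r} not covered by this sketch")

print(required_set("1080p50"))   # -> A
print(required_set("2160p60"))   # -> B
```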

Watch now!
Speaker

John Dale
Company Director and CMO,
Media Links.

Video: AV1 Real-Time Screen Content Coding

We saw in this week’s AV1 panel that AV1 encoding times have dropped into a practical range and that the codec is starting to gain traction. One of its key differentiators, shared only with VVC, is the inclusion by default of tools aimed at encoding screens and computer graphics rather than natural video.

Zoe Liu, CEO of Visionular, talks at RTE2020 about these special abilities of AV1 to encode screen content. The video starts with a refresher on AV1 in general: its arrival on the scene from the Alliance for Open Media and the en/decoder ecosystem around it, such as SVT-AV1, which we talked about two days ago, dav1d, rav1e and others, as well as a look at the hardware encoders being readied by the likes of Samsung.

Turning her focus to screen content, Zoe explains that it differs from natural video for a number of reasons. For content like this presentation, much of the video stays static a lot of the time, with a peak of activity as each slide changes. This gives rise to the idea of allowing variable frame rates, but also of optimising for the depth of the colour palette. Motion on screens can be smoother and also has more distinct patterns, for instance identical letters. This paints a very specific picture of what screen content is, when we all know it’s very variable and usually mixed-use. However, having tools to capture these situations as they arise is critical for the times when it matters, and it’s these coding tools that Zoe highlights next.
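As a rough illustration of the two properties described above, here is a hedged Python sketch; the thresholds and function names are our own, not from AV1 or Visionular.

```python
import numpy as np

# Hedged sketch: two simple heuristics reflecting the properties Zoe
# describes. Thresholds and names are illustrative, not from any codec.

def frame_is_static(prev: np.ndarray, cur: np.ndarray, tol: float = 1.0) -> bool:
    """A slide that hasn't changed yields near-zero mean absolute difference."""
    return float(np.abs(cur.astype(np.int16) - prev.astype(np.int16)).mean()) < tol

def looks_like_screen_content(frame: np.ndarray, max_colours: int = 64) -> bool:
    """Screen content often uses a shallow palette; natural video rarely does."""
    colours = np.unique(frame.reshape(-1, frame.shape[-1]), axis=0)
    return len(colours) <= max_colours

# Example: an unchanged 8-bit RGB "slide" with a shallow palette
slide = np.zeros((720, 1280, 3), dtype=np.uint8)
print(frame_is_static(slide, slide.copy()))   # True -> frame can be skipped
print(looks_like_screen_content(slide))       # True -> palette tools help
```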

One common technique is to partition the screen into variable-sized blocks, and AV1 brings more partition shapes than HEVC. Motion compensation has been the mainstay of MPEG encoding for a long time. AV1 also uses motion compensation and, for the first time, brings in motion vectors that allow for rotation and zooming. Zoe explains the different modes available, including the compound motion modes, of which there are 128.
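Since rotation and zoom amount to an affine mapping, a small sketch can show why they need more than a translational vector. This is purely illustrative and doesn’t follow AV1’s actual syntax elements or precision:

```python
import numpy as np

# Illustrative sketch: a 2x2 affine matrix plus a translation maps each
# pixel of a block into the reference frame, covering rotation and zoom
# that a plain translational motion vector cannot express.

def warped_positions(block_xy: np.ndarray, scale: float, angle_deg: float,
                     mv: tuple) -> np.ndarray:
    """Map block pixel coordinates into the reference frame."""
    a = np.deg2rad(angle_deg)
    A = scale * np.array([[np.cos(a), -np.sin(a)],
                          [np.sin(a),  np.cos(a)]])
    return block_xy @ A.T + np.asarray(mv)

# A 2x2 block, zoomed 5% and rotated 1 degree, then shifted by (3.5, -1.25)
coords = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
print(warped_positions(coords, scale=1.05, angle_deg=1.0, mv=(3.5, -1.25)))
```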

Capitalising on the repetitive nature screen content can have, Intra Block Copy (IntraBC) is a technique used to copy part of a frame to other parts of the same frame. Similar to motion vectors, which point to other frames, this helps replication within the frame. It is used as part of the prediction and can therefore be modified before the decode is finished, allowing for small variations. Palette mode also helps, as does CfL (Chroma from Luma), a predictor for colour based on the luma signal and some signalling from the encoder.
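A simplified sketch of the CfL idea: chroma is predicted as a DC value plus a scaled, zero-mean luma contribution, with the scaling factor signalled by the encoder. Subsampling and the quantisation of the scaling factor are deliberately omitted here:

```python
import numpy as np

# Simplified CfL sketch: predict chroma as a DC prediction plus a scaled
# "AC" (zero-mean) luma contribution. The alpha scaling factor would be
# signalled by the encoder; subsampling is ignored for clarity.

def cfl_predict(luma_block: np.ndarray, chroma_dc: float, alpha: float) -> np.ndarray:
    luma_ac = luma_block - luma_block.mean()   # remove the luma average
    return chroma_dc + alpha * luma_ac         # scale AC part, add chroma DC

luma = np.array([[100., 110.], [120., 130.]])
print(cfl_predict(luma, chroma_dc=128.0, alpha=0.5))
```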

Zoe highlights two areas where screen content reacts badly to encoding tools that are normally beneficial. Temporal filtering is usually associated with gains of around 8% in encoder efficiency, but it can make motion vectors much more complicated in screen content and hurt compression efficiency. Similarly with partitioning: smaller sizes often work well for natural video, but the opposite is true for screen content.

The talk finishes with Zoe explaining how Visionular’s own AV1 implementation performed on standardised 4K content against other implementations, their implementation of scalable video coding for RTC, and the overall compression improvements.

Zoe Liu also contributed to this more detailed overview.

Watch now!
Speaker

Zoe Liu
CEO,
Visionular

Video: Line by Line Processing of Video on IT Hardware

If the tyranny of frame buffers is allowed to continue, line-latency I/O is impossible without increasing frame rates to 60fps or, preferably, beyond. With SDI, hardware was able to process video line by line. Now, with uncompressed video over IP, is the same possible on IT hardware?

Kieran Kunhya from Open Broadcast Systems explains how he has been able to develop line-latency video I/O with SMPTE 2110, how he’s coupled that with low-latency AVC and HEVC encoding and the challenges his company has had to overcome.

The commercial drivers for reducing latency are fairly well known. Firstly, for standard 1080i50, typically treated as 25fps, a single frame buffer costs you a 40ms delay. If a workflow needs multiple buffers, this soon stacks up, so whatever the latency of your codec – uncompressed or JPEG XS, for example – the overall latency will be far above it. In today’s Covid world, companies are looking to cut latency so that people can work remotely. This has only intensified the interest, already there for remote production (REMIs), in having low-latency feeds. Low latency allows full engagement in conversations, which is vital for news anchors to conduct interviews as well as they would in person.
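The frame-buffer arithmetic is easy to check. A back-of-envelope sketch, with the stage names purely illustrative:

```python
# Back-of-envelope sketch of how frame buffers stack up for 1080i50
# (treated as 25 frames per second). Stage names are illustrative.

FRAME_RATE = 25.0                       # 1080i50 handled as 25fps
frame_period_ms = 1000.0 / FRAME_RATE   # 40ms per buffered frame

stages = {"input buffer": 1, "mixer": 1, "graphics keyer": 1}
total_ms = sum(stages.values()) * frame_period_ms
print(f"{total_ms:.0f} ms")  # 120 ms before any codec latency is added
```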

IP itself has come into its own during recent times when there has been no-one around to move an SDI cable; being able to log in and scale SMPTE ST 2110 infrastructure up, or down, remotely is a major benefit. IT equipment has also been shown to be fairly resilient to supply chain disruption during the pandemic, says Kieran, due to the IT industry being larger and used to scaling up.

Kieran’s approach to receiving ST 2110 deals in chunks of 5 to 10 lines, giving you time to process the last few lines whilst waiting for the next to arrive. This processing can be de-encapsulation, translating pixel values to another format, or modifying the values to key on graphics.
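In pseudocode terms, the chunked receive loop might look like the following sketch; the receive and process functions are placeholders, not a real ST 2110 de-encapsulation API:

```python
# Hedged sketch of the chunked approach Kieran describes: collect 5-10
# lines at a time, then process one chunk while the next arrives. The
# receive/process functions are placeholders only.

CHUNK_LINES = 8          # somewhere in the 5-10 line range
ACTIVE_LINES = 1080

def receive_lines(first_line: int, count: int) -> list:
    """Placeholder: would reassemble RTP payloads for these video lines."""
    return [b"\x00" * 4800 for _ in range(count)]  # 1920px, 10-bit 4:2:2

def process_chunk(lines: list) -> None:
    """Placeholder: pixel-format conversion, keying, or hand-off to encoder."""
    pass

for first in range(0, ACTIVE_LINES, CHUNK_LINES):
    chunk = receive_lines(first, min(CHUNK_LINES, ACTIVE_LINES - first))
    process_chunk(chunk)   # in a real system this overlaps the next receive
```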

As the world is focussed on delivering in and out of unusual and residential places, low bitrate is the name of the game. So Kieran looks at low-latency HEVC/AVC encoding as part of an example workflow which takes in ST 2110 video at the broadcaster and encodes to MPEG for delivery to the home. In the home, the video is likely to be decoded natively on a computer, but Kieran shows an SDI card which can be used to deliver traditional baseband if necessary.

Kieran talks about the dos and don’ts of encoding and decoding AVC and HEVC at low latency, targeting an end-to-end budget of 100ms. The name of the game is to avoid waiting for whole frames, so refreshing the screen with I-frame information in small slices is one way of keeping the decoder supplied with fresh information without taking the full-frame hit of 40ms (for 1080i50). Audio is best sent uncompressed to ensure its latency stays below that of the video.
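As an example of this style of encoder configuration, the following sketch drives ffmpeg’s libx264 with gradual intra refresh, slicing and a small rate buffer. The specific numbers are illustrative assumptions, not Kieran’s settings:

```python
import subprocess

# Hedged sketch of low-latency AVC settings of the kind the talk implies,
# using ffmpeg's libx264. Bitrate, slice count and VBV buffer size (roughly
# one frame's worth) are illustrative, not Kieran's actual configuration.

cmd = [
    "ffmpeg", "-i", "input.mp4",
    "-c:v", "libx264",
    "-tune", "zerolatency",       # no lookahead, no B-frames
    "-x264-params",
    "intra-refresh=1"             # gradual intra refresh instead of big I-frames
    ":slices=8"                   # slices can leave the encoder before the frame ends
    ":vbv-maxrate=5000:vbv-bufsize=200",  # a small rate buffer bounds the delay
    "-f", "mpegts", "output.ts",
]
subprocess.run(cmd, check=True)
```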

Decoding requires careful handling of slice boundaries, ensuring deblocking is used so that no artefacts are seen. Compressed video is often not PTP-locked, which means that delivery into most ST 2110 infrastructures requires frame synchronisation and audio resampling.

Kieran foresees increasing use of 2110 to MPEG Transport Stream back to 2110 workflows during the pandemic and finishes by discussing the tradeoffs in delivering during Covid.

Watch now!
Speaker

Kieran Kunhya
CEO & Founder, Open Broadcast Systems

Video: Super Resolution: What’s the buzz and why does it matter?

“Enhance!” the captain shouts as the blurry image on the main screen becomes sharp and crisp again. This was sci-fi – and this still is sci-fi – but super-resolution techniques are showing that it’s really not that far-fetched. Able to increase the sharpness of video, machine learning can enable upscaling from HD to UHD as well as increasing the frame rate.

Bitmovin’s Adithyan Ilangovan is here to explain the success they’ve seen with super-resolution and, though he concentrates on upscaling, this is just as relevant to improving downscaling. Here are our previous articles covering super-resolution.

Adithyan outlines two main enablers of super-resolution that allow it to displace traditional methods such as bicubic and Lanczos. The first is the advent of machine learning, which now has a good foundation of libraries and documentation for coders, making it fairly accessible to a wide audience. The second is the proliferation of GPUs and, particularly in mobile devices, neural engines. Using the GPUs inside CPUs or in desktop PCI slots allows the analysis to be done locally without transferring great amounts of video to the cloud solely for processing or identification. And if your workflow is already in the cloud, it’s now easy to rent GPUs and FPGAs to handle such workloads.

Using machine learning doesn’t only allow for better upscaling on a frame-by-frame basis; it can also form a view of the whole file, or at least the whole scene. With a better understanding of the type of video it’s analysing (cartoon, sports, computer screen etc.) it can tune the upscaling algorithm to deal with it optimally.
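In practice, this could be as simple as dispatching to a per-genre model; a trivial sketch, with all model names hypothetical:

```python
# Illustrative sketch: pick an upscaling model per detected content type.
# The classifier output and model filenames are hypothetical placeholders.

MODELS = {
    "animation": "sr_anime.onnx",
    "sports":    "sr_natural.onnx",
    "screen":    "sr_screen.onnx",
}

def pick_model(content_type: str) -> str:
    """Fall back to a generic model when the content type is unknown."""
    return MODELS.get(content_type, "sr_generic.onnx")

print(pick_model("animation"))  # -> sr_anime.onnx
```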

Anime has seen a lot of tuning for super-resolution. Due to its long history, there are a lot of old cartoons which are both noisy and low resolution; they are still enjoyed now but would benefit from more resolution to match the screens we routinely use today.

Adithyan finishes by asking how we should best take advantage of super-resolution. Codecs such as LCEVC use it directly within the codec itself, but for systems with pre- and post-processing around the encoder, Adithyan suggests it’s viable to consider reducing the bitrate to cut CDN costs, knowing that with super-resolution at the decoder the video quality can actually be maintained.
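A back-of-envelope sketch of that trade-off; every number here is an assumption for illustration, not a Bitmovin figure:

```python
# Sketch of the trade-off Adithyan suggests: ship fewer bits and let the
# decoder's super-resolution recover quality. All figures are assumptions.

baseline_kbps = 8000        # e.g. a 4K ladder rung
sr_assisted_kbps = 5000     # a lower rung, upscaled on the device
hours_streamed = 1_000_000  # total viewing hours
cost_per_gb = 0.02          # assumed CDN rate, USD

def cdn_cost(kbps: float) -> float:
    """Total CDN cost in USD for the assumed viewing hours."""
    gigabytes = kbps * 1000 / 8 * 3600 * hours_streamed / 1e9
    return gigabytes * cost_per_gb

saving = cdn_cost(baseline_kbps) - cdn_cost(sr_assisted_kbps)
print(f"Estimated saving: ${saving:,.0f}")
```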

The video ends with a Q&A.

Watch now!
Download the slides
Speaker

Adithyan Ilangovan
Encoding Engineer,
Bitmovin