“Enhance!” the captain shouts as the blurry image on the main screen becomes sharp and crisp again. This was sci-fi – and this still is sci-fi – but super-resolution techniques are showing that it’s really not that far-fetched. Able to increase the sharpness of video, machine learning can enable upscaling from HD to UHD as well as increasing the frame-rate.
Adithyan outlines two main enablers of super resolution which allow it to displace traditional methods such as bicubic and Lanczos. The first is the advent of machine learning, which now has a good foundation of libraries and documentation, making it fairly accessible to a wide audience of coders. The second is the proliferation of GPUs and, particularly on mobile devices, neural engines. Using the GPUs integrated into CPUs or fitted in desktop PCI slots allows the analysis to be done locally without transferring great amounts of video to the cloud solely for processing or identification. And if your workflow is already in the cloud, it’s now easy to rent GPUs and FPGAs to handle such workloads.
Machine learning doesn’t only allow for better upscaling on a frame-by-frame basis; it can also form a view of the whole file, or at least the whole scene. With a better understanding of the type of video it’s analysing (cartoon, sports, computer screen etc.), it can tune the upscaling algorithm to deal with that content optimally.
Anime has seen a lot of tuning for super resolution. Due to anime’s long history, there are a lot of old cartoons which are both noisy and low resolution. These are still enjoyed now, but would benefit from more resolution to match the screens we now routinely use.
Adithyan finishes by asking how we should best take advantage of super resolution. Codecs such as LCEVC use it directly within the codec itself, but for systems which have pre- and post-processing around the encoder and decoder, Adithyan suggests it’s viable to reduce the bitrate, and with it the CDN costs, knowing that by using super resolution at the decoder the video quality can actually be maintained.
For too long, video has been dominated by natural scenes and compression has been about optimising for skin tones. Recently we have seen technologies taking care of displaying other types of video correctly: computer-generated content such as screens and computer games, as seen in VVC, and animation, for which upscalers are being optimised as we explore in this talk.
Anime, a Japanese genre of animation, is not very different from most cartoons from an objective point of view; the drawing style is black lines on relatively simple, solid areas of colour. Anime itself is a clearly distinct genre whose fans are much more sensitive to quality, but for codecs and scalers, 2D animation in general is a style that easily shows artefacts.
Up- and down-scaling is the process of making an image of, say, 1080 pixels high and 1920 wide either larger (for instance 2160×3840) or smaller (say, SD resolution). Achieving this without jagged edges or blurriness is difficult. Conventional maths can do a decent job, but often leaves something to be desired. Christopher Kennedy from Crunchyroll explains the testing he’s done looking at a super resolution upscaling technique which uses machine learning to improve the quality of upscaled anime video.
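To make the 'jagged edges or blurriness' trade-off concrete, here is a minimal sketch of two conventional scalers, nearest neighbour and bilinear, written in plain Python for a small greyscale grid. It is not taken from the talk; the function names and the tiny test image are illustrative assumptions, and real scalers work on full frames with optimised libraries.

```python
def upscale_nearest(img, factor):
    """Upscale a 2D grid by an integer factor by repeating each pixel."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def upscale_bilinear(img, factor):
    """Upscale a 2D grid by an integer factor using bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = []
    for Y in range(h * factor):
        # Map the output pixel centre back into source coordinates,
        # clamping so that border pixels are simply extended.
        sy = min(max((Y + 0.5) / factor - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for X in range(w * factor):
            sx = min(max((X + 0.5) / factor - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Blend horizontally on the two nearest rows, then vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# A hard vertical edge: black on the left, white on the right.
edge = [[0, 255], [0, 255]]
nearest = upscale_nearest(edge, 2)
bilinear = upscale_bilinear(edge, 2)
```

On this edge, nearest neighbour keeps the transition hard (each row is 0, 0, 255, 255), which is where blockiness and jaggies come from on diagonal lines, while bilinear spreads it across intermediate values (0, 63.75, 191.25, 255), which is the softness viewers see as blur.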
Waifu2x is an open-source algorithm which uses Convolutional Neural Networks (CNNs) to scale images and remove artefacts. To start with, Christopher explains the background of traditional algorithmic upscaling, discussing the fact that better-looking algorithms take longer, so TVs often choose the fastest, which leads them to look pretty bad if fed SD video. It's better for the streaming provider to spend the time doing an upconversion to 4K to allow the viewer a better final quality on their set.
Machine learning needs a training set, and one thing which has contributed to waifu2x’s success with anime is that it has been trained only on examples of anime, leaving it well practised in improving this type of image. Christopher presents the results of his tests comparing standard bilinear and bicubic scaling with waifu2x, showing the VMAF, PSNR and SSIM scores.
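Of the three metrics mentioned, PSNR is simple enough to sketch in a few lines. The following is a minimal illustration, assuming 8-bit greyscale frames held as lists of rows; the function names and sample values are made up for the example and are not Christopher's test data. VMAF and SSIM are considerably more involved and are normally computed with existing tools rather than by hand.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized 2D grids."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    error = mse(reference, test)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)


# Hypothetical 2x2 'frames': the original and a slightly distorted copy.
original = [[100, 100], [100, 100]]
distorted = [[110, 90], [100, 100]]
score = psnr(original, distorted)
```

To score an upscaler in this way, a full-resolution source would be downscaled, upscaled again with the scaler under test, and compared against the original frame by frame.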
Finishing off the video, Christopher talks about the time waifu2x takes to run and the cost of running it in the cloud, and he shares some of the command lines he used.
If we ever had a time when most displays were the same resolution, those days are long gone, with smartphones and tablets with extremely high pixel density nestled in with laptop screens of various resolutions and 1080-line TVs which are gradually being replaced with UHD variants. This means that HD videos are nearly always being upscaled, which makes ‘getting upscaling right’ a really worthwhile topic. The well-known basic up/downscaling algorithms have been around for a while, and even the best-performing, Lanczos, is well over 20 years old. The ‘new kid on the block’ isn’t another algorithm; it’s a whole technique of inferring better upscaling using machine learning, called ‘super resolution’.
Nick Chadwick from Mux has been running the code and the numbers to see how well super resolution works. Taking to the stage at Demuxed SF, he starts by looking at where scaling is used and what type it is. The most common algorithms are nearest neighbour, bilinear, bicubic and Lanczos, with nearest neighbour being the most basic and the worst performing. Using VMAF, Nick shows that the traditional opinions of how well these algorithms perform for up- and downscaling are valid. He then introduces some test videos which are designed to let you see whether your video path is using bilinear or bicubic upscaling, presenting his results of when bicubic can be seen (Safari on a MacBook Pro) as opposed to bilinear (Chrome on a MacBook Pro). The test videos are available here.
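For readers who haven't met Lanczos before, the sketch below shows the standard Lanczos window and a one-dimensional resampler built on it, in plain Python. This is an illustration under stated assumptions, not code from the talk: `resample_1d`, the choice of three lobes (`a=3`), the edge clamping and the weight normalisation are all choices made for the example.

```python
import math


def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, otherwise 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resample_1d(samples, factor, a=3):
    """Upscale a 1D signal by an integer factor using Lanczos weights."""
    n = len(samples)
    out = []
    for i in range(n * factor):
        # Position of this output sample in source coordinates.
        centre = (i + 0.5) / factor - 0.5
        first = math.floor(centre) - a + 1
        total = 0.0
        weight_sum = 0.0
        for k in range(first, first + 2 * a):
            w = lanczos_kernel(centre - k, a)
            src = samples[min(max(k, 0), n - 1)]  # clamp at the edges
            total += w * src
            weight_sum += w
        # Normalise so that flat areas stay exactly flat.
        out.append(total / weight_sum)
    return out


flat = resample_1d([5.0, 5.0, 5.0, 5.0], 2)
step = resample_1d([0, 0, 0, 255, 255, 255], 2)
```

The kernel's negative lobes are what give Lanczos its sharper look compared with bilinear, but they also mean the output can overshoot next to a hard edge: in the `step` example some values dip below zero, so a practical scaler would clip the result to the valid pixel range.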
In the next part of the talk, Nick digs a little deeper into how super resolution works and how he tested ffmpeg’s implementation of super resolution. Though he hit some difficulties in using this young filter, he is able to present some videos and shows that they are, indeed, “better to view”, meaning that the text looks sharper and is easier to read, with details being easier to pick out. It’s certainly possible to see some extra speckling introduced by the process, but the VMAF score is around 10 points higher, matching the subjective experience.
The downsides are a very significant increase in computational power needed which limits its use in live applications plus there is a need for good, if not very good, understanding of ML concepts and coding. And, of course, it wouldn’t be the online streaming community if clients weren’t already being developed to do super-resolution on the decode despite most devices not being practically capable of it. So Nick finishes off his talk discussing what’s in progress and papers relating to the implementation of super resolution and what it can borrow from other developing technologies.