Video: Scaling up Anime with Machine Learning and Smart Real Time Algorithms

For too long, video has been dominated by natural scenes, and compression has been about optimising for skin tones. Recently we have seen technologies address other types of video, such as computer screen content and games in VVC, and, as we explore in this talk, animation-specific optimisation for upscalers.

Anime, a Japanese style of animation, is objectively not very different from most other cartoons: the drawing style is black lines over relatively simple, solid areas of colour. Anime itself is a clearly distinct genre whose fans are particularly sensitive to quality, but for codecs and scalers, 2D animation in general is a style that easily shows artefacts.

Up- and down-scaling is the process of taking an image of, say, 1920×1080 pixels and making it larger, for instance 3840×2160, or smaller, say SD resolution. Achieving this without jagged edges or blurriness is difficult; conventional maths can do a decent job but often leaves something to be desired. Christopher Kennedy from Crunchyroll explains the testing he’s done on a super-resolution upscaling technique which uses machine learning to improve the quality of upscaled anime video.
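As a concrete illustration of the conventional approach (my sketch, not from the talk), here’s how a frame might be upscaled with two traditional filters using Python’s Pillow library; the filenames are placeholders:

```python
from PIL import Image  # pip install Pillow

# Open a 1920x1080 frame and upscale it to 3840x2160 (2x) with two
# conventional filters. Bilinear is fast but soft; Lanczos is sharper
# but slower -- the speed/quality trade-off discussed in the talk.
frame = Image.open("frame_1080p.png")

upscaled_bilinear = frame.resize((3840, 2160), resample=Image.BILINEAR)
upscaled_lanczos = frame.resize((3840, 2160), resample=Image.LANCZOS)

upscaled_bilinear.save("frame_2160p_bilinear.png")
upscaled_lanczos.save("frame_2160p_lanczos.png")
```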

Waifu2x is an open-source algorithm which uses Convolutional Neural Networks (CNNs) to scale images and remove artefacts. To start with, Christopher explains the background of traditional algorithmic upscaling, discussing the fact that better-looking algorithms take longer to run, so TVs often choose the fastest, which can look pretty bad when fed SD video. It’s better for the streaming provider to spend the time upconverting to 4K, allowing the viewer a better final quality on their set.
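For orientation (these are not Christopher’s command lines, which he shares in the talk), a typical invocation of one common port, waifu2x-ncnn-vulkan, looks something like this; exact flags vary between waifu2x implementations:

```
# 2x upscale with moderate denoising on a single extracted frame
# (flags as per the waifu2x-ncnn-vulkan project; an assumption here,
# check your port's documentation)
waifu2x-ncnn-vulkan -i frame_1080p.png -o frame_2160p.png -n 2 -s 2
```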

Machine learning needs a training set, and one thing which has contributed to waifu2x’s success with anime is that it has been trained only on examples of anime, leaving it well practised at improving this type of image. Christopher presents the results of his tests comparing standard bilinear and bicubic scaling with waifu2x, showing the VMAF, PSNR and SSIM scores.
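For reference (not necessarily Christopher’s exact methodology), these metrics can be computed with ffmpeg’s quality filters, where the first input is the scaled result under test and the second is the reference:

```
# VMAF (requires an ffmpeg build with libvmaf)
ffmpeg -i upscaled.mp4 -i reference.mp4 -lavfi "[0:v][1:v]libvmaf" -f null -

# PSNR and SSIM
ffmpeg -i upscaled.mp4 -i reference.mp4 -lavfi "[0:v][1:v]psnr" -f null -
ffmpeg -i upscaled.mp4 -i reference.mp4 -lavfi "[0:v][1:v]ssim" -f null -
```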

Finishing off the video, Christopher talks about the time waifu2x takes to run and the cost of running it in the cloud, and he shares some of the command lines he used.

Watch now!
Speaker

Christopher Kennedy
Staff Video Engineer,
Crunchyroll

Video: International IP Production Networks


Optical Transport Network (OTN) is a telco-grade technology which simplifies the transport of high-bandwidth data such as uncompressed video. Taking the place of SDH and Ethernet as the transport layer, OTN is an ITU-created recommendation, G.709, which dates back to 2009. With OTN, transport and decoding of multiple signals are simplified, with the ability to carry many different data types, including SDH and Ethernet, as clients.

Telstra’s Steven Dargham joins the VSF’s summer sessions to explain why Telstra has created an international network for live broadcast production based on OTN, and to discuss some case studies. Using SMPTE ST 2110-20 and -22, Telstra has seen that remote production can be done without so much equipment at the game.

Steven takes some time to outline the latency-bandwidth-quality triangle, where improving one will always come at the expense of one or both of the others. Understanding this balance and compromise informs the choice of video codec, such as TICO, VC-2 or JPEG XS. Steven talks through a table showing the pros and cons of the codecs available to choose from.
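To put rough numbers on the bandwidth corner of that triangle (my back-of-the-envelope figures, not Telstra’s), a quick calculation shows why a light mezzanine codec matters on international circuits:

```python
# Active video bandwidth for 1080p59.94, 10-bit 4:2:2 (ST 2110-20 style,
# active pixels only -- blanking would add a little more).
width, height = 1920, 1080
fps = 60000 / 1001            # 59.94
bits_per_pixel = 20           # 4:2:2 at 10 bits per sample

uncompressed_bps = width * height * bits_per_pixel * fps
print(f"Uncompressed: {uncompressed_bps / 1e9:.2f} Gbps")   # ~2.49 Gbps

# A lightly-compressed mezzanine codec such as JPEG XS at roughly 10:1
compressed_bps = uncompressed_bps / 10
print(f"JPEG XS @ ~10:1: {compressed_bps / 1e6:.0f} Mbps")  # ~249 Mbps
```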

The video ends with Steven talking us through case studies, including linking Japan and the UK and Telstra’s work for the IAAF Athletics.

Watch now!
Speaker

Steven Dargham
Telstra

Video: Low Latency Live Streaming At Scale

Low latency can be a differentiator for a live streaming service, or just a way to ensure you’re not beaten to the punch by social media or broadcast TV. Either way, it’s seen as increasingly important for live streaming to be punctual, breaking from the past where latencies of thirty to sixty seconds were not uncommon. As the industry has matured and connectivity has enough capacity for video, simply getting motion on the screen isn’t enough anymore.

Steve Heffernan from MUX takes us through the thinking on how we can deliver low-latency video, both into the cloud and out to the viewers. He starts by talking about the use cases for sub-second latency – anything with interaction/conversations – and how that’s different from low-latency streaming, which is one-to-many, potentially very large scale distribution. If you’re on a video call with ten people, then you need sub-second latency or the conversation will suffer. But when distributing to thousands or millions of people, the rebuffering risk that comes with operating sub-second isn’t worth it, and around three seconds is usually perfectly fine.

Steve talks through the low-latency delivery chain, starting with the camera and encoder, then looking at the contribution protocol. RTMP is still often the only option, but increasingly it’s possible to use WebRTC or SRT, the latter usually being the best for streaming contribution. Once the video has hit the streaming infrastructure, be that in the cloud or otherwise, it’s time to look at how to build the manifest and send the video out. Steve talks us through the options: Low-Latency HLS (LHLS), low-latency CMAF for DASH, and Apple’s LL-HLS. Do note that since the talk, Apple has removed the requirement for HTTP/2 push.
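To make the LL-HLS approach concrete (an illustrative excerpt rather than one from the talk; the URIs are placeholders), a low-latency media playlist advertises sub-segment ‘parts’ and hints at the next one before it exists, so the player can request it immediately:

```
#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:4
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0
#EXT-X-PART-INF:PART-TARGET=0.333
#EXT-X-MEDIA-SEQUENCE:100

#EXTINF:4.0,
segment100.mp4
#EXT-X-PART:DURATION=0.333,URI="segment101.part0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.333,URI="segment101.part1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="segment101.part2.mp4"
```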

The talk finishes off with Steve looking at the players. If you don’t get the player logic right, you can start off much farther behind live than necessary. This is becoming less of a problem now as players are starting to ‘bend time’, speeding up and slowing down playback to bring their latency within a certain target range. But this only underlines the importance of the quality of your player implementation.
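As a minimal sketch of that ‘bend time’ logic (my illustration, assuming a player that exposes its current live latency and lets you set the playback rate):

```python
def catch_up_rate(latency_s: float,
                  target_s: float = 3.0,
                  tolerance_s: float = 0.5) -> float:
    """Return a playback rate that nudges live latency towards the target.

    Too far behind -> play slightly fast; too close to the live edge
    -> play slightly slow; otherwise play at normal speed. Small rate
    changes like these are generally imperceptible to viewers.
    """
    if latency_s > target_s + tolerance_s:
        return 1.05   # 5% fast to claw back latency
    if latency_s < target_s - tolerance_s:
        return 0.95   # 5% slow to rebuild buffer
    return 1.0

# A player loop would apply this each tick, e.g.:
# video.playbackRate = catch_up_rate(measured_latency)
print(catch_up_rate(4.2))  # 1.05 -- we're 1.2s behind target, so speed up
```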

Watch now!
Speaker

Steve Heffernan
Founder & Head of Product, MUX
Creator of video.js

Video: Migrating to IP – Top Questions from Broadcasters


Moving to IP can be difficult. For some, it’s about knowing where to even start. For others, it’s a matter of understanding some of the details, which is the purpose of this talk from Leader US, looking at the top questions Leader’s heard from its customer base:

  • How do we look at it?
  • How do we test it?
  • How is the data sent?
  • What is PTP?
  • How do we control it?
  • What is NMOS?
  • What are the standards involved?

These questions, and more, are covered in this webinar.

Steve Holmes from Leader US details the relevant IP basics, starting with the motivations: weight, cost, scale, density and independent essences. He then moves on to the next questions, covering RTP itself and how SMPTE ST 2022-6 was built upon it. ST 2022-6 splits a regular SDI signal into sections and encapsulates them, uncompressed. This is one big difference from SMPTE ST 2110, where all essences are sent separately. For some workflows, this separation is not a benefit: it can be tricky to keep the essences in alignment, and where the job is to deliver an incoming bundle of PIDs as-is, being able to separate them is a backward step.
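For a sense of scale (my arithmetic, assuming ST 2022-6’s fixed media payload of 1,376 bytes per RTP packet), carrying HD-SDI this way implies a considerable packet rate:

```python
# SMPTE ST 2022-6 wraps the whole 1.485 Gbps HD-SDI signal, blanking
# and all, into RTP packets each carrying 1,376 bytes of media payload.
sdi_bps = 1.485e9
payload_bits = 1376 * 8

packets_per_second = sdi_bps / payload_bits
print(f"{packets_per_second:,.0f} packets/s")  # ~134,900 packets/s per stream
```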

With this groundwork laid, Steve explains how seamless redundancy works with SMPTE ST 2022-7, going on to describe the difficulty of keeping jitter low and the importance of sender profiles in ST 2110. Steve finishes this section with a discussion of NMOS specifications such as IS-05 and IS-06. The session ends with a Q&A.
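The core of ST 2022-7 seamless protection is simple to sketch (a minimal illustration, not a full implementation, which would also handle sequence-number wrap and reorder buffering): send identical RTP streams down two paths and, at the receiver, forward the first copy of each sequence number to arrive:

```python
# Minimal ST 2022-7-style merge: two redundant RTP streams with the same
# sequence numbers arrive over different network paths; the receiver
# reduces them to one clean stream by keeping the first-arriving copy.
seen: set[int] = set()

def merge(seq: int, payload: bytes, output: list) -> None:
    if seq not in seen:           # duplicate already seen from the other path?
        seen.add(seq)
        output.append(payload)    # forward the first copy only

out: list = []
for seq, data in [(1, b"a"), (1, b"a"), (2, b"b"), (3, b"c"), (2, b"b")]:
    merge(seq, data, out)
print(out)  # [b'a', b'b', b'c'] -- loss on either single path is hidden
```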

Watch now!
Speaker

Steve Holmes
Freelance consultant