Video: Benjamin Bross and Adam Wieckowski on Fraunhofer HHI, VVC, and Compression

VVC was finalised in mid-2020 after five years of work. AVC is still going strong and is on its 26th version, so it’s clear there’s still plenty of work ahead for those involved in VVC. Heavily involved in AVC, HEVC and now VVC is the Fraunhofer Heinrich Hertz Institute (HHI), which holds patents in all three and which, for VVC, is for the first time developing a free, open-source encoder and decoder for the standard.

In this video from OTTVerse.com, Editor Krishna Rao speaks to Benjamin Bross and Adam Więckowski, both from Fraunhofer HHI. Benjamin has previously been featured on The Broadcast Knowledge talking about VVC at Mile High Video. That talk was given before the codec’s release and is a great video to check out if you’re not familiar with VVC.

They start by discussing how the institute is supported by the German government, by income from its patents and similar work, and by the companies it carries out research for. One benefit of government involvement is that all the papers they produce are free to access. Their funding model also gives them the ability to research problems very deeply, which has a number of benefits. Benjamin points to their research into CABAC, a very efficient but complex entropy-coding technique. At the time they supported introducing it into AVC, which remember is 19 years old, it was very hard to find equipment that could use it, and certainly no computers could. Fast forward to today and phones, computers and pretty much all encoders are able to take advantage of this technique to keep bitrates down, so that ability to look far ahead pays off now. Secondly, giving an example from VVC, Benjamin explains they looked at using machine learning to help optimise one of the tools. This proved too difficult to implement directly, but the trained model could be replaced by a matrix multiplication and was implemented that way. This matrix multiplication, he emphasises, couldn’t have been developed without first going into the depths of that complex machine learning.
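To make that last point concrete, here is a toy Python sketch of the pattern Benjamin describes: a trained model is distilled down to a single matrix and offset, so at runtime the prediction is nothing more than a matrix-vector product. The dimensions and weights below are placeholders for illustration, not the normative VVC tables.

```python
import numpy as np

# Toy illustration of matrix-based prediction distilled from a trained
# model: the whole "network" at runtime is one matrix A and offset b.
# The weights are random placeholders, not the normative VVC tables.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 8)) * 0.1   # learned weights (placeholder)
b = np.full(16, 128.0)                   # learned offsets (placeholder)

def predict_block(left, top):
    """Predict a 4x4 block from 4 left + 4 top reference samples."""
    refs = np.concatenate([left, top])   # 8 boundary samples
    pred = A @ refs + b                  # a single matrix multiplication
    return pred.clip(0, 255).reshape(4, 4)

left = np.array([100, 102, 101, 99], dtype=float)
top = np.array([98, 97, 99, 100], dtype=float)
print(predict_block(left, top))
```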

Krishna suggests there must be a lot of ‘push back’ from chip manufacturers, which Benjamin acknowledges, though he says people are just doing their jobs. It’s vitally important, he continues, for chip manufacturers to keep chip costs down or nothing would actually end up in real products. Whilst he says discussions can get quite heated, the point of the international standardisation process is to get input at the beginning from all the industries so that the outcome is an efficient, implementable standard. Only by achieving that does everyone benefit for years to come.

The conversation then moves on to the open-source initiative developing VVenC and VVdeC. These are separate from the reference implementation, VTM, although the reference software has been used as the base for development. Adam and Benjamin explain that the idea of creating these free implementations is to create standard software which any company can take and use in its own product. Reference implementations are not optimised for speed, unlike VVenC and VVdeC. Fraunhofer expects people to take this software and adapt it, for say 360-degree video, to suit their product. This is similar to x264 and x265, which are open-source implementations of AVC and HEVC. Public participation is welcomed and has already been seen within the GitHub project.
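As a flavour of how VVenC can be picked up and used, here is a minimal sketch driving its simple command-line app from Python. The binary name and flags follow the project’s README at the time of writing, but options can change between releases, so check your installed version.

```python
import subprocess

# Minimal sketch: encode raw YUV with VVenC's simple command-line app.
# Flags follow the VVenC README at the time of writing; verify against
# your installed version, as options can change between releases.
subprocess.run([
    "vvencapp",
    "-i", "input_1920x1080_50fps.yuv",   # raw 8-bit 4:2:0 input (example name)
    "-s", "1920x1080",                   # frame size
    "-r", "50",                          # frame rate
    "--preset", "medium",                # speed/efficiency trade-off
    "-o", "output.266",                  # raw VVC bitstream
], check=True)
```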

Adam talks through a slide showing how newer versions of VVenC have increased speed and reduced bitrate, with more versions on their way. They discuss how some VVC features can’t really be seen in normal RD plots, giving the example of open vs closed GOP encoding. Open GOP encoding hasn’t previously been usable for ABR streaming, but with VVC that’s now a possibility, and whilst it’s early days, with few having put the new type of keyframes which enable this through their paces, they expect to start seeing good results.

The conversation then moves on to encoding complexity and the potential to use video pre-processing to help the encoder. Benjamin points out that whilst there is an increase in encode time to reach the latest, lowest bitrates, matching the best HEVC can achieve is actually quicker. Looking to the future, he says that some encoding tools scale linearly and some exponentially. He hopes to use machine learning to understand the video and narrow down the ‘search space’ for certain tools, as it’s the search space that is growing exponentially. If you can narrow that search significantly, using these techniques becomes practical. Lastly, they say the hope is to get VVenC and VVdeC into FFmpeg, at which point a whole suite of powerful pre- and post-filters becomes available to everyone.
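As a rough illustration of the search-space idea, the sketch below uses an invented, placeholder ‘cheap model’ to rank partitioning candidates so that only a shortlist gets the expensive rate-distortion evaluation. The features and scoring are assumptions for illustration; the point is the shape of the approach.

```python
# Sketch of ML-guided search-space pruning for an encoder decision such
# as block partitioning: score candidates cheaply, then run the costly
# rate-distortion (RD) check only on the most promising few.

def cheap_score(block_features, candidate):
    # Placeholder for a small trained model (or distilled matrix).
    return -abs(block_features["variance"] - candidate["expected_variance"])

def full_rd_cost(candidate):
    # Placeholder for the expensive encode-and-measure RD evaluation.
    return candidate["expected_variance"]  # stand-in cost

def choose_partition(block_features, candidates, keep=3):
    ranked = sorted(candidates,
                    key=lambda c: cheap_score(block_features, c),
                    reverse=True)
    shortlist = ranked[:keep]  # huge search space -> small shortlist
    return min(shortlist, key=full_rd_cost)

candidates = [{"name": f"split_{i}", "expected_variance": v}
              for i, v in enumerate([10, 40, 25, 70, 55, 15])]
best = choose_partition({"variance": 30}, candidates)
print(best["name"])
```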

Watch now!
Full transcript of the video
Speakers

Benjamin Bross
Head of Video Coding Systems Group,
Fraunhofer Heinrich Hertz Institute (HHI)
Adam Więckowski
Research Assistant
Fraunhofer HHI
Moderator: Krishna Rao Vijayanagar
Editor,
OTTVerse.com

Video: Machine Learning for Per-title Encoding

AI continues its march into streaming with this new approach to optimising encoder settings to keep down the bitrate and improve quality for viewers. By its more appropriate name, ‘machine learning’, computers learn how to characterise video, avoiding hundreds of encodes whilst determining the best way to encode video assets.

Daniel Silhavy from Fraunhofer FOKUS takes the stand at Mile High Video 2020 to detail the latest techniques in per-title and per-scene encoding. Daniel starts by outlining the problem with fixed ABR ladders: efficiencies are gained by being flexible with both resolution and bitrate, which a fixed ladder can’t do.

Netflix were the best-known pioneers of the per-title encoding idea where, for each video asset, many, many encodes are done to determine the best overall bitrates to choose. This is great because it allows animated content to be treated differently from action films or sports, and efficiency is gained.

However, per-title delivers an average benefit. There are still parts of the video which are simple and could see reduced bitrate, and parts whose complexity isn’t accounted for. When the bitrate is higher than necessary to achieve a certain VMAF score, Daniel calls this ‘wasted quality’. This means bitrate was used making the quality better than it needed to be. Whilst better quality sounds like a boon, it’s not always possible for it to be seen, hence targeting VMAF at a lower level.
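A tiny worked example of ‘wasted quality’, using made-up numbers: any rung whose measured VMAF overshoots the target has spent bitrate on quality viewers are unlikely to perceive.

```python
# Worked example of 'wasted quality' with illustrative numbers: rungs
# whose VMAF exceeds the target have spent bitrate the viewer can't see.
TARGET_VMAF = 93

ladder = [  # (bitrate kbps, measured VMAF) - example values only
    (1500, 88.0),
    (3000, 94.5),   # overshoots the target: some bitrate is "wasted"
    (6000, 97.8),   # overshoots further
]

for bitrate, vmaf in ladder:
    status = "wasted quality" if vmaf > TARGET_VMAF else "under target"
    print(f"{bitrate} kbps -> VMAF {vmaf} ({status})")
```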

Naturally, rather than varying the resolution mix and bitrate for each file, it would be better to do it for each scene. Working this way, variations in complexity can be quickly accounted for. This can also be done without machine learning, but more encodes are needed. The rest of the talk looks at using machine learning to take a short-cut through some of that complexity.

The standard workflow is to perform a complexity analysis on the video, working out a VMAF score at various bitrate and resolution combinations. This produces a ‘convex hull estimation’, allowing determination of the best parameters, which then feed into the production encoding stage.
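Here is a minimal sketch of that brute-force hull step, assuming the trial encodes have already been made and scored with VMAF (the numbers are illustrative placeholders): for each bitrate, keep the resolution that maximises quality.

```python
# Sketch of convex-hull estimation from trial encodes. Assumes each
# (resolution, bitrate) combination has been encoded and scored with
# VMAF; the figures below are illustrative placeholders.

measurements = [  # (height, bitrate kbps, VMAF)
    (540, 1500, 86.0), (720, 1500, 84.0), (1080, 1500, 78.0),
    (540, 3000, 89.0), (720, 3000, 91.0), (1080, 3000, 88.5),
    (540, 6000, 90.0), (720, 6000, 93.0), (1080, 6000, 95.5),
]

hull = {}
for height, bitrate, vmaf in measurements:
    # For each bitrate, keep the resolution that maximises quality.
    if bitrate not in hull or vmaf > hull[bitrate][1]:
        hull[bitrate] = (height, vmaf)

for bitrate, (height, vmaf) in sorted(hull.items()):
    print(f"{bitrate} kbps -> {height}p (VMAF {vmaf})")
```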

Machine learning can replace the section which predicts the best bitrate-resolution pairs. Fed with some details on complexity, it can avoid multiple encodes and deliver a list of parameters to the encoding stage. Moreover, it can also receive feedback from the player allowing further optimisation of this prediction module.
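The talk doesn’t specify the model or features used, but a sketch of the shape of such a prediction module might look like the following, assuming hand-picked complexity features and scikit-learn; the training data is invented for illustration.

```python
# Sketch of the ML prediction module: instead of trial-encoding every
# bitrate-resolution pair, a regressor trained on past titles predicts
# VMAF from cheap complexity features. Features, data and model choice
# are assumptions for illustration, not Fraunhofer's actual system.
from sklearn.ensemble import RandomForestRegressor

# Rows: [spatial_complexity, temporal_complexity, bitrate_kbps, height]
X_train = [
    [0.2, 0.1, 1500, 540], [0.2, 0.1, 3000, 720], [0.2, 0.1, 6000, 1080],
    [0.8, 0.9, 1500, 540], [0.8, 0.9, 3000, 720], [0.8, 0.9, 6000, 1080],
]
y_train = [90.0, 94.0, 97.0, 78.0, 85.0, 90.0]  # measured VMAF (illustrative)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Predict the ladder for a new, fairly complex scene without trial encodes.
for bitrate, height in [(1500, 540), (3000, 720), (6000, 1080)]:
    vmaf = model.predict([[0.7, 0.8, bitrate, height]])[0]
    print(f"{bitrate} kbps @ {height}p -> predicted VMAF {vmaf:.1f}")
```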

Daniel shows a demo of this working where we see that the end result has fewer rungs on the ABR ladder, a lower-resolution top rung and fewer resolutions in general, some repeated at different bitrates. This is in line with the findings of Facebook, covered here last week, who found that removing their ‘one bitrate per resolution’ rule improved viewers’ experience. In total, for an example Fraunhofer received from a customer, they saw a 53% reduction in the storage needed.

Watch now!
Download the slides
Speakers

Daniel Silhavy
Scientist & Project Manager,
Fraunhofer FOKUS

On-Demand Webinar: How to Prove Value with AI and Machine Learning

This webinar is now available online.

We’ve seen AI entering our lives in many ways over the past few years and we know that this will continue. Artificial Intelligence and Machine Learning are techniques so widely applicable that they will touch all aspects of our lives before too many more years have passed. So it’s natural for us to look at the broadcast industry and ask “How will AI help us?” We’ve already seen machine learning entering codecs and video processing, showing that up/downscaling can be done better by machine learning than with traditional ‘static’ algorithms such as bicubic, Lanczos and nearest neighbour. This webinar examines the other side of things: how can we use the data available within our supply chains and from our viewers to drive efficiencies and opportunities for better monetisation?

There isn’t a strong consensus on the difference between AI and machine learning. One view is that Artificial Intelligence is a broader term for smart computing. Others say that AI has a more real-time feedback mechanism compared to Machine Learning (ML). ML is the process of giving a large set of data to a computer, along with some basic abilities, so that it can learn for itself. A great example of this is the AI network-monitoring services available that look at all the traffic flowing through your organisation and learn how people use it. They can then look for unusual activity and alert you. Doing this without fixed thresholds (which for network use really wouldn’t work) is simply not feasible for humans, but computers are up to the task.
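As a sketch of that kind of threshold-free monitoring, the example below learns what ‘normal’ traffic looks like and flags outliers using scikit-learn’s IsolationForest; the features and data are invented for illustration.

```python
# Sketch of threshold-free anomaly detection: learn normal traffic,
# then flag outliers. Features and data are invented for illustration.
from sklearn.ensemble import IsolationForest
import numpy as np

rng = np.random.default_rng(1)
# Normal traffic: [bytes per minute, distinct destinations per minute]
normal = rng.normal(loc=[5000, 10], scale=[500, 2], size=(500, 2))

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

new_samples = np.array([
    [5100, 11],      # looks like business as usual
    [90000, 400],    # sudden exfiltration-like burst
])
print(detector.predict(new_samples))  # 1 = normal, -1 = flagged as unusual
```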

For conversations such as this, it usually doesn’t matter how the computer achieves it: AI, ML or otherwise. The points are: how can you simplify content production? How can you get better insights into the data you have? How can you speed up manual tasks?

David Short from IET Media moderates this session with Steve Callanan, whose company WIREWAX is working to revolutionise video creation, asset management and interactive video services, joined by Hanna Lukashevich from Fraunhofer IDMT (Institute for Digital Media Technology), who uses machine learning to understand and create music and sound. Grant Franklin Totten completes the panel with his experience at Al Jazeera, who have been working on using AI in broadcast since 2018 as a way to help maintain editorial and creative compliance as well as detecting fake news and checking for bias.

Watch now!
Speakers

Moderator: David Short
Vice Chair,
IET Media Technical Network
Steve Callanan
Founder,
WIREWAX
Hanna Lukashevich
Head of Semantic Music Technologies,
Fraunhofer IDMT
Grant Franklin Totten
Head of Media & Emerging Platforms,
Al Jazeera Media Network

Video: Low Latency Streaming

There are two phases to reducing streaming latency. One is to optimise the system you already have; the other is to move to a new protocol. This talk looks at both approaches: achieving parity with traditional broadcast media through optimisation, and ‘better than’ by using CMAF.

In this video from the Northern Waves 2019 conference, Koen van Benschop from Deutsche Telekom examines the large and low-cost latency savings you can achieve by optimising your current HLS delivery. With the original segment duration recommended by Apple being 10 seconds, there are still many services out there starting from a very high latency, so there are savings to be had.

Koen explains how the total latency is made up by looking at the decode, encode, packaging and other latencies. We quickly see that the player buffer is one of the largest contributors, the second being the encode latency. We explore the pros and cons of reducing these and see that the overall latency can fall to, or even below, traditional broadcast latency depending, of course, on which type (and which country’s) broadcast you are comparing it to.
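To see why the player buffer dominates, here is an illustrative latency budget with example figures (not Koen’s exact numbers): with 10-second segments and a three-segment buffer, the buffer dwarfs everything else.

```python
# Illustrative glass-to-glass latency budget for segmented HLS; the
# figures are examples, not the talk's exact numbers. With 10s segments
# and a three-segment player buffer, the buffer dominates everything.
components = {
    "capture + encode": 4.0,
    "packaging": 1.0,
    "CDN / network": 1.0,
    "player buffer (3 x 10s segments)": 30.0,
    "decode + render": 0.5,
}
print(f"total: {sum(components.values()):.1f}s")  # ~36.5s

# Shrinking segments to 2s cuts the buffer to 6s, for a total near
# 12.5s: the kind of optimisation the talk walks through.
```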

While optimising HLS/DASH gets you down to a few seconds, there’s a strong desire for some services to beat that. Whilst the broadcasters themselves may be reluctant to do this, not wanting to deliver online services quicker than their over-the-air offerings, online sports services such as DAZN can make low latency a USP and deliver better value to fans. After all, DAZN and similar services benefit from low-second latency as it brings them in line with social media, which can be very quick to report key events such as goals and points scored in live matches.

Stefan Arbanowski from Fraunhofer leads us through CMAF, covering what it is, the upcoming second edition and how it works. He covers its ability to work with both .m3u8 (HLS) and .mpd (DASH) playlist/manifest files thanks to its fMP4 segments built on the ISO BMFF. One benefit taken from DASH is the Common Encryption standard, through which it can work with PlayReady, FairPlay and other DRM systems.
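As an example of producing CMAF-style fMP4 segments that an HLS playlist can reference, here is a sketch using FFmpeg’s hls muxer; the flags follow the FFmpeg documentation at the time of writing and may differ between builds, and a DASH .mpd over the same segments would typically come from a packager or FFmpeg’s dash muxer.

```python
import subprocess

# Sketch: repackage an existing encode as fMP4 (ISO BMFF) segments with
# an HLS playlist via FFmpeg's hls muxer. Check the flags against your
# FFmpeg build; they can differ between versions.
subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    "-c", "copy",                     # already encoded; just repackage
    "-f", "hls",
    "-hls_segment_type", "fmp4",      # fragmented MP4 (ISO BMFF) segments
    "-hls_time", "4",                 # target segment duration in seconds
    "out.m3u8",
], check=True)
```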

Stefan then takes a moment to consider WebRTC. Given it proposes latency of less than one second, it can sound like a much better idea. Stefan outlines concerns he has about the ability to scale above 200,000 users. He then turns his attention back to CMAF and outlines how the stream is composed and how the player logic works in order to successfully play at low latency.
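The essence of that player logic is fetching a segment that is still being produced and feeding chunks to the decoder as they arrive over HTTP chunked transfer, rather than waiting for the whole segment. A simplified sketch, with the URL and chunk handling invented for illustration:

```python
import requests

# Sketch of low-latency CMAF fetching: request a segment that is still
# being written and hand data to the decoder as it arrives, instead of
# waiting for the complete segment. URL and handling are illustrative.
resp = requests.get("https://example.com/live/segment_0001.m4s", stream=True)

for chunk in resp.iter_content(chunk_size=None):  # yield data as it arrives
    # A real player would parse each CMAF chunk (moof+mdat pair) and
    # append it to the decode buffer immediately.
    print(f"received {len(chunk)} bytes; feeding decoder")
```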

Watch now!
Speakers

Koen van Benschop
Senior Manager TV Headend and DRM,
Deutsche Telekom
Stefan Arbanowski
Director Future Applications and Media,
Fraunhofer FOKUS