Video: Reliable and Uncompressed Video on AWS

Uncompressed video in the cloud is an answer to the dreams that many people are yet to have, but the early adopters of cloud workflows, those that are really embedding the cloud into their production and playout efforts, are already asking for it. AWS have developed a way of delivering this between computers within their infrastructure and have invited a vendor to explain how they are able to get this high-bandwidth content in and out.

On The Broadcast Knowledge we don’t normally feature such vendor-specific talks, but AWS is usually the sole exception to the rule as what’s done in AWS is typically highly informative to many other cloud providers. In this case, AWS is first to the market with an in-prem, high-bitrate video transfer technology which is in itself highly interesting.

LTN’s Alan Young is first to speak, describing traditional broadcast workflows with the example of a stadium feeding the broadcaster’s building, which then sends the transmission feeds by satellite or dedicated links to the transmission and streaming systems, which are often located elsewhere. LTN feel this robs the broadcaster of flexibility and of the cost savings available from lower-cost internet links. The hybrid that he sees working in the medium term is feeding the cloud directly from the broadcaster. This allows production workflows to take place in the cloud. After this has happened, the video can either come back to the broadcaster before on-pass to transmission or go directly to one or more of the transmission systems. Alan’s view is that the interconnecting network between the broadcaster and the cloud needs to be reliable, high quality, low-latency and able to handle any bandwidth of signal – even uncompressed.

Once in the cloud, AWS Cloud Digital Interface (CDI) is what allows video to travel reliably from one computer to another. Andy Kane explains what the drivers were to create this product. With the mantra that ‘gigabits are the new megabits’, they looked at how they could move high-bandwidth signals around AWS reliably with the aim of abstracting the difficulty of infrastructure away from the workflow. The driver for uncompressed in the cloud is reducing re-encoding stages since each of them hits latency hard and, for professional workflows, we’re trying to keep latency as close to zero as possible. By creating a default interface, the hope is that inter-vendor working through a CDI interface will help interoperability. LTN estimate their network latency to be around 200ms which is already a fifth of a second, so much more latency on top of that is going to creep up to a second quite easily.

David Griggs explains some of the technical detail of CDI. For instance, it has the ability to send data of any format, be that raw packetised video, audio, ancillary data or compressed data, using UDP, multicast between EC2 instances within a placement group. With a target latency of less than one frame, it’s been tested up to UHD 60fps. It is based on the Elastic Fabric Adapter, which is a free option for EC2 instances and uses kernel bypass techniques to speed up and better control network transfers. CPU use scales linearly so where 1080p60 takes 20% of a CPU, UHD would take 80%. Each stream is expected to have its own CPU.
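The linear scaling quoted above can be checked with a little arithmetic. This is a minimal sketch that assumes CPU load is proportional to pixel rate (width x height x frame rate) and uses the talk’s 20% figure for 1080p60 as its reference point. The function name and the proportionality model are illustrative assumptions, not part of any AWS API.

```python
def estimated_cpu_fraction(width, height, fps,
                           ref_width=1920, ref_height=1080, ref_fps=60,
                           ref_cpu_fraction=0.20):
    """Scale a reference CPU fraction linearly with pixel rate.

    Assumption: load is proportional to pixels per second, anchored to
    the 20%-of-a-CPU figure quoted for 1080p60.
    """
    pixel_rate = width * height * fps
    ref_pixel_rate = ref_width * ref_height * ref_fps
    return ref_cpu_fraction * pixel_rate / ref_pixel_rate


# UHD has four times the pixels of 1080p at the same frame rate.
uhd = estimated_cpu_fraction(3840, 2160, 60)
print(f"UHD 60fps: {uhd:.0%} of one CPU")
```

Under this assumption UHD at 60fps comes out at four times the 1080p60 load, which matches the 80% figure given in the talk.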

The video ends with Alan looking at the future where all broadcast functionality can be done in the cloud. For him, it’s an all-virtual future powered by increasingly accessible high-bandwidth internet connectivity coming in at less than the cost of bespoke, direct links. David Griggs adds that this is changing the financing model, moving from a continuing effort to maximise utilisation of purchased assets to a pay-as-you-go model using just the tools you need for each production.

Watch now!
Download the slides
Please note, if you follow the direct link the video featured in this article is the seventh on the linked page.

Speakers

David Griggs
Senior Product Manager,
AWS
Andy Kane
Principal Business Development Manager,
AWS
Alan Young
CTO and Head of Strategy,
LTN Global

Video: IP-based Networks for UHD and HDR Video

If you get given a video signal, would you know what type it was? Life used to be simple: an SD signal would decode in a waveform monitor and you’d see which type it was. Now, with UHD and HDR, this isn’t all the information you need. Arguably this gets easier with IP and is possibly one of the few things that does. This video from AIMS helps to clear up why IP’s the best choice for UHD and HDR.

John Mailhot from Imagine Communications joins Wes Simpson from LearnIPVideo.com to introduce us to the difficulties wrangling with UHD and HDR video. Reflecting on the continued improvement of in-home displays’ ability to show brighter and better pictures as well as the broadcast cameras’ ability to capture much more dynamic range, John’s work at Imagine is focussed on helping broadcasters ensure their infrastructure can enable these high dynamic range experiences. Streaming services have a slightly easier time delivering HDR to end-users as they are in complete control of the distribution chain whereas often in broadcast, particularly with affiliates, there are many points in the chain which need to be HDR/UHD capable.

John starts by looking at how UHD was implemented in the early stages. UHD, being twice the horizontal and twice the vertical resolution of HD is usually seen as 4xHD, but, importantly, John points out that this is true for resolution but, as most HD is 1080i, it also represents a move to 1080p, 3Gbps signals. John’s point is that this is a strain on the infrastructure which was not necessarily tested for initially. Given the UHD signal, initially, was carried by four cables, there is now 4 times the chance of a signal impairment due to cabling.

Square Division Multiplexing (SQD) is the ‘most obvious’ way to carry UHD signals with existing HD infrastructure. The picture is simply cut into four quarters and each quarter is sent down one cable. The benefit here is that it’s easy to see which order the cables need to be connected to the equipment. The downsides included a frame-buffer delay (half a frame) each time the signal was received and difficulty preventing drift between quadrants if they were treated differently by the infrastructure (i.e. there was a non-synced hand-off). Another important problem is that there is no way to know whether an HD feed is one of a UHD set or just a lone 3G signal.
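The quadrant split described above is simple enough to sketch in a few lines. This is an illustrative model only: a frame is a list of rows, and the ordering of the four outputs (top-left, top-right, bottom-left, bottom-right) is an assumption made for the example rather than something taken from the talk.

```python
def square_division_split(frame):
    """Split a frame (a list of rows) into four quadrant images.

    Returns [top_left, top_right, bottom_left, bottom_right]. The
    link ordering is an assumption for illustration.
    """
    height = len(frame)
    width = len(frame[0])
    half_h, half_w = height // 2, width // 2
    top, bottom = frame[:half_h], frame[half_h:]
    return [
        [row[:half_w] for row in top],
        [row[half_w:] for row in top],
        [row[:half_w] for row in bottom],
        [row[half_w:] for row in bottom],
    ]


# Tiny 4x4 "frame" where each sample records its own (x, y) position.
frame = [[(x, y) for x in range(4)] for y in range(4)]
quadrants = square_division_split(frame)
for name, quad in zip(["TL", "TR", "BL", "BR"], quadrants):
    print(name, quad)
```

Each output only ever contains samples from one corner of the picture, so nothing in a single quadrant identifies it as part of a larger UHD set.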

2SI, two-sample interleave, was another method of splitting up the signal which was standardised by SMPTE. This worked by taking a pair of samples and sending them down cable 1, then the next pair down cable 2, the pair of samples under the first pair went down cable 3 then the pair under 2 went down 4. This had the happy benefit that each cable held a complete picture, albeit very crudely downsampled. For monitoring applications, this is a benefit as you can DA one feed and send it to a monitor. Well, that would have been possible except that each signal had to maintain 400ns timing with the others, which meant DAs often broke the timing budget if they reclocked. It did, however, remove the half-field latency burden which SQD carries. The main confounding factor in this mechanism is that looking at the video from any cable on a monitor isn’t enough to understand which of the four feeds you are looking at. Mis-cabling equipment leads to subtle visual errors which are hard to spot and hard to correct.
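The interleave pattern is easier to follow as a mapping from sample position to cable. The sketch below follows the description in this article (pairs alternate between cables 1 and 2 on one line, and between cables 3 and 4 on the line beneath) rather than the text of the SMPTE standard, and the function names are hypothetical.

```python
def two_sample_interleave_link(x, y):
    """Return the cable number (1-4) carrying the sample at column x, line y."""
    pair_index = (x // 2) % 2   # successive pairs alternate along a line
    line_index = y % 2          # successive lines alternate between cable pairs
    return 1 + pair_index + 2 * line_index


def two_sample_interleave_split(frame):
    """Split a frame (a list of rows) into four sub-images keyed by cable."""
    links = {1: [], 2: [], 3: [], 4: []}
    for y, row in enumerate(frame):
        rows_for_line = {link: [] for link in links}
        for x, sample in enumerate(row):
            rows_for_line[two_sample_interleave_link(x, y)].append(sample)
        for link, samples in rows_for_line.items():
            if samples:  # each line only feeds two of the four cables
                links[link].append(samples)
    return links


# 8x4 "frame" where each sample records its own (x, y) position.
frame = [[(x, y) for x in range(8)] for y in range(4)]
links = two_sample_interleave_split(frame)
for link, image in links.items():
    print(link, image)
```

Every cable ends up with samples drawn from across the whole picture, which is why each one shows a complete, if crudely downsampled, image and why the four feeds look so alike on a monitor.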

Enter the VPID, the Video Payload ID. SD SDI didn’t require this, HD often had it, but for UHD it became essential. SMPTE ST 425-5:2019 is the latest document explaining payload ID for UHD. As it’s version five, you should be aware that older equipment may not parse the information correctly, either because of a bug or because it implements an older version of the standard. The VPID carries information such as interlaced/progressive, aspect ratio, transfer characteristics (HLG, SDR etc.), frame rate etc. John talks through some of the common mismatches in interpretation and implementation of VPID.

12G is the obvious baseband solution to the four-wires problem of UHD. Nowadays the cost of a 12G transceiver is only slightly more than that of a 3G one, so 12G is a very reasonable solution for many. It does require careful cabling to ensure the cable is in good condition and not too long. For OB trucks and small projects, 12G can work well. For larger installations, optical connections are needed, one signal per fibre.

The move to IP initially went to SMPTE ST 2022-6, which is a mapping of SDI onto IP. This meant it was still quite restrictive as we were still living within the SDI-described world. 12G was difficult to do. Getting four IP streams correctly aligned, and all switched on time, was also impractical. For UHD, therefore, SMPTE ST 2110 is the natural home. 2110 can support 32K, so UHD fits in well. ST 2110-22 allows use of JPEG XS so if the 9-11Gbps bitrate of UHDp50/60 is too much it can be squeezed down to 1.5Gbps with almost no latency. Being carried as a single video flow removes any switch timing problems and as 2110 doesn’t use VPID, there is much more flexibility to fully describe the signal, allowing future growth. We don’t know what’s to come, but if it’s different shapes of video raster, new colour spaces or extensions needed for IPMX, these are possible.
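The bitrates quoted above can be sanity-checked with a rough calculation. This sketch assumes 4:2:2 sampling at 10 bits, which averages 20 bits per pixel, and counts only the active picture with no packet or header overhead. Those assumptions are not from the talk, and real ST 2110 flows will carry somewhat more than these figures.

```python
def active_video_bitrate_gbps(width, height, fps, bits_per_pixel=20):
    """Approximate active-picture bitrate in Gbps.

    Assumption: 4:2:2 10-bit video averages 20 bits per pixel, and no
    network or packet overhead is included.
    """
    return width * height * fps * bits_per_pixel / 1e9


uhd_p50 = active_video_bitrate_gbps(3840, 2160, 50)
uhd_p60 = active_video_bitrate_gbps(3840, 2160, 60)
# Compression ratio needed to fit UHDp60 into the 1.5Gbps figure quoted above.
ratio_for_1_5_gbps = uhd_p60 / 1.5

print(f"UHDp50: {uhd_p50:.2f} Gbps")
print(f"UHDp60: {uhd_p60:.2f} Gbps")
print(f"Ratio to reach 1.5 Gbps from UHDp60: {ratio_for_1_5_gbps:.1f}:1")
```

On these assumptions UHDp50 comes to roughly 8.3Gbps and UHDp60 to roughly 10Gbps before overhead, so reaching 1.5Gbps needs a compression ratio of under 7:1.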

John finishes his conversation with Wes mentioning two big benefits of moving to IT-based infrastructure. One is the ability to use the free Wireshark or EBU List tools to analyse video. Whilst there are still good reasons to buy test equipment, the fact that many checks can be done without expensive equipment like waveform monitors is good news. The second big benefit is that whilst these standards were being made, the available network infrastructure has moved from 25 to 100 to 400Gbps links with 800Gbps coming in the next year or two. None of these changes has required any change in the standards, unlike with SDI where improvements in signal required improvements in baseband. Rather, the industry is able to take advantage of this new infrastructure with no effort on our part to develop it or modify the standards.

Watch now!
Speakers

John Mailhot
Systems Architect, IP Convergence,
Imagine Communications
Wes Simpson
RIST AG Co-Chair, VSF
President & Founder, LearnIPvideo.com

Video: Low-latency DASH Streaming Using Open Source Tools

Low Latency DASH, also known as LL-DASH, is a modification of MPEG DASH that allows it to operate with close to two seconds’ latency, bringing it down to meet, or beat, standard broadcast signals.

Brightcove’s Bo Zhang starts by outlining the aims and methods of getting there. For instance, he explains, the HTTP 1.1 Chunked Transfer element is key to low-latency streaming as it allows the server to start sending a video segment as it’s being written, not waiting until the file is complete. LL-DASH also has the ability to state an availability window (‘availabilityTimeOffset’).
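To make the chunked transfer point concrete, here is a minimal sketch of HTTP/1.1 chunk framing: each chunk is its length in hexadecimal, CRLF, the payload and another CRLF, and a zero-length chunk ends the body. Because the total length is never declared up front, a server can emit each piece of a segment as soon as the packager produces it. The generator standing in for the packager and its payloads are hypothetical.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: hex length, CRLF, payload, CRLF."""
    return f"{len(data):X}".encode("ascii") + b"\r\n" + data + b"\r\n"


# A zero-length chunk followed by an empty trailer section ends the body.
LAST_CHUNK = b"0\r\n\r\n"


def chunked_body(parts):
    """Yield chunk-framed bytes for each part as it becomes available."""
    for part in parts:
        if part:  # an empty part would look like the terminating chunk
            yield encode_chunk(part)
    yield LAST_CHUNK


def fake_packager_output():
    # Hypothetical stand-in for fragments of a segment still being written.
    yield b"moof+mdat #1"
    yield b"moof+mdat #2"


wire_bytes = b"".join(chunked_body(fake_packager_output()))
print(wire_bytes)
```

In a real server each yielded value would be written to the socket immediately rather than joined, which is what lets the player start downloading a segment before it is complete.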

As LL-MPEG DASH is a living standard, there are updates on the way. Resync points will allow a player to receive a list of places where it can join a stream using SAP types in the ISO-BMFF spec. The server can send a ‘service description’ to the player, which can use the information to adjust its latency. Event messages can now be inserted in the middle of segments.

Bo then moves on to explain that he and the team have set up an experiment to gain experience with LL-DASH and test overall latency. He shows that they decided to stream RTMP out of OBS into a GitHub project called ‘node-gpac-dash’, then to the dash.js player, all between Boston and Seattle. This test runs at 800×600, 30fps with a bitrate of 2.5Mbps and shows results of between 2.5 and 5 seconds depending on the network conditions.

As Bo moves towards the Q&A, he says that low-latency streaming is less scalable because a TCP connection needs to be kept open between the player and the CDN, which is a burden. Another compromise is that smaller chunk sizes in LL-DASH give reduced latency but IO increases, meaning sometimes you may have to increase the chunk sizes (and hence latency of the stream) to allow for better performance. He also adds that adverts are more difficult with low-latency streams due to the short amount of time to request and receive the advertising.

Watch now!
More detail about the experiments in this talk can be found in Bo’s blog post.
Speakers

Bo Zhang
Staff Video System Engineer, Research
Brightcove

Video: AV1 Real-Time Screen Content Coding

As we saw in this week’s AV1 panel, AV1 encoding times have dropped into a practical range and it’s starting to gain traction. One of the key differentiators of the codec, shared only with VVC, is the inclusion by default of tools aimed at encoding screens and computer graphics rather than natural video.

Zoe Liu, CEO of Visionular, talks at RTE2020 about these special abilities of AV1 to encode screen content. The video starts with a refresher on AV1 in general, its arrival on the scene from the Alliance for Open Media and the en/decoder ecosystem around it such as SVT-AV1, which we talked about two days ago, dav1d, rav1e etc. as well as a look at the hardware encoders being readied from the likes of Samsung.

Turning her focus to screen content, Zoe explains that screen content is different for a number of reasons. For content like this presentation, much of the video stays static a lot of the time, then there is a peak as the slide changes. This gives rise to the idea of allowing for variable frame rates but also optimising for the depth of the colour palette. Motion on screens can be smoother and also has more distinct patterns in the form of identical letters. This seems to paint a very specific picture of what screen content is, when we all know that it’s very variable and usually has mixed uses. However, having tools to capture these situations as they arise is critical for the times when it matters and it’s these coding tools that Zoe highlights now.

One common technique is to partition the screen into variable-sized blocks and AV1 brings more partition shapes than in HEVC. Motion compensation has been the mainstay of MPEG encoding for a long time. AV1 also uses motion compensation and for the first time brings in motion vectors which allow for rotation and zooming. Zoe explains the different modes available including compound motion modes of which there are 128.

Capitalising on the repetitive nature screen content can have, Intra Block Copy (IntraBC) is a technique used to copy part of a frame to other parts of the frame. Similar to motion vectors which point to other frames, this helps replication within the frame. This is used as part of the prediction and therefore can be modified before the decode is finished, allowing for small variations. Palette Mode CFL (Chroma from Luma) is a predictor for colour based on the luma signal and some signalling from the encoder.
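The idea behind Intra Block Copy can be shown with a toy example. This is a simplified illustration and not AV1’s actual bitstream behaviour: it assumes the source block has already been decoded, that source and destination don’t overlap, and that any residual is a small grid of per-sample corrections. All names here are hypothetical.

```python
def intra_block_copy(frame, src_x, src_y, dst_x, dst_y, size, residual=None):
    """Predict a size x size block from another block in the same frame.

    The block at (src_x, src_y) is used as the prediction for the block
    at (dst_x, dst_y); an optional residual adds small corrections.
    Assumes the two blocks do not overlap.
    """
    for dy in range(size):
        for dx in range(size):
            predicted = frame[src_y + dy][src_x + dx]
            correction = residual[dy][dx] if residual else 0
            frame[dst_y + dy][dst_x + dx] = predicted + correction
    return frame


# Left half holds an already-decoded 2x2 "glyph"; right half is not yet decoded.
frame = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
]
# Copy the glyph to the right, nudging one sample via the residual.
intra_block_copy(frame, src_x=0, src_y=0, dst_x=2, dst_y=0, size=2,
                 residual=[[0, 0], [0, 1]])
print(frame)
```

The copy acts as a prediction and the residual carries the difference, which is how repeated shapes such as identical letters can be reused while still allowing small variations.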

Zoe highlights two areas where screen content reacts badly to encoding tools that are normally beneficial. Temporal filtering is usually associated with 8% gains in efficiency at the encoder, but it can make motion vectors much more complicated in screen content and hurt compression efficiency. Similarly, when partitioning, smaller block sizes often work well for natural video, but the opposite is true for screen content.

The talk finishes with Zoe explaining how Visionular’s own AV1 implementation performed on standardised 4K against other implementations, their implementation of scalable video coding for RTC and the overall compression improvements.

Zoe Liu also contributed to this more detailed overview

Watch now!
Speaker

Zoe Liu
CEO,
Visionular