You can quickly and easily ‘scale up’ in the cloud, but how? It is seldom as easy as clicking a button, and when you do find a button, chances are it will only help you scale your outputs. But what happens when you need to scale your inputs? What should you consider when creating your scaling architecture in the cloud? Why is scaling down after a peak more difficult than scaling up for one? This webinar highlights what you need to know.
Karel Boek from Raskenlund starts by explaining that, while CDNs allow scaling for delivery to end-users, there are fewer solutions for scaling up your ingest. Even if you’re streaming using WebRTC, which isn’t cacheable via CDNs, there are companies such as NanoCosmos who will scale that delivery for you. For ingest, though, scaling becomes bespoke much more quickly.
There is, Karel explains, the option to outsource the entire operation to AWS. For many this is, on the face of it, ideal, as there’s not much work to be done. However, you may need more customisation than is possible on such a general service and, more importantly, there’s a problem which also affects the second option: creating some of your own workflows but using the cloud to scale.
The problem with cloud autoscalers is that they’re built for HTTP. Karel details how they look at metrics from your servers to determine the point at which to scale. These could be metrics such as the number of HTTP connections, CPU usage or bandwidth. Although Google does allow custom metrics, you may quickly find that a key metric such as GPU load isn’t supported, leaving you to scale without the most important data driving the decision. Worse, when it comes to scaling down, autoscalers don’t understand ingest. As ingest streams stop, the scaler may see a server which is still taking a feed but has very low utilisation, and kill it, disrupting the stream.
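The scale-down problem can be illustrated with a small Python sketch. It is not taken from the webinar: the `IngestServer` fields, the 10% CPU threshold and both function names are assumptions made for illustration. It contrasts a utilisation-only rule, of the kind an HTTP-oriented autoscaler might apply, with one that also checks for live feeds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IngestServer:
    """Hypothetical snapshot of one ingest server's state."""
    name: str
    cpu_percent: float   # current CPU utilisation, 0-100
    active_streams: int  # number of feeds currently being ingested


def naive_candidates(servers: List[IngestServer],
                     cpu_threshold: float = 10.0) -> List[str]:
    """Utilisation-only rule: any quiet server may be terminated."""
    return [s.name for s in servers if s.cpu_percent < cpu_threshold]


def ingest_aware_candidates(servers: List[IngestServer],
                            cpu_threshold: float = 10.0) -> List[str]:
    """Same rule, but a server still taking a feed is never a candidate."""
    return [
        s.name
        for s in servers
        if s.cpu_percent < cpu_threshold and s.active_streams == 0
    ]


fleet = [
    IngestServer("ingest-a", cpu_percent=3.0, active_streams=1),
    IngestServer("ingest-b", cpu_percent=2.0, active_streams=0),
    IngestServer("ingest-c", cpu_percent=80.0, active_streams=12),
]
print(naive_candidates(fleet))         # includes ingest-a, which has a live feed
print(ingest_aware_candidates(fleet))  # only the genuinely empty server
```

Here the utilisation-only rule would terminate `ingest-a` even though it is still carrying one feed, which is exactly the failure described above.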
Building your own system is the only way to fully mitigate or remove these problems, as it puts you in full control. You can create a system that is sensitive to the ‘unusual’ metrics of ingesting streams, which are very different from those of serving files over HTTP, the case many autoscalers are built around. Karel looks at the elements of a scaling solution, including load balancers, proxy servers and an algorithm which listens to metrics and makes up- and down-scaling decisions.
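As a rough idea of what such a decision algorithm might look like, here is a minimal Python sketch driven by GPU load, the kind of metric a generic autoscaler may not expose. The thresholds, the `decide` function and the idea of leaving a gap between the two thresholds are assumptions for illustration, not details from the webinar.

```python
from typing import List


def decide(gpu_loads: List[float],
           scale_up_at: float = 0.75,
           scale_down_at: float = 0.30) -> str:
    """Return 'scale_up', 'scale_down' or 'hold' from per-server GPU load.

    gpu_loads holds one value per server in the range 0.0-1.0.
    The gap between the two thresholds stops the fleet from
    flapping up and down when load hovers around a single value.
    """
    if not gpu_loads:
        return "hold"  # no data, so make no change
    average = sum(gpu_loads) / len(gpu_loads)
    if average >= scale_up_at:
        return "scale_up"
    if average <= scale_down_at:
        # Only says capacity could shrink; choosing WHICH server to
        # remove is a separate step that must respect live feeds.
        return "scale_down"
    return "hold"


print(decide([0.9, 0.8]))
print(decide([0.5, 0.4]))
print(decide([0.2, 0.1]))
```

In a real system this function would be called periodically with metrics collected from the servers, and its result passed to whatever code starts or stops instances.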
Karel advises writing down the logic for when and how to scale up so that it’s clear and well thought-through. Similarly, you need a strategy for load balancing (for example, why is round-robin the right or wrong choice for you?) and a plan for scaling down. In order to scale down with minimal impact, you need to scale up well. You should use as many clues as you can to group similar feeds onto the same servers. This makes it more likely that a whole server becomes free than if you mix long-lived and short-lived feeds on the same server, say.
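The grouping idea can be sketched as a simple placement rule in Python. Everything here is an assumption for illustration: the feed kinds (`"long"` and `"short"`), the per-server capacity of four feeds and the `place_feed` function itself. The rule prefers the fullest server that already hosts only the same kind of feed, and otherwise falls back to an empty server.

```python
from typing import Dict, List, Optional


def place_feed(servers: Dict[str, List[str]],
               feed_kind: str,
               capacity: int = 4) -> Optional[str]:
    """Assign a feed to a server and return the server's name.

    servers maps a server name to the kinds of feed it currently hosts.
    Returns None when nothing suitable has room, which is a signal
    that the fleet needs to scale up.
    """
    best: Optional[str] = None
    # First choice: the fullest server hosting only this kind of feed.
    for name, feeds in servers.items():
        if len(feeds) >= capacity:
            continue
        if feeds and all(kind == feed_kind for kind in feeds):
            if best is None or len(feeds) > len(servers[best]):
                best = name
    # Second choice: any empty server, so kinds are never mixed.
    if best is None:
        for name, feeds in servers.items():
            if not feeds:
                best = name
                break
    if best is not None:
        servers[best].append(feed_kind)
    return best


fleet = {"s1": ["long", "long"], "s2": ["short"], "s3": []}
print(place_feed(fleet, "short"))  # joins the other short-lived feed
print(place_feed(fleet, "long"))   # joins the long-lived feeds
```

Packing like with like means that when the short-lived feeds end, their server empties completely and can be removed without touching any long-lived feed.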
Finally, Karel details the three main pitfalls: scaling down, the time taken to scale up (can you wait 3 minutes?), and failing to set upper limits on your scaling. Without such limits, your algorithm can autoscale you into debt by spinning up tens, hundreds or thousands of unnecessary servers.
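The upper limit in particular is easy to express in code. The following Python sketch is illustrative only: the function name, the default of 20 servers and the single spare server kept to cover start-up time are assumptions, not figures from the webinar.

```python
import math


def target_server_count(total_streams: int,
                        streams_per_server: int,
                        headroom_servers: int = 1,
                        min_servers: int = 1,
                        max_servers: int = 20) -> int:
    """Number of servers to run, clamped to a hard upper limit.

    headroom_servers keeps spare capacity running so new feeds are not
    left waiting while a fresh server takes minutes to start.
    max_servers is the safety cap that bounds cost whatever the
    metrics (or a bug in the algorithm) ask for.
    """
    needed = math.ceil(total_streams / streams_per_server)
    wanted = needed + headroom_servers
    return max(min_servers, min(wanted, max_servers))


print(target_server_count(35, 10))    # 4 needed plus 1 spare
print(target_server_count(5000, 10))  # capped at max_servers
```

Whatever the chosen numbers, the point is that the cap is applied last, after all other logic, so no upstream mistake can exceed it.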