A real-life incident that happened with one of our customers.
A customer of ours, with offices in the US and the EU, has an innovative video conferencing application with some really cool features for collaborative meetings. They came to us for help fixing some critical bugs and load balancing their video backend. One interesting thing we learned was that they were running only one media server, but a really huge one with 72 cores! They ran such a large server because they wanted a lag-free, smooth video experience for everyone. In the beginning, when they had a small server, they faced video quality issues. So they took the biggest server available to get consistent video quality for all, without first checking whether the server was actually the cause of the quality issue. After digging deep, we made some interesting discoveries about their architecture and suggested changes to their video infrastructure. These included downgrading to an 8-core media server and adding a horizontal load balancer to distribute the load effectively. After these changes, their video infrastructure bill went down by ~80%.
Here is the comparison.
Before:
A 72-core instance in the AWS EU (Frankfurt) region costs $3.492/hour, which comes to $2,514.24 per month (720 hours).
After:
An 8-core instance in the AWS EU (Frankfurt) region costs $0.348/hour, which comes to $250.56 per month.
A horizontal load balancer instance costs approximately the same, i.e. about $250/month.
So the total comes to about $500/month, a saving of ~80% per month on the cloud server bill!
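The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: it assumes a 720-hour month (30 days x 24 hours), which is what the quoted monthly totals imply, and it uses the approximate $250/month load balancer figure given above rather than a looked-up AWS price.

```python
# Reproduces the before/after cost comparison above.
# Assumption: a "month" is 720 hours (30 days x 24 h), which matches
# the monthly totals quoted in the text.
HOURS_PER_MONTH = 720


def monthly_cost(hourly_rate_usd: float) -> float:
    """Convert an hourly on-demand rate (USD) into a 720-hour monthly cost."""
    return round(hourly_rate_usd * HOURS_PER_MONTH, 2)


before = monthly_cost(3.492)          # one 72-core media server
media_after = monthly_cost(0.348)     # one 8-core media server
load_balancer = 250.00                # approximate monthly figure from the text
after = media_after + load_balancer

savings_pct = (before - after) / before * 100

print(f"Before:  ${before:,.2f}/month")
print(f"After:   ${after:,.2f}/month")
print(f"Savings: {savings_pct:.1f}%")
```

With these inputs the saving works out to just over 80%, in line with the ~80% quoted above.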
When the CEO of the company learned how much the media server was costing, he was skeptical about the business viability of the service because of the monthly cloud bill. After the change, the service looked far more promising to him as a viable business.
Load Balancing WebRTC Media Servers: The Need
The rush to create video conferencing apps, especially with WebRTC, is here to stay. The WebRTC 1.0 API has been standardized by the W3C, and its underlying protocols by the Internet Engineering Task Force (IETF), so WebRTC is set to become even more mainstream in the coming years, helped along by the arrival of 5G. Having said that, building a video conferencing app is still much more complicated than building a pure web app. Why? Because many things need to be taken care of to make a video conferencing app production-ready. Broadly, they fall into 2 major parts. The first is to code the app and test it on the local network (LAN). Once it has been tested successfully locally, the second is to take it to the cloud and make it available to a host of other users over the Internet. This is where DevOps plays a critical role.
Now let's understand why it is so important.
Let's assume you have built the service to cater to 50 users per room in conferencing mode. If you have taken a good VPS like a c5.xlarge on a cloud provider like AWS, let's assume it can support up to 10 conference rooms. What happens when there is a need for an 11th room? In this case, you need to run another server that can handle another 10 rooms. But how will you know when the request for the 11th room will come? If you don't want to check manually every time a new room creation request comes, there are 2 options. Either you tell the user requesting the 11th room that the server capacity is full and ask them to wait until a room becomes free, OR you build logic so that a new server is created magically whenever a new room creation request needs one!! This is called auto-scaling, and it is the magical effect of doing proper DevOps on your cloud provider. The point to note here is that just as you create new servers when demand grows, you have to delete servers when demand falls. Otherwise, the bill from your cloud vendor will go through the roof!!
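The capacity reasoning above reduces to a ceiling division. The sketch below assumes, as in the example, that one server handles 10 rooms; `servers_needed` is a hypothetical helper name, and the real per-server capacity has to come from load-testing your own media server.

```python
def servers_needed(active_rooms: int, rooms_per_server: int = 10) -> int:
    """Smallest number of media servers that can host `active_rooms` rooms.

    rooms_per_server is an assumed capacity (10 rooms of 50 users each
    in the example); measure it for your own server and media stack.
    """
    if active_rooms < 0 or rooms_per_server <= 0:
        raise ValueError("need active_rooms >= 0 and rooms_per_server > 0")
    # Integer ceiling division: (a + b - 1) // b
    return (active_rooms + rooms_per_server - 1) // rooms_per_server


for rooms in (0, 10, 11, 25):
    print(f"{rooms:>2} rooms -> {servers_needed(rooms)} server(s)")
```

The jump from 1 to 2 servers at the 11th room is the moment an auto-scaler has to create a new server; the drop back when rooms are released is when it should delete one.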
Here is a brief summary of how a typical load-balancing mechanism works. I am not going to discuss the core logic of when to scale, as that depends entirely on the business requirement. If servers need to be up-scaled or down-scaled (that is, created or deleted programmatically on demand) according to dynamic demand, then there has to be a control mechanism inside the application that lets the cloud know when there is more demand for rooms and, therefore, more servers need to be created to cater to the surge. The cloud then has to be given the details of the VPS to be created, such as the instance type and the EBS volume needed, along with any other parameters it requires to create the server. Once the server is created, the cloud has to inform the application server that the VPS is ready for use. The application server then uses the newly created server for the new room, and thus serves the room creation request successfully.
A similar but opposite approach has to be taken when rooms are released after use. In this case, we need to let the cloud know that specific servers are no longer needed and should be deleted, as they won't be used until a new room creation request comes. When such a request does come, one can again ask the cloud to create new servers and serve it successfully. This is how one typically manages DevOps to create and delete VPSs dynamically according to real-time need.
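The create-on-demand and delete-on-release flow described above can be sketched as follows. Everything here is hypothetical: `FakeCloud` stands in for a real cloud provider API and treats a server as ready the instant it is created, `ServerSpec` holds illustrative parameters (instance type, volume size), and deleting a server as soon as its last room is released is simply the most basic policy, not the only one.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class ServerSpec:
    # Illustrative parameters the cloud needs to create a media server VPS.
    instance_type: str = "c5.xlarge"
    volume_gb: int = 20


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider's API."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.servers: set[str] = set()

    def create_server(self, spec: ServerSpec) -> str:
        # A real provider would boot a VPS described by `spec` and report back
        # when it is ready; here it is "ready" immediately.
        server_id = f"media-{next(self._next_id)}"
        self.servers.add(server_id)
        return server_id

    def delete_server(self, server_id: str) -> None:
        self.servers.discard(server_id)


@dataclass
class AppServer:
    cloud: FakeCloud
    rooms_per_server: int = 10
    spec: ServerSpec = field(default_factory=ServerSpec)
    room_to_server: dict[str, str] = field(default_factory=dict)
    rooms_on_server: dict[str, int] = field(default_factory=dict)

    def create_room(self, room_id: str) -> str:
        # Reuse any existing server that still has a free slot.
        for server_id, count in self.rooms_on_server.items():
            if count < self.rooms_per_server:
                break
        else:
            # No free slot anywhere: ask the cloud for a new server (up-scale).
            server_id = self.cloud.create_server(self.spec)
            self.rooms_on_server[server_id] = 0
        self.rooms_on_server[server_id] += 1
        self.room_to_server[room_id] = server_id
        return server_id

    def release_room(self, room_id: str) -> None:
        server_id = self.room_to_server.pop(room_id)
        self.rooms_on_server[server_id] -= 1
        if self.rooms_on_server[server_id] == 0:
            # Server is now idle: tell the cloud to delete it (down-scale).
            del self.rooms_on_server[server_id]
            self.cloud.delete_server(server_id)


cloud = FakeCloud()
app = AppServer(cloud)
for i in range(11):                 # the 11th room forces a second server
    app.create_room(f"room-{i}")
print(sorted(cloud.servers))        # servers after up-scaling
app.release_room("room-10")         # second server becomes idle
print(sorted(cloud.servers))        # servers after down-scaling
```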
WebRTC Auto-Scaling/Load Balancing: The Strategies
Now that we understand in brief what DevOps is, let us look at the general strategies for doing DevOps, especially for the video conferencing use case. They can be broadly divided into 2 scenarios based on the level of automation needed to satisfy one's business requirements. Although many variations of automation are possible, for the sake of simplicity let me describe 2 strategies that can satisfy the majority of business requirements.
Strategy 1: Cloud-Agnostic, Semi-Automatic Load Balancing
In this strategy, the goal is to automate the load distribution mechanism so that load is spread effectively across the media servers as the pool is scaled up and down, while keeping the media servers cloud-agnostic. Creating and deleting media servers is outside the scope of the load balancer here. Servers are created independently and then registered with (or removed from) the load balancer in some manner, so that enough servers are always available to handle a surge in demand.
Pros:
Multi-cloud strategy
Better command and control
Less complex to implement
Cons:
Less automation
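Strategy 1 can be sketched as a load balancer that only tracks and assigns servers. In this hypothetical `SemiAutoBalancer`, servers from any cloud (or on-premises) are registered by address and capacity by an operator or external tooling; the balancer places each new room on the least-utilized server and returns `None` when everything is full, at which point the application can tell the user to wait or someone can register more servers. The addresses and capacities in the example are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MediaServer:
    address: str            # host:port of a media server on any cloud or on-prem
    capacity: int           # max rooms this server should host
    active_rooms: int = 0


class SemiAutoBalancer:
    """Hypothetical cloud-agnostic balancer: it never creates or deletes
    servers, it only distributes rooms across the ones registered with it."""

    def __init__(self) -> None:
        self.servers: dict[str, MediaServer] = {}

    def register(self, address: str, capacity: int) -> None:
        self.servers[address] = MediaServer(address, capacity)

    def deregister(self, address: str) -> None:
        # Refuse to drop a server that still hosts live rooms.
        server = self.servers[address]
        if server.active_rooms:
            raise RuntimeError(f"{address} still hosts {server.active_rooms} room(s)")
        del self.servers[address]

    def assign_room(self) -> str | None:
        # Least-utilized server with a free slot; None means "capacity full".
        free = [s for s in self.servers.values() if s.active_rooms < s.capacity]
        if not free:
            return None
        best = min(free, key=lambda s: s.active_rooms / s.capacity)
        best.active_rooms += 1
        return best.address

    def release_room(self, address: str) -> None:
        self.servers[address].active_rooms -= 1

    def utilization(self) -> float:
        # Fraction of total room slots in use; an operator or alert can watch
        # this to decide when to provision and register more servers.
        total = sum(s.capacity for s in self.servers.values())
        used = sum(s.active_rooms for s in self.servers.values())
        return used / total if total else 0.0


lb = SemiAutoBalancer()
lb.register("10.0.1.15:4443", capacity=10)      # e.g. a server on one cloud
lb.register("203.0.113.7:4443", capacity=10)    # e.g. a server on another
rooms = [lb.assign_room() for _ in range(3)]
print(rooms, f"{lb.utilization():.0%}")
```

Because the balancer only knows addresses, mixing providers costs nothing extra, which is where the multi-cloud advantage comes from; the trade-off is that someone still has to add servers when utilization climbs.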
Strategy 2: Single-Cloud, Fully Automatic Load Balancing
In this strategy, the goal is to automate the load distribution mechanism and to up-scale and down-scale effectively, bringing in more automation at the cost of tight coupling to a single cloud provider.
Here, a cloud provider's APIs are integrated to create and destroy servers completely on demand, without much manual intervention. The load balancer uses these APIs to create servers on that specific cloud when it needs to up-scale, and deletes a server whenever the load decreases.
Pros:
Greater automation
Highly resource-efficient
Cons:
More complex to implement
Dependent on a single cloud vendor
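In Strategy 2, the load balancer itself decides when to call the cloud provider. The sketch below shows only that decision step, under stated assumptions: `plan_scaling` and `ScalingPlan` are hypothetical names, `load` is the balancer's own record of rooms per server, and `spare_rooms` is an assumed buffer so new rooms don't wait for a server to boot. In an AWS-coupled setup, the resulting plan would then be turned into provider API calls, for example the EC2 `RunInstances` and `TerminateInstances` actions (`run_instances` / `terminate_instances` on a boto3 EC2 client).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ScalingPlan:
    create: int = 0                                   # new servers to request
    delete: list[str] = field(default_factory=list)   # idle servers to terminate


def plan_scaling(load: dict[str, int], rooms_per_server: int,
                 spare_rooms: int = 0) -> ScalingPlan:
    """Compare the servers we have with the servers we need.

    load:         server_id -> active rooms (the balancer's own bookkeeping)
    spare_rooms:  assumed head-room kept free so new rooms need not wait
                  for a fresh server to boot
    """
    active = sum(load.values())
    # Ceiling division: servers needed for active rooms plus head-room.
    wanted = -(-(active + spare_rooms) // rooms_per_server)
    current = len(load)
    if wanted > current:
        return ScalingPlan(create=wanted - current)
    # Only idle servers are terminated; busy ones would drop live calls.
    idle = [sid for sid, rooms in load.items() if rooms == 0]
    return ScalingPlan(delete=idle[: current - wanted])


print(plan_scaling({"a": 10}, rooms_per_server=10, spare_rooms=1))
print(plan_scaling({"a": 3, "b": 0, "c": 0}, rooms_per_server=10))
```

Terminating only idle servers is a deliberate choice here: it never interrupts a live conference, at the cost of sometimes keeping a lightly loaded server running longer than strictly necessary.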
There is no general rule that one should follow a specific load-balancing approach. It depends entirely on the business requirements for which one needs load balancing. One should understand those requirements properly and then decide which load-balancing strategy is suitable. If you need help deciding on a good load-balancing strategy for your video infrastructure, feel free to have an instant or scheduled meeting with one of our core technical team members using this link.
Note: The load balancer mentioned in the real-life incident above is a stateful, WebRTC-specific load balancer that we developed from scratch specifically for auto-scaling WebRTC media servers. It is called CWLB, and more details about it can be found here.