Building a scalable production grade WebRTC video app

centedge
Jul 29, 2022

They say the customer is always right. But this is not always true when it comes to building a production-grade WebRTC video calling app that can also scale.

Why? Because the customer often wants to build a world-class video app like Zoom / Google Meet within 3 months, with a budget of $$$$ / 1$$$$.

How do they come to such conclusions about time and money?

They reach such conclusions because they read on the internet that WebRTC is free and open source, and that one can build an app like Zoom / Google Meet by downloading a random open source package from GitHub that has WebRTC as a keyword. They then conclude that everything needed for an app like Google Meet is available, either in WebRTC itself or in the open source package. All they need to do is build a new UI layer and a dashboard around WebRTC to challenge Zoom / Google Meet!

The notion that WebRTC is free and open source, and that something like Google Meet can therefore be built with little effort, slowly takes root in the customer's mind. This notion creates a lot of confusion between companies like ours and prospective customers. It takes us time to help prospective customers understand the reality of WebRTC and the effort it takes to build production-grade video applications with it.

Once the customer understands that building a live video application takes more than plain WebRTC, another issue arises. This time it is about building video rooms large enough to cater to maybe a million users. A million users in one room!

Again, we need to put in the effort to understand the customer's thought process by asking the right questions. It turns out that the customer currently has a teaching and learning application where one teacher teaches one student at a time. They started by using Google Meet for free to conduct such sessions, but later realized that they need more control, along with deep integration of the calling feature into their dashboard. That is why they are looking for a cost-effective alternative that can be deeply integrated into their dashboard. But they anticipate that their product will see explosive growth and reach a million students within a couple of years. That is why they want to build a video conferencing application large enough to scale to a million users when that happens.

Here again we need to put in the effort to help our customers understand how a WebRTC application really works. We discuss and explain the various kinds of WebRTC architectures, such as P2P, full mesh multiparty, conferencing, live streaming and webinar. Though we prefer not to use much WebRTC jargon in these discussions, it sometimes becomes unavoidable when we need to explain things like SFU, MCU, ICE/STUN/TURN, media server, recording server, FFmpeg, GStreamer etc. After one or two rounds of discussion, they realize for themselves that their current need can be met very well by a P2P call, occasionally relayed through a TURN server. They also understand that building a production-grade scalable P2P video call takes much less time and fewer resources than building a scalable production-grade video conferencing application. It is also a good starting point for testing the market and the product before investing more resources in a scalable video conferencing application. Whether they choose a P2P app or a conferencing app is immaterial; within a couple of rounds of discussion, they equip themselves with the knowledge needed to understand the reality of WebRTC. From here onwards, it becomes a rewarding experience to help the customer achieve their business objective.
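To make the difference between a full mesh call and an SFU-based conference a little more concrete, here is a rough sketch that counts streams and connections for a call with n participants. It assumes every participant publishes exactly one media stream and ignores simulcast and TURN relaying; the names `fullMeshLoad` and `sfuLoad` are ours, used only for illustration.

```typescript
// Per-participant stream counts for an n-party call.
// Assumption: each participant publishes exactly one media stream.
interface StreamLoad {
  uploadsPerParticipant: number;
  downloadsPerParticipant: number;
  totalPeerConnections: number;
}

// Full mesh: every participant connects directly to every other participant.
function fullMeshLoad(n: number): StreamLoad {
  return {
    uploadsPerParticipant: n - 1,
    downloadsPerParticipant: n - 1,
    totalPeerConnections: (n * (n - 1)) / 2,
  };
}

// SFU: every participant uploads once to the media server,
// which forwards that stream to the other participants.
function sfuLoad(n: number): StreamLoad {
  return {
    uploadsPerParticipant: 1,
    downloadsPerParticipant: n - 1,
    totalPeerConnections: n, // one connection per participant to the SFU
  };
}

console.log(fullMeshLoad(2)); // a 1:1 call needs a single connection
console.log(fullMeshLoad(10), sfuLoad(10));
```

In a full mesh the upload load on every device grows with the number of participants, which is why it only suits small calls, while a 1:1 teacher and student session needs just one peer connection.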

After going through the above situation a couple of times, we decided to build a scalable production-grade WebRTC P2P video calling application with a loosely coupled UI. This way the UI can be modified according to individual needs, whereas the architecture, the front-end functionality and the back end stay the same. Though building a simple P2P app seems easy, building a scalable, production-grade P2P video calling app with a certain level of tolerance for bad networks is not so straightforward. Why? Because a production-grade app needs to take care of the things listed below, which are generally not present in a simple P2P app available on GitHub.

Audio / Video management: This feature includes everything a user may need while using the app, like muting / un-muting the mic, switching the camera on / off, changing the current mic / camera to a new mic / camera for the rest of the call, and allowing moderator control over remote media input (so that a teacher can mute / un-mute the mic or switch the camera of a student on / off when needed).
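As a small illustration of the mute / un-mute and camera on / off part, here is a minimal sketch. In the browser, a MediaStream exposes its tracks through `getTracks()`, and each track has a `kind` and an `enabled` flag; the `StreamLike` and `TrackLike` interfaces and the helper name `setTracksEnabled` are our own, introduced so the example can run outside a browser.

```typescript
// Minimal shapes of the browser's MediaStreamTrack and MediaStream
// that this sketch relies on (only the members used here).
interface TrackLike {
  kind: string; // "audio" or "video" in the browser
  enabled: boolean;
}
interface StreamLike {
  getTracks(): TrackLike[];
}

// Hypothetical helper: enable or disable all tracks of one kind.
// Returns the number of tracks that were touched.
function setTracksEnabled(
  stream: StreamLike,
  kind: "audio" | "video",
  enabled: boolean
): number {
  let touched = 0;
  for (const track of stream.getTracks()) {
    if (track.kind === kind) {
      track.enabled = enabled;
      touched += 1;
    }
  }
  return touched;
}

// In a browser this would typically be called with the local stream, e.g.
// setTracksEnabled(localStream, "audio", false) to mute the mic.
```

Toggling `enabled` keeps the track and the connection alive, so un-muting is instant. Switching to a different mic or camera, or letting a moderator mute a remote user, needs more than this and is not covered by the sketch.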

Capturing images / statistics: This feature helps capture an image from the real-time video stream for a purpose like vKYC (video Know Your Customer). With this feature, a bank agent can capture an image while the bank's customer shows his / her photo identity card during the call, for the bank's verification purposes. Collecting real-time call and network statistics is also important for quality control and monitoring. Real-time network monitoring can also raise timely alerts to users when their network quality degrades.
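One way such a network alert could be derived is from packet loss. The sketch below assumes two successive samples of the `packetsLost` and `packetsReceived` counters that WebRTC reports for an inbound RTP stream; the 5% threshold and the helper names are our own illustrative choices, not recommended values.

```typescript
// Cumulative counters as found in WebRTC inbound RTP stream statistics.
interface InboundSample {
  packetsLost: number;
  packetsReceived: number;
}

// Fraction of packets lost between two samples (0 when nothing arrived).
function lossFraction(prev: InboundSample, curr: InboundSample): number {
  // packetsLost can move backwards when late packets arrive, so clamp at 0.
  const lost = Math.max(0, curr.packetsLost - prev.packetsLost);
  const received = Math.max(0, curr.packetsReceived - prev.packetsReceived);
  const total = lost + received;
  return total > 0 ? lost / total : 0;
}

// Hypothetical alert rule: warn above an assumed 5% loss in the interval.
function shouldWarnUser(
  prev: InboundSample,
  curr: InboundSample,
  threshold = 0.05
): boolean {
  return lossFraction(prev, curr) > threshold;
}
```

In a real app the samples would be read periodically from the peer connection's statistics, and the alert would usually combine loss with other signals such as round-trip time and jitter.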

Auto re-connection: This is primarily important for maintaining the call even when the network fails. A network typically fails when one's device changes its network connection while a call is going on. It happens when a user's device switches between WiFi and mobile network modes like 3G / 4G / LTE etc. The network temporarily fails when the switch happens and comes back once the switch is over. In order to auto reconnect, an application needs to detect the network failure, wait until the network reconnects and restart the media communication as quickly as possible after the successful re-connection. In WebRTC terminology, the mechanism used to re-establish connectivity for the media is called an ICE restart.
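A minimal sketch of the detection and restart step is shown below. It relies on two members of the browser's RTCPeerConnection, `iceConnectionState` and `restartIce()`; the `PeerConnectionLike` interface and the helper name `restartIceIfFailed` are our own, and a production app would add waiting for the network to come back, retries and timeouts on top of this.

```typescript
// Minimal shape of the browser's RTCPeerConnection used by this sketch.
interface PeerConnectionLike {
  iceConnectionState: string;
  restartIce(): void;
}

// Hypothetical helper: request an ICE restart once ICE has failed.
// Returns true when a restart was requested.
function restartIceIfFailed(pc: PeerConnectionLike): boolean {
  if (pc.iceConnectionState === "failed") {
    pc.restartIce();
    return true;
  }
  return false;
}

// In a browser this would typically be wired up as:
// pc.addEventListener("iceconnectionstatechange", () => restartIceIfFailed(pc));
```

Calling `restartIce()` only marks the connection as needing a restart. The application still has to handle the resulting `negotiationneeded` event, create a new offer and exchange it with the other peer over its signalling channel.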

Integrations: Integrations with other applications, like whiteboards for collaboration, text chat, file sharing etc., also play an important role for some users. Either these features should already be there, or there should be a provision to integrate them when needed.

Recording: Recording is another important feature in any WebRTC application. Though it may not be needed in all kinds of WebRTC applications, it becomes necessary for applications like video call centers, video health etc. Recording can happen either on the client side or on the server side. Ideally, server-side recording is preferred, as it allows the raw recording to be post-processed for multiple purposes. As an example, a video recorded in WebM format, which is what browsers typically produce by default when recording WebRTC media, consumes 800 MB - 1000 MB of space for an hour-long recording, which is a lot. On the server side, one can use a tool called FFmpeg to compress it, watermark it and convert it to MP4 format, which can reduce the size to < 100 MB for the same video. As an alternative strategy, one can use client-side recording and send the file to the server once the recording is done, if server-side recording is not possible (as in a P2P call).
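To put those file sizes in perspective, the short calculation below converts a recording's size and duration into an average bitrate. It assumes 1 MB = 1,000,000 bytes; the function name `averageBitrateMbps` is ours.

```typescript
// Average bitrate in megabits per second for a recording.
// Assumption: 1 MB = 1,000,000 bytes, so 1 MB = 8 megabits.
function averageBitrateMbps(sizeMB: number, durationSeconds: number): number {
  return (sizeMB * 8) / durationSeconds;
}

const oneHour = 3600; // seconds
console.log(averageBitrateMbps(800, oneHour).toFixed(2)); // ~1.78 Mbps
console.log(averageBitrateMbps(1000, oneHour).toFixed(2)); // ~2.22 Mbps
console.log(averageBitrateMbps(100, oneHour).toFixed(2)); // ~0.22 Mbps
```

In other words, the raw recording described above averages roughly 1.8 to 2.2 Mbps, while a file under 100 MB for the same hour averages less than about 0.22 Mbps.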

In-call media manipulation: There may be a need to mask some portion of the video stream during a call, for either security or convenience. A widely used feature of this kind is background removal, where the background of a user sharing a camera is either blurred or replaced with another image such as a coffee shop, an office desk or a meeting room. There may be other such use cases as well.

All of the above features, together with a robust architecture that is ready for scale, make a production-grade application. It takes a lot of effort and time to build such an application, along with an excellent team that has a deep understanding of WebRTC and related technologies, front end, back end and much more.

If you are a customer looking to build a production-grade scalable video calling app, then by now you know that you need a rock star team with a solid understanding of WebRTC and related technologies, along with sufficient time and resources at your disposal, to set out on such an adventure. If you don't have a rock star team or the time, then we are here to help in any of the ways mentioned below.

CP2P is our scalable production-grade P2P video calling app, ready for production deployment as a managed service. It comes with a very minimal UI that is ready for retrofitting. You can either share your UI design and have us build it for you, or share the fully built UI for integration. We can integrate it, deploy it on your servers / our servers and manage it for you in a cost-effective manner. The link to view details and see it in action is at the bottom of this page.

CVR is our scalable production-grade video conferencing / live streaming / webinar application, ready for production deployment as a managed service. It uses CWLB, our WebRTC load balancer built in house from scratch, to distribute and balance load in real time with a utilization efficiency of 75%. It also uses CR, an advanced recording engine developed in house from scratch, to record meetings. It uses Mediasoup as its core media server. The link to know more about it and schedule a demo is at the bottom of this page.

If you don't want to use any of these products but want to develop your own from scratch, we, as a consulting company, can help you build it. In this case, you need more resources and time than with the previously mentioned options. If you have more resources and time at your disposal, then this can be a path to tread. The only thing to make sure of in this case is that you have a dependable rock star team who can work with us on building the product.

In case you don't have a rock star team, then there is a reason to worry. But why worry when we are here? We have an instructor-led, online / offline, full-time training program where we can turn any full-stack / front-end / back-end developer with sound JavaScript knowledge into a rock star WebRTC developer. The timelines for the training program are as below.

5 - 7 days (for the WebRTC fundamentals program)
10 - 14 days (for the WebRTC fundamentals and advanced WebRTC with Mediasoup program)

I hope I was able to share enough information about building a scalable production-grade video conferencing app. If you still have doubts or questions, you can reach out to me at either sp@centedge.io or hello@centedge.io.

The link to details on CP2P is here.

The link to details on CVR is here.

The link to schedule a free 30 mins discussion with one of our experts to resolve all your WebRTC related queries is here.

If you are a student / developer looking to learn the basics of WebRTC by yourself through working examples, here is a GitHub repo with working examples on successful MediaStream acquisition, building a basic signalling server, and building a working P2P call app.
