Demystifying a WebRTC video calling application for the beginner

So you are a developer (frontend, backend, or full-stack) curious about building applications with WebRTC. You have been searching the internet for the last few days, or even months, to learn the basics and build a simple WebRTC video calling app that you actually understand. Although a few GitHub repositories contain code for a very basic p2p video calling app, none of them explain the inner workings of that code. The code in those repositories runs on your localhost with a command or two and lets you connect 2 tabs of your browser in a video call. When you try to read through the code, you find a bunch of API calls that are unfamiliar and sometimes even seem illogical.

To demystify the inner workings of a basic WebRTC video calling app, we will follow a beginner-friendly, common-sense 3-step approach. Let’s start.

The very first step in building a video calling app is understanding how to acquire the camera and/or mic of the device you want to call from, using any of these browsers: Chrome, Firefox, Edge, or Safari. The device can be a desktop, laptop, or mobile device, as long as one of these browsers is present. Without the camera and mic, a video call has no meaning. There are use cases where WebRTC is used in p2p mode with data channels only, for example file sharing, but we are not going to discuss them in this blog post. The way to acquire the camera and mic from the browser is an API called getUserMedia. The line of code below asks the browser for the camera and mic.

const stream = await navigator.mediaDevices.getUserMedia({audio:true,video:true});

The line above works only under some preconditions. getUserMedia is available only in a secure context, which means the page must be served over HTTPS or from localhost; on a plain HTTP page it will fail. The user must also grant permission when the browser prompts for camera and mic access.
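When getUserMedia fails, it rejects with an error whose name indicates what went wrong, so it can help to handle the failure explicitly. The following is a minimal sketch and is not taken from the repository; `describeMediaError` and `acquireMedia` are hypothetical helper names, and the message wording is only illustrative.

```javascript
// Hypothetical helper: map common getUserMedia failures to readable messages.
// The error names are standard DOMException names; the wording is illustrative.
function describeMediaError(error) {
  switch (error && error.name) {
    case "NotAllowedError":
      return "Permission to use the camera or microphone was denied.";
    case "NotFoundError":
      return "No camera or microphone matching the request was found.";
    case "NotReadableError":
      return "The camera or microphone is in use or could not be started.";
    default:
      return "Could not acquire media: " +
        (error && error.message ? error.message : String(error));
  }
}

// Hypothetical wrapper around getUserMedia. `mediaDevices` is injectable so the
// function can be exercised outside a browser; it defaults to the browser object.
async function acquireMedia(mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    // Browsers do not expose navigator.mediaDevices on insecure origins.
    throw new Error(
      "getUserMedia is unavailable; serve the page over HTTPS or from localhost."
    );
  }
  try {
    return await mediaDevices.getUserMedia({ audio: true, video: true });
  } catch (error) {
    throw new Error(describeMediaError(error));
  }
}

// Usage in the browser:
// const stream = await acquireMedia();
```

Wrapping the call this way keeps the "is the API even there?" check separate from the "did the user say no?" case, which are the two failures beginners hit most often.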

If you have successfully acquired the camera and mic, you are ready for step 2 of building a WebRTC video calling app. In this step, you need to build a simple signaling server so that messages can be exchanged between the caller and the callee. This step is all about creating a server-side application that connects to both the caller and the callee and lets them pass messages to each other whenever needed. In this example, Node.js is the server-side runtime used for the signaling server, and WebSocket is the mechanism that connects both users.

Here is the sample code.

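const https = require('https');
const WebSocket = require('ws');
const WebSocketServer = WebSocket.Server;

// serverConfig (TLS key and certificate), handleRequest (HTTP request handler)
// and HTTPS_PORT are assumed to be defined elsewhere.
const httpsServer = https.createServer(serverConfig, handleRequest);
httpsServer.listen(HTTPS_PORT, '0.0.0.0');
const wss = new WebSocketServer({ server: httpsServer });

wss.on('connection', function (ws) {
    ws.on('message', function (message) {
        // Forward the signaling message to the other connected peer here.
    });
});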

With the above lines of code, one has a basic signaling server ready to listen to messages from the caller/callee.
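The message handler in the server code is left empty. One simple way to fill it for a two-peer demo is to forward every incoming message to all other connected clients. The sketch below assumes the `ws` package (v8 or later, where the message listener receives `data` and `isBinary`); `relayToOthers` is a hypothetical helper name, and broadcasting to everyone is only reasonable while exactly two peers are connected.

```javascript
// Hypothetical helper: forward one client's message to every other open client.
// `clients` is any iterable of sockets, e.g. the `wss.clients` Set from `ws`.
const OPEN = 1; // readyState of an open WebSocket (same value as WebSocket.OPEN)

function relayToOthers(clients, sender, data, isBinary = false) {
  let delivered = 0;
  for (const client of clients) {
    if (client !== sender && client.readyState === OPEN) {
      // Passing `binary` keeps text frames as text when `data` is a Buffer.
      client.send(data, { binary: isBinary });
      delivered += 1;
    }
  }
  return delivered;
}

// Intended wiring inside the signaling server:
//
// wss.on('connection', function (ws) {
//   ws.on('message', function (data, isBinary) {
//     relayToOthers(wss.clients, ws, data, isBinary);
//   });
// });
```

A real application would usually group clients into rooms and relay only within a room instead of broadcasting.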

Now you are ready to build the real video calling application using the work we did in the last 2 steps. Here are the steps needed to establish the call.

The caller (peer A) connects to the signaling server and waits for the callee (peer B).
The callee (peer B) connects to the signaling server and informs peer A that he/she is available for a call.
Peer B clicks the call button and boom! The call is connected, and peer A and peer B can see and hear each other.

Here are the real steps that happen behind the scenes to establish the call.

Peer B first creates a new RTCPeerConnection object, passing the available ICE servers (STUN and TURN servers) as a parameter. This object is responsible for sending and receiving the media streams.

const pc = new RTCPeerConnection({iceServers});
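The `iceServers` value passed to the constructor is an array of server descriptions. As a rough sketch of what it might contain, the helper below builds the configuration object; `buildRtcConfig` is a hypothetical name, and the server URLs and credentials are placeholders, not real services.

```javascript
// Hypothetical helper that builds the configuration object for RTCPeerConnection.
// The server URLs and credentials used below are placeholders.
function buildRtcConfig(stunUrls, turn) {
  const iceServers = [];
  if (stunUrls && stunUrls.length > 0) {
    iceServers.push({ urls: stunUrls });
  }
  if (turn) {
    // TURN servers normally require credentials.
    iceServers.push({
      urls: turn.urls,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const rtcConfig = buildRtcConfig(["stun:stun.example.org:3478"], {
  urls: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-password",
});

// In the browser: const pc = new RTCPeerConnection(rtcConfig);
```

A STUN server helps a browser discover its public address, while a TURN server relays media when no direct path between the peers can be found.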

Then it acquires the local camera and mic and adds their tracks to the PeerConnection. This makes the PeerConnection ready to send the media as soon as the connection is established, i.e. when both users agree on a network configuration acceptable to both.

stream.getTracks()
      .forEach((track) =>
        pc.addTrack(track, stream)
      );

Then it creates an offer, which generates an offer SDP. SDP stands for Session Description Protocol, and the offer contains a large amount of information (approx. 80-100 lines) in plain text format. It includes network settings, the available media streams (audio, video, screen share, or anything else), the codecs currently available to encode and decode media data packets, and many other things.

const offer = await pc
      .createOffer()
      .catch(function (error) {
        // alert() takes a single argument, so include the error in the message
        alert("Error when creating an offer: " + error);
      });
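Because the SDP is plain text, it can be inspected with ordinary string operations. To make that concrete, the sketch below lists the media sections (the lines starting with `m=`) of an SDP string. `listMediaSections` is a hypothetical helper, and the sample SDP is hand-written and heavily abbreviated; a real offer is far longer.

```javascript
// Hypothetical helper: list the media sections ("m=" lines) of an SDP string.
function listMediaSections(sdp) {
  return sdp
    .split(/\r?\n/) // SDP lines are terminated by CRLF
    .filter((line) => line.startsWith("m="))
    .map((line) => {
      // Format: m=<media> <port> <proto> <fmt> ...
      const [media, port, protocol, ...formats] = line.slice(2).split(" ");
      return { media, port: Number(port), protocol, formats };
    });
}

// Hand-written, abbreviated sample for illustration only.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

// Logs the media types found in the sample: audio and video.
console.log(listMediaSections(sampleSdp).map((section) => section.media));
```

In the browser, the same function could be called with `offer.sdp` to see which media sections the offer actually contains.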

Once the SDP is generated, the local description of the PeerConnection is set using the offer. In simple terms, this asks the browser for final confirmation that all the options in the SDP are valid. Once the local description is set, the settings in the SDP can’t be changed without a new round of negotiation, and the SDP is then sent to the remote peer A so that its browser can do all the things that peer B’s browser just did.

 await pc.setLocalDescription(offer);
 //send the offer to peer A using the signalling channel
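How the offer travels over the signaling channel is up to the application, since WebRTC does not define a signaling format. One possible approach, sketched below, wraps every message in a small JSON envelope. The `{ type, payload }` shape and the names `sendSignal` and `parseSignal` are assumptions made for this illustration.

```javascript
// Hypothetical message format:
// { type: "offer" | "answer" | "candidate", payload: ... }
const SIGNAL_TYPES = ["offer", "answer", "candidate"];

// `socket` is anything with a send(string) method, e.g. a browser WebSocket.
function sendSignal(socket, type, payload) {
  socket.send(JSON.stringify({ type, payload }));
}

function parseSignal(raw) {
  // `raw` may be a string (browser) or a Buffer (Node.js), so normalise it first.
  const message = JSON.parse(String(raw));
  if (!SIGNAL_TYPES.includes(message.type)) {
    throw new Error("Unknown signaling message type: " + message.type);
  }
  return message;
}

// Usage in the browser, after setLocalDescription has resolved:
// sendSignal(socket, "offer", pc.localDescription);
```

Session descriptions and ICE candidates both serialize cleanly with `JSON.stringify`, so the same envelope can carry all three kinds of messages.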

As soon as the local description is set, the browser starts generating ICE candidates (in simple terms, the possible network configurations of peer B). Each candidate is sent to peer A so that its browser can check whether it can use those network parameters to exchange media with peer B.

pc.onicecandidate = function (event) {
      if (event.candidate) {
       //send the ice candidate to the other peer using the   
       //signalling channel 
      }
    };
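Each ICE candidate is also just a line of text describing one possible way to reach the peer. As an illustration, the sketch below pulls a few readable fields out of a candidate string. `summarizeCandidate` is a hypothetical helper, and the sample candidate is hand-written with a documentation-only IP address.

```javascript
// Hypothetical helper: extract a few readable fields from an ICE candidate string.
function summarizeCandidate(candidateString) {
  // Format: candidate:<foundation> <component> <transport> <priority>
  //         <address> <port> typ <type> ...
  const parts = candidateString.replace(/^candidate:/, "").split(" ");
  const typIndex = parts.indexOf("typ");
  return {
    transport: parts[2].toLowerCase(),
    address: parts[4],
    port: Number(parts[5]),
    // host = local address, srflx = public address found via STUN,
    // relay = address on a TURN server
    type: typIndex === -1 ? undefined : parts[typIndex + 1],
  };
}

// Hand-written sample for illustration only.
const sampleCandidate =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 0.0.0.0 rport 0";

console.log(summarizeCandidate(sampleCandidate));
```

In the browser, the string to inspect is available as `event.candidate.candidate` inside the onicecandidate handler.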

Once peer A’s browser receives the offer SDP via the signaling server, peer A first creates a PeerConnection object, passing the ICE servers as a parameter for the same purpose. As soon as the PeerConnection is created, it uses the offer SDP provided by peer B to set its remote description. This lets the browser know the other peer’s details so that it can create an answer SDP in response to the offer at a later stage.

const pc = new RTCPeerConnection({iceServers});
// setRemoteDescription returns a promise, so wait for it before continuing
await pc.setRemoteDescription(new RTCSessionDescription(offer));

Peer A now repeats what peer B did earlier: it acquires its own media streams and adds them to the PeerConnection so that they are ready to send once the connection is established.

stream.getTracks()
      .forEach((track) =>
        pc.addTrack(track, stream)
      );

Then it creates an answer by calling the createAnswer API on the PeerConnection object, which generates the answer SDP. Once the answer SDP is generated, the local description is set on peer A’s side to ask the browser for one final confirmation. Once confirmed, the answer is sent to peer B via the signaling channel so that peer B’s browser can accept it.

const answer = await pc
      .createAnswer()
      .catch(function (error) {
        // alert() takes a single argument, so include the error in the message
        alert("Error when creating an answer: " + error);
      });
await pc.setLocalDescription(answer);
//send the answer to peer B using the signalling channel

Once the answer SDP is received on peer B’s side, peer B calls the setRemoteDescription API to ask the browser to accept the other peer’s SDP. Once the browser confirms, the offer/answer exchange is complete, and the connection for media transport can be established as soon as the peers also agree on a network path.

pc.setRemoteDescription(
      new RTCSessionDescription(answer)
    );
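To see when the connection actually becomes usable, the application can watch the connection state of the PeerConnection. The following is a small sketch; `watchConnection` is a hypothetical helper name, while `connectionState` and `onconnectionstatechange` are part of the standard RTCPeerConnection interface.

```javascript
// Hypothetical helper: report connection state changes of an RTCPeerConnection.
function watchConnection(pc, onChange) {
  pc.onconnectionstatechange = () => {
    // connectionState is one of:
    // new, connecting, connected, disconnected, failed, closed
    onChange(pc.connectionState);
  };
}

// Usage in the browser:
// watchConnection(pc, (state) => console.log("Connection state:", state));
```

Seeing `connected` here is a more reliable sign that media can flow than the remote description having been set.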

Peer A’s browser also generates ICE candidates and sends them to peer B, exactly as peer B did earlier and for the same purpose. Both browsers now know each other’s possible network configurations. After this, the peers agree to use one network configuration from all the options offered by both browsers. The mutually selected configuration, a pair of ICE candidates with one from each side, is then used for the actual media transport between the two users.

pc.onicecandidate = function (event) {
      if (event.candidate) {
       //send the ice candidate to the other peer using the   
       //signalling channel 
      }
    };
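The snippets so far show how offers, answers, and candidates are sent, but not how the receiving side applies them. A minimal sketch of a handler for incoming signaling messages could look like this. It assumes a `{ type, payload }` message shape and a `send(type, payload)` function that forwards data to the other peer; both, along with the name `handleSignal`, are assumptions made for this illustration.

```javascript
// Hypothetical dispatcher for incoming signaling messages of the form
// { type: "offer" | "answer" | "candidate", payload: ... }.
// `send(type, payload)` is an assumed function that reaches the other peer.
async function handleSignal(pc, message, send) {
  switch (message.type) {
    case "offer": {
      // Local tracks should already have been added to `pc` at this point,
      // otherwise the answer will not include outgoing media.
      await pc.setRemoteDescription(message.payload);
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);
      send("answer", pc.localDescription);
      break;
    }
    case "answer":
      await pc.setRemoteDescription(message.payload);
      break;
    case "candidate":
      // Hands the remote peer's candidate to the local ICE agent.
      await pc.addIceCandidate(message.payload);
      break;
    default:
      throw new Error("Unknown signaling message type: " + message.type);
  }
}
```

One thing to watch for: a candidate that arrives before the remote description has been set is rejected by `addIceCandidate`, so applications often queue early candidates and apply them afterwards.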

Once the connection is established, each PeerConnection object starts sending its media streams to the remote user. When a remote track is added to the connection, an event named track is triggered on the PeerConnection object (handled through ontrack) to let the application know that the other peer’s media is available and ready to be consumed. The application then takes the remote stream from the event and displays it in a video element.

pc.ontrack = (event) => { 
    if(event.streams && event.streams[0]){
    //The remote stream is now available at event.streams[0]. It     
    //can be attached to the srcObject of a video element to 
    //display the remote stream to the peer.
    } 
}
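The comment inside the handler can be turned into working code along the following lines. This is only a sketch; `attachRemoteStream` is a hypothetical helper name, and the element id in the usage comment is a placeholder.

```javascript
// Hypothetical helper: show the first remote stream in a <video> element.
function attachRemoteStream(pc, videoElement) {
  pc.ontrack = (event) => {
    const [remoteStream] = event.streams || [];
    // ontrack fires once per track (audio and video), so avoid
    // reassigning the same stream a second time.
    if (remoteStream && videoElement.srcObject !== remoteStream) {
      videoElement.srcObject = remoteStream;
    }
  };
}

// Usage in the browser ("remoteVideo" is a placeholder id; the element
// should have the autoplay attribute so playback starts by itself):
// attachRemoteStream(pc, document.getElementById("remoteVideo"));
```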

Now the call is successfully established where both peer A and peer B can communicate with each other in real-time with their respective camera and mic.

Once all the above-mentioned steps are done correctly, a WebRTC video call can be established successfully. Here is the link to the GitHub repo where all the above steps are organized in separate folders along with working code for your reference.

Do keep in mind that this is for learning and understanding the inner workings of a simple WebRTC p2p call. If you want to build a production-grade p2p call that you can deploy to a cloud and use for a commercial venture, you need to check this out.

If you want to build a production-grade video calling app by yourself as an extension to this project, you need to check this post to learn more about all the necessary features in a production-grade app.

Also keep in mind that you need a robust architecture to build a production-grade app. The code in the GitHub repository for this example has been created for learning purposes only and is not fit for production usage. If you are interested in a discussion with a principal consultant at Centedge to get the right architecture for you, you can schedule a free 30-minute consultation call using this link.
