Share the camera or share the screen: do it all with these browser APIs.

centedge
Jul 30, 2020

For the last 5 months, the demand for video conferencing has skyrocketed. Much of the world's population has been confined to home, and a large share of work is now done over video calls. The primary requirement for a video conference is sharing the camera and microphone, with occasional screen sharing, so that each participant can be seen, heard and understood properly. Most of these video conferences nowadays run directly in a browser, without the need to install any external software or even a browser extension. Browsers these days have some magical powers to handle everything related to cameras, microphones and screen sharing. In this post, we will explore how the browser shares these things on demand and the open secret behind these magical powers.

The open secret

The much-awaited open secret is a browser API named navigator.mediaDevices. It provides getUserMedia to acquire the camera and microphone on request, enumerateDevices to list all the available devices, and getDisplayMedia to capture the screen, an application window or a browser tab. These are the most commonly used APIs in a typical video conferencing application.
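
As a minimal sketch of how getUserMedia and enumerateDevices fit together, the snippet below opens the camera and microphone and then groups the available devices by kind. The helper names groupDevicesByKind and openCameraAndListDevices are our own and not part of the browser API, and the mediaDevices argument is assumed to be navigator.mediaDevices.

```javascript
// Group MediaDeviceInfo-like objects by their `kind`
// ('audioinput', 'audiooutput' or 'videoinput').
function groupDevicesByKind(devices) {
  const groups = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    const list = groups[device.kind];
    if (Array.isArray(list)) {
      list.push(device);
    }
  }
  return groups;
}

// Hypothetical helper: request camera and microphone access, then list devices.
// `mediaDevices` is assumed to be navigator.mediaDevices.
async function openCameraAndListDevices(mediaDevices) {
  const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
  const devices = await mediaDevices.enumerateDevices();
  return { stream, devices: groupDevicesByKind(devices) };
}

// Browser usage (needs a secure context and the user's permission):
// openCameraAndListDevices(navigator.mediaDevices)
//   .then(({ devices }) => console.log(devices.videoinput))
//   .catch((error) => console.error('Could not open media devices:', error));
```

Device labels are typically empty until the user has granted permission, which is why this sketch calls getUserMedia before enumerateDevices.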

Video conferencing applications can retrieve the current list of connected devices and also listen for changes, since many cameras and microphones connect through USB and can be plugged in or unplugged during the lifecycle of the application. Because the state of a media device can change at any time, it is recommended that applications listen for the devicechange event on navigator.mediaDevices in order to handle such changes properly.
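
One way to react to devices being plugged in or removed is sketched below. The helper name watchDevices and its callback are our own assumptions; only enumerateDevices and the devicechange event come from the browser.

```javascript
// Hypothetical helper: call `onChange` with the current device list now
// and again every time the set of connected devices changes.
// `mediaDevices` is assumed to be navigator.mediaDevices.
function watchDevices(mediaDevices, onChange) {
  const refresh = () =>
    mediaDevices
      .enumerateDevices()
      .then(onChange)
      .catch((error) => console.error('enumerateDevices failed:', error));

  mediaDevices.addEventListener('devicechange', refresh);
  refresh(); // report the initial list right away

  // Return a function that stops watching.
  return () => mediaDevices.removeEventListener('devicechange', refresh);
}

// Browser usage:
// const stop = watchDevices(navigator.mediaDevices, (devices) => {
//   console.log('Connected devices:', devices.map((d) => d.kind));
// });
// stop(); // call later, for example when the call screen is closed
```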

Media constraints

The next thing that needs discussion is media constraints, which define how the camera, the microphone or the screen share should be accessed by passing specific instructions to the browser.

Capture camera using getUserMedia

For example, if there are 3 cameras available to a browser, a constraint can instruct the browser to use one specific camera out of the 3 for the video call.

The specific constraints are defined in a MediaTrackConstraints object, one for audio and one for video. The attributes in this object are of type ConstrainULong, ConstrainBoolean, ConstrainDouble or ConstrainDOMString. Each can be a specific value (e.g., a number, boolean or string), a range (ULongRange or DoubleRange with a minimum and maximum value) or an object with either an ideal or exact definition. For a specific value, the browser will attempt to pick something as close as possible. For a range, the best value in that range will be used. When exact is specified, only media streams that exactly match that constraint will be returned.

// Camera with a resolution as close to 640x480 as possible
{
    "video": {
        "width": 640,
        "height": 480
    }
}

// Camera with a resolution in the range 640x480 to 1024x768
{
    "video": {
        "width": {
            "min": 640,
            "max": 1024
        },
        "height": {
            "min": 480,
            "max": 768
        }
    }
}

// Camera with the exact resolution of 1024x768
{
    "video": {
        "width": {
            "exact": 1024
        },
        "height": {
            "exact": 768
        }
    }
}
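
To pick one specific camera out of several, the same constraint forms can be combined with a deviceId taken from enumerateDevices. The following is a sketch under our own assumptions: the helper names and the 1280x720 target are illustrative, and falling back to any camera on OverconstrainedError is just one possible policy.

```javascript
// Build constraints for one particular camera with a preferred resolution.
function buildCameraConstraints(deviceId, width, height) {
  return {
    audio: true,
    video: {
      deviceId: { exact: deviceId }, // must be this camera
      width: { ideal: width },       // preferred, not mandatory
      height: { ideal: height },
    },
  };
}

// Hypothetical helper: open the requested camera, falling back to any
// camera if the exact constraint cannot be satisfied.
// `mediaDevices` is assumed to be navigator.mediaDevices.
async function openCamera(mediaDevices, deviceId) {
  try {
    return await mediaDevices.getUserMedia(
      buildCameraConstraints(deviceId, 1280, 720)
    );
  } catch (error) {
    if (error.name === 'OverconstrainedError') {
      // No device matches the exact deviceId (it may have been unplugged).
      return mediaDevices.getUserMedia({ audio: true, video: true });
    }
    throw error;
  }
}

// Browser usage, with `cameraId` taken from an enumerateDevices() entry:
// openCamera(navigator.mediaDevices, cameraId).then((stream) => {
//   document.querySelector('video').srcObject = stream;
// });
```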

To determine the actual configuration of a track in a media stream, we can call MediaStreamTrack.getSettings(), which returns the MediaTrackSettings currently applied.

It is also possible to update the constraints of a track from a media device we have opened, by calling applyConstraints() on the track. This lets an application re-configure a media device without first having to close the existing stream.
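
A short sketch of both calls follows. The helper names describeVideoTrack and switchResolution are our own, and the track is assumed to be a video MediaStreamTrack obtained from getUserMedia.

```javascript
// Summarise the settings currently applied to a video track.
function describeVideoTrack(track) {
  const { width, height, frameRate } = track.getSettings();
  return `${width}x${height} @ ${frameRate} fps`;
}

// Hypothetical helper: ask the open track for a different resolution
// without closing the stream, then report what the browser actually applied.
async function switchResolution(track, width, height) {
  await track.applyConstraints({
    width: { ideal: width },
    height: { ideal: height },
  });
  return describeVideoTrack(track);
}

// Browser usage, with `stream` returned by getUserMedia():
// const [track] = stream.getVideoTracks();
// console.log(describeVideoTrack(track));
// switchResolution(track, 640, 480).then(console.log);
```

Because the values passed here are ideal rather than exact, the browser may settle on a nearby resolution, so it is worth reading the settings back after applyConstraints() resolves.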

Capture screen using getDisplayMedia

An application that wants to perform screen capturing and recording must use the Display Media API. The function getDisplayMedia() (which is part of navigator.mediaDevices) is similar to getUserMedia() and is used to open the content of the display (or a portion of it, such as a window). The returned MediaStream works the same as when using getUserMedia().

The constraints for getDisplayMedia() differ from the ones used for regular video or audio input.

{
    video: {
        // one of 'always', 'motion' or 'never'
        cursor: 'always',
        // one of 'application', 'browser', 'monitor' or 'window'
        displaySurface: 'monitor'
    }
}

The code snippet above shows how the special constraints for screen recording work. Note that these might not be supported by all browsers that have display media support.
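
Since support varies, one cautious approach is to include these hints only when the browser reports them as supported. The sketch below assumes helper names of our own (buildDisplayConstraints, startScreenShare) and an arbitrary choice of 'always' and 'monitor' as the preferred values.

```javascript
// Build getDisplayMedia() constraints, adding only the hints that the
// browser lists in getSupportedConstraints().
function buildDisplayConstraints(supported) {
  const video = {};
  if (supported.cursor) video.cursor = 'always';
  if (supported.displaySurface) video.displaySurface = 'monitor';
  // Fall back to plain `video: true` when neither hint is supported.
  return { video: Object.keys(video).length > 0 ? video : true };
}

// Hypothetical helper: start a screen share and get notified when it stops.
// `mediaDevices` is assumed to be navigator.mediaDevices.
async function startScreenShare(mediaDevices, onStopped) {
  const constraints = buildDisplayConstraints(
    mediaDevices.getSupportedConstraints()
  );
  const stream = await mediaDevices.getDisplayMedia(constraints);
  // 'ended' fires when the user stops sharing from the browser's own UI.
  stream.getVideoTracks()[0].addEventListener('ended', onStopped);
  return stream;
}

// Browser usage (must be triggered by a user action such as a click):
// shareButton.onclick = () =>
//   startScreenShare(navigator.mediaDevices, () => console.log('Sharing stopped'))
//     .catch((error) => console.error('Screen share failed:', error));
```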

Tips and tricks

A MediaStream represents a stream of media content, which consists of tracks (MediaStreamTrack) of audio and video. You can retrieve all the tracks from MediaStream by calling MediaStream.getTracks(), which returns an array of MediaStreamTrack objects.

A MediaStreamTrack has a kind property that is either audio or video, indicating the kind of media it represents. Each track can be muted by toggling its enabled property. A track has a Boolean property remote that indicates whether it is sourced by an RTCPeerConnection and comes from a remote peer. This property is not exposed by all browsers, so check for it before relying on it.
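
A mute button for a call can be sketched with just getTracks(), kind and enabled. The helper name setTracksEnabled is our own.

```javascript
// Enable or disable every track of the given kind ('audio' or 'video')
// in a stream. Returns the number of tracks that were updated.
function setTracksEnabled(stream, kind, enabled) {
  const tracks = stream.getTracks().filter((track) => track.kind === kind);
  for (const track of tracks) {
    track.enabled = enabled;
  }
  return tracks.length;
}

// Browser usage, with `stream` returned by getUserMedia():
// setTracksEnabled(stream, 'audio', false); // mute the microphone
// setTracksEnabled(stream, 'video', false); // turn the camera picture off
// setTracksEnabled(stream, 'audio', true);  // unmute again
```

Disabling a track keeps the device open, so unmuting is instant and does not trigger a new permission prompt.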