WebRTC: Real-Time Communication in the Browser
WebRTC (Web Real-Time Communication) is an open standard that enables browsers and mobile applications to exchange audio, video, and data directly with each other in real time, without requiring a plugin, a native app, or a dedicated media server. It is the technology behind video calls, screen sharing, and peer-to-peer file transfer in the browser.
What Is WebRTC
WebRTC is a collection of APIs and protocols built into modern browsers that allow two devices to establish a direct peer-to-peer connection and exchange media or data streams. Unlike WebSockets, which require a server to relay all messages, WebRTC is designed to send data directly between participants once the connection is established. This peer-to-peer architecture dramatically reduces latency and server bandwidth costs for real-time media applications.
WebRTC is an open standard maintained by the W3C and IETF and is supported natively in Chrome, Firefox, Safari, and Edge without any plugins or additional software. When you make a video call in Google Meet, participate in a voice call on Discord in the browser, or use a peer-to-peer file sharing tool, WebRTC is almost certainly the underlying technology making it possible.
Despite being peer-to-peer in its data transfer, WebRTC still requires a server for the initial connection setup. This process, called signalling, allows the two peers to discover each other, exchange network information, and negotiate the parameters of their connection before any media flows directly between them.
What WebRTC Can Transmit
| Data Type | API Used | Common Use Case |
|---|---|---|
| Audio | MediaStream, RTCPeerConnection | Voice calls, podcasting, voice messages, live audio broadcasting |
| Video | MediaStream, RTCPeerConnection | Video calls, screen sharing, remote desktop, live streaming |
| Arbitrary Data | RTCDataChannel | Peer-to-peer file transfer, real-time gaming, collaborative editing, chat without a server |
How WebRTC Works
Establishing a WebRTC connection is more complex than opening a WebSocket because two peers on the internet must find a way to connect directly to each other, often through firewalls, NAT routers, and other network obstacles. The process involves three distinct phases: signalling, network discovery, and media negotiation.
- Both peers connect to a signalling server, which is a standard WebSocket or HTTP server used only to coordinate the connection setup. The signalling server is not involved in transmitting media once the connection is established.
- Peer A creates an offer, which is an SDP (Session Description Protocol) document describing the media formats and capabilities it supports, and sends it to Peer B via the signalling server.
- Peer B receives the offer, creates an answer SDP describing its own capabilities, and sends it back to Peer A via the signalling server.
- Both peers simultaneously gather ICE candidates, which are potential network paths through which the connection could be established, including local IP addresses, public IP addresses discovered via a STUN server, and relay addresses provided by a TURN server.
- ICE candidates are exchanged between peers through the signalling server. The ICE framework tries each candidate pair and selects the best one that works.
- Once a viable network path is found, a peer-to-peer connection is established and media or data flows between the two devices. When a direct path exists, nothing passes through a server. When it does not, traffic is relayed through a TURN server.
- Everything sent over WebRTC is encrypted by default. Audio and video use SRTP with keys negotiated over DTLS, and data channels are encrypted with DTLS.
Peer A                 Signalling Server                  Peer B
  |                            |                             |
  |-------- Offer SDP -------->|                             |
  |                            |-------- Offer SDP --------->|
  |                            |<------- Answer SDP ---------|
  |<------- Answer SDP --------|                             |
  |                            |                             |
  |----- ICE Candidates ------>|----- ICE Candidates ------->|
  |<---- ICE Candidates -------|<---- ICE Candidates --------|
  |                            |                             |
  |<=========== Direct Peer-to-Peer Connection =============>|
  |                            |                             |
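To make the signalling server's role more concrete, here is a minimal in-memory sketch of a relay that only forwards messages between peers in the same room. The class name `SignallingRelay`, the room and peer identifiers, and the message shapes are all assumptions made for illustration. A real deployment would sit behind a WebSocket or HTTP server and add authentication.

```javascript
// Minimal in-memory signalling relay (illustrative only, no network I/O).
// Peers register a callback; the relay forwards each message to the other
// peer(s) in the same room and never inspects the SDP or ICE payload.
class SignallingRelay {
  constructor() {
    this.rooms = new Map(); // roomId -> Map(peerId -> deliver callback)
  }

  join(roomId, peerId, deliver) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    this.rooms.get(roomId).set(peerId, deliver);
  }

  leave(roomId, peerId) {
    const room = this.rooms.get(roomId);
    if (!room) return;
    room.delete(peerId);
    if (room.size === 0) this.rooms.delete(roomId);
  }

  // Forward a message from one peer to everyone else in the room.
  // Returns the number of peers the message was delivered to.
  send(roomId, fromPeerId, message) {
    const room = this.rooms.get(roomId);
    if (!room) return 0;
    let delivered = 0;
    for (const [peerId, deliver] of room) {
      if (peerId === fromPeerId) continue;
      deliver({ from: fromPeerId, ...message });
      delivered += 1;
    }
    return delivered;
  }
}

// Usage: replay the message order from the diagram above.
const relay = new SignallingRelay();
const inboxA = [];
const inboxB = [];
relay.join('room-1', 'A', (msg) => inboxA.push(msg));
relay.join('room-1', 'B', (msg) => inboxB.push(msg));

relay.send('room-1', 'A', { type: 'offer', sdp: '<offer SDP>' });
relay.send('room-1', 'B', { type: 'answer', sdp: '<answer SDP>' });
relay.send('room-1', 'A', { type: 'candidate', candidate: '<ICE candidate>' });

console.log(inboxB.map((m) => m.type)); // [ 'offer', 'candidate' ]
console.log(inboxA.map((m) => m.type)); // [ 'answer' ]
```

The relay treats the SDP and candidate payloads as opaque values, which reflects the point made above: the signalling server coordinates setup but takes no part in the media path.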
Core WebRTC APIs
The browser exposes three primary APIs that together provide all of WebRTC's functionality. Most WebRTC applications use all three in combination.
| API | Purpose |
|---|---|
| getUserMedia() | Requests access to the user's camera and microphone and returns a promise that resolves to a MediaStream containing audio and video tracks. This is the starting point for any application that captures local media. |
| RTCPeerConnection | The core API that manages the peer-to-peer connection. It handles ICE candidate gathering, SDP offer and answer exchange, codec negotiation, and the actual transmission of media streams between peers. |
| RTCDataChannel | An API that allows arbitrary data to be sent between peers over the WebRTC connection, similar to a WebSocket but peer-to-peer. Supports both reliable ordered delivery and unreliable low-latency delivery. |
// Assumes this code runs in an ES module or inside an async function (for await),
// and that signalingServer is an already-open WebSocket to your signalling server.

// Step 1: Get access to camera and microphone
const localStream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true
});

// Display local video in a video element
document.getElementById('localVideo').srcObject = localStream;

// Step 2: Create a peer connection
const peerConnection = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' }
  ]
});

// Step 3: Add local tracks to the connection
localStream.getTracks().forEach(track => {
  peerConnection.addTrack(track, localStream);
});

// Step 4: Handle incoming remote tracks
peerConnection.ontrack = (event) => {
  document.getElementById('remoteVideo').srcObject = event.streams[0];
};

// Step 5: Handle ICE candidates (send to remote peer via signalling)
peerConnection.onicecandidate = (event) => {
  // event.candidate is null once candidate gathering has finished
  if (event.candidate) {
    signalingServer.send(JSON.stringify({ candidate: event.candidate }));
  }
};

// Step 6: Create and send an offer
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
signalingServer.send(JSON.stringify({ offer }));
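The steps above cover only the calling side. A hedged sketch of how either peer might react to incoming signalling messages follows. The function name `handleSignal`, the `send` callback, and the `{ answer }` message shape are assumptions chosen to mirror the `{ offer }` and `{ candidate }` messages sent above. The methods called on `pc` are the standard `RTCPeerConnection` methods.

```javascript
// Sketch of handling incoming signalling messages.
// `pc` is expected to be an RTCPeerConnection; `send` is a hypothetical
// function that delivers a JSON string to the other peer.
async function handleSignal(pc, send, rawMessage) {
  const message = JSON.parse(rawMessage);

  if (message.offer) {
    // Peer B: apply the remote offer, then create and send back an answer.
    await pc.setRemoteDescription(message.offer);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    send(JSON.stringify({ answer }));
  } else if (message.answer) {
    // Peer A: the answer completes the offer/answer exchange.
    await pc.setRemoteDescription(message.answer);
  } else if (message.candidate) {
    // Either peer: add a network path suggested by the other side.
    await pc.addIceCandidate(message.candidate);
  }
}

// In a browser, assuming signalingServer is a WebSocket, this could be wired up as:
// signalingServer.onmessage = (event) =>
//   handleSignal(peerConnection, (data) => signalingServer.send(data), event.data);
```

Passing the connection and the send function in as parameters keeps the handler independent of any particular signalling transport, which also makes it straightforward to exercise with a stub object.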
STUN and TURN Servers
Most devices are not directly reachable on the internet. They sit behind NAT routers and firewalls that assign private IP addresses internally and share a single public IP address externally. WebRTC uses two types of servers to overcome this obstacle.
- STUN (Session Traversal Utilities for NAT): A lightweight server that tells a peer its own public IP address and port as seen from the outside world. The peer includes this information as an ICE candidate, allowing the other peer to attempt a direct connection to it. STUN servers are cheap to operate and freely available. Google provides a public STUN server at stun.l.google.com:19302. STUN works for the majority of connections where both peers are behind ordinary NAT routers.
- TURN (Traversal Using Relays around NAT): A relay server that forwards media between peers when a direct connection cannot be established, typically because one or both peers are behind a strict corporate firewall or a symmetric NAT that blocks direct connections. When STUN fails, the ICE framework falls back to TURN. TURN servers consume significant bandwidth because all media passes through them, so they are typically operated by the application provider rather than used from a public source. They are essential for ensuring connectivity in all network environments.
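One way to see which of these mechanisms produced a given network path is to look at the `typ` field of an ICE candidate string. The sketch below parses that string. The function name `describeCandidate`, the descriptive labels, and the sample candidate (which uses a documentation-only IP address) are assumptions for illustration.

```javascript
// Classify an ICE candidate string by how its address was obtained.
// The labels are descriptive text for this sketch, not part of any API.
const CANDIDATE_SOURCES = {
  host: 'local interface address',
  srflx: 'public address learned from a STUN server',
  prflx: 'address discovered during connectivity checks',
  relay: 'relay address allocated on a TURN server',
};

function describeCandidate(candidateLine) {
  // Expected layout:
  // candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
  const fields = candidateLine.trim().split(/\s+/);
  const typIndex = fields.indexOf('typ');
  if (typIndex < 6 || typIndex + 1 >= fields.length) return null;
  const type = fields[typIndex + 1];
  return {
    type,
    protocol: fields[2].toLowerCase(),
    address: fields[4],
    port: Number(fields[5]),
    source: CANDIDATE_SOURCES[type] ?? 'unknown',
  };
}

const sample =
  'candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.20 rport 46154';
console.log(describeCandidate(sample).source);
// public address learned from a STUN server
```

In a browser, the string to pass in would be `event.candidate.candidate` inside the `onicecandidate` handler. Seeing only `host` candidates usually suggests the STUN server was not reachable, while a connection that ends up on a `relay` candidate is flowing through TURN.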
WebRTC vs WebSocket
| Feature | WebRTC | WebSocket |
|---|---|---|
| Connection Model | Peer-to-peer between browsers after initial signalling | Client-to-server. All messages pass through the server. |
| Primary Use | Real-time audio, video, and data between users | Real-time data between a client and a server |
| Server Involvement | Only for signalling setup. Media flows directly between peers unless a TURN relay is needed. | Required for every message. The server is always in the path. |
| Latency | Typically the lowest available, because media takes a direct path between peers when one exists. | Low, but every message makes an extra hop through the server. |
| Media Support | Native audio and video with built-in codec negotiation and adaptive bitrate | No native media support. Text and binary messages only. |
| Encryption | Mandatory. DTLS for data, SRTP for media. | Optional. Provided by TLS when using wss:// |
| Complexity | High. Requires signalling server, STUN/TURN, SDP negotiation. | Low to moderate. Simple persistent connection to a server. |
| Best For | Video calls, voice calls, screen sharing, P2P file transfer | Live chat, notifications, real-time dashboards, multiplayer game state |
Real-World Use Cases for WebRTC
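- Video conferencing: Google Meet, Whereby, and Jitsi use WebRTC to deliver low-latency video calls in the browser without requiring any plugin or download. Calls with many participants are usually routed through media servers rather than sent directly between every pair of participants.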
- Screen sharing: The getDisplayMedia() API, which extends WebRTC, allows users to share their entire screen, a specific window, or a browser tab with remote participants in real time.
- Peer-to-peer file transfer: RTCDataChannel allows files to be sent directly between browsers without uploading to a server. Tools like ShareDrop use this to enable fast, private file sharing on a local network.
- Live streaming: WebRTC can broadcast low-latency live video from a browser to viewers, with sub-second latency compared to the ten or more seconds typical of HLS-based streaming.
- Voice over IP: Browser-based telephony and customer support platforms use WebRTC to connect support agents with customers over voice, replacing traditional telephone infrastructure.
- Real-time collaborative tools: Applications that need to synchronise drawing canvases, shared whiteboards, or game state directly between participants can use RTCDataChannel for ultra-low-latency peer-to-peer data exchange.
- IoT and device streaming: WebRTC can stream video from cameras and sensors directly to a browser interface with minimal latency, making it suitable for remote monitoring and control applications.
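For the peer-to-peer file transfer case in the list above, files are normally split into small messages before being sent over an RTCDataChannel. The following is a minimal sketch of that chunking and reassembly step. The 16 KiB chunk size is a commonly used conservative value rather than a requirement, and the helper names `chunkBytes` and `reassemble` are assumptions.

```javascript
// Split a file's bytes into fixed-size chunks and reassemble them.
// 16 KiB is an assumed, conservative message size; tune it for your targets.
const CHUNK_SIZE = 16 * 1024;

function* chunkBytes(bytes, chunkSize = CHUNK_SIZE) {
  for (let offset = 0; offset < bytes.byteLength; offset += chunkSize) {
    // subarray creates a view over the same buffer, so no bytes are copied.
    yield bytes.subarray(offset, offset + chunkSize);
  }
}

function reassemble(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return result;
}

// Sender (browser):   for (const chunk of chunkBytes(fileBytes)) channel.send(chunk);
// Receiver (browser): with channel.binaryType = 'arraybuffer', push
//                     new Uint8Array(event.data) in channel.onmessage, then reassemble.
const fileBytes = Uint8Array.from({ length: 40000 }, (_, i) => i % 256);
const chunks = [...chunkBytes(fileBytes)];
console.log(chunks.length); // 3
console.log(reassemble(chunks).byteLength); // 40000
```

A production sender would also need to pace itself, for example by checking `channel.bufferedAmount` before queueing more chunks, and would send the file name and total size first so the receiver knows when the transfer is complete.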
Frequently Asked Questions
- Is WebRTC secure?
Yes. Security is mandatory in WebRTC rather than optional. All data transmitted over an RTCDataChannel is encrypted using DTLS (Datagram Transport Layer Security). All audio and video media is encrypted using SRTP (Secure Real-time Transport Protocol). The browser enforces these encryption requirements and will not establish a WebRTC connection without them. Additionally, browsers require user permission before accessing the camera or microphone, and the permission request is clearly visible to the user.
- Does WebRTC always connect peer-to-peer?
It attempts to but does not always succeed. The ICE framework first tries to establish a direct connection between peers. If network conditions such as strict firewalls or symmetric NAT routers prevent a direct path, it falls back to routing media through a TURN relay server. In that case the connection is still encrypted end-to-end, but media passes through an intermediate server rather than flowing directly between the two devices. In practice, most consumer connections succeed with a direct or STUN-assisted path, but TURN servers are essential for corporate and restricted network environments.
- Do I need a server to use WebRTC?
Yes, for the initial connection setup. A signalling server is required to allow the two peers to find each other and exchange the SDP offer, answer, and ICE candidates needed to establish the connection. The signalling server can be a simple WebSocket server or even a REST API. Once the connection is established, the signalling server is no longer involved in the data flow. You will also need STUN servers, and optionally TURN servers, to handle NAT traversal. For production applications, running your own TURN server or using a managed WebRTC infrastructure service is typically necessary.
- What is SDP and why does it matter?
SDP stands for Session Description Protocol. It is a text-based format used in WebRTC to describe the parameters of a media session, including which codecs are supported for audio and video, the network addresses and ports being offered, the DTLS certificate fingerprints used to secure the connection, and other session metadata. During connection setup, the initiating peer creates an offer SDP and the receiving peer responds with an answer SDP. The two peers compare their capabilities and negotiate the best settings both sides support. SDP is not designed to be easy to read or modify by hand, but understanding what it represents helps when debugging WebRTC connection failures.
- How does WebRTC handle poor network conditions?
WebRTC includes several built-in mechanisms for coping with unreliable networks. It uses adaptive bitrate control, automatically reducing video resolution and quality when bandwidth is limited and increasing it when conditions improve. It implements Forward Error Correction (FEC) to recover from packet loss without retransmission. It uses jitter buffers to smooth out irregular packet arrival times. The RTCP protocol provides continuous feedback between peers about packet loss, round-trip time, and bandwidth estimates, which the sender uses to adjust its transmission rate dynamically. These mechanisms work automatically without any configuration from the developer.
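Returning to the SDP question above, the short sketch below shows what reading an SDP document can look like when debugging. It lists the codecs a description advertises by scanning its `m=` and `a=rtpmap` lines. The function name `listCodecs` and the sample SDP are assumptions: the sample is a heavily trimmed, hand-written fragment, and real descriptions are much longer.

```javascript
// List the codecs advertised in an SDP document by reading its
// "m=" (media section) and "a=rtpmap" (payload type to codec) lines.
function listCodecs(sdp) {
  const codecs = [];
  let kind = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      kind = line.slice(2).split(' ')[0]; // "audio", "video", ...
      continue;
    }
    const match = /^a=rtpmap:(\d+) ([^/]+)\/(\d+)/.exec(line);
    if (match && kind) {
      codecs.push({
        kind,
        payloadType: Number(match[1]),
        name: match[2],
        clockRate: Number(match[3]),
      });
    }
  }
  return codecs;
}

// Hand-written SDP fragment used only for illustration.
const sampleSdp = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=rtpmap:96 VP8/90000',
].join('\r\n');

console.log(listCodecs(sampleSdp).map((c) => `${c.kind}:${c.name}`));
// [ 'audio:opus', 'video:VP8' ]
```

In a browser, the same function could be pointed at `peerConnection.localDescription.sdp` or `peerConnection.remoteDescription.sdp` to compare what each side offered when a call fails to negotiate a common codec.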
Conclusion
WebRTC is one of the most powerful and complex APIs available in the browser, enabling real-time peer-to-peer audio, video, and data communication without plugins, native apps, or media relay servers in the common case. It is the technology behind browser-based video calls, screen sharing, peer-to-peer file transfer, and low-latency live streaming. Its complexity is justified by what it delivers: encrypted, direct communication between users with the lowest possible latency. Understanding signalling, ICE, STUN, TURN, and SDP gives you the foundation to build or integrate real-time communication features into any web application. Continue with WebSockets, HTTPS and TLS, and REST APIs to complete your understanding of real-time and network communication on the web.
