WebRTC: Enabling Collaboration
In my previous post, we learned how to detect, query and control the various media devices through WebRTC.
In this post, we’ll dive deep into the collaboration process.
Signalling: session control, network and media information
WebRTC uses a mechanism to coordinate communication and to send control messages, a process known as signalling. Signalling methods and protocols are not specified by WebRTC: signalling is not part of the RTCPeerConnection API. Instead, WebRTC app developers can choose whatever messaging protocol they prefer, such as SIP or XMPP, and any appropriate duplex (two-way) communication channel.
Signalling is used to exchange three types of information:
- Session control messages: to initialize or close communication and report errors.
- Network configuration: my computer’s IP address and port, as seen by the outside world.
- Media capabilities: what codecs and resolutions can be handled by my browser and the browser it wants to communicate with.
The exchange of information via signalling must have completed successfully before peer-to-peer streaming can begin.
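Since WebRTC leaves the message format up to us, here is a minimal sketch of one way to tag the three kinds of information before sending them over a channel. The JSON envelope, the type names and the helper functions `encodeSignal` and `decodeSignal` are all assumptions made for illustration; none of them is defined by WebRTC.

```javascript
// Hypothetical message envelope: these type names are illustrative only,
// they are not defined by WebRTC or by any signalling standard.
const SIGNAL_TYPES = Object.freeze({
  SESSION: "session", // initialize or close communication, report errors
  NETWORK: "network", // IP address and port information (ICE candidates)
  MEDIA: "media",     // codecs and resolutions (SDP offers and answers)
});

function isKnownType(type) {
  return Object.values(SIGNAL_TYPES).includes(type);
}

// Wrap a payload in an envelope and serialize it for the signalling channel.
function encodeSignal(type, payload) {
  if (!isKnownType(type)) {
    throw new Error(`Unknown signal type: ${type}`);
  }
  return JSON.stringify({ type, payload });
}

// Parse a received message and reject anything with an unexpected type.
function decodeSignal(text) {
  const message = JSON.parse(text);
  if (!isKnownType(message.type)) {
    throw new Error(`Unknown signal type: ${message.type}`);
  }
  return message;
}
```

The receiving side can then switch on `message.type` to decide whether the payload is a session control message, a network candidate or a media description.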
For example, imagine Jack wants to communicate with Jill. The W3C WebRTC spec includes a code sample that shows the signalling process in action. Note that older versions of Chrome and Opera exposed RTCPeerConnection only with a vendor prefix (webkitRTCPeerConnection); current browsers support the unprefixed name.
Credits: https://www.html5rocks.com/en/tutorials/webrtc/basics/#toc-signaling
First, Jack and Jill exchange network information.
- Jack creates an RTCPeerConnection object with an onicecandidate handler.
- The handler is run when network candidates become available.
- Jack sends serialized candidate data to Jill, via whatever signalling channel they are using: WebSocket or some other mechanism.
- When Jill gets a candidate message from Jack, she calls addIceCandidate, to add the candidate to the remote peer description.
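The candidate exchange above could be wired up roughly as follows. This is a sketch, not the spec sample: `pc` is expected to behave like an RTCPeerConnection, while `channel` is a hypothetical signalling channel (an object with a `send(text)` method and an `onmessage(text)` callback) standing in for a WebSocket or any other mechanism.

```javascript
// Assumption: `channel` is a made-up signalling channel exposing send(text)
// and an onmessage(text) callback. `pc` should behave like RTCPeerConnection.
function wireCandidateExchange(pc, channel) {
  // Runs each time the local ICE agent finds a new network candidate.
  pc.onicecandidate = (event) => {
    // A null candidate marks the end of gathering, so nothing is sent.
    if (event.candidate) {
      channel.send(JSON.stringify({ candidate: event.candidate }));
    }
  };

  // Runs when the other peer's serialized candidate arrives.
  channel.onmessage = (text) => {
    const message = JSON.parse(text);
    if (message.candidate) {
      Promise.resolve(pc.addIceCandidate(message.candidate)).catch((err) =>
        console.error("addIceCandidate failed:", err)
      );
    }
  };
}
```

Jack and Jill would each call `wireCandidateExchange` with their own connection and their own end of the channel.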
WebRTC clients (known as peers, aka Jack and Jill) also need to ascertain and exchange local and remote audio and video media information.
Signalling to exchange media configuration information proceeds by exchanging an offer and an answer using the Session Description Protocol (SDP):
- Jack runs the RTCPeerConnection createOffer() method. The promise it returns resolves with the data for an RTCSessionDescription: Jack’s local session description.
- Jack then sets the local description using setLocalDescription() and sends this session description to Jill via their signalling channel. Note that RTCPeerConnection won't start gathering candidates until setLocalDescription() is called: this is codified in the JSEP IETF draft.
- Jill sets the description Jack sent her as the remote description using setRemoteDescription().
- Jill runs the RTCPeerConnection createAnswer() method, which uses the remote description she got from Jack to generate a local session description that is compatible with his. The promise returned by createAnswer() resolves with the data for an RTCSessionDescription: Jill sets that as the local description and sends it to Jack.
- When Jack gets Jill’s session description, he sets that as the remote description with setRemoteDescription().
- Ping!
The acquisition and exchange of network and media information can be done simultaneously, but both processes (offer & answer) must be completed before audio and video streaming between peers can begin. The offer & answer architecture described above is called JSEP, JavaScript Session Establishment Protocol.
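To make the offer and answer steps concrete, here is a rough sketch using the promise-based RTCPeerConnection methods. The function names (`sendOffer`, `answerOffer`, `acceptAnswer`) and the `sendToPeer` callback are hypothetical: `sendToPeer` stands for whatever signalling channel Jack and Jill share.

```javascript
// Assumption: `sendToPeer(message)` is a made-up callback that delivers a
// message to the other peer over the app's own signalling channel.

// Jack (caller): create an offer, store it locally, then send it to Jill.
async function sendOffer(pc, sendToPeer) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer); // candidate gathering starts here
  sendToPeer({ description: pc.localDescription });
}

// Jill (callee): store Jack's offer, create a matching answer, send it back.
async function answerOffer(pc, remoteOffer, sendToPeer) {
  await pc.setRemoteDescription(remoteOffer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  sendToPeer({ description: pc.localDescription });
}

// Jack (caller): store Jill's answer to complete the exchange.
async function acceptAnswer(pc, remoteAnswer) {
  await pc.setRemoteDescription(remoteAnswer);
}
```

Each function takes the connection as a parameter, so the same code works for any pair of peers regardless of how the messages travel between them.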
Figure: Process of signalling and streaming.
Once the signalling process has completed successfully, data can be streamed directly peer to peer, between the caller and callee.
RTCPeerConnection
RTCPeerConnection is the WebRTC component that handles stable and efficient communication of streaming data between peers.
Figure: WebRTC architecture diagram
The main thing to understand from this diagram is that RTCPeerConnection shields web developers from the myriad complexities that lurk beneath. The codecs and protocols used by WebRTC do a huge amount of work to make real-time communication possible, even over unreliable networks.
RTCPeerConnection without servers
The code below is taken from the ‘single page’ WebRTC demo at webrtc.github.io/samples/src/content/peerconnection/pc1, which has local and remote RTCPeerConnection (and local and remote video) on one web page.
Caller:
Create a new RTCPeerConnection and add the stream from getUserMedia():
Credits: webrtc.github.io/samples/src/content/peerconnection/pc1
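As a rough idea of what this step looks like (a sketch, not the demo's own code), the helper below adds every track of a media stream to a connection with addTrack(), which current browsers favour over the older addStream() method. The helper name `addLocalTracks` is an assumption for illustration.

```javascript
// Hypothetical helper: add each track of a MediaStream to a peer connection.
// Returns whatever addTrack() returns for each track (an RTCRtpSender in
// a real browser).
function addLocalTracks(pc, stream) {
  return stream.getTracks().map((track) => pc.addTrack(track, stream));
}

// In a browser, one possible call site would be:
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
//   const pc1 = new RTCPeerConnection();
//   addLocalTracks(pc1, stream);
```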
Create an offer and set it as the local description for pc1 and as the remote description for pc2. This can be done directly in the code without using signalling because both caller and callee are on the same page:
Credits: webrtc.github.io/samples/src/content/peerconnection/pc1
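Because both connections live on the same page, each description and candidate can be handed directly to the other side. The sketch below shows the idea; `linkCandidates` and `connectOnSamePage` are hypothetical names, and the code is a simplified illustration rather than a copy of the demo.

```javascript
// Forward candidates from one connection straight to the other one.
function linkCandidates(from, to) {
  from.onicecandidate = (event) => {
    if (event.candidate) {
      Promise.resolve(to.addIceCandidate(event.candidate)).catch((err) =>
        console.error("addIceCandidate failed:", err)
      );
    }
  };
}

// Run the whole offer/answer exchange without any signalling channel.
async function connectOnSamePage(pc1, pc2) {
  linkCandidates(pc1, pc2);
  linkCandidates(pc2, pc1);

  const offer = await pc1.createOffer();
  await pc1.setLocalDescription(offer);  // local description for pc1
  await pc2.setRemoteDescription(offer); // remote description for pc2

  const answer = await pc2.createAnswer();
  await pc2.setLocalDescription(answer);
  await pc1.setRemoteDescription(answer);
}
```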
Callee:
Create pc2 and, when the stream from pc1 is added, display it in a video element:
Credits: webrtc.github.io/samples/src/content/peerconnection/pc1
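On the callee side, displaying the incoming media could look something like this. It is a sketch that assumes the caller added its tracks together with a stream, so that the track event carries that stream; `showRemoteStream` is a made-up helper name.

```javascript
// Hypothetical helper: show the first remote stream in a <video> element.
function showRemoteStream(pc, videoElement) {
  // The track event fires once for each incoming audio or video track.
  pc.ontrack = (event) => {
    const [remoteStream] = event.streams;
    // Tracks of the same stream share one object, so assign it only once.
    if (remoteStream && videoElement.srcObject !== remoteStream) {
      videoElement.srcObject = remoteStream;
    }
  };
}

// In a browser, one possible call site would be:
//   const pc2 = new RTCPeerConnection();
//   showRemoteStream(pc2, document.querySelector("video#remote"));
```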
RTCPeerConnection plus servers
In the real world, WebRTC needs servers, however simple, so the following can happen:
- User discovery and communication.
- Signalling.
- NAT/firewall traversal.
- Relay servers in case peer-to-peer communication fails.
NAT traversal, peer-to-peer networking, and the requirements for building a server app for user discovery and signalling are not covered in this post.
To enable RTCPeerConnection to cope with NAT traversal and other network vagaries, the ICE framework uses the STUN protocol and its extension, TURN.
ICE is a framework for connecting peers, such as two video chat clients. Initially, ICE tries to connect peers directly, with the lowest possible latency, via UDP. In this process, STUN servers have a single task: to enable a peer behind a NAT to find out its public address and port. (You can find out more about STUN and TURN from the HTML5 Rocks article WebRTC in the real world.)
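One way to see what STUN contributes is to look at the candidates ICE gathers. In the candidate attribute syntax defined by the ICE specification, each line carries a type after the `typ` keyword: `host` for a local interface address, `srflx` (server reflexive) for the public address and port learned through a STUN server, and `relay` for an address on a TURN server. The parser below is a minimal sketch; `parseCandidate` is a hypothetical helper and the sample addresses used with it are documentation placeholders.

```javascript
// Hypothetical helper: pull the main fields out of an ICE candidate line.
// Expected layout: candidate:<foundation> <component> <transport> <priority>
//                  <address> <port> typ <type> [further attributes]
function parseCandidate(line) {
  const fields = line
    .replace(/^a=/, "")
    .replace(/^candidate:/, "")
    .trim()
    .split(/\s+/);
  if (fields.length < 8 || fields[6] !== "typ") {
    throw new Error("Not an ICE candidate line");
  }
  const [foundation, component, protocol, priority, address, port, , type] = fields;
  return {
    foundation,
    component: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // "host", "srflx", "prflx" or "relay"
  };
}
```

A candidate whose type is `srflx` is the result of the single task described above: a STUN server told the peer which public address and port its NAT assigned.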
Figure: Finding connection candidates
If UDP fails, ICE tries TCP. If direct connection fails — in particular, because of enterprise NAT traversal and firewalls — ICE uses an intermediary (relay) TURN server.
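The STUN and TURN servers that ICE may use are handed to RTCPeerConnection through its configuration object. Here is a sketch of building one; the `buildIceConfig` helper, the example.org hostnames and the credentials are all placeholders invented for illustration, so substitute servers you actually operate or have access to.

```javascript
// Hypothetical helper: build the configuration object that
// RTCPeerConnection accepts, from lists of STUN and TURN URLs.
function buildIceConfig({ stunUrls = [], turn = null } = {}) {
  const iceServers = [];
  if (stunUrls.length > 0) {
    iceServers.push({ urls: stunUrls });
  }
  if (turn) {
    // TURN relays traffic, so servers normally require credentials.
    iceServers.push({
      urls: turn.urls,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Placeholder servers and credentials, for illustration only.
const iceConfig = buildIceConfig({
  stunUrls: ["stun:stun.example.org:3478"],
  turn: {
    urls: [
      "turn:turn.example.org:3478?transport=udp",
      "turn:turn.example.org:3478?transport=tcp",
    ],
    username: "demo-user",
    credential: "demo-secret",
  },
});

// In a browser: const pc = new RTCPeerConnection(iceConfig);
```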
Figure: WebRTC data pathways
Network topologies
WebRTC as currently implemented only supports one-to-one communication, but could be used in more complex network scenarios: for example, with multiple peers each communicating with each other directly, peer-to-peer, or via a Multipoint Control Unit (MCU).
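A quick bit of arithmetic shows why the choice of topology matters. Since each connection is one-to-one, a full mesh of n peers needs n(n-1)/2 connections in total, with every peer maintaining n-1 of them, whereas with a central MCU each peer keeps a single connection to the server. The helper names below are made up for this sketch.

```javascript
// Total one-to-one connections when every peer connects to every other peer.
function meshConnectionCount(peers) {
  return (peers * (peers - 1)) / 2;
}

// Connections a single peer must maintain under each topology.
function connectionsPerPeer(peers, topology) {
  switch (topology) {
    case "mesh":
      return peers - 1; // one connection to each of the other peers
    case "mcu":
      return 1;         // one connection to the central unit
    default:
      throw new Error(`Unknown topology: ${topology}`);
  }
}
```

For a call between four peers, for example, a mesh requires six connections overall and three per peer, which is why larger groups tend to rely on a central server.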
Many existing WebRTC apps only demonstrate communication between web browsers, but gateway servers can enable a WebRTC app running in a browser to interact with devices such as telephones (via the PSTN) and with VoIP systems. In May 2012, Doubango Telecom open-sourced the sipml5 SIP client, built with WebRTC and WebSocket, which (among other potential uses) enables video calls between browsers and apps running on iOS or Android. At Google I/O, Tethr and Tropo demonstrated a framework for disaster communications ‘in a briefcase’, using an OpenBTS cell to enable communications between feature phones and computers via WebRTC. Telephone communication without a carrier!
RTCDataChannel
The RTCDataChannel API enables peer-to-peer exchange of arbitrary data, with low latency and high throughput. There are several simple ‘single page’ demos at webrtc.github.io/samples/#datachannel and the WebRTC codelab shows how to build a simple file transfer application.
The API has several features to make the most of RTCPeerConnection and enable powerful and flexible peer-to-peer communication:
- Leveraging of RTCPeerConnection session setup.
- Multiple simultaneous channels, with prioritization.
- Reliable and unreliable delivery semantics.
- Built-in security (DTLS) and congestion control.
- Ability to use with or without audio or video.
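The choice between reliable and unreliable delivery is made through the options passed to createDataChannel(). The sketch below maps two delivery modes to those options; the `dataChannelOptions` helper and the mode names are assumptions for illustration, while `ordered` and `maxRetransmits` are standard data channel options.

```javascript
// Hypothetical helper: pick createDataChannel() options for a delivery mode.
function dataChannelOptions(mode) {
  switch (mode) {
    case "reliable":
      // Ordered delivery with retransmission, similar to TCP.
      return { ordered: true };
    case "unreliable":
      // No ordering guarantee and no retransmission, similar to UDP.
      return { ordered: false, maxRetransmits: 0 };
    default:
      throw new Error(`Unknown delivery mode: ${mode}`);
  }
}

// In a browser, one possible call site would be:
//   const channel = pc.createDataChannel("positions", dataChannelOptions("unreliable"));
```

Unreliable delivery suits data that goes stale quickly, such as position updates in a game, whereas a file transfer would normally want the reliable mode.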
The syntax is deliberately similar to WebSocket, with a send() method and a message event:
Credits: https://www.html5rocks.com/en/tutorials/webrtc/basics/#toc-signaling
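To illustrate the WebSocket-like shape of the API, here is a small sketch of a text chat over a data channel. `setUpChat` is a hypothetical helper, and `channel` is expected to behave like an RTCDataChannel, for example one returned by createDataChannel() or delivered by the connection's datachannel event.

```javascript
// Hypothetical helper: greet the peer once the channel opens and pass
// every incoming message to the `onText` callback.
function setUpChat(channel, onText) {
  channel.onopen = () => {
    channel.send("Hello from the other side!");
  };
  channel.onmessage = (event) => {
    onText(event.data);
  };
  channel.onclose = () => {
    onText("[channel closed]");
  };
}

// In a browser, one possible call site would be:
//   const channel = pc.createDataChannel("chat");
//   setUpChat(channel, (text) => console.log("Received:", text));
```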
RTCDataChannel is available in Chrome, Safari, Firefox, Opera and Samsung Internet.
Security
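There are several ways a real-time communication application or plugin might compromise security: unencrypted media or data might be intercepted in transit, an application might record and distribute audio or video without the user’s knowledge, and malware might be installed alongside an apparently innocuous plugin or application.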
WebRTC has several features to avoid these problems:
- WebRTC implementations use secure protocols such as DTLS and SRTP.
- Encryption is mandatory for all WebRTC components, including signalling mechanisms.
- WebRTC is not a plugin: its components run in the browser sandbox and not in a separate process, components do not require separate installation, and are updated whenever the browser is updated.
- Camera and microphone access must be granted explicitly and, when the camera or microphone is running, this is clearly shown by the user interface.
A full discussion of security for streaming media is out of scope for this article. For more information, see the WebRTC Security Architecture proposed by the IETF.
In conclusion
The APIs and standards of WebRTC can democratize and decentralize tools for content creation and communication — for telephony, gaming, video production, music-making, news gathering and XR applications.
In my next post, we’ll discuss the next technology: WebAssembly.
WebRTC: Enabling Collaboration was originally published in AR/VR Journey: Augmented & Virtual Reality Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.