Overview#

Introduction#

Video Storage Toolkit (VST), also referred to as VMS (Video Management System), manages audio and video streams and provides on-demand access to offline streams from storage. It accepts WebRTC streams from the front-end UI application and outputs RTSP streams for further processing. VST adjusts stream quality to suit the currently available bandwidth. It sends low-bandwidth events to the front-end UI over the WebRTC data channel. VST also provides an API for downloading video clips, with audio, of the user session for offline analysis.

Input:
1. Webcam audio and/or video streams from the browser client
2. WebRTC messages from UE service and Tokkio UI
Output:
1. Redis notifications that inform downstream services about a new webcam stream or session
2. WebRTC messages from UE service and Tokkio UI

Features#

Media streaming using hardware-accelerated WebRTC protocol for live and recorded videos.
RTSP streaming in pass-through and multicast mode.
RTP-UDP streaming.
Video storage and recording with aging policy.
Supports both H.264 and H.265 video formats.
Supports manual device management by IP address and/or RTSP URL.
Provides REST APIs for writing client applications that control and configure VST.
Supports the Redis message bus for publishing device add/remove events.
Prometheus/Grafana integration for publishing VST statistics.
Cloud native. Deploy as a container.

VST Architecture#

UI Interaction#

WebRTC & WebSocket workflow

WebRTC Introduction#

Introduction: WebRTC is an open-source technology that allows direct, real-time communication of audio, video, and data between web browsers and devices without the need for additional software or plugins. It enables users to make voice and video calls, share files, or engage in text chats directly through their web browsers, making online communication more seamless and accessible. It also supports real-time streaming of video, such as a live sports match or music concert, to web browsers.
WebRTC Peers: WebRTC peers are like two people trying to have a conversation over the internet. They could be your computer, phone, or any device that can use a web browser. These peers want to talk directly to each other, without needing a middleman to pass messages back and forth. This direct connection allows for faster communication, which is great for things like video calls or live video streaming.
ICE Servers: ICE servers are like helpful guides that assist WebRTC peers in finding each other on the internet. Imagine you’re trying to meet a friend in a big city, but you don’t know their exact location. ICE servers are like information booths that give you directions and help you navigate through the city’s complex network of streets (in this case, the internet) to find your friend.
ICE Candidates: ICE candidates are like different possible routes to reach your friend in the city. Each candidate represents a potential path for connecting two WebRTC peers. Some routes might be direct (like walking straight to your friend), while others might be more roundabout (like taking a bus or subway). The WebRTC system tries these different routes to see which one works best for connecting the peers.
WebRTC Signaling: Signaling in WebRTC is like a mutual friend helping two people meet up. Before two WebRTC peers can start their direct conversation, they need to exchange some basic information, like where they are and how they can be reached. The signaling process handles this initial exchange of information. It’s similar to how your mutual friend might tell you and your other friend where and when to meet but doesn’t join you for the conversation.
Peer Connection: A peer connection is like a direct, private telephone line between two devices on the internet. Imagine you and a friend have a special string telephone that works across any distance. Once set up, you can talk, share pictures, or even play games without anyone else in between. This direct link allows for faster communication and is particularly useful for applications like video calls.

In more technical terms, a peer connection enables direct communication between two devices without routing through a central server. This peer-to-peer approach makes the connection more efficient and reduces latency, which is crucial for real-time applications.
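To make the "different possible routes" idea concrete, the sketch below parses ICE candidate lines in the standard SDP candidate format and orders them by their advertised priority. The type names `host` (a local address), `srflx` (an address discovered through a STUN server), `prflx`, and `relay` (an address on a TURN server) come from the ICE specification. The helper names `parseCandidate` and `rankByPriority` and the sample addresses are assumptions made for this illustration and are not part of VST. A real ICE agent tests candidate pairs with connectivity checks rather than only sorting a list, so treat this as a reading aid, not an implementation.

```typescript
// Candidate types defined by the ICE specification.
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  foundation: string;
  component: number; // 1 = RTP, 2 = RTCP
  transport: string; // "udp" or "tcp"
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
}

function isCandidateType(value: string): value is CandidateType {
  return ["host", "srflx", "prflx", "relay"].indexOf(value) >= 0;
}

// Parse one candidate line such as
// "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host".
// Hypothetical helper written for this example.
function parseCandidate(line: string): ParsedCandidate {
  const fields = line.trim().replace(/^a=/, "").replace(/^candidate:/, "").split(/\s+/);
  const [foundation, component, transport, priority, address, port, keyword, candidateType] = fields;
  if (fields.length < 8 || keyword !== "typ" || !isCandidateType(candidateType)) {
    throw new Error("Malformed ICE candidate: " + line);
  }
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type: candidateType,
  };
}

// Order candidates from highest to lowest advertised priority.
function rankByPriority(lines: string[]): ParsedCandidate[] {
  return lines.map((line) => parseCandidate(line)).sort((a, b) => b.priority - a.priority);
}

// Sample lines using documentation-only IP addresses.
const sampleCandidates = [
  "candidate:3 1 udp 41885439 198.51.100.20 50000 typ relay raddr 203.0.113.7 rport 46154",
  "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 54321",
];
const ranked = rankByPriority(sampleCandidates).map((c) => c.type);
// ranked is ["host", "srflx", "relay"]
```

In this sample the direct `host` route carries the highest priority and the `relay` route the lowest, which matches the analogy of preferring a straight walk over a roundabout trip when both work.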

WebRTC Connection Workflow: The diagram below depicts the technical WebRTC workflow in Tokkio when a TURN server is used. For simplicity, the diagram only depicts the microphone stream Peer Connection part of the Tokkio Workflow. A similar workflow is used for the avatar stream Peer Connection. You can consider Peer A as the Tokkio UI and Peer B as the VST in the context of Tokkio.
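The ordering of the signaling exchange can be sketched without a browser. The simulation below models two peers that move through the `stable`, `have-local-offer`, and `have-remote-offer` signaling states defined for `RTCPeerConnection`, and then trade ICE candidates. Everything else is an assumption for illustration: the `SimulatedPeer` class, the `SignalMessage` shapes, the placeholder SDP and candidate strings, and the choice of the UI as the offering side are hypothetical and do not describe the actual Tokkio or VST message format.

```typescript
// Subset of the signaling states defined for RTCPeerConnection.
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";

// Hypothetical message shapes; WebRTC does not standardize the signaling format.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

class SimulatedPeer {
  name: string;
  state: SignalingState = "stable";
  history: SignalingState[] = ["stable"];
  remoteCandidates: string[] = [];

  constructor(name: string) {
    this.name = name;
  }

  private moveTo(next: SignalingState): void {
    this.state = next;
    this.history.push(next);
  }

  // Offering side: create a local offer to hand to the signaling channel.
  createOffer(): SignalMessage {
    if (this.state !== "stable") throw new Error(this.name + ": cannot offer in state " + this.state);
    this.moveTo("have-local-offer");
    return { kind: "offer", sdp: "placeholder-offer-from-" + this.name };
  }

  // Handle a message relayed by the signaling channel; return a reply if one is due.
  receive(message: SignalMessage): SignalMessage | undefined {
    switch (message.kind) {
      case "offer":
        if (this.state !== "stable") throw new Error(this.name + ": unexpected offer");
        this.moveTo("have-remote-offer"); // remote offer applied
        this.moveTo("stable"); // local answer applied
        return { kind: "answer", sdp: "placeholder-answer-from-" + this.name };
      case "answer":
        if (this.state !== "have-local-offer") throw new Error(this.name + ": unexpected answer");
        this.moveTo("stable");
        return undefined;
      case "candidate":
        this.remoteCandidates.push(message.candidate);
        return undefined;
    }
  }
}

// Relay messages in the order a signaling server would deliver them.
function runHandshake(): { ui: SimulatedPeer; vst: SimulatedPeer; log: string[] } {
  const ui = new SimulatedPeer("ui"); // Peer A (assumed to send the offer)
  const vst = new SimulatedPeer("vst"); // Peer B
  const log: string[] = [];

  const offer = ui.createOffer();
  log.push("ui -> vst: offer");
  const answer = vst.receive(offer);
  if (answer === undefined) throw new Error("vst did not answer");
  log.push("vst -> ui: answer");
  ui.receive(answer);

  // Each side sends its candidates; with a TURN server these include relay candidates.
  vst.receive({ kind: "candidate", candidate: "placeholder-relay-candidate-from-ui" });
  log.push("ui -> vst: candidate");
  ui.receive({ kind: "candidate", candidate: "placeholder-relay-candidate-from-vst" });
  log.push("vst -> ui: candidate");
  return { ui, vst, log };
}

const handshake = runHandshake();
```

After the handshake both simulated peers are back in the `stable` state and each holds the other side's candidate, which is the point at which a real connection would begin its connectivity checks and start sending media.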

GetUserMedia API: The GetUserMedia API, exposed to web pages as navigator.mediaDevices.getUserMedia(), is built into almost all modern web browsers. The API enables UI applications to access the user’s webcam and microphone streams once the user grants permission. For example, whenever you make a video call from a web browser (Zoom, FaceTime, or Teams), the UI application accesses your webcam via the GetUserMedia API.
Using WebRTC with GetUserMedia API: WebRTC is a technology to send real-time data from one place to another, and GetUserMedia is an API to collect webcam and microphone streams from a web browser like Google Chrome. What can we do if we combine the WebRTC protocol with the GetUserMedia API? We can collect the webcam and microphone stream from one web browser using the GetUserMedia API and then send it over to another web browser in some other part of the world via the WebRTC protocol. We can achieve all of this while maintaining low latency.
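The combination described above can be sketched in a few lines. In a browser, `navigator.mediaDevices.getUserMedia()` returns a promise for a media stream, and `RTCPeerConnection.addTrack()` attaches each captured track to a connection. The function `publishLocalMedia` and the small `...Like` interfaces below are hypothetical: they describe only the parts of the browser objects that the sketch touches, so the same code can also be exercised with stand-in objects outside a browser. This is a minimal sketch under those assumptions and not the Tokkio UI implementation.

```typescript
// Minimal structural types covering only what this sketch uses (assumed names).
interface TrackLike {
  kind: string; // "audio" or "video"
}
interface StreamLike {
  getTracks(): TrackLike[];
}
interface MediaConstraintsLike {
  audio: boolean;
  video: boolean;
}
interface MediaDevicesLike {
  getUserMedia(constraints: MediaConstraintsLike): Promise<StreamLike>;
}
interface PeerConnectionLike {
  addTrack(track: TrackLike, stream: StreamLike): unknown;
}

// Capture local media and attach every track to the peer connection.
// Resolves with the kinds of the tracks that were attached.
async function publishLocalMedia(
  devices: MediaDevicesLike,
  connection: PeerConnectionLike,
  constraints: MediaConstraintsLike = { audio: true, video: true }
): Promise<string[]> {
  // In a browser this prompts the user for camera and microphone permission.
  const stream = await devices.getUserMedia(constraints);
  const attachedKinds: string[] = [];
  for (const track of stream.getTracks()) {
    connection.addTrack(track, stream);
    attachedKinds.push(track.kind);
  }
  return attachedKinds;
}

// Intended browser usage (not executed here):
//   const connection = new RTCPeerConnection();
//   publishLocalMedia(navigator.mediaDevices, connection);
```

Passing the devices and connection objects in as parameters keeps the capture step separate from connection setup, so the offer/answer exchange and ICE configuration can be handled elsewhere.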