Video Storage Toolkit#
Introduction#
Video Storage Toolkit (VST), also referred to as VMS (Video Management System), manages audio and video streams and provides on-demand access to recorded streams from storage. It accepts a WebRTC stream from the front-end UI application and outputs RTSP streams for further processing. It is also responsible for streaming the avatar animation stream out to the front-end UI application. VST manages stream quality to suit the currently available bandwidth, delivers low-bandwidth events to the front end over a WebRTC data channel, and provides an API to download video clips (with audio) of a user session for offline analysis.

Input:

- Webcam stream (audio and/or video) from the browser client
- Avatar UDP stream provided by OV-renderers

Output:

- Avatar WebRTC output stream rendered on the browser client
- RTSP streams to the Chat Controller (audio) and DS (video)
- Redis notifications that inform downstream services about a new webcam stream/session
Features#
- Media streaming using the hardware-accelerated WebRTC protocol for live and recorded videos.
- RTSP streaming in pass-through and multicast modes.
- RTP-UDP streaming.
- Video storage and recording with an aging policy.
- Support for both H.264 and H.265 video formats.
- Manual device management by IP address and/or RTSP URL.
- REST APIs for writing client applications that control and configure VST.
- Redis message bus support for publishing device add/remove events.
- Prometheus/Grafana integration for publishing VST statistics.
- Cloud native; deploys as a container.
VST Architecture#

UI interaction#
WebRTC & WebSocket workflow

VST Web Streamer Usage Guide#
The VST Web Streaming library encapsulates the complexities of WebRTC, WebRTC signaling, and the VST APIs, providing easy-to-use, single-shot APIs for developers to perform WebRTC streaming.
APIs#
Updating lib configuration#
```typescript
streamManager.updateConfig({
  inboundStreamVideoElementId: string,
  outboundStreamVideoElementId?: string,
  connectionId?: string,
  queryParams?: string,
  enableWebsocketPing?: boolean,
  websocketPingInterval?: number,
  vstWebsocketEndpoint: string,
  enableLogs?: boolean,
  enableMicrophone?: boolean,
  enableCamera?: boolean,
  websocketTimeoutMS?: number,
  streamType: StreamType,
  enableDummyUDPCall: boolean,
  sendCustomWebsocketMessage?: (msg: string) => boolean,
  firstFrameReceivedCallback?: () => void,
  errorCallback?: () => void,
  successCallback?: () => void,
  closeCallback?: () => void,
});
```
| Key Name | Description | Optional/Mandatory | Default Value |
|---|---|---|---|
| inboundStreamVideoElementId | Video element ID where the inbound stream will be displayed | Mandatory | undefined |
| outboundStreamVideoElementId | Video element ID where the webcam stream will be displayed | Optional | undefined |
| connectionId | The connection ID for the WebSocket connection | Optional | random UUID |
| queryParams | Query parameters to pass with the WebSocket connection | Optional | empty string |
| enableWebsocketPing | Enable WebSocket ping so that the connection is not dropped due to inactivity | Optional | true |
| websocketPingInterval | Change the WebSocket ping message interval | Optional | 2000 |
| vstWebsocketEndpoint | VST WebSocket endpoint. This typically ends with /vms/ws | Mandatory | undefined |
| enableLogs | Enable the library logs for debugging | Optional | true |
| enableMicrophone | Enable the microphone for the outbound stream | Optional | true |
| enableCamera | Enable the camera for the outbound stream | Optional | true |
| websocketTimeoutMS | WebSocket timeout in milliseconds | Mandatory | 5000 |
| streamType | Supported VST stream types are 'live', 'replay' and 'streambridge' | Mandatory | streambridge |
| enableDummyUDPCall | Debug purposes only. If enabled, a dummy UDP stream is started by VST for the inbound stream | Mandatory | false |
| sendCustomWebsocketMessage | Send any custom message to VST | Optional | () => {} |
| firstFrameReceivedCallback | Triggered when the first frame is received by the video element | Optional | () => {} |
| errorCallback | Triggered if any error occurs during streaming or during connection establishment | Optional | () => {} |
| successCallback | Triggered when streaming succeeds | Optional | () => {} |
| closeCallback | Triggered when the connection is closed inside the lib and cleanup is done | Optional | () => {} |
Starting the stream with stream configuration#
```typescript
streamManager.startStreaming({
  streamId?: string,
  startTime?: string,
  endTime?: string,
  options: {
    rtptransport: string,
    timeout: number,
    quality: string,
    overlay?: {
      objectId?: number[],
      color?: string,
      thickness?: number,
      debug?: boolean,
      needBbox?: boolean,
      needTripwire?: boolean,
      needRoi?: boolean,
    }
  }
});
```
| Key Name | Description | Optional/Mandatory | Default Value |
|---|---|---|---|
| streamId | The stream ID of the sensor stream to be played in the video player | Optional | undefined |
| startTime | The start time in UTC format. Required for replay streams | Optional | undefined |
| endTime | The end time in UTC format. Required for replay streams | Optional | undefined |
| options | The stream options, including features such as overlays, bounding boxes, and stream quality | Optional | undefined |
| rtptransport | WebRTC-specific setting. Keep the default value | Mandatory | udp |
| timeout | WebRTC-specific setting. Keep the default value | Mandatory | 60 |
| quality | Controls the streaming quality. The allowed values are 'auto', 'low', 'medium', 'high' and 'pass-through' | Mandatory | auto |
| overlay | Overlay-related options such as overlay color, thickness, etc. | Optional | undefined |
| objectId | Filter for overlay object IDs. An array of object IDs is allowed as a filter | Optional | undefined |
| color | The color of the overlay | Optional | undefined |
| thickness | The thickness of the overlay/bounding boxes | Optional | undefined |
| debug | Enable the debug overlay | Optional | undefined |
| needBbox | Enable bounding boxes | Optional | undefined |
| needTripwire | Enable the tripwire overlay | Optional | undefined |
| needRoi | Enable the ROI overlay | Optional | undefined |
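To make the options above concrete, here is a hypothetical stream configuration for a replay stream with a bounding-box overlay. The stream ID, time range, and overlay values are placeholders, not values from a real deployment.

```javascript
// Hypothetical replay-stream configuration; streamId, the UTC time range,
// and the overlay values are placeholders for illustration only.
const replayStreamConfig = {
  streamId: "camera-01",              // sensor stream to replay
  startTime: "2024-01-01T10:00:00Z",  // required for replay streams (UTC)
  endTime: "2024-01-01T10:05:00Z",    // required for replay streams (UTC)
  options: {
    rtptransport: "udp",              // keep the default value
    timeout: 60,                      // keep the default value
    quality: "auto",                  // 'auto', 'low', 'medium', 'high' or 'pass-through'
    overlay: {
      needBbox: true,                 // draw bounding boxes
      color: "#00FF00",
      thickness: 2,
    },
  },
};
// Pass the object to startStreaming() on a configured stream manager.
```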
Stopping the stream#
```typescript
streamManager.stopStreaming();
```
Getting the peer connection objects#
```typescript
const inboundObject = streamManager.getInboundPeerConnectionObject();
const outboundObject = streamManager.getOutboundPeerConnectionObject();
```
Getting Peer IDs#
```typescript
const inboundPeerId = streamManager.getInboundStreamPeerId();
const outboundPeerId = streamManager.getOutboundStreamPeerId();
```
Send custom WebSocket message#
```typescript
const message = "Hello";
const isSuccess = streamManager.sendCustomWebsocketMessage(message);
if (isSuccess) console.log("Success");
```
Get Stream Configuration#
```typescript
const streamConfiguration = streamManager.getStreamConfig();
```
Get Lib configuration#
```typescript
const libConfiguration = streamManager.getConfig();
```
Supported use-cases#
Tokkio Streaming#
Inbound Stream
The library supports inbound video streaming for the Tokkio use case. This is typically used to stream digital human avatars.
Outbound Stream
The library supports outbound video streaming: the webcam, the microphone, or both can be captured and sent to VST via the outbound stream. In the Tokkio use case, the outbound stream works in conjunction with the inbound stream.
Examples#
The following sections show minimal working examples.
Tokkio Streaming#
Inbound Stream Only
```javascript
const streamManager = new StreamManager();
streamManager.updateConfig({
  inboundStreamVideoElementId: 'unique-video-element-id',
  connectionId: 'unique-uuid',
  enableWebsocketPing: true,
  vstWebsocketEndpoint: 'ws://10.41.25.11:30000/vms/ws',
  enableLogs: false,
  errorCallback: () => { console.log('Error Callback'); },
  successCallback: () => { console.log('Success Callback'); },
});
streamManager.startStreaming();
```
Outbound & Inbound Stream
```javascript
const streamManager = new StreamManager();
streamManager.updateConfig({
  inboundStreamVideoElementId: 'unique-video-element-id',
  connectionId: 'unique-uuid',
  enableWebsocketPing: true,
  vstWebsocketEndpoint: 'ws://10.41.25.11:30000/vms/ws',
  enableLogs: false,
  enableMicrophone: true,
  enableCamera: true,
  errorCallback: () => { console.log('Error Callback'); },
  successCallback: () => { console.log('Success Callback'); },
});
streamManager.startStreaming();
```
VST Configuration#
The following VST configuration options can be modified in the tokkio-app-params.yaml file. At least one valid TURN server (coturn/Twilio) or a valid reverse-proxy server must be provided for webcam and avatar streaming to work over WebRTC.
| Option | Description | Default Value |
|---|---|---|
| stunurl_list | List of STUN server URLs, used to discover the public IP and the type of NAT | ["stun.l.google.com:19302", "stun1.l.google.com:19302"] |
| static_turnurl_list | List of TURN servers with the long-term credential mechanism. TURN provides a relay mechanism for communication when direct peer-to-peer communication is not possible due to NAT traversal issues | ["admin:admin@10.0.0.1:3478", "admin:admin@10.0.0.1:3478"] |
| use_coturn_auth_secret | Enable the authentication-secret mechanism for coturn | false |
| coturn_turnurl_list_with_secret | List of coturn servers with the short-term credential mechanism. To use this config, use_coturn_auth_secret should be set to true | ["10.0.0.1:3478":<secret_key>, "10.0.0.1:3478":<secret_key>] |
| use_twilio_stun_turn | Enable use of the Twilio STUN and TURN servers | false |
| twilio_account_sid | Twilio account username. use_twilio_stun_turn should be set to true to enable use of the Twilio server | "" |
| twilio_auth_token | Authentication token of the Twilio account | "" |
| use_reverse_proxy | Use a reverse proxy instead of a TURN server (coturn/Twilio). The reverse proxy is a public-facing service that directly receives and handles client traffic, performing the appropriate routing so that the traffic arrives at the Tokkio cluster VPC | false |
| reverse_proxy_server_address | If use_reverse_proxy is set to true, set the reverse-proxy server address and port. The environment variable REVERSE_PROXY_SERVER_ADDRESS can also be used to set the IP address | 10.0.0.1:100 |
| max_webrtc_out_connections | Maximum count of WebRTC out connections, i.e. avatar streams | 8 |
| max_webrtc_in_connections | Maximum count of WebRTC in connections, i.e. webcam streams | 3 |
| grpc_server_port | gRPC server port. The gRPC server is used to negotiate ports with the ov-renderer to receive the avatar UDP stream | 50051 |
| webrtc_in_audio_sender_max_bitrate | Maximum bitrate in kbps to be used by the web UI for the webcam stream | 128000 |
| webrtc_in_video_degradation_preference | Degradation preference used by the WebRTC sender for the webcam stream. Controls which aspect of stream quality (resolution or framerate) is sacrificed when network conditions degrade | "framerate" |
| total_video_storage_size_MB | Maximum video recording size used to record the webcam stream | 10000 |
| always_recording | Turn always-on recording on or off | false |
| gpu_indices | GPU indices to select a particular GPU device inside the VST container when multiple GPU devices are visible inside the container | [] |
| webrtc_port_range | WebRTC min and max port range. This should be in sync with the nodePort range in the VST microservice Helm chart. VST uses nodePorts for WebRTC media traffic due to the separate reverse-proxy instance | min 30001, max 30030 |
| use_software_path | Enable or disable the software path when a GPU is not available | false |
| enable_websocket_pingpong | Enable periodic WebSocket ping/pong. This avoids WebSocket connection breaks due to proxy/LB or firewall policies | false |
| websocket_keep_alive_ms | WebSocket periodic ping/pong interval in milliseconds | 5000 |
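For illustration, a few of these options might look as follows in tokkio-app-params.yaml. The exact key nesting varies by chart version, so treat this fragment as a sketch rather than a literal drop-in; the `vst` top-level key is an assumption.

```yaml
# Hypothetical fragment of tokkio-app-params.yaml; key nesting may differ
# in your chart version. Values mirror the defaults listed above.
vst:
  stunurl_list: ["stun.l.google.com:19302", "stun1.l.google.com:19302"]
  static_turnurl_list: ["admin:admin@10.0.0.1:3478"]
  use_reverse_proxy: false
  max_webrtc_out_connections: 8
  max_webrtc_in_connections: 3
  webrtc_port_range:
    min: 30001
    max: 30030
  enable_websocket_pingpong: false
  websocket_keep_alive_ms: 5000
```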
WebRTC Introduction#
Introduction: WebRTC is an open-source technology that allows direct, real-time communication of audio, video, and data between web browsers and devices without additional software or plugins. It lets users make voice and video calls, share files, or engage in text chats directly through their web browsers, making online communication more seamless and accessible. It also enables real-time streaming of video to web browsers, for example a live sports match or a music concert.
WebRTC Peers: WebRTC peers are like two people trying to have a conversation over the internet. They could be your computer, phone, or any device that can use a web browser. These peers want to talk directly to each other, without needing a middleman to pass messages back and forth. This direct connection allows for faster communication, which is great for things like video calls or live video streaming.
ICE Servers: ICE servers are like helpful guides that assist WebRTC peers in finding each other on the internet. Imagine you’re trying to meet a friend in a big city, but you don’t know their exact location. ICE servers are like information booths that give you directions and help you navigate through the city’s complex network of streets (in this case, the internet) to find your friend.
ICE Candidates: ICE candidates are like different possible routes to reach your friend in the city. Each candidate represents a potential path for connecting two WebRTC peers. Some routes might be direct (like walking straight to your friend), while others might be more roundabout (like taking a bus or subway). The WebRTC system tries these different routes to see which one works best for connecting the peers.
WebRTC Signaling: Signaling in WebRTC is like a mutual friend helping two people meet up. Before two WebRTC peers can start their direct conversation, they need to exchange some basic information, like where they are and how they can be reached. The signaling process handles this initial exchange of information. It’s similar to how your mutual friend might tell you and your other friend where and when to meet but doesn’t join you for the conversation.
Peer Connection: A peer connection is like a direct, private telephone line between two devices on the internet. Imagine you and a friend have a special string telephone that works across any distance. Once set up, you can talk, share pictures, or even play games without anyone else in between. This direct link allows for faster communication and is particularly useful for applications like video calls.
In more technical terms, a peer connection enables direct communication between two devices without routing through a central server. This peer-to-peer approach makes the connection more efficient and reduces latency, which is crucial for real-time applications.
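To make the peer-connection setup concrete, the sketch below builds an RTCPeerConnection ICE configuration from STUN URLs and TURN entries (in the `user:password@host:port` form used by VST's static_turnurl_list) and starts the offer side of the exchange. The `signaling` object is a hypothetical stand-in for whatever transport carries the signaling messages, not part of any VST API.

```javascript
// Turn STUN URLs and "user:password@host:port" TURN entries into the
// iceServers configuration accepted by RTCPeerConnection.
function buildIceConfig(stunUrls, turnEntries = []) {
  const iceServers = stunUrls.map((url) => ({ urls: "stun:" + url }));
  for (const entry of turnEntries) {
    const [creds, hostPort] = entry.split("@");
    const [username, credential] = creds.split(":");
    iceServers.push({ urls: "turn:" + hostPort, username, credential });
  }
  return { iceServers };
}

// Offer side of the exchange; `signaling` is a hypothetical transport
// (e.g. a WebSocket wrapper) with a send(message) method. Browser-only.
async function connectAsOfferer(signaling, iceConfig) {
  const pc = new RTCPeerConnection(iceConfig);
  // Each discovered ICE candidate (a possible route) is relayed to the peer.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      signaling.send({ type: "candidate", candidate: event.candidate });
    }
  };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ type: "offer", sdp: pc.localDescription });
  return pc;
}
```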
WebRTC Connection Workflow: The diagram below depicts the technical WebRTC workflow in Tokkio when a TURN server is used. For simplicity, it shows only the microphone-stream peer connection part of the Tokkio workflow; the avatar-stream peer connection follows a similar workflow. In the context of Tokkio, consider Peer A to be the Tokkio UI and Peer B to be VST.

GetUserMedia API: GetUserMedia is an API built into almost all modern web browsers. It enables UI applications to access the user's webcam and microphone streams. For example, whenever you make a video call from a web browser (Zoom, FaceTime, or Teams), the UI application is accessing your webcam via the GetUserMedia API.
Using WebRTC with GetUserMedia API: WebRTC is a technology for sending real-time data from one place to another, and GetUserMedia is an API for collecting webcam and microphone streams from a web browser such as Google Chrome. Combining the two, we can capture the webcam and microphone streams in one browser using the GetUserMedia API and send them to another browser anywhere in the world via the WebRTC protocol, all while maintaining low latency.
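A minimal sketch of that combination, assuming a browser environment and an already-created peer connection; the helper names are illustrative and not part of any VST API.

```javascript
// Build the MediaStreamConstraints object for getUserMedia from
// camera/microphone toggles (mirroring enableCamera/enableMicrophone above).
function buildConstraints(enableCamera, enableMicrophone) {
  return { video: enableCamera, audio: enableMicrophone };
}

// Capture the local webcam/microphone and attach every track to an
// existing RTCPeerConnection so it is sent to the remote peer.
// Browser-only: navigator.mediaDevices is not available outside a browser.
async function sendLocalMedia(peerConnection, enableCamera, enableMicrophone) {
  const stream = await navigator.mediaDevices.getUserMedia(
    buildConstraints(enableCamera, enableMicrophone),
  );
  for (const track of stream.getTracks()) {
    peerConnection.addTrack(track, stream);
  }
  return stream;
}
```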
