Video Storage Toolkit#
Introduction#
Video Storage Toolkit (VST), also referred to as VMS (Video Management System), manages audio and video streams and provides on-demand access to recorded streams from storage. It accepts a WebRTC stream from the front-end UI application and outputs RTSP streams for further processing. It is also responsible for streaming the avatar animation stream out to the front-end UI application. VST manages stream quality to suit the currently available bandwidth, delivers low-bandwidth events to the front end over a WebRTC data channel, and provides an API to download video clips (with audio) of a user session for offline analysis.

Input:

- Webcam stream (audio and/or video) from the browser client
- Avatar UDP stream provided by OV-renderers

Output:

- Avatar WebRTC output stream rendered on the browser client
- RTSP streams to the Chat Controller (audio) and DS (video)
- Redis notifications that inform downstream services about a new webcam stream/session
Features#
- Media streaming using the hardware-accelerated WebRTC protocol for live and recorded videos.
- RTSP streaming in pass-through and multicast modes.
- RTP-UDP streaming.
- Video storage and recording with an aging policy.
- Support for both H.264 and H.265 video formats.
- Manual device management by IP address and/or RTSP URL.
- REST APIs for writing client applications that control and configure VST.
- Redis message bus support for publishing device add/remove events.
- Prometheus/Grafana integration for publishing VST statistics.
- Cloud native; deploys as a container.
VST Architecture#

UI interaction#
WebRTC & WebSocket workflow

VST Web Streamer Usage Guide#
The VST Web Streaming library encapsulates the complexities of WebRTC, WebRTC signaling, and the VST APIs, providing easy-to-use, single-shot APIs for developers to perform WebRTC streaming.
APIs#
Updating lib configuration#
```typescript
streamManager.updateConfig({
  inboundStreamVideoElementId: string,
  outboundStreamVideoElementId?: string,
  connectionId?: string,
  queryParams?: string,
  enableWebsocketPing?: boolean,
  websocketPingInterval?: number,
  vstWebsocketEndpoint: string,
  enableLogs?: boolean,
  enableMicrophone?: boolean,
  enableCamera?: boolean,
  websocketTimeoutMS?: number,
  streamType: StreamType,
  enableDummyUDPCall: boolean,
  sendCustomWebsocketMessage?: (msg: string) => boolean,
  firstFrameReceivedCallback?: () => void,
  errorCallback?: () => void,
  successCallback?: () => void,
  closeCallback?: () => void,
});
```
| Key Name | Description | Optional/Mandatory | Default Value |
|---|---|---|---|
| inboundStreamVideoElementId | Video element ID where the inbound stream will be displayed | Mandatory | undefined |
| outboundStreamVideoElementId | Video element ID where the webcam stream will be displayed | Optional | undefined |
| connectionId | The connection ID for the WebSocket connection | Optional | random UUID |
| queryParams | Query parameters to pass with the WebSocket connection | Optional | empty string |
| enableWebsocketPing | Enable WebSocket ping so that the connection is not dropped due to inactivity | Optional | true |
| websocketPingInterval | Change the WebSocket ping message interval | Optional | 2000 |
| vstWebsocketEndpoint | VST WebSocket endpoint. This typically ends with /vms/ws | Mandatory | undefined |
| enableLogs | Enable the library logs for debugging | Optional | true |
| enableMicrophone | Enable the microphone for the outbound stream | Optional | true |
| enableCamera | Enable the camera for the outbound stream | Optional | true |
| websocketTimeoutMS | WebSocket timeout in milliseconds | Mandatory | 5000 |
| streamType | Supported VST stream types are 'live', 'replay' and 'streambridge' | Mandatory | streambridge |
| enableDummyUDPCall | Debug purposes only. If enabled, a dummy UDP stream is started by VST for the inbound stream | Mandatory | false |
| sendCustomWebsocketMessage | Send any custom message to VST | Optional | () => {} |
| firstFrameReceivedCallback | Triggered when the first frame is received by the video element | Optional | () => {} |
| errorCallback | Triggered if any error occurs during streaming or during connection establishment | Optional | () => {} |
| successCallback | Triggered when streaming succeeds | Optional | () => {} |
| closeCallback | Triggered when the connection is closed inside the lib and cleanup is done | Optional | () => {} |
Starting the stream with stream configuration#
```typescript
streamManager.startStreaming({
  streamId?: string,
  startTime?: string,
  endTime?: string,
  options: {
    rtptransport: string,
    timeout: number,
    quality: string,
    overlay?: {
      objectId?: number[],
      color?: string,
      thickness?: number,
      debug?: boolean,
      needBbox?: boolean,
      needTripwire?: boolean,
      needRoi?: boolean,
    }
  }
});
```
| Key Name | Description | Optional/Mandatory | Default Value |
|---|---|---|---|
| streamId | The stream ID of the sensor stream to be played in the video player | Optional | undefined |
| startTime | The start time in UTC format. Required for replay streams | Optional | undefined |
| endTime | The end time in UTC format. Required for replay streams | Optional | undefined |
| options | The stream options, including features such as overlays, bounding boxes, and stream quality | Optional | undefined |
| rtptransport | WebRTC-specific setting. Keep the default value | Mandatory | udp |
| timeout | WebRTC-specific setting. Keep the default value | Mandatory | 60 |
| quality | Controls the streaming quality. The allowed values are 'auto', 'low', 'medium', 'high' and 'pass-through' | Mandatory | auto |
| overlay | Overlay-related options such as overlay color, thickness, etc. | Optional | undefined |
| objectId | Filter for overlay object IDs. An array of object IDs is allowed as a filter | Optional | undefined |
| color | The color of the overlay | Optional | undefined |
| thickness | The thickness of the overlay/bounding boxes | Optional | undefined |
| debug | Enable the debug overlay | Optional | undefined |
| needBbox | Enable bounding boxes | Optional | undefined |
| needTripwire | Enable the tripwire overlay | Optional | undefined |
| needRoi | Enable the ROI overlay | Optional | undefined |
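To make the options above concrete, here is a hypothetical stream configuration for a replay stream with a bounding-box overlay. The stream ID, time range, and overlay values are placeholders, not values from a real deployment.

```javascript
// Hypothetical replay-stream configuration; streamId, the UTC time range,
// and the overlay values are placeholders for illustration only.
const replayStreamConfig = {
  streamId: "camera-01",              // sensor stream to replay
  startTime: "2024-01-01T10:00:00Z",  // required for replay streams (UTC)
  endTime: "2024-01-01T10:05:00Z",    // required for replay streams (UTC)
  options: {
    rtptransport: "udp",              // keep the default value
    timeout: 60,                      // keep the default value
    quality: "auto",                  // 'auto', 'low', 'medium', 'high' or 'pass-through'
    overlay: {
      needBbox: true,                 // draw bounding boxes
      color: "#00FF00",
      thickness: 2,
    },
  },
};
// Pass the object to startStreaming() on a configured stream manager.
```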
Stopping the stream#
```typescript
streamManager.stopStreaming();
```
Getting the peer connection objects#
```typescript
const inboundObject = streamManager.getInboundPeerConnectionObject();
const outboundObject = streamManager.getOutboundPeerConnectionObject();
```
Getting Peer IDs#
```typescript
const inboundPeerId = streamManager.getInboundStreamPeerId();
const outboundPeerId = streamManager.getOutboundStreamPeerId();
```
Send custom WebSocket message#
```typescript
const message = "Hello";
const isSuccess = streamManager.sendCustomWebsocketMessage(message);
if (isSuccess) console.log("Success");
```
Get Stream Configuration#
```typescript
const streamConfiguration = streamManager.getStreamConfig();
```
Get Lib configuration#
```typescript
const libConfiguration = streamManager.getConfig();
```
Supported use-cases#
Tokkio Streaming#
Inbound Stream
The library supports inbound video streaming for the Tokkio use case. This is typically used to stream digital human avatars.
Outbound Stream
The library supports outbound video streaming: the webcam, the microphone, or both can be captured and sent to VST via the outbound stream. In the Tokkio use case, the outbound stream works in conjunction with the inbound stream.
Examples#
The following sections show minimal working examples.
Tokkio Streaming#
Inbound Stream Only
```javascript
const streamManager = new StreamManager();
streamManager.updateConfig({
  inboundStreamVideoElementId: 'unique-video-element-id',
  connectionId: 'unique-uuid',
  enableWebsocketPing: true,
  vstWebsocketEndpoint: 'ws://10.41.25.11:30000/vms/ws',
  enableLogs: false,
  errorCallback: () => { console.log('Error Callback'); },
  successCallback: () => { console.log('Success Callback'); },
});
streamManager.startStreaming();
```
Outbound & Inbound Stream
```javascript
const streamManager = new StreamManager();
streamManager.updateConfig({
  inboundStreamVideoElementId: 'unique-video-element-id',
  connectionId: 'unique-uuid',
  enableWebsocketPing: true,
  vstWebsocketEndpoint: 'ws://10.41.25.11:30000/vms/ws',
  enableLogs: false,
  enableMicrophone: true,
  enableCamera: true,
  errorCallback: () => { console.log('Error Callback'); },
  successCallback: () => { console.log('Success Callback'); },
});
streamManager.startStreaming();
```
VST Configuration#
The following VST configuration options can be modified in the tokkio-app-params.yaml file. At least one valid TURN server (coturn/Twilio) or a valid reverse-proxy server must be provided for webcam and avatar streaming to work over WebRTC.
| Option | Description | Default Value |
|---|---|---|
| stunurl_list | List of STUN server URLs, used to discover the public IP and the type of NAT | ["stun.l.google.com:19302", "stun1.l.google.com:19302"] |
| static_turnurl_list | List of TURN servers with the long-term credential mechanism. TURN provides a relay mechanism for communication when direct peer-to-peer communication is not possible due to NAT traversal issues | ["admin:admin@10.0.0.1:3478", "admin:admin@10.0.0.1:3478"] |
| use_coturn_auth_secret | Enable the authentication-secret mechanism for coturn | false |
| coturn_turnurl_list_with_secret | List of coturn servers with the short-term credential mechanism. To use this config, use_coturn_auth_secret should be set to true | ["10.0.0.1:3478":<secret_key>, "10.0.0.1:3478":<secret_key>] |
| use_twilio_stun_turn | Enable use of the Twilio STUN and TURN servers | false |
| twilio_account_sid | Twilio account username. use_twilio_stun_turn should be set to true to enable use of the Twilio server | "" |
| twilio_auth_token | Authentication token of the Twilio account | "" |
| use_reverse_proxy | Use a reverse proxy instead of a TURN server (coturn/Twilio). The reverse proxy is a public-facing service that directly receives and handles client traffic, performing the appropriate routing so that the traffic arrives at the Tokkio cluster VPC | false |
| reverse_proxy_server_address | If use_reverse_proxy is set to true, set the reverse-proxy server address and port. The environment variable REVERSE_PROXY_SERVER_ADDRESS can also be used to set the IP address | 10.0.0.1:100 |
| max_webrtc_out_connections | Maximum count of WebRTC out connections, i.e. avatar streams | 8 |
| max_webrtc_in_connections | Maximum count of WebRTC in connections, i.e. webcam streams | 3 |
| grpc_server_port | gRPC server port. The gRPC server is used to negotiate ports with the ov-renderer to receive the avatar UDP stream | 50051 |
| webrtc_in_audio_sender_max_bitrate | Maximum bitrate in kbps to be used by the web UI for the webcam stream | 128000 |
| webrtc_in_video_degradation_preference | Degradation preference used by the WebRTC sender for the webcam stream. Controls which aspect of stream quality (resolution or framerate) is sacrificed when network conditions degrade | "framerate" |
| total_video_storage_size_MB | Maximum video recording size used to record the webcam stream | 10000 |
| always_recording | Turn always-on recording on or off | false |
| gpu_indices | GPU indices to select a particular GPU device inside the VST container when multiple GPU devices are visible inside the container | [] |
| webrtc_port_range | WebRTC min and max port range. This should be in sync with the nodePort range in the VST microservice Helm chart. VST uses nodePorts for WebRTC media traffic due to the separate reverse-proxy instance | min 30001, max 30030 |
| use_software_path | Enable or disable the software path when a GPU is not available | false |
| enable_websocket_pingpong | Enable periodic WebSocket ping/pong. This avoids WebSocket connection breaks due to proxy/LB or firewall policies | false |
| websocket_keep_alive_ms | WebSocket periodic ping/pong interval in milliseconds | 5000 |
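For illustration, a few of these options might look as follows in tokkio-app-params.yaml. The exact key nesting varies by chart version, so treat this fragment as a sketch rather than a literal drop-in; the `vst` top-level key is an assumption.

```yaml
# Hypothetical fragment of tokkio-app-params.yaml; key nesting may differ
# in your chart version. Values mirror the defaults listed above.
vst:
  stunurl_list: ["stun.l.google.com:19302", "stun1.l.google.com:19302"]
  static_turnurl_list: ["admin:admin@10.0.0.1:3478"]
  use_reverse_proxy: false
  max_webrtc_out_connections: 8
  max_webrtc_in_connections: 3
  webrtc_port_range:
    min: 30001
    max: 30030
  enable_websocket_pingpong: false
  websocket_keep_alive_ms: 5000
```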
WebRTC Introduction#
Introduction: WebRTC is an open-source technology that allows direct, real-time communication of audio, video, and data between web browsers and devices without additional software or plugins. It lets users make voice and video calls, share files, or engage in text chats directly through their web browsers, making online communication more seamless and accessible. It also enables real-time streaming of video to web browsers, for example a live sports match or a music concert.
WebRTC Peers: WebRTC peers are like two people trying to have a conversation over the internet. They could be your computer, phone, or any device that can use a web browser. These peers want to talk directly to each other, without needing a middleman to pass messages back and forth. This direct connection allows for faster communication, which is great for things like video calls or live video streaming.
ICE Servers: ICE servers are like helpful guides that assist WebRTC peers in finding each other on the internet. Imagine you’re trying to meet a friend in a big city, but you don’t know their exact location. ICE servers are like information booths that give you directions and help you navigate through the city’s complex network of streets (in this case, the internet) to find your friend.
ICE Candidates: ICE candidates are like different possible routes to reach your friend in the city. Each candidate represents a potential path for connecting two WebRTC peers. Some routes might be direct (like walking straight to your friend), while others might be more roundabout (like taking a bus or subway). The WebRTC system tries these different routes to see which one works best for connecting the peers.
WebRTC Signaling: Signaling in WebRTC is like a mutual friend helping two people meet up. Before two WebRTC peers can start their direct conversation, they need to exchange some basic information, like where they are and how they can be reached. The signaling process handles this initial exchange of information. It’s similar to how your mutual friend might tell you and your other friend where and when to meet but doesn’t join you for the conversation.
Peer Connection: A peer connection is like a direct, private telephone line between two devices on the internet. Imagine you and a friend have a special string telephone that works across any distance. Once set up, you can talk, share pictures, or even play games without anyone else in between. This direct link allows for faster communication and is particularly useful for applications like video calls.
In more technical terms, a peer connection enables direct communication between two devices without routing through a central server. This peer-to-peer approach makes the connection more efficient and reduces latency, which is crucial for real-time applications.
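To make the peer-connection setup concrete, the sketch below builds an RTCPeerConnection ICE configuration from STUN URLs and TURN entries (in the `user:password@host:port` form used by VST's static_turnurl_list) and starts the offer side of the exchange. The `signaling` object is a hypothetical stand-in for whatever transport carries the signaling messages, not part of any VST API.

```javascript
// Turn STUN URLs and "user:password@host:port" TURN entries into the
// iceServers configuration accepted by RTCPeerConnection.
function buildIceConfig(stunUrls, turnEntries = []) {
  const iceServers = stunUrls.map((url) => ({ urls: "stun:" + url }));
  for (const entry of turnEntries) {
    const [creds, hostPort] = entry.split("@");
    const [username, credential] = creds.split(":");
    iceServers.push({ urls: "turn:" + hostPort, username, credential });
  }
  return { iceServers };
}

// Offer side of the exchange; `signaling` is a hypothetical transport
// (e.g. a WebSocket wrapper) with a send(message) method. Browser-only.
async function connectAsOfferer(signaling, iceConfig) {
  const pc = new RTCPeerConnection(iceConfig);
  // Each discovered ICE candidate (a possible route) is relayed to the peer.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      signaling.send({ type: "candidate", candidate: event.candidate });
    }
  };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ type: "offer", sdp: pc.localDescription });
  return pc;
}
```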
WebRTC Connection Workflow: The diagram below depicts the technical WebRTC workflow in Tokkio when a TURN server is used. For simplicity, it shows only the microphone-stream peer connection part of the Tokkio workflow; the avatar-stream peer connection follows a similar workflow. In the context of Tokkio, consider Peer A to be the Tokkio UI and Peer B to be VST.

GetUserMedia API: GetUserMedia is an API built into almost all modern web browsers. It enables UI applications to access the user's webcam and microphone streams. For example, whenever you make a video call from a web browser (Zoom, FaceTime, or Teams), the UI application is accessing your webcam via the GetUserMedia API.
Using WebRTC with GetUserMedia API: WebRTC is a technology for sending real-time data from one place to another, and GetUserMedia is an API for collecting webcam and microphone streams from a web browser such as Google Chrome. Combining the two, we can capture the webcam and microphone streams in one browser using the GetUserMedia API and send them to another browser anywhere in the world via the WebRTC protocol, all while maintaining low latency.
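A minimal sketch of that combination, assuming a browser environment and an already-created peer connection; the helper names are illustrative and not part of any VST API.

```javascript
// Build the MediaStreamConstraints object for getUserMedia from
// camera/microphone toggles (mirroring enableCamera/enableMicrophone above).
function buildConstraints(enableCamera, enableMicrophone) {
  return { video: enableCamera, audio: enableMicrophone };
}

// Capture the local webcam/microphone and attach every track to an
// existing RTCPeerConnection so it is sent to the remote peer.
// Browser-only: navigator.mediaDevices is not available outside a browser.
async function sendLocalMedia(peerConnection, enableCamera, enableMicrophone) {
  const stream = await navigator.mediaDevices.getUserMedia(
    buildConstraints(enableCamera, enableMicrophone),
  );
  for (const track of stream.getTracks()) {
    peerConnection.addTrack(track, stream);
  }
  return stream;
}
```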
