Vision AI Overview
The Tokkio Vision AI microservice is a video inference service that extracts facial bounding boxes and body poses from user video streams. It is implemented using the GXF and DeepStream frameworks.
The microservice is used as follows:
1. User Connection: A new user connects to the Tokkio User Interface (UI).
2. Stream Initialization: The DS SDR sends an HTTP request to the Vision AI microservice, registering the user’s video stream as an RTSP stream.
3. Real-Time Processing: The Vision AI microservice runs computer vision analysis on each frame of the video stream in real time and publishes the detected metadata to the visionai Redis channel.
4. User Disconnection: When the user closes the UI, the DS SDR sends a “remove stream” HTTP request to the Vision AI microservice.
5. Stream Termination: The Vision AI microservice stops processing the video stream.
The microservice automatically reconnects to the input video stream if no data is received for a configurable duration (the rtspReconnectInterval parameter), so processing resumes after a transient stream loss. The detected faces and body poses are available in real time to the rest of the application for driving user interaction.
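
The reconnection rule can be illustrated with a small sketch. It only mirrors the documented semantics of rtspReconnectInterval (seconds since the last data was received, with 0 disabling the timeout). The function name `should_reconnect` is hypothetical, and the exact boundary behavior (`>=` versus `>`) is an assumption, not the microservice's actual implementation.

```python
def should_reconnect(last_data_time: float, now: float, reconnect_interval: int) -> bool:
    """Return True when a reconnection would be forced.

    last_data_time and now are timestamps in seconds from the same clock.
    A reconnect_interval of 0 disables the timeout entirely.
    """
    if reconnect_interval == 0:
        return False
    # Assumption: reaching the interval exactly already triggers a reconnect.
    return (now - last_data_time) >= reconnect_interval
```

With the default interval of 10 seconds, a stream that last delivered data 11 seconds ago would be reconnected, while one that delivered data 5 seconds ago would not.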
REST interface
The Vision AI microservice exposes the following REST API:
Send a video stream to the vision microservice using RTSP:

    import requests
    from datetime import datetime, timezone

    # Get current date and time in UTC
    current_datetime_utc = datetime.now(timezone.utc)
    # Format the datetime object to the desired string format
    formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

    # Replace IP with the address of the Vision AI microservice
    url = 'http://IP:8082/AddStream/stream'
    camera_id = '123'
    configData = {
        "alert_type": "camera_status_change",
        "created_at": formatted_datetime,
        "event": {
            "camera_id": camera_id,
            "camera_name": "webcam_" + camera_id,
            "camera_url": "rtsp_url",  # RTSP URL of the stream to add
            "change": "camera_streaming"
        },
        "source": "vst"
    }
    response = requests.post(url, json=configData, timeout=1)
Send a local video to the vision microservice:

    import requests
    from datetime import datetime, timezone

    # Get current date and time in UTC
    current_datetime_utc = datetime.now(timezone.utc)
    # Format the datetime object to the desired string format
    formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

    # Replace IP with the address of the Vision AI microservice
    url = 'http://IP:8082/AddStream/stream'
    camera_id = '123'
    configData = {
        "alert_type": "camera_status_change",
        "created_at": formatted_datetime,
        "event": {
            "camera_id": camera_id,
            "camera_name": "webcam_" + camera_id,
            # file:// URI of a video file readable by the microservice
            "camera_url": "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4",
            "change": "camera_streaming"
        },
        "source": "vst"
    }
    response = requests.post(url, json=configData, timeout=1)
Remove a stream:

    import requests
    from datetime import datetime, timezone

    # Get current date and time in UTC
    current_datetime_utc = datetime.now(timezone.utc)
    # Format the datetime object to the desired string format
    formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

    # Replace IP with the address of the Vision AI microservice
    url = 'http://IP:8082/RemoveStream/stream'
    camera_id = '123'
    configData = {
        "alert_type": "camera_status_change",
        "created_at": formatted_datetime,
        "event": {
            "camera_id": camera_id,
            "camera_name": "webcam_" + camera_id,
            "camera_url": "rtsp_url",
            "change": "camera_streaming"
        },
        "source": "vst"
    }
    response = requests.post(url, json=configData, timeout=1)
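
The three requests above differ only in the endpoint and the camera_url value, so the shared parts can be factored out. The following is a minimal sketch, not part of the microservice: the helper name `build_stream_request` and its arguments are hypothetical, and the port 8082 and path segments are simply taken from the example URLs above. It builds the URL and JSON body without sending anything.

```python
from datetime import datetime, timezone


def build_stream_request(host, camera_id, camera_url,
                         action="AddStream", port=8082, resource="stream"):
    """Return (url, payload) for an add or remove stream request.

    action is "AddStream" or "RemoveStream", matching the example URLs.
    """
    # Timestamp in UTC, formatted like the examples above
    created_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    url = f"http://{host}:{port}/{action}/{resource}"
    payload = {
        "alert_type": "camera_status_change",
        "created_at": created_at,
        "event": {
            "camera_id": camera_id,
            "camera_name": "webcam_" + camera_id,
            "camera_url": camera_url,
            "change": "camera_streaming",
        },
        "source": "vst",
    }
    return url, payload


# Usage (the request itself is left commented out; it needs the requests
# package and a running microservice):
# url, payload = build_stream_request("IP", "123", "rtsp_url")
# response = requests.post(url, json=payload, timeout=1)
```

Keeping the request construction separate from the HTTP call makes the payload easy to inspect or log before it is sent.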
Output Schema
The Redis event that the Vision AI microservice sends on the visionai channel has the following schema. The sensorId is identical to the camera_id provided in the REST call.
    {
      "version" : "4.0",
      "id" : "frame_id",
      "@timestamp" : "YYYY-MM-DDTHH:MM:SS.sssZ",
      "sensorId" : "123",
      "objects" : [
        "0|xmin|ymin|xmax|ymax|Face|
         #|pose2D|18|
         nose,x,y,conf|
         neck,-1,-1,-1|
         right-shoulder,x,y,conf|
         right-elbow,x,y,conf|
         right-wrist,x,y,conf|
         left-shoulder,x,y,conf|
         left-elbow,x,y,conf|
         left-wrist,x,y,conf|
         right-hip,x,y,conf|
         right-knee,x,y,conf|
         right-ankle,x,y,conf|
         left-hip,x,y,conf|
         left-knee,x,y,conf|
         left-ankle,x,y,conf|
         right-eye,x,y,conf|
         left-eye,x,y,conf|
         right-ear,x,y,conf|
         left-ear,x,y,conf|
        "
      ]
    }
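
Each entry in objects is a pipe-delimited string, so a consumer has to split it before using the bounding box or keypoints. The following is a minimal parsing sketch based only on the schema shown above. The function names `parse_object` and `parse_event` are hypothetical, and it assumes that coordinates are numeric, that whitespace around fields can be ignored, and that the number after pose2D is the keypoint count.

```python
import json


def parse_object(obj: str) -> dict:
    """Split one pipe-delimited object string into id, bbox, label, keypoints."""
    # Strip whitespace and drop empty fields (e.g. after the trailing '|')
    fields = [f.strip() for f in obj.split("|")]
    fields = [f for f in fields if f]

    object_id = fields[0]
    bbox = tuple(float(v) for v in fields[1:5])  # xmin, ymin, xmax, ymax
    label = fields[5]

    keypoints = {}
    if "pose2D" in fields:
        start = fields.index("pose2D")
        count = int(fields[start + 1])  # assumed to be the keypoint count
        for item in fields[start + 2:start + 2 + count]:
            name, x, y, conf = item.split(",")
            keypoints[name] = (float(x), float(y), float(conf))

    return {"id": object_id, "bbox": bbox, "label": label, "keypoints": keypoints}


def parse_event(message: str) -> dict:
    """Parse a full event (JSON text) and decode every object string in it."""
    event = json.loads(message)
    return {
        "sensorId": event["sensorId"],
        "objects": [parse_object(o) for o in event.get("objects", [])],
    }
```

In the schema the neck keypoint is always listed as -1,-1,-1; treating such values as "not available" is an assumption a consumer may want to make explicit when reading the keypoints dictionary.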
About the model
The video inference service uses the MoveNet model, which is distributed under the Apache 2.0 license. The model is converted to NVIDIA TensorRT to reduce latency and increase throughput, which makes it suitable for real-time applications.
UCF microservice
Sample Params:

    ds-visionai:
      checkInterval: '1'
      jitterbufferLatency: 2000
      peerPadIdSameAsSourceId: 'true'
      redisCfg:
        payloadkey: message
        topic: visionai
      rtspReconnectInterval: 10
      streammuxResolution:
        height: 720
        width: 1280
      videoSink: none

Sample Connections:

    connections:
      ds-visionai/redis: redis-timeseries/redis
      tokkio-ds-sdr/httpds: ds-visionai/http-api

All Parameters:

Parameter | Description |
---|---|
dsServicePort: (integer) | DS service port. Default 8084 |
remoteAccess: (string) | Enable remote access or localhost only. Default ‘True’ |
addStreamApiPath: (string) | API path for service to add a stream. Default AddStream |
removeStreamApiPath: (string) | API path for service to remove a stream. Default RemoveStream |
getStatusApiPath: (string) | API path for service to get status. Default Status |
apiResourceName: (string) | REST API resource name for stream. Default stream |
recycleSourceId: (string) | Recycle and reuse DS internal source-id upon removing a stream. Default ‘False’ [Mandatory:False, Allowed Values:{ True, False }] |
maxNumSources: (integer) | Maximum number of sources that can be added. Default 6 |
batchSize: (integer) | Batch size for streammux and inference. Default 8 |
rtspReconnectInterval: (integer) | Timeout in seconds to wait since last data was received from an RTSP source before forcing a reconnection. 0=disable timeout. Default 10 |
rtpProtocol: (integer) | RTP protocol to use. 0 for TCP/UDP; 4 for TCP Only. Default 0 [Mandatory:False, Allowed Values:{ 0, 4 }] |
peerPadIdSameAsSourceId: (string) | Use the ID from the source stream as the streammux stream index. Default ‘True’ [Mandatory:False, Allowed Values:{ true, false }] |
jitterbufferLatency: (integer) | Pipeline source component jitterbuffer size in milliseconds. Default 100 [Mandatory:False] |
fileLoop: (string) | Loop a file input. Default ‘True’ [Mandatory:False, Allowed Values:{ true, false }] |
msgConvPayloadType: (integer) | DS msgbroker payload schema. 0 DS; 1 DS minimal; 256 Reserved; 257 Custom. Default 1 [Mandatory:False, Allowed Values:{ 0, 1, 256, 257 }] |
redisCfg: (object) | Redis broker config |
redisCfg.payloadkey: | Payload key for messages. Default metadata |
redisCfg.topic: | Topic or stream name for Redis messages. Default ‘test’ |
enableLatency: (string) | Whether to enable per-frame latency measurement. Default ‘false’ [Mandatory:False, Allowed Values:{ true, false }] |
enableCompLatency: (string) | Whether to enable per-component latency; only used when enableLatency is ‘True’. Default ‘False’ [Mandatory:False, Allowed Values:{ true, false }] |
videoSink: (string) | Type of video sink. Default ‘none’ [Mandatory:False, Allowed Values:{ rtsp, fake, file, none }] |
sinkSync: (string) | Pipeline sink component sync on the clock. Default ‘false’ [Mandatory:False, Allowed Values:{ true, false }] |
streammuxResolution: (object) | DS input video resolution config |
streammuxResolution.height: | Expected video frame height. Default 1080 |
streammuxResolution.width: | Expected video frame width. Default 1920 |
checkInterval: (string) | Init container check interval in seconds. Default ‘1’ |
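
To see which defaults the sample parameters override, the two can be merged side by side. This is an illustrative sketch only: the `deep_merge` helper is hypothetical, the dictionaries copy a subset of the defaults and the sample values listed in this section, and the nested key names payloadkey, topic, height, and width are taken from the sample parameters. How UCF itself resolves parameters is not described here.

```python
import copy

# Subset of the documented defaults
DEFAULTS = {
    "checkInterval": "1",
    "jitterbufferLatency": 100,
    "maxNumSources": 6,
    "batchSize": 8,
    "redisCfg": {"payloadkey": "metadata", "topic": "test"},
    "rtspReconnectInterval": 10,
    "streammuxResolution": {"height": 1080, "width": 1920},
    "videoSink": "none",
}

# Values from the sample parameters for ds-visionai
SAMPLE = {
    "checkInterval": "1",
    "jitterbufferLatency": 2000,
    "redisCfg": {"payloadkey": "message", "topic": "visionai"},
    "rtspReconnectInterval": 10,
    "streammuxResolution": {"height": 720, "width": 1280},
    "videoSink": "none",
}


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied, recursing into dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


effective = deep_merge(DEFAULTS, SAMPLE)
# Keys whose effective value differs from the documented default
changed = sorted(k for k in effective if effective[k] != DEFAULTS.get(k))
```

Here the sample raises the jitterbuffer latency from 100 ms to 2000 ms, lowers the expected resolution from 1920x1080 to 1280x720, and points the Redis output at the visionai topic with the message payload key.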