Vision AI Overview#

The Tokkio Vision AI microservice is a robust video inference solution designed to extract facial bounding boxes and body poses from user video streams. It is implemented using the GXF and Deepstream frameworks.

  • The microservice is used as follows:

    • User Connection: A new user connects to the Tokkio User Interface (UI).

    • Stream Initialization: The DS SDR sends an HTTP request to the Vision AI microservice, transmitting the user’s video stream as an RTSP stream.

    • Real-Time Processing: The Vision AI microservice performs real-time computer vision analysis on each frame of the video stream, publishing the detected metadata to the visionai Redis channel.

    • User Disconnection: When the user closes the UI, the DS SDR sends a “remove stream” HTTP request to the Vision AI microservice.

    • Stream Termination: The Vision AI microservice stops processing the video stream.

The microservice automatically reconnects to the input video stream if data loss occurs for a specified duration, ensuring continuous operation. This architecture provides a seamless and efficient way to analyze video streams in real-time, enhancing user interaction through precise facial and body pose detection.
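Downstream components consume the detection metadata from the visionai Redis channel. A minimal consumer sketch, assuming Redis pub/sub semantics, the third-party redis-py client, and a Redis instance on localhost (all names here are illustrative, not part of the microservice):

```python
import json

def parse_event(payload: bytes) -> dict:
    """Decode one JSON event published on the visionai channel."""
    return json.loads(payload)

def consume(channel: str = "visionai") -> None:
    """Subscribe to the channel and print each event's sensor id.

    Requires the redis-py package and a reachable Redis server;
    host and port below are assumptions for a local deployment.
    """
    import redis  # third-party client, assumed installed
    pubsub = redis.Redis(host="localhost", port=6379).pubsub()
    pubsub.subscribe(channel)
    for message in pubsub.listen():
        if message["type"] != "message":
            continue  # skip subscription confirmations
        event = parse_event(message["data"])
        print(event["sensorId"], event.get("@timestamp"))
```

The event fields used here (sensorId, @timestamp) follow the output schema documented later in this page.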

REST interface#

The REST API of the vision microservice supports the following operations:

  1. Send a video stream to the vision microservice using RTSP

import requests
from datetime import datetime, timezone

# Get current date and time in UTC
current_datetime_utc = datetime.now(timezone.utc)

# Format the datetime object to the desired string format
formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

url = 'http://IP:8082/AddStream/stream'
camera_id = '123'

configData = {
    "alert_type": "camera_status_change",
    "created_at": formatted_datetime,
    "event": {
        "camera_id": camera_id,
        "camera_name": "webcam_" + camera_id,
        "camera_url": "rtsp_url",
        "change": "camera_streaming"
    },
    "source": "vst"
}

response = requests.post(url=url, json=configData, timeout=1)
  2. Send a local video to the vision microservice

import requests
from datetime import datetime, timezone

# Get current date and time in UTC
current_datetime_utc = datetime.now(timezone.utc)

# Format the datetime object to the desired string format
formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

url = 'http://IP:8082/AddStream/stream'
camera_id = '123'

configData = {
    "alert_type": "camera_status_change",
    "created_at": formatted_datetime,
    "event": {
        "camera_id": camera_id,
        "camera_name": "webcam_" + camera_id,
        "camera_url": "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4",
        "change": "camera_streaming"
    },
    "source": "vst"
}

response = requests.post(url=url, json=configData, timeout=1)
  3. Remove a stream

import requests
from datetime import datetime, timezone

# Get current date and time in UTC
current_datetime_utc = datetime.now(timezone.utc)

# Format the datetime object to the desired string format
formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

url = 'http://IP:8082/RemoveStream/stream'
camera_id = '123'

configData = {
    "alert_type": "camera_status_change",
    "created_at": formatted_datetime,
    "event": {
        "camera_id": camera_id,
        "camera_name": "webcam_" + camera_id,
        "camera_url": "rtsp_url",
        "change": "camera_streaming"
    },
    "source": "vst"
}

response = requests.post(url=url, json=configData, timeout=1)
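The service also exposes a status endpoint (see the getStatusApiPath parameter, default Status, and apiResourceName, default stream, in the parameter table later in this page). A hedged sketch of how the status URL might be built, assuming it follows the same URL pattern as AddStream/RemoveStream; the response format is not documented here:

```python
def status_url(host: str, port: int = 8082,
               api_path: str = "Status", resource: str = "stream") -> str:
    """Build the status endpoint URL, mirroring the AddStream/RemoveStream
    pattern used in the examples above. The path and resource defaults come
    from getStatusApiPath and apiResourceName; the pattern itself is an
    assumption, not confirmed by the documentation."""
    return f"http://{host}:{port}/{api_path}/{resource}"

# Example usage (requires the requests package; replace IP with the host
# running the microservice):
#   import requests
#   response = requests.get(status_url("IP"), timeout=1)
```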

Output Schema#

The schema of the Redis event sent by the vision microservice on the visionai channel is as follows. The sensorId field is identical to the camera_id provided in the REST call.

{
    "version": "4.0",
    "id": "frame_id",
    "@timestamp": "YYYY-MM-DDTHH:MM:SS.sssZ",
    "sensorId": "123",
    "objects": [
        "0|xmin|ymin|xmax|ymax|Face|
        #|pose2D|18|
        nose,x,y,conf|
        neck,-1,-1,-1|
        right-shoulder,x,y,conf|
        right-elbow,x,y,conf|
        right-wrist,x,y,conf|
        left-shoulder,x,y,conf|
        left-elbow,x,y,conf|
        left-wrist,x,y,conf|
        right-hip,x,y,conf|
        right-knee,x,y,conf|
        right-ankle,x,y,conf|
        left-hip,x,y,conf|
        left-knee,x,y,conf|
        left-ankle,x,y,conf|
        right-eye,x,y,conf|
        left-eye,x,y,conf|
        right-ear,x,y,conf|
        left-ear,x,y,conf|
        "]
}
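Each entry in objects is a single pipe-delimited string combining the face bounding box and the 18 body-pose keypoints. A sketch of one way to split it apart, with the field layout inferred from the schema above (parse_object is an illustrative helper name, not part of the microservice):

```python
def parse_object(obj: str) -> dict:
    """Split one pipe-delimited object entry into bbox, label, and keypoints."""
    fields = [f.strip() for f in obj.split("|")]
    # First six fields: object id, bounding box, and class label.
    obj_id, xmin, ymin, xmax, ymax, label = fields[:6]
    result = {
        "id": obj_id,
        "bbox": tuple(float(v) for v in (xmin, ymin, xmax, ymax)),
        "label": label,
    }
    if "#" in fields:              # "#" separates the bbox from the pose data
        i = fields.index("#")      # fields[i+1] is "pose2D",
        num = int(fields[i + 2])   # fields[i+2] is the keypoint count (18)
        keypoints = {}
        for kp in fields[i + 3 : i + 3 + num]:
            # Each keypoint is "name,x,y,conf"; undetected joints use -1,-1,-1.
            name, x, y, conf = kp.split(",")
            keypoints[name] = (float(x), float(y), float(conf))
        result["pose2D"] = keypoints
    return result
```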

About the model#

The video inference service uses the MoveNet model, which is distributed under the Apache 2.0 license. To enhance performance, the model is converted to an NVIDIA TensorRT engine, optimizing it for low latency and high throughput. This conversion enables efficient processing suitable for real-time applications.

UCF microservice#

Sample Params:
ds-visionai:
  checkInterval: '1'
  jitterbufferLatency: 2000
  peerPadIdSameAsSourceId: 'true'
  redisCfg:
    payloadkey: message
    topic: visionai
  rtspReconnectInterval: 10
  streammuxResolution:
    height: 720
    width: 1280
  videoSink: none

Sample Connections:

connections:
  ds-visionai/redis: redis-timeseries/redis
  tokkio-ds-sdr/httpds: ds-visionai/http-api

All Parameters:

All Microservice Parameters#

dsServicePort (integer)
    DS service port. Default 8084

remoteAccess (string)
    Enable remote access, or restrict to localhost only. Default ‘True’

addStreamApiPath (string)
    API path used to add a stream. Default AddStream

removeStreamApiPath (string)
    API path used to remove a stream. Default RemoveStream

getStatusApiPath (string)
    API path used to get the service status. Default Status

apiResourceName (string)
    REST API resource name for a stream. Default stream

recycleSourceId (string)
    Recycle and reuse the DS internal source-id when a stream is removed. Default ‘False’ [Mandatory: False, Allowed Values: { True, False }]

maxNumSources (integer)
    Maximum number of sources that can be added. Default 6

batchSize (integer)
    Batch size for streammux and inference. Default 8

rtspReconnectInterval (integer)
    Timeout in seconds to wait since data was last received from an RTSP source before forcing a reconnection. 0 disables the timeout. Default 10

rtpProtocol (integer)
    RTP protocol to use: 0 for TCP/UDP, 4 for TCP only. Default 0 [Mandatory: False, Allowed Values: { 0, 4 }]

peerPadIdSameAsSourceId (string)
    Use the id from the source stream as the streammux stream index. Default ‘True’ [Mandatory: False, Allowed Values: { true, false }]

jitterbufferLatency (integer)
    Jitter buffer size of the pipeline source component, in milliseconds. Default 100 [Mandatory: False]

fileLoop (string)
    Loop a file input. Default ‘True’ [Mandatory: False, Allowed Values: { true, false }]

msgConvPayloadType (integer)
    DS msgbroker payload schema: 0 DS; 1 DS minimal; 256 Reserved; 257 Custom. Default 1 [Mandatory: False, Allowed Values: { 0, 1, 256, 257 }]

redisCfg (object)
    Redis broker config

    payloadkey (string)
        Payload key for messages. Default metadata

    topic (string)
        Topic or stream name for Redis messages. Default ‘test’

enableLatency (string)
    Whether to enable per-frame latency measurement. Default ‘false’ [Mandatory: False, Allowed Values: { true, false }]

enableCompLatency (string)
    Whether to enable per-component latency; only used when enableLatency is ‘True’. Default ‘False’ [Mandatory: False, Allowed Values: { true, false }]

videoSink (string)
    Type of video sink. Default ‘none’ [Mandatory: False, Allowed Values: { rtsp, fake, file, none }]

sinkSync (string)
    Whether the pipeline sink component syncs on the clock. Default ‘false’ [Mandatory: False, Allowed Values: { true, false }]

streammuxResolution (object)
    DS input video resolution config

    height (integer)
        Expected video frame height. Default 1080

    width (integer)
        Expected video frame width. Default 1920

checkInterval (string)
    Init container check interval in seconds. Default ‘1’