Vision AI Overview

The Tokkio Vision AI microservice is a video inference service that extracts facial bounding boxes and body poses from user video streams. It is built on the GXF and DeepStream frameworks.

  • The microservice is used as follows:

    • User Connection: A new user connects to the Tokkio User Interface (UI).

    • Stream Initialization: The DS SDR sends an HTTP request to the Vision AI microservice, identifying the user’s video stream by its RTSP URL.

    • Real-Time Processing: The Vision AI microservice runs real-time computer vision analysis on each frame of the video stream and publishes the detected metadata to the visionai Redis channel.

    • User Disconnection: When the user closes the UI, the DS SDR sends a “remove stream” HTTP request to the Vision AI microservice.

    • Stream Termination: The Vision AI microservice stops processing the video stream.

The microservice automatically reconnects to the input video stream if no data is received for a configurable duration (the rtspReconnectInterval parameter), so processing resumes after a transient stream loss. Downstream components can consume the resulting face and body pose metadata in real time.
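The reconnection rule can be illustrated with a small sketch. It only mirrors the documented meaning of rtspReconnectInterval (seconds since the last received data, with 0 disabling the timeout). The function name should_reconnect is hypothetical, and the comparison at the exact boundary is an assumption; this is not the microservice's actual implementation.

```python
def should_reconnect(last_data_time: float, now: float, reconnect_interval: int) -> bool:
    """Decide whether a reconnection would be forced (illustrative only).

    last_data_time and now are timestamps in seconds.
    reconnect_interval mirrors rtspReconnectInterval; 0 disables the timeout.
    """
    if reconnect_interval == 0:
        # Timeout disabled: never force a reconnection.
        return False
    # Assumption: a reconnection is forced once the silence reaches the interval.
    return (now - last_data_time) >= reconnect_interval


# 5 seconds of silence with a 10 second interval: keep waiting.
print(should_reconnect(100.0, 105.0, 10))
# 11 seconds of silence with a 10 second interval: reconnect.
print(should_reconnect(100.0, 111.0, 10))
```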

REST interface

The REST API of the vision microservice is used to add and remove live video streams. In the Tokkio workflow, these endpoints are called by the SDR. Example requests are provided below for reference:

  1. Send a video stream to the vision microservice using RTSP

import requests
from datetime import datetime, timezone

# Get current date and time in UTC
current_datetime_utc = datetime.now(timezone.utc)

# Format the datetime object to the desired string format
formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

# IP is a placeholder for the address of the Vision AI microservice
url = 'http://IP:8082/AddStream/stream'
camera_id = '123'

configData = {
    "alert_type": "camera_status_change",
    "created_at": formatted_datetime,
    "event": {
        "camera_id": camera_id,
        "camera_name": "webcam_" + camera_id,
        # Replace "rtsp_url" with the RTSP URL of the video stream
        "camera_url": "rtsp_url",
        "change": "camera_streaming"
    },
    "source": "vst"
}

response = requests.post(json=configData, url=url, timeout=1)

  2. Send a local video to the vision microservice

import requests
from datetime import datetime, timezone

# Get current date and time in UTC
current_datetime_utc = datetime.now(timezone.utc)

# Format the datetime object to the desired string format
formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

# IP is a placeholder for the address of the Vision AI microservice
url = 'http://IP:8082/AddStream/stream'
camera_id = '123'

configData = {
    "alert_type": "camera_status_change",
    "created_at": formatted_datetime,
    "event": {
        "camera_id": camera_id,
        "camera_name": "webcam_" + camera_id,
        # A file:// URL points to a video file on the microservice's file system
        "camera_url": "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4",
        "change": "camera_streaming"
    },
    "source": "vst"
}

response = requests.post(json=configData, url=url, timeout=1)

  3. Remove a stream

import requests
from datetime import datetime, timezone

# Get current date and time in UTC
current_datetime_utc = datetime.now(timezone.utc)

# Format the datetime object to the desired string format
formatted_datetime = current_datetime_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

# IP is a placeholder for the address of the Vision AI microservice
url = 'http://IP:8082/RemoveStream/stream'
camera_id = '123'

configData = {
    "alert_type": "camera_status_change",
    "created_at": formatted_datetime,
    "event": {
        "camera_id": camera_id,
        "camera_name": "webcam_" + camera_id,
        "camera_url": "rtsp_url",
        "change": "camera_streaming"
    },
    "source": "vst"
}

response = requests.post(json=configData, url=url, timeout=1)
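The three requests above share the same JSON body and differ only in the endpoint and the camera_url. As a minimal sketch, the shared parts could be factored into two helpers. The names build_stream_event and stream_endpoint are hypothetical and not part of the microservice; the port 8082 and the payload fields are copied from the examples above.

```python
from datetime import datetime, timezone
from typing import Optional


def build_stream_event(camera_id: str, camera_url: str,
                       now: Optional[datetime] = None) -> dict:
    """Build the JSON body used by the AddStream and RemoveStream examples."""
    # Use the current UTC time unless a timestamp is supplied (useful in tests).
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "alert_type": "camera_status_change",
        "created_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event": {
            "camera_id": camera_id,
            "camera_name": "webcam_" + camera_id,
            "camera_url": camera_url,
            "change": "camera_streaming",
        },
        "source": "vst",
    }


def stream_endpoint(host: str, action: str, port: int = 8082) -> str:
    """Return the endpoint URL; action is 'AddStream' or 'RemoveStream'."""
    return f"http://{host}:{port}/{action}/stream"


body = build_stream_event("123", "rtsp_url")
print(stream_endpoint("IP", "AddStream"))
print(body["event"]["camera_name"])
```

With these helpers, the first request above could be written as requests.post(json=build_stream_event('123', 'rtsp_url'), url=stream_endpoint('IP', 'AddStream'), timeout=1).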

Output Schema

The vision microservice publishes Redis events on the visionai channel using the following schema. The sensorId is identical to the camera_id provided in the REST call.

{
    "version": "4.0",
    "id": "frame_id",
    "@timestamp": "YYYY-MM-DDTHH:MM:SS.sssZ",
    "sensorId": "123",
    "objects": [
        "0|xmin|ymin|xmax|ymax|Face|
        #|pose2D|18|
        nose,x,y,conf|
        neck,-1,-1,-1|
        right-shoulder,x,y,conf|
        right-elbow,x,y,conf|
        right-wrist,x,y,conf|
        left-shoulder,x,y,conf|
        left-elbow,x,y,conf|
        left-wrist,x,y,conf|
        right-hip,x,y,conf|
        right-knee,x,y,conf|
        right-ankle,x,y,conf|
        left-hip,x,y,conf|
        left-knee,x,y,conf|
        left-ankle,x,y,conf|
        right-eye,x,y,conf|
        left-eye,x,y,conf|
        right-ear,x,y,conf|
        left-ear,x,y,conf|
        "
    ]
}
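Each entry in objects is a single pipe-delimited string, so a consumer has to split it before using the bounding box or keypoints. The following is a minimal parsing sketch. It assumes the line breaks in the schema above are for readability only, that the field after pose2D is the keypoint count, and that the event arrives as a JSON string. The names parse_object and parse_event are hypothetical; the neck entry of -1,-1,-1 is kept as-is and presumably marks a keypoint without a detection.

```python
import json
from typing import Dict, List, NamedTuple, Tuple


class Keypoint(NamedTuple):
    x: float
    y: float
    conf: float


class DetectedObject(NamedTuple):
    object_id: str
    bbox: Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax
    label: str
    keypoints: Dict[str, Keypoint]


def parse_object(raw: str) -> DetectedObject:
    """Parse one pipe-delimited entry of the "objects" array."""
    # Split on "|", trim whitespace, and drop empty fields (e.g. the trailing one).
    fields = [f.strip() for f in raw.split("|")]
    fields = [f for f in fields if f]
    xmin, ymin, xmax, ymax = (float(v) for v in fields[1:5])
    keypoints: Dict[str, Keypoint] = {}
    if "pose2D" in fields:
        start = fields.index("pose2D")
        count = int(fields[start + 1])  # assumed to be the number of keypoints
        for entry in fields[start + 2:start + 2 + count]:
            name, x, y, conf = entry.split(",")
            keypoints[name] = Keypoint(float(x), float(y), float(conf))
    return DetectedObject(fields[0], (xmin, ymin, xmax, ymax), fields[5], keypoints)


def parse_event(message: str) -> Tuple[str, List[DetectedObject]]:
    """Return the sensorId and parsed objects of one JSON event."""
    event = json.loads(message)
    return event["sensorId"], [parse_object(o) for o in event["objects"]]


# Shortened sample with made-up coordinates and two keypoints instead of 18.
sample = "0|10|20|110|220|Face|#|pose2D|2|nose,50,60,0.9|neck,-1,-1,-1|"
obj = parse_object(sample)
print(obj.label, obj.bbox, obj.keypoints["nose"])
```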

About the model

The video inference service uses the MoveNet model, which is distributed under the Apache 2.0 license. The model is converted to NVIDIA TensorRT to reduce latency and increase throughput, which makes it suitable for real-time processing.

UCF microservice

Sample Params:

ds-visionai:
  checkInterval: '1'
  jitterbufferLatency: 2000
  peerPadIdSameAsSourceId: 'true'
  redisCfg:
    payloadkey: message
    topic: visionai
  rtspReconnectInterval: 10
  streammuxResolution:
    height: 720
    width: 1280
  videoSink: none

Sample Connections:

connections:
  ds-visionai/redis: redis-timeseries/redis
  tokkio-ds-sdr/httpds: ds-visionai/http-api

All Parameters:

All Microservice Parameters

| Parameter | Description |
| --- | --- |
| dsServicePort (integer) | DS service port. Default 8084 |
| remoteAccess (string) | Enable remote access or restrict access to localhost only. Default ‘True’ |
| addStreamApiPath (string) | API path for the service to add a stream. Default AddStream |
| removeStreamApiPath (string) | API path for the service to remove a stream. Default RemoveStream |
| getStatusApiPath (string) | API path for the service to get status. Default Status |
| apiResourceName (string) | REST API resource name for a stream. Default stream |
| recycleSourceId (string) | Recycle and reuse the DS internal source-id upon removing a stream. Default ‘False’ [Mandatory:False, Allowed Values:{ True, False }] |
| maxNumSources (integer) | Maximum number of sources that can be added. Default 6 |
| batchSize (integer) | Batch size for streammux and inference. Default 8 |
| rtspReconnectInterval (integer) | Timeout in seconds to wait since last data was received from an RTSP source before forcing a reconnection. 0=disable timeout. Default 10 |
| rtpProtocol (integer) | RTP protocol to use. 0 for TCP/UDP; 4 for TCP only. Default 0 [Mandatory:False, Allowed Values:{ 0, 4 }] |
| peerPadIdSameAsSourceId (string) | Use the ID of the source stream as the streammux stream index. Default ‘True’ [Mandatory:False, Allowed Values:{ true, false }] |
| jitterbufferLatency (integer) | Jitter buffer size of the pipeline source component, in milliseconds. Default 100 [Mandatory:False] |
| fileLoop (string) | Loop a file input. Default ‘True’ [Mandatory:False, Allowed Values:{ true, false }] |
| msgConvPayloadType (integer) | DS msgbroker payload schema. 0 DS; 1 DS minimal; 256 Reserved; 257 Custom. Default 1 [Mandatory:False, Allowed Values:{ 0, 1, 256, 257 }] |
| redisCfg (object) | Redis broker config |
| payloadkey (string) | Field of redisCfg. Payload key for messages. Default metadata |
| topic (string) | Field of redisCfg. Topic or stream name for Redis messages. Default ‘test’ |
| enableLatency (string) | Whether to enable per-frame latency measurement. Default ‘false’ [Mandatory:False, Allowed Values:{ true, false }] |
| enableCompLatency (string) | Whether to enable per-component latency measurement; only used when enableLatency is ‘True’. Default ‘False’ [Mandatory:False, Allowed Values:{ true, false }] |
| videoSink (string) | Type of video sink. Default ‘none’ [Mandatory:False, Allowed Values:{ rtsp, fake, file, none }] |
| sinkSync (string) | Whether the pipeline sink component syncs on the clock. Default ‘false’ [Mandatory:False, Allowed Values:{ true, false }] |
| streammuxResolution (object) | DS input video resolution config |
| height (integer) | Field of streammuxResolution. Expected video frame height. Default 1080 |
| width (integer) | Field of streammuxResolution. Expected video frame width. Default 1920 |
| checkInterval (string) | Init container check interval in seconds. Default ‘1’ |
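Several parameters accept only a fixed set of values, so overrides can be checked before deployment. The sketch below is a hypothetical helper and is not part of the microservice or the UCF tooling. ALLOWED_VALUES covers only a subset of parameters, with values copied from the descriptions above. Comparing string values case-insensitively is an assumption, made because the listed defaults (‘True’) and allowed values (true) differ in case.

```python
# Allowed values copied from the parameter descriptions (subset only).
ALLOWED_VALUES = {
    "rtpProtocol": {0, 4},
    "msgConvPayloadType": {0, 1, 256, 257},
    "videoSink": {"rtsp", "fake", "file", "none"},
    "recycleSourceId": {"true", "false"},
    "peerPadIdSameAsSourceId": {"true", "false"},
    "fileLoop": {"true", "false"},
    "enableLatency": {"true", "false"},
    "enableCompLatency": {"true", "false"},
    "sinkSync": {"true", "false"},
}


def find_invalid_params(params: dict) -> dict:
    """Return {name: value} for every parameter whose value is not allowed."""
    invalid = {}
    for name, value in params.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is None:
            continue  # no restriction recorded for this parameter
        # Assumption: string values are matched case-insensitively.
        candidate = value.lower() if isinstance(value, str) else value
        if candidate not in allowed:
            invalid[name] = value
    return invalid


# Overrides taken from the sample params: all values are allowed.
print(find_invalid_params({"videoSink": "none", "peerPadIdSameAsSourceId": "true"}))
# A made-up invalid override is reported back.
print(find_invalid_params({"videoSink": "display", "rtpProtocol": 2}))
```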