DeepStream Perception#

Overview#

Jetson Platform Services offers a DeepStream 7.1 based perception service that provides optimized, real-time, multi-stream object detection and tracking out of the box. In combination with the SDR and VST microservices, it also supports automatic addition of camera streams. Processing consists of object detection, using the PeopleNet model from NVIDIA by default or optionally YOLOv8s as described below, followed by object tracking using the NvDCF tracker plugin available in DeepStream. The application outputs metadata based on the Metropolis schema, which is sent to the Redis message bus using the msgbroker plugin.

The modified Jetson Platform Services DeepStream 7.1 container has the following changes in comparison to the standard samples container:

Precompiled version of Service Maker deepstream-test5 app
PeopleNet2.6 model and calibration files as well as many precompiled engine files for various Jetson platforms and batch sizes
Precompiled tracker engine files
Precompiled YOLOv8s plugin and engine files

Jetson Platform Services uses the sample deepstream-test5 app, which is built on the new Service Maker framework supported in DeepStream 7.1 for programmatically assembling DeepStream pipelines. Its source code is available as part of the native DeepStream install and as part of the DeepStream Docker container. The options passed into this app are as follows:

-s option specifies a source list config file. If the config uses nvmultiurisrcbin, as the reference configs provided with AI-NVR do, the add/remove API within DeepStream is enabled.
-c option specifies a pipeline config file containing the various components and connections within your program's pipeline.
-l option specifies the label file for your model.

--perf-measurement-interval-sec option specifies how frequently in seconds to print fps info.

# Example for running on Orin AGX
$ /opt/nvidia/deepstream/deepstream-7.1/service-maker/sources/apps/cpp/deepstream_test5_app/build/deepstream-test5-app -s /ds-config-files/pn26/service-maker/source-list-0_agx.yaml,/ds-config-files/pn26/service-maker/source-list-1_agx.yaml -c /ds-config-files/pn26/service-maker/ds-config-0_agx.yaml,/ds-config-files/pn26/service-maker/ds-config-1_agx.yaml -l /ds-config-files/pn26/labels.txt --perf-measurement-interval-sec 5

In order to run multiple pipelines, you must specify multiple source list and config files separated by commas as seen in the above example.
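The comma-separated pairing of source lists and pipeline configs can be sketched in Python as follows. This is only an illustrative helper, not part of Jetson Platform Services: the function name `build_test5_command` is hypothetical, and the check that both lists have the same length is an assumption drawn from the example above, where each pipeline has one source list and one pipeline config.

```python
import shlex

# Path of the precompiled Service Maker app, as used in the example above.
APP = ("/opt/nvidia/deepstream/deepstream-7.1/service-maker/sources/apps/cpp/"
       "deepstream_test5_app/build/deepstream-test5-app")


def build_test5_command(source_lists, pipeline_configs, label_file,
                        perf_interval_sec=5):
    """Return the argv list for launching one or more pipelines."""
    # Assumption: one source list and one pipeline config per pipeline.
    if len(source_lists) != len(pipeline_configs):
        raise ValueError("each pipeline needs one source list and one config")
    return [
        APP,
        "-s", ",".join(source_lists),
        "-c", ",".join(pipeline_configs),
        "-l", label_file,
        "--perf-measurement-interval-sec", str(perf_interval_sec),
    ]


base = "/ds-config-files/pn26"
argv = build_test5_command(
    [f"{base}/service-maker/source-list-{i}_agx.yaml" for i in range(2)],
    [f"{base}/service-maker/ds-config-{i}_agx.yaml" for i in range(2)],
    f"{base}/labels.txt",
)
print(shlex.join(argv))  # prints the command line as a single shell string
```

Building the argument list first and joining it only for display keeps each option and its value together, which makes it harder to end up with mismatched -s and -c lists.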

On Orin AGX and Orin NX16, these pipelines use DLA 0 and DLA 1 through their gie config files (config/deepstream/pn26/config_infer_primary_*):

enable-dla=1 # 0 => disable, 1 => enable
use-dla-core=0 # dla core number => 0, 1, etc.

On Orin NX8, only a single DLA is available and thus only one is used, while on Orin Nano the GPU is used instead.

Note that users can alternatively incorporate the legacy test5 application (included by default in the DeepStream 7.1 container) into their Jetson Platform Services based systems, provided the integration points with the rest of the Jetson Platform Services architecture, namely the stream addition API and metadata output in the Metropolis schema, are preserved.

DeepStream config files are in the config/deepstream/pn26 folder of the Docker Compose repo. Each config is marked with the device type at the end of the filename: agx, nx16, nx8, or nano. This folder is mounted in the DeepStream Docker container. Depending on which compose file is run (compose_agx.yaml for Orin AGX, compose_nx16.yaml for Orin NX16, compose_nx8.yaml for Orin NX8, compose_nano.yaml for Orin Nano), the corresponding DeepStream config files are passed into the DeepStream application. Configs for both the Service Maker and legacy versions of the DeepStream test5 app are provided for all supported systems.

An additional container called SDR is used to automatically add and remove streams from the DeepStream application based on events that occur in VST. An overview of how DeepStream and VST interface with other components can be seen below:

GitHub Repo#

As a reference, the Dockerfile for the JPS DeepStream container and the initialization script for YOLOv8s are open sourced and available on GitHub in the jetson-platform-services repository under the inference/perception folder.

Configuration#

No changes should need to be made to the DeepStream config files. Up to 8 streams per DLA are supported (16 total) on Orin AGX, 4 streams per DLA are supported (8 total) on Orin NX16, 4 streams total on Orin NX8, and 4 streams total on Orin Nano. Notable points about the configuration include:

Use of PeopleNet 2.6 (PN2.6) unpruned model for superior accuracy or YOLOv8s for multi class support
Use of DLA for PN2.6 to offload inference from GPU on Orin AGX, Orin NX16, and Orin NX8
Inference done every alternate frame based on the interval parameter within the primary-gie section
Use of NvDCF multi-object tracker that supports running on PVA backend for PN2.6 (on Orin AGX, NX16, NX8) for further reducing the load on GPU
Use of Redis message broker for metadata output
Use of dynamic stream addition to add streams using the SDR microservices (see below)

To increase or reduce the number of streams per pipeline from the default values, change the WDM_WL_THRESHOLD value in the compose yaml file. The max_devices_supported value in the VST config (config/vst/vst_config.json) may also need to be edited to support the necessary number of streams. For best performance, also modify the batch size options set in the DeepStream configs, and generate and use engine files specific to the new batch size and available hardware (i.e., DLAs).
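The relationship between these settings can be sketched with a little arithmetic. The helper below is hypothetical and only illustrates one reasonable way to keep the values consistent: it assumes streams are spread evenly across pipelines, that the per-pipeline batch size should match the SDR threshold, and that max_devices_supported should cover at least the total stream count. The "batch_size" key is a label used for illustration, not the exact key name in any DeepStream config.

```python
import math


def stream_settings(total_streams, num_pipelines):
    """Suggest consistent values for a target total stream count."""
    if total_streams < 1 or num_pipelines < 1:
        raise ValueError("total_streams and num_pipelines must be positive")
    # Streams each pipeline must accept when they are spread evenly.
    per_pipeline = math.ceil(total_streams / num_pipelines)
    return {
        "WDM_WL_THRESHOLD": per_pipeline,        # SDR limit per pipeline
        "batch_size": per_pipeline,              # assumed to match the threshold
        "max_devices_supported": total_streams,  # minimum value for VST
    }


# Orin AGX default from the text above: 8 streams per DLA, 16 total.
print(stream_settings(total_streams=16, num_pipelines=2))
```

For the Orin AGX defaults this reproduces the documented 8 streams per pipeline and 16 streams in total.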

SDR (Sensor Distribution and Routing)#

Stream addition to DeepStream is based on a dedicated microservice named SDR to discover streams from VST and add them to DeepStream on the fly. SDR is used to automatically add and remove streams from DeepStream using the dynamic stream addition feature in DeepStream, triggered based on events from VST. When a stream is added to VST and is able to stream, an event is sent by VST to Redis with the following format:

{
    "alert_type": "camera_status_change",
    "created_at": "2023-11-20T23:51:04Z",
    "event": {
        "camera_id": "MOJD7",
        "camera_name": "36398191-d48a-4260-b574-239fd59f156f",
        "camera_url": "rtsp://172.17.170.143/live/36398191-d48a-4260-b574-239fd59f156f",
        "change": "camera_streaming"
    },
    "source": "vst"
}

SDR picks up this event, determines which DeepStream pipeline currently has availability, and adds the stream to that pipeline.
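As a rough illustration of what a consumer of this event has to do, the sketch below extracts the stream details from the JSON shown above. It is not SDR's actual implementation: the function name `parse_vst_event` is hypothetical, and treating only events whose change value is "camera_streaming" as add requests is an assumption based on the default of the WDM_WL_CHANGE_ID_ADD variable.

```python
import json

REQUIRED_FIELDS = ("camera_id", "camera_name", "camera_url", "change")


def parse_vst_event(raw, add_change="camera_streaming"):
    """Return the stream fields if the event is an add request, else None."""
    message = json.loads(raw)
    event = message.get("event", {})
    # Ignore events that lack any of the fields needed to add a stream.
    if any(not event.get(name) for name in REQUIRED_FIELDS):
        return None
    if event["change"] != add_change:
        return None
    return {name: event[name] for name in REQUIRED_FIELDS}


sample = json.dumps({
    "alert_type": "camera_status_change",
    "created_at": "2023-11-20T23:51:04Z",
    "event": {
        "camera_id": "MOJD7",
        "camera_name": "36398191-d48a-4260-b574-239fd59f156f",
        "camera_url": "rtsp://172.17.170.143/live/36398191-d48a-4260-b574-239fd59f156f",
        "change": "camera_streaming",
    },
    "source": "vst",
})
print(parse_vst_event(sample))
```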

Sometimes when SDR starts up, VST may already have streams added and will not resend a message on Redis. This can occur for instance if SDR was restarted. To solve this, on startup SDR queries the VST live streams endpoint to fetch streams that have already been added. It then adds these streams as well to DeepStream.

SDR keeps track of which pipeline each stream was added to. This is so that when a stream is deleted, it knows which pipeline to send the delete request to. Each DeepStream pipeline has a different port that needs to be specified when using the add or remove API call.

To add a stream to DeepStream, a POST request must be made to the http://localhost:9010/api/v1/stream/add endpoint. Note that localhost may need to be changed to the proper IP and the port from 9010 to something else. By default, port 9010 is used for pipeline 1 and port 9011 for pipeline 2, as specified in the DeepStream config files. The following JSON must also be included in the request:

{
    "alert_type": "camera_status_change",
    "created_at": "2024-05-01T07:49:16Z",
    "event": {
        "camera_id": "Amcrest_3",
        "camera_name": "0227ed9f-de24-495e-b46f-b9d2b8122dd5",
        "camera_url": "rtsp://172.17.170.143/live/0227ed9f-de24-495e-b46f-b9d2b8122dd5",
        "change": "camera_streaming"
    },
    "source": "vst"
}

To remove a stream from DeepStream, a POST request must be made to the http://localhost:9010/api/v1/stream/remove endpoint. Again, localhost may need to be changed to the proper IP, and the correct port must be specified. The following JSON must also be included in the request:

{
    "alert_type": "camera_status_change",
    "created_at": "2024-05-01T07:49:16Z",
    "event": {
        "camera_id": "Amcrest_3",
        "camera_name": "0227ed9f-de24-495e-b46f-b9d2b8122dd5",
        "camera_url": "rtsp://172.17.170.143/live/0227ed9f-de24-495e-b46f-b9d2b8122dd5",
        "change": "camera_remove"
    },
    "source": "vst"
}

Note that only the camera_id, camera_name, camera_url, and change values are necessary; everything else is optional. Also note that these two API endpoints interface directly with DeepStream, so using them manually bypasses SDR and may cause issues. Internally, SDR uses these endpoints and JSON formats to interface with DeepStream. The add and remove endpoints are provided in the default DeepStream test5 application when using [source-list] in the legacy config file or nvmultiurisrcbin in the Service Maker config.
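For reference, the add and remove requests described above could be constructed with the Python standard library roughly as follows. This is a minimal sketch rather than the way SDR itself issues the calls: the function names are hypothetical, the Content-Type header is an assumption, and the body simply mirrors the JSON shown above. As noted, sending these requests by hand bypasses SDR, so this is best limited to debugging.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_stream_request(action, camera_id, camera_name, camera_url,
                         host="localhost", port=9010):
    """Build (but do not send) a POST request for the add or remove endpoint."""
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    body = {
        "alert_type": "camera_status_change",
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event": {
            "camera_id": camera_id,
            "camera_name": camera_name,
            "camera_url": camera_url,
            "change": "camera_streaming" if action == "add" else "camera_remove",
        },
        "source": "vst",
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/v1/stream/{action}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def send_stream_request(request, timeout=5.0):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


request = build_stream_request(
    "add", "Amcrest_3", "0227ed9f-de24-495e-b46f-b9d2b8122dd5",
    "rtsp://172.17.170.143/live/0227ed9f-de24-495e-b46f-b9d2b8122dd5",
)
print(request.get_method(), request.full_url)
```

Passing the result to send_stream_request performs the actual POST. Use port=9011 to target the second pipeline with the default configs.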

SDR also supports reconciliation in the event some streams are missed from VST. The SDR sidecar container occasionally queries both the SDR cache and current streams in VST to make sure they are in sync and calls the add/remove function as necessary in SDR in the event they are out of sync.

There are various SDR configuration options that can be set via container environment variables and can be seen used in the Docker Compose yaml files. These options are specified below. Note that the sample docker compose files do override some of the default values specified here.

Variable	Description	Default Value
PORT	Port for the SDR REST API server	4000
WDM_WL_SPEC	Local cache file SDR uses in case of a crash/restart	“./tests/data_wl.yaml”
WDM_CLUSTER_CONFIG_FILE	IP/port values for each DeepStream pipeline	“docker_cluster_config.json”
WDM_MSG_KEY	Redis key SDR should listen on	“vst.event”
WDM_WL_REDIS_MSG_FIELD	Key within the Redis JSON to parse	“sensor.id”
WDM_WL_ADD_URL	Service add endpoint	“/api/v1/stream/add”
WDM_WL_DELETE_URL	Service remove endpoint	“/api/v1/stream/remove”
WDM_WL_HEALTH_CHECK_URL	Service health check endpoint	“/api/v1/stream/add”
WDM_WL_CHANGE_ID_ADD	“change” type to listen for on Redis to know when to add a stream to DeepStream	“camera_streaming”
WDM_PRELOAD_WORKLOAD	Config file with streams which should be auto loaded on startup other than those already available on VST	“./event_pre-roll.json”
WDM_CLEAR_DATA_WL	True if the local cache should be cleared at startup	False
WDM_KFK_ENABLE	True if Kafka should be enabled	True
WDM_DS_SWAP_ID_NAME	True if ID and Name should be swapped from the VST Redis message before adding to DeepStream	False
WDM_VALIDATE_BEFORE_ADD	True if the add JSON should be validated before being sent to DeepStream (i.e., make sure name, ID, and URL are set)	False
WDM_PRELOAD_DELAY_FOR_DS_API	True if SDR should wait for the service health check to pass before starting to add streams	False
WDM_WL_THRESHOLD	Maximum number of streams SDR should add per pipeline	8
WDM_CLUSTER_TYPE	Cluster type being used	“docker”
WDM_POD_WATCH_DOCKER_DELAY	How frequently in seconds SDR should check for container restarts and update stream distribution	0.05
WDM_DS_STATUS_CHECK	True if DeepStream return status code should be read rather than ignored	False
WDM_RESTART_DS_ON_ADD_FAIL	True if DeepStream should be restarted on multiple failed add/remove events	False
WDM_DISABLE_WERKZEUG_LOGGING	True if Werkzeug logging should be disabled to reduce overall logging	False
WDM_WL_OBJECT_NAME	The workload object name for SDR events sent to Redis; must be unique across instances of SDR running on the same system	“testapp”
WDM_CONSUMER_GRP_ID	The Redis consumer group ID; must be unique across instances of SDR running on the same system	“consumer-grp-id-3”
WDM_CLUSTER_CONTAINER_NAMES	The container names to watch for restarts/crashes for the SDR recovery mechanism	“["sdr", "deepstream", "vst"]”
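To see which values are in effect for a given environment, a small helper along these lines can resolve a few of the variables against the defaults from the table. This is an illustrative sketch only: the function name is hypothetical, only a subset of variables is covered, and the way boolean strings are interpreted is an assumption, since SDR's own parsing is not documented here.

```python
import os

# Defaults copied from the table above (subset only).
SDR_DEFAULTS = {
    "PORT": "4000",
    "WDM_MSG_KEY": "vst.event",
    "WDM_WL_CHANGE_ID_ADD": "camera_streaming",
    "WDM_WL_THRESHOLD": "8",
    "WDM_CLEAR_DATA_WL": "False",
    "WDM_POD_WATCH_DOCKER_DELAY": "0.05",
}


def _as_bool(text):
    # Assumption: these strings count as true; anything else is false.
    return text.strip().lower() in ("1", "true", "yes")


def effective_sdr_settings(environ=None):
    """Resolve settings from the environment, falling back to the defaults."""
    env = os.environ if environ is None else environ
    raw = {name: env.get(name, default) for name, default in SDR_DEFAULTS.items()}
    return {
        "PORT": int(raw["PORT"]),
        "WDM_MSG_KEY": raw["WDM_MSG_KEY"],
        "WDM_WL_CHANGE_ID_ADD": raw["WDM_WL_CHANGE_ID_ADD"],
        "WDM_WL_THRESHOLD": int(raw["WDM_WL_THRESHOLD"]),
        "WDM_CLEAR_DATA_WL": _as_bool(raw["WDM_CLEAR_DATA_WL"]),
        "WDM_POD_WATCH_DOCKER_DELAY": float(raw["WDM_POD_WATCH_DOCKER_DELAY"]),
    }


# Example: a compose file that overrides only the per-pipeline threshold.
print(effective_sdr_settings({"WDM_WL_THRESHOLD": "4"}))
```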

The docker_cluster_config.json file specifies details on available DeepStream pipelines for SDR to send streams to. This file is pointed to by the WDM_CLUSTER_CONFIG_FILE environment variable and must be in the following format:

{
    "moj-ds-01": {
        "provisioning_address": "localhost:9010",
        "process_type": "docker"
    },
    "moj-ds-02": {
        "provisioning_address": "localhost:9011",
        "process_type": "docker"
    }
}

Where:

The number of keys is equal to the total number of services you want to distribute to.

All keys must be unique.

provisioning_address must be set to the IP/port of each service (i.e., in the case of DeepStream, the IP is localhost but the port differs for the two pipelines).

process_type must be set to docker if using Docker Compose.

SDR fills each pipeline in order. In other words, SDR first adds WDM_WL_THRESHOLD sources to “moj-ds-01” before adding sources to “moj-ds-02”. If a source is removed from “moj-ds-01”, that spot is filled by the next source added.
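The fill order described above can be modeled in a few lines of Python. This is a simplified sketch of the documented behavior, not SDR's actual code: `pick_pipeline` and the `assignments` mapping (stream to pipeline) are hypothetical names, and a small threshold is used to keep the example short.

```python
def pick_pipeline(cluster_config, assignments, threshold):
    """Return the first pipeline, in config order, that still has room.

    Returns None when every pipeline already holds `threshold` streams.
    """
    counts = {name: 0 for name in cluster_config}
    for pipeline in assignments.values():
        counts[pipeline] += 1
    for name in cluster_config:  # dicts keep the order of the config file
        if counts[name] < threshold:
            return name
    return None


cluster_config = {
    "moj-ds-01": {"provisioning_address": "localhost:9010", "process_type": "docker"},
    "moj-ds-02": {"provisioning_address": "localhost:9011", "process_type": "docker"},
}

assignments = {}
for camera in ("cam-a", "cam-b", "cam-c"):
    assignments[camera] = pick_pipeline(cluster_config, assignments, threshold=2)
print(assignments)  # cam-a and cam-b on moj-ds-01, cam-c on moj-ds-02

# Removing a stream from the first pipeline frees a slot that is reused next.
del assignments["cam-a"]
print(pick_pipeline(cluster_config, assignments, threshold=2))  # moj-ds-01
```

Because the selection always scans pipelines in the order they appear in the config, a freed slot in an earlier pipeline is used before any later pipeline receives another stream.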

PeopleNet Based Inference#

The DeepStream module uses NVIDIA PeopleNet version 2.6 as its object detection model. Pre-generated engine files are used to reduce startup time. Engine files vary by device (Orin AGX, Orin NX16, Orin NX8, Orin Nano), as does batch size, hence the different DeepStream configs for each device. A tracking distance of 1 is used, which means inference runs on every alternate frame to limit inference utilization. Hybrid clustering is used to process the network output.

Note that PeopleNet 2.6 engine files that are run on DLA with DeepStream and TensorRT in JetPack 6.1 need to be generated offline due to a known issue documented in the TensorRT release notes, captured here for convenience:

When building TensorRT engines for DLA, there is a known issue that entire DLA subgraphs listed in “Layers Running on DLA” (seen with TensorRT’s verbose mode) cannot be built/eventually fall back to GPU with the message “{ForeignNode[…]} cannot be compiled by DLA, falling back to GPU”. This has been observed with the two ResNet-based models: PeopleNet v2.6 and TrafficCamNet from TAO. In both cases, this issue can be fixed by changing TensorRT’s default DLA SRAM pool size of 1 MiB to 0.5 MiB. Using trtexec, this can be achieved by adding the argument --memPoolSize=dlaSRAM:0.5 when building the TensorRT engine.

YOLOv8s Based Inference#

The DeepStream inference service now supports the YOLO (v8s) model for object detection on DLA. YOLO is a state-of-the-art object detection model offering features such as low latency and multi-class support. For more background on the various YOLO models released over the years, refer to this link: https://arxiv.org/abs/2304.00501

Unlike PeopleNet2.6, YOLOv8s uses legacy test5 rather than the Service Maker version. Due to a known issue with the custom DLA supported version of this model and a known issue with the legacy DeepStream test5 app, only one stream is supported by default for this model. See Increase YOLOv8s Stream Count below for details on how to run a higher number of streams using this model.

NvDCF based tracker#

The NvDCF tracker used in DeepStream has been configured for inference being run every alternate frame. NvDCF is particularly suitable for people analytics in indoor, retail settings based on its visual tracking capability that makes it resilient to full or partial occlusions. Refer to the ll-config-file variable definition in the DeepStream config file for tracker configurations being used.

The default configs for Orin AGX, NX16, and NX8 make use of a PVA backend for the tracker via the VPI unified API in order to offload computation from the GPU. Sub-batching is also utilized to process multiple streams per PVA instance. On Orin AGX, a CUDA backend is used in conjunction with PVA for some streams due to the high stream count supported. The Orin Nano does not have the PVA accelerator and thus tracker computation is done on GPU.

Redis#

Detection events from DeepStream are output by the MessageBroker to the Redis test stream. The latest event can be viewed using the following commands:

sudo docker exec -it redis redis-cli

xinfo STREAM test

Events are in the following format:

"{\n  \"version\" : \"4.0\",\n  \"id\" : \"65704\",\n  \"@timestamp\" : \"2024-04-25T21:56:43.715Z\",\n  \"sensorId\" : \"MOJD7_2\",\n  \"objects\" : [\n    \"655|1364.74|230.769|1487.88|431.31|Person|-0.1|#||||||\",\n    \"607|137.03|204.356|282.938|583.155|Person|-0.1|#||||||\"\n  ]\n}"

These messages contain information about the latest timestamp processed, sensorId, and object detections. The Analytics Microservice then consumes these messages as its input. View the Redis page for more information.
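A consumer of these messages might decode them along the following lines. This is a sketch under stated assumptions: judging from the sample above, the first pipe-separated field of each object is taken to be the object ID, the next four the bounding box values, and the sixth the class label; the remaining fields are ignored. The function name `parse_detection_message` is hypothetical and this is not the Analytics microservice's implementation.

```python
import json


def parse_detection_message(payload):
    """Decode one detection message into a dictionary of plain values."""
    message = json.loads(payload)
    objects = []
    for entry in message.get("objects", []):
        fields = entry.split("|")
        objects.append({
            "id": fields[0],                                # assumed object ID
            "bbox": tuple(float(v) for v in fields[1:5]),   # assumed box values
            "type": fields[5],                              # assumed class label
        })
    return {
        "sensorId": message["sensorId"],
        "timestamp": message["@timestamp"],
        "objects": objects,
    }


sample = json.dumps({
    "version": "4.0",
    "id": "65704",
    "@timestamp": "2024-04-25T21:56:43.715Z",
    "sensorId": "MOJD7_2",
    "objects": [
        "655|1364.74|230.769|1487.88|431.31|Person|-0.1|#||||||",
        "607|137.03|204.356|282.938|583.155|Person|-0.1|#||||||",
    ],
})
for obj in parse_detection_message(sample)["objects"]:
    print(obj["id"], obj["type"], obj["bbox"])
```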

Logs#

DeepStream logs are saved in the /data/logging-volume/deepstream.log file by default. Live logs can be viewed using the command tail -f /data/logging-volume/deepstream.log. The streams currently being processed by DeepStream can be viewed in this file, along with the current fps values for each stream. Generally, these should be around 30 fps with the default config options, but may be lower if the input stream has a lower frame rate. A sample snippet of the DeepStream logs can be seen below:

Active sources : 4
**PERF:  FPS 3 (Avg)    FPS 2 (Avg)     FPS 1 (Avg)     FPS 0 (Avg)
Thu Apr 25 17:26:26 2024
**PERF:
MOJD2_2mbps[e9ad9678-0506-4c3a-8621-9dc865bb1cb1] 30.03 (30.15) MOJD3[e728c793-cf9c-4b6c-b0c3-3565d001d9d9] 30.03 (30.86)       MOJD2_5mbps[9dab9be9-b1b0-4892-b6b9-7f7adfe08bb7] 30.03 (30.40)    MOJD6[555f23f8-4b8f-457d-8246-207af6e06263] 30.03 (30.92)
Active sources : 0
Thu Apr 25 17:26:26 2024

It specifies the number of sources for each pipeline, the current date/time, fps values, stream IDs, and stream names. The first fps number is the current value, while the number in parentheses is the average for that stream. The stream name is set when adding a stream via VST, while the stream ID is randomly generated by VST as a unique identifier.
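For quick checks, the per-stream entries in such a log line can be extracted with a regular expression. The sketch below assumes each entry has the form name[id] current (average), as in the sample above, and that stream names contain no whitespace; the function name `parse_perf_line` is hypothetical.

```python
import re

# One entry looks like: MOJD3[e728c793-...] 30.03 (30.86)
PERF_ENTRY = re.compile(
    r"(?P<name>\S+?)\[(?P<id>[0-9a-fA-F-]+)\]\s+"
    r"(?P<fps>\d+(?:\.\d+)?)\s+\((?P<avg>\d+(?:\.\d+)?)\)"
)


def parse_perf_line(line):
    """Return one dict per stream found in a PERF log line."""
    return [
        {
            "name": match.group("name"),
            "id": match.group("id"),
            "fps": float(match.group("fps")),
            "avg_fps": float(match.group("avg")),
        }
        for match in PERF_ENTRY.finditer(line)
    ]


line = ("MOJD2_2mbps[e9ad9678-0506-4c3a-8621-9dc865bb1cb1] 30.03 (30.15) "
        "MOJD3[e728c793-cf9c-4b6c-b0c3-3565d001d9d9] 30.03 (30.86)")
for stream in parse_perf_line(line):
    print(stream["name"], stream["fps"], stream["avg_fps"])
```

Lines without per-stream entries, such as the "Active sources" or timestamp lines, simply yield an empty list.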

SDR logs are saved in the /data/logging-volume/sdr-deepstream.log file by default. Live logs can be viewed using the command tail -f /data/logging-volume/sdr-deepstream.log. The current streams added to DeepStream via SDR can be viewed in this file as well as the latest relevant messages SDR picks up over Redis.

Deployment#

Example DeepStream config files and container information can be found on NGC as part of the reference AI-NVR application. PeopleNet2.6 configs can be found under ai_nvr/config/deepstream/pn26, while YOLOv8s configs can be found under ai_nvr/config/deepstream/yolov8s. Different Docker Compose configs are provided for easy out-of-the-box use of either PeopleNet2.6 or YOLOv8s. Keep in mind that only the PeopleNet2.6 engine and model files are included by default in the provided DeepStream container. Commands to deploy using the default PeopleNet2.6 model with the rest of the sample AI-NVR reference application are provided in the Quick Start Guide. Before deployment, first follow the Quick Start Guide to set up your system with Jetson Platform Services.

Modifying DeepStream Configs#

You may modify DeepStream configs as necessary for your use case. This may include things such as changing batch size, changing the tracker configs, changing primary-gie configs to modify models used or hardware accelerators used, enabling or disabling various sinks such as display outputs or different message brokers, etc. A few things to keep in mind while modifying these configs:

If changing batch size or configs for the primary-gie, you may also need to rename (and possibly generate) new engine files with the new batch size. Similarly, you may need to change the WDM_WL_THRESHOLD SDR environment variable as defined in the docker compose yaml config to let SDR know the max number of streams it should add to a given DeepStream pipeline.
Using different width/height options in different components of the DeepStream pipeline can cause large performance hits. The same goes for different batch sizes across components. It is recommended to keep these in sync across each piece of the DeepStream pipeline.
SDR uses the DeepStream nvmultiurisrcbin source-bin to add/remove streams. Analytics uses the DeepStream Redis output as defined by the msgbroker sink. Monitoring DeepStream FPS reporting in Grafana uses logs outputted to /data/logging-volume/deepstream.log on the host. If any of these aspects are modified, some other components of the reference AI-NVR application may not function as expected.

Additional considerations for the two reference models are provided below.

PeopleNet2.6 Deployment#

If you decide to modify the example configs, you may need to regenerate engine files to support your new configurations. As described in the above PeopleNet2.6 section, this may need to be done manually using a trtexec command due to a known issue. An example command is provided below to generate the DLA0 batch size 8 engine file for Orin AGX:

$ /usr/src/tensorrt/bin/trtexec --onnx=./resnet34_peoplenet_int8.onnx --calib=./PeopleNet_calib.cal --int8 --minShapes="input_1:0":8x3x544x960 --optShapes="input_1:0":8x3x544x960 --maxShapes="input_1:0":8x3x544x960 --duration=100 --useDLACore=0 --allowGPUFallback --memPoolSize=dlaSRAM:0.5 --verbose --saveEngine=./dla0_pn26_jp6_halfmem_bs8.engine

This can be run from the /pn26-files directory of the provided Jetson Platform Services DeepStream container. This is where the necessary PeopleNet2.6 dependencies (onnx and cal files) are located. Options passed into trtexec can be modified as needed.
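When several engines are needed (for example, one per DLA core or batch size), the command above can be parameterized. The following is a sketch that only assembles the argument list; the function name `build_trtexec_command` is hypothetical, and the engine filename pattern is an assumption inferred from the single example name above. The shape arguments omit the shell quotes because the shell strips them before trtexec sees the value.

```python
import shlex


def build_trtexec_command(batch_size, dla_core,
                          onnx="./resnet34_peoplenet_int8.onnx",
                          calib="./PeopleNet_calib.cal",
                          engine=None):
    """Return the trtexec argv for a fixed-batch PeopleNet2.6 DLA engine."""
    shape = f"input_1:0:{batch_size}x3x544x960"
    if engine is None:
        # Assumed naming pattern, modeled on the example engine file name.
        engine = f"./dla{dla_core}_pn26_jp6_halfmem_bs{batch_size}.engine"
    return [
        "/usr/src/tensorrt/bin/trtexec",
        f"--onnx={onnx}",
        f"--calib={calib}",
        "--int8",
        f"--minShapes={shape}",
        f"--optShapes={shape}",
        f"--maxShapes={shape}",
        "--duration=100",
        f"--useDLACore={dla_core}",
        "--allowGPUFallback",
        "--memPoolSize=dlaSRAM:0.5",  # workaround for the known DLA issue
        "--verbose",
        f"--saveEngine={engine}",
    ]


# Commands for both DLA cores at batch size 8.
for core in (0, 1):
    print(shlex.join(build_trtexec_command(batch_size=8, dla_core=core)))
```

The resulting list can be passed to subprocess.run from within the container, or the printed command can be pasted into a shell.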

Commands to deploy PeopleNet2.6 with the AI-NVR reference application can be found under the Run IVA Application section of the quick start guide.

YOLOv8s Deployment#

Note

For each dataset a user elects to use, the user is responsible for checking if the dataset license is fit for the intended purpose.

Increase YOLOv8s Stream Count#

Due to a known issue with the custom DLA supported version of this model and a known issue with the legacy DeepStream test5 app, only one stream is supported by default for this model. This DLA model currently only supports batch sizes of 1, but the DeepStream app automatically overrides this batch size value if using the nvmultiurisrcbin. The following changes can be made in the provided DeepStream container and configs to support a higher number of streams using this model:

Start the DeepStream triton and JPS samples containers:

$ sudo docker run -itd --runtime nvidia --network host --name ds-modifications-triton nvcr.io/nvidia/deepstream:7.1-triton-multiarch
$ sudo docker run -itd --runtime nvidia --network host --name ds-modifications nvcr.io/nvidia/jps/deepstream:7.1-public-v1
$ sudo docker exec -it ds-modifications-triton /bin/bash

Modify the /opt/nvidia/deepstream/deepstream-7.1/sources/apps/sample_apps/deepstream-app/deepstream_app.c file. Find the following section (starting at line 1327) and comment it out:

/** if using nvmultiurisrcbin, override batch-size config for sgie */
if (config->use_nvmultiurisrcbin) {
    for (guint i = 0; i < config->num_secondary_gie_sub_bins; i++) {
        config->secondary_gie_sub_bin_config[i].batch_size =
        config->sgie_batch_size;
    }
}

This prevents DeepStream from overriding the batch size value for PGIE based on the batch size value of nvmultiurisrcbin.

Afterwards, recompile the DeepStream test5 app:

$ cd /opt/nvidia/deepstream/deepstream-7.1/sources/apps/sample_apps/deepstream-test5
$ export CUDA_VER=12.6
$ make

Exit the container, copy the new executable to the JPS container, then commit the new container to save it:

$ exit
$ sudo docker cp ds-modifications-triton:/opt/nvidia/deepstream/deepstream-7.1/sources/apps/sample_apps/deepstream-test5 .
$ sudo docker cp deepstream-test5 ds-modifications:/opt/nvidia/deepstream/deepstream-7.1/sources/apps/sample_apps
$ sudo docker commit ds-modifications jps-ds:yolo-modifications
$ sudo docker stop ds-modifications-triton ds-modifications

Modify the provided DeepStream YOLOv8s configs to update the batch size. Edit the [source-list] section of config/deepstream/yolov8s/yolov8s-ds-config_<DEVICE_TYPE>.txt and set max-batch-size to 8.

You also need to update the Docker Compose config to point to the modified container and the updated version of the test5 executable within it. Change the image line under the deepstream section of compose_<DEVICE_TYPE>_yolov8s.yaml to:

image: jps-ds:yolo-modifications

Similarly, change the command line in the same section to the following, replacing nx16 in the config filename with your device type if needed:

command: sh -c '/opt/nvidia/deepstream/deepstream-7.1/sources/apps/sample_apps/deepstream-test5/deepstream-test5-app -c /ds-config-files/yolov8s/yolov8s-ds-config_nx16.txt 2>&1 | grep --line-buffered . | tee -a /log/deepstream.log'

Finally, also increase the number of streams SDR will assign to DeepStream. Update WDM_WL_THRESHOLD under sdr in this same yaml file and set it to 8.

You can now launch AI-NVR with YOLOv8s using the same Docker Compose up command as before and be able to support higher stream counts.

Running YOLOv8s with AI-NVR Reference Application#

Docker compose configs are provided to run this with the rest of the AI-NVR stack. To run, use the commands below:

Orin AGX: sudo docker compose -f compose_agx_yolov8s.yaml up -d --force-recreate

Orin NX16: sudo docker compose -f compose_nx16_yolov8s.yaml up -d --force-recreate

To stop:

Orin AGX: sudo docker compose -f compose_agx_yolov8s.yaml down --remove-orphans

Orin NX16: sudo docker compose -f compose_nx16_yolov8s.yaml down --remove-orphans