Troubleshooting

This section lists resolutions and tips for troubleshooting across various modules and scenarios.

Setup

Initial deployment taking a long time

Containers are downloaded from NGC during the first install, so the time to first deployment depends on the network bandwidth available for downloading them.

Containers not running after deployment

Check whether the root filesystem is full by running the df command and verifying usage.

nvidia@tegra-ubuntu:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk3p1   54G   54G     0 100% /

Examine the root filesystem content to determine which files could be deleted, using a command such as du.

This is especially likely if no external storage is attached to the system to support a data partition.
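The check above can be sketched as two small helpers. This is a minimal sketch, not part of the product: the function names usage_pct_from_df and largest_dirs are illustrative, and the du and sort options assume GNU coreutils as shipped with Ubuntu.

```shell
# Extract the Use% value (digits only) from df output read on stdin.
# Assumes the usual df layout: one header line, then one filesystem line.
usage_pct_from_df() {
    awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# List the largest first-level directories under a path (default: /),
# staying on one filesystem (-x) so mounted data partitions are skipped.
largest_dirs() {
    du -x -h --max-depth=1 "${1:-/}" 2>/dev/null | sort -rh | head -n "${2:-10}"
}
```

For example, df -h / | usage_pct_from_df prints 100 for the output shown above, and sudo bash -c "$(declare -f largest_dirs); largest_dirs /" or simply running the du pipeline by hand shows where the space went.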

DeepStream FPS in Grafana is incorrect or not updating

Ensure that monitoring and DeepStream are both up. Afterwards make sure that the DeepStream log file is being updated properly. This can be done using tail -f /data/logging-volume/deepstream.log. Grafana parses this to get current stream FPS values.

If the DeepStream log is not updating every ~5 seconds, make sure that the logging volume is not full. In the event that it is, delete (or move) the DeepStream log then restart both the DeepStream and Metric Monitor containers. If the logging volume is not full and this log is not updating, try restarting just the DeepStream container as it is possible that DeepStream ended up getting into a bad state.
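A quick way to tell whether the log is still being written is to compare its modification time with the current time. The following is a minimal sketch under stated assumptions: the helper names are illustrative, the 15-second default threshold is an arbitrary margin over the ~5 second update interval, and stat -c is the GNU form.

```shell
# Print the number of seconds since a file was last modified.
log_age_seconds() (
    mtime=$(stat -c %Y "$1") || exit 1
    now=$(date +%s)
    echo $(( now - mtime ))
)

# Report whether a log file looks fresh or stale.
# Exit status: 0 = fresh, 1 = stale, 2 = file missing.
# The default threshold (15 s) is an assumption, not a documented value.
check_log_fresh() (
    log=$1
    max_age=${2:-15}
    age=$(log_age_seconds "$log" 2>/dev/null) || { echo "missing: $log"; exit 2; }
    if [ "$age" -gt "$max_age" ]; then
        echo "stale: $log last updated ${age}s ago"
        exit 1
    fi
    echo "fresh: $log updated ${age}s ago"
)
```

Running check_log_fresh /data/logging-volume/deepstream.log and, if it reports stale, df -h /data/logging-volume covers both checks described above before any containers are restarted.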

DeepStream FPS degradation

  • With the dynamic stream addition mechanism, a slight FPS degradation is possible (e.g., going down to 29 from 30), especially at larger stream counts.

  • Ensure that your input stream is healthy in terms of quality and FPS. The VLC video player for instance enables display of stream metrics including lost frames, corrupted frames and bitrate. This can be viewed by navigating: Tools > Media Information > Statistics.

  • Check whether your system is oversubscribed by reviewing utilization metrics using sudo tegrastats. Refer to the supported stream counts for the AI-NVR workflow under the Quick Start Guide section in the documentation, which also describes considerations and tips for performance tuning.

  • Similarly, make sure you are running appropriate configs for your system. If for instance you deploy Orin AGX configs on an Orin NX8, performance degradation may occur.

  • Make sure to run at maximum clocks and power settings as described in the setup guide. See the instructions under Update settings for performance in the Quick Start Guide.

  • On rare occasions, when camera FPS drops, the DeepStream FPS for that camera goes to 0 and does not recover. Further, FPS logs from DeepStream have been observed to stop printing from that point, which causes monitoring to report DeepStream FPS incorrectly as 0. To recover, remove the stream causing the issue and restart DeepStream.

DeepStream container won’t exit on docker compose down

Sometimes the DeepStream container hangs when you try to remove it using the docker compose down command. In this scenario, you may see an error similar to the one below:

Error response from daemon: cannot stop container: 40997391f8fe65b604cf64166da7e0fe9992c99620c5872ed16e478f1b15e020: tried to kill container, but did not receive an exit event

In this case, first find the PID of the DeepStream container process:

ps auxw | grep $(sudo docker container ls | grep deepstream | awk '{print $1}') | grep -v grep | awk '{print $2}'

Use the output of that command as the input to the command below:

sudo kill -9 <PID given by previous command>

You may also need to restart the system after this.

DeepStream container keeps crashing/restarting

Ensure that the system has enough free memory to run DeepStream. DeepStream requires 1-2.5 GB depending on how many streams are being processed. Free memory can be viewed using the tegrastats utility. Insufficient free memory can cause DeepStream to crash or multiple streams to be stuck at 0 FPS, especially if the PVA cannot allocate enough memory.
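To turn a tegrastats line into a single free-memory figure, a small parser can help. This is a minimal sketch that assumes the line contains a field of the form "RAM <used>/<total>MB", which is typical of tegrastats output but may differ between releases; the function name is illustrative.

```shell
# Print free memory in MB from one tegrastats line read on stdin.
# Assumption: the line contains a field of the form "RAM <used>/<total>MB".
free_mb_from_tegrastats() {
    sed -n 's|.*RAM \([0-9][0-9]*\)/\([0-9][0-9]*\)MB.*|\1 \2|p' |
        awk 'NR==1 { print $2 - $1 }'
}
```

A possible invocation is sudo tegrastats | head -n 1 | free_mb_from_tegrastats; the result can then be compared against the 1-2.5 GB that DeepStream needs.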

DeepStream freeze

Sometimes, with RTSP streams, the application gets stuck on reaching EOS because of an issue in the rtpjitterbuffer component. To fix this, follow the instructions to update the rtpmanager library as described in the DeepStream Installation Guide.

Note that these steps have to be run inside the DeepStream container and not natively on the Jetson device.

Sometimes, with RTSP streams, using VIC for pre-processing and scaling at large stream counts causes the pipeline to freeze due to a known issue in VIC that is currently being addressed. As a workaround, configure the pipeline elements to use the GPU instead of VIC. See the DeepStream config for AGX (ds-config-0_agx.yaml) for an illustration of how to do so.

WebRTC Streaming issues

Video is not playing

  • Make sure the streaming client (VST webUI browser or mobile app) is on the same network as the device. Otherwise, a relay streaming service such as Twilio is required, and VST must be configured to use it.

Black screen seen during video streaming with mobile app/browser

Ensure that the Jetson device and the client (mobile phone) are on the same network. Otherwise, video relay streaming has to be set up using a service such as Twilio, as described in the VST documentation.

Troubleshooting Video Quality - Deep Dive

If you’ve followed the general troubleshooting guidelines but are still experiencing issues with video quality, this section offers a detailed approach to isolating and localizing streaming quality issues experienced with the VST webUI and mobile app.

Checking the incoming RTSP stream quality from Camera to Jetson device:

Use the following options to make sure the input RTSP stream from the camera is received at the connected Jetson device with the expected quality:

  • VST metrics:

    • Access streaming metrics via VST webUI by navigating to Debug >> Stream Stats.

    • In the FPS stats, ensure the camera FPS metrics consistently show approximately 30 FPS.

    _images/vst_stream_debug.png
  • FPS from DeepStream logs:

    • Use the following command to view DeepStream logs:

      sudo docker logs -f deepstream

    • Confirm that the average FPS logged is close to 30 FPS.

  • Manual stream inspection:

    • Connect a monitor to the Jetson device.

    • Open a terminal on the Jetson device and execute:

      gst-launch-1.0 playbin uri=rtsp://<camera_url>
      
    • Manually inspect the input stream from the camera to ensure it’s playing at good quality on the Jetson device.

Checking the output stream quality at the client app:

  • VST Metrics:

    • Use the VST webUI to access streaming metrics by navigating to Debug >> Stream Stats.

    • This view displays client bitrate, dropped frame count, and client FPS.

    • Dropped frames (nackCount) may indicate insufficient network bandwidth, while fluctuations in bitrate or client FPS can suggest performance issues with the Jetson system.

_images/webrtc_stats.png

VST crashes after a while when recording multiple streams on NVMe

If you find that VST crashes after a while when recording multiple streams on an attached NVMe drive due to progressive memory usage increase, it may be because write operations to the drive are getting backed up. To address this issue, you can try one of the following options.

Option 1: Edit /boot/extlinux/extlinux.conf, add nvme.use_threaded_interrupts=1 to the APPEND parameter, and then reboot your device. See the sample file shown below:

TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
    MENU LABEL primary kernel
    LINUX /boot/Image
    INITRD /boot/initrd
    APPEND ${cbootargs} root=PARTUUID=1efddf3c-894d-4b21-a230-82476db0ee5e rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 nvme.use_threaded_interrupts=1

Option 2: If you are flashing and using an AGX Orin devkit, you can append the kernel module parameter to the flash command through option -C as shown below:

sudo ./flash.sh -C nvme.use_threaded_interrupts=1 jetson-agx-orin-devkit internal
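After rebooting (Option 1) or reflashing (Option 2), it can be useful to confirm that the parameter actually reached the running kernel. The following is a minimal sketch that relies only on the standard Linux /proc/cmdline interface; the function name cmdline_has_param is illustrative.

```shell
# Succeed if a kernel command line contains the exact parameter.
# With one argument the running kernel's /proc/cmdline is checked;
# a command line string can be passed as a second argument instead.
cmdline_has_param() (
    set -f                                  # no glob expansion while splitting
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        [ "$word" = "$param" ] && exit 0
    done
    exit 1
)
```

For example, cmdline_has_param nvme.use_threaded_interrupts=1 && echo applied prints applied only when the parameter is present on the booted kernel's command line.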

AI Services (VLM & Zero Shot Detection)

The VLM and Zero Shot Detection services have additional troubleshooting sections on their respective pages.

Known issues

  • Cameras outside the supported list specified in the VST documentation may fail to be detected. Additionally, cameras without S-profile compliance may not work with VST.

  • Occasional crashes are possible in DeepStream container due to a few known issues noted below that occur sporadically.

    • A tracker error with the message gstnvtracker: All sub-batches are fully allocated. Modify “sub-batches” configuraion to accommodate more number of streams, which causes all streams in the problematic pipeline to drop to 0 FPS. Restart the DeepStream container using sudo docker restart deepstream to recover.

    • A PVA issue at DeepStream startup that causes a crash with the error VPI_ERROR_INVALID_OPERATION: PVA is not available and may be oversubscribed in the system. Reboot the system to recover.

  • Streams newly added to VST may not be picked up by DeepStream because the add/remove API times out. This can occur in particular when VST restarts (such as after a crash), upon which DeepStream streams are removed and re-added by SDR.

  • VST occasionally crashes when RTSP streams are added. However, docker compose automatically restarts the container and the system becomes operational again.

  • Occasional DeepStream container crashes are observed, but docker compose automatically restarts the container and the system should resume operation without any intervention.

  • After rebooting VST, on rare occasions, streams might show 0 fps in perception pipeline. Try restarting the DeepStream container to re-add streams to the perception pipeline.

  • Docker compose deployment may not run properly after system reboots. This is due to the required startup ordering of containers not being enforced when containers are automatically started by Docker after reboot.

  • Stream name change in VST does not get reflected in DeepStream.

  • Intermittent freezing and distortion may occur during WebRTC streaming; refer to the WebRTC Streaming issues section for tips on diagnosing the cause.

  • Mobile app video streaming at increased stream count can result in drop in video quality depending on mobile phone specifications.

  • Notification: The authentication token used to publish notifications from the device to the reference cloud is valid for 7 days. To continue publishing notifications after the token expires, the TCPMux client on the device must be restarted to fetch a new authentication token.

  • AI Services: Streams and prompts added to the VLM and Zero Shot Detection services do not persist across reboots. Streams can be made to persist across reboots by combining with SDR as shown in the Zero Shot Detection Workflow page.

  • VLM Service: When sending a chat completion request to the VLM, it may respond with an empty string. Adjusting the prompt, changing the model or sending the query again may result in a better response.
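
For the two sporadic DeepStream crash signatures listed above, the log text can be scanned for the error strings and mapped to the recovery step given for each. This is a minimal sketch: the function name suggest_recovery and the wording of its output are illustrative, and only the error strings and recovery actions come from the list above.

```shell
# Read log text on stdin and print the recovery step for known signatures.
# The matched strings are the error messages quoted in the known issues list.
suggest_recovery() {
    awk '
        /All sub-batches are fully allocated/ { tracker = 1 }
        /VPI_ERROR_INVALID_OPERATION/         { pva = 1 }
        END {
            if (tracker) print "tracker: sudo docker restart deepstream"
            if (pva)     print "pva: reboot the system"
            if (!tracker && !pva) print "no known signature found"
        }
    '
}
```

One possible use is sudo docker logs deepstream 2>&1 | suggest_recovery, which prints a line for each signature found in the container log.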