NVIDIA AI Services for Jetson

Overview

Jetson AI Services provide optimized, out-of-the-box video analysis capabilities that can be leveraged through well-defined APIs. They can be readily integrated with other Jetson platform services to quickly enable end-to-end applications with production-grade features, including camera discovery and streaming (through VST), dynamic stream addition (through SDR), and a message bus for microservice integration (through Redis).

AI services are deployed as containers, typically as part of a larger deployment of related containers to the system through docker-compose. Examples of such integration are showcased in the various reference workflows included in the Jetson Platform Services release. REST APIs define the standard operations associated with AI services, including stream addition and removal, model interaction, and configuration.
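
To make the REST pattern concrete, the sketch below adds and then removes a video stream against a running AI service. The port and the /api/v1/live-stream route are assumptions for illustration only; consult the API reference of the specific service for its actual host, port, and routes.

```python
import json
import urllib.request

# Hypothetical endpoint: the actual port and route depend on which
# AI service container is deployed (see that service's API reference).
BASE_URL = "http://localhost:5010/api/v1"


def build_stream_request(rtsp_url: str) -> dict:
    """Build the JSON body for a stream-addition request (assumed schema)."""
    return {"liveStreamUrl": rtsp_url}


def add_stream(rtsp_url: str) -> str:
    """POST a new input stream; return the id assigned by the service."""
    body = json.dumps(build_stream_request(rtsp_url)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/live-stream",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]


def remove_stream(stream_id: str) -> None:
    """DELETE a previously added stream by id."""
    req = urllib.request.Request(
        f"{BASE_URL}/live-stream/{stream_id}", method="DELETE"
    )
    urllib.request.urlopen(req).close()


if __name__ == "__main__":
    stream_id = add_stream("rtsp://camera.local:8554/stream1")
    remove_stream(stream_id)
```

In a typical deployment the RTSP URL would come from VST camera discovery, and SDR would drive these calls automatically as cameras come and go.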

The AI services also support generative AI, enabling new computer vision use cases: redefining traditionally addressed problems such as object detection (e.g., through open-vocabulary support) and enabling natural-language interaction with video input through visual language models.

DeepStream

The DeepStream AI service provides optimized DeepStream pipelines supporting multi-stream object detection and tracking using PeopleNet or YOLOv8 models. It integrates deeply with other modules in the Jetson Platform Services architecture, including VST (camera discovery and streaming), SDR (dynamic stream addition), Analytics (spatio-temporal analytics on DeepStream metadata output), and Monitoring.
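
The detection and tracking metadata produced by these pipelines is published on the Redis message bus for downstream consumers such as Analytics. As a hedged sketch, the helper below parses one such message, assuming a Metropolis-style minimal schema in which each entry in "objects" is a pipe-separated string ("trackId|left|top|right|bottom|class"); check the DeepStream message schema documentation for the exact payload your deployment emits.

```python
import json


def parse_objects(msg: str) -> list:
    """Parse detections from a Metropolis-style minimal-schema message.

    Each entry in 'objects' is assumed to be a pipe-separated string of
    the form 'trackId|left|top|right|bottom|class' (schema assumed here;
    verify against your deployment's actual metadata output).
    """
    data = json.loads(msg)
    detections = []
    for obj in data.get("objects", []):
        track_id, left, top, right, bottom, cls = obj.split("|")[:6]
        detections.append({
            "track_id": track_id,
            "bbox": (float(left), float(top), float(right), float(bottom)),
            "class": cls,
        })
    return detections


if __name__ == "__main__":
    sample = json.dumps({
        "sensorId": "camera1",
        "objects": ["5|10|20|110|220|Person"],
    })
    print(parse_objects(sample))
```

A consumer would typically read these messages from a Redis stream (e.g., via XREAD) and feed the parsed detections into analytics or alerting logic.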

To get started, view the DeepStream Perception page.

Zero Shot Detection

The zero shot detection AI service uses an open-vocabulary detection model called NanoOWL, which is based on Google's OWL-ViT. NanoOWL has been optimized for Jetson and packaged as a zero shot detection AI service for easy deployment. The AI service allows REST API based interaction to control the video stream input and the classes to detect. The model is not bound to a set of pre-defined classes, which allows the user to update the detection classes at runtime and immediately see the updated detections in the overlay output stream.
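
The runtime class update described above can be sketched as a single REST call. The route, port, and payload schema below are assumptions for illustration; consult the zero shot detection API reference for the actual values.

```python
import json
import urllib.request

# Hypothetical endpoint; verify the host, port, and route against the
# zero shot detection service's API reference.
CLASSES_URL = "http://localhost:5010/api/v1/detect/classes"


def build_classes_request(classes: list) -> dict:
    """JSON body listing the open-vocabulary classes to detect (assumed schema)."""
    return {"objects": classes}


def set_detection_classes(classes: list) -> None:
    """POST a new class list; detections update at runtime."""
    body = json.dumps(build_classes_request(classes)).encode()
    req = urllib.request.Request(
        CLASSES_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req).close()


if __name__ == "__main__":
    # Because the model is open-vocabulary, these can be arbitrary phrases,
    # and the overlay output stream reflects them immediately.
    set_detection_classes(["a person", "a forklift", "a hard hat"])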

To get started, view the Zero Shot Detection with Jetson Platform Services page.

Visual Language Models (VLMs)

The VLM AI service enables quick deployment of the VILA and LLaVA families of large multi-modal models, which are capable of understanding both image and text input. The AI service wraps these models with a REST API to allow easy configuration and integration with other services. The REST API allows the user to control the video stream input and to prompt the model, either to set alerts or to ask questions about the video stream.
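
The two interaction modes mentioned above, standing alerts and one-off questions, can be sketched as two REST calls. The routes and payload schemas here are assumptions for illustration; consult the VLM service API reference for the actual endpoints.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5010/api/v1"  # hypothetical host/port


def build_alert_request(alerts: list) -> dict:
    """Body for alert rules the model evaluates continuously against the
    video stream (schema assumed for illustration)."""
    return {"alerts": alerts}


def build_query_request(prompt: str) -> dict:
    """Body for a one-off question about the current stream (assumed schema)."""
    return {"query": prompt}


def post(route: str, payload: dict) -> dict:
    """POST a JSON payload to the service and return the decoded response."""
    req = urllib.request.Request(
        f"{BASE_URL}/{route}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Standing alerts: evaluated on the live stream; fired alerts can be
    # routed to other services over the Redis message bus.
    post("alerts", build_alert_request(["is there a fire",
                                        "is anyone not wearing a helmet"]))
    # One-off question about the current video input.
    answer = post("query", build_query_request("how many people are in view?"))
    print(answer)
```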

To get started, view the Visual Language Models (VLM) with Jetson Platform Services page.

Customization

To customize the AI services or create your own, view the open source code on GitHub <https://github.com/NVIDIA-AI-IOT/jetson-platform-services> to understand how they are developed. The existing services can serve as a starting point for customization or as a reference for building new AI services around new models. A great place to find the latest models that could be turned into AI services is the NVIDIA Jetson AI Lab.