Quickstart

Overview

Multi-Camera Tracking, also called Multi-Target Multi-Camera (MTMC), is a reference video analytics workflow and application that tracks people across multiple cameras and provides counts of the unique people seen over time. Developers can extend it to other camera views, including outdoor scenarios.

This reference application uses live camera feeds as input; performs object detection, object tracking, streaming analytics, and multi-target multi-camera tracking; provides various aggregated analytics functions as API endpoints; and visualizes the results via a browser-based user interface.

Live camera feeds are simulated by streaming video files over RTSP. The various analytics microservices are connected via a Kafka message broker, and processed results are saved in a database for long-term storage.

The image below shows a visual representation of the Multi-Camera Tracking app end-to-end pipeline:

Multi-Camera Tracking pipeline

Media Management provides the video streaming and recording functionality that serves as the input source for all downstream components. The RTSP streams coming out of NVStreamer and VST go to Perception (DeepStream), where raw metadata with a bounding box, tracker ID, class type, and embedding vector for each detected object is generated. The raw metadata is transferred through the Kafka message broker to Behavior Analytics and Multi-Camera Tracking separately for analytics. The processed results from Behavior Analytics and Multi-Camera Tracking, along with the raw metadata from Perception, are saved in Elasticsearch via the Kafka message broker and Logstash data ingestion. Web API queries the saved data from Elasticsearch and provides endpoints with various integrated analytics and utilities, and Web UI leverages the Web API endpoints to create a browser-based user interface for easy data visualization.

We also provide an option to deploy the reference application without the heavy GPU-dependent modules, using pre-extracted metadata as input, so that users have a lightweight way to explore this reference application. In contrast to the end-to-end mode above, we call this option the playback mode. The image below shows a visual representation of the Multi-Camera Tracking app playback mode pipeline:

Multi-Camera Tracking pipeline playback

Quick Deployment

Deploy

  • To download and setup Metropolis apps, refer to the Quickstart Setup section.

  • The --profile arg needs to be added to the docker compose up command to select between the two types of deployment: end-to-end and playback. To deploy the Multi-Camera Tracking app, navigate to the metropolis-apps-standalone-deployment/docker-compose folder.

    • The end-to-end mode deploys every related module, from NVStreamer/VST to the UI, and lets users fully explore the entire pipeline. To deploy everything end-to-end, use --profile e2e:

      $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e up -d --pull always --build --force-recreate
      
    • The playback mode does not deploy NVStreamer, VST, Perception (DeepStream), or the UI. Instead, a playback module using saved metadata is used to stream the input data. The playback mode lets users quickly investigate the data flow or replay saved data with the most lightweight pipeline. To deploy the playback mode, use --profile playback:

      $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback up -d --pull always --build --force-recreate
      

    Initialization of some components of the reference application might take up to a minute. After the deployment, the following ports are used to expose services from the reference application (a quick verification sketch follows the port list):

    • Calibration-Toolkit - 8003

    • Default Kafka port - 9092

    • Default ZooKeeper port - 2181

    • Elasticsearch and Kibana (ELK) - 9200 and 5601, respectively

    • Jupyter Lab - 8888

    • NVStreamer - 31000 (for e2e mode only)

    • Triton - 8000 (HTTP), 8001 (GRPC)

    • VST - 30000 (for e2e mode only)

    • Web-API - 8081

    • Multi-Camera Tracking Web-UI - 3002 (for e2e mode only)
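
    To quickly verify that the deployment is up, you can list the containers for the profile you used and probe a couple of the ports above. This is a minimal sketch assuming the default ports and that curl is available on the host; replace e2e with playback if you deployed the playback mode, and replace localhost with the deployment system IP if checking from a remote system:

      $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e ps
      $ curl -s -o /dev/null -w "Elasticsearch: %{http_code}\n" http://localhost:9200/
      $ curl -s -o /dev/null -w "Web-API: %{http_code}\n" http://localhost:8081/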

Note

With the addition of the Real Time Location System (RTLS) app in version 2.1, you can now run Multi-Camera Tracking along with RTLS if the system has enough resources.

  • To deploy both apps in end-to-end mode, use the following command:

    $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml -f mtmc-app/mdx-rtls-app.yml --profile e2e up -d --pull always --build --force-recreate
    

Note that the Multi-Camera Tracking UI will run on port 3002, while the RTLS UI will run on port 3003. For details about the RTLS app, refer to its Quickstart.

Shutdown

To gracefully stop and shut down all services, run the command with the corresponding profile:

$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e down
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback down
  • If both Multi-Camera Tracking and RTLS are deployed together, then to stop and shut down all services, use the following command:

    $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml -f mtmc-app/mdx-rtls-app.yml --profile e2e down
    

Clean up (Optional)

Clean up data, logs, and Docker images to ensure a clean re-deployment - highly recommended after any configuration or customization.

  • To clean up existing data and logs present under the metropolis-apps-data/data_log/ folder:

    $ sudo chmod +x cleanup_all_datalog.sh
    $ ./cleanup_all_datalog.sh
    

Note

  • The cleanup_all_datalog.sh script is present inside metropolis-apps-standalone-deployment/docker-compose/ and includes an optional --delete-calibration-data flag. This flag accepts true or false as values, with false being the default.

  • Camera calibration is a time-consuming process. To preserve the calibration data, run the script without the flag or with the flag set to false, as in ./cleanup_all_datalog.sh --delete-calibration-data false.

  • Clean up Docker images and cached volumes (selective):

    $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e down --rmi all && docker volume rm `docker volume ls -q| grep -v 'deepstream\|calibration'`
    $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback down --rmi all && docker volume rm `docker volume ls -q| grep -v 'deepstream\|calibration'`
    
  • If both Multi-Camera Tracking and RTLS are deployed together, then to clean up Docker images and cached volumes (selective), use the following command:

    $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml -f mtmc-app/mdx-rtls-app.yml --profile e2e down --rmi all && docker volume rm `docker volume ls -q| grep -v 'deepstream\|calibration'`
    
  • Or, clean up existing Docker images and all cached volumes:

    $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e down --volumes --rmi all
    $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback down --volumes --rmi all
    $ docker volume prune
    
  • If both Multi-Camera Tracking and RTLS are deployed together, then to clean up existing Docker images and all cached volumes, use the following command:

    $ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml -f mtmc-app/mdx-rtls-app.yml --profile e2e down --volumes --rmi all
    $ docker volume prune
    

Note

  • The first set of commands (the selective cleanup) preserves the calibration data volume.

  • It also retains the DeepStream volume to avoid the need to re-build engine files for the perception pipeline, which can be very time-consuming for heavy models (ViT-based, Swin-based, etc.).
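
  • To confirm which volumes were retained after a selective cleanup, you can list the remaining Docker volumes. This is a minimal sketch assuming the volume names contain deepstream and calibration, matching the filter used in the commands above:

    $ docker volume ls -q | grep -E 'deepstream|calibration'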

Troubleshoot

To troubleshoot issues, developers can start with the FAQs.
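
A common first step is to check the status of the individual containers and inspect their logs. This is a minimal sketch; mdx-deepstream is the perception container referenced later in this guide, and you can substitute any other container name reported by docker ps:

$ docker ps --format "table {{.Names}}\t{{.Status}}"
$ docker logs mdx-deepstream --tail 100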


Explore the Results

Explore the application features & output data via the following interfaces:

Kibana Dashboards

You can explore data in Kibana at http://localhost:5601/ (replace localhost with the deployment system IP if opening the browser from a remote system). An init container imports a custom dashboard for the MTMC reference app and creates all the required indexes. Access the dashboard by going to Menu/Dashboard and clicking on the MTMC-Data-View dashboard.

MTMC dashboard

From this sample screen capture, in the top-left corner we can see that the MTMC algorithm identified six unique people from 34 individual tracking IDs. We know this is correct because there are only six people in total present across all seven videos.

The Kibana dashboard is a powerful tool to visualize data. You can interact with any of the visualization panels to filter the data according to a selection. You can read more about dashboards in the official Kibana dashboard documentation.
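
You can also inspect the underlying Elasticsearch indexes directly from the command line. This is a minimal sketch using the standard Elasticsearch cat API on the default port listed earlier; the exact index names depend on your deployment:

$ curl -s "http://localhost:9200/_cat/indices?v"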

Reference UI

Once the Multi-Camera Tracking app deployment is complete, open a browser and access the reference UI at http://localhost:3002/ (replace localhost with the deployment system IP if opening the browser from a remote system).

Note

It will take some time for the UI to start running. You can monitor the UI deployment progress from the Docker logs by running: $ docker logs mdx-mtmc-ui --follow

You should see the UI window as below, with the Events window on the left and the floor plan view on the right.

Multi-Camera Tracking UI

The Events window displays all the global IDs identified by the MTMC algorithm. On the floor plan view, the live motion of detected people is marked with colored trajectories. Each global ID from the left window has multiple associated local IDs, and each local ID has its own moving trajectory. The trajectory of the last local ID of each global ID is plotted on the floor plan view and is color-coded the same as the corresponding global ID.

Note that the maximum number of global IDs to be displayed (5 by default) can be modified in the UI config file.

Other than visualizing Multi-Camera Tracking events as detailed above, the same UI app allows you to perform QBE (Query-by-Example) - querying related events for a person identified in a video frame. QBE can be performed over a video associated with a Multi-Camera Tracking event and/or the video recordings that are available for the cameras in the environment. Detailed instructions for performing QBE can be found at the link below.

For in-depth documentation of the UI, refer to the Multi-Camera Tracking UI section.

API Tutorial Notebook

The reference UI calls the Web API endpoints to provide these functionalities. You can also explore the results via the provided API endpoints and potentially use them in your own application.

Navigate to http://localhost:8888/ or http://<deployment-machine-ip>:8888/ in your browser, enter metropolis as the password/token, and go to the .work folder.

multi-camera-tracking.ipynb provides a walkthrough of all the key API endpoints in this Multi-Camera Tracking app, and you can try them out by just running through the cells.


Components

To further understand this reference application, here is a brief description of the key components.

Media Management

Both NVStreamer and VST are tailored media microservices with functionality specialized for the management and storage of live camera feeds and pre-recorded videos. The NVIDIA media microservices group provides various video management services that are critical for end-to-end intelligent video analytics applications. For more details on NVIDIA Media Services, refer to the corresponding sections of this documentation.

In this reference app, NVStreamer acts as a simulated live camera source, streaming the provided video files as RTSP streams. Those RTSP streams are fed into VST just like live cameras, and from there VST acts as the video management system and interacts with downstream microservices such as Perception and the UI.

The key functionalities of NVStreamer and VST in this reference app include:

  • NVStreamer provides RTSP streaming links from the given video files as the input source to VST; these are treated as simulated live streams.

  • VST provides RTSP streaming links to Perception (DeepStream) for image processing.

  • VST creates video clips overlaid with bounding boxes from extracted metadata.

  • VST provides WebRTC streaming links from given video files for visualization in UI.

VST provides a browser-based UI, which you can access at http://localhost:30000/.

There are multiple video files provided in the metropolis-apps-data folder. The Multi-Camera Tracking app uses the seven videos named Building_K_Cam<id>.mp4. Those seven videos were captured in the same room by seven cameras with different viewing angles. There are six unique people present in the seven videos. Here is a sample view from Building_K_Cam1:

Multi-Camera Tracking video sample

Note

Since the seven cameras start recording at different times, we inserted black frames to ensure the seven videos are synchronized.

Perception (DeepStream)

The Perception (DeepStream) component consists of a PGIE and a single-camera tracker pipeline, where the tracker deploys a re-identification model to extract feature vectors for each person. It then generates streaming perception metadata, which is consumed by Metropolis apps via the Kafka broker. These messages correspond to seven sensors and act as the input data to the Multi-Camera Tracking app. The messages from the perception microservice are encoded in protobuf format.

The key contents of the message are:

  • sensor ID

  • frame ID and timestamp

  • detection bounding box

  • tracking ID

  • extracted feature vector

For more information on the schema and contents of the sensor metadata, refer to the Protobuf Schema section.
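
If you want to confirm that this metadata is flowing, one option is to peek at the raw Kafka topic with the console consumer. This is only a sketch under several assumptions: the Kafka container name and the topic name (shown as placeholders) depend on your deployment, the consumer script location varies by Kafka image, and the protobuf-encoded payload will not be fully human-readable:

$ docker exec -it <kafka-container-name> kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 \
    --topic <raw-metadata-topic> --max-messages 5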

Note

  • The provided video files are 10 minutes in length. In this reference application, Perception is configured to terminate when the video streaming reaches the end of the file, and it stops sending metadata.

  • If you want to reprocess the video from the start, you can restart only the perception microservice (DeepStream app) by running docker restart mdx-deepstream.

Behavior Analytics

Behavior Analytics is the core analytics component of Metropolis apps and is built using the Analytics Stream library for Scala. The app can consume metadata from any source which adheres to a pre-defined mdx-schema (see JSON Schema and Protobuf Schema for details). Behavior Analytics generates the following key analytics for Multi-Camera Tracking app:

  • Behavior data of people (includes speed, direction and distance travelled).

  • Object feature vector

The results are published to the Kafka message broker for downstream components to consume. For the Multi-Camera Tracking application, the PeopleTracking.scala class in the modules/behavior-analytics/src/main/scala/example folder is run; we encourage you to go through the sample code. For in-depth documentation of the component, refer to the Behavior Analytics section.

Note

Behavior Analytics relies on a sensor calibration file to process the sensor metadata. Sensor calibration allows us to map the output from the sensor-processing layer onto a Geographic Information System (GIS) or Cartesian coordinate map in Metropolis apps. For more information on calibration, refer to the Camera Calibration section.

Multi-Camera Tracking

The Multi-Camera Tracking component consumes raw data which are processed into behaviors (trajectories, embeddings, etc.), and clusters the behaviors based on re-identification feature embeddings and spatio-temporal information.

More specifically, in this reference app the MTMC-analytics component operates in live mode: it consumes raw data from a Kafka topic in micro-batches, manages the behavior state, and then merges the clustering results with existing IDs.

The pipeline workflow and the configuration are discussed in-depth in the Multi-Camera Tracking microservice section.

Web API

The Web API component provides REST APIs and WebSocket interfaces for a wide set of functions on the data produced by Behavior Analytics. The Web API exposes various functions used by the Multi-Camera Tracking UI, such as:

  • Fetch unique number of people during a time range

  • Fetch behavior samples of a given global ID

Example - The following request fetches the number of unique people seen over a given time range:

curl "http://localhost:8081/tracker/unique-object-count?timestamp=T&sensorIds=Building_K_Cam1&sensorIds=Building_K_Cam2&sensorIds=Building_K_Cam7"

where T is a valid timestamp that follows the ISO 8601 format (example: 2024-04-15T20:57:55.000Z).
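
For example, you can generate a timestamp in that format on the command line and substitute it into the request above. This is a minimal sketch assuming GNU date is available; use a time that falls within the range of your processed data:

$ T=$(date -u +"%Y-%m-%dT%H:%M:%S.000Z")
$ curl "http://localhost:8081/tracker/unique-object-count?timestamp=${T}&sensorIds=Building_K_Cam1&sensorIds=Building_K_Cam2&sensorIds=Building_K_Cam7"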

The Web API component is started with the index.js present in modules/analytics-tracking-web-api; we encourage you to go through the code. For in-depth documentation of the component, refer to the Analytics and Tracking API section.

Web UI

The Multi-Camera Tracking reference web UI visualizes the valuable insights generated by the multi-camera tracking app and helps in the monitoring and management of indoor spaces such as rooms, hallways, and entry/exit doors. For in-depth documentation of the component, refer to the Multi-Camera Tracking UI section.

Note

The entire source code for the Multi-Camera Tracking app UI is present in the modules/analytics-tracking-web-ui/apps/mtmc-indoor folder. Users can extend the given UI or even create a new one that fits their use case while leveraging the endpoints provided by the Web API component.


Conclusion

Congratulations! You have successfully deployed key microservices and built a Multi-Camera Tracking application.

We encourage you to explore the remaining reference applications provided as part of Metropolis Microservices. Below are additional resources:

  • Quickstart Setup - Guide to deploy all reference applications in the standalone mode via Docker Compose.

  • Production Deployment Setup - Guide to deploying Metropolis microservices in a Kubernetes environment.

  • FAQs - A list of commonly asked questions.