Quickstart
Overview
Multi-Camera Tracking, also called Multi-Target Multi-Camera (MTMC), is a reference workflow or application for video analytics that tracks people across multiple cameras and provides counts of unique people seen over time. Developers can extend this to other camera views, including outdoor scenarios.
This reference application uses live camera feeds as input; performs object detection, object tracking, streaming analytics, and multi-target multi-camera tracking; provides various aggregated analytics functions as API endpoints; and visualizes the results via a browser-based user interface.
Live camera feeds are simulated by streaming video files in RTSP format. The various analytics microservices are connected via a Kafka message broker, and processed results are saved in a database for long-term storage.
The image below shows a visual representation of the Multi-Camera Tracking app end-to-end pipeline:

Media Management provides the video streaming and recording functionalities, serving all downstream components as input sources. The RTSP streams coming out of NVStreamer and VST go to Perception (DeepStream), where raw metadata with a bounding box, tracker ID, class type, and an embedding vector for each detected object is generated. The raw metadata is transferred through the Kafka message broker to Behavior Analytics and Multi-Camera Tracking separately for analytics. The processed results from Behavior Analytics and Multi-Camera Tracking, along with the raw metadata from Perception, are saved in Elasticsearch via the Kafka message broker and Logstash data ingestion. The Web API queries saved data from Elasticsearch and provides endpoints with various integrated analytics and utilities, and the Web UI leverages the Web API endpoints to create a browser-based user interface for easy data visualization.
We also provide an option to deploy the reference application without the heavy GPU-dependent modules, using pre-extracted metadata as input, so that users have a lightweight way to explore this reference application. Compared to the end-to-end mode above, we call this option the playback mode. The image below shows a visual representation of the Multi-Camera Tracking app playback mode pipeline:

Quick Deployment
Deploy
To download and setup Metropolis apps, refer to the Quickstart Setup section.
The --profile arg needs to be added to the docker compose up command to select between the two types of deployment: end-to-end and playback. To deploy the Multi-Camera Tracking app, navigate to the metropolis-apps-standalone-deployment/docker-compose folder.
The end-to-end mode deploys every related module, from NVStreamer/VST to the UI, and lets you fully explore the entire pipeline. To deploy everything end-to-end, use --profile e2e:
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e up -d --pull always --build --force-recreate
The playback mode does not deploy NVStreamer, VST, Perception (DeepStream), or the UI. Instead, a playback module using saved metadata is used for streaming input data. The playback mode lets you quickly investigate the data flow or replay saved data with the most lightweight pipeline. To deploy the playback mode, use --profile playback:
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback up -d --pull always --build --force-recreate
Initialization of some components of the reference application might take up to a minute. After the deployment, the following ports will be used to expose services from the reference application (see the quick check after this list):
Calibration-Toolkit - 8003
Default Kafka port - 9092
Default ZooKeeper port - 2181
Elasticsearch and Kibana (ELK) - 9200 and 5601, respectively
Jupyter Lab - 8888
NVStreamer - 31000 (for e2e mode only)
Triton - 8000 (HTTP), 8001 (GRPC)
VST - 30000 (for e2e mode only)
Web-API - 8081
Multi-Camera Tracking Web-UI - 3002 (for e2e mode only)
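After the services come up, one quick sanity check is to list the running containers and probe one of the exposed ports; a minimal example, assuming the default ports listed above (adjust the profile to match your deployment):
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e ps
$ curl -s http://localhost:9200/_cluster/health
The Elasticsearch health call should return a JSON body with a green or yellow status once the database is ready.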
Note
With the addition of the Real Time Location System (RTLS) app in version 2.1, you can now run Multi-Camera Tracking along with RTLS if the system has enough resources.
To deploy both apps in end-to-end mode, use the following command:
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml -f mtmc-app/mdx-rtls-app.yml --profile e2e up -d --pull always --build --force-recreate
Note that the Multi-Camera Tracking UI will run on port 3002, while the RTLS UI will run on port 3003. For details about the RTLS app, refer to its Quickstart.
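Once both apps are up, a quick way to confirm that the two UIs are serving is to request their HTTP status codes (200 is expected once each UI is ready):
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3002/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3003/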
Shutdown
To gracefully stop and shut down all services, run the command with the corresponding profile:
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e down
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback down
If both Multi-Camera Tracking and RTLS are deployed together, then to stop and shut down all services, use the following command:
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml -f mtmc-app/mdx-rtls-app.yml --profile e2e down
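To verify that everything has stopped, you can list any remaining containers; a simple check, assuming the container names are prefixed with mdx (e.g., mdx-deepstream, mdx-mtmc-ui):
$ docker ps --filter "name=mdx"
An empty list (apart from the header row) confirms a clean shutdown.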
Clean up (Optional)
Clean up data, logs, and Docker images to ensure a clean re-deployment. This is highly recommended after any configuration or customization.
To clean up existing data and logs that are present under the metropolis-apps-data/data_log/ folder:
$ sudo chmod +x cleanup_all_datalog.sh
$ ./cleanup_all_datalog.sh
Note
The cleanup_all_datalog.sh script is present inside metropolis-apps-standalone-deployment/docker-compose/ and includes an optional --delete-calibration-data flag. This flag accepts true or false as values, with false being the default.
Camera calibration is a time-consuming process. To preserve the calibration data, run the script without the flag or with the flag set to false, as in ./cleanup_all_datalog.sh --delete-calibration-data false.
Clean up Docker images and cached volumes (selective):
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e down --rmi all && docker volume rm `docker volume ls -q | grep -v 'deepstream\|calibration'`
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback down --rmi all && docker volume rm `docker volume ls -q | grep -v 'deepstream\|calibration'`
If both Multi-Camera Tracking and RTLS are deployed together, then to clean up Docker images and cached volumes (selective), use the following command:
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml -f mtmc-app/mdx-rtls-app.yml --profile e2e down --rmi all && docker volume rm `docker volume ls -q | grep -v 'deepstream\|calibration'`
Or, clean up existing Docker images and all cached volumes:
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e down --volumes --rmi all
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback down --volumes --rmi all
$ docker volume prune
If both Multi-Camera Tracking and RTLS are deployed together, then to clean up existing Docker images and all cached volumes, use the following command:
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml -f mtmc-app/mdx-rtls-app.yml --profile e2e down --volumes --rmi all
$ docker volume prune
Note
The first set of commands preserves the calibration data volume.
It also retains the DeepStream volume to avoid the need to re-build engine files for the perception pipeline, which can be very time-consuming for heavy models (ViT-based, Swin-based, etc.).
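If you used the selective cleanup, you can confirm that these volumes survived by filtering on the same name patterns used in the commands above:
$ docker volume ls -q | grep -E 'deepstream|calibration'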
Troubleshoot
To troubleshoot issues, developers can start with the FAQs.
Explore the Results
Explore the application features & output data via the following interfaces:
Kibana Dashboards
You can explore data in Kibana at http://localhost:5601/ (replace with the deployment system IP if opening the browser on a remote system). An init container imports a custom dashboard for the MTMC reference app and creates all the required indexes. Access the dashboard by going to Menu/Dashboard and clicking on the MTMC-Data-View dashboard.

From this sample screen capture, in the top-left corner, we can see that there are six unique people identified from 34 individual tracking IDs by the MTMC algorithm. We know this is correct because only six people in total are present across the seven videos.
The Kibana dashboard is a powerful tool to visualize data. You can interact with any of the visualization panels to filter the data according to a selection. You can read more about dashboards in the official Kibana dashboard documentation.
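If you prefer to query the stored data directly, Elasticsearch is exposed on port 9200; for example, you can list the indexes created by the init container (exact index names vary by deployment):
$ curl -s "http://localhost:9200/_cat/indices?v"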
Reference UI
Once the Multi-Camera Tracking app deployment is complete, open a browser and access the reference UI at http://localhost:3002/ (replace with the deployment system IP if opening the browser on a remote system).
Note
It will take some time for the UI to start running. You can monitor the UI deployment progress from the Docker logs by running: $ docker logs mdx-mtmc-ui --follow
You should be able to see the UI window as below, with the Events window on the left and the floor plan view on the right.
The Events window displays all the identified global IDs from the MTMC algorithm. On the floor plan view, the live motion of detected people is marked with colored trajectories. Each global ID from the left window has multiple associated local IDs, and each local ID has its own moving trajectory. The trajectory of the last local ID of each global ID is plotted on the floor plan view and is color-coded the same as the corresponding global ID.
Note that the maximum number of global IDs to be displayed, by default 5, can be modified in the UI config file.
Besides visualizing Multi-Camera Tracking events as detailed above, the same UI app allows you to perform QBE (Query-by-Example), i.e., query related events for a person identified in a video frame. QBE can be performed on a video associated with a Multi-Camera Tracking event and/or on video recordings that are available for the cameras in the environment. Detailed instructions for performing QBE can be found at the link below.
For in-depth documentation of the UI, refer to the Multi-Camera Tracking UI section.
API Tutorial Notebook
The reference UI calls the web API endpoints to build its functionality. You can also explore the results via the provided API endpoints and potentially use them in your own application.
Navigate to http://localhost:8888/ or http://<deployment-machine-ip>:8888/ in your browser, enter metropolis as the password/token, and go to the work folder. The multi-camera-tracking.ipynb notebook provides a walkthrough of all the key API endpoints in this Multi-Camera Tracking app, and you can try them out by simply running through the cells.
Components
To further understand this reference application, here is a brief description of the key components.
Media Management
Both NVStreamer and VST are tailored media microservices with functionalities specialized for the management and storage of live camera feeds and pre-recorded videos. The NVIDIA media microservice group provides various video management services that are critical for end-to-end intelligent video analytics applications. For more details on NVIDIA Media Service, refer to the following sections:
In this reference app, NVStreamer serves as a simulated live camera source, streaming the contents of the provided video files as RTSP streams. Those RTSP streams are pipelined into VST just like live cameras, and from there VST serves as the video management system and interacts with downstream microservices such as Perception and the UI.
The key functionalities of NVStreamer and VST in this reference app include:
NVStreamer provides RTSP streaming links from the given video files as the input source to VST; these are considered simulated live streams (see the quick check after this list).
VST provides RTSP streaming links to Perception (DeepStream) for image processing.
VST creates video clips overlaid with bounding boxes from extracted metadata.
VST provides WebRTC streaming links from given video files for visualization in UI.
VST provides a browser based UI which you can access from http://localhost:30000/.
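To verify that the simulated streams are actually live, you can probe an RTSP link with a standard tool such as ffprobe (assuming it is installed on the host); copy the link from the NVStreamer UI:
$ ffprobe -v error -show_streams "<rtsp-link-from-nvstreamer-ui>"
A successful probe prints the codec, resolution, and frame rate of the stream.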
There are multiple video files provided in the metropolis-apps-data folder. The Multi-Camera Tracking app uses the seven videos named Building_K_Cam<id>.mp4.
Those seven videos are captured in the same room from seven cameras with different viewing angles. There are six unique people present in the seven videos.
Here is a sample view from Building_K_Cam1:

Note
Since the seven cameras start recording at different times, we inserted black frames to ensure the seven videos are synchronized.
Perception (DeepStream)
The Perception (DeepStream) component consists of a PGIE and single-camera tracker pipeline, where the tracker deploys a re-identification model to extract feature vectors for each person. It then generates streaming perception metadata, which is consumed by Metropolis apps via the Kafka broker. These messages correspond to the seven sensors and act as input data to the Multi-Camera Tracking app. The messages from the perception microservice are compressed in protobuf format.
The key contents of the message are:
sensor ID
frame ID and timestamp
detection bounding box
tracking ID
extracted feature vector
For more information on the schema and contents of the sensor metadata, refer to the Protobuf Schema section.
Note
The provided video files are 10 minutes in length. In this reference application, Perception is configured to terminate when the video streaming reaches the end of the file and to stop sending metadata.
If you want to reprocess the video from the start, you can restart only the perception microservice (DeepStream app) by running docker restart mdx-deepstream.
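If you want to peek at the raw perception metadata on the broker, the standard Kafka CLI tools can list topics and dump a few messages; a sketch assuming the tools from an Apache Kafka distribution are available on the host, where <perception-topic> is a placeholder for the topic you identify from the listing (the payloads are protobuf-encoded, so the console output is binary rather than human-readable):
$ kafka-topics.sh --bootstrap-server localhost:9092 --list
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <perception-topic> --max-messages 5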
Behavior Analytics
Behavior Analytics is the core analytics component of Metropolis apps and is built using the Analytics Stream library for Scala. The app can consume metadata from any source that adheres to a pre-defined mdx-schema (see JSON Schema and Protobuf Schema for details). Behavior Analytics generates the following key analytics for the Multi-Camera Tracking app:
Behavior data of people (includes speed, direction and distance travelled).
Object feature vector
The results are published to the Kafka message broker for downstream components to consume.
For the Multi-Camera Tracking application, the PeopleTracking.scala class in the modules/behavior-analytics/src/main/scala/example folder is run; we encourage you to go through the sample code. For in-depth documentation of the component, refer to the Behavior Analytics section.
Note
Behavior Analytics relies on a sensor calibration file to process the sensor metadata. Sensor calibration allows us to map the output from the sensor-processing layer onto the Global Information System or Cartesian Coordinate map in Metropolis apps. For more information on calibration, refer to the Camera Calibration section.
Multi-Camera Tracking
The Multi-Camera Tracking component consumes raw data, which is processed into behaviors (trajectories, embeddings, etc.), and clusters the behaviors based on re-identification feature embeddings and spatio-temporal information.
More specifically, in this reference app, the MTMC-analytics component operates in live mode: it consumes raw data from a Kafka topic in micro-batches, manages the behavior state, and then merges the clustering results with existing IDs.
The pipeline workflow and the configuration are discussed in-depth in the Multi-Camera Fusion microservice section.
Web API
The Web API component provides REST APIs and WebSocket interfaces for a wide set of functions on the data produced by Behavior Analytics. The Web API exposes various functions used by the Multi-Camera Tracking UI, such as:
Fetch unique number of people during a time range
Fetch behavior samples of a given global ID
Example - The following request fetches the number of unique people seen over a given time range:
curl "http://localhost:8081/tracker/unique-object-count?timestamp=T&sensorIds=Building_K_Cam1&sensorIds=Building_K_Cam2&sensorIds=Building_K_Cam7"
where T is a valid timestamp in ISO 8601 format (example: 2024-04-15T20:57:55.000Z).
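For instance, substituting the example timestamp above yields a concrete request against the same endpoint:
curl "http://localhost:8081/tracker/unique-object-count?timestamp=2024-04-15T20:57:55.000Z&sensorIds=Building_K_Cam1&sensorIds=Building_K_Cam2&sensorIds=Building_K_Cam7"
The response contains the unique-people count for the given time range and sensors.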
The web API component is started with the index.js present in modules/analytics-tracking-web-api; we encourage you to go through the code. For in-depth documentation of the component, refer to the Analytics and Tracking API section.
Web UI
The Multi-Camera Tracking reference web UI visualizes the valuable insights generated by the multi-camera tracking app and helps in the monitoring and management of indoor spaces such as rooms, hallways, and entry/exit doors. For in-depth documentation of the component, refer to the Multi-Camera Tracking UI section.
Note
The entire source code for the Multi-Camera Tracking app UI is present in the modules/analytics-tracking-web-ui/apps/mtmc-indoor folder. Users can extend the given UI or even create a new one that fits their use case while leveraging the endpoints provided by the Web API component.
Conclusion
Congratulations! You have successfully deployed key microservices and built a Multi-Camera Tracking application.
We encourage you to explore the remaining reference applications provided as part of Metropolis Microservices. Below are additional resources:
Quickstart Setup - Guide to deploy all reference applications in the standalone mode via Docker Compose.
Production Deployment Setup - Guide to deploying Metropolis microservices in a Kubernetes environment.
FAQs - A list of commonly asked questions.