Technical Brief

One of the biggest challenges in the commercial fleet industry is routing optimization. The problem appears in many industries, where finding the most cost-effective route can yield significant savings. For example, a single restaurant franchise can deliver millions of meals a day, and a telecommunications company can dispatch millions of jobs per year. At this scale, inefficient routes can add billions of dollars in operational costs and increase our environmental carbon footprint. A computational solver can minimize these inefficiencies by finding optimal routes across a list of locations. CPU-based solvers are available today, but GPU acceleration, with its massive throughput, enables more ambitious algorithms that will help fuel our future.

Route optimization problems such as those described above are commonly framed as variants of the Traveling Salesperson Problem (TSP). To reduce the time needed to develop a GPU-accelerated solution, NVIDIA has developed the route optimization workflow, which streamlines the development of solutions to the Vehicle Routing Problem (VRP), the multi-vehicle generalization of the TSP.

This NVIDIA AI Workflow describes how to deploy a sample, opinionated AI solution for route optimization. It includes the following items:

  • Origin-destination cost-matrix creation

  • Data preprocessing

  • NVIDIA cuOpt™ GPU-accelerated solver pipeline

  • Driving directions

  • Map visualization

  • Components for authentication, logging, and monitoring the workflow

  • Cloud-native deployable bundle packaged as a Helm chart


The components and instructions used in the workflow are intended as examples for integration and may not be sufficiently production-ready on their own. The workflow should be customized and integrated into your own infrastructure, using the workflow as a reference.

Using these assets, this NVIDIA AI Workflow provides a reference for getting started and building your AI solution with minimal preparation. It also includes enterprise-ready implementation best practices, ranging from authentication to monitoring, reporting, and load balancing, that help you achieve the desired AI outcome more quickly while still leaving you a path to deviate.

NVIDIA AI Workflows are designed as microservices, which means they can be deployed on Kubernetes on their own or alongside other microservices to create a production-ready application that scales seamlessly in your enterprise environment.

The workflow components are packaged together into a deployable solution described in the diagram below:


These components are used to build and deploy the route optimization cuOpt microservice, integrated with the additional features as indicated in the diagram below:


More information about these components can be found in the sections below and in the NVIDIA Cloud Native Service Add-on Pack Deployment Guide.

Furthermore, the route optimization workflow describes integrating the cuOpt Server API with third-party APIs for route mapping and map visualization.


cuOpt Microservice

The NVIDIA route optimization workflow packages the cuOpt server as a cloud-native microservice and deploys it as a Helm chart. The microservice includes basic authentication and proxy configurations that authorize and filter requests sent to the service, using an application-level load balancer (Envoy) and an identity provider (Keycloak) for JSON Web Token (JWT) authentication. For more information about the authentication portion of the workflow, refer to the Authentication section in the Appendix.
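As a rough illustration of the JWT flow, the sketch below builds (but does not send) the two HTTP requests a client would typically make: an OAuth 2.0 client-credentials token request to Keycloak, and a request to cuOpt through Envoy that carries the token as a Bearer header. The Keycloak and Envoy URLs, realm, client ID, secret, token, and the cuOpt path are hypothetical placeholders. The token endpoint follows Keycloak's standard `/realms/<realm>/protocol/openid-connect/token` layout (older Keycloak releases prefix it with `/auth`).

```python
import json
import urllib.parse
import urllib.request


def build_token_request(keycloak_url: str, realm: str,
                        client_id: str, client_secret: str) -> urllib.request.Request:
    """Build an OAuth 2.0 client-credentials request for Keycloak's token endpoint."""
    token_url = f"{keycloak_url}/realms/{realm}/protocol/openid-connect/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("utf-8")
    return urllib.request.Request(
        token_url, data=form, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


def build_cuopt_request(envoy_url: str, path: str, access_token: str,
                        payload: dict) -> urllib.request.Request:
    """Build a JSON request to cuOpt via Envoy, authenticated with a Bearer JWT.

    In this setup Envoy is expected to validate the JWT before forwarding
    the request to the cuOpt server. The path is a hypothetical placeholder.
    """
    return urllib.request.Request(
        f"{envoy_url}{path}", data=json.dumps(payload).encode("utf-8"), method="POST",
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"})


# Hypothetical values for illustration only.
token_req = build_token_request("https://keycloak.example.com", "cuopt-realm",
                                "cuopt-client", "example-secret")
cuopt_req = build_cuopt_request("https://cuopt.example.com", "/cuopt/example-endpoint",
                                "example-access-token", {"example": True})
```

In a live deployment, `urllib.request.urlopen(token_req)` would return a JSON body whose `access_token` field is the JWT to pass to `build_cuopt_request`.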

Sample Data

The NVIDIA route optimization workflow uses the NVIDIA cuOpt server through a representational state transfer (REST) microservice API to generate routes. To demonstrate this, the workflow includes a series of synthetic sample datasets for assigning orders to a fleet of delivery drivers. The workflow uses three CSV files, Orders, Depot, and Route, to assign drivers to their orders.


An Order can be a delivery to a customer, a pickup from a customer, or some other type of work. Examples include a furniture delivery, a grease pickup from a restaurant, or an inspection visit. This workflow models deliveries from a distribution center to stores, so the Order dataset contains store data: store name, location, start and end time (store hours), demand (order weight in pounds), and service time (how long it takes to unload the package).
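The Order fields map naturally onto a small record type. The sketch below loads an Order CSV with Python's standard `csv` module and converts store hours to minutes after midnight. The column names (`name`, `latitude`, `longitude`, `start_time`, `end_time`, `demand`, `service_time`) and the two sample rows are hypothetical, not the workflow's actual files.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Order:
    name: str
    latitude: float
    longitude: float
    start_min: int    # store opening time, minutes after midnight
    end_min: int      # store closing time, minutes after midnight
    demand_lbs: int   # order weight in pounds
    service_min: int  # time needed to unload the order, in minutes


def hhmm_to_minutes(value: str) -> int:
    """Convert an "HH:MM" string to minutes after midnight."""
    hours, minutes = value.split(":")
    return int(hours) * 60 + int(minutes)


def load_orders(csv_text: str) -> list[Order]:
    """Parse Order rows from CSV text (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [Order(
        name=row["name"],
        latitude=float(row["latitude"]),
        longitude=float(row["longitude"]),
        start_min=hhmm_to_minutes(row["start_time"]),
        end_min=hhmm_to_minutes(row["end_time"]),
        demand_lbs=int(row["demand"]),
        service_min=int(row["service_time"]),
    ) for row in reader]


# Made-up sample rows for illustration.
SAMPLE_ORDERS = """name,latitude,longitude,start_time,end_time,demand,service_time
Store A,40.7128,-74.0060,08:00,17:00,120,15
Store B,40.7306,-73.9352,09:30,18:00,80,10
"""

orders = load_orders(SAMPLE_ORDERS)
```

Expressing times as integer minutes keeps them directly comparable with travel times in a cost matrix measured in minutes.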


A Depot is a location that a vehicle departs from at the beginning of its workday and returns to at the end of the workday. Depots are locations where the vehicles are loaded (for deliveries) or unloaded (for pickups). Sometimes, a depot can also act as a renewal location, where the vehicle can unload, reload, and continue performing deliveries and pickups. A Depot has open and close times, specified by a hard time window: vehicles can't arrive at a Depot outside this time window. In this route optimization workflow, vehicles depart the depot in the morning and return at the end of the day. The Depot dataset includes names, locations, and start and end times (operating hours).
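A hard time window means that a visit outside the open/close interval is infeasible rather than merely penalized. A minimal sketch of that check, using a hypothetical `Depot` type with times in minutes after midnight:

```python
from dataclasses import dataclass


@dataclass
class Depot:
    name: str
    open_min: int   # opening time, minutes after midnight
    close_min: int  # closing time, minutes after midnight

    def allows(self, time_min: int) -> bool:
        """Hard time window: a vehicle may only be at the depot while it is open."""
        return self.open_min <= time_min <= self.close_min


def trip_respects_depot(depot: Depot, depart_min: int, return_min: int) -> bool:
    """True if a vehicle both leaves and returns within the depot's hard window."""
    return (depart_min <= return_min
            and depot.allows(depart_min)
            and depot.allows(return_min))


# Hypothetical depot open from 06:00 to 20:00.
depot = Depot("Distribution Center", open_min=6 * 60, close_min=20 * 60)
```

For example, a vehicle leaving at 07:00 and returning at 19:00 is feasible, while one leaving at 05:00 is not, no matter how good the rest of its route is.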


A Route specifies vehicle and driver characteristics, such as vehicle capacity, work shift hours, and driving range, and represents the traversal between depots and orders. The required features are vehicle name or ID number, start and end depot names, start and end times (vehicle/driver shift hours), and carrying capacity.

The sample AI Workflow uses a combination of these three CSVs to find the most cost-effective routes, using your data for your specific use case. For example, the Order CSV file gives each order's package weight, and the Route CSV gives the maximum order weight each delivery truck can carry. Each Route is assigned to a Depot.
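The interaction between order weight and truck capacity determines whether an assignment is feasible. A minimal sketch of that capacity check, assuming a hypothetical solver output that maps each order to a vehicle ID; the store names, weights, and capacities are made up:

```python
from collections import defaultdict


def capacity_violations(assignment: dict[str, str],
                        demand_lbs: dict[str, int],
                        capacity_lbs: dict[str, int]) -> dict[str, int]:
    """Return, for each overloaded vehicle, how many pounds it is over capacity.

    assignment maps order name -> vehicle ID (a hypothetical output shape).
    """
    load: dict[str, int] = defaultdict(int)
    for order, vehicle in assignment.items():
        load[vehicle] += demand_lbs[order]
    return {v: load[v] - capacity_lbs[v] for v in load if load[v] > capacity_lbs[v]}


demand = {"Store A": 120, "Store B": 80, "Store C": 150}
capacity = {"truck-1": 200, "truck-2": 100}
result = capacity_violations(
    {"Store A": "truck-1", "Store B": "truck-1", "Store C": "truck-2"},
    demand, capacity)
```

Here `truck-1` carries exactly its 200 lb limit, but `truck-2` is 50 lb over, so a solver would have to assign Store C elsewhere.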


You might have additional features depending on the problem, such as order priorities or vehicle break time windows. Such features would be preprocessed in the same way as the features shown in the workflow.

Jupyter Notebook Client

Data Preprocessing

The cuOpt server requires its input data in a specific form, which the preprocessing stage produces: the data is modeled as arrays of integers, and a cost matrix is created. This happens in the Jupyter notebook client, where the route optimization workflow uses the Open Source Routing Machine (OSRM), an open-source router built on OpenStreetMap data. We use OSRM to build the cost matrix that represents the travel time from one depot or order to another. Once preprocessing is complete, the Jupyter notebook client sends the data from the three datasets and the cost matrix to the cuOpt server through API calls.
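The two preprocessing steps can be sketched as follows, assuming a self-hosted OSRM server at a hypothetical `http://localhost:5000`. First, each depot and order receives an integer index (listing depots first is an illustrative convention, not necessarily the workflow's). Second, the coordinates are packed into an OSRM Table service request, whose `annotations=duration` option asks for all-pairs travel times in seconds. Note that OSRM expects coordinates in `longitude,latitude` order.

```python
def build_location_index(depot_names: list[str], order_names: list[str]) -> dict[str, int]:
    """Assign each location an integer index: depots first, then orders."""
    names = list(depot_names) + list(order_names)
    return {name: i for i, name in enumerate(names)}


def osrm_table_url(base_url: str, coords_lat_lon: list[tuple[float, float]]) -> str:
    """Build an OSRM Table service URL for an all-pairs duration matrix.

    OSRM expects "longitude,latitude" pairs separated by ";", in the same
    order as the integer location indices.
    """
    coords = ";".join(f"{lon},{lat}" for lat, lon in coords_lat_lon)
    return f"{base_url}/table/v1/driving/{coords}?annotations=duration"


# Hypothetical locations for illustration.
index = build_location_index(["Depot"], ["Store A", "Store B"])
url = osrm_table_url("http://localhost:5000", [(40.0, -74.0), (41.0, -73.5)])
```

Fetching `url` (for example with `urllib.request.urlopen`) returns JSON whose `durations` field is the matrix, with row and column `i` corresponding to location index `i`.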

Below is an example cost matrix, in the form of a dataframe, for a problem with five total locations.


This cost matrix represents travel time in minutes, as used in the workflow; for example, traveling from location 0 to location 1 takes 16.546667 minutes. Note that the cost of going from a location to itself is typically 0, and the cost of going from location A to location B is not necessarily equal to the cost of going from location B to location A.
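Because OSRM reports durations in seconds, a conversion step is needed to obtain a matrix in minutes. The sketch below converts a made-up three-location OSRM Table response (the 992.8-second entry was chosen to reproduce the 16.546667-minute example) and checks the two properties noted above: a zero diagonal and possible asymmetry.

```python
import json


def durations_to_minutes(osrm_response_json: str) -> list[list[float]]:
    """Convert the OSRM Table `durations` field (seconds) into a minutes cost matrix."""
    durations = json.loads(osrm_response_json)["durations"]
    return [[round(seconds / 60.0, 6) for seconds in row] for row in durations]


def is_symmetric(matrix: list[list[float]]) -> bool:
    """True if cost(i -> j) equals cost(j -> i) for every pair of locations."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Made-up 3-location response in the shape OSRM returns.
sample = '{"code": "Ok", "durations": [[0, 992.8, 600], [1050, 0, 420], [630, 400, 0]]}'
cost = durations_to_minutes(sample)
```

In this sample, 0 to 1 takes about 16.546667 minutes but 1 to 0 takes 17.5 minutes, for instance because of one-way streets, so the matrix is asymmetric.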

Route Mapping

Once the cuOpt solver returns optimized routes, the route optimization workflow uses OSRM again to visualize them. The client parses the cuOpt solver response, converts locations to coordinate points, and then uses OSRM to map the routes. The resulting routes include driving directions to each order.
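A sketch of the mapping step, assuming the solver output for one vehicle has already been reduced to an ordered list of location indices (the real cuOpt response format may differ) and reusing the hypothetical `http://localhost:5000` OSRM server. The indices are looked up in the coordinate list and passed to the OSRM Route service: `steps=true` requests turn-by-turn instructions, and `geometries=geojson` returns a route geometry that mapping libraries can plot directly.

```python
def route_to_osrm_url(base_url: str, route_indices: list[int],
                      coords_lat_lon: list[tuple[float, float]]) -> str:
    """Turn one vehicle's ordered location indices into an OSRM Route service URL.

    route_indices, e.g. [0, 2, 1, 0] for depot -> location 2 -> location 1 -> depot,
    is a hypothetical shape for the solver output.
    """
    points = [coords_lat_lon[i] for i in route_indices]
    # OSRM expects "longitude,latitude" pairs separated by ";".
    coords = ";".join(f"{lon},{lat}" for lat, lon in points)
    return (f"{base_url}/route/v1/driving/{coords}"
            "?overview=full&steps=true&geometries=geojson")


# Hypothetical coordinates indexed by location number.
coords = [(40.0, -74.0), (40.5, -73.9), (41.0, -73.5)]
url = route_to_osrm_url("http://localhost:5000", [0, 2, 1, 0], coords)
```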


Prometheus is an open-source monitoring and alerting solution. This workflow uses it to store GPU performance metrics collected by the DCGM Exporter in the GPU Operator, which enables system administrators to understand the health and throughput of the system.

The metrics are available in plain text, and Grafana is also used to visualize them in a dashboard. Depending on these usage metrics, the cuOpt pods can be scaled manually or automatically.
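To read these metrics outside Grafana, the Prometheus HTTP API's `/api/v1/query` endpoint evaluates a PromQL expression and returns a JSON instant vector. The sketch below builds such a query, averages GPU utilization from a response, and applies a simple rule of the kind an administrator might use when scaling manually. The Prometheus address, the sample response values, the 80%/20% thresholds, and the `suggested_replicas` rule are all hypothetical.

```python
import json
import urllib.parse


def prometheus_query_url(prom_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prom_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def mean_value(response_json: str) -> float:
    """Average the samples in an instant-vector response (values arrive as strings)."""
    result = json.loads(response_json)["data"]["result"]
    values = [float(sample["value"][1]) for sample in result]
    return sum(values) / len(values) if values else 0.0


def suggested_replicas(current: int, util_pct: float,
                       high: float = 80.0, low: float = 20.0) -> int:
    """Hypothetical rule: add a pod when GPUs are busy, remove one when idle."""
    if util_pct > high:
        return current + 1
    if util_pct < low and current > 1:
        return current - 1
    return current


# Made-up response for two GPUs, in the shape Prometheus returns.
SAMPLE_RESPONSE = json.dumps({"status": "success", "data": {
    "resultType": "vector",
    "result": [{"metric": {"gpu": "0"}, "value": [1684800000, "90"]},
               {"metric": {"gpu": "1"}, "value": [1684800000, "84"]}]}})

url = prometheus_query_url("http://prometheus:9090", "DCGM_FI_DEV_GPU_UTIL")
util = mean_value(SAMPLE_RESPONSE)
```

With the sample values, average utilization is 87%, so the rule would suggest going from two pods to three.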

A table of reported metrics and an example Grafana dashboard are shown below:

  • GPU Temperature
    Query: DCGM_FI_DEV_GPU_TEMP{instance=~"${instance}", gpu=~"${gpu}"}
    Description: GPU temperature (in °C)

  • GPU Average Temperature
    Query: avg(DCGM_FI_DEV_GPU_TEMP{instance=~"${instance}", gpu=~"${gpu}"})
    Description: Average GPU temperature (in °C)

  • GPU Power Usage
    Query: DCGM_FI_DEV_POWER_USAGE{instance=~"${instance}", gpu=~"${gpu}"}
    Description: Power draw (in W)

  • GPU Power Total
    Query: sum(DCGM_FI_DEV_POWER_USAGE{instance=~"${instance}", gpu=~"${gpu}"})
    Description: Total power draw (in W)

  • GPU SM Clocks
    Query: DCGM_FI_DEV_SM_CLOCK{instance=~"${instance}", gpu=~"${gpu}"} * 1000000
    Description: SM clock frequency (in Hz)

  • GPU Utilization
    Query: DCGM_FI_DEV_GPU_UTIL{instance=~"${instance}", gpu=~"${gpu}"}
    Description: GPU utilization (in %)

  • Tensor Core Utilization
    Query: DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{instance=~"${instance}", gpu=~"${gpu}"}
    Description: Ratio of cycles the tensor (HMMA) pipe is active (in %)

  • GPU Framebuffer Memory Used
    Query: DCGM_FI_DEV_FB_USED{instance=~"${instance}", gpu=~"${gpu}"}
    Description: Framebuffer memory used (in MiB)
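The `${instance}` and `${gpu}` placeholders in these queries are Grafana dashboard variables, so the expressions cannot be pasted into Prometheus unchanged. A minimal sketch of substituting concrete regex values before running a query directly; the `render_grafana_query` helper and the chosen values are hypothetical. In the SM clock query, the `* 1000000` factor converts the DCGM reading from MHz to Hz (for example, 1410 MHz becomes 1,410,000,000 Hz).

```python
import re


def render_grafana_query(expr: str, variables: dict[str, str]) -> str:
    """Replace Grafana-style ${name} variables with concrete values.

    Variables missing from `variables` are left untouched.
    """
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  expr)


SM_CLOCK = 'DCGM_FI_DEV_SM_CLOCK{instance=~"${instance}", gpu=~"${gpu}"} * 1000000'
# ".*" matches every instance; "0|1" selects GPUs 0 and 1.
rendered = render_grafana_query(SM_CLOCK, {"instance": ".*", "gpu": "0|1"})
```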


© Copyright 2022-2023, NVIDIA. Last updated on May 23, 2023.