Fleet Command provides monitoring capabilities with usage analytics, user activity tracking and metrics.
Fleet Command usage analytics provide insights into GPU usage, registry usage, and log usage.
Fleet Command Stack version 0.4.47 or later is required for usage analytics.
To view the usage, navigate to the NGC application and select Organization > Usage in the left navigation.
On the Usage page, select Entitled Products > Fleet Command.
The Overview tab displays a summary of the GPU and storage usage, as well as system types and systems in use.
GPU Usage Overview: Displays the current and monthly peak total GPU usage.
Storage Usage Overview: Displays the current logs and private registry storage usage.
System Types: Displays various system types in use.
System Inventory: Displays a list of systems in use.
The GPU Usage tab displays the following:
Current GPU Usage: Displays the GPU usage by type for the current month.
Monthly Peak: Displays the monthly GPU peak usage by GPU type. You can adjust the month under Timeframe.
Daily Peak: Displays the daily GPU peak usage for the selected GPU type and timeframe. You can change the GPU type to display using the dropdown.
Total Monthly Peak Trend: Displays the current year’s monthly GPU peak trend. Mouse over the graph to see the exact count on the timeline. You can change the date range using the Timeframe selector.
The Storage Usage tab displays both the Private Registry and Logs storage usage in GB for each month in the current year. You can select a different year using the Timeframe selector.
Logs: Displays the current Logs storage usage.
Private Registry: Displays the storage used by the private registry.
Mouse over the graph to view the specific storage size.
To export the GPU and storage usage analytics as CSV, click Download CSV. The data is bundled and downloaded as a ZIP file.
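Once the ZIP file is downloaded, the CSV files inside it can be loaded for further analysis. The following is a minimal sketch that uses only the Python standard library. The file names and column names in the export are not documented here, so the helper (a hypothetical name, load_usage_csvs) discovers them at run time instead of assuming them. The UTF-8 encoding is also an assumption.

```python
import csv
import io
import zipfile


def load_usage_csvs(zip_source):
    """Return {csv_name: [row_dict, ...]} for every CSV file in the archive.

    zip_source may be a filesystem path or a binary file-like object.
    Column names are taken from each CSV's header row, not assumed.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # ignore anything in the bundle that is not a CSV
            with archive.open(name) as raw:
                # Assumption: the export is UTF-8; "utf-8-sig" also tolerates
                # an optional byte-order mark at the start of the file.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


def describe(tables):
    """Return one summary line per CSV: name, row count, and column names."""
    lines = []
    for name, rows in tables.items():
        columns = list(rows[0].keys()) if rows else []
        lines.append(f"{name}: {len(rows)} rows, columns={columns}")
    return lines
```

For example, print("\n".join(describe(load_usage_csvs("usage.zip")))) lists what the bundle contains, where usage.zip stands in for the path of the downloaded file.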
Fleet Command user activity provides insights into the user’s activities within the organization.
To view the user activity, navigate to the NGC user interface and select Organization.
Next, under Organization, select Audit.
The Audit page displays your organization’s user activity report.
Select the date range for the user activity you want to review, and then select Create a New Request. This generates a downloadable report of the user activity for that period.
Fleet Command lets you view system and application-specific metrics for your deployments at edge locations. Metrics are numerical values that measure aspects of your resources at regular intervals. With metrics, you can monitor and analyze the performance of your deployed machine learning inference solutions over time. This information can help you tune resource consumption and improve the overall performance of your deployments.
The following metrics are available in Fleet Command:
System metrics: measurement of edge system resource utilization, including
Total CPU Utilization (per core)
Total RAM Utilization
Total GPU Utilization
Total Storage Utilization
Total Network Utilization
Application utilization metrics: measurement of application system resource utilization, including
App CPU Utilization
App RAM Utilization
App GPU Utilization
App Storage Utilization
App Network Utilization
Using the Fleet Command metric CLI, you can view detailed metrics under several categories (“buckets”) across organizations, view all metrics in a bucket, or view summary metrics for a particular organization within a given time period. The metrics categories include the following:
Application usage metrics for memory, power, GPU, etc.
Custom metrics exposed by applications.
Metrics for disk, memory, CPU, and network usage.
The following are examples of using the ngc fleet-command metric command:
To see what buckets are contained within an org:
ngc fleet-command metric buckets --org <org-name>
To retrieve a list of defined metrics within a bucket:
ngc fleet-command metric list --bucket <bucket-name> --org <org-name>
To see a summary of information about a given metric:
ngc fleet-command metric summary <metric-name> --bucket <bucket-name> --from-date <from-date> --to-date <to-date>
A raw Flux query passthrough is also available that returns a JSON response:
ngc fleet-command metric query <query>
For more information on the metrics command and options, refer to the NGC CLI documentation.
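If you want to call these commands from a script, the following sketch builds the argument lists shown above and runs them with subprocess. It uses only the subcommands and flags listed in this section. The helper names (buckets_cmd, summary_cmd, and so on) are hypothetical. The date strings are passed through unchanged, so use whatever format the NGC CLI documentation specifies. The Flux query produced by recent_data_query is only an illustration of Flux syntax, and its shape is an assumption rather than a documented Fleet Command query.

```python
import json
import subprocess

# Base command, as used in the examples above.
NGC_METRIC = ["ngc", "fleet-command", "metric"]


def buckets_cmd(org):
    """Arguments for listing the buckets in an org."""
    return NGC_METRIC + ["buckets", "--org", org]


def list_cmd(bucket, org):
    """Arguments for listing the metrics defined in a bucket."""
    return NGC_METRIC + ["list", "--bucket", bucket, "--org", org]


def summary_cmd(metric, bucket, from_date, to_date):
    """Arguments for summarizing one metric over a time period."""
    return NGC_METRIC + [
        "summary", metric,
        "--bucket", bucket,
        "--from-date", from_date,
        "--to-date", to_date,
    ]


def query_cmd(flux_query):
    """Arguments for the raw Flux query passthrough."""
    return NGC_METRIC + ["query", flux_query]


def recent_data_query(bucket, window="-1h"):
    """Illustrative Flux query: everything in a bucket since `window` ago."""
    return f'from(bucket: "{bucket}") |> range(start: {window})'


def run(cmd):
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def run_query(flux_query):
    """Run a Flux query and parse the response.

    Assumption: stdout contains only the JSON response.
    """
    return json.loads(run(query_cmd(flux_query)))
```

Passing arguments as a list avoids shell quoting problems, which matters for Flux queries because they contain quotes, parentheses, and pipe characters.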
Custom Application Metrics
You can also define and provide custom application metrics from deployed applications and access these aggregated metrics for all the deployments.
Custom application metrics are expected to be exposed as a Prometheus metrics exporter endpoint. For more information on writing custom Prometheus exporters, refer to the Prometheus development documentation.
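As an illustration of what such an endpoint looks like, the following is a minimal sketch of an exporter written with only the Python standard library. It serves the Prometheus text exposition format over HTTP. The metric names, the /metrics path, and port 8000 are assumptions chosen for the example and are not Fleet Command requirements. In practice, many applications use the official Prometheus client library for their language instead of rendering the format by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application metrics: name -> (metric type, current value).
# A real application would update these values from its own code.
METRICS = {
    "app_inference_requests_total": ("counter", 0.0),
    "app_queue_depth": ("gauge", 0.0),
}


def render_metrics(values):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (kind, value) in sorted(values.items()):
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve metrics on one path only (assumed here to be /metrics).
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving the metrics endpoint on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() from the application's entry point starts the endpoint. The port and path chosen here are the values you would then advertise to the scraper.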
To expose application metrics, use the following annotations on your application Pod that serves metrics.
prometheus.io/scrape: Enable scraping for this pod.
prometheus.io/scheme: Default value is
prometheus.io/path: Override the path for the metrics endpoint on the service (default:
prometheus.io/port: Used to override the port (default:
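To show where these annotations sit, the following sketch builds a Pod manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The Pod name, image, port, and path are hypothetical. The scheme value "http" is an assumption, and every annotation is set explicitly rather than relying on a default. Kubernetes requires annotation values to be strings, so the port and the scrape flag are quoted.

```python
import json


def pod_manifest(name, image, metrics_port, metrics_path):
    """Build a Pod manifest whose annotations advertise a metrics endpoint."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                # Annotation values must be strings in Kubernetes.
                "prometheus.io/scrape": "true",
                "prometheus.io/scheme": "http",  # assumed value
                "prometheus.io/path": metrics_path,
                "prometheus.io/port": str(metrics_port),
            },
        },
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": metrics_port}],
                }
            ],
        },
    }


def manifest_json(manifest):
    """Serialize the manifest as indented JSON."""
    return json.dumps(manifest, indent=2)
```

For example, print(manifest_json(pod_manifest("demo-app", "example/demo-app:1.0", 8000, "/metrics"))) produces a manifest for a hypothetical application. If the Pod is generated from a template, the same annotations belong in the metadata of the Pod template.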
Using the custom-app bucket, you can retrieve custom metrics with the ngc fleet-command metric CLI command.