Fleet Command provides monitoring capabilities with system logging, usage analytics, and user activity tracking.

Fleet Command Logs

Fleet Command allows you to access your logs from the Fleet Command Console.

By default, only a minimal set of system logs is collected. To enable detailed system and application logs, go to the Settings page, enable All Logs for System and Location, and turn on the toggles for Application and Deployment logs.

Enabling these logs increases log storage usage on Fleet Command, which you can monitor in Fleet Command Usage Analytics. Logs older than 14 days are not accessible.

In the Fleet Command user interface, navigate to Logs.

  • Select values from the following drop-downs to view the corresponding logs:

    • Location: The name of a location in your organization.

    • System: The name of a system associated with a location.

    • Deployment: The name of a deployment.

    • Component: The component whose logs you want to view. Components available in the drop-down include:

        • egx installer_syslog (available only if the edge system is a physical bare-metal system)

        • Ext-auth agent

        • egx KRS (used for kubelet TLS bootstrapping)

        • component:vnclog (Remote Console 1.0)

        • component:RemoteConsole (Remote Console 2.0)

  • Adjust the log timeframe by choosing a preset value from the drop-down, or choose a custom value and then select a specific date range.


Log results are limited to 60,000 pages. If a query exceeds this limit, a warning is displayed; to avoid it, make the query more specific.
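One way to make a query more specific is to shorten its time range. The Python sketch below clamps a requested range to the 14-day retention window (older logs are not accessible) and splits it into consecutive smaller windows that can each be queried separately. It does not call any Fleet Command or Graylog API; the helper name query_windows and the one-day window size are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Logs older than 14 days are not accessible in Fleet Command.
RETENTION = timedelta(days=14)


def query_windows(start, end, now=None, step=timedelta(days=1)):
    """Split a requested time range into smaller query windows.

    The range start is clamped to the retention boundary and the end to
    `now`. Returns (window_start, window_end) tuples in chronological
    order, or an empty list if nothing in the range is still retained.
    All datetimes must be timezone-aware and `step` must be positive.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    start = max(start, now - RETENTION)  # older logs cannot be retrieved
    end = min(end, now)                  # nothing to fetch from the future
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows
```

For example, a 30-day request made at midnight UTC on 2024-06-30 is clamped to start on 2024-06-16 and yields 14 one-day windows.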

Deployment Logs - All Locations

  • To view deployment logs for all locations, click the ellipsis for the deployment.


Deployment Logs - Specific Locations

  • To query deployment logs for a specific location, click the ellipsis next to that location under the deployment.

  • This opens the Graylog dashboard in a new tab or window with the corresponding query.


Troubleshooting Deployments

The Fleet Command search dashboard supports additional keywords for troubleshooting or for pulling fine-grained logs specific to each system, component, and so on.

  • To see the status of all deployments for a location, view the Helm logs:

  • To pull more fine-grained Helm logs for a deployment:

  • To see the status of all applications for a location, view the kubelet logs:

  • To pull more fine-grained logs showing whether an application is running or has failed:

  • To get application logs from the stdout/stderr streams:
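The Component entries listed earlier, such as component:vnclog, show that these Graylog searches are built from field:value terms. The Python sketch below combines such terms into a single query joined with AND, similar in spirit to combining drop-down selections to narrow the results. Only the component field name comes from this page; any other field name, such as location, is a hypothetical placeholder, and the specific keywords for the Helm and kubelet searches are not reproduced here.

```python
from typing import Mapping


def graylog_query(fields: Mapping[str, str]) -> str:
    """Combine field:value terms into one Graylog search string.

    Every value is wrapped in double quotes so spaces and punctuation are
    matched literally; backslashes and double quotes inside a value are
    escaped. Terms are joined with AND, so each extra field narrows the
    results.
    """
    terms = []
    for field, value in fields.items():
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        terms.append(f'{field}:"{escaped}"')
    return " AND ".join(terms)
```

For instance, graylog_query({"component": "vnclog"}) returns component:"vnclog", which can be entered in the dashboard's search bar.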


System Logs

All system logs from a location

System logs for a location can be viewed by clicking the ellipsis under the location.


All system logs from an EGX System

To view system logs from an EGX system, open the action menu for that system under its location.

  • To get more specific logs for your application, select a value from each drop-down. The selections are combined into one query, which gives more accurate matches.

Downloading Logs

You can also export your search results as a CSV file. In the Fleet Command user interface, select Logs.

Click Export to download the logs as a CSV file.
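An exported CSV file can be post-processed with ordinary tools. The Python sketch below counts rows whose message mentions a keyword, grouped by source. The column names message and source are assumptions for illustration; check the header row of your own export and adjust the constants.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; replace them with the headers in your export.
MESSAGE_COLUMN = "message"
SOURCE_COLUMN = "source"


def keyword_hits_by_source(csv_text, keyword="error"):
    """Count rows whose message contains `keyword` (case-insensitive),
    grouped by the value of the source column."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        message = row.get(MESSAGE_COLUMN) or ""  # missing cells become ""
        if keyword.lower() in message.lower():
            counts[row.get(SOURCE_COLUMN) or "unknown"] += 1
    return counts
```

To use it, read the downloaded file (opened with newline="") into a string and pass that string to keyword_hits_by_source.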


Usage Analytics

Fleet Command usage analytics provide insight into GPU usage, private registry usage, and log storage usage.


Fleet Command Stack 0.4.47 and above is required for usage analytics.

To view usage, navigate to the NGC user interface and select Organization. Then, under Organization, choose Fleet Command.

After you select Fleet Command, a page displays your organization’s Fleet Command usage.


On the usage page, you will find the following KPIs:

  • System Name: Choose a name for the new system.

  • Current: Displays the current GPU usage count.

  • Max: Displays the maximum GPU usage in the current month.

  • Private Registry: Displays the storage used by the private registry.

  • Logs: Displays the current Logs storage usage.

Users can export the usage analytics to a CSV with the Download CSV button.

To show or hide a section, use the toggle in that section:

  • GPUs Under Management: Displays GPU usage over the current month.

  • Maximum GPUs Under Management: Displays the maximum GPU usage for each month of the current year.

  • Storage: Displays private registry and log storage usage for each month of the current year.

  • GPU Inventory: Lists each location and system, with the GPU type and the number of GPUs attached.

Under GPU Inventory, you can select the location name and the managed GPUs (GPU names).
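The Max KPI and the Maximum GPUs Under Management chart both report the highest GPU count within a month. If you download the usage CSV, a per-month maximum can be recomputed with a short Python sketch like the one below. The column names and the ISO date format are assumptions, since the layout of the export is not described here.

```python
import csv
import io


def monthly_max_gpus(csv_text, date_column="date", count_column="gpus"):
    """Return {"YYYY-MM": highest GPU count seen that month}.

    Assumes ISO 8601 dates (YYYY-MM-DD) and integer GPU counts. The
    default column names are hypothetical; pass the headers used in
    your own export.
    """
    maxima = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row[date_column][:7]  # "2024-06-15" -> "2024-06"
        count = int(row[count_column])
        maxima[month] = max(count, maxima.get(month, count))
    return maxima
```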

User Activity

Fleet Command user activity provides insight into users’ activities within the organization.

  • To view user activity, navigate to the NGC user interface and select Organization.

  • Next, under Organization, select Audit.

  • After you select Audit, a page displays your organization’s user activity report.

  • Select the date range for the user activity you want to review, and then click Create a New Request. This generates a downloadable report of user activity.


Metrics

Fleet Command allows you to view system and application-specific metrics for your deployments at edge locations. Metrics are numerical values that measure aspects of your resources at regular intervals. With metrics, you can monitor and analyze the performance of your deployed machine learning inference solutions over time. This information can help you make adjustments to improve resource consumption and the overall performance of your deployments.

The following metrics are available in Fleet Command:

  • System metrics: measurement of edge system resource utilization, including

    • Total CPU Utilization (per core)

    • Total RAM Utilization

    • Total GPU Utilization

    • Total Storage Utilization

    • Total Network Utilization

  • Application utilization metrics: measurement of per-application resource utilization, including

    • App CPU Utilization

    • App RAM Utilization

    • App GPU Utilization

    • App Storage Utilization

    • App Network Utilization

Using the Fleet Command metric CLI, you can view detailed metrics under several categories (“buckets”) across organizations, view all metrics in a bucket, or view summary metrics for a particular organization within a given time period. The metric categories include the following:

  • Application usage metrics for memory, power, GPU, and so on.

  • Custom metrics exposed by applications (the custom-app bucket).

  • Metrics for disk, memory, CPU, and network usage.

The following are examples of using the metric command:

  • To see which buckets an org contains:

      ngc fleet-command buckets --org <org-name>

  • To retrieve a list of defined metrics within a bucket:

      ngc fleet-command metric --bucket <bucket-name> --org <org-name>

  • To see a summary of information about a given metric:

      ngc fleet-command metric <metric-name> --bucket <bucket-name> --from-date <from-date> --to-date <to-date>

  • A raw Flux query passthrough is also available that returns a JSON response:

      ngc fleet-command metric query <query>
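When scripting these commands, building each one as an argument list avoids shell quoting problems, which matters most for the Flux passthrough because Flux queries contain double quotes and pipe characters. The Python sketch below mirrors the four commands above using only the flags they show, plus a small helper that assembles a Flux query. The Flux query shape, the measurement name, and the date format accepted by --from-date and --to-date are assumptions; check the NGC CLI documentation for the accepted values.

```python
import shlex
import subprocess

NGC = "ngc"  # assumes the NGC CLI is installed and on PATH


def buckets_cmd(org):
    """ngc fleet-command buckets --org <org-name>"""
    return [NGC, "fleet-command", "buckets", "--org", org]


def metric_list_cmd(bucket, org):
    """ngc fleet-command metric --bucket <bucket-name> --org <org-name>"""
    return [NGC, "fleet-command", "metric", "--bucket", bucket, "--org", org]


def metric_summary_cmd(metric, bucket, from_date, to_date):
    """ngc fleet-command metric <metric-name> --bucket ... --from-date ... --to-date ..."""
    return [NGC, "fleet-command", "metric", metric, "--bucket", bucket,
            "--from-date", from_date, "--to-date", to_date]


def flux_query_cmd(query):
    """ngc fleet-command metric query <query>"""
    return [NGC, "fleet-command", "metric", "query", query]


def flux_last_hour(bucket, measurement):
    """Build a small Flux query for the last hour of one measurement.

    The bucket and measurement names are placeholders; use names
    reported by the buckets and metric commands.
    """
    return (f'from(bucket: "{bucket}") '
            "|> range(start: -1h) "
            f'|> filter(fn: (r) => r._measurement == "{measurement}")')


def run(cmd):
    """Echo the command, run it without a shell, and return its stdout."""
    print("$", shlex.join(cmd))
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run(flux_query_cmd(flux_last_hour("custom-app", "app_requests_total"))) would request the last hour of a hypothetical app_requests_total metric from the custom-app bucket.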

For more information on the metrics command and options, refer to the NGC CLI documentation.

Custom Application Metrics

You can also define custom metrics in your deployed applications and access the aggregated metrics across all deployments.

Custom application metrics are expected to be exposed as a Prometheus metrics exporter endpoint. For more information on writing custom Prometheus exporters, refer to the Prometheus development documentation.
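Any HTTP endpoint that returns the Prometheus text exposition format can act as an exporter. As a dependency-free illustration, the Python sketch below serves one counter at /metrics on port 9102, matching the default path and port given for the annotations below. The metric name app_requests_total is a hypothetical example; in practice an official Prometheus client library is usually the simpler choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative application state; a real app would update this as it works.
requests_handled = 0


def render_metrics():
    """Return the current metrics in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Requests handled by the application.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {requests_handled}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve render_metrics() at /metrics and return 404 everywhere else."""

    def do_GET(self):
        if self.path.split("?", 1)[0] != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9102):
    """Block forever, serving metrics on all interfaces."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() from the container's entry point exposes the endpoint. If you pass a different port, change the port annotation to match.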

To expose application metrics, add the following annotations to the application Pod that serves the metrics:

  • Enable scraping for this Pod.

  • The scheme of the metrics endpoint (default: http).

  • Override the path for the metrics endpoint on the service (default: '/metrics').

  • Override the port for the metrics endpoint (default: 9102).
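Many Prometheus setups on Kubernetes express these four settings with the prometheus.io/scrape, prometheus.io/scheme, prometheus.io/path, and prometheus.io/port annotation keys. The Python sketch below assumes those keys, so confirm the keys your Fleet Command environment expects before relying on them. It builds a minimal Pod manifest carrying the annotations; the Pod name and image are hypothetical. In a Helm chart, the same annotations belong under the Pod template's metadata.annotations rather than on a bare Pod.

```python
import json

# Assumed annotation keys: the prometheus.io/* convention used by many
# Prometheus Kubernetes scrape configurations. Values must be strings.
SCRAPE_ANNOTATIONS = {
    "prometheus.io/scrape": "true",    # enable scraping for this Pod
    "prometheus.io/scheme": "http",    # default scheme
    "prometheus.io/path": "/metrics",  # default metrics path
    "prometheus.io/port": "9102",      # default metrics port
}


def pod_manifest(name, image, port=9102):
    """Return a minimal Pod manifest dict carrying the scrape annotations."""
    annotations = {**SCRAPE_ANNOTATIONS, "prometheus.io/port": str(port)}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "annotations": annotations},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "ports": [{"name": "metrics", "containerPort": port}],
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(pod_manifest("my-app", "registry.example.com/my-app:1.0"), indent=2))
```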

You can retrieve custom metrics through the CLI with the ngc fleet-command metric command by specifying the custom-app bucket.