Troubleshooting

Fleet Command collects log messages from edge systems and locations. The edge system sources include Kubernetes services, running Kubernetes system pods, Fleet Command stack system pods, and user application pod output. The log messages are recorded, tagged with additional keys and values, and aggregated into a central database. You can then query this database with the web UI and the command-line interface, using the keys/values as filters.

Fleet Command allocates 400 GB of storage for log rotation; this is currently not a customer-editable option. Older logs will be removed once 400 GB is reached.

There are two settings for logging messages from edge locations in Fleet Command:

  • System and location logging

  • Application and deployment logging

Both settings are available on the Fleet Command > Settings page.

logging-settings.png


By default, both logging options are disabled and can be enabled by an admin user.

Note

Enabling All Logs for system and location logging, or enabling application and deployment logging, increases log utilization on Fleet Command. Refer to Fleet Command Usage Analytics to learn how to view log usage.

System and Location Logging

System and location logging has two options: Fleet Command Only or All Logs. Selecting Fleet Command Only will only send a subset of logs from components running on the edge location systems, while All Logs will send all logs from system components back to the Fleet Command logging service.

Logs from edge systems are categorized by “component”, which corresponds to the service on the system that generated the log. You can use these components in the logging screen to filter messages by the component that generated them.

logging-component.png


When Fleet Command Only is selected, logs from the following components are sent:

Component | Description | Used for
kernel | Linux kernel messages | Hardware-level issues
kubelet.service | Kubernetes node agent | -
RemoteConsole | Remote Console access messages (collected from the cloud service, not the edge system) | -

When All Logs is selected, logs from the following components are sent in addition to the components for Fleet Command Only:

Component | Description | Used for
auditd.service | System event auditing | Hardware-level issues
sshd.service | Secure shell server | -
containerd.service | Container system services (was dockerd.service in older versions) | Errors/messages relating to downloading and running container images
egx-bootstrap.service | Bootstrap service | -
egxd-cred-proxy.service | Credential Proxy Service | -
egx-nlm.service | Node Lifecycle Management service | Remote Application Access
helm | Helm Operator | Errors/messages relating to fetching and installing application charts
kube-proxy | Kubernetes application proxy | -
kube-apiserver | Kubernetes API services | -
calico-kube-controllers | Kubernetes networking services | -
calico-node | Kubernetes networking services | -
kube-scheduler | Kubernetes resource scheduling | -
etcd | Kubernetes configuration storage | -
nvidia-device-plugin-ds | NVIDIA device plugin for Kubernetes | -
eac | Edge admission controller | Errors/messages related to allowing or denying application deployments based on requested system resources (e.g., hostPaths)
fluentbit | Log forwarding | -
efa | Edge federation agent | -

The components listed in the previous table are subject to change.

All system logs from a location

You can view system logs for a location by clicking the ellipsis under the location.

logs-12.png

All system logs from Fleet Command

To view system logs from Fleet Command, click the action menu for the specific system under the location.

logs-13.png

  • To get more specific logs for your application, you can select multiple values as shown below:

    logs-14.png


Note

Select a value from each drop-down to combine the filters and get more accurate matches.

Application and Deployment Logging

Application and Deployment Logging is either on or off. When this option is on, edge locations send logs from your application deployments to the Fleet Command logging service. When it is off, application deployment logs are not sent. Existing messages remain available until they are removed under the fourteen-day deletion policy.

Logs from application deployments are categorized by “deployment” which corresponds to the deployment name in Fleet Command and includes any logging output (stdout and stderr) from containers running in the deployment. You can use the deployment name to filter messages only from a particular deployment.

logging-deployment.png


It is important to note that application deployment logs only contain messages from running containers in deployments and do not include any messages from system components that might be related to creating and launching a deployment.

For example, a deployment could fail because the Helm Chart could not be fetched. In this case, there will not be any messages for the deployment name in the logging screen; however, there will be messages from the ‘helm’ component describing the issues with the Helm Chart fetch.
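The Helm case above can be checked from a terminal once helm component logs have been downloaded with the CLI command described under Downloading Logs. The following is a minimal sketch rather than an official procedure. It assumes the downloaded file is plain text with one message per line and that helm messages mention the deployment name; neither is confirmed by this page. The function name and the example file and deployment names are hypothetical.

```shell
#!/bin/sh
# Sketch: search a downloaded helm log file for a deployment name.
# Assumption: the file is plain text, one message per line. The real
# download format may differ.

# Print lines that mention the deployment (case-insensitive, fixed string).
# Returns 1 and prints a hint on stderr when nothing matches.
find_deployment_messages() {
    log_file="$1"
    deployment="$2"
    if ! grep -i -F -- "$deployment" "$log_file"; then
        echo "no messages mention '$deployment' in $log_file" >&2
        return 1
    fi
}

# Example with hypothetical names, after downloading helm logs to fclog.log:
#   find_deployment_messages fclog.log my-deployment
```

A fixed-string match is used so that deployment names containing characters such as dots are not interpreted as regular expressions.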

Viewing Logs

In the Fleet Command user interface, navigate to Logs.

logs-01.png

After you navigate to Logs, the page is displayed as shown below.

logs-02.png

  • Select values from the following drop-downs to view the corresponding logs:

    • Location: The name of a location in your organization.

    • System: The name of a system associated with the location.

    • Deployment: The name of a deployment.

    • Component: The component that generated the log messages.

  • Adjust the log timeframe by choosing a preset value from the drop-down, or choose a custom value and then select a specific date range.

    logs-03.png

    logs-17.png


Note

The number of log pages returned is limited to 60,000. If a query exceeds this limit, the warning shown above appears. To avoid it, make the query more specific.

Deployment Logs - All Locations

  • Deployment logs for all locations can be viewed by clicking the ellipsis.

    logs-05.png


Deployment Logs - Specific Locations

  • To query deployment logs for a specific location, click the ellipsis from the location under that deployment.

  • This will open a tab/window to the Graylog dashboard with the query shown below.

    logs-06.png


Troubleshooting Deployments

The Fleet Command search dashboard allows for additional keywords to be used to troubleshoot or pull fine-grained logs specific to each system, component, etc.

  • To see the status of all deployments for a location, view the helm logs:

    logs-07.png


  • To pull more fine-grained Helm logs for a deployment:

    logs-08.png


  • To see the status of all applications for a location, view the kubelet logs:

    logs-09.png


  • To pull more fine-grained logs to see if an application is running or failed:

    logs-10.png


  • To get application logs from stdout/stderr streams:

    logs-11.png


Downloading Logs

You can download logs from Fleet Command using either the NGC web UI or the NGC CLI.

To download logs via the web UI, navigate to Fleet Command > Logs.

Select from the available filters and click Export to download the logs to a CSV file.

logs-16.png

logging-export.png
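An exported CSV file can be narrowed further on the command line. The sketch below keeps the header row plus every line containing a keyword. It makes no assumption about column names, which this page does not document, and it works line by line, so a message that spans several lines inside a quoted field would be matched only on the lines that contain the keyword. The function name and the example file name are hypothetical.

```shell
#!/bin/sh
# Sketch: keep the CSV header plus every line containing a keyword.
# Line-based and case-sensitive; multi-line quoted fields are not handled.

filter_export() {
    csv_file="$1"
    keyword="$2"
    # NR == 1 keeps the header; index() does a plain substring match.
    awk -v kw="$keyword" 'NR == 1 || index($0, kw) > 0' "$csv_file"
}

# Example with a hypothetical export file name:
#   filter_export fleet-command-logs.csv error > errors-only.csv
```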


To download logs via the NGC CLI, use the ngc fleet-command log download command; for example:


$ ngc fleet-command log download --range 30 --system node-01 --location location-01 --component helm --name fclog.log


This downloads all log messages from the last 30 seconds, generated by the helm component running on system node-01 in location location-01, to the file fclog.log.
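To collect logs from several components of the same system, the command above can be repeated once per component. The following sketch only prints the commands so they can be reviewed first. It uses no flags beyond those in the example above; the location, system, and output file names are placeholders, and the component names are taken from the tables earlier on this page. Whether each component name is accepted by --component exactly as written is an assumption.

```shell
#!/bin/sh
# Sketch: print one `ngc fleet-command log download` command per component.
# Only flags from the documented example are used; all values are placeholders.

LOCATION="location-01"   # placeholder location name
SYSTEM="node-01"         # placeholder system name
RANGE=30                 # same --range value as the documented example

# Print the download command for a single component.
log_download_cmd() {
    component="$1"
    printf 'ngc fleet-command log download --range %s --system %s --location %s --component %s --name %s\n' \
        "$RANGE" "$SYSTEM" "$LOCATION" "$component" "fclog-${component}.log"
}

# Component names as listed in the component tables on this page.
for component in helm kubelet.service containerd.service; do
    log_download_cmd "$component"
done
```

After checking the printed commands, you can run them by piping the script output to sh.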

Refer to the Fleet Command CLI documentation for more information.

You can also use the Remote Console feature of Fleet Command to help you troubleshoot issues with your deployments. Refer to Remote Console for more information.

© Copyright 2022-2023, NVIDIA. Last updated on Nov 20, 2023.