Troubleshooting

Fleet Command collects log messages from edge systems and locations. The edge system sources include Kubernetes services, running Kubernetes system pods, Fleet Command stack system pods, and user application pod output. The log messages are recorded, tagged with additional keys and values, and aggregated into a central database. You can then query this database from the web UI or the command-line interface, using the keys and values as filters.

Fleet Command allocates 400 GB of storage for logs and rotates them. When usage reaches 400 GB, the oldest logs are removed.

There are two settings for logging messages from edge locations in Fleet Command:

  • System and location logging

  • Application and deployment logging

Both settings are available on the Fleet Command > Settings page.

logging-settings.png

By default, both logging options are disabled and can be enabled by an administrator.

Selecting All Logs for system and location logging, or enabling application and deployment logging, increases log usage on Fleet Command. Refer to Fleet Command Usage Analytics to learn how to view log usage.

System and Location Logging

System and location logging has two options: Fleet Command Only and All Logs. Fleet Command Only sends a subset of the logs from components running on edge location systems, while All Logs sends the logs from all system components to the Fleet Command logging service.

Logs from edge systems are categorized by component, which corresponds to the service on the system that generated the log. On the logging screen, you can filter messages by the component that generated them.

logging-component.png

When Fleet Command Only is selected, logs from the following components are sent:

Component | Description | Used for
--- | --- | ---
kernel | Linux kernel messages | Hardware-level issues
kubelet.service | Kubernetes node agent |
RemoteConsole | Remote Console access messages (collected from the cloud service, not the edge system) |

When All Logs is selected, logs from the following components are sent in addition to the components for Fleet Command Only:

Component | Description | Used for
--- | --- | ---
auditd.service | System event auditing | Hardware-level issues
sshd.service | Secure shell server |
containerd.service | Container system services (was dockerd.service in older versions) | Errors/messages relating to downloading and running container images
egx-bootstrap.service | Bootstrap service |
egxd-cred-proxy.service | Credential proxy service |
egx-nlm.service | Node Lifecycle Management service | Remote Application Access
helm | Helm Operator | Errors/messages relating to fetching and installing application charts
kube-proxy | Kubernetes application proxy |
kube-apiserver | Kubernetes API services |
calico-kube-controllers | Kubernetes networking services |
calico-node | Kubernetes networking services |
kube-scheduler | Kubernetes resource scheduling |
etcd | Kubernetes configuration storage |
nvidia-device-plugin-ds | NVIDIA device plugin for Kubernetes |
eac | Edge admission controller | Errors/messages relating to allowing or denying application deployments based on requested system resources (for example, hostPaths)
fluentbit | Log forwarding |
efa | Edge federation agent |

The components listed in the previous tables are subject to change.
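When deciding which setting to enable, it can help to have the two tables in a scriptable form. The following is a minimal bash sketch, not part of Fleet Command or the NGC CLI: the function `components_for_setting`, its setting keys `fc-only` and `all`, and the array names are hypothetical. The component names are copied from the tables above, so they carry the same caveat that the list is subject to change.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the edge-system components from the tables above, grouped
# by the System and Location Logging option that sends them. RemoteConsole is
# omitted because it is collected from the cloud service, not the edge system.
# Copied from this page; the real component list is subject to change.

FC_ONLY_COMPONENTS=(kernel kubelet.service)

# Sent in addition to FC_ONLY_COMPONENTS when All Logs is selected.
ALL_LOGS_EXTRA_COMPONENTS=(
  auditd.service sshd.service containerd.service egx-bootstrap.service
  egxd-cred-proxy.service egx-nlm.service helm kube-proxy kube-apiserver
  calico-kube-controllers calico-node kube-scheduler etcd
  nvidia-device-plugin-ds eac fluentbit efa
)

# components_for_setting SETTING
# Prints one component per line for SETTING ("fc-only" or "all").
# Returns 1 (with a message on stderr) for an unknown setting.
components_for_setting() {
  case "$1" in
    fc-only) printf '%s\n' "${FC_ONLY_COMPONENTS[@]}" ;;
    all) printf '%s\n' "${FC_ONLY_COMPONENTS[@]}" "${ALL_LOGS_EXTRA_COMPONENTS[@]}" ;;
    *) echo "unknown setting: $1" >&2; return 1 ;;
  esac
}
```

For example, `components_for_setting fc-only | grep -qx helm || echo "helm logs require All Logs"` prints the message, because the helm component used when troubleshooting deployments is sent only with All Logs.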

All system logs from a location

You can view system logs for a location by selecting View Logs from the options button for the location.

logs-12.png

All system logs from Fleet Command

To view system logs from Fleet Command, select View Logs from the action menu of the system.

logs-13.png

To narrow the logs further, specify a search term and one or more filters, as shown below:

logs-14.png

Application and Deployment Logging

You can enable or disable application and deployment logging. When this option is enabled, edge locations send logs from your application deployments to the Fleet Command logging service. When it is disabled, application deployment logs are not sent. Messages that were already collected remain available until they are removed by the fourteen-day deletion policy.

Logs from application deployments are categorized by deployment name. The log messages come from the output (stdout and stderr) of containers running in the deployment. You can filter by deployment name to show messages from a particular deployment only.

logging-deployment.png

Application deployment logs only contain messages from running containers in deployments and do not include any messages from system components that might be related to creating and launching a deployment.

For example, a deployment could fail because the Helm chart could not be fetched. In this case, there are no messages for the deployment name on the logging screen. However, log messages from the helm component might describe the problem with fetching the Helm chart. The helm component is sent only when All Logs is selected for system and location logging.

Viewing Logs

  • Select Fleet Command > Logs.

    logs-02.png

You can select values from the following filters to limit the number of log messages:

  • Location: The name of a location in your organization.

  • System: The system name associated with a location.

  • Deployment: The deployment name, selected from the drop-down list.

  • Component: The component name, selected from the drop-down list. The available components are listed under System and Location Logging.

Adjust the time frame of the logs by selecting a preset value from the time interval menu or by specifying a date range.

logs-03.png

Log queries return at most 60,000 messages. If a query exceeds this limit, the warning shown above is displayed. To avoid this, make the query more specific, for example by adding filters or selecting a shorter time range.

Deployment Logs - All Locations

  1. Select Fleet Command > Deployments.

  2. Click the actions button for the deployment and select View Logs.

    logs-05.png

Deployment Logs - Specific Locations

  1. Select Fleet Command > Deployments.

  2. Select the deployment from the table of deployments.

  3. On the deployment details page, click the options button for the location and select View Logs.

    deps-details-view-logs.png

Troubleshooting Deployments

The Fleet Command logs search dashboard accepts additional keywords that you can use to troubleshoot or to pull fine-grained logs specific to a system, component, and so on.

  • To see the status of all deployments for a location, view the helm logs:

    logs-07.png


  • To pull more fine-grained Helm logs for a deployment:

    logs-08.png


  • To see the status of all applications for a location, view the kubelet logs:

    logs-09.png


  • To pull more fine-grained logs to see if an application is running or failed:

    logs-10.png


  • To get application logs from stdout/stderr streams:

    logs-11.png
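The Helm and kubelet steps above can also be run from the command line. The following bash sketch downloads both logs for one system with the flags from the `ngc fleet-command log download` example under Downloading Logs, then searches them for a deployment name. It assumes the NGC CLI is installed and configured, that `kubelet.service` is accepted as a `--component` value in the same way as `helm`, and that All Logs is selected so that helm logs are sent. The function names are hypothetical, and because the downloaded log format is not documented here, the search is a plain case-insensitive text match.

```bash
#!/usr/bin/env bash
# Hypothetical triage helpers for the troubleshooting steps above. Assumes the
# ngc CLI is installed and configured, that "helm" and "kubelet.service" are
# accepted as --component values, and that helm logs are sent (All Logs).

# collect_triage_logs SYSTEM LOCATION [RANGE]
# Downloads helm.log and kubelet.service.log for one system, using the flags
# from the documented `ngc fleet-command log download` example.
# RANGE defaults to 30, the value used in that example.
collect_triage_logs() {
  local system="$1" location="$2" range="${3:-30}" component
  for component in helm kubelet.service; do
    ngc fleet-command log download --range "$range" \
      --system "$system" --location "$location" \
      --component "$component" --name "${component}.log" || return 1
  done
}

# grep_deployment DEPLOYMENT FILE...
# Prints every line that contains DEPLOYMENT as a fixed string (case-insensitive),
# prefixed with the file name. No particular log line format is assumed.
grep_deployment() {
  local deployment="$1"
  shift
  grep -i -H -F -- "$deployment" "$@"
}
```

For example, run `collect_triage_logs demo-system-0 demo-location && grep_deployment my-deployment helm.log kubelet.service.log`, where `my-deployment` is a placeholder for your deployment name.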


Downloading Logs

To download logs with the web interface, perform the following steps:

  1. Select Fleet Command > Logs.

  2. Select the filters to apply and click Export to download the logs to a CSV file.

    logs-16.png

To download logs with the NGC CLI, perform the following steps:

  • Run the ngc fleet-command log download command:

    $ ngc fleet-command log download --range 30 --system demo-system-0 --location demo-location --component helm --name fc.log

    Log messages are downloaded to the fc.log file. The file includes all log messages over the last 30 seconds from the helm component running on the system demo-system-0 in location demo-location.

    Refer to the Fleet Command CLI documentation for more information.
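If you download logs for several components from the same system, a small wrapper avoids retyping the command. This is a minimal bash sketch: only the `ngc fleet-command log download` flags come from the example above, while the function `fc_log_download` and the variables `FC_SYSTEM`, `FC_LOCATION`, `FC_RANGE`, and `DRY_RUN` are hypothetical. Component names other than `helm`, such as `kubelet.service`, are assumed to be valid `--component` values. Consult the Fleet Command CLI documentation for the full set of supported options.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the documented `ngc fleet-command log download`
# command. The defaults are the placeholder values from the example; override
# them with FC_SYSTEM, FC_LOCATION, and FC_RANGE.
FC_SYSTEM="${FC_SYSTEM:-demo-system-0}"
FC_LOCATION="${FC_LOCATION:-demo-location}"
FC_RANGE="${FC_RANGE:-30}"

# fc_log_download COMPONENT [OUTFILE]
# Downloads logs for one component. OUTFILE defaults to COMPONENT.log.
# With DRY_RUN=1, prints the command instead of running it.
fc_log_download() {
  local component="$1"
  local outfile="${2:-${component}.log}"
  local -a cmd
  cmd=(ngc fleet-command log download
    --range "$FC_RANGE"
    --system "$FC_SYSTEM"
    --location "$FC_LOCATION"
    --component "$component"
    --name "$outfile")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 fc_log_download helm fc.log` prints the same command as the example above without running it, and `fc_log_download kubelet.service` writes kubelet.service.log.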

You can also use the remote console feature of Fleet Command to help you troubleshoot issues with your deployments. Refer to Remote Console for more information.

© Copyright 2022-2024, NVIDIA. Last updated on Apr 3, 2024.