Advanced Features

This chapter describes the more advanced features of Fleet Command including administrative settings, security overrides, and remote management.

This section describes the Fleet Command settings. To view the settings, navigate to the Settings page in Fleet Command.

managing-your-setup-14.png

  • System: Get the latest Fleet Command system image and source code for the system image.

  • Remote Management: Enable or disable the Remote Console and system reboot options, and modify the remote console timeouts. Enable or disable remote application access, and modify the remote application access timeouts. For more information on the Remote Console or remote application access, refer to Remote Management.

  • Deployment Security Overrides: Enable or disable selecting security overrides when creating deployments. For more information, refer to Deploying an Application.

    Important

    Enabling or disabling Deployment Security Overrides in Settings only affects new deployments created after the setting is applied. Disabling security overrides will not affect existing deployments that have overrides configured. These deployments will need to be deleted and recreated with security overrides disabled.

    managing-your-setup-15.png

    If Device Access was previously enabled for Fleet Command, you now need to enable Deployment Security Overrides to achieve similar functionality when creating new deployments.

    managing-your-setup-16.png


  • Logs: Enable All Logs to capture all system and location logs and enable application and deployment logging. For more information on logs, refer to Fleet Command Logs.

    Note

    By default, both logging options are disabled and can be enabled by an admin user.


Remote Management

Fleet Command provides the following ways to remotely access and manage your systems and the applications deployed at edge sites:

  • Remote Console: to create remote shell sessions

  • Remote Application Access: to access web-based services

Remote Console

Fleet Command allows you to access your system remotely with the help of Remote Console.

Remote Console allows Fleet Command administrators to remotely start and access shell sessions for each system. One console can be created for each system, up to 20 consoles per organization. Multiple administrators can access the system through the remote console. Remote Console will create a shell session within the system for each user.

Note
  • Remote Console is available in Fleet Command Stack 1.3.0 and above.

  • To access the console remotely, outbound TLS connections on port 443 must be allowed from the system. Refer to Edge Site Requirements for additional information.

Enabling Remote Console

  1. Within the Fleet Command User Interface, navigate to Settings.

    troubleshooting-fleet-command-01.png


  2. Toggle the slider under Remote Management to enable Remote Console.

    troubleshooting-fleet-command-02.png


  3. If necessary, configure the timeouts for your use case.

    • Max is the maximum duration allowed for remote console sessions; when the maximum time is reached, all active remote console sessions are closed automatically.

    • Inactivity is the timeout for idle shell sessions. Each user gets their own shell session, and inactivity time is tracked independently for each user.

    Note

    Session time can range from 2 minutes up to an hour. NVIDIA recommends that the inactivity timeout match or be less than the session time.


Accessing Remote Console

  1. The Remote Console is accessed from the ellipsis menu of a system.

    fc-qsg-manage-edge-system-menu.png


  2. When you click Start Remote Console, you are asked to authenticate again for additional security. Log in again to access the Remote Console.

    troubleshooting-fleet-command-28.png


  3. After successful authentication, a session is created in the background. A pop-up will display the remote console with the system name.

    troubleshooting-fleet-command-04.png


  4. Click Open to launch the Remote Console.

    troubleshooting-fleet-command-45.png

    Note

    Depending on the Fleet Command version, you will be presented with the Remote Console interface or prompted to update the location version.

    troubleshooting-fleet-command-46.png


  5. When the Remote Console launches, it will redirect to a new browser session or tab, as shown below.

    troubleshooting-fleet-command-29.png


  6. To switch to the root user for troubleshooting, follow the steps shown below, entering the Fleet Command administrator password of your edge system at each password prompt.

    rcuser@ip-172-31-41-77:~$ su admin
    Password:
    admin@ip-172-31-41-77:/home/rcuser$ sudo su
    [sudo] password for admin:
    root@ip-172-31-41-77:/home/rcuser$

    Note
    • The rcuser and admin users are basic users that can run nvidia-smi and basic Linux commands without sudo.

    • The root user can run all commands, such as kubectl, to debug the edge system.
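For illustration, once you have switched to the root user, a typical debugging session might use standard kubectl commands such as the following sketch (the pod names and namespaces depend on your deployments):

```shell
# Run as root on the edge system
kubectl get nodes -o wide        # node status and Kubernetes version
kubectl get pods -A              # all pods across namespaces
kubectl describe pod <pod-name>  # events and status for a failing pod
kubectl logs <pod-name>          # application logs for a pod
```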


  7. Right-click on the shell to display the available options for Remote Console.

    troubleshooting-fleet-command-31.png


  8. Select the text you would like to copy, and paste it using the remote console menu for troubleshooting, as shown below.

    troubleshooting-fleet-command-32.png

    troubleshooting-fleet-command-33.png


  9. Alternatively, you can use the Paste from browser option in the remote console menu to paste into a shell.

    troubleshooting-fleet-command-34.png

    troubleshooting-fleet-command-35.png


  10. If you don’t have a physical keyboard, you can use the Onscreen Keyboard option with Remote Console 2.0.

    troubleshooting-fleet-command-36.png

    troubleshooting-fleet-command-37.png

    troubleshooting-fleet-command-38.png


  11. For container runtime troubleshooting, you can use the ctr CLI or the Container Runtime Interface (CRI) CLI.

    troubleshooting-fleet-command-39.png

    troubleshooting-fleet-command-40.png

    Note
    • Use the Kubernetes Cheat Sheet for any deployment issues from the Remote Console.

    • The command history of the Remote Console is purged when the console shell session is closed or the maximum session timeout is reached.


  12. If the remote session is closed for any reason, you can reopen it by clicking Open in the Remote Consoles banner under each location. Click End to close all active remote console sessions for the system.

    troubleshooting-fleet-command-08.png


  13. For each Fleet Command organization, a maximum of 20 active remote console sessions are allowed. If more than 20 remote console sessions are attempted, you will receive the following error message.

    troubleshooting-fleet-command-10.png


Disabling Remote Console

Important

Disabling Remote Console within Fleet Command Settings will immediately close all remote consoles within the organization.

  1. Within the Fleet Command User Interface, navigate to Settings.

    troubleshooting-fleet-command-42.png


  2. Toggle the slider under Remote Management.

    troubleshooting-fleet-command-43.png


If there are any active remote consoles, you will be prompted to confirm before remote management is disabled.

troubleshooting-fleet-command-44.png

Rebooting a System

Important

While a system is rebooting, applications stop running; they start again after the system has rebooted.

  1. To reboot a system, click on the ellipsis under a System and select Reboot System.

    system-menu-reboot-system.png

    In the confirmation dialog, click Reboot.

    reboot-system-confirmation-dialog.png


  2. You might see the message below while the system is rebooting.

    troubleshooting-fleet-command-13.png

    • If a remote console session is active, you will be prompted with the following pop-up.

      troubleshooting-fleet-command-41.png


  3. The UI will show the results of the reboot, followed by a success message.

    troubleshooting-fleet-command-14.png


Remote Application Access

Fleet Command allows you to access web-based application services running on edge systems remotely from your local machine. Remote application access is available via a unique URL to users with Fleet Command administrator and operator roles. Remote application access features the following:

  • A configurable time allowance to access services. When the time expires, remote access to services for that application will automatically end. This greatly simplifies resource management and frees up available remote sessions for other services.

  • Accessible by multiple users in multiple locations. Regardless of the remote access origin, all admin and operator users in the same organization have access to the remote services.

Configuring your Remote Application

To support remote application access, you must explicitly configure applications to allow remote access via the appProtocol field in the Kubernetes service.

Fleet Command invokes your application using a mapping to the root location of your web service. If your application requires additional paths for access, Fleet Command recommends that you configure a redirect from the application root location to the full path of your application.

The following example shows a mapping of the full path of a web application to the resulting remote application service URL:

remote-app-access-sample-app-mapping.png

  • The web application root location http://<ip_address_of_node>:31115 is mapped to https://<fc_remaac_location>-31115.rmsession.<org>.egx.nvidia.com where <fc_remaac_location> is the name of your system and <org> is your organization name.

  • Fleet Command will invoke your application using the mapped root location - https://<fc_remaac_location>-31115.rmsession.nvidia.egx.nvidia.com, as in this example.

  • If you have already set up a redirect from http://<ip_address_of_node>:31115 to http://<ip_address_of_node>:31115/WebRTCApp/play.html?name=videoanalytics, your application will launch automatically. Fleet Command recommends using NGINX Ingress to set up the redirect and to access the application via the Ingress NodePort.

  • If the redirect is not in place, you will need to manually append the rest of the application path to the mapped root location in the browser.

Refer to Fleet Command Helm Chart Requirements for more information on configuring the application.
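To make this concrete, a minimal Kubernetes Service definition for a Helm chart that exposes a web UI for remote application access might look like the following sketch (the service name, selector, and ports are hypothetical; the essential parts are the http port with the appProtocol field and the NodePort exposure):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: video-analytics-web        # hypothetical service name
spec:
  type: NodePort
  selector:
    app: video-analytics           # hypothetical pod label
  ports:
    - name: http
      port: 80
      targetPort: 8080             # hypothetical container port
      nodePort: 31115
      appProtocol: http            # marks the service as web-based for remote access
```

If the application is served from a non-root path, an ingress-nginx annotation such as nginx.ingress.kubernetes.io/app-root can implement the recommended redirect from the root location to the full application path.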

Enabling Remote Application Access

Fleet Command administrators can enable remote application access in Fleet Command > Settings. Under Remote Management, toggle the Enable remote application access setting.

remote-app-access-settings-enabled.png

Configuring Access Timeout

The remote application access timeout is the duration for which application services are accessible. When remote application access is enabled in Settings, administrators can adjust the maximum timeout. The default timeout is 3 hours. To adjust the timeout, navigate to Fleet Command > Settings. Under Remote Management, set the maximum hours and minutes in the Remote application access timeout field.

remote-app-access-settings-timeout.png

Starting Remote Application Access

This section is for Fleet Command users with administrator or operator roles and describes the steps for starting remote application access.

You must choose a location first to view a list of available application services.

  1. Go to Fleet Command > Deployments in the navigation menu.

    In the Deployments page, choose a deployment and click on the expand-caret.png arrow to view the list of locations.

    remote-app-access-deployments-view-locations.png


  2. To view the details for a location, click on the “…” (ellipsis) menu and select View Details.

    remote-app-access-deployment-view-locations.png


  3. The Details pane displays the location information and application services deployed at this location. The list displays all services for this application, but only web-based services with unique URLs are remotely accessible.

    Application service names are generated using the following pattern:


    default/<application name>/<protocol>

    Web-based services are identified by the ‘http’ protocol.

    remote-app-access-deployment-details-pane.png

    Important

    If the application service is not explicitly enabled for external access, or the appProtocol field is not applied to the application service, the service will still be listed with the ‘http’ protocol by default. However, only actual web-based services will function properly; accessing the link for a non-web-based service will fail.

    If an application does not have services, you will see the message No application services found in the deployment Details pane for that location.

    remote-app-access-no-services.png

    To start remote application access for a service, click on a Service Name link.

    The Remote Application Access confirmation dialog appears with the allotted time for accessing the remote service. When the time expires, users are unable to access the services, and all running sessions will end.

    Click Continue to open the application service or Cancel to exit.

    remote-app-access-start-services-dialog-confirm.png

    While the connection is in progress, you will see the following progress dialog:

    remote-app-access-progress-dialog.png

    After the progress dialog closes, you will see a Remote Application Access banner in the deployment section. The banner shows the currently opened services and their expiration time in hours and minutes. A badge on the top right corner of the page shows the number of currently opened services.

    You can click Open to start or view the service, or End to stop the remote application access.

    remote-app-access-started-banner-badge.png


  4. The application service will open in a new browser tab. The following image shows an example remote service:

    remote-app-access-sample-app.png

    Note

    In some cases, the remote application may not open automatically in a browser tab. If the window does not open automatically, check the pop-up blocker for your web browser, and configure it to allow pop-ups from Fleet Command. Click the Open button to open the window manually.

    Important

    If your application does not load correctly, make sure you follow the Configuring your Remote Application instructions.


Maximum Services

You may enable remote access for up to twenty services within an org. If the maximum is reached, you will see a Remote Application Access Limit Reached dialog when starting a remote service from the deployment Details pane. You must close a service that’s not in use before you can open another.

remote-app-access-maxusers-dialog.png


Ending Remote Application Access

Fleet Command users with administrator or operator roles can end access to services for all users in the org.

To end access to remote application services, follow these steps:

  1. Click Fleet Command > Deployments in the navigation menu.

  2. In the Remote Application Access banner, click on End to stop the service.

    remote-app-access-banner-end-service.png


  3. A confirmation dialog appears warning the action will end the connection for other users. Click End to proceed or Cancel to exit.

    remote-app-access-end-service-confirm.png

    If there are no remaining open access sessions for this deployment, the banner dismisses automatically.

Disabling Remote Application Access

Administrators can disable remote application access for all users in the org.

To disable remote access, navigate to Fleet Command > Settings. Under Remote Management, toggle off the Enable remote application access setting.

remote-app-access-settings-disable.png


If remote services are in use, a confirmation dialog appears. Click Disable to end access for all users or Cancel to exit.

remote-app-access-settings-disable-confirm.png


Once remote access is disabled, users are unable to start any remote access for any deployments. The Service Name links in the Details pane are not clickable.

remote-app-access-deployment-details-links-unclickable.png


Deployment Security Overrides

If the option to allow Security Overrides in Fleet Command Settings is enabled, additional configuration options are available that can turn off enforcement of specific security policies and allow the application to access more system hardware and software. These security overrides reduce system security and are used at the administrator’s own risk. NVIDIA recommends applying the minimum number of security overrides, and only for testing and troubleshooting, to ensure maximum security of the system.

The following security overrides are available:

  • Enable all overrides

  • Allow system device access: This allows access to devices mounted at /dev.

  • Allow HostPaths: This enables access to any hostPath. If this option is disabled (the default), only /mnt, /opt, /tmp, and /etc/localtime are allowed.

  • Allow HostPath mount via PersistentVolumes: Enable the application to use hostPath for Persistent Volumes.

  • Allow HostNamespace (HostIPC and HostPID): Enable application access to the host system processes and inter-process interactions.

  • Allow Linux capabilities: Enable applications to use any Linux capabilities.

  • Allow PrivilegedContainers: This option allows privileged containers, which run as root, to run.
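For context, these overrides map to pod-spec settings that are otherwise blocked. A hypothetical diagnostic deployment that needs several overrides might contain settings like the following sketch (all names and paths are illustrative only):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: diagnostic-tool                  # hypothetical pod name
spec:
  hostPID: true                          # requires Allow HostNamespace
  containers:
    - name: tool
      image: nvcr.io/myorg/diag:latest   # hypothetical image
      securityContext:
        privileged: true                 # requires Allow PrivilegedContainers
        capabilities:
          add: ["SYS_ADMIN"]             # requires Allow Linux capabilities
      volumeMounts:
        - name: host-logs
          mountPath: /host/var/log
  volumes:
    - name: host-logs
      hostPath:
        path: /var/log                   # outside /mnt, /opt, /tmp: requires Allow HostPaths
```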

There may be certain combinations of locations and security overrides that are not supported. For details, refer to the following:

  • During Deployment creation, if another location is added after security overrides other than Allow system device access are applied, and that second location is on a version older than 1.9.3, you will be prompted to disable all security overrides except Allow system device access, as shown below.

managing-your-setup-17.png

  • During Deployment creation, when you choose two locations with different versions, all security overrides except Allow system device access are disabled by default.

managing-your-setup-18.png

  • All security overrides will be grayed out and uneditable when editing the deployment. Security overrides cannot be changed after the Deployment has been created. To change security overrides, the deployment must be removed and recreated.

Note

Security override options selected when a Deployment was first created cannot be changed when editing a deployment irrespective of the Deployment Security Overrides option in Fleet Command Settings.

managing-your-setup-19.png

  • When editing a deployment, if the deployment’s location(s) have security overrides other than Allow system device access enabled, all locations on versions older than 1.9.3 will be unselectable.

managing-your-setup-20.png

Important

If Security Overrides are applied, a message is displayed on the Location(s) and Deployments pages.

managing-your-setup-21.png

managing-your-setup-22.png

Important

If NVIDIA Multi-Instance GPU (MIG) is enabled on a system with an active deployment, the currently running application(s) will be stopped during MIG configuration. If Application Compatibility mode is On, the running application(s) will restart after MIG is enabled. If Application Compatibility mode is Off, the application(s) may not restart successfully and will require reconfiguration and redeployment. More information on Application Compatibility mode can be found later in this chapter, and information about configuring applications for MIG can be found in the Application Guide.

Multi-Instance GPU (MIG)

NVIDIA Multi-Instance GPU (MIG) allows supported GPUs to be partitioned and used across multiple applications. Fleet Command allows you to enable and configure MIG on supported systems directly from the UI on location versions 1.5.2 or later.

Note

MIG capability is only supported on NVIDIA A30 and A100 GPUs. For unsupported GPUs, the MIG tab in the System Details panel will display that no MIG-capable GPUs are present.

mig-no-gpus.png

For additional information on MIG, refer to the Multi-Instance GPU User Guide.

To view additional information about MIG, click on the Multi-Instance GPU (MIG) tab.

mig-select-config.png

To configure MIG, select the GPU, choose the MIG profile from the drop-down menu, select the checkbox to acknowledge the application disruption, and click Save to apply the changes. Once you apply the MIG configuration, it takes a minute or two for the changes to be reflected on the system.

mig-select-config-save-ann.png
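After applying a MIG configuration, you can verify it from a Remote Console shell; nvidia-smi is available to the basic users. A sketch:

```shell
nvidia-smi --query-gpu=mig.mode.current --format=csv   # should report Enabled for configured GPUs
nvidia-smi -L                                          # lists each GPU and its MIG device instances
```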

Fleet Command allows you to configure specific MIG profiles for A30 and A100. The supported MIG profiles are listed in the following table.

MIG Available Options    Supported GPUs

2 MIGs of 3g.20gb        A100
3 MIGs of 2g.10gb        A100
7 MIGs of 1g.5gb         A100
2 MIGs of 2g.12gb        A30
4 MIGs of 1g.6gb         A30

Application Compatibility Mode

Different configuration combinations for systems with more than one MIG-capable GPU may impact application compatibility mode.

  • If MIG is not enabled on all GPUs in a multi-GPU system, Application Compatibility mode will be Off, and a specific application configuration will be required to use MIG.

  • If MIG is enabled on all GPUs, but with different MIG profile sizes, the Application Compatibility mode will also be Off.

  • If MIG is enabled on all GPUs with the same MIG profile sizes, Application Compatibility mode will be On.

Refer to the Application Guide for additional information on Application Compatibility mode and configuring applications for MIG.

mig-app-compat-ann.png

mig-profile-ann.png

Advanced Storage Configuration

Note

This feature is available only on NVIDIA-Certified x86 systems.

While installing the Fleet Command Discovery Image (ISO) onto edge systems, administrators can customize the utilization of physical storage devices attached to the system and designate storage areas where data can persist across OTA updates, system reboots and application lifecycle stages.

Note

Admins must customize configurations during the initial provisioning of the system. If any changes are required after provisioning, they must rerun the installation and choose a different configuration. Any changes made outside of the installation process will be overwritten on a system reboot or OTA update, or will result in the system being quarantined (the system is disabled and requires reinstallation before it can be used again).

Note

If admins choose the default installer options, the installer selects the largest drive for OS installation and combines all of the drives into the same data partition, which could lead to performance degradation.

Here are the available customization options:

  • Select the disk to format and install the Fleet Command Stack and system OS

  • Select the disk to format and store persistent application deployment data

  • Select additional drives to format and mount to the system as separate mount points

  • Specify custom aliases for mounted drives

  • Choose alternative destinations for logs and container resources (running images, source images)

Unused space on the system OS disk will be used for application data; any additional drives you add extend the application data across those disks in a single logical mount.

The following steps describe each customization option in detail. The drive serial numbers and attributes shown in the dialogs are sample data.

  1. Installing Discovery Image (ISO)

    You can select from the available drives to install the Fleet Command OS. The Fleet Command OS will take up approximately 9 GB of space; unused space on this drive will be used as the data partition.

    adv-storage-config-stack.png


  2. Selecting additional drives for the data partition

    You can designate additional drives to extend the data partition onto. These drives will be added to a logical mount point along with the unused space from Step 1.

    adv-storage-config-lvm.png

  3. Mounting additional drives in separate mount points

    You can mount additional drives as separate mount points to be used for application deployment data, logs, and container images. In the following steps, you can allocate specific drives for the various data types.

    adv-storage-config-mount-points.png

  4. Creating custom drive aliases

    You can create unique aliases (similar to symbolic or soft links) for drives you have specified in the prior steps.

    adv-storage-config-cust-labels.png

  5. Storing system logs

    You can choose to store system logs on a drive you selected in Step 3.

    adv-storage-config-logs.png

  6. Storing application container resources

    You can choose to store application container images on a drive you selected in Step 3.

    adv-storage-config-container-images.png

  7. Viewing storage configuration summary

    In this dialog, you will find a summary of your drive selections and allocations.

    adv-storage-config-summary.png

Viewing Storage Configuration

There are two ways to view the storage configuration: the Fleet Command web interface and the NGC command-line interface (CLI).

To view the advanced storage configuration for a system, go to Fleet Command > Locations. Choose a location and click on the Details tab for a system:

adv-storage-config-details-ui.png


The Storage Configuration pane shows the log and container storage paths and mount points for the additional drives you have selected during installation.
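You can also verify the layout from a Remote Console shell (a sketch; the exact device names and mount points depend on the selections made during installation):

```shell
lsblk                 # disks, partitions, and their mount points
df -h /drives/*       # usage for the additional data mounts
```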

To view the advanced storage configuration using the CLI, issue the following command:


$ ngc fleet-command location info <location>[:<system>]


The following example shows the storage configuration for location fc-test-location and system fc-test-node with two mount points under Storage Configuration: /drives/02000000000000000001 and /drives/VMwareNVME_0000.

$ ngc fleet-command location info fc-test-location:fc-test-node --format_type ascii
------------------------------------------------------------------------------------
System Information
  Name: fc-test-node
  Status: READY
  Marked For Delete: False
  Config: controller-worker
  Local IP: 172.31.44.243
  Description: nvidia
Advanced Networking Details
  Default Gateway: 172.31.32.1
  Default Interface: ens33
  Host Name: fc-test-node.fc.nvda.co
  HTTP Proxy:
  HTTPS Proxy:
  No Proxy:
  Interface 1
    Name: ens33
    IP Addresses: 172.31.44.243/20, fe80::457:18ff:fea9:4db7/64
Storage Configuration
  Additional Data Mount: /drives/02000000000000000001
    Type: HDD(sata)
    Used: 0.1GB
    Available: 5.4GB
  Additional Data Mount: /drives/VMwareNVME_0000
    Type: SSD(nvme)
    Used: 3.5GB
    Available: 5.4GB
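For scripting, the NGC CLI also accepts other output formats via the --format_type option (for example, json); check ngc fleet-command location info --help to confirm the formats supported by your CLI version. A sketch:

```shell
ngc fleet-command location info fc-test-location:fc-test-node --format_type json
```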


Kubernetes Node Labels

Kubernetes clusters deployed at Fleet Command edge sites are enhanced to support node labels using the NVIDIA GPU Feature Discovery software component. This component, which leverages Node Feature Discovery, allows you to generate Kubernetes node labels for the set of GPUs available on a node. Fleet Command applications can use these labels to steer and organize specific workloads using Kubernetes node selectors in the edge application pod specification.

A sample list of supported labels follows:

  • Feature labels

    These labels are prefixed with ‘feature.node.kubernetes.io/’.

    {
      "feature.node.kubernetes.io/cpu-<feature-name>": "true",
      "feature.node.kubernetes.io/custom-<feature-name>": "true",
      "feature.node.kubernetes.io/kernel-<feature name>": "<feature value>",
      "feature.node.kubernetes.io/memory-<feature-name>": "true",
      "feature.node.kubernetes.io/network-<feature-name>": "true",
      "feature.node.kubernetes.io/pci-<device label>.present": "true",
      "feature.node.kubernetes.io/storage-<feature-name>": "true",
      "feature.node.kubernetes.io/system-<feature name>": "<feature value>",
      "feature.node.kubernetes.io/usb-<device label>.present": "<feature value>",
      "feature.node.kubernetes.io/<file name>-<feature name>": "<feature value>"
    }


  • GPU-specific labels

    These labels are prefixed with ‘nvidia.com/gpu’ and ‘nvidia.com/cuda’.

    {
      "nvidia.com/cuda.driver.major": "<cuda-driver-major-version>",
      "nvidia.com/cuda.driver.minor": "<cuda-driver-minor-version>",
      "nvidia.com/cuda.driver.rev": "<cuda-driver-revision>",
      "nvidia.com/cuda.runtime.major": "<cuda-runtime-major>",
      "nvidia.com/cuda.runtime.minor": "<cuda-runtime-minor>",
      "nvidia.com/gpu.compute.major": "<gpu-compute-major>",
      "nvidia.com/gpu.compute.minor": "<gpu-compute-minor>",
      "nvidia.com/gpu.count": "<gpu-count>",
      "nvidia.com/gpu.family": "<gpu-family>",
      "nvidia.com/gpu.machine": "<gpu-machine>",
      "nvidia.com/gpu.memory": "<gpu-memory-MiB>",
      "nvidia.com/gpu.product": "<gpu-product>"
    }


You can find a set of recommended labels on the Kubernetes website.
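For example, a pod specification in an application Helm chart could pin a GPU workload to nodes with a specific GPU model via a node selector. A sketch (the pod name, image, and label value are hypothetical; use the gpu.product value reported on your own nodes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload                                  # hypothetical pod name
spec:
  nodeSelector:
    nvidia.com/gpu.product: "NVIDIA-A100-PCIE-40GB"   # hypothetical; match your node label
  containers:
    - name: app
      image: nvcr.io/myorg/myapp:latest               # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1
```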

Viewing Labels

Kubernetes labels are available on the Fleet Command locations page.

  1. To view the Kubernetes labels, go to Fleet Command > Locations.

  2. Click on a location from the list to view the location details page.

  3. Click on the Details tab in the system details pane.

You will see a list of default Kubernetes labels.

k8s-labels.png

Searching Labels

To search for a label, enter a search term in the search field and press Enter.

k8s-labels-search.png

To search on a label by key or value, click on the filter icon, enter a search term in the search field, and press Enter.

k8s-labels-search-by.png

This example shows searching by the key “beta”:

k8s-labels-search-by-key.png

This example shows searching by the value “true”:

k8s-labels-search-by-value.png

High Availability

Fleet Command supports high availability Kubernetes clusters at edge locations. High availability ensures systems operate continuously for a specified period by eliminating single points of failure or employing redundancy. As a result, users of your applications and services experience fewer disruptions and minimal downtime. When the high availability option is set for a Fleet Command location, the first three systems created are assigned the controller-worker role, and any additional systems the worker role.

High availability takes effect as resources become available. Once a location is set to high availability, it remains in effect for the lifetime of the location object.

Note

If a system goes offline, you must wait for the system to be back online before deploying the location. NVIDIA recommends using static assignment in DHCP for systems in high-availability locations to minimize downtime when systems encounter IP changes.

© Copyright 2022-2023, NVIDIA. Last updated on Jun 12, 2023.