NMX Manager (NMX-M) Documentation v85.1.2000

Known Issues

Internal Reference Number

Issue

4424312

Description: If a bring-up-worker Pod terminates unexpectedly while handling a bring-up task, the task may fail.

Workaround: Perform a switch-tray cleanup and reissue the bring-up request. Follow the cleanup steps outlined in issue #4466833 below.

Keywords: Bring-up worker Pod; task

Discovered in Version: 85.1.2000

4424454

Description: If a switch-gateway Pod terminates unexpectedly while handling a bring-up task, the task may fail.

Workaround: Perform a switch-tray cleanup and reissue the bring-up request. Follow the cleanup steps outlined in issue #4466833 below.

Keywords: switch-gateway Pod; bring-up task

Discovered in Version: 85.1.2000

4466833

Description: During the switch-tray bring-up process, if an NMX-M node becomes unavailable, the download of certificate files to the NMX Manager may fail. This happens because the files are served from cluster node IPs, which may be unreachable when a node is down.

Workaround: Perform a switch-tray cleanup and reissue the bring-up request.

To clean up the switch tray:

  1. Connect to the switch via SSH.

  2. Retrieve certificate details:

    nv show system security ca-certificate

    nv show system security certificate

  3. Delete only the certificates and ca-certificates added by NMX-M:

    nv action delete system security certificate <certificate_name>

    nv action delete system security ca-certificate <ca_certificate_name>

  4. Delete the NMX controller SDN configuration:

    nv action delete sdn config apps nmx-controller type fm_config files fm_config.cfg

  5. Reset SDN configuration to factory defaults:

    nv action reset sdn factory-default

  6. Disable the cluster state:

    nv set cluster state disabled

  7. Apply the configuration:

    nv config apply

  8. Remove the temporary files:

    rm /tmp/cert.p12 /tmp/ca-cert.crt /tmp/fm_config.cfg
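The following is a minimal sketch of how steps 3 through 8 above could be wrapped in a script, assuming it is run on the switch after connecting via SSH. The function names (`run`, `delete_nmxm_certs`, `reset_switch_tray`) and the `DRY_RUN` variable are hypothetical helpers introduced for illustration and are not part of NMX-M or the `nv` CLI. The certificate names are not detected automatically: they must be taken from the `nv show` output in step 2. The sketch uses `rm -f` instead of `rm` so that a missing temporary file does not abort the cleanup.

```shell
#!/bin/sh
# Dry-run is the default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 3: delete one certificate and one CA certificate added by NMX-M.
# Both names are supplied by the operator (from the step 2 output).
delete_nmxm_certs() {
    cert_name="$1"
    ca_cert_name="$2"
    run nv action delete system security certificate "$cert_name" &&
    run nv action delete system security ca-certificate "$ca_cert_name"
}

# Steps 4-8: remove the NMX controller SDN configuration, reset SDN,
# disable the cluster state, apply, and remove temporary files.
# The chain stops at the first command that fails.
reset_switch_tray() {
    run nv action delete sdn config apps nmx-controller type fm_config files fm_config.cfg &&
    run nv action reset sdn factory-default &&
    run nv set cluster state disabled &&
    run nv config apply &&
    run rm -f /tmp/cert.p12 /tmp/ca-cert.crt /tmp/fm_config.cfg
}
```

With the default `DRY_RUN=1`, calling `delete_nmxm_certs <certificate_name> <ca_certificate_name>` followed by `reset_switch_tray` only prints the commands, which makes it possible to review them before rerunning with `DRY_RUN=0`.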

Keywords: Switch tray; bring-up; certificates

Discovered in Version: 85.1.2000

4399074

Description: Registering the same switch twice, first by its IP address and then by its domain name, succeeds instead of being rejected as a duplicate registration.

Workaround: To avoid duplicate registration issues, use either the IP address or the domain name of the switch consistently for service registration, not both.

Keywords: Switch registration; IP address; domain

Discovered in Version: 85.1.2000

4369284

Description: The Loki pods responsible for handling log data in the NMX-M environment may run out of storage, leading to crashes. This can disrupt log aggregation and prevent users from viewing logs. The issue occurs when the assigned storage capacity is exceeded.

Workaround: To mitigate this issue, run the following commands:

  • Delete Loki PVC:

    kubectl -n infra delete pvc storage-loki-0 --wait=false

  • Delete Loki pod to trigger re-creation:

    kubectl -n infra delete pod loki-0
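As a sketch, the two commands above can be combined into one helper, followed by a check that the PVC and pod were re-created. The function names and the `DRY_RUN` variable are hypothetical and introduced only for illustration. The `--wait=false` flag matters because Kubernetes keeps a PVC in a terminating state while a pod still uses it, so without the flag the first command would block until the pod is deleted. Deleting the PVC typically discards the logs stored on it, depending on the storage reclaim policy in use.

```shell
#!/bin/sh
# Dry-run is the default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Delete the Loki PVC and pod, then list both so the operator can
# confirm they were re-created. The namespace defaults to "infra",
# as used in the commands above.
recreate_loki_storage() {
    ns="${1:-infra}"
    # Return immediately; the PVC is removed once the pod is gone.
    run kubectl -n "$ns" delete pvc storage-loki-0 --wait=false &&
    # Deleting the pod releases the PVC and triggers re-creation.
    run kubectl -n "$ns" delete pod loki-0 &&
    # Verification only: show the current state of both objects.
    run kubectl -n "$ns" get pvc storage-loki-0 &&
    run kubectl -n "$ns" get pod loki-0
}
```

The final two `get` commands may report the objects as missing or pending for a short time after deletion, so they may need to be repeated.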

Keywords: Loki pods; log aggregation; storage

Discovered in Version: 85.1.1000

4355456

Description: Registration of services (NMX-T and NMX-C) is available only when all three NMX-M nodes are up and running.

Workaround: N/A

Keywords: NMX-T; NMX-C

Discovered in Version: 85.1.1000

4367335

Description: PUT requests to the following endpoints may return an HTTP 500 Internal Server Error even though the request was executed successfully:

  • http://localhost:8084/nmx/v1/compute-nodes

  • http://localhost:8084/nmx/v1/switch-nodes/

  • http://localhost:8084/nmx/v1/gpus/

  • http://localhost:8084/nmx/v1/switches/

  • http://localhost:8084/nmx/v1/chassis/

Workaround: N/A
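Although no workaround is listed, a client script may want to avoid treating an HTTP 500 from these endpoints as a definite failure. The sketch below is one possible way to do that and is not part of NMX-M: `classify_put_status` is a hypothetical helper, and the payload file and `Content-Type` header in the commented example are assumptions, since the request format is not described here.

```shell
#!/bin/sh
# classify_put_status HTTP_CODE
# Prints one of:
#   ok      - 2xx response
#   verify  - 500 response; per this known issue the PUT may still
#             have been applied, so the result should be checked
#   failed  - anything else (including 000, which curl reports when
#             no HTTP response was received)
classify_put_status() {
    case "$1" in
        2??) echo ok ;;
        500) echo verify ;;
        *)   echo failed ;;
    esac
}

# Example of capturing the status code with curl. The payload file and
# content type are assumptions made for illustration:
#   code="$(curl -s -o /dev/null -w '%{http_code}' -X PUT \
#       -H 'Content-Type: application/json' --data @payload.json \
#       http://localhost:8084/nmx/v1/switches/)"
#   classify_put_status "$code"
```

A result of `verify` only signals that the outcome is uncertain. How to confirm that the change took effect depends on the resource and is left to the caller.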

Keywords: PUT; HTTP 500 Error

Discovered in Version: 85.1.1000

4363356

Description: After installation, NMX services take up to 10 minutes to start and stabilize.

Workaround: N/A
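Automation that runs right after installation can account for this startup time by polling instead of failing on the first attempt. The following sketch shows a generic polling helper with a timeout. The `wait_until` function is hypothetical, and the readiness probe passed to it (`check_nmx_ready` in the comment) is a placeholder that must be supplied by the reader, since no specific health check is documented here.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it succeeds or roughly TIMEOUT_SECONDS
# have been spent sleeping. Returns 0 on success, 1 on timeout.
# The time taken by COMMAND itself is not counted.
wait_until() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Example: allow up to 10 minutes (600 seconds), probing every 15 seconds.
# check_nmx_ready is a placeholder for a site-specific readiness check:
#   wait_until 600 15 check_nmx_ready || echo "NMX services not ready" >&2
```

The 600-second timeout mirrors the 10-minute figure above. A longer value may be safer in practice if the probe itself is slow.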

Keywords: Installation

Discovered in Version: 85.1.1000

© Copyright 2025, NVIDIA. Last updated on May 29, 2025.