NMX Manager (NMX-M) Documentation v85.1.2000

Known Issues

Internal Reference Number	Issue
4424312	Description: If a bring-up-worker Pod terminates unexpectedly while handling a bring-up task, the task may fail.
	Workaround: Perform a switch-tray cleanup and reissue the bring-up request. Follow the cleanup steps outlined in issue #4466833 below.
	Keywords: Bring-up worker Pod; task
	Discovered in Version: 85.1.2000
4424454	Description: If a switch-gateway Pod terminates unexpectedly while handling a bring-up task, the task may fail.
	Workaround: Perform a switch-tray cleanup and reissue the bring-up request. Follow the cleanup steps outlined in issue #4466833 below.
	Keywords: switch-gateway Pod; bring-up task
	Discovered in Version: 85.1.2000
4466833	Description: If an NMX-M node becomes unavailable during switch-tray bring-up, the download of certificate files to the NMX Manager may fail. The files are served from cluster node IPs, which may be unreachable when a node is down.
	Workaround: Perform a switch-tray cleanup and reissue the bring-up request. To clean the switch tray:
	1. Connect to the switch via SSH.
	2. Retrieve certificate details:
	   `nv show system security ca-certificate`
	   `nv show system security certificate`
	3. Delete only the certificates and ca-certificates added by NMX-M:
	   `nv action delete system security certificate <certificate_name>`
	   `nv action delete system security ca-certificate <ca_certificate_name>`
	4. Delete the NMX controller SDN configuration:
	   `nv action delete sdn config apps nmx-controller type fm_config files fm_config.cfg`
	5. Reset SDN configuration to factory defaults:
	   `nv action reset sdn factory-default`
	6. Disable the cluster state:
	   `nv set cluster state disabled`
	7. Apply the configuration:
	   `nv config apply`
	8. Clean the temp files:
	   `rm /tmp/cert.p12 /tmp/ca-cert.crt /tmp/fm_config.cfg`
	Keywords: Switch tray; bring-up; certificates
	Discovered in Version: 85.1.2000
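
The cleanup sequence for issue 4466833 can be sketched as an ordered command list. This is a minimal sketch, not an NVIDIA-provided tool: the function name `build_cleanup_commands` and its parameters are hypothetical, and the certificate names must be taken by the operator from the `nv show` output on the switch. The code only builds and prints the commands from the workaround; it does not execute anything.

```python
from typing import List


def build_cleanup_commands(cert_names: List[str], ca_cert_names: List[str]) -> List[str]:
    """Return the switch-tray cleanup commands from the workaround, in order.

    cert_names and ca_cert_names are the entries added by NMX-M, as identified
    by the operator from the `nv show system security ...` output.
    (Hypothetical helper for illustration only.)
    """
    commands = []
    # Delete only the certificates and CA certificates that NMX-M added.
    for name in cert_names:
        commands.append(f"nv action delete system security certificate {name}")
    for name in ca_cert_names:
        commands.append(f"nv action delete system security ca-certificate {name}")
    # Remove the NMX controller SDN configuration, then reset SDN to factory defaults.
    commands.append(
        "nv action delete sdn config apps nmx-controller type fm_config files fm_config.cfg"
    )
    commands.append("nv action reset sdn factory-default")
    # Disable the cluster state and apply the configuration.
    commands.append("nv set cluster state disabled")
    commands.append("nv config apply")
    # Remove the temporary files left behind by the failed bring-up.
    commands.append("rm /tmp/cert.p12 /tmp/ca-cert.crt /tmp/fm_config.cfg")
    return commands


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    for cmd in build_cleanup_commands(["example-cert"], ["example-ca-cert"]):
        print(cmd)
```

Printing the list before running anything on the switch gives the operator a chance to confirm that only NMX-M certificates are targeted.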
4399074	Description: Registering the same switch twice, first by its IP address and then by its domain name, succeeds instead of being rejected as a duplicate.
	Workaround: To avoid duplicate registration issues, use either the IP address or the domain name of the switch consistently for service registration, not both.
	Keywords: Switch registration; IP address; domain
	Discovered in Version: 85.1.2000
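
One way to follow the workaround for issue 4399074 is to check, before registering, that a batch of switch addresses uses a single style. The sketch below is an assumption-laden illustration: `address_kind` and `check_consistent` are hypothetical helpers, not part of NMX-M, and they only enforce a consistent style; they do not detect that an IP address and a domain name refer to the same switch.

```python
import ipaddress


def address_kind(address: str) -> str:
    """Classify a switch address as 'ip' or 'domain' (hypothetical helper)."""
    try:
        ipaddress.ip_address(address)
        return "ip"
    except ValueError:
        # Anything that does not parse as an IPv4/IPv6 address is treated as a domain name.
        return "domain"


def check_consistent(addresses):
    """Return the common address style, or raise ValueError if styles are mixed."""
    kinds = {address_kind(a) for a in addresses}
    if len(kinds) > 1:
        raise ValueError("Mixed IP addresses and domain names; use one style only")
    return kinds.pop() if kinds else None
```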
4369284	Description: The Loki pods responsible for handling log data in the NMX-M environment may run out of storage, leading to crashes. This can disrupt log aggregation and prevent users from viewing logs. The issue occurs when the assigned storage capacity is exceeded.
	Workaround: To mitigate this issue, run the following commands:
	1. Delete the Loki PVC:
	   `kubectl -n infra delete pvc storage-loki-0 --wait=false`
	2. Delete the Loki pod to trigger re-creation:
	   `kubectl -n infra delete pod loki-0`
	Keywords: Loki pods; log aggregation; storage
	Discovered in Version: 85.1.1000
4355456	Description: Services (NMX-T and NMX-C) can be registered only when all three NMX-M nodes are up and running.
	Workaround: N/A
	Keywords: NMX-T; NMX-C
	Discovered in Version: 85.1.1000
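
Because issue 4355456 has no workaround, it can help to confirm that all three nodes are up before attempting registration. The sketch below assumes that the NMX-M nodes appear as Kubernetes nodes and that the operator has captured the output of `kubectl get nodes -o json`; that mapping is an assumption, not something this document states. The function names are hypothetical.

```python
import json


def count_ready_nodes(nodes_json: str) -> int:
    """Count nodes whose 'Ready' condition is 'True' in `kubectl get nodes -o json` output."""
    data = json.loads(nodes_json)
    ready = 0
    for node in data.get("items", []):
        for cond in node.get("status", {}).get("conditions", []):
            if cond.get("type") == "Ready" and cond.get("status") == "True":
                ready += 1
                break
    return ready


def can_register_services(nodes_json: str, required: int = 3) -> bool:
    """Return True when at least `required` nodes report Ready (3 per the issue text)."""
    return count_ready_nodes(nodes_json) >= required
```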
4367335	Description: PUT requests to the following endpoints may return an HTTP 500 Internal Server Error even though the request executes successfully:
	`http://localhost:8084/nmx/v1/compute-nodes`
	`http://localhost:8084/nmx/v1/switch-nodes/`
	`http://localhost:8084/nmx/v1/gpus/`
	`http://localhost:8084/nmx/v1/switches/`
	`http://localhost:8084/nmx/v1/chassis/`
	Workaround: N/A
	Keywords: PUT; HTTP 500 Error
	Discovered in Version: 85.1.1000
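
A client can treat an HTTP 500 from these PUT endpoints as inconclusive and check whether the change took effect before retrying. The sketch below shows that pattern only. Both callables are supplied by the caller and are hypothetical: this document does not describe how to read a resource back, so `is_applied` stands in for whatever verification the deployment supports.

```python
from typing import Callable


def put_and_verify(do_put: Callable[[], int], is_applied: Callable[[], bool]) -> bool:
    """Return True if the update took effect.

    do_put     -- performs the PUT and returns the HTTP status code.
    is_applied -- re-reads the resource and reports whether the change is present
                  (caller-supplied; how to verify is an assumption).
    """
    status = do_put()
    if 200 <= status < 300:
        return True
    if status == 500:
        # Known issue 4367335: the request may have succeeded despite the 500.
        return is_applied()
    # Any other status is treated as a real failure.
    return False
```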
4363356	Description: After installation, NMX services take up to 10 minutes to start and stabilize.
	Workaround: N/A
	Keywords: Installation
	Discovered in Version: 85.1.1000
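
Automation that runs right after installation can poll for readiness rather than fail on the first attempt. The following is a minimal sketch of a polling loop with a 10-minute default timeout, matching the figure in issue 4363356. The `is_ready` check is a caller-supplied placeholder, since this document does not define a health endpoint, and the 15-second interval is an arbitrary choice.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 600.0,   # 10 minutes, per the known issue
    interval_s: float = 15.0,   # arbitrary polling interval (assumption)
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised with fake time in tests; normal callers can omit them.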