Configuration Guide
NVDebug Runtime Configuration File (config.yaml)
The NVDebug runtime configuration file is a YAML file that specifies the tool's runtime settings. The following example targets an arm64 platform with a GB200 NVL baseboard, enables log sanitization, and skips BMC SSH log collection:
PLATFORM: "arm64"
TargetBaseboard: "GB200 NVL"
LogSanitization: true
SKIP_BMC_SSH_LOGS: true
DUT Configuration File (dut_config.yaml)
The DUT (device under test) configuration file is a YAML file that lists the target systems and their properties, such as BMC and host addresses, credentials, and Redfish settings.
Basic Configuration for a Single DUT
DUT_Defaults: &dut_defaults
  NodeType: "Compute"
  BMC_IP: "192.168.1.100"
  BMC_USERNAME: "bmc_user"
  BMC_PASSWORD: "bmc_password"
  ipmi_cipher: "-C17"
  HOST_IP: "192.168.2.100"
  HOST_USERNAME: "host_user"
  HOST_PASSWORD: "host_password"
  RF_DEFAULT_PREFIX: "/redfish/v1"
  RF_AUTH: true
  SETUP_PORT_FORWARDING: True
  FORCE_PORT_FW: False
  IP_NETWORK: 'ipv4'
dut-1:
  <<: *dut_defaults
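The `<<: *dut_defaults` line is a YAML merge key: `dut-1` receives every key from the `DUT_Defaults` mapping, and any key written directly under the node overrides the inherited value. The Python sketch below models that resolution with plain dictionaries so the precedence is easy to see; `resolve_node` and the `dut-2` entry are hypothetical illustrations, not part of nvdebug, and the defaults are an abbreviated copy of the example above.

```python
def resolve_node(defaults, overrides):
    """Model YAML merge-key (<<) resolution for one node.

    Keys inherited from the merged mapping come first; keys written directly
    on the node replace them.
    """
    resolved = dict(defaults)  # copy so the shared defaults are not modified
    resolved.update(overrides)  # node-specific keys win over inherited ones
    return resolved


# Abbreviated copy of DUT_Defaults from the example above.
dut_defaults = {
    "NodeType": "Compute",
    "BMC_IP": "192.168.1.100",
    "HOST_IP": "192.168.2.100",
    "IP_NETWORK": "ipv4",
}

# dut-1 contains only "<<: *dut_defaults", so it inherits every key unchanged.
dut_1 = resolve_node(dut_defaults, {})

# Hypothetical second DUT that overrides only its addresses.
dut_2 = resolve_node(dut_defaults, {"BMC_IP": "192.168.1.101", "HOST_IP": "192.168.2.101"})

print(dut_1 == dut_defaults)               # True
print(dut_2["BMC_IP"], dut_2["NodeType"])  # 192.168.1.101 Compute
```

Like a YAML merge key, `dict.update` is shallow: it replaces top-level keys only, which matches the flat key layout used in dut_config.yaml.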
Rack Configuration for Multiple DUTs
DUT_Defaults: &dut_defaults ## Do not modify this line or add anything before this section
  NodeType: "Compute"
  BMC_IP: ""
  BMC_USERNAME: ""
  BMC_PASSWORD: ""
  BMC_SSH_USERNAME: ""
  BMC_SSH_PASSWORD: ""
  RF_User: ""
  RF_Pass: ""
  TUNNEL_TCP_PORT: ""
  ipmi_cipher: "-C17"
  HOST_IP: ""
  HOST_USERNAME: ""
  HOST_PASSWORD: ""
  RF_DEFAULT_PREFIX: "/redfish/v1"
  RF_AUTH: true
  SETUP_PORT_FORWARDING: True
  FORCE_PORT_FW: False
  IP_NETWORK: 'ipv4'
nvl-compute-1: &compute_defaults
  <<: *dut_defaults
  NodeType: "Compute"
  ConfigFileToUse: "config_compute.yaml"
  HOST_USERNAME: "host_user"
  HOST_PASSWORD: "host_password"
  BMC_USERNAME: "bmc_user"
  BMC_PASSWORD: "bmc_password"
  BMC_SSH_USERNAME: "bmc_ssh_user"
  BMC_SSH_PASSWORD: "bmc_ssh_password"
  HOST_IP: 192.168.1.6
  BMC_IP: 192.168.1.134
nvl-compute-2:
  <<: *compute_defaults
  HOST_IP: 192.168.1.7
  BMC_IP: 192.168.1.135
# ... (additional compute trays follow the same pattern)
nvl-compute-9:
  <<: *compute_defaults
  HOST_IP: 192.168.1.14
  BMC_IP: 192.168.1.142
nvl-switch-1: &switch_defaults
  <<: *dut_defaults
  NodeType: "SwitchTray"
  ConfigFileToUse: "config_switch.yaml"
  HOST_USERNAME: "host_user"
  HOST_PASSWORD: "host_password"
  BMC_USERNAME: "bmc_user"
  BMC_PASSWORD: "bmc_password"
  BMC_SSH_USERNAME: "bmc_ssh_user"
  BMC_SSH_PASSWORD: "bmc_ssh_password"
  HOST_IP: 192.168.1.101
  BMC_IP: 192.168.1.229
nvl-switch-2:
  <<: *switch_defaults
  HOST_IP: 192.168.1.102
  BMC_IP: 192.168.1.230
# ... (additional switch trays follow the same pattern)
nvl-switch-9:
  <<: *switch_defaults
  HOST_IP: 192.168.1.109
  BMC_IP: 192.168.1.237
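Because the compute and switch tray entries after the first one differ only in their addresses, that part of the rack section can be generated rather than typed by hand. The Python sketch below is a hypothetical helper (it does not ship with nvdebug) that assumes host and BMC addresses increase by one per tray, as they do in the example above; it emits entries that reuse the anchor defined on tray 1.

```python
import ipaddress


def tray_entries(prefix, anchor, count, first_host_ip, first_bmc_ip):
    """Emit YAML lines for trays 2..count that merge the anchor defined on tray 1.

    Assumes consecutive host and BMC addresses per tray (an assumption of this
    sketch, not an nvdebug requirement).
    """
    host = ipaddress.ip_address(first_host_ip)
    bmc = ipaddress.ip_address(first_bmc_ip)
    lines = []
    for i in range(2, count + 1):
        offset = i - 1  # tray 1 keeps the first addresses
        lines.append(f"{prefix}-{i}:")
        lines.append(f"  <<: *{anchor}")
        lines.append(f"  HOST_IP: {host + offset}")
        lines.append(f"  BMC_IP: {bmc + offset}")
    return "\n".join(lines)


compute_yaml = tray_entries("nvl-compute", "compute_defaults", 9, "192.168.1.6", "192.168.1.134")
switch_yaml = tray_entries("nvl-switch", "switch_defaults", 9, "192.168.1.101", "192.168.1.229")
print(compute_yaml)
print(switch_yaml)
```

Paste each block below its tray-1 entry (`nvl-compute-1` or `nvl-switch-1`), and change the starting addresses and tray count to match your rack.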
Supported Log Collectors
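List Available Collectors for a Platform
Run nvdebug with -l and pass the platform name to -t to list the collectors available for that platform. A line beginning with -> lists the only boards on which the collector directly above it is supported.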
arm64
$ ./nvdebug -l -t arm64
Redfish
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
R1 system_event_log Redfish_R1_system_event_log_{system_id}.json
R2 manager_existing_log_dump Redfish_R2_existing_dump_{id}.tar.xz
R3 hgx_manager_on_demand_log_dump Redfish_R3_hgx_manager_dump_{manager_id}_{task_id}.tar.xz
-> R3 Supported Boards: Blackwell-HGX-8-GPU, HGX B300, Hopper-HGX-8-GPU, GB200 NVL, GB300 NVL, GH200 NVL
R4 manager_journal_log Redfish_R4_journal_log_entries_{manager_id}.json
R5 manager_fpga_register_dump Redfish_R5_fpga_dump_{system_id}_{task_id}.tar.xz
R6 manager_erot_dump Redfish_R6_erot_dump_{system_id}_{task_id}.tar.xz
R8 firmware_inventory Redfish_R8_firmware_inventory.json
R9 firmware_inventory_expand_query Redfish_R9_firmware_inventory_expand_query.json
R10 chassis_info Redfish_R10_chassis_info.json
R11 chassis_expand_query Redfish_R11_chassis_expand_query.json
R12 system_info Redfish_R12_system_info.json
R13 system_expand_query Redfish_R13_system_expand_query.json
R14 manager_info Redfish_R14_manager_info.json
R15 manager_expand_query Redfish_R15_manager_expand_query.json
R16 hgx_manager_retimer_dump Redfish_R16_hgx_retimer_dump_{system_id}_{task_id}.tar.xz
-> R16 Supported Boards: Blackwell-HGX-8-GPU, HGX B300, Hopper-HGX-8-GPU
R18 telemetry_metric_reports Redfish_R18_report_{metric_report}.json
R19 chassis_thermal_metrics Redfish_R19_chassis_{chassis}_thermal_metrics.json
R20 firmware_inventory_table Redfish_R20_firmware_inventory_table.txt
R21 system_cper_logs Redfish_R21_cper_logs_{system}_{cper_id}.tar.xz
R22 task_details Redfish_R22_task_{task_id}.json
R23 nvlink_oob_logs Redfish_R23_NVLINK_OOB_Log_{id}.json
R24 hgx_system_fw_attributes_dump Redfish_R24_hgx_system_{system}_fw_attributes_{task_id}.tar.xz
R25 additional_oob_logs Redfish_R25_OOB_Log_{id}.json
R26 chassis_certificates Redfish_R26_chassis_{chassis_id}_certificate.json
R27 spdm_erot_measurements Redfish_R27_spdm_{erot_id}_index_{index}.json
-> R27 Supported Boards: Blackwell-HGX-8-GPU, HGX B300, Hopper-HGX-8-GPU, GB200 NVL, GB300 NVL, GH200 NVL
R28 hgx_system_hardware_checkout_dump Redfish_R28_hgx_system_{system}_hardware_checkout_{task_id}.tar.xz
R29 background_copy_status Redfish_R29_{chassis_id}_copy_status.json
R30 software_inventory Redfish_R30_software_inventory.json
R31 hmc_fdr_log_dump Redfish_R31_{system_id}_hmc_fdr_log_dump
-> R31 Supported Boards: Hopper-HGX-8-GPU
R32 system_post_codes Redfish_R32_system_post_codes
R35 network_device_debug_dump Redfish_R35_network_device_debug_dump
-> R35 Supported Boards: Blackwell-HGX-8-GPU, HGX B300
R36 network_switch_debug_dump Redfish_R36_network_switch_debug_dump
-> R36 Supported Boards: Blackwell-HGX-8-GPU, HGX B300
R37 gpu_debug_dump Redfish_R37_gpu_debug_dump
-> R37 Supported Boards: Blackwell-HGX-8-GPU, HGX B300, GB200 NVL, GB300 NVL
R38 gpu_diagnostic_dump Redfish_R38_gpu_diagnostic_dump
-> R38 Supported Boards: Blackwell-HGX-8-GPU, HGX B300, GB200 NVL, GB300 NVL
R39 sma_debug_dump Redfish_R39_sma_debug_dump
-> R39 Supported Boards: HGX B300, GB300 NVL
R40 custom_dump_service Redfish_R40_custom_dump_service
R41 chassis_thermal_subsystem_leak_detection Redfish_R41_chassis_thermal_subsystem_leak_detection
IPMI
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
I1 mc_info IPMI_I1_mc_info.txt
I2 lan_info IPMI_I2_lan_info.txt
I3 session_info IPMI_I3_session_info.txt
I4 fru_info IPMI_I4_fru_info.txt
I5 sdr_info IPMI_I5_sdr_info.txt
I6 sel_info IPMI_I6_sel_info.txt
I7 sensor_list IPMI_I7_sensor_list.txt
I8 sel_list IPMI_I8_sel_list.txt
I9 sel_raw_dump IPMI_I9_sel_raw_dump.txt
I10 chassis_status IPMI_I10_chassis_status.txt
I11 chassis_restart_cause IPMI_I11_chassis_restart_cause.txt
I12 user_list IPMI_I12_user_list.txt
I13 channel_info IPMI_I13_channel_info.txt
I14 sdr_elist IPMI_I14_sdr_elist.txt
SSH
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
S1 bmc_status BMC_SSH_S1_bmc_status.txt
S2 bmc_dmesg BMC_SSH_S2_bmc_dmesg.txt
S3 network_info BMC_SSH_S3_network_info/...
S4 openbmc_stack_info BMC_SSH_S4_openbmc_stack_info.txt
S5 bmc_list_kernel_modules BMC_SSH_S5_bmc_list_kernel_modules.txt
S6 openbmc_pldm_journal_log BMC_SSH_S6_openbmc_pldm_journal_log.txt
S7 i2c_device_list BMC_SSH_S7_i2c_device_bus_scan{number}.txt
S8 bmc_mem_cpu_utilization BMC_SSH_S8_bmc_mem_cpu_utilization/...
S9 openbmc_boot_status BMC_SSH_S9_openbmc_boot_status.txt
S10 var_log_dir_zip BMC_SSH_S10_var_log_dir_zip{timestamp}.tar.gz
S11 uptime BMC_SSH_S11_uptime.txt
S14 hmc_boot_progress BMC_SSH_S14_hmc_boot_progress.txt
-> S14 Supported Boards: Blackwell-HGX-8-GPU, HGX B300, Hopper-HGX-8-GPU, GB200 NVL, GB300 NVL, GH200 NVL
S15 bmc_power_status BMC_SSH_S15_bmc_power_status/...
S16 virtual_eeprom_data BMC_SSH_S16_virtual_eeprom_data/...
S17 smbus_power_temperature_telemetry BMC_SSH_S17_smbus_power_temperature_telemetry/...
S18 smbpbi_system_status BMC_SSH_S18_smbpbi_system_status/...
Host
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
H1 node_dmesg Host_H1_node_dmesg.tar.gz
H2 node_lspci Host_H2_node_lspci*.txt
H3 node_smbios Host_H3_dmidecode*.txt
H4 node_lshw Host_H4_lshw*.txt
H5 node_nvidia_smi Host_H5_nvidia-smi*.txt
H6 node_kern_log Host_H6_node_kern_log.tar.gz
H7 node_crash_dump Host_H7_node_crash_dump.tar.gz
H8 node_nvme_list Host_H8_nvme_list_-v.txt
H9 node_fabric_manager_log Host_H9_fabricmanager.log
H10 node_nvflash_log Host_H10_nvflash_--check_-i_{num}.txt
H11 nvidia_bug_report Host_H11_nvidia_bug_report_op.log.gz
H15 node_subnet_manager Host_H15_node_subnet_manager/
H16 one_diag_dump Host_H16_one_diag_dump/
H17 node_nvme_log_dump Host_H17_node_nvme_log_dump/
H18 node_os_info Host_H18_cat_etc_os-release.txt
H19 node_memory_info Host_H19_memory_info/
H20 sos_report Host_H20_sos_report/
H21 nvos_cli_dumps Host_21_nvos_cli_dumps/
HealthCheck
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
C1 out_of_band_health_check HealthCheck_C1_out_of_band_health_check.json
x86_64
$ ./nvdebug -l -t x86_64
Redfish
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
R1 system_event_log Redfish_R1_system_event_log_{system_id}.json
R2 manager_existing_log_dump Redfish_R2_existing_dump_{id}.tar.xz
R4 manager_journal_log Redfish_R4_journal_log_entries_{manager_id}.json
R5 manager_fpga_register_dump Redfish_R5_fpga_dump_{system_id}_{task_id}.tar.xz
R6 manager_erot_dump Redfish_R6_erot_dump_{system_id}_{task_id}.tar.xz
R8 firmware_inventory Redfish_R8_firmware_inventory.json
R9 firmware_inventory_expand_query Redfish_R9_firmware_inventory_expand_query.json
R10 chassis_info Redfish_R10_chassis_info.json
R11 chassis_expand_query Redfish_R11_chassis_expand_query.json
R12 system_info Redfish_R12_system_info.json
R13 system_expand_query Redfish_R13_system_expand_query.json
R14 manager_info Redfish_R14_manager_info.json
R15 manager_expand_query Redfish_R15_manager_expand_query.json
R18 telemetry_metric_reports Redfish_R18_report_{metric_report}.json
R19 chassis_thermal_metrics Redfish_R19_chassis_{chassis}_thermal_metrics.json
R20 firmware_inventory_table Redfish_R20_firmware_inventory_table.txt
R22 task_details Redfish_R22_task_{task_id}.json
R23 nvlink_oob_logs Redfish_R23_NVLINK_OOB_Log_{id}.json
R24 hgx_system_fw_attributes_dump Redfish_R24_hgx_system_{system}_fw_attributes_{task_id}.tar.xz
R25 additional_oob_logs Redfish_R25_OOB_Log_{id}.json
R26 chassis_certificates Redfish_R26_chassis_{chassis_id}_certificate.json
R28 hgx_system_hardware_checkout_dump Redfish_R28_hgx_system_{system}_hardware_checkout_{task_id}.tar.xz
R29 background_copy_status Redfish_R29_{chassis_id}_copy_status.json
R30 software_inventory Redfish_R30_software_inventory.json
R32 system_post_codes Redfish_R32_system_post_codes
R35 network_device_debug_dump Redfish_R35_network_device_debug_dump
-> R35 Supported Boards: Blackwell-HGX-8-GPU, HGX B300
R36 network_switch_debug_dump Redfish_R36_network_switch_debug_dump
-> R36 Supported Boards: Blackwell-HGX-8-GPU, HGX B300
R37 gpu_debug_dump Redfish_R37_gpu_debug_dump
-> R37 Supported Boards: Blackwell-HGX-8-GPU, HGX B300
R38 gpu_diagnostic_dump Redfish_R38_gpu_diagnostic_dump
-> R38 Supported Boards: Blackwell-HGX-8-GPU, HGX B300
IPMI
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
I1 mc_info IPMI_I1_mc_info.txt
I2 lan_info IPMI_I2_lan_info.txt
I3 session_info IPMI_I3_session_info.txt
I4 fru_info IPMI_I4_fru_info.txt
I5 sdr_info IPMI_I5_sdr_info.txt
I6 sel_info IPMI_I6_sel_info.txt
I7 sensor_list IPMI_I7_sensor_list.txt
I8 sel_list IPMI_I8_sel_list.txt
I9 sel_raw_dump IPMI_I9_sel_raw_dump.txt
I10 chassis_status IPMI_I10_chassis_status.txt
I11 chassis_restart_cause IPMI_I11_chassis_restart_cause.txt
I12 user_list IPMI_I12_user_list.txt
I13 channel_info IPMI_I13_channel_info.txt
I14 sdr_elist IPMI_I14_sdr_elist.txt
SSH
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
S2 bmc_dmesg BMC_SSH_S2_bmc_dmesg.txt
S3 network_info BMC_SSH_S3_network_info/...
S5 bmc_list_kernel_modules BMC_SSH_S5_bmc_list_kernel_modules.txt
S6 openbmc_pldm_journal_log BMC_SSH_S6_openbmc_pldm_journal_log.txt
S7 i2c_device_list BMC_SSH_S7_i2c_device_bus_scan{number}.txt
S8 bmc_mem_cpu_utilization BMC_SSH_S8_bmc_mem_cpu_utilization/...
S10 var_log_dir_zip BMC_SSH_S10_var_log_dir_zip{timestamp}.tar.gz
S11 uptime BMC_SSH_S11_uptime.txt
S15 bmc_power_status BMC_SSH_S15_bmc_power_status/...
Host
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
H1 node_dmesg Host_H1_node_dmesg.tar.gz
H2 node_lspci Host_H2_node_lspci*.txt
H3 node_smbios Host_H3_dmidecode*.txt
H4 node_lshw Host_H4_lshw*.txt
H5 node_nvidia_smi Host_H5_nvidia-smi*.txt
H6 node_kern_log Host_H6_node_kern_log.tar.gz
H7 node_crash_dump Host_H7_node_crash_dump.tar.gz
H8 node_nvme_list Host_H8_nvme_list_-v.txt
H9 node_fabric_manager_log Host_H9_fabricmanager.log
H10 node_nvflash_log Host_H10_nvflash_--check_-i_{num}.txt
H11 nvidia_bug_report Host_H11_nvidia_bug_report_op.log.gz
H15 node_subnet_manager Host_H15_node_subnet_manager/
H16 one_diag_dump Host_H16_one_diag_dump/
H17 node_nvme_log_dump Host_H17_node_nvme_log_dump/
H18 node_os_info Host_H18_cat_etc_os-release.txt
H19 node_memory_info Host_H19_memory_info/
HealthCheck
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
C1 out_of_band_health_check HealthCheck_C1_out_of_band_health_check.json
DGX
$ ./nvdebug -l -t DGX
Redfish
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
R8 firmware_inventory Redfish_R8_firmware_inventory.json
R9 firmware_inventory_expand_query Redfish_R9_firmware_inventory_expand_query.json
R10 chassis_info Redfish_R10_chassis_info.json
R11 chassis_expand_query Redfish_R11_chassis_expand_query.json
R12 system_info Redfish_R12_system_info.json
R13 system_expand_query Redfish_R13_system_expand_query.json
R14 manager_info Redfish_R14_manager_info.json
R15 manager_expand_query Redfish_R15_manager_expand_query.json
R17 dgx_manager_oem_log_dump Redfish_R17_dgx_oem_dump_{manager_id}_{task_id}.tar.xz
R18 telemetry_metric_reports Redfish_R18_report_{metric_report}.json
R19 chassis_thermal_metrics Redfish_R19_chassis_{chassis}_thermal_metrics.json
R20 firmware_inventory_table Redfish_R20_firmware_inventory_table.txt
R22 task_details Redfish_R22_task_{task_id}.json
R23 nvlink_oob_logs Redfish_R23_NVLINK_OOB_Log_{id}.json
R25 additional_oob_logs Redfish_R25_OOB_Log_{id}.json
R26 chassis_certificates Redfish_R26_chassis_{chassis_id}_certificate.json
R30 software_inventory Redfish_R30_software_inventory.json
R32 system_post_codes Redfish_R32_system_post_codes
IPMI
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
I1 mc_info IPMI_I1_mc_info.txt
I2 lan_info IPMI_I2_lan_info.txt
I3 session_info IPMI_I3_session_info.txt
I4 fru_info IPMI_I4_fru_info.txt
I5 sdr_info IPMI_I5_sdr_info.txt
I6 sel_info IPMI_I6_sel_info.txt
I7 sensor_list IPMI_I7_sensor_list.txt
I8 sel_list IPMI_I8_sel_list.txt
I9 sel_raw_dump IPMI_I9_sel_raw_dump.txt
I10 chassis_status IPMI_I10_chassis_status.txt
I11 chassis_restart_cause IPMI_I11_chassis_restart_cause.txt
I12 user_list IPMI_I12_user_list.txt
I13 channel_info IPMI_I13_channel_info.txt
I14 sdr_elist IPMI_I14_sdr_elist.txt
SSH
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
S2 bmc_dmesg BMC_SSH_S2_bmc_dmesg.txt
S3 network_info BMC_SSH_S3_network_info/...
S5 bmc_list_kernel_modules BMC_SSH_S5_bmc_list_kernel_modules.txt
S8 bmc_mem_cpu_utilization BMC_SSH_S8_bmc_mem_cpu_utilization/...
S11 uptime BMC_SSH_S11_uptime.txt
S12 fpga_register_table BMC_SSH_S12_fpga_register_table.txt
S13 hmc_boot_status BMC_SSH_S13_hmc_boot_status.txt
S15 bmc_power_status BMC_SSH_S15_bmc_power_status/...
Host
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
H1 node_dmesg Host_H1_node_dmesg.tar.gz
H2 node_lspci Host_H2_node_lspci*.txt
H3 node_smbios Host_H3_dmidecode*.txt
H4 node_lshw Host_H4_lshw*.txt
H5 node_nvidia_smi Host_H5_nvidia-smi*.txt
H6 node_kern_log Host_H6_node_kern_log.tar.gz
H7 node_crash_dump Host_H7_node_crash_dump.tar.gz
H8 node_nvme_list Host_H8_nvme_list_-v.txt
H9 node_fabric_manager_log Host_H9_fabricmanager.log
H10 node_nvflash_log Host_H10_nvflash_--check_-i_{num}.txt
H11 nvidia_bug_report Host_H11_nvidia_bug_report_op.log.gz
H15 node_subnet_manager Host_H15_node_subnet_manager/
H16 one_diag_dump Host_H16_one_diag_dump/
H17 node_nvme_log_dump Host_H17_node_nvme_log_dump/
H18 node_os_info Host_H18_cat_etc_os-release.txt
H19 node_memory_info Host_H19_memory_info/
HealthCheck
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
C1 out_of_band_health_check HealthCheck_C1_out_of_band_health_check.json
HGX-HMC
$ ./nvdebug -l -t HGX-HMC
Redfish
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
R1 system_event_log Redfish_R1_system_event_log_{system_id}.json
R2 manager_existing_log_dump Redfish_R2_existing_dump_{id}.tar.xz
R3 hgx_manager_on_demand_log_dump Redfish_R3_hgx_manager_dump_{manager_id}_{task_id}.tar.xz
R4 manager_journal_log Redfish_R4_journal_log_entries_{manager_id}.json
R5 manager_fpga_register_dump Redfish_R5_fpga_dump_{system_id}_{task_id}.tar.xz
R6 manager_erot_dump Redfish_R6_erot_dump_{system_id}_{task_id}.tar.xz
R7 hgx_manager_self_test_report Redfish_R7_hgx_manager_{system_id}_selftest_{task_id}.tar.xz
-> R7 Supported Boards: Hopper-HGX-8-GPU, GH200 NVL, MGX-GH200, MGX C2, MGX-GH200-NVL2, MGX-PCIE-NVL16, DC-Hopper-PCIe
R8 firmware_inventory Redfish_R8_firmware_inventory.json
R9 firmware_inventory_expand_query Redfish_R9_firmware_inventory_expand_query.json
R10 chassis_info Redfish_R10_chassis_info.json
R11 chassis_expand_query Redfish_R11_chassis_expand_query.json
R12 system_info Redfish_R12_system_info.json
R13 system_expand_query Redfish_R13_system_expand_query.json
R14 manager_info Redfish_R14_manager_info.json
R15 manager_expand_query Redfish_R15_manager_expand_query.json
R16 hgx_manager_retimer_dump Redfish_R16_hgx_retimer_dump_{system_id}_{task_id}.tar.xz
R18 telemetry_metric_reports Redfish_R18_report_{metric_report}.json
R19 chassis_thermal_metrics Redfish_R19_chassis_{chassis}_thermal_metrics.json
R20 firmware_inventory_table Redfish_R20_firmware_inventory_table.txt
R22 task_details Redfish_R22_task_{task_id}.json
R23 nvlink_oob_logs Redfish_R23_NVLINK_OOB_Log_{id}.json
R24 hgx_system_fw_attributes_dump Redfish_R24_hgx_system_{system}_fw_attributes_{task_id}.tar.xz
R25 additional_oob_logs Redfish_R25_OOB_Log_{id}.json
R26 chassis_certificates Redfish_R26_chassis_{chassis_id}_certificate.json
R27 spdm_erot_measurements Redfish_R27_spdm_{erot_id}_index_{index}.json
R28 hgx_system_hardware_checkout_dump Redfish_R28_hgx_system_{system}_hardware_checkout_{task_id}.tar.xz
R29 background_copy_status Redfish_R29_{chassis_id}_copy_status.json
R30 software_inventory Redfish_R30_software_inventory.json
R31 hmc_fdr_log_dump Redfish_R31_{system_id}_hmc_fdr_log_dump
R35 network_device_debug_dump Redfish_R35_network_device_debug_dump
-> R35 Supported Boards: Blackwell-HGX-8-GPU, HGX B300
R36 network_switch_debug_dump Redfish_R36_network_switch_debug_dump
-> R36 Supported Boards: Blackwell-HGX-8-GPU, HGX B300
R37 gpu_debug_dump Redfish_R37_gpu_debug_dump
R38 gpu_diagnostic_dump Redfish_R38_gpu_diagnostic_dump
Note: Collecting HGX board logs through Redfish requires TCP port forwarding support on the host BMC.
IPMI
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
I1 mc_info IPMI_I1_mc_info.txt
I2 lan_info IPMI_I2_lan_info.txt
I3 session_info IPMI_I3_session_info.txt
I4 fru_info IPMI_I4_fru_info.txt
I5 sdr_info IPMI_I5_sdr_info.txt
I6 sel_info IPMI_I6_sel_info.txt
I7 sensor_list IPMI_I7_sensor_list.txt
I8 sel_list IPMI_I8_sel_list.txt
I9 sel_raw_dump IPMI_I9_sel_raw_dump.txt
I10 chassis_status IPMI_I10_chassis_status.txt
I11 chassis_restart_cause IPMI_I11_chassis_restart_cause.txt
I12 user_list IPMI_I12_user_list.txt
I13 channel_info IPMI_I13_channel_info.txt
I14 sdr_elist IPMI_I14_sdr_elist.txt
SSH
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
S14 hmc_boot_progress BMC_SSH_S14_hmc_boot_progress.txt
-> S14 Supported Boards: Unknown
S15 bmc_power_status BMC_SSH_S15_bmc_power_status/...
Host
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
H1 node_dmesg Host_H1_node_dmesg.tar.gz
H2 node_lspci Host_H2_node_lspci*.txt
H3 node_smbios Host_H3_dmidecode*.txt
H4 node_lshw Host_H4_lshw*.txt
H5 node_nvidia_smi Host_H5_nvidia-smi*.txt
H6 node_kern_log Host_H6_node_kern_log.tar.gz
H7 node_crash_dump Host_H7_node_crash_dump.tar.gz
H8 node_nvme_list Host_H8_nvme_list_-v.txt
H9 node_fabric_manager_log Host_H9_fabricmanager.log
H10 node_nvflash_log Host_H10_nvflash_--check_-i_{num}.txt
H11 nvidia_bug_report Host_H11_nvidia_bug_report_op.log.gz
H15 node_subnet_manager Host_H15_node_subnet_manager/
H16 one_diag_dump Host_H16_one_diag_dump/
H17 node_nvme_log_dump Host_H17_node_nvme_log_dump/
H18 node_os_info Host_H18_cat_etc_os-release.txt
H19 node_memory_info Host_H19_memory_info/
HealthCheck
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
C1 out_of_band_health_check HealthCheck_C1_out_of_band_health_check.json
NVSwitch
$ ./nvdebug -l -t NVSwitch
Redfish
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
R1 system_event_log Redfish_R1_system_event_log_{system_id}.json
R2 manager_existing_log_dump Redfish_R2_existing_dump_{id}.tar.xz
R8 firmware_inventory Redfish_R8_firmware_inventory.json
R9 firmware_inventory_expand_query Redfish_R9_firmware_inventory_expand_query.json
R10 chassis_info Redfish_R10_chassis_info.json
R11 chassis_expand_query Redfish_R11_chassis_expand_query.json
R12 system_info Redfish_R12_system_info.json
R13 system_expand_query Redfish_R13_system_expand_query.json
R14 manager_info Redfish_R14_manager_info.json
R15 manager_expand_query Redfish_R15_manager_expand_query.json
R19 chassis_thermal_metrics Redfish_R19_chassis_{chassis}_thermal_metrics.json
R20 firmware_inventory_table Redfish_R20_firmware_inventory_table.txt
R22 task_details Redfish_R22_task_{task_id}.json
R25 additional_oob_logs Redfish_R25_OOB_Log_{id}.json
R26 chassis_certificates Redfish_R26_chassis_{chassis_id}_certificate.json
R29 background_copy_status Redfish_R29_{chassis_id}_copy_status.json
R30 software_inventory Redfish_R30_software_inventory.json
IPMI
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
NA NA NA
SSH
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
S2 bmc_dmesg BMC_SSH_S2_bmc_dmesg.txt
S3 network_info BMC_SSH_S3_network_info/...
S5 bmc_list_kernel_modules BMC_SSH_S5_bmc_list_kernel_modules.txt
S11 uptime BMC_SSH_S11_uptime.txt
Host
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
H1 node_dmesg Host_H1_node_dmesg.tar.gz
H2 node_lspci Host_H2_node_lspci*.txt
H3 node_smbios Host_H3_dmidecode*.txt
H6 node_kern_log Host_H6_node_kern_log.tar.gz
H7 node_crash_dump Host_H7_node_crash_dump.tar.gz
H9 node_fabric_manager_log Host_H9_fabricmanager.log
H12 nvos_inventory Host_H12_nv_show_platform_{type}.txt
H13 nvos_gnmi_config Host_H13_nvos_gnmi_config.txt
H14 nvos_tech_support_dump Host_H14_nvos_tech_support_dump/
H19 node_memory_info Host_H19_memory_info/
HealthCheck
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
NA NA NA
PowerShelf
$ ./nvdebug -l -t PowerShelf
Redfish
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
R1 system_event_log Redfish_R1_system_event_log_{system_id}.json
R2 manager_existing_log_dump Redfish_R2_existing_dump_{id}.tar.xz
R8 firmware_inventory Redfish_R8_firmware_inventory.json
R9 firmware_inventory_expand_query Redfish_R9_firmware_inventory_expand_query.json
R10 chassis_info Redfish_R10_chassis_info.json
R11 chassis_expand_query Redfish_R11_chassis_expand_query.json
R14 manager_info Redfish_R14_manager_info.json
R15 manager_expand_query Redfish_R15_manager_expand_query.json
R19 chassis_thermal_metrics Redfish_R19_chassis_{chassis}_thermal_metrics.json
R20 firmware_inventory_table Redfish_R20_firmware_inventory_table.txt
R22 task_details Redfish_R22_task_{task_id}.json
R25 additional_oob_logs Redfish_R25_OOB_Log_{id}.json
R26 chassis_certificates Redfish_R26_chassis_{chassis_id}_certificate.json
R33 power_equipment_info Redfish_R33_power_equipment_info
R34 power_equipment_expand_query Redfish_R34_power_equipment_expand_query
IPMI
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
NA NA NA
SSH
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
NA NA NA
Host
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
NA NA NA
HealthCheck
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
NA NA NA
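When scripting against these listings, it can help to turn the `nvdebug -l -t <platform>` output into a lookup table of collector IDs. The Python sketch below is a hypothetical helper (Python 3.8 or later) that assumes the layout shown above: a group name on its own line, a header and separator line, one whitespace-separated row per collector, optional `-> ... Supported Boards:` lines, and `NA` rows for empty groups. The sample text is an excerpt of the arm64 listing.

```python
import re

# Group headings as they appear in the listings above.
GROUPS = {"Redfish", "IPMI", "SSH", "Host", "HealthCheck"}
# One collector row: CID, collector name, log location (three whitespace-separated fields).
ROW = re.compile(r"^([A-Z]\d+)\s+(\S+)\s+(\S+)$")


def parse_collector_list(text):
    """Return {group: {cid: {"name", "log", "boards"}}} from `nvdebug -l` output."""
    result, group, last_cid = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line in GROUPS:
            group, last_cid = line, None
            result[group] = {}
        elif group is None:
            continue  # skip the command line before the first group heading
        elif (m := ROW.match(line)):
            last_cid, name, log = m.groups()
            result[group][last_cid] = {"name": name, "log": log, "boards": None}
        elif line.startswith("->") and "Supported Boards:" in line and last_cid:
            # Attach the board list to the collector row directly above the "->" line.
            boards = line.split("Supported Boards:", 1)[1]
            result[group][last_cid]["boards"] = [b.strip() for b in boards.split(",") if b.strip()]
    return result


sample = """$ ./nvdebug -l -t arm64
Redfish
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
R1 system_event_log Redfish_R1_system_event_log_{system_id}.json
R16 hgx_manager_retimer_dump Redfish_R16_hgx_retimer_dump_{system_id}_{task_id}.tar.xz
-> R16 Supported Boards: Blackwell-HGX-8-GPU, HGX B300, Hopper-HGX-8-GPU
IPMI
CID Collector Name Log Location
------+-------------------------------------+-------------------------------
NA NA NA
"""

collectors = parse_collector_list(sample)
print(sorted(collectors["Redfish"]))           # ['R1', 'R16']
print(collectors["Redfish"]["R16"]["boards"])  # ['Blackwell-HGX-8-GPU', 'HGX B300', 'Hopper-HGX-8-GPU']
print(collectors["IPMI"])                      # {}
```

The sketch attaches each `->` line to the row directly above it instead of trusting the collector ID repeated on that line, so a mislabeled board line still lands on the right collector.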
Running Specific Collector Groups and Collectors
Run the Redfish Collector Group
$ nvdebug -i $BMC_IP -u $BMC_USER -p $BMC_PASS -t $TARGET -g Redfish
Run a Specific Collector (Firmware Inventory, R8)
$ nvdebug -i $BMC_IP -u $BMC_USER -p $BMC_PASS -t $TARGET -S R8
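In automation, the same invocations can be assembled programmatically. The Python sketch below builds the argument list for the two commands above using only the flags they show (`-i`, `-u`, `-p`, `-t`, `-g`, `-S`); the function name, the `binary` parameter, and the environment variable names are hypothetical. This sketch allows either a group or a single collector per call, and it only builds the list; pass it to `subprocess.run` to actually execute nvdebug.

```python
import os


def nvdebug_cmd(bmc_ip, user, password, target, group=None, collector=None, binary="nvdebug"):
    """Build an nvdebug argument list for one collector group (-g) or one collector (-S)."""
    if (group is None) == (collector is None):
        # This sketch supports exactly one selection mode per invocation.
        raise ValueError("pass exactly one of group or collector")
    cmd = [binary, "-i", bmc_ip, "-u", user, "-p", password, "-t", target]
    cmd += ["-g", group] if group is not None else ["-S", collector]
    return cmd


# Hypothetical environment variables named after the shell variables above,
# with placeholder fallbacks for illustration only.
conn = (
    os.environ.get("BMC_IP", "192.168.1.100"),
    os.environ.get("BMC_USER", "bmc_user"),
    os.environ.get("BMC_PASS", "bmc_password"),
    os.environ.get("TARGET", "arm64"),
)
redfish_cmd = nvdebug_cmd(*conn, group="Redfish")
firmware_cmd = nvdebug_cmd(*conn, collector="R8")
print(redfish_cmd[-2:])   # ['-g', 'Redfish']
print(firmware_cmd[-2:])  # ['-S', 'R8']
# To execute: subprocess.run(firmware_cmd, check=True)
```

Set `binary="./nvdebug"` when the tool is run from the current directory rather than from a directory on `PATH`.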
Configuration Considerations
Running nvdebug in IPv6 Networks
By default, nvdebug uses IPv4. To use IPv6, set IP_NETWORK to 'ipv6' in the DUT configuration file. Enter the BMC and host IPv6 addresses (BMC_IP, HOST_IP) without square brackets, for example 2001:db8::10 rather than [2001:db8::10].
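A small pre-check can catch bracketed or malformed addresses before they are written into the DUT configuration. The Python sketch below uses the standard `ipaddress` module; the helper name is hypothetical. It strips surrounding square brackets, validates the address (raising `ValueError` if it is invalid), and reports which IP_NETWORK value the address implies.

```python
import ipaddress


def normalize_dut_ip(value):
    """Return (address_without_brackets, ip_network_setting) for a BMC or host address."""
    # Remove whitespace and any surrounding [ ] before validating.
    addr = ipaddress.ip_address(value.strip().strip("[]"))  # raises ValueError if invalid
    return str(addr), "ipv6" if addr.version == 6 else "ipv4"


print(normalize_dut_ip("[2001:db8::10]"))  # ('2001:db8::10', 'ipv6')
print(normalize_dut_ip("192.168.1.100"))   # ('192.168.1.100', 'ipv4')
```

`str()` on an IPv6 address returns the compressed form, so an input such as 2001:0db8::0010 is written back as 2001:db8::10.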