NVIDIA BlueField Platform Software Troubleshooting Guide

Security

This guide outlines how to resolve boot failures on BlueField DPUs caused by a corrupted or missing Microsoft UEFI certificate when Secure Boot is enabled. It includes preparation steps to proactively craft a recovery image and resolution steps to restore the certificate without disabling Secure Boot.

Command | Description
dpu_golden_image | BMC utility used to retrieve and store a copy of the current BlueField Arm BFB image
mlx-mkbfb | BlueField Arm utility used to inject EFI capsules into BFB images
cat new_arm_golden_image.bfb > /dev/rshimX/boot | Writes a BFB image from the DPU BMC to the BlueField Arm over RShim, where X is the appropriate RShim device

User has Secure Boot enabled and Microsoft DB certificate gets corrupted or deleted

If the Microsoft UEFI certificate is missing or corrupted, the BlueField Arm will fail to boot with Secure Boot enabled. The output will resemble:

3 seconds remain...
2 seconds remain...
1 seconds remain...
0 seconds remain...
Failed to boot 'ubuntu0'    <<=== Fails here
Failed to boot 'NET-NIC_P0-IPV4'
Failed to boot 'NET-NIC_P0-IPV6'

>>Start PXE over IPv4
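When many DPUs need to be triaged, the failure signature above can be picked out of a captured console log. The following is a minimal sketch, assuming the console output has been saved to a file or is piped in with one message per line; the function name list_failed_boot_entries is hypothetical and not part of any BlueField tool.

```shell
# Hypothetical helper: print the name of every boot entry that the UEFI boot
# manager reported as failed, reading captured console output from stdin.
list_failed_boot_entries() {
    # Matches lines such as: Failed to boot 'ubuntu0'
    sed -n "s/.*Failed to boot '\([^']*\)'.*/\1/p"
}

# Example (console.log is an assumed file name):
#   list_failed_boot_entries < console.log
```

A failed boot entry on its own does not prove that the certificate is missing; it only flags DPUs that are worth checking against the symptoms described in this section.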

The UEFI signature database (db) contains a list of trusted X.509 certificates and hashes used to validate binaries during boot. In this case, the SHIM EFI binary (shim.efi or shimaa64.efi) is signed with Microsoft's certificate; because that certificate is missing from the database or corrupted, the binary cannot be authenticated.

Example output from mokutil showing a typical database:

root@dpu-arm:~# mokutil --db | grep "Subject:"
        Subject: C=US, ST=MA, L=Westborough, O=NVIDIA Corporation, OU=BlueField Secure Boot, CN=NVIDIA BlueField Secure Boot UEFI db Signing 2021
        Subject: C=US, ST=CA, L=Santa Clara, O=NVIDIA Corporation, OU=NBU, CN=NVIDIA BlueField Secure Boot EFI Signing 2022-A
        Subject: C=US, ST=Washington, L=Redmond, O=Microsoft Corporation, CN=Microsoft Corporation UEFI CA 2011    <<<==== This is corrupted or missing
        Subject: C=US, ST=California, L=Palo Alto, O=VMware, Inc., CN=VMware Secure Boot Signing
        Subject: C=GB, ST=Isle of Man, L=Douglas, O=Canonical Ltd., CN=Canonical Ltd. Master Certificate Authority
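On a DPU that still boots, the same listing can be checked proactively. The sketch below assumes the certificate subjects are supplied on stdin in the format shown above; the function name db_has_microsoft_uefi_ca is hypothetical.

```shell
# Hypothetical helper: succeed only if the Microsoft UEFI CA appears among the
# certificate subjects read from stdin (for example, `mokutil --db` output).
db_has_microsoft_uefi_ca() {
    grep -q "Subject:.*CN=Microsoft Corporation UEFI CA 2011"
}

# Example (on the BlueField Arm):
#   mokutil --db | db_has_microsoft_uefi_ca || echo "Microsoft UEFI CA not found in db"
```

This only tests for the presence of the subject line. It does not detect a certificate that is present but corrupted.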

The SHIM binary is typically signed by Microsoft:

shimaa64.efi (Microsoft)
signature 1
image signature issuers:
 - /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011    <<<======== Signed by this module
image signature certificates:
 - subject: /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Windows UEFI Driver Publisher
   issuer:  /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011
 - subject: /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011
   issuer:  /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation Third Party Marketplace Root
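To compare the signer of a binary against the db contents, the issuer lines can be extracted from a listing like the one above. This is a sketch that assumes the listing is available as text with one field per line, in the layout shown here; the tool that produced it is not named in this guide, and list_signature_issuers is a hypothetical name.

```shell
# Hypothetical helper: print the entries under "image signature issuers:"
# from a signature listing read on stdin.
list_signature_issuers() {
    awk '/^image signature issuers:/      { in_issuers = 1; next }
         /^image signature certificates:/ { in_issuers = 0 }
         in_issuers && /^ *- /            { sub(/^ *- /, ""); print }'
}
```

Each printed issuer must have a matching certificate in the db for the binary to be authenticated under Secure Boot.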

To preserve Secure Boot integrity and enable large-scale recovery, a customer solution was developed to restore the missing Microsoft certificate using the DPU BMC—without disabling Secure Boot.

Solution

To recover from a missing Microsoft certificate, the BlueField Arm BFB image must be updated with the appropriate EFI capsule (efi_sbkeysync.cap) which includes the required certificate.

Note

This process assumes that a recovery image is prepared before the problem occurs. A single BFB image may be reused across all affected DPUs if they share the same configuration.


Prerequisites

  • Python 3 must be installed on the BlueField Arm.

  • The EFI capsule is available at:

    /usr/lib/firmware/mellanox/boot/capsule/efi_sbkeysync.cap

  • The mlx-mkbfb tool is installed:

    root@dpu-arm:~# which mlx-mkbfb
    /usr/bin/mlx-mkbfb
    root@dpu-arm:~# python3 --version
    Python 3.10.12
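The prerequisites above can be checked in one pass. This is a minimal sketch, assuming a POSIX shell on the BlueField Arm; the function name missing_prereqs and its messages are hypothetical.

```shell
# Hypothetical helper: print each missing prerequisite, one per line.
# $1 is the capsule path; the remaining arguments are command names to look up.
# The body runs in a subshell so its variables do not leak into the caller.
missing_prereqs() (
    capsule=$1
    shift
    [ -r "$capsule" ] || echo "capsule not readable: $capsule"
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "command not found: $tool"
    done
)

# Example (on the BlueField Arm):
#   missing_prereqs /usr/lib/firmware/mellanox/boot/capsule/efi_sbkeysync.cap mlx-mkbfb python3
```

Empty output means every listed item was found.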

Preparation Steps

These steps must be performed in advance, and the resulting recovery image must be stored on the DPU BMC:

  1. Create a golden image on the DPU BMC (if one does not already exist):

    root@dpu-bmc:~# dpu_golden_image golden_image_arm -r /tmp/arm_golden_image.bfb

  2. Verify the image was created:

    root@dpu-bmc:~# ls -l /tmp/arm_golden_image.bfb
    -rw-r--r--    1 root     root      14713136 Jul  7 13:55 /tmp/arm_golden_image.bfb

  3. Copy the golden image from the BMC to the BlueField Arm:

    root@dpu-arm:~# scp root@<bmc_ip>:/tmp/arm_golden_image.bfb ~/.

  4. Craft a new BFB image using the Microsoft EFI capsule:

    root@dpu-arm:~# /usr/bin/mlx-mkbfb --capsule /usr/lib/firmware/mellanox/boot/capsule/efi_sbkeysync.cap arm_golden_image.bfb new_arm_golden_image.bfb

    Info

    This injects a valid Microsoft UEFI certificate into the new BFB.

  5. Verify the new image was created:

    root@dpu-arm:~# ls -l new_arm_golden_image.bfb
    -rw-r--r-- 1 root root 7366872 Jul  7 14:54 new_arm_golden_image.bfb

  6. Copy the new image back to the BMC (or to other BMCs as needed):

    root@dpu-arm:~# scp new_arm_golden_image.bfb root@<bmc_ip>:/tmp/.
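Steps 4 and 5 can be combined into a guarded wrapper so that a failed build is not mistaken for a usable recovery image. The sketch below uses the mlx-mkbfb invocation exactly as shown in step 4; the function name build_recovery_bfb and the MKBFB override variable are hypothetical.

```shell
# Path to mlx-mkbfb; MKBFB is a hypothetical override for non-default installs.
MKBFB=${MKBFB:-/usr/bin/mlx-mkbfb}

# Hypothetical wrapper: build a recovery BFB from a golden image and a capsule.
# Usage: build_recovery_bfb <capsule> <golden_image.bfb> <output.bfb>
build_recovery_bfb() {
    capsule=$1 src=$2 dst=$3
    [ -s "$src" ] || { echo "golden image missing or empty: $src" >&2; return 1; }
    [ -r "$capsule" ] || { echo "capsule not readable: $capsule" >&2; return 1; }
    "$MKBFB" --capsule "$capsule" "$src" "$dst" || return 1
    # Refuse to report success if no non-empty output image was written.
    [ -s "$dst" ] || { echo "no output image produced: $dst" >&2; return 1; }
}

# Example (on the BlueField Arm):
#   build_recovery_bfb /usr/lib/firmware/mellanox/boot/capsule/efi_sbkeysync.cap \
#       arm_golden_image.bfb new_arm_golden_image.bfb
```

The wrapper only confirms that an output file exists and is not empty. It does not validate the contents of the BFB.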

Resolution Steps

The following steps can be triggered after the Microsoft certificate is lost or corrupted:

  1. Stop RShim on the x86 host (to allow BMC access):

    [root@x86-host]# systemctl stop rshim

  2. Confirm RShim is inactive:

    [root@x86-host]# systemctl status rshim | grep -i Active
       Active: inactive (dead)

  3. Start RShim on the DPU BMC (the command below also restarts it if it is already running):

    root@dpu-bmc:~# systemctl restart rshim

  4. Confirm RShim is active on the BMC:

    root@dpu-bmc:~# systemctl status rshim | grep -i Active
         Active: active (running)

  5. Write the new recovery image to the BlueField Arm:

    root@dpu-bmc:~# cat /tmp/new_arm_golden_image.bfb > /dev/rshim0/boot

  6. Observe the console output during boot:

    FmpDxe: EFI Capsule Authentication Successful, Status: Success.
    [PMI] DB update started.
    Enable Custom Mode, Status: Success
    Enroll key, Status: Success
    ...
    [PMI] Total number of updates: 6
    [PMI] Errors during updates : 0
    CapsuleRuntimeDxe: ProcessCapsuleImage 0, Status: Success
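For recovery at scale, steps 5 and 6 can be sketched as two small helpers: one guards the write to the RShim boot device, the other checks a captured console log for the success messages shown above. Both function names (push_bfb, capsule_update_ok) are hypothetical. The sketch also assumes the console output has been captured to a file with one message per line; how it is captured is outside the scope of this guide.

```shell
# Hypothetical helper: write a BFB image to an RShim boot device after basic checks.
# Usage: push_bfb <image.bfb> <rshim boot device, e.g. /dev/rshim0/boot>
push_bfb() {
    image=$1 boot_dev=$2
    [ -s "$image" ] || { echo "image missing or empty: $image" >&2; return 1; }
    [ -e "$boot_dev" ] || { echo "RShim boot device not found: $boot_dev" >&2; return 1; }
    cat "$image" > "$boot_dev"
}

# Hypothetical helper: succeed only if the console output on stdin shows that
# the capsule was processed and the db update reported zero errors.
capsule_update_ok() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q "ProcessCapsuleImage 0, Status: Success" || return 1
    printf '%s\n' "$log" | grep -q "Errors during updates *: 0" || return 1
}

# Example (on the DPU BMC; console.log is an assumed capture file):
#   push_bfb /tmp/new_arm_golden_image.bfb /dev/rshim0/boot
#   capsule_update_ok < console.log && echo "certificate restore reported success"
```

Checking for the device node before writing avoids creating a regular file by accident when RShim is not running on the BMC.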

The Microsoft UEFI certificate is restored, and the BlueField Arm should now boot successfully with Secure Boot enabled.

© Copyright 2025, NVIDIA. Last updated on Jul 17, 2025.