NVIDIA BlueField Platform Software Troubleshooting Guide

About This Document

This guide provides troubleshooting information for common issues and misconfigurations encountered when using BlueField Platform Software.

Technical Support

Customers who purchased NVIDIA products directly from NVIDIA are invited to contact us through the following methods:

Customers who purchased NVIDIA M-1 Global Support Services should refer to their contract for details regarding technical support.

Customers who purchased NVIDIA products through an NVIDIA-approved reseller should first seek assistance through their reseller.

Glossary

Term

Description

ACE

AXI coherency extensions

ACPI

Advanced configuration and power interface

AMBA®

Advanced microcontroller bus architecture

ARB

Arbitrate

ATF

Arm Trusted Firmware

AXI4

Advanced eXtensible Interface 4

BDF address

Bus, device, function address. This is the device's PCIe bus address to uniquely identify the specific device.

BERT

Boot error record table

BF_INST_DIR

The directory where the BlueField software is installed

BFB

BlueField bootstream

BMC

Board management controller

BSD

BlueField software distribution

BSP

BlueField support package

BUF

Buffer

CBS

Committed burst size

CHI

Coherent hub interface; Arm® protocol used over the BlueField Skymesh specification

CIR

Committed information rate

CL

Cache line

CMDQ

Command queue

CMO

Cache maintenance operation

COB

Collision buffer

DAT

Data

DEK

Data encryption key

DMA

Direct memory access

DOCA

DPU SDK

DOT

Device ownership transfer

DPA

Data path accelerator; an auxiliary processor designed to accelerate data-path operations

DPDK

Data plane development kit

DPI

Deep packet inspection

DPU

Data processing unit, the third pillar of the data center with CPU and GPU

DVM

Distributed virtual memory

DW

Dword

EBS

Excess burst size

ECPF

Embedded CPU physical function

EIR

Excess information rate

EMEM/EMI

External memory interface; block in the MSS which performs the actual read/write from the DDR device

eMMC

Embedded multi-media card

ESP

EFI system partition

ESP header

Encapsulating security payload

EU

Execution unit. HW thread; a logical DPA processing unit.

FIPS

Federal Information Processing Standards

FPGA

Field-programmable gate array

FS

File system

FW

Firmware

GDB

GNU debugger

GPT

GUID partition table

HCA

Host channel adapter

HNF

Home node interface

Host

In this documentation, "the host" refers to the server host. The Arm-based host is always called out explicitly as "Arm host".

  • Server host OS refers to the operating system running on the host server (Linux or Windows)

  • Arm host refers to the AArch64 Linux OS running on the BlueField Arm cores

HW

Hardware

hwmon

Hardware monitoring

IB

InfiniBand

ICM

Interface configuration memory

IKE

Internet key exchange

IPMB

Intelligent platform management bus

IPMI

Intelligent platform management interface

IR

Intermediate representation

KGDB

Kernel debugger

KGDBOC

Kernel debugger over console

LAT

Latency

LCRD

Link credit

LSO

Large send offload

LTO

Link-time optimization

MMIO

Memory-mapped I/O

MSB

Most significant bit

MSS

Memory subsystem

MST

Mellanox software tools

NAT

Network address translation

NIC

Network interface card

NIST

National Institute of Standards and Technology

NS

Namespace

OCD

On-chip debugger

OOB

Out-of-band

OS

Operating system

OVS

Open vSwitch

PBS

Peak burst size

PCIe

PCI Express; Peripheral Component Interconnect Express

PF

Physical function

PIR

Peak information rate

PK

Platform key

PKA

Public key accelerator

POC

Point of coherence

RD

Read

RDMA

Remote direct memory access

RegEx

Regular expression

REQ

Request

RES

Response

RMC

Remote management controller

RN

Request node

RN-F – Fully coherent request node

RN-D – IO coherent request node with DVM support

RN-I – IO coherent request node

RNG

Random number generator/generation

RoCE

RDMA over converged Ethernet

RQ

Receive queue

RShim

Random Shim

RTT

Round-trip time

RX

Receive

SA

Security association

SBSA

Server base system architecture

SDK

Software development kit

SF

Sub-function or scalable function

SG

Scatter-gather

SHA

Secure hash algorithm

SMMU

System memory management unit

SNP

Snooping

SQ

Send queue

SR-IOV

Single-root IO virtualization

STL

Stall

Sync event

Synchronization event

TBU

Translation buffer unit

TIR

Transport interface receive

TIS

Transport interface send

TLS

Transport layer security

TRB

Trail buffer

TSO

TCP segmentation offload

TSO

Total store order

TX

Transmit

UDS

Unix domain socket

UEFI

Unified extensible firmware interface

UPVS

UEFI persistent variable store

VF

Virtual function

VFE

Virtio full emulation

VM

Virtual machine

VPI

Virtual protocol interconnect

VST

Virtual switch tagging

WorkQ or workq

Work queue

WQE

Work queue element

WR

Write

WRDB

Write data buffer
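
As an illustration of the BDF address entry in the glossary above, the following minimal sketch splits a PCIe address string into its parts. It assumes the textual form commonly printed by Linux tools such as lspci, [domain:]bus:device.function, with hexadecimal fields. The names BDF and parse_bdf, and the sample address, are hypothetical and are not part of BlueField Platform Software.

```python
import re
from typing import NamedTuple


class BDF(NamedTuple):
    """Components of a PCIe address (hypothetical helper type)."""
    domain: int
    bus: int
    device: int
    function: int


# Assumed textual form: [dddd:]bb:dd.f, all fields hexadecimal.
_BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])$"
)


def parse_bdf(text: str) -> BDF:
    """Parse a BDF string; the domain defaults to 0 when it is omitted."""
    match = _BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a BDF address: {text!r}")
    device = int(match["device"], 16)
    # The device number is a 5-bit field, so it cannot exceed 0x1f.
    if device > 0x1F:
        raise ValueError(f"device number out of range: {text!r}")
    return BDF(
        domain=int(match["domain"] or "0", 16),
        bus=int(match["bus"], 16),
        device=device,
        function=int(match["function"], 16),
    )


if __name__ == "__main__":
    # Sample address chosen for illustration only.
    print(parse_bdf("0000:03:00.1"))
```

A helper of this kind can be useful when comparing addresses reported by different tools, since some print the domain prefix and others omit it.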

© Copyright 2024, NVIDIA. Last updated on Sep 19, 2024.