DOCA DevEmu PCI
This library is supported at alpha level; backward compatibility is not guaranteed.
DOCA DevEmu PCI is part of the DOCA Device Emulation subsystem. It provides low-level software APIs that allow management of an emulated PCIe device using the emulation capability of NVIDIA® BlueField® networking platforms.
It is a common layer for all PCIe emulation modules, such as DOCA DevEmu PCIe Generic Emulation and DOCA DevEmu Virtio subsystem emulation.
This library follows the architecture of a DOCA Core Context. It is recommended to read the following sections beforehand:
Generic device emulation is part of DOCA device emulation. It is recommended to read the following guides beforehand:
DOCA DevEmu PCI Emulation is supported only on the BlueField target. The BlueField must meet the following requirements:
DOCA version 2.7.0 or greater
BlueField-3 firmware 32.41.1000 or higher
Please refer to the DOCA Backward Compatibility Policy.
The library must be run with root privileges.
Perform the following:
Configure the BlueField to work in DPU mode as described in NVIDIA BlueField Modes of Operation.
Enable the PCIe switch emulation capability needed for hot plugging emulated PCIe devices. This can be done by running the following command on the host or BlueField:
host/bf> sudo mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_SWITCH_EMULATION_ENABLE=1
Perform a BlueField system-level reset for the mlxconfig settings to take effect.
To support the hot-plug feature, the host must have the following boot parameters:
Intel CPU:
intel_iommu=on iommu=pt pci=realloc
AMD CPU:
iommu=pt pci=realloc
This can be done using the following steps:
This process may vary depending on the host OS. Users can find multiple guides online describing this process.
Add the boot parameters:
host> sudo nano /etc/default/grub
Find the variable: GRUB_CMDLINE_LINUX_DEFAULT="<existing-params>"
Add the parameters at the end: GRUB_CMDLINE_LINUX_DEFAULT="<existing-params> intel_iommu=on iommu=pt pci=realloc"
Update the configuration:
For Ubuntu:
host> update-grub
For RHEL:
host> grub2-mkconfig -o /boot/grub2/grub.cfg
Perform a warm boot.
Confirm that the parameters are in effect:
host> cat /proc/cmdline
<existing-params> intel_iommu=on iommu=pt pci=realloc
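The confirmation step above can also be scripted. The following is a minimal sketch, not part of DOCA: the helper name `missing_boot_params` and the `REQUIRED_BOOT_PARAMS` table are hypothetical, and the parameter lists are copied from the Intel and AMD requirements stated earlier in this section.

```python
# Hypothetical helper for illustration only; it is not a DOCA or mlxconfig tool.
# The parameter lists mirror the Intel/AMD boot parameters listed above.
REQUIRED_BOOT_PARAMS = {
    "intel": ["intel_iommu=on", "iommu=pt", "pci=realloc"],
    "amd": ["iommu=pt", "pci=realloc"],
}


def missing_boot_params(cmdline, cpu_vendor):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [p for p in REQUIRED_BOOT_PARAMS[cpu_vendor] if p not in present]
```

On the host, the function could be given the contents of /proc/cmdline (for example, `missing_boot_params(open("/proc/cmdline").read(), "intel")`); an empty list means all required parameters are in effect.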
The DOCA DevEmu PCI library provides two main software abstractions: the PCIe type and the PCIe device. The PCIe type represents the configuration of the emulated device, while the PCIe device represents an instance of an emulated device. Every PCIe device instance must be associated with exactly one PCIe type, while a PCIe type can be associated with many PCIe devices.
Pre-Defined PCI Type vs. Generic PCI Type
A PCIe type object can be acquired in two different ways:
Acquire a pre-defined type, using emulation libraries of existing protocols such as DOCA DevEmu Virtio FS library
Create from scratch using the DOCA DevEmu Generic library
In the case of a pre-defined type, the configurability of the type is limited.
PCIe Type Name
As part of the DOCA PCIe emulation, every type has a name assigned to it. This property is not part of the PCIe specification, but rather it is a mechanism in DOCA that uniquely identifies the PCIe type.
No two PCIe types can share the same name, even across different processes, unless the type in the second process is configured identically to the first. Attempting to configure a second type with the same name but a slightly different configuration fails.
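The naming rule can be pictured with a small model. This is a sketch under stated assumptions, not DOCA code: `TypeRegistry`, `TypeNameConflict`, and the configuration keys are hypothetical names that stand in for the system-wide view of PCIe types shared by all processes.

```python
class TypeNameConflict(Exception):
    """Raised when a type name is reused with a different configuration."""


class TypeRegistry:
    """Toy stand-in for the view of PCIe type names shared across processes.

    Hypothetical model for illustration; it does not call any DOCA API.
    """

    def __init__(self):
        self._types = {}  # type name -> order-independent snapshot of its configuration

    def register(self, name, config):
        snapshot = tuple(sorted(config.items()))
        existing = self._types.get(name)
        if existing is not None and existing != snapshot:
            # Same name, different configuration: rejected.
            raise TypeNameConflict(
                f"type {name!r} already exists with a different configuration"
            )
        # New name, or same name with an identical configuration: accepted.
        self._types[name] = snapshot
```

Registering `"my_type"` twice with the same configuration succeeds, while a second registration that changes any field raises `TypeNameConflict`.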
Create Emulated Device
After configuring the desired DOCA DevEmu PCIe type, it is possible to create an emulated device based on the configured type using doca_devemu_pci_dev_create_rep. This sequential process ensures that the DOCA DevEmu PCIe device is created with the parameters and configuration defined by the PCIe type object. The emulated device can later be destroyed using doca_devemu_pci_dev_destroy_rep.
The created device representor starts in the "power_off" state and is not visible to the host until the user issues a hot-plug sequence (see Hot-plug Emulated Device). The device can be destroyed only while in the "power_off" state.
The created emulated device may outlive the application that created it, see Objects Lifecycle and Persistency.
Hot-plug Emulated Device
Hot-plugging refers to the process of emulating the physical attachment of a PCIe device to the host PCIe subsystem after the system has been powered on and initialized. Note that some operating systems require additional settings to enable hot-plugging of a PCIe device. For supported systems, this feature is particularly advantageous for systems that must remain operational at all times while expanding their hardware resources, such as additional storage and networking capabilities. DOCA DevEmu PCI provides software APIs that allow users to emulate this process in an asynchronous manner.
If a newly created PCIe device object starts in the "power off" state, the device is not yet visible to the host. It can then be hot-plugged from the BlueField, which starts an asynchronous process of hot-plugging the device towards the host. Once the process completes, the emulated device transitions to "power on" and becomes visible to the host. The emulated device usually receives its BDF address at this stage. The hot-unplug process works in a similar asynchronous manner.
Using the DOCA API, the BlueField Arm can register for changes to the hot-plug state of each emulated device using doca_devemu_pci_dev_event_hotplug_state_change_register.
Emulated Device Discovery
The emulated device is represented as a doca_devinfo_rep. It is possible to iterate through all the emulated devices as explained in DOCA Core Representor Discovery.
There are two ways of filtering the list of emulated devices:
Get all emulated devices – use DOCA_DEVINFO_REP_FILTER_EMULATED as the filter argument in doca_devinfo_rep_create_list
Get all emulated devices that belong to a certain type – doca_devemu_pci_type_create_rep_list
Objects Lifecycle and Persistency
This section distinguishes between firmware resources and software resources:
Firmware resources persist until the next power cycle and are accessible from different processes on the BlueField Arm. Such resources are not cleared when the application exits.
Software resources are representations of firmware resources and are only relevant for the same thread.
Using this terminology, it is possible to describe the objects as follows:
The PCIe type object doca_devemu_pci_type represents a PCIe type firmware resource. The resource persists if any of the following apply:
There is at least one process holding a reference to the PCIe type
There is at least one PCIe device firmware resource belonging to this type
The emulated device representor, doca_devinfo_rep, represents an emulated PCIe function firmware resource:
doca_devemu_pci_dev_create_rep can be used to create such a firmware resource
doca_devemu_pci_dev_destroy_rep can be used to destroy the firmware resource
For static functions, the representor resource persists until configured otherwise in NVCONFIG
To find existing PCIe device firmware resources, use doca_devemu_pci_type_create_rep_list
Function Level Reset
The created emulated devices support PCIe function level reset (FLR).
Using the DOCA API, the BlueField Arm can register to the FLR event using doca_devemu_pci_dev_event_flr_register. Once the host driver requests FLR, this event is triggered and the user-provided callback is called.
Once FLR is detected, the BlueField Arm is expected to do the following:
Destroy all resources related to the PCIe device. For information on such resources, refer to the guide of the concrete PCIe type (generic/virtiofs).
Stop the PCIe device
Start the PCIe device again
DOCA PCIe Device emulation requires a device to operate. To pick a device, see DOCA Core Device Discovery.
The device emulation library is only supported for BlueField-3.
As device capabilities may change in the future (see Capability Checking), it is recommended that users choose a device using the following methods:
doca_devemu_pci_cap_type_is_hotplug_supported – for create and hot-plug support
doca_devemu_pci_cap_type_is_mgmt_supported – for device discovery only
Configuration Phase
To start using the DOCA DevEmu PCI Device, users must first go through a configuration phase as described in DOCA Core Context Configuration Phase.
This section describes how to configure and start the context to allow retrieval of events.
Configurations
The context can be configured to match the application use case.
To find if a configuration is supported or what its min/max value is, refer to Device Support.
Mandatory Configurations
All mandatory configurations are provided during the creation of the PCIe device.
These configurations are as follows:
A DOCA DevEmu PCIe type object
A DOCA Device Representor, representing an emulated function with the same type as the provided PCIe object type
A DOCA Progress Engine object
Optional Configurations
These configurations are optional. If not set, then a default value is used:
Registering to events as described in the "Events" section. By default, the user does not receive events.
Execution Phase
This section describes execution on CPU using DOCA Core Progress Engine.
Events
The DOCA DevEmu PCI device exposes asynchronous events to notify about sudden changes according to DOCA Core architecture.
Common events are described in DOCA Core Event.
Hotplug State Change
The hotplug state change event allows users to receive notifications whenever the hotplug state of the emulated device changes. See section "Hot-plug Emulated Device".
Configuration
Description | API to Set the Configuration | API to Query Support
Register to the event | doca_devemu_pci_dev_event_hotplug_state_change_register | doca_devemu_pci_cap_type_is_hotplug_supported
Trigger Condition
The event is triggered anytime an asynchronous transition happens as follows:
DOCA_DEVEMU_PCI_HP_STATE_PLUG_IN_PROGRESS → DOCA_DEVEMU_PCI_HP_STATE_POWER_ON
DOCA_DEVEMU_PCI_HP_STATE_UNPLUG_IN_PROGRESS → DOCA_DEVEMU_PCI_HP_STATE_POWER_OFF
DOCA_DEVEMU_PCI_HP_STATE_POWER_ON → DOCA_DEVEMU_PCI_HP_STATE_UNPLUG_IN_PROGRESS (when initiated by the host)
Transitions initiated by the user do not trigger the event (e.g., calling hotplug to transition from POWER_OFF to PLUG_IN_PROGRESS).
The following APIs can be used to initiate hotplug or hot-unplug transition processes:
doca_devemu_pci_dev_hotplug
doca_devemu_pci_dev_hotunplug
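The trigger rules above can be summarized as a small decision function. This is a minimal sketch, not the library implementation: `HotplugState`, `triggers_event`, and the `initiated_by_host` flag are hypothetical names, and only the state names are taken from this section.

```python
from enum import Enum


class HotplugState(Enum):
    # Values reuse the state names listed in this section.
    POWER_OFF = "DOCA_DEVEMU_PCI_HP_STATE_POWER_OFF"
    PLUG_IN_PROGRESS = "DOCA_DEVEMU_PCI_HP_STATE_PLUG_IN_PROGRESS"
    POWER_ON = "DOCA_DEVEMU_PCI_HP_STATE_POWER_ON"
    UNPLUG_IN_PROGRESS = "DOCA_DEVEMU_PCI_HP_STATE_UNPLUG_IN_PROGRESS"


# Asynchronous completions: these always raise the event.
ASYNC_COMPLETIONS = {
    (HotplugState.PLUG_IN_PROGRESS, HotplugState.POWER_ON),
    (HotplugState.UNPLUG_IN_PROGRESS, HotplugState.POWER_OFF),
}

# Raises the event only when the host (not the user) starts the unplug.
HOST_INITIATED = {
    (HotplugState.POWER_ON, HotplugState.UNPLUG_IN_PROGRESS),
}


def triggers_event(old, new, initiated_by_host=False):
    """Model of whether a transition raises the hotplug state change event."""
    if (old, new) in ASYNC_COMPLETIONS:
        return True
    return initiated_by_host and (old, new) in HOST_INITIATED
```

In this model, a user call that moves the device from POWER_OFF to PLUG_IN_PROGRESS returns `False`, while the later asynchronous completion to POWER_ON returns `True`.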
Output
Common output as described in DOCA Core Event.
Additionally, the internal cached hotplug state is updated and can be fetched using doca_devemu_pci_dev_get_hotplug_state.
Event Handling
Once the event is triggered, it means that the hotplug state has changed. The application is expected to do the following:
Retrieve the new hotplug state using doca_devemu_pci_dev_get_hotplug_state
Function Level Reset
The FLR event allows users to receive notifications whenever the host initiates an FLR flow. See section "Function Level Reset".
Configuration
Description | API to Set the Configuration
Register to the event | doca_devemu_pci_dev_event_flr_register
Trigger Condition
The event is triggered anytime the host driver initiates an FLR flow. See section "Function Level Reset".
Output
Common output as described in DOCA Core Event.
Additionally, the internal cached FLR indicator is updated and can be fetched using doca_devemu_pci_dev_is_flr.
Event Handling
Once the event is triggered, it means that the host driver has initiated the FLR flow.
The user must handle the FLR flow by performing the following:
Flushing all the outstanding requests back to the associated resource.
Releasing all the PCIe device resources dynamically created after device start.
Stopping the PCIe device using doca_ctx_stop.
Starting the PCIe device again using doca_ctx_start. This involves calling doca_pe_progress repeatedly until the PCIe device transitions to the "running" state.
For more information on starting the PCIe device again, refer to section "State Machine".
State Machine
The DOCA DevEmu PCI device object follows the context state machine as described in DOCA Core Context State Machine.
The following section describes how to transition to any state and what is allowed in each state.
Idle
In this state, the application is expected to do one of the following:
Destroys the context
Starts the context
Allowed operations:
Configuring the context according to section "Configurations"
Starting the context
It is possible to reach this state as follows:
Previous State | Transition Action
None | Create the context
Running | Call stop after making sure all resources have been destroyed
Stopping | Call progress until all resources have been destroyed
Starting
In this state, the application is expected to do the following:
Call progress to allow transition to the next state
Keep the context in this state until the FLR flow is complete
It is possible to reach this state as follows:
Previous State | Transition Action
Idle | Call start after receiving FLR event (i.e., while FLR is in progress)
Running
In this state, the application is expected to do the following:
Call progress to receive events
Create/destroy PCIe device resources
It is possible to reach this state as follows:
Previous State | Transition Action
Idle | Call start after configuration
Starting | Call progress until FLR flow is completed
Stopping
In this state, the application is expected to do the following:
Destroy all emulated device resources as described in section "Function Level Reset".
Allowed operations:
Destroying PCIe device resources
It is possible to reach this state as follows:
Previous State | Transition Action
Running | Call stop without freeing emulated device resources
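The transition tables in this section can be combined into one small model. The sketch below is illustrative only and does not call DOCA: `CtxState`, `EmulatedDeviceCtx`, and its fields are hypothetical, and its `start`, `stop`, and `progress` methods only stand in for the start, stop, and progress actions named in the tables.

```python
from enum import Enum, auto


class CtxState(Enum):
    IDLE = auto()
    STARTING = auto()
    RUNNING = auto()
    STOPPING = auto()


class EmulatedDeviceCtx:
    """Toy model of the PCIe device context state machine (not a DOCA API)."""

    def __init__(self):
        self.state = CtxState.IDLE    # None -> Idle: create the context
        self.resources = 0            # PCIe device resources created after start
        self.flr_in_progress = False  # set while the host driver runs an FLR flow

    def start(self):
        if self.state is not CtxState.IDLE:
            raise RuntimeError("start is only allowed in the Idle state")
        # Idle -> Starting while FLR is in progress, otherwise Idle -> Running.
        self.state = CtxState.STARTING if self.flr_in_progress else CtxState.RUNNING

    def stop(self):
        if self.state is not CtxState.RUNNING:
            raise RuntimeError("stop is only allowed in the Running state")
        # Running -> Idle if everything was destroyed, otherwise Running -> Stopping.
        self.state = CtxState.IDLE if self.resources == 0 else CtxState.STOPPING

    def progress(self):
        if self.state is CtxState.STARTING and not self.flr_in_progress:
            self.state = CtxState.RUNNING   # FLR flow completed
        elif self.state is CtxState.STOPPING and self.resources == 0:
            self.state = CtxState.IDLE      # all resources destroyed
```

Under these assumptions, an FLR flow reads as follows: the event arrives in Running, the application destroys its resources and calls `stop()` to reach Idle, calls `start()` to enter Starting, and keeps calling `progress()` until the FLR flow ends and the context returns to Running.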