DOCA SNAP Virtio-fs Application Guide
This guide describes the DOCA SNAP Virtio-fs Application, which emulates a virtio-fs PCIe function on NVIDIA® BlueField® DPUs. The application leverages DOCA DevEmu APIs and NFS backends via libnfs to provide high-performance, hardware-accelerated file system emulation.
This feature is currently supported at alpha level.
The DOCA SNAP Virtio-fs Application demonstrates how to use the DOCA DevEmu PCI Generic API to emulate a virtio-fs PCIe function, enabling hardware-accelerated file system emulation on the BlueField DPU. This solution offloads virtio-fs handling from the host, facilitating seamless remote file access via NFS.

Host Side (Virtio-fs Client Driver)
Runs a standard virtio-fs client driver in the host operating system.
Host applications issue file system operations (e.g., open, read, write, mkdir).
The virtio-fs driver translates these into virtio-fs protocol I/O requests.
BlueField DPU (DOCA SNAP Virtio-fs Application)
Runs the DOCA SNAP Virtio-fs Application.
Retrieves virtio-fs I/O requests from the host via Virtio queues.
Decodes the requests into file operations (lookup, getattr, read, write, mkdir, etc.).
Translates file operations into NFS requests using libnfs APIs (based on libnfs4).
Network Communication (libnfs)
Communicates with a local or remote NFS server using libnfs.
Executes file operations remotely on the NFS server.
Receives responses from the NFS server and returns them to the host.
NFS Server
Receives NFS requests (read, write, getattr, mkdir, etc.) sent by libnfs.
Performs I/O operations on local storage.
Sends responses back to the DPU, which forwards them to the host through virtio-fs.
End-to-End Flow
Host application initiates a file system operation.
Virtio-fs client driver sends a request to the DPU.
The DPU processes the request and forwards it to the NFS server via libnfs.
The NFS server executes the request and returns the result.
The DPU sends the result back to the host through virtio-fs.
This architecture offloads file system emulation from the host to the DPU, bridging virtio-fs requests to remote storage using the NFS protocol.
Virtio-fs is a virtualization file system designed for efficient and secure file sharing between a host and guest VMs. It uses the Virtio framework to establish a high-performance communication channel, bypassing traditional network-based file system protocols to improve throughput and reduce latency.
Virtio-fs maps files and directories from a host or remote storage system into the guest’s address space, achieving near-native file system performance.
Virtio-fs emulation allows this functionality to be simulated in virtualized or development environments without the need for dedicated hardware devices.
Virtio-fs Protocol Components
FUSE (Filesystem in Userspace)
Virtio-fs uses the FUSE protocol to forward file system operations from the guest VM to the host. Key FUSE operations include:
LOOKUP – Resolves file names and retrieves attributes.
GETATTR – Fetches file metadata.
READ/WRITE – Performs I/O operations.
MKDIR/MKNOD/UNLINK – Manages file and directory creation and deletion.
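As an illustration of how these operations appear on the wire, every virtio-fs request begins with the common FUSE header defined in <linux/fuse.h>, and an emulation layer typically dispatches on its opcode field. The following minimal C sketch shows such a dispatch; the function name and the empty case bodies are purely illustrative:

#include <linux/fuse.h>

/* Illustrative dispatcher: inspects the common FUSE header that precedes
 * every request and routes it to an operation-specific handler. */
static void dispatch_fuse_request(const struct fuse_in_header *hdr, const void *payload)
{
    switch (hdr->opcode) {
    case FUSE_LOOKUP:  /* resolve a name; payload is the NUL-terminated name      */ break;
    case FUSE_GETATTR: /* fetch metadata for hdr->nodeid                          */ break;
    case FUSE_READ:    /* payload is struct fuse_read_in (offset, size, ...)      */ break;
    case FUSE_WRITE:   /* payload is struct fuse_write_in followed by the data    */ break;
    case FUSE_MKDIR:
    case FUSE_MKNOD:
    case FUSE_UNLINK:  /* namespace management operations                         */ break;
    default:           /* unsupported opcode: reply with -ENOSYS                  */ break;
    }
    (void)payload;
}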
Virtio Queues
Virtio-fs employs request and response queues to communicate between guest and host:
Request queue – The guest places file system operation requests.
Response queue – The host returns operation results or error statuses.
Initialization Process
The virtio-fs initialization process involves:
Setting up virtio-fs request and response queues.
Establishing shared memory regions for efficient data transfer.
Negotiating feature capabilities between guest and host.
Mounting the shared filesystem into the guest environment.
libnfs Integration
While virtio-fs typically interfaces with local host storage, integrating libnfs enables seamless file sharing with remote NFS targets. libnfs provides a user-space NFS client implementation, allowing the DOCA SNAP Virtio-fs Application to translate virtio-fs requests into NFS protocol operations.
File Operations Workflow
Guest VM submits a file system request (e.g., open, read).
Virtio-fs forwards the request to the host via the Virtio request queue.
The DPU processes the request and uses libnfs to communicate with the NFS server.
The NFS server executes the file operation and returns the result to the DPU.
The DPU forwards the response to the guest VM via the Virtio response queue.
Reset and Shutdown
Proper reset and shutdown procedures ensure file system integrity and resource cleanup:
Reset – Reinitializes Virtio queues and shared memory regions without affecting persistent data.
Shutdown – Gracefully terminates file operations, ensuring all pending transactions are flushed and committed.
This application leverages the following DOCA libraries:
For additional information about the DOCA libraries used, refer to their respective programming guides.
The following software components are required to build and run the DOCA SNAP Virtio-fs Application:
BlueField-3 DPU
libnfs 4.0.0-1 for the FSDEV backend: https://packages.debian.org/bullseye/libnfs-dev. Install using the following command:
apt install libnfs-dev
For details on installing BlueField software and DOCA packages, refer to the DOCA Installation Guide for Linux.
DOCA reference applications are distributed with their source code and build instructions. This allows you to compile the applications "as-is" or modify the source code and rebuild custom versions.
For more information on DOCA applications, development workflows, and build guidance, see the DOCA Reference Applications documentation page.
The source code for the application is located at /opt/mellanox/doca/applications/virtiofs/.
Compiling All Applications
All DOCA reference applications are part of a unified meson project. By default, building the project compiles all available applications.
To build all applications:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
The doca_virtiofs binary is created under /tmp/build/virtiofs/.
Compiling Only the Current Application
To build only the virtio-fs application, use the following command to disable all other applications:
cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_virtiofs=true
ninja -C /tmp/build
The doca_virtiofs binary is created under /tmp/build/virtiofs/.
Alternatively, you can configure the build flags directly in the meson_options.txt file rather than passing them as command-line arguments:
Edit the file /opt/mellanox/doca/applications/meson_options.txt and modify the flags:
Set enable_all_applications to false
Set enable_virtiofs to true
Run the standard build commands:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
The doca_virtiofs binary is created under /tmp/build/virtiofs/.
Troubleshooting
For troubleshooting guidance related to building DOCA applications, refer to the NVIDIA BlueField Platform Software Troubleshooting Guide.
Prerequisites
Enable the firmware configuration with virtio-fs emulation PF. A cold reboot is required after updating the firmware configuration:
[dpu] mlxconfig -d /dev/mst/mt41692_pciconf0 s VIRTIO_FS_EMULATION_ENABLE=1 VIRTIO_FS_EMULATION_NUM_PF=1
This application is based on DOCA Flow and requires huge page allocation. Ensure huge pages are configured on the system.
Configure a local or remote mounted NFS target.
Remote NFS server example:
sudo apt install nfs-kernel-server
echo "/export/data 10.245.16.0/24(rw,sync,no_subtree_check,insecure)" >> /etc/exports
sudo systemctl start nfs-kernel-server.service
Local NFS server example:
apt-get install nfs-kernel-server
mkdir /VIRTUAL
echo "/VIRTUAL 127.0.0.1(rw,no_root_squash,insecure,no_subtree_check,sync)" >> /etc/exports
service nfs-kernel-server restart
Application Execution
This application is provided in source form; therefore, compilation is required before execution. Refer to the "Compiling the Application" section for build instructions.
Application Usage
Usage: doca_virtiofs [DOCA Flags] [Program Flags]
DOCA Flags:
-h, --help Print a help synopsis
-v, --version Print program version information
-l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
--sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
Program Flags:
-m, --core-mask <core_mask> Set core mask.
-s, --nfs-server <nfs_server> Set NFS server.
-e, --nfs-export <nfs_export> Set NFS export.
Use the -h or --help flag to print this usage information at runtime:
./doca_virtiofs -h
For additional information, refer to the "Command Line Flags" section.
Example Command Line Execution
Run the application on BlueField with a specified core mask, NFS server, and export path:
./doca_virtiofs -m 0xff -s 1.1.1.11 -e /export/data
JSON-Based Deployment Mode
The application also supports JSON-based configuration for deployment. All parameters can be provided via a JSON file:
./doca_virtiofs --json [json_file]
For example:
./doca_virtiofs --json ./virtiofs_params.json
Ensure the JSON file includes the correct configuration parameters, particularly the PCIe addresses required for deployment.
Mounting the Virtio-fs Device (Host/VM Side)
To mount the virtio-fs device from the host or guest VM:
Create a mount point and load the virtiofs module (if not already loaded):
mkdir /tmp/test
modprobe -v virtiofs
mount -t virtiofs docavirtiofs /tmp/test
To unmount and clean up:
umount /tmp/test
modprobe -rv virtiofs
Command Line Flags
Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content |
General flags | h | help | Prints a help synopsis | N/A |
| v | version | Prints program version information | N/A |
| l | log-level | Set the log level for the application: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE | |
| N/A | sdk-log-level | Sets the log level for the program: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE | |
| | json | Parse all command flags from an input json file | N/A |
Program flags | m | core-mask | Application core mask | |
| s | nfs-server | Local or remote NFS server IP | |
| e | nfs-export | Local or remote NFS export folder | |
Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.
Troubleshooting
Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issues encountered during the installation or execution of DOCA applications.
This section provides a comprehensive guide to implementing a virtio-fs device emulation application using the DOCA library. The DOCA Virtio-fs Reference Application demonstrates how to leverage hardware acceleration on a Data Processing Unit (DPU) to emulate a fully functional file system storage device, appearing to the host as a standard virtio-fs PCIe device.
The implementation provides hardware-accelerated filesystem emulation, utilizing DPU resources for high-performance operation processing. The multi-threaded architecture, with explicit CPU core affinity, ensures optimal resource utilization and minimizes contention between worker threads. Asynchronous processing throughout the pipeline maintains high throughput by avoiding blocking operations that could degrade system performance.
The design supports basic filesystem functionality, including file and directory operations, metadata management, and data transfers. While the reference implementation uses default configuration values and supports a single device, the underlying architecture is designed for scalability and can be extended to support multiple devices with custom configurations in production environments.
This modular approach allows developers to understand each phase of the application flow independently while appreciating how they integrate into a complete virtio-fs emulation solution using DOCA libraries and hardware acceleration.
Framework Initialization
This section outlines the essential APIs and data structures required to initialize the DOCA virtio-fs emulation framework before creating and managing virtio-fs devices.
Initialization Flow
Parse application arguments
Initialize the argument parser framework using doca_argp_init() (see the sketch after this flow).
Register application-specific parameters via a custom function (e.g., register_application_params()).
Store parsed values in a user-defined struct application_config.
Start argument parsing
Parse command-line arguments using doca_argp_start().
Validate arguments and populate the configuration structure with parsed values.
Create logging backends
Configure the application log backend using doca_log_backend_create_standard().
Set up SDK internal message routing with doca_log_backend_create_with_file_sdk().
Adjust SDK log verbosity using doca_log_backend_set_sdk_level().
Logging configurations are maintained using struct doca_log_backend.
Create VFS configuration
Create a virtio-fs configuration object using doca_devemu_vfs_cfg_create().
The returned struct doca_devemu_vfs_cfg holds framework parameters and the device list.
Discover available devices
Enumerate DOCA-capable hardware devices using doca_devinfo_create_list().
The function returns an array of struct doca_devinfo containing device properties and capabilities.
Filter supported devices
For each discovered device, verify virtio-fs emulation capability using doca_devemu_vfs_is_default_vfs_type_supported().
Open supported devices
For devices supporting virtio-fs emulation, open device handles using doca_dev_open().
This returns struct doca_dev handles necessary for subsequent operations.
Add devices to configuration
Register opened device handles into the virtio-fs configuration object using doca_devemu_vfs_cfg_add_dev().
Set framework parameters
Configure per-request context size and other framework-specific parameters using doca_devemu_vfs_cfg_set_vfs_fuse_req_user_data_size().
Initialize the virtio-fs framework
Finalize the global framework setup with doca_devemu_vfs_init().
After this call, the framework assumes ownership of all registered devices.
Clean up temporary configuration objects
Release the temporary configuration object using doca_devemu_vfs_cfg_destroy().
Free device information resources with doca_devinfo_destroy_list(), as the framework now owns the devices.
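Steps 1-3 of this flow rely on the general-purpose DOCA Arg Parser and logging APIs. The sketch below illustrates that portion only, registering a single --nfs-server parameter and creating the log backends; the configuration structure, callback body, and register_application_params() contents are illustrative assumptions, error handling is omitted, and exact prototypes should be taken from the DOCA headers:

#include <stdint.h>
#include <stdio.h>
#include <doca_argp.h>
#include <doca_error.h>
#include <doca_log.h>

/* Illustrative configuration structure populated by the argument parser. */
struct application_config {
    char     nfs_server[128];
    char     nfs_export[256];
    uint64_t core_mask;
};

/* Callback invoked by the arg parser when -s/--nfs-server is seen. */
static doca_error_t nfs_server_callback(void *param, void *config)
{
    struct application_config *cfg = config;

    snprintf(cfg->nfs_server, sizeof(cfg->nfs_server), "%s", (char *)param);
    return DOCA_SUCCESS;
}

/* Register the -s/--nfs-server flag; the other program flags follow the same pattern. */
static doca_error_t register_application_params(void)
{
    struct doca_argp_param *param;

    doca_argp_param_create(&param);
    doca_argp_param_set_short_name(param, "s");
    doca_argp_param_set_long_name(param, "nfs-server");
    doca_argp_param_set_description(param, "Set NFS server");
    doca_argp_param_set_callback(param, nfs_server_callback);
    doca_argp_param_set_type(param, DOCA_ARGP_TYPE_STRING);
    return doca_argp_register_param(param);
}

static doca_error_t init_parsing_and_logging(int argc, char **argv, struct application_config *cfg)
{
    struct doca_log_backend *sdk_log;

    doca_argp_init("doca_virtiofs", cfg);                      /* step 1: parser init        */
    register_application_params();
    doca_argp_start(argc, argv);                               /* step 2: parse and validate */

    doca_log_backend_create_standard();                        /* step 3: application logger */
    doca_log_backend_create_with_file_sdk(stderr, &sdk_log);   /* SDK message routing        */
    doca_log_backend_set_sdk_level(sdk_log, DOCA_LOG_LEVEL_WARNING);
    return DOCA_SUCCESS;
}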
Key Points
Resource Ownership – After initialization, the virtio-fs framework takes ownership of all registered device handles.
Global Initialization – Framework initialization must be performed once per application instance.
Thread Safety – All initialization procedures should be executed from the main thread before spawning worker threads.
Resource Management
This section describes the setup and initialization of core resources required for virtio-fs operation after the framework initialization is complete. These resources include device managers, worker threads, progress engines, memory pools, DMA contexts, and thread-local configurations necessary for high-performance virtio-fs emulation.
Resource Management Flow
Create device managers
Retrieve the default virtio-fs emulation type using doca_devemu_vfs_type_create(), which returns a struct doca_devemu_vfs_type.
For each opened device, create a device manager using doca_devemu_vfs_manager_create(). Each manager is represented by a struct doca_devemu_vfs_manager.
Initialize manager list
Organize all device managers into a list structure for efficient management.
Use application-defined linked list operations (e.g., SLIST_INIT, SLIST_INSERT_HEAD) to build and maintain the manager list.
The list is defined using an application-specific struct manager_list.
Determine thread configuration
Calculate the number of worker threads based on the user-specified CPU core mask.
Use CPU enumeration functions to identify available cores and select which cores to allocate for virtio-fs worker threads.
Create worker threads
For each designated CPU core, spawn a worker thread using pthread_create().
Bind each thread to a specific CPU core using pthread_setaffinity_np() to ensure core affinity and minimize resource contention (see the worker-thread sketch after this flow).
Thread handles are stored in pthread_t, with CPU affinity settings managed via cpu_set_t.
Initialize progress engines
For each worker thread, create a progress engine using doca_pe_create().
Connect application contexts to the progress engines with doca_pe_connect_ctx() to enable asynchronous operation handling.
Progress engine handles are stored in struct doca_pe.
Set up memory pools
Create memory pools for efficient buffer allocation and management using doca_mpool_create().
For DMA operations, create buffer inventories using doca_buf_inventory_create().
Memory pools and inventories are represented by struct doca_mpool and struct doca_buf_inventory, respectively.
Create DMA contexts
Initialize DMA contexts for data transfer operations using doca_dma_create().
Configure DMA task parameters, such as memcpy settings, using doca_dma_task_memcpy_set_conf().
DMA context handles are maintained using struct doca_dma.
Configure thread contexts
Set up per-thread context structures using application-defined initialization routines.
Assign memory pools and DMA contexts to each thread.
Configure thread-local storage to maintain isolation and minimize cross-thread contention.
Thread-specific data is managed using an application-defined struct thread_context.
Start worker threads
Launch all worker threads using pthread-based startup mechanisms.
Each thread enters its main processing loop, where it continuously polls for incoming requests and performs I/O operations.
Verify resource readiness
Validate that all required resources have been properly initialized:
All device managers are created and registered.
Worker threads are running and bound to designated CPU cores.
Memory pools and DMA contexts are allocated and ready.
Progress engines are connected to their corresponding contexts.
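The worker-thread sketch below illustrates the affinity and progress-engine pattern described in this flow, using standard pthread calls together with doca_pe_create() and doca_pe_progress(); the thread_context layout, function names, and stop flag are illustrative assumptions, and error handling is abbreviated:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <doca_error.h>
#include <doca_pe.h>

/* Illustrative per-thread state; the real application also holds memory
 * pools, DMA contexts, and NFS client handles here. */
struct thread_context {
    int             core_id;
    struct doca_pe *pe;
    volatile bool   stop;
    pthread_t       thread;
};

static void *worker_main(void *arg)
{
    struct thread_context *tctx = arg;

    /* Main processing loop: drive asynchronous completions on this
     * thread's progress engine until shutdown is requested. */
    while (!tctx->stop) {
        while (doca_pe_progress(tctx->pe))
            ; /* each call processes at most one completion */
    }
    return NULL;
}

/* Spawn one worker per selected core and pin it, as described above. */
static int start_worker(struct thread_context *tctx, int core_id)
{
    cpu_set_t cpus;

    tctx->core_id = core_id;
    tctx->stop = false;
    if (doca_pe_create(&tctx->pe) != DOCA_SUCCESS)
        return -1;
    if (pthread_create(&tctx->thread, NULL, worker_main, tctx) != 0)
        return -1;

    CPU_ZERO(&cpus);
    CPU_SET(core_id, &cpus);
    return pthread_setaffinity_np(tctx->thread, sizeof(cpus), &cpus);
}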
Key Points
Thread Affinity – Each worker thread must be bound to a dedicated CPU core to ensure optimal performance and minimize scheduling overhead.
Admin Thread Role – The first worker thread (admin thread) is responsible for device lifecycle operations, such as polling control events, while other threads focus on I/O processing.
Memory Pool Sizing – Memory pool sizes should be configured based on anticipated workload, queue depths, and expected concurrency levels to prevent allocation bottlenecks.
Progress Engines – Each worker thread requires its own progress engine to independently handle asynchronous operations and maintain high throughput.
Resource Isolation – Ensure that threads have isolated resources (e.g., memory pools, DMA contexts) to eliminate cross-thread contention and performance degradation.
Error Recovery – If resource creation fails at any stage, the application must perform cleanup of already-initialized resources to avoid leaks and maintain system stability.
Scalability – Resource allocation logic should scale with the number of configured CPU cores, allowing the application to adapt to various deployment environments.
Device Configuration
This section describes the process of creating and configuring virtio-fs device instances after the resource management infrastructure is in place. The reference application demonstrates how to emulate a virtio-fs PCIe function using hardware acceleration to fully emulate a file system storage device.
For simplicity, this implementation supports a single virtio-fs device created with default configuration values (number of queues, queue size, filesystem tag). However, the DOCA DevEmu PCI Generic API supports scalable deployments with multiple devices and fully customizable parameters.
Device Configuration Flow
Create NFS backend filesystem
Initialize the NFS filesystem backend using an application-defined nfs_fsdev_create() function, which takes the NFS server address and export path (provided via command-line arguments).
The backend must be ready before creating the virtio-fs device, as it will be linked during device instantiation.
Backend creation details are provided in the "Backend Integration" section.
Search for available virtio-fs functions
Discover available virtio-fs PCIe functions using doca_devinfo_rep_create_list() with the DOCA_DEVINFO_REP_FILTER_EMULATED filter.
Iterate through the list to find the first available virtio-fs function for device creation (see the representor sketch after this flow).
This reference implementation selects the first match for simplicity.
Create device representors
Open the selected representor device using doca_dev_rep_open() to establish a communication channel between the DPU and the host.
Representor handles are managed via struct doca_dev_rep, while representor information is stored in struct doca_devinfo_rep.
Match VUIDs to representors
Retrieve the Virtual Unique Identifier (VUID) for each representor using doca_devinfo_rep_get_vuid().
Match the VUID against configured values to associate representors with specific virtio-fs virtual functions visible to the host.
Create virtio-fs devices
Instantiate the virtio-fs device using doca_devemu_vfs_dev_create(), providing the selected representor, the admin thread's progress engine, and the initialized NFS backend.
The resulting device handle is maintained via struct doca_devemu_vfs_dev.
Configure device attributes
Set device-specific parameters:
Filesystem tag using doca_devemu_vfs_dev_set_tag() (default tag in the reference implementation).
Number of request queues using doca_devemu_vfs_dev_set_num_request_queues() (default value).
Queue size using doca_devemu_virtio_dev_set_queue_size() (default size).
In production, these parameters can be customized to meet performance and scaling requirements.
Configuration details are maintained in an application-defined struct device_config.
Set up device callbacks
Register event handlers for device lifecycle management:
Device reset events via doca_devemu_virtio_dev_event_reset_register().
PCIe Function Level Reset (FLR) events via doca_devemu_pci_dev_event_flr_register().
State change notifications using doca_ctx_set_state_changed_cb().
Application-defined callback functions handle these events appropriately.
Configure virtio properties
Specify the number of I/O contexts using doca_devemu_virtio_dev_set_num_required_running_virtio_io_ctxs() (should match the number of worker threads).
Set the total number of virtqueues using doca_devemu_virtio_dev_set_num_queues(), including request queues and a high-priority queue for metadata operations.
Initialize device contexts
Allocate per-device context structures using application-defined initialization functions.
Link the virtio-fs device with its assigned device manager and NFS backend.
These contexts maintain the association between virtio-fs emulation and underlying storage.
Validate device configuration
Ensure all device parameters are correctly configured:
Queue sizes within hardware limits.
Filesystem tags properly formatted.
VUIDs correspond to valid virtual functions.
All required event callbacks are registered.
Validation prevents runtime misconfiguration and operational failures.
Add devices to global list
Register the configured device into the application's global device list using linked list operations (SLIST_INSERT_HEAD).
Maintain counters for tracking active devices.
Although this reference supports a single device, the infrastructure allows scaling to multiple devices.
Prepare for device startup
Set the initial device state to "stopped."
Store startup callback information for later use when the host virtio-fs driver initiates device activation.
Devices remain in the configuration state until explicitly started by host-side interactions.
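The representor-discovery portion of this flow uses generic DOCA device APIs. The following sketch, written under the assumption of the usual doca_devinfo_rep_* prototypes and an illustrative VUID buffer size, finds and opens the first emulated representor in the way the reference implementation does:

#include <stdint.h>
#include <doca_dev.h>
#include <doca_error.h>

/* Find the first emulated (virtio-fs capable) representor exposed by 'dev'
 * and open it; error handling is reduced to early returns. */
static doca_error_t open_first_vfs_representor(struct doca_dev *dev, struct doca_dev_rep **rep_out)
{
    struct doca_devinfo_rep **rep_list;
    uint32_t nb_reps;
    char vuid[128] = {0};   /* illustrative buffer size */
    doca_error_t res;

    res = doca_devinfo_rep_create_list(dev, DOCA_DEVINFO_REP_FILTER_EMULATED, &rep_list, &nb_reps);
    if (res != DOCA_SUCCESS)
        return res;

    res = DOCA_ERROR_NOT_FOUND;
    for (uint32_t i = 0; i < nb_reps; i++) {
        /* The VUID identifies the PCIe function this representor backs on the host. */
        if (doca_devinfo_rep_get_vuid(rep_list[i], vuid, sizeof(vuid)) != DOCA_SUCCESS)
            continue;
        res = doca_dev_rep_open(rep_list[i], rep_out);
        if (res == DOCA_SUCCESS)
            break;  /* the reference application simply takes the first match */
    }
    doca_devinfo_rep_destroy_list(rep_list);
    return res;
}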
Key Points
Backend-first approach – The NFS backend must be created before virtio-fs device creation, as the device requires a functional storage backend during initialization.
Function discovery – This reference uses the first available virtio-fs function. Production deployments can implement advanced selection logic based on performance or topology requirements.
Reference implementation scope – Demonstrates single-device virtio-fs emulation with default configurations. DOCA APIs support large-scale, multi-device deployments with customizable parameters.
Hardware acceleration – The virtio-fs PCIe function leverages hardware acceleration on the DPU for high-performance file system emulation.
VUID association – Each virtio-fs device must be linked to a unique VUID that maps to a specific PCIe virtual function visible to the host OS.
Admin thread execution – All device creation and configuration operations should be performed by the admin thread to ensure synchronization and thread safety.
Queue configuration – The total number of queues includes request processing queues and a dedicated high-priority queue for handling filesystem metadata operations.
Event handling – Correct registration of reset, FLR, and state change callbacks is essential for managing device lifecycle events triggered by the host.
Resource validation – Comprehensive validation of device parameters ensures system stability and avoids runtime failures during activation.
State management – Devices are initialized in a stopped state and require explicit commands to transition to an operational state when activated by the host.
Scalability potential – Though this reference focuses on a single device, the architecture supports scalable deployments with multiple virtio-fs devices, each configurable for different use cases.
Backend Integration
This section details the setup and integration of the NFS filesystem backend, which provides the actual storage functionality behind the virtio-fs emulation. The backend acts as a bridge between FUSE operations received from the host and the underlying NFS storage system.
Backend Integration Flow
Initialize NFS client library
Set up the NFS client infrastructure using libnfs initialization routines.
This includes initializing the RPC layer and NFS protocol handlers that facilitate communication with the remote NFS server.
NFS client state is managed through libnfs context structures.
Create NFS filesystem device
Instantiate the NFS backend using an application-defined nfs_fsdev_create() function, providing the configured NFS server address and export path.
This creates the main NFS filesystem device object (struct nfs_fsdev), which encapsulates the connection to the NFS server and provides the interface for file operations.
Configure NFS connection parameters
Set connection parameters such as:
NFS protocol version (typically NFSv3) using nfs_set_version().
RPC timeout values using nfs_set_timeout().
Connection-specific settings such as buffer sizes and retry counts.
These parameters should be tuned based on network conditions and server performance characteristics.
Establish NFS server connection
Connect to the NFS server using nfs_mount() to mount the specified export path (see the libnfs sketch after this flow).
This step involves mount protocol negotiations and, if required, authentication procedures.
The connection must be fully established before initiating any file operations.
Set up per-thread NFS contexts
For each worker thread, create a dedicated NFS client context using nfs_fsdev_get().
Thread-local NFS contexts enable concurrent operations without requiring synchronization across threads.
Per-thread state is managed through struct nfs_fsdev_thread_ctx.
Initialize operation mapping
Define a translation layer that maps FUSE operations (e.g., LOOKUP, READ, WRITE, GETATTR) to corresponding NFS RPC calls (e.g., LOOKUP3, READ3, WRITE3, GETATTR3).
Application-defined handler functions are used to perform this mapping for each supported FUSE operation.
Configure asynchronous operation support
Implement asynchronous NFS operations using callback mechanisms.
NFS requests are submitted asynchronously, with completion handled via registered callbacks to ensure non-blocking execution.
Application-defined callback structures are used for tracking in-flight operations and handling their completion.
Initialize request context pools
Allocate memory pools for NFS request contexts to efficiently manage allocation and reuse.
Each NFS operation requires a context structure to track its state and manage associated resources until completion.
Set up error handling and recovery
Implement error handling mechanisms to:
Translate NFS error codes to their corresponding FUSE error codes.
Retry transient failures where applicable.
Recover from NFS server disconnections by attempting reconnection or triggering failover logic.
Register progress polling
Integrate progress polling functions that are periodically invoked within each worker thread’s main loop.
These functions check for completed NFS operations and trigger their associated completion callbacks.
Efficient polling ensures timely processing of operations without idle wait cycles.
Validate backend readiness
Perform backend readiness checks, such as:
Validating connectivity by performing a stat operation on the root directory of the NFS export.
Ensuring that all per-thread NFS contexts are properly initialized.
Confirming that the FUSE-to-NFS operation mapping is complete and functional.
Integrate with virtio-fs device
Link the prepared NFS backend to the virtio-fs device instance.
This integration ensures that FUSE requests received by the virtio-fs emulation layer are routed to the appropriate NFS backend handlers for processing.
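The asynchronous pattern described above maps directly onto the public libnfs API. The standalone sketch below mounts an export, issues one asynchronous stat (a GETATTR-style call), and drives the libnfs socket with poll(); it is a minimal illustration of the mechanism, not the application's actual backend code:

#include <poll.h>
#include <stdio.h>
#include <nfsc/libnfs.h>

/* Completion callback: libnfs invokes it once the asynchronous call finishes.
 * For nfs_stat64_async, 'data' points to a struct nfs_stat_64 on success. */
static void getattr_cb(int err, struct nfs_context *nfs, void *data, void *private_data)
{
    int *done = private_data;

    if (err == 0) {
        struct nfs_stat_64 *st = data;
        printf("size=%llu mode=%llo\n",
               (unsigned long long)st->nfs_size, (unsigned long long)st->nfs_mode);
    } else {
        fprintf(stderr, "getattr failed: %s\n", nfs_get_error(nfs));
    }
    *done = 1;
}

/* Mount an export and issue one asynchronous stat, driving the libnfs
 * socket with a small poll loop (worker threads do the same inline). */
static int nfs_backend_demo(const char *server, const char *export_path)
{
    struct nfs_context *nfs = nfs_init_context();
    int done = 0;

    if (nfs == NULL || nfs_mount(nfs, server, export_path) != 0)
        return -1;
    if (nfs_stat64_async(nfs, "/", getattr_cb, &done) != 0)
        return -1;

    while (!done) {
        struct pollfd pfd = { .fd = nfs_get_fd(nfs), .events = nfs_which_events(nfs) };
        if (poll(&pfd, 1, 100) < 0)
            break;
        if (nfs_service(nfs, pfd.revents) < 0)
            break;
    }
    nfs_destroy_context(nfs);
    return 0;
}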
Key Points
NFS Protocol Support – The backend typically uses NFSv3 for broad compatibility, though NFSv4 can be employed for advanced features.
Thread-Safe Operations – Each worker thread must operate with its own NFS client context to avoid synchronization overhead and maintain parallelism.
Asynchronous Processing – All NFS operations are performed asynchronously, preventing blocking behavior and ensuring high throughput.
Error Translation – Correct mapping between NFS and FUSE error codes is essential for consistent behavior on the host side.
Connection Resilience – The backend must gracefully handle NFS server disconnections, implementing automatic reconnection or recovery procedures.
Performance Tuning – NFS client parameters (e.g., timeouts, buffer sizes) should be tuned based on workload characteristics and network conditions.
Memory Management – Efficient allocation and reuse of request context structures are critical for performance at scale.
Progress Polling – Regular polling for completed NFS operations ensures timely request completion and system responsiveness.
Operation Mapping – A comprehensive FUSE-to-NFS operation mapping ensures full filesystem functionality for the host.
Backend Abstraction – The architecture supports backend abstraction, enabling future support for alternative storage backends (e.g., local POSIX FS, object storage) through the same interface.
Request Handler Registration
This section describes the process of registering FUSE operation handlers to process filesystem requests from the host, setting up DMA contexts for efficient data transfers, and configuring asynchronous callbacks for request completion and error handling.
Request Handler Registration Flow
Create VFS I/O contexts
Initialize I/O contexts for each worker thread using doca_devemu_vfs_io_create().
Each worker thread receives its own struct doca_devemu_vfs_io handle to process FUSE requests independently, avoiding cross-thread contention.
Register FUSE operation handlers
For every supported FUSE operation, register corresponding handler functions:
Lookup: doca_devemu_vfs_io_event_vfs_fuse_lookup_req_handler_register()
Read: doca_devemu_vfs_io_event_vfs_fuse_read_req_handler_register()
Write: doca_devemu_vfs_io_event_vfs_fuse_write_req_handler_register()
Getattr, Setattr, Create, Mkdir, etc.
Each handler translates the incoming FUSE request into corresponding backend operations (e.g., NFS RPC calls) and manages the request lifecycle.
Set up DMA contexts for data transfers
Create DMA contexts for each I/O context using doca_dma_create().
Configure DMA memcpy task parameters using doca_dma_task_memcpy_set_conf(), including completion and error callbacks (see the DMA setup sketch after this flow).
DMA contexts are essential for enabling efficient zero-copy data transfers between host memory and DPU memory.
Configure DMA completion callbacks
Register completion and error handlers for DMA operations using doca_dma_task_memcpy_set_conf().
Application-defined callback functions handle the completion of data transfers and resume processing of the originating FUSE requests.
Connect I/O contexts to progress engines
Associate each I/O context with its corresponding thread's progress engine using doca_pe_connect_ctx().
This ensures that I/O operations are executed within the correct thread context, maintaining thread affinity and balanced load distribution.
Initialize request context management
Set up request context pools using application-defined allocation mechanisms.
Each incoming FUSE request is assigned a context structure that tracks its state, associated DMA operations, and completion handlers.
Configure asynchronous operation handling
FUSE request handlers should submit backend operations (e.g., NFS RPC calls) asynchronously.
Completion callbacks are registered to be invoked when backend operations complete, enabling concurrent request processing without blocking worker threads.
Set up request queuing and scheduling
Implement queuing mechanisms using application-defined structures to manage pending requests during high-load conditions.
Queuing helps maintain fairness and prevents resource exhaustion by controlling the number of active in-flight operations.
Register error handling callbacks
Configure error handling routines to:
Translate backend error codes to corresponding FUSE error responses.
Handle DMA operation failures gracefully.
Ensure proper cleanup of request contexts in failure scenarios.
Initialize request statistics and monitoring
Set up performance monitoring structures to track:
Request counts.
Latency metrics.
Error rates.
Other relevant performance indicators.
This data aids in debugging, optimization, and capacity planning.
Configure request validation
Implement input validation routines to verify:
Request parameters.
Validity of file handles.
Access permissions.
Format and correctness of incoming requests.
Early validation helps prevent crashes, resource leaks, and security vulnerabilities.
Start I/O context processing
Activate each I/O context using doca_ctx_start().
Once started, I/O contexts begin accepting and processing FUSE requests from the host, dispatching them to the appropriate registered handlers.
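The DMA-related steps in this flow use the standard DOCA DMA context APIs. The sketch below shows one per-thread DMA context being created, configured with memcpy completion and error callbacks, connected to the thread's progress engine, and started; the callback bodies, task-pool size, and helper name are illustrative assumptions:

#include <doca_ctx.h>
#include <doca_dma.h>
#include <doca_error.h>
#include <doca_pe.h>
#include <doca_types.h>

/* DMA memcpy completion callbacks: invoked by the progress engine when a
 * host<->DPU copy finishes, letting the originating FUSE request resume. */
static void dma_memcpy_done_cb(struct doca_dma_task_memcpy *task,
                               union doca_data task_user_data,
                               union doca_data ctx_user_data)
{
    /* resume the FUSE request tracked in task_user_data (application logic) */
    (void)task; (void)task_user_data; (void)ctx_user_data;
}

static void dma_memcpy_error_cb(struct doca_dma_task_memcpy *task,
                                union doca_data task_user_data,
                                union doca_data ctx_user_data)
{
    /* translate the failure into a FUSE error response (application logic) */
    (void)task; (void)task_user_data; (void)ctx_user_data;
}

/* Create a per-thread DMA context, configure its memcpy task pool and
 * callbacks, and attach it to that thread's progress engine. */
static doca_error_t setup_thread_dma(struct doca_dev *dev, struct doca_pe *pe,
                                     struct doca_dma **dma_out)
{
    doca_error_t res;

    res = doca_dma_create(dev, dma_out);
    if (res != DOCA_SUCCESS)
        return res;

    res = doca_dma_task_memcpy_set_conf(*dma_out, dma_memcpy_done_cb, dma_memcpy_error_cb,
                                        256 /* illustrative max in-flight tasks */);
    if (res != DOCA_SUCCESS)
        return res;

    res = doca_pe_connect_ctx(pe, doca_dma_as_ctx(*dma_out));
    if (res != DOCA_SUCCESS)
        return res;

    return doca_ctx_start(doca_dma_as_ctx(*dma_out));
}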
Key Points
Complete Operation Coverage – All relevant FUSE operations must have registered handlers to ensure full filesystem functionality on the host.
Thread-Specific Contexts – Each worker thread must have its own VFS I/O context to avoid contention and maintain scalability.
DMA Integration – Proper setup of DMA contexts and completion callbacks is essential for efficient data movement during read/write operations.
Asynchronous Processing – Handlers should submit backend operations asynchronously to maximize throughput and avoid blocking threads.
Robust Error Handling – Comprehensive error translation, recovery, and cleanup routines ensure stable and predictable application behavior.
Request Lifecycle Management – Every request must be tracked from reception to completion, ensuring proper resource allocation and cleanup.
Efficient Resource Management – Request contexts and DMA buffers should be managed via efficient allocation and reuse strategies to support high workloads.
Performance Monitoring – Request statistics provide visibility into system performance, aiding in bottleneck identification and optimization efforts.
Input Validation – Validating incoming requests early prevents invalid operations and mitigates security risks.
Progress Engine Integration – All I/O contexts must be connected to progress engines to ensure correct and efficient operation processing.
Callback Management – Completion and error callbacks for all operations must be correctly registered and invoked to maintain request flow and ensure system integrity.
Device Lifecycle Management
This section describes the procedures for starting device emulation, handling device state transitions, managing reset events, and implementing error recovery mechanisms to ensure robust and continuous virtio-fs device operation throughout its lifecycle.
Device Lifecycle Management Flow
Start VFS device context
Begin device emulation by starting the main virtio-fs device context using doca_ctx_start().
This transitions the device from IDLE to STARTING state and initiates the device initialization sequence.
The operation is asynchronous, with completion signaled through state change callbacks.
Start I/O contexts
Activate all worker thread I/O contexts using doca_ctx_start() for each struct doca_devemu_vfs_io created during request handler registration.
All I/O contexts must be started to begin processing FUSE requests from the host.
Contexts are started in parallel across all worker threads.
Monitor device state transitions
Use registered state change callbacks (doca_ctx_set_state_changed_cb()) to track device state transitions: IDLE → STARTING → RUNNING (see the callback sketch after this flow).
These callbacks update application-level state tracking and trigger actions appropriate to each transition.
Handle device startup completion
When the device reaches RUNNING state:
Update device status.
Log successful startup.
Invoke any registered startup completion callbacks.
At this point, the device is fully operational and visible to the host system.
Manage device visibility to host
Upon entering RUNNING state, the virtio-fs device is exposed to the host as a PCIe device.
The host OS can now mount the filesystem and start sending FUSE requests for processing.
Handle device reset events
Process device reset requests from the host using the registered reset event callback (doca_devemu_virtio_dev_event_reset_register()).
Reset handling includes:
Stopping the device.
Clearing internal state.
Re-initializing and restarting the device to resume normal operation.
Handle function-level reset (FLR)
Process PCIe Function Level Reset (FLR) events using the registered FLR callback (doca_devemu_pci_dev_event_flr_register()).
FLR handling requires stopping all device activity and reinitializing the device to a clean state to maintain PCIe protocol compliance.
Implement graceful device stop
Use doca_ctx_stop() to gracefully shut down device operation.
The shutdown sequence involves:
Stopping all I/O contexts first.
Then stopping the main device context.
Ensuring all pending operations are either completed or safely aborted.
Manage device state during stop
Track device shutdown transitions using state change callbacks:
RUNNING → STOPPING → IDLE
Each transition requires specific cleanup actions and resource deallocation to ensure a clean shutdown.
Handle error recovery
Implement error recovery mechanisms to detect device failures and perform recovery actions:
Attempt soft recovery where possible.
Escalate to a full device restart if necessary.
These procedures ensure device availability during transient failures.
Coordinate multi-context lifecycle
Synchronize lifecycle operations across the main device context and all associated I/O contexts.
Application-defined coordination mechanisms should be used to ensure proper sequencing of start/stop operations and avoid race conditions.
Cleanup on device removal
On device removal or application shutdown:
Ensure all contexts are properly stopped.
Free all allocated resources.
Close backend connections.
Cleanly remove the device from the host system to prevent resource leaks or dangling devices.
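The state tracking described in this flow is driven by the context state-change callback. The following sketch shows an illustrative callback of the form accepted by doca_ctx_set_state_changed_cb(); the vfs_dev_state structure and the per-state actions are application-defined assumptions:

#include <doca_ctx.h>
#include <doca_types.h>

/* Illustrative application state updated from the state-change callback. */
struct vfs_dev_state {
    volatile int running;
};

static void vfs_dev_state_changed_cb(const union doca_data user_data, struct doca_ctx *ctx,
                                     enum doca_ctx_states prev_state,
                                     enum doca_ctx_states next_state)
{
    struct vfs_dev_state *state = user_data.ptr;

    (void)ctx;
    (void)prev_state;

    switch (next_state) {
    case DOCA_CTX_STATE_RUNNING:
        state->running = 1;   /* device is now visible to the host            */
        break;
    case DOCA_CTX_STATE_STOPPING:
        /* in-flight operations are draining; no new requests are accepted    */
        break;
    case DOCA_CTX_STATE_IDLE:
        state->running = 0;   /* stop (or reset/FLR handling) has completed   */
        break;
    default:                  /* DOCA_CTX_STATE_STARTING: initialization path */
        break;
    }
}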
Key Points
Asynchronous Operations – Device start and stop procedures are asynchronous; completion is monitored via state change callbacks.
State Synchronization – Synchronizing the state between the main device context and multiple I/O contexts is essential for correct and reliable operation.
Host Visibility – The device becomes visible to the host only after successfully transitioning to the RUNNING state.
Reset Handling – Proper handling of both VirtIO resets and PCIe FLR events is required to maintain compatibility with standard host drivers.
Graceful Shutdown – A structured stop sequence ensures all in-flight operations are safely completed or canceled, preventing data loss or corruption.
Error Resilience – Robust error detection and recovery procedures help maintain device availability in the event of transient or recoverable errors.
Resource Cleanup – All resources (contexts, memory pools, DMA buffers) must be freed upon shutdown to prevent memory leaks and resource exhaustion.
Thread Coordination – Lifecycle operations (start/stop) must be carefully coordinated across multiple worker threads to ensure ordered and race-free execution.
Callback Management – State change, reset, and FLR callbacks are critical for tracking device status and triggering necessary actions throughout the lifecycle.
Host Driver Compatibility – Proper lifecycle management is essential to maintaining compatibility with standard virtio-fs drivers on the host side.
Startup Sequencing – Device and I/O contexts must be started in the correct order to ensure proper initialization and readiness.
Recovery Procedures – Failed startup or runtime errors should trigger cleanup routines and proper error reporting to maintain system stability.
Runtime Operations
This section describes the core operational phase where the application processes incoming FUSE requests from the host, executes corresponding filesystem operations via the NFS backend, and manages data transfers between the host and storage with hardware-accelerated efficiency.
Runtime Operations Flow
Receive FUSE requests from host
Incoming filesystem requests from the host are received through the virtio-fs device.
The appropriate FUSE operation handler is invoked based on the request's operation code (e.g., LOOKUP, READ, WRITE, GETATTR).
Requests are dispatched to their corresponding handler functions for further processing.
Parse and validate FUSE requests
Extract request parameters using:
struct fuse_in_header for common fields.
Operation-specific structures (e.g., struct fuse_read_in, struct fuse_write_in) for request-specific parameters.
Validate fields such as file handles, offsets, sizes, and permissions to ensure requests are well-formed before processing.
Allocate request context
Allocate a request context structure from a pre-allocated pool to manage the request’s lifecycle.
The context tracks request state, intermediate results, associated DMA operations, and completion callbacks.
Translate FUSE operations to NFS calls
Map FUSE operations to corresponding NFS RPC calls:
FUSE_LOOKUP → NFS LOOKUP3
FUSE_READ → NFS READ3
FUSE_WRITE → NFS WRITE3
Parameter translation ensures compatibility between FUSE and NFS semantics.
Submit operations to NFS backend
Submit NFS operations asynchronously using thread-specific NFS contexts.
Asynchronous submission allows multiple operations to be in-flight concurrently without blocking worker threads.
Handle data transfers for read/write operations
For read operations:
DMA is used to transfer data from NFS buffers to host memory.
For write operations:
DMA transfers data from host memory to NFS buffers before sending it to the NFS server.
DMA enables zero-copy data movement, reducing CPU overhead and improving performance.
Process NFS operation completion
Upon NFS operation completion, registered callbacks are invoked.
Callbacks process NFS responses, extract result data, and prepare corresponding FUSE responses for the host.
Manage concurrent operations
Per-thread request queues and asynchronous processing allow worker threads to handle multiple requests concurrently.
Non-blocking operations ensure high throughput even under heavy load.
Handle operation errors
Failed operations are processed by translating NFS error codes to appropriate FUSE error responses.
Common errors include:
ENOENT (file not found)
EACCES (permission denied)
EIO (I/O errors)
Errors are propagated back to the host to ensure correct application behavior.
Complete FUSE responses
Responses are sent back to the host using doca_virtiofs_req_complete() or similar APIs.
Operation results are packaged into FUSE response structures and delivered through the virtio-fs interface to the host driver.
Update request statistics
Application-defined statistics collection tracks:
Request counts.
Latency metrics.
Error rates.
Throughput measurements.
Continuous monitoring helps identify bottlenecks and optimize system performance.
Manage memory and resource cleanup
After request completion:
Release DMA buffers.
Free or recycle request contexts.
Ensure all resources are properly cleaned up to prevent memory leaks during prolonged operation.
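As a small illustration of the error-propagation step in this flow, backend failures can be folded into the negative errno values that FUSE responses carry; the mapping below is a simplified assumption rather than the application's exact translation table:

#include <errno.h>

/* Map backend (NFS/errno-style) failures to the error code carried in the
 * FUSE response header. FUSE replies use negative errno values, so the
 * translation keeps the well-known codes mentioned above and falls back to EIO. */
static int backend_error_to_fuse(int backend_err)
{
    switch (backend_err) {
    case ENOENT:   /* file not found    */
    case EACCES:   /* permission denied */
    case EEXIST:
    case ENOTDIR:
    case EISDIR:
    case ENOSPC:
        return -backend_err;
    default:
        return -EIO; /* generic I/O error */
    }
}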
Key Points
High Concurrency – Multiple requests are processed simultaneously across worker threads, ensuring maximum throughput and low latency.
Zero-Copy Transfers – DMA operations enable efficient data transfers between host and DPU memory without involving the CPU.
Asynchronous Processing – All filesystem and data operations are handled asynchronously to prevent thread blocking and maintain system responsiveness.
Error Propagation – Accurate translation of NFS errors to FUSE responses is essential for correct host-side application behavior.
Resource Efficiency – Request contexts and DMA buffers are pooled and reused to minimize allocation overhead and support high workloads.
Thread Isolation – Each worker thread processes requests independently, avoiding shared resource contention and synchronization bottlenecks.
Performance Monitoring – Real-time statistics collection provides visibility into system health, enabling proactive performance tuning.
Memory Management – Proper allocation and cleanup of resources are critical to prevent memory leaks during long-running operations.
Protocol Compliance – All FUSE responses must conform to the virtio-fs specification to ensure compatibility with standard host drivers.
Load Balancing – Request distribution across worker threads balances workload and optimizes hardware utilization.
Data Consistency – Proper ordering and synchronization of requests ensure data consistency between the host and the NFS backend.
Scalability – The architecture supports scaling to handle high request rates and large numbers of concurrent operations across multiple worker threads.
Cleanup and Shutdown
This section describes the procedures for gracefully shutting down the application, freeing allocated resources, closing active connections and contexts, and ensuring a clean termination without leaving resource leaks or dangling system components.
Cleanup and Shutdown Flow
Initiate graceful shutdown
On receiving termination signals (SIGINT, SIGTERM), signal handlers set shutdown flags to begin the termination process (see the signal-handler sketch after this flow).
The shutdown sequence ensures that all in-flight operations are allowed to complete before services are stopped.
Stop accepting new requests
Disable the acceptance of new FUSE requests.
Request handlers will reject incoming operations with appropriate error responses while allowing ongoing operations to complete.
Drain pending operations
Wait for all in-flight operations to complete:
NFS operations.
DMA transfers.
Outstanding FUSE requests.
Ensure all request contexts are returned to their respective pools before proceeding.
Stop VFS I/O contexts
Invoke doca_ctx_stop() on each VFS I/O context associated with worker threads.
These operations are asynchronous; completion is monitored through state change callbacks.
Stopping I/O contexts halts request processing and triggers per-thread resource cleanup.
Stop main VFS device context
Stop the primary device context using doca_ctx_stop().
The device transitions from RUNNING → STOPPING → IDLE, becoming invisible to the host.
Device-level operations are halted.
Cleanup DMA resources
Free DMA contexts using doca_dma_destroy() for each allocated DMA context.
Ensure any pending DMA operations are properly finalized and associated memory allocations are released.
Destroy VFS I/O contexts
Free I/O contexts for each worker thread using doca_devemu_vfs_io_destroy().
This step ensures cleanup of per-thread processing resources.
Close NFS backend connections
Unmount NFS exports and destroy NFS client contexts:
Use nfs_umount() to unmount NFS paths.
Use nfs_destroy_context() to release NFS client-side resources.
Clean up per-thread NFS contexts and connection pools to ensure backend disconnection.
Stop worker threads
Terminate worker threads gracefully:
Use pthread_join() to wait for threads to exit cleanly.
Use pthread_cancel() for unresponsive threads if necessary.
Ensure no threads remain active before continuing with shared resource deallocation.
Destroy progress engines
Free progress engines using doca_pe_destroy() for each progress engine created during initialization.
This ensures proper cleanup of asynchronous operation handling infrastructure.
Free memory pools and inventories
Destroy memory pools using doca_mpool_destroy() and buffer inventories using doca_buf_inventory_destroy().
This step returns all allocated buffers and memory resources to the system.
Close devices and cleanup framework
Close all opened devices and representors:
Use doca_dev_close() for device handles.
Use doca_dev_rep_close() for representor handles.
Finalize VFS framework cleanup to ensure hardware resources are fully released.
Destroy device and manager resources
Free application-specific resources:
Device lists.
Manager structures.
Configuration objects.
Clean up linked lists, hash tables, and other dynamic data structures used in the application.
Cleanup argument parser and logging infrastructure
Destroy the argument parser with doca_argp_destroy().
Tear down logging backends to clean up logging resources and ensure a clean exit.
Verify resource cleanup
Perform a final validation step to confirm:
All memory allocations are freed.
All contexts are destroyed.
No background threads or pending operations remain active.
Check for potential memory leaks or orphaned resources.
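A minimal sketch of the shutdown trigger described in this flow is shown below: a signal handler sets an atomic flag that the worker loops and the main teardown sequence poll. The names are illustrative; the actual ordered teardown then follows the steps above (stop I/O contexts, stop the device context, destroy DMA/PE resources, close devices):

#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shutdown flag polled by the worker loops; set from the signal handler so
 * the drain/stop sequence described above can begin. */
static atomic_bool shutdown_requested;

static void termination_handler(int signum)
{
    (void)signum;
    atomic_store(&shutdown_requested, true);   /* async-signal-safe */
}

static void install_signal_handlers(void)
{
    struct sigaction sa = {0};

    sa.sa_handler = termination_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);
}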
Key Points
Graceful Shutdown – Ensure in-flight operations are completed before shutting down to avoid data corruption or loss.
Ordered Cleanup – Resources must be released in the reverse order of their creation, respecting dependencies between components.
Signal Handling – Proper handling of external termination signals ensures orderly shutdown procedures are triggered.
Thread Synchronization – Worker threads must be fully stopped and joined before deallocating shared resources.
Resource Tracking – A systematic cleanup approach is critical to prevent resource leaks and ensure system stability after termination.
Error Handling During Cleanup – Cleanup operations should proceed even if individual cleanup steps encounter errors, ensuring best-effort resource release.
State Verification – Confirm that all contexts and resources have been freed and no residual state remains.
Backend Cleanup – NFS backend connections and contexts must be properly closed to prevent lingering open connections.
Device Visibility – Devices should be made invisible to the host before initiating cleanup to avoid host-side errors.
Timeout Handling – Ensure that cleanup operations have timeouts to prevent the application from hanging during shutdown.
Dependency Management – Cleanup operations must respect dependencies, ensuring no premature deallocation of dependent resources.
Memory Leak Prevention – All dynamically allocated memory (DMA buffers, request contexts, etc.) must be freed.
Hardware Resource Release – All device handles, representors, and associated contexts must be properly closed to return hardware resources to the system.
Framework Cleanup – DOCA framework resources must be de-initialized to restore the system to a clean state.
Logging and Monitoring – Shutdown progress and any errors during cleanup should be logged to aid in debugging and post-mortem analysis.
The following table summarizes the FUSE operations supported by the virtio-fs reference application, along with their implementation status and brief descriptions:
FUSE Operation | Status | Description |
LOOKUP | ✅ Supported | Look up a directory entry by name and return file attributes. |
FORGET | ✅ Supported | Forget an inode (decrease reference count). |
GETATTR | ✅ Supported | Get file attributes (similar to stat). |
SETATTR | ✅ Supported | Set file attributes (mode, ownership, size, timestamps). |
READLINK | ❌ Not Implemented | Read the target of a symbolic link. |
SYMLINK | ❌ Not Implemented | Create a symbolic link. |
MKNOD | ✅ Supported | Create a file node (regular file, device file, etc.). |
MKDIR | ✅ Supported | Create a directory. |
UNLINK | ✅ Supported | Remove a file. |
RMDIR | ❌ Not Implemented | Remove a directory. |
RENAME | ❌ Not Implemented | Rename or move a file or directory. |
LINK | ❌ Not Implemented | Create a hard link to a file. |
OPEN | ✅ Supported | Open a file and return a file handle. |
READ | ✅ Supported | Read data from a file. |
WRITE | ✅ Supported | Write data to a file. |
STATFS | ❌ Not Implemented | Retrieve filesystem statistics (disk space, etc.). |
RELEASE | ✅ Supported | Close a file handle. |
FSYNC | ❌ Not Implemented | Synchronize file contents to storage. |
SETXATTR | ❌ Not Implemented | Set extended file attributes. |
GETXATTR | ❌ Not Implemented | Get extended file attributes. |
LISTXATTR | ❌ Not Implemented | List extended file attributes. |
REMOVEXATTR | ❌ Not Implemented | Remove extended file attributes. |
FLUSH | ❌ Not Implemented | Flush cached data (called on file close). |
INIT | ✅ Supported | Initialize filesystem connection. |
OPENDIR | ✅ Supported | Open a directory for reading. |
READDIR | ✅ Supported | Read directory entries. |
RELEASEDIR | ✅ Supported | Close a directory handle. |
FSYNCDIR | ❌ Not Implemented | Synchronize directory contents. |
GETLK | ❌ Not Implemented | Test for file locks. |
SETLK | ❌ Not Implemented | Set non-blocking file locks. |
SETLKW | ❌ Not Implemented | Set blocking file locks. |
ACCESS | ❌ Not Implemented | Check file access permissions. |
CREATE | ❌ Not Implemented | Atomically create and open a file. |
INTERRUPT | ❌ Not Implemented | Interrupt a pending operation. |
BMAP | ❌ Not Implemented | Map file block to device block. |
IOCTL | ❌ Not Implemented | Device-specific input/output control. |
POLL | ❌ Not Implemented | Poll for I/O readiness. |
FALLOCATE | ❌ Not Implemented | Preallocate space for a file. |
DESTROY | ❌ Not Implemented | Cleanup filesystem connection. |
NOTIFY_REPLY | ❌ Not Implemented | Reply to kernel notification. |
BATCH_FORGET | ❌ Not Implemented | Batch version of FORGET. |
READDIRPLUS | ✅ Supported | Read directory entries with attributes. |
RENAME2 | ❌ Not Implemented | Extended rename with flags. |
COPY_FILE_RANGE | ❌ Not Implemented | Copy data between files efficiently. |
SETUPMAPPING | ❌ Not Implemented | Set up DAX (Direct Access) memory mapping. |
REMOVEMAPPING | ❌ Not Implemented | Remove DAX memory mapping. |
SYNCFS | ❌ Not Implemented | Synchronize entire filesystem. |
LSEEK | ❌ Not Implemented | Seek to a position within a file. |
TMPFILE | ❌ Not Implemented | Create an unnamed temporary file. |
STATX | ❌ Not Implemented | Extended file status query. |