Date: 2025-07-10
DOCA DPACC Compiler
This document describes the DOCA DPACC compiler and provides instructions for DPA toolchain setup and usage.
DPACC is a high-level compiler for the data-path accelerator (DPA) processor. It compiles code targeted for the DPA into a device executable and generates a DPA program.
The DPA program is a host library with interfaces encapsulating the device executable. This DPA program is linked with the host application to generate a host executable. The host executable can invoke the DPA code through the FlexIO runtime API.
DPACC uses the DPA compiler (dpa-clang) to compile code targeted for the DPA. dpa-clang is part of the DPA toolchain package, an LLVM-based, bare-metal cross-compiling toolchain. It provides the Clang compiler, the LLD linker targeting the DPA architecture, and other utilities.
Glossary
Term: Definition
Device: DPA as present on the BlueField DPU
Host: CPU that launches the device code to run on the DPA
Device function: Any C function that runs on the DPA device
DPA global function: Device function that is the point of entry when offloading any work on DPA
Host compiler: Compiler used to compile the code targeting the host CPU
Device compiler: Compiler used to compile code targeting the DPA
Fatbinary: File that contains code for multiple target DPA architectures
DPA program: Host library that encapsulates the DPA device executable (
Offloading Work on DPA
To invoke a DPA function from host, the following things are required:
DPA device code – C programs, targeted to run on the DPA. DPA device code may contain one or more entry functions.
Host application code – the corresponding host application. Refer to DPA Subsystem for more details.
Runtime – FlexIO or DOCA DPA library provides the runtime
The generated DPA program, when linked with a host application, results in a host executable that also contains the device executable. The host application is responsible for loading the device executable onto the device.
DPACC Predefined Macros
DPACC predefines the following macros:
Macro
Description
Defined when compiling device code file
Defined to the target DPA hardware identifier macros
See Architecture Macros for more details.
Defined to the major version number of DPACC
Defined to the minor version number of DPACC
Defined to the patch version number of DPACC
Writing DPA Applications
DPA device code is C code with some restrictions and special definitions.
FlexIO or DOCA-DPA APIs provide interfaces to DPA.
Language Support
The DPA is programmed using a subset of the C11 language standard. The compiler documents any constructs that are not available. Language constructs, where available, retain their standard definitions.
Restrictions on DPA Code
Use of C thread local storage is not allowed for any variables
Identifiers with a _dpacc or __dpacc prefix are reserved by the compiler. Use of such identifiers may result in an error or undefined behavior
DPA processor does not have native floating-point support; use of floating point operations is disabled
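Because floating-point operations are disabled, fractional arithmetic in device code has to be expressed with integers. The following is a minimal sketch of one common workaround, Q16.16 fixed-point arithmetic. The names q16_t, Q16_ONE, q16_from_int, and q16_mul are hypothetical and are not part of DPACC, FlexIO, or DOCA DPA.

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fractional bits. */
typedef int32_t q16_t;

/* The value 1.0 in Q16.16. */
#define Q16_ONE ((q16_t)1 << 16)

/* Convert an integer to Q16.16 (no range checking in this sketch). */
static inline q16_t q16_from_int(int32_t v)
{
    return (q16_t)(v * Q16_ONE);
}

/* Multiply two Q16.16 values through a 64-bit intermediate so the
 * product does not overflow before it is scaled back down.
 * Intended for non-negative operands; shifting negative values right
 * is implementation-defined in C. */
static inline q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * (int64_t)b) >> 16);
}
```

For example, multiplying q16_from_int(3) by Q16_ONE / 2 (that is, 0.5) yields the Q16.16 encoding of 1.5.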
DPA RPC Functions
A remote procedure call (RPC) function is a synchronous call that triggers work on the DPA and waits for its completion. These functions return a value of type uint64_t. They are annotated with the __dpa_rpc__ attribute.
DPA Global Functions
A DPA global function is an event handler device function referenced from the host code. These functions do not return a value. They are annotated with the __dpa_global__ attribute.
For more information, refer to DPA Subsystem documentation.
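The two kinds of annotated functions described above can be sketched as follows. This is an illustration only: the function names and the counter are hypothetical, real handlers normally call FlexIO or DOCA DPA device APIs (omitted here), and the empty fallback definitions are an assumption added so the sketch also builds with a plain host compiler, where __NV_DPA is not defined.

```c
#include <stdint.h>

/* Assumption: outside a DPA build (__NV_DPA undefined) the annotations are
 * defined as empty so the sketch can be compiled on the host for study. */
#if !defined(__NV_DPA) && !defined(__dpa_rpc__)
#define __dpa_rpc__
#endif
#if !defined(__NV_DPA) && !defined(__dpa_global__)
#define __dpa_global__
#endif

/* RPC function: called synchronously from the host, returns uint64_t. */
__dpa_rpc__ uint64_t example_add_rpc(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Hypothetical counter updated by the event handler below. */
static uint64_t example_event_count;

/* Global function: event handler entry point, returns nothing. */
__dpa_global__ void example_event_handler(uint64_t thread_arg)
{
    (void)thread_arg; /* unused in this sketch */
    example_event_count++;
}
```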
Characteristics of Annotated Functions
Global functions must have a void return type and RPC functions must have a uint64_t return type
Annotated functions cannot accept C pointers or arrays as arguments (e.g., void my_global(int *ptr, int arr[]) is invalid)
Annotated functions cannot accept a variable number of arguments
Inline specifier is not allowed on annotated functions
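Since annotated functions cannot take pointers or arrays, one possible pattern is to pass a buffer as an integer address plus an element count. The sketch below assumes this pattern; example_sum_rpc is a hypothetical name, and on real hardware the address would have to refer to memory the DPA can access, which this sketch does not set up. The empty fallback for __dpa_rpc__ is an assumption that only applies outside a DPA build.

```c
#include <stdint.h>

/* Assumption: empty fallback so the sketch builds with a host compiler. */
#if !defined(__NV_DPA) && !defined(__dpa_rpc__)
#define __dpa_rpc__
#endif

/* Sums `count` 32-bit words starting at address `addr`.
 * The buffer is described by two integers instead of `uint32_t *buf`
 * or `uint32_t buf[]`, which annotated functions may not accept. */
__dpa_rpc__ uint64_t example_sum_rpc(uint64_t addr, uint64_t count)
{
    const uint32_t *words = (const uint32_t *)(uintptr_t)addr;
    uint64_t sum = 0;

    for (uint64_t i = 0; i < count; i++)
        sum += words[i];
    return sum;
}
```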
Handling User-defined Data Types
User-defined data types, when used as global function arguments, require special handling. They must be annotated with the __dpa_global__ attribute.
If the user-defined data type is typedef'd, the typedef statement must be annotated with the __dpa_global__ attribute along with the data type itself.
Characteristics of Annotated Types
They must have a copy of the definition in all translation units where they are used as global function arguments
They cannot have pointers, variable length arrays, and flexible arrays as members
Fixed-size arrays as C structure members are supported
These characteristics apply recursively to any user-defined or typedef'd types that are members of an annotated type
DPACC processes all annotated functions along with annotated types and generates host and device interfaces to facilitate the function launch.
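As a rough sketch of an annotated type used as a global function argument: the structure below has only scalar and fixed-size array members, in line with the characteristics listed above. The placement of __dpa_global__ after the struct keyword is an assumption, as are all names; the empty fallback definition applies only outside a DPA build.

```c
#include <stdint.h>

/* Assumption: empty fallback so the sketch builds with a host compiler. */
#if !defined(__NV_DPA) && !defined(__dpa_global__)
#define __dpa_global__
#endif

/* Annotated type (annotation placement is assumed): fixed-size arrays are
 * allowed as members; pointers, variable length arrays, and flexible
 * array members are not. */
struct __dpa_global__ example_args {
    uint32_t queue_id;
    uint32_t weights[4];
};

/* Hypothetical result slot written by the handler below. */
static uint32_t example_last_total;

/* Global function receiving the annotated type by value. */
__dpa_global__ void example_configure(struct example_args args)
{
    uint32_t total = 0;

    for (int i = 0; i < 4; i++)
        total += args.weights[i];
    example_last_total = total + args.queue_id;
}
```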
DPA Intrinsics
DPA features such as fences and processor-specific instructions are exposed via intrinsics by the DPA compiler. All intrinsics defined in the header file dpaintrin.h are guarded by the DPA_INTRIN_VERSION_USED macro. The current DPA_INTRIN_VERSION is 1.3.
Example:
#define DPA_INTRIN_VERSION_USED (DPA_INTRIN_VERSION(1, 3))
#include <dpaintrin.h>
…
__dpa_thread_writeback_window(); // Fence for write barrier
For more information, refer to DPA Subsystem documentation.
Package
Instructions
Host compiler
Compiler specified through
Note
Minimum supported version for clang as hostcc is
Device compiler
The default device compiler is the "DPA compiler". Installing the DPACC package also installs the DPA compiler binaries
Note
FlexIO SDK and C library
Available as part of the DOCA software package. DPA toolchain does not provide C library and corresponding headers. Users are expected to use the C library for DPA from the FlexIO SDK.
DPACC Inputs and Outputs
DPACC can produce DPA programs in a single command by accepting all source files as input. DPACC also offers the flexibility of producing DPA object files or libraries from input files.
DPA object files contain both host stub objects (DPACC-generated interfaces) and device objects. These DPA object files can later be given to DPACC as input to produce the DPA library.
Phase
Option Name
Default Output File Name
Compile input device code files to DPA object files
Compile and link the input device code files/DPA object files, and produce a DPA program
No specific option
No default name, output file name must be specified
Compile and build DPA library from input device code files/DPA object files
No default name, output library name must be specified
DPACC can accept the following file types as input:
Input File Extension
File Type
Description
C source file
DPA device code
DPA object file
Object file generated by DPACC, containing both host and device objects
DPA object archive
An archive of DPA object files. User can generate this archive from DPACC-generated DPA objects.
Based on the mode of operations, DPACC can generate the following output files:
Output File Type: Input Files
DPA object file: C source files
DPA program: C source files, DPA object files, and/or DPA object archives
DPA library (DPA host library and DPA device library): C source files, DPA object files, and/or DPA object archives
The following provides the commands to generate different kinds of supported output file types for each input file type:
Input, Output: DPACC Command
C source file, DPA program: dpacc -hostcc=<cc> -mcpu=<targets> in.c -o libprog.a
C source file, DPA object: dpacc -hostcc=<cc> -mcpu=<targets> in.c -c
C source file, DPA library: dpacc -hostcc=<cc> -mcpu=<targets> in.c -o lib<name> -gen-libs
DPA object, DPA program: dpacc -hostcc=<cc> -mcpu=<targets> in.dpa.o -o libprog.a
DPA object, DPA library: dpacc -hostcc=<cc> -mcpu=<targets> in.dpa.o -o lib<name> -gen-libs
DPA object archive, DPA program: dpacc -hostcc=<cc> -mcpu=<targets> in.a -o libprog.a
DPA object archive, DPA library: dpacc -hostcc=<cc> -mcpu=<targets> in.a -o lib<name> -gen-libs
DPA Program
When invoked in compile-and-link mode, dpacc produces a DPA program, which is a host library containing:
DPACC-generated host stubs – Used to register the DPA application and facilitate invocation of the DPA entry-point from the host application.
Device executable – Produced by compiling and linking input DPA device code.
The resulting DPA program library must be linked with a host application that uses the appropriate runtime APIs to load and execute the device code on the target hardware.
A DPA program may contain device executables for multiple hardware targets. These target-specific executables are packaged into a fatbinary container, which is embedded as a dedicated section within the host object that forms part of the host library.
DPA Object
When dpacc is run in compile-only mode, it produces a DPA object file, a host object with a structure similar to that of a DPA program. It includes:
Host stubs – Generated by dpacc to enable future integration with host code.
Device object – Compiled from the input DPA device code.
Like a DPA program, a DPA object may contain device objects for multiple hardware targets, encapsulated in a fatbinary container that is embedded within the host object file.
DPA Library
A DPA library is composed of two separate static archives:
DPA host library – Contains host interface objects that correspond to the device objects found in the DPA device library.
DPA device library – Contains device objects compiled from DPA device source code.
These libraries serve distinct roles:
The DPA device library is consumed by dpacc during DPA program generation.
The DPA host library can optionally be linked with other host-side code and redistributed.
The DPA device library is packaged as a specialized fatbinary container format, which itself contains archives of device objects for different targets.
The fatbinary container—illustrated by the dark-gray box in the associated diagram—is intended to be treated as an opaque object. It should not be manually extracted or modified by end users.
DPACC Trajectory
The following diagram illustrates DPACC compile-and-link mode trajectory.
Modes of Operation
In all modes described below, dpacc accepts one or more target names via the --mcpu option. The compiler then generates output that supports all specified targets, enabling multi-target deployment from a single build.
Compile-and-link Mode
This is a one-step mode that accepts C source files or DPA object files and produces the DPA program. Specifying the output library name is mandatory in this mode.
Example commands:
$ dpacc in1.c in2.c -o myLib1.a -hostcc=gcc -mcpu=nv-dpa-bf3 # Takes C sources to produce myLib1.a library which supports a single target - nv-dpa-bf3
$ dpacc in3.dpa.o in4.dpa.o -o myLib2.a -hostcc=gcc -mcpu=nv-dpa-bf3,nv-dpa-cx8 # Takes DPA object files to produce myLib2.a library which supports multiple targets - nv-dpa-bf3 and nv-dpa-cx8
$ dpacc in1.c in3.dpa.o -o myLib3.a -hostcc=gcc -mcpu=nv-dpa-bf3,nv-dpa-cx7,nv-dpa-cx8 # Takes C source and DPA object to produce myLib3.a library which supports multiple targets - nv-dpa-bf3, nv-dpa-cx7 and nv-dpa-cx8
Compile-only Mode
This mode accepts C source code and produces .dpa.o object files. These files can be given to DPACC to produce the DPA program. The mode is invoked by the --compile or -c option.
The user can explicitly provide the output object file name using the --output-file or -o option.
Example commands:
$ dpacc -c input1.c -hostcc=gcc -mcpu=nv-dpa-cx7 # Produces input1.dpa.o which supports a single target - nv-dpa-cx7
$ dpacc -c input2.c -o myObj.dpa.o -hostcc=gcc -mcpu=nv-dpa-cx8,nv-dpa-cx7 # Produces myObj.dpa.o which supports multiple targets - nv-dpa-cx7 and nv-dpa-cx8
$ dpacc -c input3.c input4.c -hostcc=gcc -mcpu=nv-dpa-bf3,nv-dpa-cx7,nv-dpa-cx8 # Produces input3.dpa.o and input4.dpa.o which support multiple targets - nv-dpa-bf3, nv-dpa-cx7 and nv-dpa-cx8
Library Generation Mode
This mode accepts C source files or DPA object files and produces the DPA library. It is invoked with the -gen-libs option. Specifying the output DPA library name is mandatory in this mode.
Example commands:
$ dpacc in1.c in2.c -o libdummy1 -hostcc=gcc -mcpu=nv-dpa-cx8 -gen-libs # Takes C sources to produce a DPA-Library (libdummy1_host.a and libdummy1_device.a archives) which supports a single target - nv-dpa-cx8
$ dpacc in3.dpa.o in4.dpa.o -o libdummy2 -hostcc=gcc -mcpu=nv-dpa-cx8,nv-dpa-bf3 -gen-libs # Takes DPA object files to produce a DPA-Library (libdummy2_host.a and libdummy2_device.a archives) which supports multiple targets - nv-dpa-bf3 and nv-dpa-cx8
$ dpacc in1.c in3.dpa.o -o outdir/libdummy3 -hostcc=gcc -mcpu=nv-dpa-bf3,nv-dpa-cx7,nv-dpa-cx8 -gen-libs # Takes C source and DPA object to produce a DPA-Library (outdir/libdummy3_host.a and outdir/libdummy3_device.a archives) which supports multiple targets - nv-dpa-bf3, nv-dpa-cx7 and nv-dpa-cx8
To execute DOCA DPACC compiler:
Usage: dpacc <list-of-input-files> -hostcc=<path> -mcpu=<targets> [other options]
Helper Flags:
-h, --help Print help information about DPACC
-V, --version Print DPACC version information
-v, --verbose List the compilation commands generated by this invocation while also executing every command in verbose mode
-dryrun, --dryrun Only list the compilation commands generated by DPACC, without executing them
-keep, --keep Keep all intermediate files that are generated during internal compilation steps in the current directory
-keep-dir, --keep-dir Keep all intermediate files that are generated during internal compilation steps in the given directory
-optf, --options-file <file>,... Include command line options from the specified file
Mandatory Arguments
Flag
DPACC Mode
Description
List of one or more input files
All
List of C source files or DPA object file names. Specifying at least one input file is mandatory. A file with an unknown extension is treated as a DPA object file.
All
Specify the list of target DPA hardware for code generation. See DPA Hardware Architectures for more details.
Multiple target names can be specified through this option.
Supported values:
All
Specify the host compiler. This is typically the native compiler present on the host system.
Note
The host compiler used to link the host application with the DPA program must be link-compatible with the
Compile-and-link/library generation
Specify name and location of the output file.
Commonly Used Arguments
Use the --help option for a list of all supported options.
Flag
Description
Specify DPA application name for the DPA program. This option is required if multiple DPA programs are part of a host application because each DPA application must have a unique name. Default name is
Enable link-time optimization (LTO) for device code. Specify this option during compilation along with an optimization level in
Specify the list of options to pass to the device compiler.
Specify the list of options to pass during device linking stage.
Specify the list of device libraries including their names (in
Specify include search paths common to host and device code compilation. FlexIO headers paths are included by DPACC by default.
Specify name and location of the output file.
Specify the list of options to pass to the host compiler.
Generate a DPA library from input files
Link with DOCA-DPA libraries
Using machine-dependent options through -devicecc-options to influence compiler code generation is not supported. Examples of options that are not supported through -devicecc-options: -mcpu, -march, -mabi.
The -devicecc-options option allows passing any option to the device compiler. However, passing options that prevent compilation of the input file may lead to unexpected behavior (e.g., -devicecc-options="-version" makes the device compiler print the version and not process input files).
Options that affect DPA global function argument sizes must be consistent between the DPACC invocation and host application compilation. A mismatch may lead to undefined behavior during execution (e.g., passing -hostcc-options="-fshort-enums" to DPACC and omitting this option when building the host application).
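One defensive measure against size mismatches of this kind is to keep option-dependent types such as enums out of argument layouts and to check the layout at compile time. The sketch below assumes a header shared by the DPA code and the host application; all names are hypothetical, and DPACC type annotations are omitted to keep the focus on sizes.

```c
#include <stdint.h>

/* Hypothetical mode values; the enum is used only for named constants. */
enum example_mode { EXAMPLE_MODE_RX = 0, EXAMPLE_MODE_TX = 1 };

/* Argument layout intended to be identical in every compilation.
 * The mode is stored as uint32_t so its size does not depend on
 * options such as -fshort-enums. */
struct example_launch_args {
    uint32_t mode;        /* holds an enum example_mode value */
    uint32_t queue_id;
    uint64_t buffer_addr;
};

/* C11 compile-time check: the build fails if the layout is not the
 * expected 16 bytes in any translation unit that includes this. */
_Static_assert(sizeof(struct example_launch_args) == 16,
               "example_launch_args layout differs from the expected 16 bytes");
```

With fixed-width members, the structure size stays the same whether or not -fshort-enums is passed, so an inconsistent option on one side no longer changes the argument size.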
DPA Hardware Architectures
The table below outlines the supported DPA hardware architectures, including the corresponding values used with the compiler's --mcpu option, as well as the predefined macros the compiler uses to identify each architecture.
Hardware name
Value
Macro
ConnectX-7
BlueField-3
ConnectX-8
Since ConnectX-7 and BlueField-3 share the same DPA hardware architecture, nv-dpa-cx7 is treated as an alias for nv-dpa-bf3 by the compiler.
Link Compatibility
Only relocatable objects that are link-compatible can be linked together. If incompatible objects are detected during linking, the toolchain will emit an error.
To ensure successful linking, the linker's toolchain version must match the version of the compiler used to produce the input objects.
If two architectures, A and B, are link-compatible, and B is newer than A, then:
Valid: Objects built for A can be linked to produce an application targeting B.
Invalid: Objects built for B cannot be linked to build an application for A.
For example, BlueField-3/ConnectX-7 and ConnectX-8 are link-compatible. This means that objects built for BlueField-3/ConnectX-7 can be linked together to produce an application targeting ConnectX-8.
Architecture Macros
The compiler defines architecture identifier macros for each supported DPA hardware version, as listed in the DPA Hardware Architectures section. Each macro is assigned a unique integer value, where:
Newer DPA hardware generations are assigned strictly greater values than older ones.
Known aliases (e.g., BlueField-3 and ConnectX-7) share the same macro value.
During compilation, the macro __NV_DPA is defined to reflect the current target architecture. This allows conditional compilation of device code based on the target hardware. For example:
#if __NV_DPA == __NV_DPA_BF3
// Code for BlueField-3 here
#elif __NV_DPA > __NV_DPA_BF3
// Code for devices after BlueField-3 here
#endif
The numeric ordering of architecture macros does not imply feature parity or progression. It is the developer's responsibility to ensure that hardware-specific features used in the code are actually supported by the target architecture.
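The same comparison can be wrapped in a small helper that also tolerates builds where __NV_DPA is not defined, such as compiling shared code with a host compiler. This is a sketch only; example_target_name and the returned strings are hypothetical.

```c
/* Returns a short description of the compilation target.
 * __NV_DPA and __NV_DPA_BF3 are provided by the DPA compiler; when
 * __NV_DPA is undefined the first branch is selected and the later
 * conditions are never evaluated. */
static const char *example_target_name(void)
{
#if !defined(__NV_DPA)
    return "not a DPA build";
#elif __NV_DPA == __NV_DPA_BF3
    return "BlueField-3 / ConnectX-7";
#elif __NV_DPA > __NV_DPA_BF3
    return "newer than BlueField-3";
#else
    return "older than BlueField-3";
#endif
}
```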
LTO Usage Guidelines
Restrictions
Only the default linker script is supported with LTO
Using the options -fPIC/-fpic/-shared/-mcmodel=large through -devicecc-options is not supported when LTO is enabled
Fat bitcode objects containing both LLVM bitcode and ELF representation are not supported
Thin LTO is not supported
Compatibility
When LTO is enabled, LLVM generates objects as bitcode IR (intermediate representation) instead of ELF during compilation. The bitcode IR generated by the DPA compiler is only guaranteed to be compatible within the same version. The toolchain version of the compiler that builds the objects involved in link-time optimization (enabled with -flto) and the toolchain version of the linker that performs LTO must be the same.
Deprecated Features
The '-ldpa' option which links with DOCA-DPA libraries is deprecated and will be removed in future releases. The option '-ldoca_dpa' is to be used instead of '-ldpa'.
Examples
This section provides some common use cases of DPACC and showcases the dpacc command.
Building Libraries
This example shows how to build DPA libraries using DPACC. Libraries for DPA typically contain two archives, one for the host and one for the device.
dpacc input.c -hostcc=gcc -mcpu=nv-dpa-bf3 -o lib<name> -gen-libs -hostcc-options="-fPIC"
This command generates the output files lib<name>_host.a and lib<name>_device.a.
The host stub archive can be linked with other host code to generate a shared/static host library.
Generating a static host library:
ar x lib<name>_host.a # Extract objects to generate *.o
ar cr lib<name>.a <*src.host.o> *.o # Generate final static archive with all objects
Generating a shared host library:
gcc -shared -o lib<name>.so <*src.host.o> -Wl,-whole-archive -l<name>_host -Wl,-no-whole-archive # Link the generated archive to build a shared library
Linking with DPA Device Library
The DPA device library generated by DPACC using -gen-libs as part of a DPA library can be consumed by DPACC using the -device-libs option.
dpacc input.c -hostcc=gcc -mcpu=nv-dpa-bf3 -o libInput.a -device-libs="-L <path-to-library> -l<libName>"
Enabling Link-time Optimizations
Link-time optimizations can be enabled using -flto along with an optimization level specified for device compilation.
dpacc input1.c -hostcc=gcc -mcpu=nv-dpa-bf3 -c -flto -devicecc-options="-O2"
dpacc input2.c -hostcc=gcc -mcpu=nv-dpa-bf3 -c -flto -devicecc-options="-O2"
dpacc -mcpu=nv-dpa-bf3 input1.dpa.o input2.dpa.o -hostcc=gcc -o libInput.a
Including Headers
This example includes headers for device compilation using -devicecc-options and host compilation using -hostcc-options. You may also specify header search paths common to both host and device compilation using the -I option.
dpacc input.c -hostcc=gcc -mcpu=nv-dpa-bf3 -o libInput.a -I <common-headers-path> -devicecc-options="-I <device-headers-path>" -hostcc-options="-I <host-headers-path>"
Dumping Targets Supported by a Fatbinary File
The dpa-fatbin tool can be used to list the target architectures supported by a fatbinary file. This is especially useful for inspecting device archives in a DPA library.
Examples:
dpa-fatbin --list libfoo_device.a
dpa-fatbin --list device_exec.fatbin
Dumping Target of Device ELF File
To identify the target architecture for which a device ELF file was built, use the dpa-objdump tool:
Example:
dpa-objdump --file-headers foo.o
Generating Output as Source Code
dpacc supports a --src-output option that generates the output as host-side C source code. This can be compiled using a standard host compiler (e.g., gcc) to produce the same output that dpacc would normally emit directly.
The following are examples for each output type.
DPA Program Source
Generate DPA Program source:
dpacc input.c -hostcc=gcc -mcpu=nv-dpa-bf3 -o libfoo.c --src-output
Compile the generated source into an object file and archive it:
gcc libfoo.c -c -I /opt/mellanox/flexio/include \
-Wno-attributes -Wno-pedantic -Wno-unused-parameter -Wno-return-type -Wno-implicit-function-declaration \
-D__DPACC_SRC_TARGET__
ar cr libfoo.a libfoo.o
Define __DPACC_SRC_TARGET__ to exclude unnecessary code when compiling from source.
DPA Library Source
Generate DPA Library source:
dpacc input.c -hostcc=gcc -mcpu=nv-dpa-bf3 -o libfoo --gen-libs --src-output
This produces:
libfoo_device.a – device archive
libfoo.lib.c, input.dpa.c – host source files
Compile and archive the host library:
gcc libfoo.lib.c input.dpa.c -c -I /opt/mellanox/flexio/include \
-Wno-attributes -Wno-pedantic -Wno-unused-parameter -Wno-return-type -Wno-implicit-function-declaration \
-D__DPACC_SRC_TARGET__
ar cr libfoo_host.a libfoo.lib.o input.dpa.o
DPA Object Source
Generate DPA Object source:
dpacc input.c -hostcc=gcc -mcpu=nv-dpa-bf3 -c --src-output
This creates a single host source file, input.dpa.c. Compile it using:
gcc input.dpa.c -c -I /opt/mellanox/flexio/include \
-Wno-attributes -Wno-pedantic -Wno-unused-parameter -Wno-return-type -Wno-implicit-function-declaration
DPA Compiler Usage
The DPA compiler is an LLVM-based backend used by dpacc to compile and link DPA device code. Users can pass additional options to the underlying compiler and linker using the following dpacc options:
--devicecc-options – pass options to the device compiler
--devicelink-options – pass options to the device linker
To find valid flags that can be passed through these options, refer to:
Directly invoking the compiler, assembler, or linker (outside of dpacc) may result in unexpected errors or undefined behavior.
Linker options must be passed through the compiler driver (dpa-clang), not directly to lld.
Unlike GNU ld, the LLD linker script does not replace the default configuration; it is applied in addition to it. To override default behaviors, additional flags may be required.
Enabling optional C library extensions using the __STDC_WANT_LIB_EXT1__ macro is not supported in the DPA standard library.
dpacc-extract Command Line Options
dpacc-extract is a tool for extracting a device executable out of a DPA program or a host executable containing DPA program(s).
To execute dpacc-extract:
Usage: dpacc-extract <input-file> -o=<output-file> [other options]
Helper Flags:
-o, --output-file Specify name of the output file
-app-name, --app-name <name> Specify name of the DPA application to extract
-mcpu, --mcpu <target> Specify name of the device for which the application is to be extracted
-h, --help Print help information about dpacc-extract
-V, --version Print dpacc-extract version information
-optf, --options-file <file>,... Include command line options from the specified file
Mandatory arguments:
Flag
Description
Input file
DPA program or host executable containing DPA program. Specifying one input file is mandatory.
Specify the name and location of the output device executable.
Specify the name of the DPA application to extract. Mandatory if input file has multiple DPA apps.
Specify the name of the device for which the application is to be extracted. Mandatory if there are multiple target variants for an app.
Objdump Command Line Options
The dpa-objdump utility prints the contents of object files and final linked images named on the command line.
For more information, please refer to the Objdump command line reference.
Archiver Command Line Options
dpa-ar is a Unix ar-compatible archiver.
For more information, please refer to the Archiver command line reference.
NM Tool Command Line Options
The dpa-nm utility lists the names of symbols from object files and archives.
For more information, please refer to the NM tool command line reference.
Miscellaneous Notes
Object files produced by LLD are not compatible with those generated by other linkers (e.g., GNU ld). Mixing them may result in linker errors or runtime issues.
Ensure that your host application includes at least one reference to the device entry-point function defined in the DPA Program. Otherwise, the host linker may silently discard the DPA Program during linking, treating it as unused code.
Changes in DPACC 1.10.0
New Features
Support for fatbinaries where DPACC accepts multiple targets through the -mcpu option
Enforced linking policy where only compatible objects can be linked together
dpa-objdump infers the target automatically without the need to explicitly specify the -mcpu option
Set FLEXIO_VER_USED in generated host stubs
Support for new builtin __dpa_thread_l1_flush on the nv-dpa-cx8 target
Limitations
DPACC generates a warning about unknown target when building a DPA Library from DPA Objects produced by v1.9.0 or older