DOCA Argus Service Guide
NVIDIA's BlueField DPUs offer a cutting-edge technology for live machine introspection at the hardware level called DOCA Argus. This technology analyzes specific snippets of volatile memory directly, providing attested insights into the operation of various workloads, whether they are bare-metal, virtualized, or containerized. Volatile memory is considered the ground truth for understanding workload operations. Privacy is a fundamental design requirement, ensuring that no user data is accessed. This unique security technology is available as part of the DOCA software framework, both as an SDK and as a service.
DOCA Argus Service for Workload Threat Detection is a novel approach for container threat detection in AI workloads and microservices, utilizing a BlueField DPU to perform live machine introspection at the hardware level. This approach analyzes specific snippets of volatile memory to provide real-time visibility into container activity and behavior at the network, host, and application levels.
The state of container node images is continuously monitored in real time, checking for deviations from their secure, compliant versions and configurations to detect and stop runtime attacks. These insights also include the ability to identify attacks targeting network-facing applications and services.
The Argus service provides events and data on any object on the OS (host/VM) without any configuration needed and without any active part from the user or the host.
Examples of what the Argus service provides:
New processes, each with its PID, name, attributes, and status.
Reverse shells with process and network connection details, such as source and destination IP and number of transferred bytes.
SHA256 hashes of running executables and loaded libraries.
The DOCA Argus service can only operate on DPU targets (NVIDIA BlueField-2 and later models).
The service must run with the DPU configured in DPU mode (as described in BlueField Modes of Operation).
A firmware version of 24.35.0388 or later is required for the service to function properly.
Supported BlueField image versions are 4.11.0 or later.
The DOCA Argus service container must be set to privileged mode to enable DMA reads across the entire host system.
The Argus service has been tested exclusively on KVM hypervisors.
Currently, only Linux OS is supported on both bare-metal and VMs, with Windows support planned for future releases.
Kata containers are only supported when NVIDIA-DPU support is activated.
The service currently supports only x86 architectures in 64-bit mode, with AARCH64 support planned for future releases.
Only 4-level paging is supported at present, though 5-level paging support is in development. For a workaround regarding 5-level paging, refer to Section Prerequisites (4.3).
Configure the NVIDIA BlueField networking platform's (DPU or SuperNIC) firmware.
On BlueField, configure the PF base address register. Replace <mst_device> with mt41686_pciconf0 for BF2 and mt41692_pciconf0 for BF3.
dpu> mlxconfig -d /dev/mst/<mst_device> s PF_BAR2_SIZE=2 PF_BAR2_ENABLE=1
If working with VFs, configure NVME emulation, SR-IOV, and the number of VFs:
dpu> mlxconfig -d /dev/mst/<mst_device> s NVME_EMULATION_ENABLE=1 SRIOV_EN=1 NUM_OF_VFS=<vf-number>
Perform a graceful shutdown and a cold boot from the host. You can combine this step with the reboots in steps 2.d/3.d.
Verify the configurations using the following command after the cold boot:
dpu> mlxconfig -d /dev/mst/<mst_device> q | grep -E "NVME|BAR|SRIOV|NUM_OF_VFS"
Perform IOMMU passthrough. This stage is only necessary if IOMMU is not enabled by default (for example, when the host is using an AMD CPU).
Skip this step if you are not sure whether it is necessary. Return to it only if DMA fails with a message similar to the following in dmesg:
host> dmesg
[ 3839.822897] mlx5_core 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0047 address=0x2a0aff8 flags=0x0000]
Locate your OS's grub file (most likely /boot/grub/grub.conf, /boot/grub2/grub.cfg, or /etc/default/grub) and open it for editing. Run the following command:
host> vim /etc/default/grub
Search for the line defining GRUB_CMDLINE_LINUX_DEFAULT and set the IOMMU flags according to your CPU vendor. For example:
GRUB_CMDLINE_LINUX_DEFAULT="iommu=pt <intel/amd>_iommu=on"
Update the GRUB configuration:
For Ubuntu:
host> sudo update-grub
For CentOS:
host> grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot the system to apply the changes.
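The decision in this step can be sketched in Python. This is a minimal illustration, not part of DOCA: the function names are hypothetical, the fault pattern is taken from the single dmesg example above, and the vendor strings are the standard vendor_id values reported in /proc/cpuinfo.

```python
import re

# Pattern taken from the AMD-Vi fault line in the dmesg example above.
IO_PAGE_FAULT_RE = re.compile(r"AMD-Vi: Event logged \[IO_PAGE_FAULT\b")

# Standard vendor_id strings from /proc/cpuinfo, mapped to the GRUB flag prefix.
VENDOR_PREFIX = {"GenuineIntel": "intel", "AuthenticAMD": "amd"}


def has_io_page_fault(dmesg_text: str) -> bool:
    """Return True if any dmesg line reports an AMD-Vi IO_PAGE_FAULT."""
    return any(IO_PAGE_FAULT_RE.search(line) for line in dmesg_text.splitlines())


def iommu_flags(cpuinfo_text: str) -> str:
    """Build the IOMMU kernel flags for the CPU vendor found in cpuinfo text."""
    match = re.search(r"^vendor_id\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("vendor_id not found in cpuinfo text")
    prefix = VENDOR_PREFIX.get(match.group(1))
    if prefix is None:
        raise ValueError(f"unsupported CPU vendor: {match.group(1)}")
    return f"iommu=pt {prefix}_iommu=on"
```

On the host, the inputs would typically come from the output of dmesg and the contents of /proc/cpuinfo. The returned string is the value to place inside GRUB_CMDLINE_LINUX_DEFAULT.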
Argus supports 4-level virtual memory layers. Therefore, it is necessary to check if the target system uses a 5-level virtual memory layer and if so, adjust the configuration:
Check the virtual memory layer:
grep la57 /proc/cpuinfo
If the la57 flag appears in the output, the target system is using a 5-level virtual memory layer and the 5-level paging must be deactivated.
flags : ... la57 ...
To deactivate 5-level paging, modify the kernel parameters using GRUB. Add the following flag to your GRUB configuration:
GRUB_CMDLINE_LINUX_DEFAULT="no5lvl"
Update GRUB configuration:
For Ubuntu:
host> sudo update-grub
For CentOS:
host> grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot the system to apply the changes.
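The check and the GRUB edit above can be sketched in Python. This is an illustrative helper under stated assumptions, not a DOCA tool: both function names are hypothetical, and the GRUB line is assumed to have the double-quoted form shown in this guide. Appending the flag rather than replacing the whole value keeps any flags that are already set, such as the IOMMU flags.

```python
import re


def uses_la57(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in the cpuinfo text lists la57."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "la57" in value.split():
            return True
    return False


def add_kernel_flag(grub_line: str, flag: str) -> str:
    """Append a flag to a GRUB_CMDLINE_LINUX_DEFAULT="..." line, keeping existing flags."""
    match = re.fullmatch(r'GRUB_CMDLINE_LINUX_DEFAULT="(.*)"\s*', grub_line)
    if match is None:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")
    flags = match.group(1).split()
    if flag not in flags:  # avoid adding the same flag twice
        flags.append(flag)
    return 'GRUB_CMDLINE_LINUX_DEFAULT="' + " ".join(flags) + '"'
```

For example, passing the IOMMU line from the previous step together with "no5lvl" yields a single line that carries both sets of flags.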
Prepare the target. The service should automatically detect the target system config files. If not, you can manually set the targets:
Download target system (host/VM) symbols.
For Ubuntu:
host> sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-proposed main restricted universe multiverse
EOF
host> sudo apt install ubuntu-dbgsym-keyring
host> sudo apt-get update
host> sudo apt-get install linux-image-$(uname -r)-dbgsym
For CentOS/RHEL:
host> yum install --enablerepo=base-debuginfo kernel-devel-$(uname -r) kernel-debuginfo-$(uname -r) kernel-debuginfo-common-$(uname -m)-$(uname -r)
Install DOCA on the target system, or copy the doca_apsh_config.py script from the DOCA tools directory.
Create the JSON files. Run the following commands using Python 3.9:
target-system> cd /opt/mellanox/doca/tools/
target-system> pip3 install psutil pdbparse
target-system> python3 doca_apsh_config.py --files memregions symbols --os <windows/linux> --path <path to dwarf2json executable>
target-system> cp /opt/mellanox/doca/tools/*.* <shared-folder-with-baremetal>
dpu> scp <shared-folder-with-baremetal>/* <path-to-app-shield-binary>
If the target system does not have DOCA installed, you can copy the script from BlueField.
dwarf2json is required, but not provided with DOCA. You can download the latest release from GitHub.
If the kernel receives an update, you may need to rerun this step.
For information about deploying DOCA containers on top of the BlueField DPU, refer to the NVIDIA DOCA Container Deployment Guide.
Service-specific configurations and deployment instructions can be found under the service's container page.
The DOCA Argus Service can also be deployed on DPUs that are not connected to the Internet. For instructions, refer to the relevant section in the NVIDIA DOCA Container Deployment Guide.
The Argus service inspects the memory of an AI node's operating system, whether bare-metal or a virtual machine, including NIMs, AI container workloads, and microservices. This inspection is performed from within the BlueField DPU using DOCA DMA. The service then repeatedly uses the Memory Query Engine with DOCA Argus Libraries to decode the memory into OS objects such as processes, threads, and files.
Once the memory is decoded, the Argus service applies Behavioral Profiles and Indicators to detect any alerts (refer to the list of alerts below). Additionally, whenever there is a change on the host, the Argus service triggers Situational Awareness and reports these changes using JSON-based FluentBit Telemetry to send events (refer to the list of alerts below). All the collected data can be transmitted to a Security Management or Data Lake system to trigger an automated response.
Furthermore, the Argus service can integrate with remote data processing logics, both in-house (NVIDIA) and third-party, to enhance the data and verify threats.

Argus is configurable via the SERVICE_CONFIG_FILE section in the container's YAML file. Edit the configuration according to your deployment.
Service
Immediate Shutdown
Do not wait for the service to shut down gracefully; close it immediately upon a SIGINT/SIGTERM signal.
Service Log Level
Configures DOCA Argus service log verbosity, based on DOCA's logging levels. The default is 50 (INFO). Options include:
10=DISABLE
20=CRITICAL
30=ERROR
40=WARNING
50=INFO
60=DEBUG
70=TRACE
System Scanner Sleep Time
Configures the sleep duration between system scans (s=seconds, m=minutes, ms=milliseconds)
DOCA Argus Configuration
Auto Scan
Enable scanning of all systems that can be monitored, in addition to the systems that are configured in the "systems" section (see below). All auto-scanned systems will use the default configurations. Auto Scan mode is the default if the systems section is left empty.
Default
Default system configurations that are used unless overwritten in the "systems" section. For the list of possible default configurations, refer to the "Per-System Configurations" section below.
Systems
List of user-defined systems (host/VM) that will be scanned while overwriting some of their configurations. For a list of default configurations, refer to the "Per-System Configurations" section. The "Representor ID" and "DMA Device Name" configurations must be overwritten.
Per-System Configurations
Representor ID
Representor ID of the VF/PF that the service needs to track. Currently, only VU IDs are accepted.
For PF, find the VU ID by running the following on the bare-metal host:
host> lspci -vv -s <PF_pci_address> | grep VU | cut -d " " -f 4
To list PCI addresses of available PFs, run:
host> lspci | grep "Ethernet controller: Mellanox Technologies"
Ensure that you run the lspci command on the host and not on BF. Running the lspci command on BF will cause an addition of "EC" in the middle of the VU ID string. For example, MT2333XZ06YAMLNXS0D0F0 on the host will appear as MT2333XZ06YAECMLNXS0D0F0 on the DPU.
For VF, take the VU ID of the PF that this VF is attached to, and add the suffix "VF<x>", where <x> is the number of the VF. For example, "MT2333XZ06YAMLNXS0D0F0VF1" is the VU ID of VF number 1.
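The VU ID rules above can be sketched in Python. This is a minimal illustration with hypothetical function names. It assumes that the extra "EC" always sits directly before "MLNX", which is inferred from the single example above and may not hold for every device.

```python
def host_vuid_from_dpu(dpu_vuid: str) -> str:
    """Drop the 'EC' that appears before 'MLNX' when lspci is run on the DPU.

    Assumption: the inserted 'EC' always directly precedes 'MLNX'.
    """
    return dpu_vuid.replace("ECMLNX", "MLNX", 1)


def vf_vuid(pf_vuid: str, vf_number: int) -> str:
    """Build the VU ID of a VF from the VU ID of the PF it is attached to."""
    if vf_number < 0:
        raise ValueError("VF number must be non-negative")
    return f"{pf_vuid}VF{vf_number}"
```

With the example values from this section, the DPU string MT2333XZ06YAECMLNXS0D0F0 maps back to MT2333XZ06YAMLNXS0D0F0, and VF number 1 of that PF becomes MT2333XZ06YAMLNXS0D0F0VF1.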
Memory Regions Path
Path to a JSON file that contains the memory regions of the host OS, excluding devices. For more information, refer to doca_apsh_system in the DOCA App Shield Programming Guide.
OS Symbol Path
Path to the OS symbol manifest, which can be a single JSON file or a folder containing multiple JSON files. For more information, refer to doca_apsh_system in the DOCA App Shield Programming Guide.
OS Type
Specifies the type of operating system (Linux or Windows).
DMA Device Name
The name of the DMA (Direct Memory Access) device to connect to, which is related to the representor ID. To list available devices, run the following command on the DPU:
DPU> ibv_devinfo | grep 'hca_id' | awk '{print $2}'
Make sure to match the device with the VU ID of the relevant system. Typically, the last number in the VU ID string matches the number of the device. For example, MT2333XZ06YAMLNXS0D0F0 would match mlx5_0.
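The matching heuristic above can be sketched in Python. The function name is hypothetical, and the rule is only the "typically" case described in this section, so the result should be checked against the ibv_devinfo output. VF VU IDs are rejected because this section describes the heuristic for the PF case only.

```python
import re


def guess_dma_device(pf_vuid: str) -> str:
    """Guess the mlx5 device name from the last number in a PF VU ID.

    Heuristic only: the guide says the numbers 'typically' match.
    """
    if re.search(r"VF\d+$", pf_vuid):
        raise ValueError("expected a PF VU ID without a VF suffix")
    match = re.search(r"(\d+)$", pf_vuid)
    if match is None:
        raise ValueError("VU ID does not end with a number")
    return f"mlx5_{int(match.group(1))}"
```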
Service Log Level
Configures and overrides service log verbosity, based on DOCA's logging levels. The default is 50 (INFO). Options include:
10=DISABLE
20=CRITICAL
30=ERROR
40=WARNING
50=INFO
60=DEBUG
70=TRACE
SDK Log Level
Configures SDK log verbosity, based on DOCA's logging levels. The default is 50 (INFO). Options include:
10=DISABLE
20=CRITICAL
30=ERROR
40=WARNING
50=INFO
60=DEBUG
70=TRACE
Limits
String Length
Limits the length of strings (for example, process command names) that can be tracked to avoid excessive resource usage.
Process
Limits the number of processes that can be tracked to prevent endless or long loops.
File Handles
Limits the number of file descriptors (for example, files opened by processes, connections) that can be tracked.
Threads
Limits the number of threads that can be tracked per process.
Process Memory
Limits the number of memory areas (VMAs) that can be tracked per process.
Events
Container Filter
Filters for activities within the container. Non-containerized processes are not filtered.
SBOM
List SHA signatures of authorized executables and loaded libraries. Each signature is a separate line. You can optionally include the size of each file, separated by a comma (for example, <SHA>, <size>).
Containers: specifies signatures for containerized processes.
Non-containers: specifies signatures for non-containerized processes.
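The SBOM line format above can be sketched in Python. This is an illustration with hypothetical helper names. It assumes the signature is a hexadecimal SHA256 digest, which matches the hashes the service reports elsewhere in this guide but is not stated explicitly for the SBOM list.

```python
import hashlib
from typing import Optional, Tuple


def sbom_entry(data: bytes, with_size: bool = True) -> str:
    """Format one SBOM line as '<SHA>, <size>' (or just '<SHA>') for file contents.

    Assumption: <SHA> is the hexadecimal SHA256 digest of the file.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest}, {len(data)}" if with_size else digest


def parse_sbom_line(line: str) -> Tuple[str, Optional[int]]:
    """Split an SBOM line into (signature, size); size is None when omitted."""
    sha, sep, size = line.partition(",")
    sha = sha.strip()
    if not sha:
        raise ValueError("empty signature")
    return sha, (int(size.strip()) if sep else None)
```

To produce an entry for a real file, read it in binary mode and pass the bytes to sbom_entry.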
Collection
Events
Flag to enable each event.
Output
Log Events to stdout
Logs events to the standard output (stdout).
Log Folder Path
Path to the folder where logs are stored. Set to false to turn off logging for a folder.
Log Threshold Size
Sets the size limit for log rotation. When the log file reaches this size, it will be rotated.
Log Max Files Count
Sets the maximum retention number for log files. Older log files will be deleted when this limit is reached.
Telemetry Address
Address of aggregator for telemetry records. Set to false to turn off telemetry logging.
Telemetry Tag
Tag that will be added to each telemetry record for Fluent Bit integration.
Telemetry Format
Format in which the telemetry record will be sent. Options include JSON and syslog.
Telemetry User Data
User-configured data that is added to each record.
Standard Output
Only important logs will be displayed, for example version, successful starts, and failures.
Debug Log Output
A complete log output for debugging, including events (partial data), trace logs, collection failures, etc. It is located in the /var/log/doca_argus/ directory.
Event Log Output
A complete event log is stored in JSON format in the log folder path (config file). For users who prefer local logs, the log is rotated as specified with Linux logrotate. You can overwrite the logrotate configuration in /etc/cron.d/logrotate and /etc/logrotate.d/argus.
Telemetry Output
The Argus service provides telemetry records in JSON and syslog formats. It has been tested with Fluent Bit integration, which should be run independently. Fluent Bit is integrated with the Argus service telemetry system to handle telemetry data exports. This integration ensures that logs and metrics are efficiently collected and forwarded to the proper destinations for analysis and monitoring. For the integration, a local node of Fluent Bit was run on the DPU with the Argus service, using the following input section in the Fluent Bit configuration:
[INPUT]
    Name    tcp
    Tag     <your preferred tag>
    Listen  0.0.0.0
    Port    24224
    Format  json
The Tag should correlate to telemetry_tag in the service config. By default, telemetry is disabled in the service. To enable telemetry, specify an address in the service config.
If you are using Splunk, add the following encapsulation filter to the Fluent Bit config file:
[FILTER]
    Name        nest
    Match       *
    Operation   nest
    Wildcard    *
    Nest_under  event
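The effect of this filter on a record can be sketched in Python. This is only an illustration of the resulting shape, not Fluent Bit code, and the function name is hypothetical. With Wildcard set to * every top-level key matches, so the whole record ends up under the event key.

```python
def nest_under_event(record: dict) -> dict:
    """Mimic the nest filter above: move every top-level key under 'event'."""
    return {"event": dict(record)}


# A shortened record using field names from the telemetry schema in this guide.
original = {"vendor_name": "NVIDIA", "product_name": "DOCA_ARGUS", "severity": "INFO"}
nested = nest_under_event(original)
```

After the call, nested holds a single top-level key, event, whose value is the original record.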
Fluent Bit is flexible and can be run in many ways. The following example displays a basic Fluent Bit configuration that integrates with Elasticsearch, followed by the container's run command:
[INPUT]
    Name    tcp
    Tag     elastic_forward_input
    Listen  0.0.0.0
    Port    24224
    Format  json
[SERVICE]
    Log_Level  info
[OUTPUT]
    Name                es
    Match               *
    Host                <elastic search IP>
    Port                <elastic search port>
    Index               argus
    Suppress_Type_Name  On
    Log_Level           info
docker run --rm --net=host -v <path_to_fluentbit_conf_file>:/fluent-bit/etc/fluent-bit.conf --name fluent_bit -it fluent/fluent-bit
Refer to the Fluent Bit manual for additional outputs.
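| Parameter | Data Type | Parent Object | Description |
| vendor_name | enum | | NVIDIA |
| product_name | enum | | DOCA_ARGUS |
| product_version | string | | |
| message_type | enum | | Can be either EVENT, ALERT, or SYSTEM_ACTIVITY |
| severity | enum | | The severity of the event/alert/system activity |
| schema_version | string | | As the schema may evolve over time, this allows the message to describe which format version it is using |
| message_id | string | | Uniquely identifies a message and allows updating it (when supporting additional functionality) |
| occurred_message_timestamp_utc_ms | integer | | The timestamp of when the message occurred, in UTC milliseconds |
| occurred_message_display_time_local_rfc3339 | string | | The local display time of when the message occurred, in RFC3339 format |
| occurred_message_display_time_utc_rfc3339 | string | | The UTC display time of when the message occurred, in RFC3339 format |
| user_data | string | | Configured user data |
| bluefield_system_information | | | Information about the BlueField system |
| bluefield_networking_interfaces | array | bluefield_system_information | A list of all BlueField configured interfaces, their names, IP addresses, and MAC addresses |
| bluefield_network_interface_name | string | bluefield_networking_interfaces | The interface name |
| bluefield_network_interface_mac_address | string | bluefield_networking_interfaces | The interface MAC address |
| bluefield_network_interface_ip_address | string | bluefield_networking_interfaces | The interface IP address |
| workload_information | | | Information about the system |
| unique_identifier | string | workload_information | Unique ID of the target system: the system name from the systems section of the configuration, or the VUID of the system for an auto-scanned system |
| os_version | string | workload_information | The OS of the workload: "Linux Kernel x.y" or "Microsoft Windows major.minor.build" |
| workload_networking_interfaces | array | workload_information | A list of all workload interfaces, their names, IP addresses, and MAC addresses |
| network_interface_name | string | workload_networking_interfaces | The interface name |
| network_interface_mac_address | string | workload_networking_interfaces | The interface MAC address |
| network_interface_ip_address | string | workload_networking_interfaces | The interface IP address |
| activity_data | | | Details about the activity reported |
| name | string | activity_data | The name of the alert/event/system activity |
| <activity>_details | object | activity_data | Detailed information about the collector that triggered the event or alert |
| <parent_activity>_details | object | activity_data | Detailed information about parent activities that triggered the collection of the current activity |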
The following example is a JSON message that describes the data that is produced for each event and alert:
{
  "vendor_name": "NVIDIA",
  "product_name": "DOCA_ARGUS",
  "product_version": "<version>",
  "message_type": "<EVENT | ALERT | SYSTEM_ACTIVITY>",
  "severity": "<INFO | ERROR | WARNING | MEDIUM | HIGH | CRITICAL>",
  "schema_version": "1.0",
  "message_id": "<unique_message_id>",
  "occurred_message_timestamp_utc_ms": "14367294690321",
  "occurred_message_display_time_local_rfc3339": "2025-04-10T16:50:03.836+00:00",
  "occurred_message_display_time_utc_rfc3339": "2025-04-10T16:50:03.836Z",
  "user_data": "NONE",
  "bluefield_system_information": {
    "bluefield_networking_interfaces": {
      "0": {
        "bluefield_network_interface_name": "<>",
        "bluefield_network_interface_mac_address": "<>",
        "bluefield_network_interface_ip_address": "<>"
      },
      "..."
    }
  },
  "workload_information": {
    "unique_identifier": "<>",
    "os_version": "<>",
    "workload_networking_interfaces": {
      "0": {
        "network_interface_name": "<>",
        "network_interface_ip_address": "<>",
        "network_interface_mac_address": "<>"
      },
      "..."
    }
  },
  "activity_data": {
    "name": "<the name of the event | alert | system_activity>",
    "<activity>_details": {
      "..."
    },
    "<parent_activity>_details": {
      "..."
    }
  }
}
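A consumer of these messages might reduce each one to a short summary, as in the Python sketch below. The function name and the choice of required keys are assumptions made for illustration; the field names come from the schema above. The timestamp is accepted as either an integer or a numeric string, since the table lists it as an integer while the example message quotes it.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: these keys are treated as mandatory for this illustration only.
REQUIRED_KEYS = ("vendor_name", "product_name", "message_type",
                 "severity", "schema_version", "message_id", "activity_data")

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_message(raw: str) -> dict:
    """Parse one telemetry message and return a short summary of it."""
    msg = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in msg]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    summary = {
        "type": msg["message_type"],
        "severity": msg["severity"],
        "name": msg["activity_data"]["name"],
    }
    timestamp_ms = msg.get("occurred_message_timestamp_utc_ms")
    if timestamp_ms is not None:
        # Integer milliseconds since the Unix epoch, converted without float rounding.
        occurred = EPOCH_UTC + timedelta(milliseconds=int(timestamp_ms))
        summary["time_utc"] = occurred.isoformat()
    return summary
```

A summary like this is convenient for routing alerts by severity before forwarding the full record to a data lake.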
Processes
Comm: The command name of the process.
PID: The unique process identifier.
Self Exec ID: Identifier for the process's own execution.
Process SHA256: SHA256 hash of the process's executable file.
Process SHA1: SHA1 hash of the process's executable file.
Process MD5: MD5 hash of the process's executable file.
File Size: Size of the process's executable file.
Folder Path: Path to the folder containing the process's executable.
Cmd Line Args: Command line arguments used to start the process.
Creation Time: Time when the process was created.
PPID: Parent process identifier.
UID: User identifier of the process owner.
GID: Group identifier of the process owner.
State: Current state of the process.
CPU Cycles: Number of CPU cycles consumed by the process.
Container ID: Identifier of the container running the process.
PID Namespace: Namespace for process identifiers.
MNT Namespace: Namespace for mount points.
NET Namespace: Namespace for network resources.
Thread
Thread ID: The unique thread identifier.
Self Exec ID: Identifier for the thread's own execution.
Exit State: The thread's exit state.
MM virtual address: Pointer to the thread's MM structure.
File Handles
PID: The associated process's identifier.
File Descriptor: The file descriptor identifier.
Network Connections
FD: File descriptor associated with the socket.
State: Current state of the network connection.
TCP Creation Time: Time when the TCP connection was created.
Protocol: Network protocol used (e.g., TCP, UDP).
Src IP: Source IP address.
Src Port: Source port number.
Dst IP: Destination IP address.
Dst Port: Destination port number.
TCP Bytes In: Number of bytes received.
TCP Bytes Out: Number of bytes sent.
TCP Segments In: Number of TCP segments received.
TCP Segments Out: Number of TCP segments sent.
Interface Name: Name of the network interface.
Interface MAC: List of MAC addresses of the network interface.
Interface IP: List of IP addresses of the network interface.
Average In Packet Size: Average size of incoming packets.
Average Out Packet Size: Average size of outgoing packets.
Process Memory
PID: The associated process's identifier.
VM Start: Start address of the virtual memory area.
VM End: End address of the virtual memory area.
VM Next: Pointer to the next virtual memory area structure.
VM Prev: Pointer to the previous virtual memory area structure.
VM Protection: Protection associated with the virtual memory area.
Anon VMA address: Pointer to the anonymous virtual memory area.
VM File virtual address: Pointer to the File structure associated with the virtual memory area.
Is Executable: Whether the VMA is the main process executable.
File Path: Path to the file associated with the virtual memory area.
Loaded Executables and Libraries (Attestation)
Inode Num: Inode number of the ELF file.
ELF Name: Name of the ELF file.
ELF Type: Type of the ELF file.
ELF Full Path: Full path to the ELF file.
ELF SHA256: SHA256 hash of the ELF file.
ELF SHA1: SHA1 hash of the ELF file.
ELF MD5: MD5 hash of the ELF file.
ELF Size: Size of the ELF file.
Is Executable: Whether the file is the main process's executable.
Container
Container Created
Enables detection of new containers (for example, Docker containers).
Container Terminated
Enables detection of containers that have terminated or are no longer visible.
Process
Process Created
Enables detection of new processes.
Process Terminated
Enables detection of processes that have terminated or are no longer visible.
Process Zombie
Enables detection of processes in a zombie state.
Process Hidden
Enables detection of processes in a hidden state.
File Handle
File Handle Created
Enables detection of new file descriptors (for example, files opened by processes).
File Handle Terminated
Enables detection of file descriptors that have terminated or are no longer visible.
Network Connection
Network Connection Created
Enables detection of new network connections.
Network Connection Terminated
Enables detection of network connections that have terminated or are no longer visible.
TCP Connection Excessive Data
Enables detection of network connections with excessive data usage.
TCP Connection Excessive Data In Limit
Sets the limit for excessive data usage for incoming data in bytes (K=kilo, M=mega, G=giga).
TCP Connection Excessive Data Out Limit
Sets the limit for excessive data usage for outgoing data in bytes (K=kilo, M=mega, G=giga).
TCP Long Lasting Connection
Enables detection of network connections that last longer than a specified limit.
TCP Long Lasting Connection Limit
Sets the limit for the duration of long-lasting connections (s=seconds, m=minutes, ms=milliseconds, default is ms).
TCP Network Connections State Change
Enables detection of changes in the state of network connections (for example, SYNSENT, SYNRECV).
Reverse Shell Detected
Enables detection of reverse shells (for example, remote bash consoles).
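The unit suffixes used by the limit settings above can be sketched in Python. The exact value syntax (for example "10M" or "500ms") and the decimal meaning of K, M, and G are assumptions made for illustration; this section does not state whether kilo means 1000 or 1024. The function names are hypothetical.

```python
import re

# Assumption: decimal multipliers; the guide only says K=kilo, M=mega, G=giga.
SIZE_FACTORS = {"": 1, "K": 1000, "M": 1000 ** 2, "G": 1000 ** 3}
DURATION_FACTORS_MS = {"ms": 1, "s": 1000, "m": 60 * 1000}


def parse_size_bytes(text: str) -> int:
    """Convert a size limit such as '10M' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", text)
    if match is None:
        raise ValueError(f"invalid size limit: {text!r}")
    return int(match.group(1)) * SIZE_FACTORS[match.group(2)]


def parse_duration_ms(text: str) -> int:
    """Convert a duration limit such as '5m' to milliseconds (default unit: ms)."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m)?\s*", text)
    if match is None:
        raise ValueError(f"invalid duration limit: {text!r}")
    return int(match.group(1)) * DURATION_FACTORS_MS[match.group(2) or "ms"]


def is_excessive(tcp_bytes: int, limit: str) -> bool:
    """Return True when a TCP byte counter exceeds the configured limit."""
    return tcp_bytes > parse_size_bytes(limit)
```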
Process Memory (VMA)
Process Memory Created
Enables detection of new virtual memory areas (for example, heap, stack, executables).
Process Memory Terminated
Enables detection of virtual memory areas that have terminated or are no longer visible.
New Executable Anonymous Memory Mapped
Enables alerts on mapping executable anonymous memory areas.
File Unmapped
Enables alerts on un-mapping of file backed executable areas.
Executable Permissions Added
Enables alerts on enablement of executable permission for a memory area.
Executable Permissions Removed
Enables alerts on disablement of executable permission for a memory area.
Loaded Executables and Libraries (Attestation)
New File Mapped
Enables detection of new ELF files loaded by processes (for example, executables, libraries).
Foreign Binary Executed
Enables detection of executable programs that are not part of the Software Bill of Materials (SBOM).
Foreign Binary Loaded
Enables detection of loaded libraries that are not part of the Software Bill of Materials (SBOM).
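The SBOM comparison behind the two foreign-binary events can be sketched in Python. This is a conceptual illustration, not the service's implementation: the function name and the record keys (elf_sha256, elf_full_path, is_executable) are hypothetical stand-ins for the ELF SHA256, ELF Full Path, and Is Executable fields listed earlier.

```python
from typing import Dict, Iterable, List, Set, Tuple


def classify_foreign(loaded: Iterable[Dict], authorized: Set[str]) -> List[Tuple[str, str]]:
    """Return (event name, path) pairs for ELF files whose hash is not in the SBOM."""
    alerts = []
    for elf in loaded:
        if elf["elf_sha256"] in authorized:
            continue  # hash is listed in the SBOM, nothing to report
        if elf["is_executable"]:
            event = "Foreign Binary Executed"
        else:
            event = "Foreign Binary Loaded"
        alerts.append((event, elf["elf_full_path"]))
    return alerts
```

The main executable of a process maps to the executed event, and every other mapped ELF file maps to the loaded event.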
Thread
Thread Created
Enables detection of new threads.
Thread Terminated
Enables detection of threads that have terminated or are no longer visible.
DOCA Argus Service Initialization Started
Logged when the DOCA Argus service initialization process starts for a specific system.
DOCA Argus Service Initialization Successful
Logged when the DOCA Argus service initialization process finishes successfully for a specific system.
DOCA Argus Service Initialization Failed
Logged upon Argus service initialization failure for a specific system. When reported, the Argus service does not monitor this system until the next initialization.
DOCA Argus Service Runtime Failure
Logged upon service runtime failure for a specific system. When reported, the Argus service does not monitor this system until the next initialization.
DOCA Argus Service Gracefully Shutdown
Logged upon user requested shutdown; reported for each system.
Details Gathering Failed
Logged upon failure in one of the event engines.
Host Initialization Started
Logged upon start of host detection.
Host Initialization Failed
Logged upon failure of host detection.
Host Initialization Successful
Logged upon success of host detection.
Loading Profile Candidate
Logged upon detection of candidate for system's OS profile.
Profile Parsing Failed
Logged upon failure to parse a specific OS profile configuration.
Profile Verification Failed
Logged upon failure to verify initialization with a specific profile candidate; the service continues to the next profile candidate.
Profile Verification Successful
Logged upon successful verification of initialization with a specific profile candidate.
OS Identifier Found
Logged when host OS version is found.
Unable To Determine Target OS
Logged upon failure in automatic OS detection.
No Matching Profile Found
Logged upon failure to find matching OS profile.