DOCA Argus Service Guide
NVIDIA® BlueField® DPUs introduce a cutting-edge security technology for live machine introspection at the hardware level, known as DOCA Argus. Argus enables real-time analysis of selected volatile memory regions, providing attested insights into the runtime behavior of workloads—whether running on bare metal, in virtual machines, or in containers. Because it inspects volatile memory, Argus delivers a ground-truth view of workload execution. Privacy is a foundational design principle: Argus does not access or expose user data.
Available as part of the DOCA software framework (both as an SDK and as a service), DOCA Argus Service for Workload Threat Detection provides a novel approach for detecting threats in AI workloads and microservices. Leveraging the BlueField DPU, it inspects system memory to expose container behavior at the network, host, and application layers in real time.
The service continuously monitors the container node image state for deviations from secure and compliant baselines to detect and prevent runtime attacks, including those targeting network-facing services.
The Argus service operates transparently without requiring host-side configuration. It passively inspects the system to report OS-level objects such as:
Newly spawned processes (PID, name, attributes, status)
Reverse shells, including process and network metadata (source/destination IPs, data transfer size)
SHA256 hashes of running executables and loaded libraries
Hardware – NVIDIA BlueField-2 or later
Operating mode – DPU must be in DPU mode (see BlueField Modes of Operation)
Firmware version – 24.35.0388 or later
Supported BlueField image – Version 4.11.0 or later
Container mode – The Argus service container must run in privileged mode to enable DMA reads across the host system
Only tested with KVM hypervisors
Linux-only support (for bare-metal and VMs); Windows support is planned
Kata containers supported only when NVIDIA-DPU support is enabled
Only x86_64 architecture is currently supported; AArch64 support is planned
Currently supports 4-level paging only (refer to section "Disable 5-Level Paging" for instructions)
Configure BlueField Firmware
Configure PF BAR settings and features on the DPU:
[dpu] mlxconfig -d /dev/mst/<mst_device> s PF_BAR2_SIZE=2 PF_BAR2_ENABLE=1
Replace <mst_device> with:
mt41686_pciconf0 for BlueField-2
mt41692_pciconf0 for BlueField-3
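The device naming above can be captured in a small helper. The following is a minimal sketch, not part of DOCA: the names MST_DEVICES and pf_bar_command are hypothetical, and the function only builds the argument list for the mlxconfig command shown in this section without running it.

```python
from __future__ import annotations

# Hypothetical helper (not part of DOCA): maps a BlueField generation to the
# MST device name listed above and builds the PF BAR mlxconfig command line.
MST_DEVICES = {
    "BlueField-2": "mt41686_pciconf0",
    "BlueField-3": "mt41692_pciconf0",
}


def pf_bar_command(generation: str) -> list[str]:
    """Return the PF BAR mlxconfig command as an argument list."""
    device = MST_DEVICES[generation]  # raises KeyError for an unknown generation
    return [
        "mlxconfig", "-d", f"/dev/mst/{device}",
        "s", "PF_BAR2_SIZE=2", "PF_BAR2_ENABLE=1",
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run on the DPU.
    print(" ".join(pf_bar_command("BlueField-3")))
```

Returning a list rather than a single string keeps the command suitable for subprocess.run without shell quoting concerns.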
If using VFs:
[dpu] mlxconfig -d /dev/mst/<mst_device> s NVME_EMULATION_ENABLE=1 SRIOV_EN=1 NUM_OF_VFS=<vf-number>
Perform a cold reboot and verify:
[dpu] mlxconfig -d /dev/mst/<mst_device> s NVME_EMULATION_ENABLE=1 SRIOV_EN=1 NUM_OF_VFS=<vf-number>
Enable IOMMU Passthrough (Optional)
This step is necessary if DMA fails and the host is using an AMD CPU. Example error:
[dmesg] mlx5_core 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0047 address=0x2a0aff8 flags=0x0000]
Steps:
Edit GRUB configuration:
[host] vim /etc/default/grub
Set IOMMU flags for your CPU:
For Intel:
GRUB_CMDLINE_LINUX_DEFAULT="iommu=pt intel_iommu=on"
For AMD:
GRUB_CMDLINE_LINUX_DEFAULT="iommu=pt amd_iommu=on"
Update GRUB:
Ubuntu:
[host] sudo update-grub
CentOS/RHEL:
[host] grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot the host.
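To decide whether this optional step applies, the kernel log can be scanned for the IOMMU fault shown at the start of this section. The sketch below is an illustration only: the regular expression is modeled on that single example line, so other kernel versions may format the message differently, and the function name find_io_page_faults is hypothetical.

```python
import re

# Pattern modeled on the one example line in this section (assumption: other
# kernels may log the fault in a different layout).
IO_PAGE_FAULT = re.compile(
    r"(?P<device>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): AMD-Vi: "
    r"Event logged \[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+) flags=(?P<flags>0x[0-9a-fA-F]+)\]"
)


def find_io_page_faults(dmesg_text: str) -> list:
    """Return one dict (device, domain, address, flags) per fault found."""
    return [match.groupdict() for match in IO_PAGE_FAULT.finditer(dmesg_text)]


if __name__ == "__main__":
    sample = (
        "mlx5_core 0000:81:00.0: AMD-Vi: Event logged "
        "[IO_PAGE_FAULT domain=0x0047 address=0x2a0aff8 flags=0x0000]"
    )
    for fault in find_io_page_faults(sample):
        print(fault["device"], fault["address"])
```

In practice the input would be the text captured from dmesg on the host; an empty result suggests the passthrough change is not needed for this symptom.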
Disable 5-Level Paging
Check whether the target system uses 5-level paging:
[host] grep la57 /proc/cpuinfo
If la57 is present, disable it:
Edit GRUB:
GRUB_CMDLINE_LINUX_DEFAULT="no5lvl"
Update GRUB:
Ubuntu:
[host] sudo update-grub
CentOS/RHEL:
[host] grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot the host.
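The la57 check from this section can also be scripted, for example as part of a pre-deployment validation. This is a minimal sketch that mirrors the grep command above by looking for the la57 token in the flags lines of /proc/cpuinfo; the function name uses_la57 is hypothetical.

```python
def uses_la57(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of /proc/cpuinfo lists la57."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole tokens so that a longer flag name cannot match by accident.
        if key.strip() == "flags" and "la57" in value.split():
            return True
    return False


if __name__ == "__main__":
    with open("/proc/cpuinfo", encoding="utf-8") as cpuinfo:
        if uses_la57(cpuinfo.read()):
            print("la57 present: add no5lvl to the kernel command line")
        else:
            print("la57 not present: no change needed")
```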
Prepare the Target System
Download kernel symbols:
Ubuntu:
sudo tee /etc/apt/sources.list.d/ddebs.list <<EOF
deb http://ddebs.ubuntu.com/ $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-proposed main restricted universe multiverse
EOF
sudo apt install ubuntu-dbgsym-keyring
sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym
CentOS/RHEL:
yum install --enablerepo=base-debuginfo \
    kernel-devel-$(uname -r) \
    kernel-debuginfo-$(uname -r) \
    kernel-debuginfo-common-$(uname -m)-$(uname -r)
Install DOCA or copy the helper script (doca_apsh_config.py).
Generate JSON files:
[target] cd /opt/mellanox/doca/tools/
[target] pip3 install psutil pdbparse
[target] python3 doca_apsh_config.py --files memregions symbols --os <windows/linux> --path <path-to-dwarf2json>
Copy the output to the DPU:
[dpu] scp <generated-files> <dpu-path>
Note: dwarf2json is required but not included in DOCA. Download the latest release from GitHub. Re-run this step if the kernel is updated.
For general instructions, see the DOCA Container Deployment Guide.
Service-specific deployment details can be found on the Argus service container page.
Air-gapped deployment is supported. See the "Offline Deployment" section in the DOCA Container Deployment Guide.
The Argus service inspects host memory from within the DPU using DOCA DMA. This memory may belong to:
Bare-metal operating systems
Virtual machines
AI containers and microservices
Argus uses the Memory Query Engine and DOCA Argus libraries to decode memory into OS objects (e.g., processes, threads, files).
The decoded data is processed using:
Behavioral Profiles and Indicators – to detect known attack patterns.
Situational Awareness – to identify significant runtime changes.
Events are generated in JSON format and exported using Fluent Bit. They can be sent to a security management system or data lake for correlation and automated response.
The service can also integrate with external threat intelligence and telemetry pipelines (NVIDIA or third-party) for enhanced analysis.

The DOCA Argus service is configurable via the SERVICE_CONFIG_FILE section of the container's YAML file. Adjust the configuration according to your deployment requirements.
Service Settings
Immediate Shutdown
If enabled, the service shuts down immediately upon receiving a SIGINT or SIGTERM signal, without waiting for a graceful termination.
Service Log Level
Controls the verbosity of the Argus service logs. The default level is 50 (INFO). Available values:
10=DISABLE
20=CRITICAL
30=ERROR
40=WARNING
50=INFO (default)
60=DEBUG
70=TRACE
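When generating or validating configuration files from tooling, the numeric levels listed above can be modeled as an enumeration. The sketch below is illustrative: the values come from the list above, while the LogLevel class and the parse_log_level helper are hypothetical names. Accepting level names is a convenience of this sketch; the list above only documents the numeric values.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Numeric log levels as listed in this guide."""
    DISABLE = 10
    CRITICAL = 20
    ERROR = 30
    WARNING = 40
    INFO = 50  # documented default
    DEBUG = 60
    TRACE = 70


def parse_log_level(value) -> LogLevel:
    """Accept a number (50 or "50") or a level name ("info")."""
    if isinstance(value, str) and not value.strip().isdigit():
        return LogLevel[value.strip().upper()]  # KeyError for unknown names
    return LogLevel(int(value))  # ValueError for unknown numbers


if __name__ == "__main__":
    level = parse_log_level("debug")
    print(level.name, int(level))
```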
System Scanner Sleep Time
Sets the sleep interval between consecutive system scans. Units supported: s (seconds), m (minutes), ms (milliseconds).
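For tooling that validates this setting before deployment, the three units can be normalized to milliseconds. This is a minimal sketch under an assumption: the exact value syntax is not given here, so the code assumes an integer immediately followed by the unit (for example 500ms, 5s, 2m). The function name sleep_time_ms is hypothetical.

```python
import re

# Multipliers that convert each supported unit to milliseconds.
_UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60000}

# Assumed syntax: <integer><unit>, e.g. "500ms", "5s", "2m".
_SLEEP_TIME = re.compile(r"(\d+)(ms|s|m)")


def sleep_time_ms(text: str) -> int:
    """Convert a sleep-time string to milliseconds."""
    match = _SLEEP_TIME.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported sleep time: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_TO_MS[unit]


if __name__ == "__main__":
    for value in ("500ms", "5s", "2m"):
        print(value, "->", sleep_time_ms(value), "ms")
```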
DOCA Argus Configuration
Auto Scan
Enables automatic scanning of all detectable systems. These systems will be scanned using the default configuration. If the systems section is empty, Auto Scan mode is enabled by default.
Default
Defines the default system configuration, used unless overridden in the systems section. See section "Per-System Configuration" for available parameters.
Systems
Defines a list of explicitly configured systems (host/VM) that should be scanned with custom settings. The following parameters must be overridden:
Representor ID
DMA Device Name
Per-System Configuration
Each system defined under systems supports the following parameters:
Representor ID
Specifies the ID of the VF/PF to be monitored. Only VU (Virtual Unique) format is supported.
For PFs:
[host] lspci -vv -s <pf_pci_address> | grep VU | cut -d " " -f4
To list PF PCIe addresses:
[host] lspci | grep "Ethernet controller: Mellanox Technologies"
Note: Always run lspci on the host, not the BlueField. The VU ID on BlueField will appear with "EC" inserted (e.g., MT2333XZ06YAECMLNXS0D0F0), which is invalid.
For VFs: Append VF<x> to the PF's VU ID. For example:
PF VU ID: MT2333XZ06YAMLNXS0D0F0
VF #1 VU ID: MT2333XZ06YAMLNXS0D0F0VF1
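The VF naming rule above is simple enough to express in code, which helps when generating per-system entries for many VFs. The following sketch is an illustration only; vf_representor_id is a hypothetical helper name, and whether VF numbering starts at 0 or 1 is not stated here, so the function only rejects negative indices.

```python
def vf_representor_id(pf_vu_id: str, vf_index: int) -> str:
    """Build a VF representor ID by appending VF<x> to the PF's VU ID."""
    if vf_index < 0:
        raise ValueError("vf_index must not be negative")
    return f"{pf_vu_id}VF{vf_index}"


if __name__ == "__main__":
    # PF VU ID taken from the example in this section.
    print(vf_representor_id("MT2333XZ06YAMLNXS0D0F0", 1))
```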
Memory Regions Path
Path to a JSON file containing memory region definitions (excluding device regions) for the monitored OS. Refer to doca_apsh_system in the DOCA App Shield Programming Guide.
OS Symbol Path
Path to the OS symbol manifest (single file or directory). Refer to doca_apsh_system for details.
OS Type
Specifies the OS type: linux or windows.
DMA Device Name
Name of the DMA (Direct Memory Access) device to be used. To list available devices:
[dpu] ibv_devinfo | grep 'hca_id' | awk '{print $2}'
Typically, the last number in the VU ID correlates with the DMA device (e.g., mlx5_0).
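The same device listing can be produced from captured ibv_devinfo output in a script. This is a minimal sketch equivalent in spirit to the grep and awk pipeline above: it takes the second whitespace-separated field of each line whose first field is hca_id:. The function name dma_device_names is hypothetical, and the sample text is abbreviated.

```python
def dma_device_names(ibv_devinfo_output: str) -> list:
    """Return the device name from every 'hca_id:' line, in order."""
    names = []
    for line in ibv_devinfo_output.splitlines():
        fields = line.split()
        # Lines of interest look like "hca_id:<tab>mlx5_0".
        if len(fields) >= 2 and fields[0] == "hca_id:":
            names.append(fields[1])
    return names


if __name__ == "__main__":
    sample = "hca_id:\tmlx5_0\n\ttransport:\tInfiniBand (0)\nhca_id:\tmlx5_1\n"
    print(dma_device_names(sample))
```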
Service Log Level
Overrides service logging verbosity (same options as above).
SDK Log Level
Controls logging verbosity for the DOCA SDK.
10=DISABLE
20=CRITICAL
30=ERROR
40=WARNING
50=INFO (default)
60=DEBUG
70=TRACE
Limits
These settings prevent resource exhaustion or excessive scanning overhead.
String length – Maximum length of strings (e.g., command names) to track.
Process – Maximum number of processes to track.
File handles – Maximum number of file descriptors (e.g., open files, sockets) to track.
Threads – Maximum number of threads to track per process.
Process memory – Maximum number of VMAs (Virtual Memory Areas) to track per process.
Events
Container filter – Enables filtering of activities originating from within containers. Non-containerized processes are not filtered.
SBOM – Specifies authorized SHA256 hashes of executables and libraries. Format: <SHA256>[, <size>].
Containers: Defines SBOM entries for containerized processes.
Non-containers: Defines SBOM entries for non-containerized processes.
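SBOM entries in the format above can be generated from the binaries that are expected to run on the workload. The sketch below is an illustration under stated assumptions: it emits a lowercase hexadecimal digest and treats the optional size as a byte count, neither of which is confirmed by this guide. The function names sbom_entry_from_bytes and sbom_entry_for_file are hypothetical.

```python
import hashlib
from pathlib import Path


def sbom_entry_from_bytes(data: bytes, include_size: bool = True) -> str:
    """Format '<SHA256>[, <size>]' for the given file content."""
    digest = hashlib.sha256(data).hexdigest()  # lowercase hex (assumption)
    # Size is assumed to be the content length in bytes.
    return f"{digest}, {len(data)}" if include_size else digest


def sbom_entry_for_file(path: str, include_size: bool = True) -> str:
    """Read a file and format its SBOM entry."""
    return sbom_entry_from_bytes(Path(path).read_bytes(), include_size)


if __name__ == "__main__":
    import sys

    # Usage: python3 sbom_entries.py /usr/bin/bash /usr/lib/libexample.so
    for file_path in sys.argv[1:]:
        print(sbom_entry_for_file(file_path))
```

Reading the whole file into memory keeps the sketch short; for very large binaries, hashing in chunks would be the more careful choice.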
Collection
Events – Toggle to enable/disable event collection.
Output
Logging
Log events to stdout – Enables or disables console logging.
Log folder path – Directory for saving log files. Set to false to disable.
Log threshold size – Maximum size of a single log file before rotation.
Log max files count – Number of rotated log files to retain.
Telemetry
Telemetry address – Destination address for sending telemetry records. Set to false to disable.
Telemetry tag – Tag added to each telemetry record (used by Fluent Bit for filtering).
Telemetry format – Format for telemetry data: json or syslog.
Telemetry user data – Optional user-defined metadata appended to each telemetry record.
The DOCA Argus service provides several logging outputs for monitoring service behavior, debugging, and event collection.
Standard Output
The standard output stream displays only essential service messages, including:
Version information
Successful startup confirmations
Failure messages and critical errors
This output is intended for general runtime visibility.
Debug Log Output
A complete debug log is available at /var/log/doca_argus/.
This log includes detailed information such as:
Event metadata (partial data)
Trace-level service logs
Collection and processing failures
System-level diagnostics
These logs are useful for in-depth troubleshooting and should be enabled in non-production or debug deployments.
Event Log Output
Event logs are written in JSON format to the log folder specified in the service configuration file. These logs contain complete records of all Argus-detected events and are suitable for local analysis or archival.
Logs are automatically rotated using the system's logrotate utility.
To customize rotation policies, edit the following configuration files:
/etc/cron.d/logrotate
/etc/logrotate.d/argus
Telemetry Output
Telemetry data can be exported in JSON or syslog formats. The Argus service is designed to integrate with Fluent Bit for real-time forwarding of telemetry records to analytics or monitoring systems.
Telemetry is disabled by default. To enable it, specify a telemetry address in the service configuration.
Fluent Bit Integration
To collect and forward telemetry data from the Argus service using Fluent Bit, configure the following input block:
[INPUT]
    Name    tcp
    Tag     <your preferred tag>
    Listen  0.0.0.0
    Port    24224
    Format  json
The value of Tag should match the telemetry_tag field in the Argus service configuration file.
Optional: Splunk Compatibility
If integrating with Splunk, add the following filter to the Fluent Bit configuration to encapsulate all telemetry data under a single event field:
[FILTER]
    Name        nest
    Match       *
    Operation   nest
    Wildcard    *
    Nest_under  event
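To make the effect of this filter concrete, the sketch below reproduces the transformation in plain Python: every top-level key of a record is moved under a single event key. It only models the resulting record shape and is not how Fluent Bit implements the filter; the function name nest_under_event is hypothetical.

```python
def nest_under_event(record: dict) -> dict:
    """Move all top-level keys of a record under a single 'event' key."""
    # Copy so the caller's record is left untouched.
    return {"event": dict(record)}


if __name__ == "__main__":
    record = {"vendor_name": "NVIDIA", "product_name": "DOCA_ARGUS"}
    print(nest_under_event(record))
```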
Example: Fluent Bit with Elasticsearch
The following is a basic Fluent Bit configuration example that forwards Argus telemetry logs to an Elasticsearch cluster:
[INPUT]
    Name    tcp
    Tag     elastic_forward_input
    Listen  0.0.0.0
    Port    24224
    Format  json
[SERVICE]
    Log_Level  info
[OUTPUT]
    Name                es
    Match               *
    Host                <elastic search IP>
    Port                <elastic search port>
    Index               argus
    Suppress_Type_Name  On
    Log_Level           info
To run Fluent Bit in a container with this configuration:
docker run --rm --net=host \
-v <path_to_fluentbit_conf_file>:/fluent-bit/etc/fluent-bit.conf \
--name fluent_bit -it fluent/fluent-bit
For additional integrations and advanced features, refer to the Fluent Bit manual.
The DOCA Argus service generates structured telemetry messages for each event, alert, or system activity it detects. Each message includes a standardized header, system metadata, and activity-specific details.
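Parameter | Data Type | Parent Object | Description |
vendor_name | enum | - | Always set to NVIDIA. |
product_name | enum | - | Always set to DOCA_ARGUS. |
product_version | string | - | DOCA Argus service version. |
message_type | enum | - | Type of message: EVENT, ALERT, or SYSTEM_ACTIVITY. |
severity | enum | - | Severity level of the message: INFO, ERROR, WARNING, MEDIUM, HIGH, or CRITICAL. |
schema_version | string | - | Version of the telemetry schema. |
message_id | string | - | Unique identifier for the message. Enables updates or correlation. |
occurred_message_timestamp_utc_ms | integer | - | Event timestamp in UTC (milliseconds since epoch). |
occurred_message_display_time_local_rfc3339 | string | - | Local time of the event in RFC3339 format. |
occurred_message_display_time_utc_rfc3339 | string | - | UTC time of the event in RFC3339 format. |
user_data | string | - | Optional user-configured metadata. |
bluefield_system_information | object | - | Details about the DPU system. |
bluefield_networking_interfaces | array | bluefield_system_information | List of DPU network interfaces with IPs and MACs. |
bluefield_network_interface_name | string | bluefield_networking_interfaces | Name of the BlueField interface. |
bluefield_network_interface_mac_address | string | bluefield_networking_interfaces | MAC address of the interface. |
bluefield_network_interface_ip_address | string | bluefield_networking_interfaces | IP address of the interface. |
workload_information | object | - | Metadata about the scanned host or VM. |
unique_identifier | string | workload_information | VUID of auto-scanned system, or |
os_version | string | workload_information | OS version (e.g., |
workload_networking_interfaces | array | workload_information | List of workload interfaces with IPs and MACs. |
network_interface_name | string | workload_networking_interfaces | Name of the workload interface. |
network_interface_mac_address | string | workload_networking_interfaces | MAC address of the workload interface. |
network_interface_ip_address | string | workload_networking_interfaces | IP address of the workload interface. |
activity_data | object | - | Describes the detected activity or alert. |
name | string | activity_data | Name of the event, alert, or system activity. |
<activity>_details | object | activity_data | Additional data about the triggering activity. |
<parent_activity>_details | object | activity_data | Data about the parent activity that led to this detection. |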
The following is a sample JSON message illustrating a typical event or alert generated by DOCA Argus:
{
  "vendor_name": "NVIDIA",
  "product_name": "DOCA_ARGUS",
  "product_version": "<version>",
  "message_type": "<EVENT | ALERT | SYSTEM_ACTIVITY>",
  "severity": "<INFO | ERROR | WARNING | MEDIUM | HIGH | CRITICAL>",
  "schema_version": "1.0",
  "message_id": "<unique_message_id>",
  "occurred_message_timestamp_utc_ms": "14367294690321",
  "occurred_message_display_time_local_rfc3339": "2025-04-10T16:50:03.836+00:00",
  "occurred_message_display_time_utc_rfc3339": "2025-04-10T16:50:03.836Z",
  "user_data": "NONE",
  "bluefield_system_information": {
    "bluefield_networking_interfaces": {
      "0": {
        "bluefield_network_interface_name": "<>",
        "bluefield_network_interface_mac_address": "<>",
        "bluefield_network_interface_ip_address": "<>"
      },
      "..."
    }
  },
  "workload_information": {
    "unique_identifier": "<>",
    "os_version": "<>",
    "workload_networking_interfaces": {
      "0": {
        "network_interface_name": "<>",
        "network_interface_ip_address": "<>",
        "network_interface_mac_address": "<>"
      },
      "..."
    }
  },
  "activity_data": {
    "name": "<the name of the event | alert | system_activity>",
    "<activity>_details": {
      "..."
    },
    "<parent_activity>_details": {
      "..."
    }
  }
}
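A consumer of these messages typically reads a few header fields first to decide how to route the record. The following is a minimal sketch of such a consumer, using only field names that appear in the sample above. The summarize function and the URGENT_SEVERITIES policy are hypothetical and are not part of the Argus schema; the activity name used in the demo is borrowed from the event list in this guide purely for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical triage policy, not part of the Argus schema.
URGENT_SEVERITIES = {"HIGH", "CRITICAL"}


def summarize(message_json: str) -> dict:
    """Extract a few header and activity fields from one telemetry message."""
    msg = json.loads(message_json)
    # The sample carries the timestamp as a string; int() accepts both forms.
    ts_ms = int(msg["occurred_message_timestamp_utc_ms"])
    # Integer arithmetic avoids floating-point rounding of the milliseconds.
    when = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc) + timedelta(
        milliseconds=ts_ms % 1000
    )
    return {
        "type": msg["message_type"],
        "severity": msg["severity"],
        "name": msg["activity_data"]["name"],
        "workload": msg["workload_information"]["unique_identifier"],
        "time_utc": when.isoformat(),
        "urgent": msg["message_type"] == "ALERT"
        and msg["severity"] in URGENT_SEVERITIES,
    }


if __name__ == "__main__":
    demo = json.dumps({
        "message_type": "ALERT",
        "severity": "HIGH",
        "occurred_message_timestamp_utc_ms": "1744303803836",
        "workload_information": {"unique_identifier": "MT2333XZ06YAMLNXS0D0F0"},
        "activity_data": {"name": "Reverse Shell Detected"},
    })
    print(summarize(demo))
```

Missing keys raise KeyError in this sketch, which is usually preferable to silently routing a malformed record.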
The DOCA Argus service extracts and reports a wide range of system-level attributes related to processes, memory, file handles, networking, and executable files. The following sections describe the attributes supported by Argus for each monitored object type.
Processes
Attribute | Description |
| Command name of the process. |
| Unique process identifier. |
| Unique identifier for the process’s own execution. |
| SHA256 hash of the process executable. |
| SHA1 hash of the process executable. |
| MD5 hash of the process executable. |
| Size of the process executable in bytes. |
| Path to the directory containing the executable. |
| Command-line arguments used to launch the process. |
| Timestamp of process creation. |
| Parent process identifier. |
| User ID of the process owner. |
| Group ID of the process owner. |
| Current process state. |
| CPU cycles consumed by the process. |
| ID of the container running the process (if applicable). |
| Namespace for process IDs. |
| Namespace for mount points. |
| Namespace for network resources. |
Threads
Attribute | Description |
| Unique thread identifier. |
| Unique identifier for the thread’s own execution. |
| Thread termination state. |
| Pointer to the thread’s memory management structure. |
File Handles
Attribute | Description |
| Process ID associated with the file descriptor. |
| Numeric file descriptor. |
Network Connections
Attribute | Description |
| File descriptor associated with the socket. |
| Current connection state (e.g., ESTABLISHED, CLOSED). |
| Timestamp of TCP connection creation. |
| Protocol in use ( |
| Source IP address. |
| Source port number. |
| Destination IP address. |
| Destination port number. |
| Bytes received over the connection. |
| Bytes sent over the connection. |
| Number of inbound TCP segments. |
| Number of outbound TCP segments. |
| Name of the interface handling the connection. |
| MAC addresses assigned to the interface. |
| IP addresses assigned to the interface. |
| Average size of inbound packets. |
| Average size of outbound packets. |
Process Memory
Attribute | Description |
| Process ID associated with the memory region. |
| Starting virtual address of the memory region. |
| Ending virtual address of the memory region. |
| Pointer to the next VMA (virtual memory area). |
| Pointer to the previous VMA. |
| Memory protection flags for the VMA. |
| Address of anonymous memory regions. |
| Address of the file structure backing the VMA. |
| Indicates whether this VMA corresponds to the main process executable. |
| Path to the file backing the memory region. |
Loaded Executables and Libraries (Attestation)
Attribute | Description |
| Inode number of the ELF file. |
| Name of the ELF executable or library. |
| ELF file type (e.g., |
| Full file path to the ELF binary. |
| SHA256 hash of the ELF file. |
| SHA1 hash of the ELF file. |
| MD5 hash of the ELF file. |
| File size of the ELF binary. |
| Indicates if this ELF file is the main process executable. |
The DOCA Argus service supports a wide range of runtime event detections across containers, processes, memory, threads, file handles, network connections, and executable files. These events are reported in real time to aid in security monitoring, threat detection, and forensic analysis.
Container Events
Event | Description |
Container Created | Detects creation of new containers (e.g., Docker). |
Container Terminated | Detects termination or disappearance of existing containers. |
Process Events
Event | Description |
Process Created | Detects creation of new processes. |
Process Terminated | Detects process termination or disappearance. |
Process Zombie | Detects processes in a zombie (defunct) state. |
Process Hidden | Detects processes running in a hidden or stealth state. |
File Handle Events
Event | Description |
File Handle Created | Detects creation of new file descriptors (e.g., files opened by processes). |
File Handle Terminated | Detects termination or disappearance of existing file descriptors. |
Network Connection Events
Event | Description |
Network Connection Created | Detects creation of new network connections. |
Network Connection Terminated | Detects termination or disappearance of network connections. |
TCP Connection Excessive Data | Detects network connections exceeding allowed data thresholds. |
TCP Connection Excessive Data In Limit | Sets the threshold for excessive inbound data (units: |
TCP Connection Excessive Data Out Limit | Sets the threshold for excessive outbound data (units: |
TCP Long Lasting Connection | Detects TCP connections exceeding a duration threshold. |
TCP Long Lasting Connection Limit | Sets the maximum allowed connection duration (units: |
TCP Network Connections State Change | Detects state transitions of TCP connections (e.g., |
Reverse Shell Detected | Detects potential reverse shells (e.g., remote interactive bash sessions). |
Process Memory (VMA) Events
Event | Description |
Process Memory Created | Detects creation of new virtual memory areas (e.g., heap, stack, executables). |
Process Memory Terminated | Detects termination or disappearance of memory areas. |
New Executable Anonymous Memory Mapped | Detects executable anonymous memory regions (often indicative of shellcode or injected code). |
File Unmapped | Detects when file-backed executable memory is unmapped. |
Executable Permissions Added | Detects when executable permissions are granted to a memory region. |
Executable Permissions Removed | Detects when executable permissions are revoked. |
Loaded Executables and Libraries (Attestation) Events
Event | Description |
New File Mapped | Detects mapping of new ELF files (e.g., executables, shared libraries). |
Foreign Binary Executed | Detects execution of binaries not present in the Software Bill of Materials (SBOM). |
Foreign Binary Loaded | Detects loading of libraries not included in the SBOM. |
Thread Events
Event | Description |
Thread Created | Detects creation of new threads. |
Thread Terminated | Detects thread termination or disappearance. |
System activity events are emitted by the DOCA Argus service to indicate the internal operational status of the service on a per-system basis. These events help monitor service health, initialization outcomes, and OS profile matching.
Event | Description |
DOCA Argus Service Initialization Started | Logged when the DOCA Argus service begins initialization for a specific system. |
DOCA Argus Service Initialization Successful | Logged when the service completes initialization successfully for a specific system. |
DOCA Argus Service Initialization Failed | Logged when service initialization fails for a specific system. Argus will not monitor the system until a successful reinitialization occurs. |
DOCA Argus Service Runtime Failure | Logged when a runtime failure occurs. The service will stop monitoring the affected system until reinitialized. |
DOCA Argus Service Gracefully Shutdown | Logged when the service is shut down by user request; generated per monitored system. |
Details Gathering Failed | Logged when an internal event engine fails to collect required system data. |
Host Initialization Started | Logged at the start of host system detection. |
Host Initialization Failed | Logged when host detection fails. |
Host Initialization Successful | Logged when the host detection process completes successfully. |
Loading Profile Candidate | Logged when a candidate OS profile is identified for initialization. |
Profile Parsing Failed | Logged when the system fails to parse a specific OS profile configuration. |
Profile Verification Failed | Logged when verification of a candidate OS profile fails. The service will continue evaluating other candidates. |
Profile Verification Successful | Logged when a candidate OS profile is successfully verified. |
OS Identifier Found | Logged when the host OS version is successfully identified. |
Unable to Determine Target OS | Logged when automatic OS detection fails. |
No Matching Profile Found | Logged when no valid OS profile matches the detected host system. |