DOCA Documentation v3.1.0

DOCA Argus Service Guide

This page provides installation, configuration, and usage instructions for the DOCA Argus Service.

DOCA Argus is a DOCA service running on NVIDIA® BlueField® networking platforms. It is designed to detect attacks immediately and enable a rapid response, minimizing their potential impact and risk.

The DOCA Argus framework provides real-time situational awareness and runtime threat detection by inspecting host memory using advanced memory forensics. Live machine introspection is performed at the hardware level, analyzing specific snippets of volatile host memory to monitor threats in real time without impacting system performance. DOCA Argus does not violate privacy, as information is extracted only from kernel structures.

Unlike conventional tools, Argus runs independently of the host, requiring no agents, integration, or reliance on host-based resources. This agentless, zero-overhead design enhances system efficiency and ensures resilient security in any compute environment, including bare-metal, virtualized, containerized, and multi-tenant infrastructures. By operating outside the host, isolated in its own trust domain, DOCA Argus remains invisible to attackers—even if the system is compromised.

Cybersecurity professionals can integrate DOCA Argus with SIEM, SOAR, and XDR platforms for continuous monitoring, incident response, and automated threat mitigation, extending existing capabilities into AI infrastructure environments.

NVIDIA BlueField provides built-in, data-centric protection for AI workloads at scale. Combining BlueField’s acceleration capabilities with DOCA Argus’ proactive threat detection enables cloud service providers and enterprises to secure AI factories without compromising performance or efficiency.

A single BlueField card with DOCA Argus can monitor an entire node.

Raw activities are collected from host memory and used to outline the operational state of a workload. DOCA Argus uses DOCA DMA to access and inspect host memory. Accessed memory is decoded into logical information (e.g., process and thread data). A policy engine processes these activities, filtering irrelevant content and reporting only meaningful data.

Key concepts:

  • Event – One or more meaningful activities that represent the current recorded state. Provides situational awareness.

  • Alert – One or more meaningful activities that indicate an immediate threat or impact requiring investigation or response.

Events, alerts, and system activity messages are formatted in JSON and syslog, and logged locally. Data can be exported via Fluent Bit integration for delivery to security platforms and data lakes.

  • Operates only on DPU targets (BlueField-2 or later).

  • Requires DPU mode (see BlueField Modes of Operation).

  • Requires firmware version 24.35.0388 or later.

  • Supported BlueField image versions: 4.11.0 or later.

  • Argus service container must run in privileged mode to enable full-system DMA reads.

  • Tested only on KVM hypervisors.

  • Supports Linux-based OSs (bare-metal, virtualization, containers). Windows OS support planned.

  • Kata Containers are supported only if NVIDIA-DPU support is enabled.

  • Supports only x86 64-bit architectures. AARCH64 support planned.

  1. Configure BlueField firmware. On BlueField, configure the PF BAR register:

    dpu> mlxconfig -d /dev/mst/<mst_device> s PF_BAR2_SIZE=2 PF_BAR2_ENABLE=1

    Replace <mst_device> with:

    • mt41686_pciconf0 for BlueField-2

    • mt41692_pciconf0 for BlueField-3

  2. Enable IOMMU passthrough (only if not already enabled).

    Note

    Skip this step unless DMA fails and dmesg shows messages similar to the following:

    mlx5_core 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT ...]

    1. Edit GRUB config:

      host> sudo vim /etc/default/grub

    2. Add the following to GRUB_CMDLINE_LINUX_DEFAULT, choosing intel or amd to match the host CPU:

      iommu=pt <intel/amd>_iommu=on

    3. Apply changes:

      • For Ubuntu:

        sudo update-grub

      • For CentOS/RHEL:

        sudo grub2-mkconfig -o /boot/grub2/grub.cfg

    4. Reboot.

  3. Prepare the target system. Argus should auto-detect target config files. If not, configure manually:

    1. Download OS debug symbols.

      • For Ubuntu:

        sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
        deb http://ddebs.ubuntu.com/ $(lsb_release -cs) main restricted universe multiverse
        deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-updates main restricted universe multiverse
        deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-proposed main restricted universe multiverse
        EOF
        sudo apt install ubuntu-dbgsym-keyring
        sudo apt-get update
        sudo apt-get install linux-image-$(uname -r)-dbgsym

      • For CentOS/RHEL:

        sudo yum install --enablerepo=base-debuginfo \
            kernel-devel-$(uname -r) \
            kernel-debuginfo-$(uname -r) \
            kernel-debuginfo-common-$(uname -m)-$(uname -r)

    2. Install DOCA on target or copy doca_apsh_config.py from BlueField.

    3. Create JSON files:

      cd /opt/mellanox/doca/tools/
      pip3 install psutil pdbparse
      python3 doca_apsh_config.py --files memregions symbols --os <windows/linux> --path <path to dwarf2json>
      cp /opt/mellanox/doca/tools/*.* <shared-folder>
      dpu> scp <shared-folder>/* <path-to-app-shield-binary>

      Note

      dwarf2json must be installed separately from GitHub. Repeat this step after kernel updates.
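The decision in step 2 (whether IOMMU passthrough still needs to be enabled) can be checked with a short script. The following is a minimal sketch, not part of DOCA: the function names are hypothetical, and the fault pattern only covers the AMD-style message quoted above, so other platforms may log different text.

```python
import re


def iommu_passthrough_enabled(cmdline: str) -> bool:
    """Return True if the kernel command line has iommu=pt and a vendor IOMMU switch."""
    params = cmdline.split()
    has_passthrough = "iommu=pt" in params
    has_vendor_switch = any(p in ("intel_iommu=on", "amd_iommu=on") for p in params)
    return has_passthrough and has_vendor_switch


def has_dma_page_fault(dmesg_text: str) -> bool:
    """Return True if any mlx5_core line in the given dmesg text reports IO_PAGE_FAULT."""
    pattern = re.compile(r"mlx5_core .*IO_PAGE_FAULT")
    return any(pattern.search(line) for line in dmesg_text.splitlines())


if __name__ == "__main__":
    # Sample inputs; on a real host, read /proc/cmdline and the dmesg output instead.
    sample_cmdline = "BOOT_IMAGE=/vmlinuz ro quiet iommu=pt amd_iommu=on"
    sample_dmesg = "mlx5_core 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT ...]"
    print("passthrough configured:", iommu_passthrough_enabled(sample_cmdline))
    print("DMA page fault seen:", has_dma_page_fault(sample_dmesg))
```

If a page fault is seen and passthrough is not configured, the GRUB change in step 2 applies.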

Argus configuration is managed via SERVICE_CONFIG_FILE in the container YAML.

Service

  • Immediate shutdown – Terminate immediately on SIGINT/SIGTERM (skip graceful shutdown).

  • Service log level – DOCA logging verbosity (default 50 = INFO). Options: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE.

  • System scanner sleep time – Delay between scans (s = seconds, m = minutes, ms = milliseconds).
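The numeric log levels and the sleep-time units listed above can be captured in a small helper for validating values before they are written to the service configuration. This is a sketch under assumptions: the helper names are hypothetical, and the value syntax (an integer immediately followed by ms, s, or m, such as 500ms) is assumed rather than confirmed by this guide.

```python
import re

# Numeric DOCA log levels as listed in the Service section.
LOG_LEVELS = {
    10: "DISABLE",
    20: "CRITICAL",
    30: "ERROR",
    40: "WARNING",
    50: "INFO",
    60: "DEBUG",
    70: "TRACE",
}

# Milliseconds per unit suffix (ms = milliseconds, s = seconds, m = minutes).
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000}


def log_level_name(level: int) -> str:
    """Translate a numeric log level into its name, rejecting unknown values."""
    if level not in LOG_LEVELS:
        raise ValueError(f"unsupported log level: {level}")
    return LOG_LEVELS[level]


def sleep_time_ms(value: str) -> int:
    """Convert a string such as '500ms', '5s', or '2m' into milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m)\s*", value)
    if match is None:
        raise ValueError(f"unsupported sleep time: {value!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]
```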

DOCA Argus Configuration

  • Auto Scan – Scan all available systems unless systems section is defined.

  • Default – Default configs applied if not overridden in systems.

  • Systems – List of monitored systems with overrides.

Per-System Configurations

  • Representor ID – VU ID of VF/PF to track.

    • PF –

      host> lspci -vv -s <PF_pci_address> | grep VU | cut -d " " -f 4

    • VF – Append VF<x> to PF’s VU ID. Example: MT2333XZ06YAMLNXS0D0F0VF1

  • Memory regions path – JSON file path (or auto) for host OS memory map.

  • OS symbol path – JSON file path or directory (or auto).

  • OS type – Linux or Windows.

  • DMA device name – Matches representor ID. List devices:

    dpu> ibv_devinfo | grep 'hca_id' | awk '{print $2}'

  • Service log level – Overrides service log verbosity.

  • SDK log level – Sets SDK logging verbosity.

  • Limits – Set max values for string length, processes, file handles, threads, VMAs.

  • Events

    • Container filter – Include/exclude containerized processes.

    • SBOM – List SHA signatures of approved executables/libraries.
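The VF rule under Representor ID (append VF<x> to the PF's VU ID) can be expressed as a one-line helper. This is a minimal sketch: the function name is hypothetical, and the PF value used in the usage line is inferred from the VF example in this guide rather than taken from a real device.

```python
def vf_representor_id(pf_vuid: str, vf_index: int) -> str:
    """Build a VF representor ID by appending VF<index> to the PF's VU ID."""
    if not pf_vuid:
        raise ValueError("PF VU ID must not be empty")
    if vf_index < 0:
        raise ValueError("VF index must be non-negative")
    return f"{pf_vuid}VF{vf_index}"


if __name__ == "__main__":
    # Assumed PF VU ID, as it would be returned by the lspci command above.
    print(vf_representor_id("MT2333XZ06YAMLNXS0D0F0", 1))
```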

Collection

  • Events – Enable/disable per event type.

Output

  • Log events to stdout – Enable standard output logging.

  • Log folder path – Directory for file logs.

  • Log threshold size – Rotate logs at this size.

  • Log max files count – Max number of rotated logs.

  • Telemetry address – Aggregator address.

  • Telemetry tag – Tag for Fluent Bit integration.

  • Telemetry format – JSON or syslog.

  • Telemetry user data – Custom user-defined metadata.

Standard Output

Displays only important service logs, such as version information, successful startups, and error messages on failures.

Debug Log Output

Provides a complete log output for debugging, including partial event data, trace logs, collection failures, and more. These logs are stored in the /var/log/doca_argus/ directory.

Event Log Output

Stores a complete event log in JSON format in the log folder path specified in the service configuration file. For local log storage, log rotation is handled by Linux logrotate. You can override the default configuration in /etc/cron.d/logrotate and /etc/logrotate.d/argus.

Telemetry Output

The Argus service can produce telemetry records in JSON or syslog formats.

By default, telemetry is disabled. To enable it, set the telemetry_address parameter in the service configuration file and ensure telemetry_tag matches the tag used in your Fluent Bit configuration.

Telemetry has been tested with Fluent Bit integration, which should run independently from the Argus service.

For example, to run Fluent Bit locally on the DPU alongside the Argus service, use the following input section:

[INPUT]
    Name   tcp
    Tag    <your preferred tag>
    Listen 0.0.0.0
    Port   24224
    Format json
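Before pointing Argus at the collector, it can help to confirm that the TCP input accepts JSON records. The sketch below sends one newline-terminated JSON object to the listener; the function names and the test record are hypothetical, and the host and port simply mirror the input section above.

```python
import json
import socket


def encode_record(record: dict) -> bytes:
    """Serialize one record as compact, newline-terminated JSON."""
    return (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")


def send_record(record: dict, host: str = "127.0.0.1", port: int = 24224,
                timeout: float = 3.0) -> None:
    """Open a TCP connection to the collector and send a single record."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_record(record))
```

For example, `send_record({"message_type": "EVENT", "note": "connectivity test"})` should make the record appear at whichever output Fluent Bit is configured with.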

If you are using Splunk, add the following encapsulation filter to the Fluent Bit configuration file:

[FILTER]
    Name       nest
    Match      *
    Operation  nest
    Wildcard   *
    Nest_under event
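The effect of this filter can be illustrated in a few lines: every top-level field of the record is moved under a single event key. The function below only mimics that transformation for illustration; it is not how Fluent Bit implements the filter, and its name is hypothetical.

```python
def nest_under(record: dict, key: str = "event") -> dict:
    """Move every top-level field of the record under a single key."""
    return {key: dict(record)}


if __name__ == "__main__":
    flat = {"message_type": "ALERT", "severity": "HIGH"}
    # The flat record becomes a record with one top-level "event" field.
    print(nest_under(flat))
```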

Fluent Bit is flexible and can integrate with many output destinations.

The following is a basic example that forwards telemetry data to Elasticsearch:

[INPUT]
    Name   tcp
    Tag    elastic_forward_input
    Listen 0.0.0.0
    Port   24224
    Format json

[SERVICE]
    Log_Level info

[OUTPUT]
    Name               es
    Match              *
    Host               <elasticsearch_ip>
    Port               <elasticsearch_port>
    Index              argus
    Suppress_Type_Name On
    Log_Level          info

To run Fluent Bit with this configuration:

docker run --rm --net=host -v <path_to_fluentbit_conf_file>:/fluent-bit/etc/fluent-bit.conf --name fluent_bit -it fluent/fluent-bit

Refer to the Fluent Bit manual for details on additional output plugins and configurations.

The DOCA Argus service generates structured output messages containing detailed metadata, system information, and activity data.

The following table describes the fields included in each message:

| Parameter | Data Type | Parent Object | Description |
|---|---|---|---|
| message_header | object | - | Root-level object containing the message metadata. |
| vendor_name | enum | message_header | Name of the vendor. Value: NVIDIA. |
| product_name | enum | message_header | Name of the product. Value: DOCA_ARGUS. |
| product_version | string | message_header | Product version. |
| message_type | enum | message_header | Can be EVENT, ALERT, or SYSTEM_ACTIVITY. |
| severity | enum | message_header | Severity of the event/alert/system activity (INFO, ERROR, WARNING, MEDIUM, HIGH, CRITICAL). |
| schema_version | string | message_header | Schema format version used by the message. |
| message_id | string | message_header | Unique message identifier. |
| occurred_message_timestamp_utc_ms | integer | message_header | UTC timestamp (in milliseconds) when the message occurred. |
| occurred_message_display_time_local_rfc3339 | string | message_header | Local display time when the message occurred (RFC 3339 format). |
| occurred_message_display_time_utc_rfc3339 | string | message_header | UTC display time when the message occurred (RFC 3339 format). |
| message_timezone | string | message_header | Timezone of the message origin. |
| message_timezone_offset | string | message_header | Offset from UTC for the message timezone. |
| user_data | string | message_header | Configured user data. |
| bluefield_system_information | object | message_header | Information about the BlueField system. |
| bluefield_networking_interfaces | array | bluefield_system_information | List of all configured BlueField interfaces, including their names, IP addresses, and MAC addresses. |
| bluefield_network_interface_name | string | bluefield_networking_interfaces | Interface name. |
| bluefield_network_interface_mac_address | string | bluefield_networking_interfaces | MAC address of the interface. |
| bluefield_network_interface_ipv4_address | string/array | bluefield_networking_interfaces | IPv4 addresses associated with the interface. |
| bluefield_network_interface_ipv6_address | string/array | bluefield_networking_interfaces | IPv6 addresses associated with the interface. |
| workload_information | object | message_header | Information about the monitored workload system. |
| unique_identifier | string | workload_information | Unique ID of the target system (system name in configuration or VUID for auto-scanned systems). |
| os_version | string | workload_information | OS version of the workload (Linux Kernel x.y or Microsoft Windows major.minor.build). |
| workload_networking_interfaces | array | workload_information | List of all workload interfaces, including their names, IP addresses, and MAC addresses. |
| workload_network_interface_name | string | workload_networking_interfaces | Interface name. |
| workload_network_interface_mac_address | string | workload_networking_interfaces | MAC address of the interface. |
| workload_network_interface_ipv4_address | string/array | workload_networking_interfaces | IPv4 addresses associated with the interface. |
| workload_network_interface_ipv6_address | string/array | workload_networking_interfaces | IPv6 addresses associated with the interface. |
| activity_data | object | message_header | Details about the activity reported. |
| name | string | activity_data | Name of the event/alert/system activity. |
| <activity>_details | object | activity_data | Detailed information about the collector that triggered the event or alert. |
| <parent_activity>_details | object | activity_data | Details about parent activities that triggered the current activity. |
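The relationship between occurred_message_timestamp_utc_ms and the RFC 3339 display fields can be reproduced on the consumer side. The sketch below converts a millisecond UTC timestamp into an RFC 3339 UTC string using integer arithmetic; the function name is hypothetical, and the input value is the sample timestamp used in this guide's example message.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime, used as the base for the offset.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_ms_to_rfc3339(timestamp_ms) -> str:
    """Convert milliseconds since the Unix epoch to an RFC 3339 UTC string."""
    # int() also accepts the value when it arrives as a numeric string.
    moment = EPOCH + timedelta(milliseconds=int(timestamp_ms))
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


if __name__ == "__main__":
    print(utc_ms_to_rfc3339(1747052933345))  # 2025-05-12T12:28:53.345Z
```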

The following example shows the JSON message produced for each event and alert:

{
  "vendor_name": "NVIDIA",
  "product_name": "DOCA_ARGUS",
  "product_version": "<version>",
  "message_type": "<EVENT | ALERT | SYSTEM_ACTIVITY>",
  "severity": "<INFO | ERROR | WARNING | MEDIUM | HIGH | CRITICAL>",
  "schema_version": "1.0",
  "message_id": "<unique_message_id>",
  "occurred_message_timestamp_utc_ms": "1747052933345",
  "occurred_message_display_time_local_rfc3339": "2025-05-12T12:28:53.458+00:00",
  "occurred_message_display_time_utc_rfc3339": "2025-05-12T12:28:53.458Z",
  "message_timezone": "UTC",
  "message_timezone_offset": "0",
  "user_data": "NONE",
  "bluefield_system_information": {
    "bluefield_networking_interfaces": {
      "0": {
        "bluefield_network_interface_name": "<>",
        "bluefield_network_interface_mac_address": "<>",
        "bluefield_network_interface_ipv4_address": "<>",
        "bluefield_network_interface_ipv6_address": "<>"
      },
      "1": {
        "bluefield_network_interface_name": "<>",
        "bluefield_network_interface_mac_address": "<>",
        "bluefield_network_interface_ipv4_address": "<>",
        "bluefield_network_interface_ipv6_address": "<>"
      },
      "..."
    }
  },
  "workload_information": {
    "unique_identifier": "<>",
    "os_version": "<>",
    "workload_networking_interfaces": {
      "0": {
        "network_interface_name": "<>",
        "network_interface_mac_address": "<>",
        "network_interface_ipv4_address": "<>",
        "network_interface_ipv6_address": "<>"
      },
      "1": {
        "network_interface_name": "<>",
        "network_interface_mac_address": "<>",
        "network_interface_ipv4_address": "<>",
        "network_interface_ipv6_address": "<>"
      },
      "..."
    }
  },
  "activity_data": {
    "name": "<the name of the EVENT | ALERT | SYSTEM_ACTIVITY>",
    -- Activity Details to follow per the type of EVENT | ALERT | SYSTEM_ACTIVITY --
  }
}
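A consumer of these messages typically needs only a few fields to triage them. The sketch below builds a one-line summary and flags alerts, which by the definition earlier in this guide require investigation or response. The function names are hypothetical, and the code accepts fields either at the top level (as in the example above) or nested under message_header (as the field table suggests), since the exact layout is an assumption here.

```python
import json


def _section(message: dict, key: str) -> dict:
    """Find a section at the top level or under message_header (layout assumption)."""
    header = message.get("message_header", {})
    return message.get(key) or header.get(key) or {}


def summarize_message(raw: str) -> str:
    """Return '[TYPE/SEVERITY] activity name on workload id' for one JSON message."""
    message = json.loads(raw)
    header = message.get("message_header", message)
    name = _section(message, "activity_data").get("name", "<unknown>")
    target = _section(message, "workload_information").get("unique_identifier", "<unknown>")
    return f"[{header.get('message_type')}/{header.get('severity')}] {name} on {target}"


def needs_response(raw: str) -> bool:
    """Alerts indicate an immediate threat; events and system activity do not."""
    message = json.loads(raw)
    header = message.get("message_header", message)
    return header.get("message_type") == "ALERT"
```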

DOCA Argus monitors workload and system behavior in real time, generating alerts, events, and system activity messages that provide visibility into security-relevant activities, operational state changes, and detected anomalies. These messages are categorized by type, severity, and activity name, with descriptions to help identify their purpose and implications.

The tables in this section outline the supported activities that Argus can detect, covering a broad range of categories including process creation and termination, network connections, execution of binaries and libraries, process memory changes, file handle operations, thread creation and termination, container lifecycle events, and key system service milestones or errors.

Creation or Modification of System Processes

| Type | Severity | Activity Name | Remarks |
|---|---|---|---|
| Event | Info | Process Created | A new process was detected. |
| Event | Info | Process Terminated | A process was terminated. |
| Event | Warning | Process Zombie | Detects a process in a zombie state. |
| Alert | High | Process Hidden | Detects a process in a hidden state. |


Network Connections

| Type | Severity | Activity Name | Remarks |
|---|---|---|---|
| Event | Info | Network Connection Created | A new TCP network connection was created. |
| Event | Info | Network Connection Terminated | A TCP network connection was terminated. |
| Alert | Low | TCP Connection Excessive Data | Monitors a TCP connection’s incoming or outgoing data volume that exceeds a configurable threshold (separate thresholds for incoming and outgoing traffic). |
| Alert | Low | TCP Long-Lasting Connection | Monitors a TCP connection whose total duration exceeds a configurable time threshold. |
| Event | Info | TCP Network Connection State Change | Monitors changes in the state of TCP network connections (for example, SYN_SENT, SYN_RECEIVED). |
| Event | Info | TCP Network Connections Status | Provides a periodic (configurable) summary of currently open TCP connections per process, including packet and byte counts. Disabled by default. |
| Alert | High | Reverse Shell Detected | Detects a process started with stream redirection to a remote connected socket (stdin bound to a remote socket). |
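The two threshold-based alerts in this table (TCP Connection Excessive Data and TCP Long-Lasting Connection) follow a simple comparison pattern. The sketch below illustrates that pattern only; it is not the Argus policy engine, the class and function names are hypothetical, and the threshold values are arbitrary. The dictionary keys reuse attribute names from the TCP attribute table in this guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TcpThresholds:
    """Hypothetical limits: separate byte thresholds per direction plus a duration."""
    max_bytes_in: int
    max_bytes_out: int
    max_duration_ms: int


def tcp_alerts(conn: dict, limits: TcpThresholds) -> List[str]:
    """Return the names of the threshold alerts that a connection would raise."""
    alerts = []
    if (conn["tcp_bytes_in"] > limits.max_bytes_in
            or conn["tcp_bytes_out"] > limits.max_bytes_out):
        alerts.append("TCP Connection Excessive Data")
    if conn["tcp_connection_overall_duration_utc_ms"] > limits.max_duration_ms:
        alerts.append("TCP Long-Lasting Connection")
    return alerts


if __name__ == "__main__":
    limits = TcpThresholds(max_bytes_in=10_000, max_bytes_out=5_000, max_duration_ms=60_000)
    conn = {"tcp_bytes_in": 12_000, "tcp_bytes_out": 100,
            "tcp_connection_overall_duration_utc_ms": 1_000}
    print(tcp_alerts(conn, limits))
```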


Executed Binaries and Loaded Libraries (Software Bill of Materials/Process Attestation)

| Type | Severity | Activity Name | Remarks |
|---|---|---|---|
| Alert | High | Foreign Binary Executed | Detects execution of a binary not included in the original container image or modified from it. May indicate that an attacker has control of the workload and is executing arbitrary commands. |
| Alert | High | Binary Executed Not as Intended | Detects execution of a binary from the original container image with command-line arguments and/or from a folder path not matching those in the original container image. |
| Alert | High | Foreign Binary Executed – File Size Mismatch | Detects execution of a binary whose reported file size differs from the file size of the corresponding binary in the original container image. |
| Alert | High | Foreign Library Loaded | Detects loading of a library not included in the original container image or modified from it. May indicate that an attacker has control of the workload and is running arbitrary code. |
| Alert | High | Foreign Library Loaded – File Size Mismatch | Detects loading of a library whose reported file size differs from the file size of the corresponding library in the original container image. |


Process Memory

| Type | Severity | Activity Name | Remarks |
|---|---|---|---|
| Event | Info | Process Memory Created | A new virtual memory area (e.g., heap, stack, executable) was created. Default: off. |
| Event | Info | Process Memory Terminated | A virtual memory area is no longer visible (terminated). Default: off. |
| Event | Warning | New Executable Anonymous Memory Mapped | An executable anonymous memory area was mapped. |
| Alert | Medium | Executable Permissions Added | Executable permissions were added to a memory area. |
| Alert | Medium | Executable Permissions Removed | Executable permissions were removed from a memory area. |
| Event | Info | New File Mapped | A new memory-mapped file was detected. |
| Event | Info | File Unmapped | A memory-mapped file was unmapped. |
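The two permission-change alerts in this table amount to comparing the permissions of the same memory area across two scans. The sketch below shows that comparison under an assumption: it treats memory_permissions as an rwx-style string (for example rw- or r-x), which this guide does not confirm. The function name is hypothetical.

```python
from typing import Optional


def classify_permission_change(old: str, new: str) -> Optional[str]:
    """Name the alert implied by a permission change, or None if neither applies."""
    was_executable = "x" in old
    is_executable = "x" in new
    if is_executable and not was_executable:
        return "Executable Permissions Added"
    if was_executable and not is_executable:
        return "Executable Permissions Removed"
    return None


if __name__ == "__main__":
    # A writable area that becomes executable between two scans.
    print(classify_permission_change("rw-", "rwx"))
```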


File Handles

| Type | Severity | Activity Name | Remarks |
|---|---|---|---|
| Event | Info | File Handle Created | A new file handle was created. |
| Event | Info | File Handle Terminated | A file handle was terminated. |


Threads

| Type | Severity | Activity Name | Remarks |
|---|---|---|---|
| Event | Info | Thread Created | A new thread was created. |
| Event | Info | Thread Terminated | A thread was terminated. |


Containers

| Type | Severity | Activity Name | Remarks |
|---|---|---|---|
| Event | Info | Container Started | A new container instance was detected. |
| Event | Info | Container Terminated | A container was terminated. |


System Events

| Type | Severity | Activity Name | Remarks |
|---|---|---|---|
| System Activity | Info | Service Initialization Started | The DOCA Argus initialization process has started. |
| System Activity | Info | Service Initialization Successful | The DOCA Argus initialization process completed successfully. |
| System Activity | Error | Service Initialization Failed | DOCA Argus failed to initialize. |
| System Activity | Error | Service Runtime Failure | Critical internal service error; DOCA Argus is offline. |
| System Activity | Info | Service Gracefully Shutdown | DOCA Argus was successfully shut down following a user request. |
| System Activity | Error | Details Gathering Failed | Failed to collect required information. |
| System Activity | Info | Host Initialization Started | Workload detection process has started. |
| System Activity | Info | Host Initialization Successful | Workload detection process completed successfully. |
| System Activity | Error | Host Initialization Failed | Workload detection process failed. |
| System Activity | Info | OS Identifier Found | Successfully detected the underlying OS of the workload. |
| System Activity | Info | OS Identifier Discovery Extended | Detection of the workload OS is taking longer than expected. |
| System Activity | Info | Loading Profile Candidate | Identified an OS profile to use. |
| System Activity | Info | Profile Verification Successful | Successfully initialized using the identified OS profile. |
| System Activity | Error | Profile Verification Failed | Initialization using the identified OS profile failed; DOCA Argus will attempt subsequent profile candidates. |
| System Activity | Error | Profile Parsing Failed | DOCA Argus failed to parse the OS profile. |
| System Activity | Error | No Matching Profile Found | No matching OS profile was found. |
| System Activity | Error | Unable to Determine Target OS | Failed to detect the underlying OS of the workload. |
| System Activity | Medium | Process Limit Reached | Reached the configured limit for the number of processes to monitor. |
| System Activity | Medium | File Handles Limit Reached | Reached the configured limit for the number of file handles to monitor. |
| System Activity | Medium | Process Memory Limit Reached | Reached the configured limit for the number of virtual address descriptors to monitor. |
| System Activity | Medium | Threads Limit Reached | Reached the configured limit for the number of threads to monitor. |


The following attributes are currently provided for processes, TCP network connections, file handles, threads, process memory, and SBOM/process attestation.

For requests regarding the extraction of additional attributes, please contact NVIDIA.

Processes

| Attribute | Description |
|---|---|
| process_name | Command name of the process. |
| process_id | Unique process identifier. |
| process_self_exec_id | Thread-group-change indicator (e.g., incremented on exec calls). |
| process_hash_sha256 | SHA256 hash of the process’s executed binary. |
| process_hash_sha1 | SHA1 hash of the process’s executed binary. |
| process_hash_md5 | MD5 hash of the process’s executed binary. |
| process_file_size_bytes | File size, in bytes, of the process’s executable. |
| process_file_name | File name of the process’s executable. |
| process_folder_path | Path to the folder containing the process’s executable. |
| process_command_line_arguments | Command line arguments used to start the process. |
| process_creation_time_nanoseconds | Process creation time in nanoseconds (workload time). |
| process_parent_process_id | Parent process identifier. |
| process_real_user_id | Real user ID of the process owner. |
| process_real_group_id | Real group ID of the process owner. |
| process_state | Current state of the process. |
| process_cpu_clock_cycles | Number of CPU cycles consumed by the process. |
| process_container_id | Container ID, if the process is part of a container. |
| process_pid_namespace | Namespace for process identifiers. |
| process_mount_points_namespace | Namespace for mount points. |
| process_network_namespace | Namespace for network resources. |


Threads

| Attribute | Description |
|---|---|
| thread_id | Unique thread identifier. |
| thread_self_exec_id | Thread-group-change indicator (e.g., incremented on exec calls). |
| thread_exit_state | Thread’s exit state. |


File Handles

| Attribute | Description |
|---|---|
| process_id | Associated process ID. |
| file_descriptor_id | File descriptor identifier. |


TCP Network Connections

| Attribute | Description |
|---|---|
| file_descriptor_id | Unique file descriptor ID associated with the socket. |
| connection_state | TCP connection state. |
| protocol | Network protocol used. |
| source_ip_address | Source IP address. |
| source_port | Source port number. |
| destination_ip_address | Destination IP address. |
| destination_port | Destination port number. |
| tcp_bytes_in | Amount of data received, in bytes. |
| tcp_bytes_out | Amount of data sent, in bytes. |
| tcp_segments_in | Number of TCP segments received. |
| tcp_segments_out | Number of TCP segments sent. |
| workload_network_interface_name | Name of the network interface. |
| workload_network_interface_mac_address | MAC address of the network interface. |
| workload_network_interface_ipv4_address | IPv4 addresses associated with the interface. |
| workload_network_interface_ipv6_address | IPv6 addresses associated with the interface. |
| tcp_connection_creation_time_utc_ms | Time when the TCP connection was observed, in UTC milliseconds. |
| tcp_connection_termination_utc_ms | Time when the TCP connection was terminated, in UTC milliseconds. |
| tcp_connection_overall_duration_utc_ms | Overall duration of the TCP connection, in milliseconds. |
| tcp_average_bytes_in | Average packet size received, in bytes. |
| tcp_average_bytes_out | Average packet size sent, in bytes. |
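Several attributes in this table appear to be derived from others. The sketch below shows one plausible derivation: the duration as termination time minus creation time, and the average size as bytes divided by segments. Whether Argus computes these values exactly this way is an assumption, and the function names are hypothetical.

```python
def connection_duration_ms(created_utc_ms: int, terminated_utc_ms: int) -> int:
    """Duration between the creation and termination timestamps, in milliseconds."""
    if terminated_utc_ms < created_utc_ms:
        raise ValueError("termination time precedes creation time")
    return terminated_utc_ms - created_utc_ms


def average_segment_size(total_bytes: int, segments: int) -> float:
    """Average bytes per TCP segment; 0.0 when no segments were counted."""
    if segments == 0:
        return 0.0
    return total_bytes / segments


if __name__ == "__main__":
    print(connection_duration_ms(1747052933345, 1747052934345))
    print(average_segment_size(3000, 2))
```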


Process Memory

| Attribute | Description |
|---|---|
| process_id | Associated process’s unique ID. |
| virtual_memory_area_start_address | Start address of the virtual memory area. |
| virtual_memory_area_end_address | End address of the virtual memory area. |
| memory_permissions | Permissions associated with the virtual memory area. |
| is_main_process_executable | Whether the virtual memory belongs to the process’s main executable. |
| file_path | Full path (including file name) of the file associated with the memory area. |
| file_name | File name associated with the memory area. |


Executed Binaries and Loaded Libraries (Attestation)

| Attribute | Description |
|---|---|
| elf_file_inode_number | Inode number of the ELF file. |
| elf_file_name | Name of the ELF file. |
| elf_file_type | Type of the ELF file. |
| elf_file_path | File path of the ELF file. |
| elf_file_hash_sha256 | SHA256 hash of the ELF file. |
| elf_file_hash_sha1 | SHA1 hash of the ELF file. |
| elf_file_hash_md5 | MD5 hash of the ELF file. |
| elf_file_size_bytes | File size of the ELF file, in bytes. |
| is_main_process_executable | Whether this file is the main executable for the process. |
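When comparing reported hashes against a list of approved executables and libraries, the same digests can be computed independently for a known-good file. The sketch below hashes a binary stream in chunks and returns values keyed by the attribute names in this table. The function name is hypothetical, and the format in which Argus expects SBOM signatures is not covered here.

```python
import hashlib
import io


def digest_stream(stream, chunk_size: int = 1 << 20) -> dict:
    """Compute SHA256, SHA1, and MD5 digests plus the size of a binary stream."""
    hashers = {
        "sha256": hashlib.sha256(),
        "sha1": hashlib.sha1(),
        "md5": hashlib.md5(),
    }
    size = 0
    # Read in chunks so large binaries are not loaded into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        size += len(chunk)
        for hasher in hashers.values():
            hasher.update(chunk)
    result = {f"elf_file_hash_{name}": h.hexdigest() for name, h in hashers.items()}
    result["elf_file_size_bytes"] = size
    return result


if __name__ == "__main__":
    # In-memory sample; for a real file use: with open(path, "rb") as f: digest_stream(f)
    print(digest_stream(io.BytesIO(b"hello")))
```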


© Copyright 2025, NVIDIA. Last updated on Sep 4, 2025.