DOCA OS Inspector Service Guide
Not part of DOCA release
This guide provides instructions on how to use the DOCA OS Inspector on top of the NVIDIA® BlueField® DPU.
The DOCA OS Inspector service monitors various aspects of a target VM or bare-metal host by inspecting the memory of the target operating system and exporting the collected data for use by security, big data, and AI-based services.
The DOCA OS Inspector service is linked to DOCA Telemetry Service (DTS). DOCA OS Inspector uses the DOCA App Shield library to collect OS data from the target system without hindering it. The service parses the collected data and forwards it to DTS, which manages the rest of the telemetry aspects.
The DOCA OS Inspector runs in its own Kubernetes pod on BlueField. The collected data is parsed and sent, in a predefined struct, to a telemetry collector which manages the rest of the telemetry aspects.
Follow the steps required to work with the DOCA App Shield library, as explained in the library's documentation.
Copy the JSON files generated by doca_apsh_config from the host/VM to the DPU, to the path /opt/mellanox/doca/services/os_inspector/. For example:
dpu> scp root@192.168.100.1:~/*.json /opt/mellanox/doca/services/os_inspector/
Place your service configuration JSON files, os_inspector_params.json and os_inspector_cfg.json, at the path /opt/mellanox/doca/services/os_inspector/.
Create a VF to be used by the service according to the DOCA Virtual Functions User Guide and expose it to the target system.
FW Version
The firmware version must be 24.32.1010 or higher.
BlueField OS (BFB) Version
Supported BlueField OS versions are 3.9.3 and higher.
For information about the deployment of DOCA containers on top of the BlueField DPU, refer to NVIDIA DOCA Container Deployment Guide.
Service-specific configuration steps and deployment instructions can be found under the service's container page.
JSON Input
This file configures which data objects ("events") the service exports. For the service parameters JSON, see the Service Parameters JSON section.
The DOCA OS Inspector configuration file should be placed under /opt/mellanox/doca/services/os_inspector/<json_file_name>.json and be built in the following format:
/*
 * key: lib APSH struct
 * value: true/false - collect & export or not
 */
{
    "processes_info": [true/false],
    "threads_info": [true/false],
    "libs_info": [true/false],
    "vads_info": [true/false],
    "system_modules_info": [true/false],
    "privileges_info": [true/false],
    "processes_envars_info": [true/false]
}
Allowed data objects for export:
"processes_info" – Information about each process that is running on the target system; see the DOCA App Shield API documentation of doca_apsh_process_attr
"threads_info" – Information about all threads that are running on the target system; see the DOCA App Shield API documentation of doca_apsh_thread_attr
"libs_info" – Information about all the libraries of each process that is running on the target system; see the DOCA App Shield API documentation of doca_apsh_lib_attr
"vads_info" – Information about the VADs/VMAs (Windows/Linux) of each process that is running on the target system; see the DOCA App Shield API documentation of doca_apsh_vad_attr
"system_modules_info" – Information about each kernel module that is active in the target system; see the DOCA App Shield API documentation of doca_apsh_module_attr
"privileges_info" – Information about the privileges of each process that is running on the target system; see the DOCA App Shield API documentation of doca_apsh_privilege_attr
"processes_envars_info" – Information about the state of the environment variables of each process that is running on the target system; see the DOCA App Shield API documentation of doca_apsh_envar_attr
All exported data objects that relate to a process contain two fields that can be used to identify the related process: PID (process ID) and COMM (process executable name).
Events Configuration JSON Example:
{
    "processes_info": true,
    "threads_info": true,
    "libs_info": false,
    "vads_info": true,
    "system_modules_info": true,
    "privileges_info": false,
    "processes_envars_info": false
}
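Before copying an events configuration file to the DPU, it can help to check it locally. The following is a minimal sketch, not part of DOCA: check_events_config is a hypothetical helper that only verifies that each key is one of the data objects listed above and that each value is a JSON boolean. Whether the service itself rejects unknown keys or requires every key to be present is not stated in this guide, so the sketch makes no claim about that.

```python
import json

# Keys listed under "Allowed data objects for export" in this guide.
ALLOWED_EVENTS = {
    "processes_info",
    "threads_info",
    "libs_info",
    "vads_info",
    "system_modules_info",
    "privileges_info",
    "processes_envars_info",
}


def check_events_config(text):
    """Return a list of problems found in an events configuration JSON string.

    Hypothetical helper for illustration only; not a DOCA API.
    """
    config = json.loads(text)
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for key, value in config.items():
        if key not in ALLOWED_EVENTS:
            problems.append(f"unknown data object: {key}")
        elif not isinstance(value, bool):
            problems.append(f"{key} must be true or false")
    return problems


example = """
{
    "processes_info": true,
    "threads_info": true,
    "libs_info": false,
    "vads_info": true,
    "system_modules_info": true,
    "privileges_info": false,
    "processes_envars_info": false
}
"""
print(check_events_config(example))  # prints []
```

An empty list means no problems were found by these two checks; it does not guarantee that the service accepts the file.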
The precise fields of each data object may depend on the target OS type, and some of the data objects might be available only for a certain OS type.
Changing the JSON file while the service is running does not change the service's configuration.
String values are currently assumed to be at most 999 bytes long. If a string is longer than 999 bytes, the service exports 998 bytes with the last byte set to "+" to indicate that the value is truncated.
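The truncation rule can be illustrated with a short sketch. This is not the service's code: export_string is a hypothetical function, and it assumes one reading of the rule, namely that the first 998 bytes of the original value are kept and "+" is appended as the 999th byte. The wording could also be read as a 998-byte result whose final byte is "+"; adjust the slice if that is how your deployment behaves.

```python
MAX_LEN = 999  # longest string value the service is assumed to export, in bytes


def export_string(value: bytes) -> bytes:
    """Mimic the documented truncation rule (assumed reading, see the text above)."""
    if len(value) <= MAX_LEN:
        return value
    # Keep the first 998 bytes and append "+" as the truncation marker.
    return value[:MAX_LEN - 1] + b"+"


short = export_string(b"/usr/bin/bash")
long = export_string(b"A" * 1500)
print(len(short), len(long), long[-1:])  # prints 13 999 b'+'
```

A consumer of the exported data can therefore treat a maximum-length value ending in "+" as possibly truncated, although a genuine value may also end in "+".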
Service Parameters JSON
This file configures the general behavior of the service and gives it the location of needed resources.
The DOCA OS Inspector parameters file should be placed at /opt/mellanox/doca/services/os_inspector/os_inspector_params.json and be built in the following format:
{
    "doca_general_flags": {
        // -l - Sets the log level for the service (DEBUG=60, CRITICAL=20)
        "log-level": 60
    },
    "doca_program_flags": {
        // -p - Sets the path to the events configuration file in a JSON format.
        "policy": "/os_inspector/os_inspector_cfg.json",
        "memr": "/os_inspector/mem_regions.json",
        "vuid": "MT2140X05931MLNXS0D0F0",
        "dma": "mlx5_0",
        "osym": "/os_inspector/symbols.json",
        "osty": "linux",
        "time": 20
    }
}
Each JSON key is defined as follows:
doca_general_flags holds the DOCA general runtime arguments received by the service:
"log-level": <value> – Sets the log level <CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60>
doca_program_flags is a JSON object with the service-specific runtime arguments received by the service, which are as follows:
"policy": <path> – Path to the JSON file with the export (events) configuration
"memr": <path> – Path to the system memory regions map
"vuid": <string> – VUID of the system device
"dma": <string> – DMA device name
"osym": <path> – Path to the system OS symbol map
"osty": <windows|linux> – System OS type
"time": <seconds> – Scan time interval in seconds
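The key definitions above can be turned into a quick sanity check for a parameters object. The sketch below is illustrative only: check_params is a hypothetical helper, and it assumes that every program flag shown in the format above is required and that "time" is a positive integer, neither of which this guide states explicitly. It works on an already parsed object, since the format shown above contains comments that a strict JSON parser would reject.

```python
LOG_LEVELS = {20, 30, 40, 50, 60}  # CRITICAL, ERROR, WARNING, INFO, DEBUG
PROGRAM_KEYS = {"policy", "memr", "vuid", "dma", "osym", "osty", "time"}


def check_params(params):
    """Return a list of problems in a parsed service parameters object.

    Hypothetical helper; the "required" and "positive integer" rules are
    assumptions made for this sketch, not documented service behavior.
    """
    problems = []
    general = params.get("doca_general_flags", {})
    program = params.get("doca_program_flags", {})

    if "log-level" in general and general["log-level"] not in LOG_LEVELS:
        problems.append("log-level must be one of 20, 30, 40, 50, 60")
    for key in sorted(PROGRAM_KEYS - set(program)):
        problems.append(f"missing program flag: {key}")
    if program.get("osty") not in (None, "windows", "linux"):
        problems.append("osty must be 'windows' or 'linux'")
    time_value = program.get("time")
    if time_value is not None and (
        isinstance(time_value, bool)
        or not isinstance(time_value, int)
        or time_value <= 0
    ):
        problems.append("time must be a positive number of seconds")
    return problems


params = {
    "doca_general_flags": {"log-level": 60},
    "doca_program_flags": {
        "policy": "/os_inspector/os_inspector_cfg.json",
        "memr": "/os_inspector/mem_regions.json",
        "vuid": "MT2140X05931MLNXS0D0F0",
        "dma": "mlx5_0",
        "osym": "/os_inspector/symbols.json",
        "osty": "linux",
        "time": 20,
    },
}
print(check_params(params))  # prints []
```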
YAML File
The .yaml file downloaded from NGC can be edited according to your needs.
env:
  # Set according to the local setup
  - name: SERVICE_ARGS
    value: /os_inspector/os_inspector_params.json
SERVICE_ARGS is the path to the JSON file with the runtime arguments received by the service, as defined in the Service Parameters JSON section.
Verifying Output
Enabling DTS to write data locally allows debugging the validity of the DOCA OS Inspector output.
To allow DTS to write locally, uncomment the following line in dts_config.ini:
#output=/data
Any change to dts_config.ini requires restarting the pod for the new settings to apply.
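If the edit needs to be scripted, the uncommenting step can be sketched as a small text transformation. enable_local_output is a hypothetical helper, and the sample text below is made up for illustration; it is not the real content of dts_config.ini. The function only touches a line that is exactly #output=/data (ignoring surrounding whitespace) and leaves everything else as is.

```python
def enable_local_output(ini_text: str) -> str:
    """Uncomment the '#output=/data' line, leaving every other line untouched."""
    result = []
    for line in ini_text.splitlines(keepends=True):
        if line.strip() == "#output=/data":
            # Drop only the leading comment marker of this specific setting.
            line = line.replace("#output=/data", "output=/data", 1)
        result.append(line)
    return "".join(result)


# Made-up sample, not the real dts_config.ini.
sample = "# sample settings\n#output=/data\n"
print(enable_local_output(sample))
```

Running the function on text where the line is already uncommented returns the text unchanged, so it is safe to apply more than once. The pod still has to be restarted afterwards.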
The schema folder contains JSON-formatted metadata files which allow reading the binary files containing the actual data. The binary files are written according to the naming convention shown in the following example (the tree utility can be installed with apt install tree):
$ tree /opt/mellanox/doca/services/telemetry/data/
/opt/mellanox/doca/services/telemetry/data/
├── {year}
│   └── {mmdd}
│       └── {hash}
│           ├── {source_id}
│           │   └── {source_tag}{timestamp}.bin
│           └── {another_source_id}
│               └── {another_source_tag}{timestamp}.bin
└── schema
    └── schema_{MD5_digest}.json
New binary files appear when:
The service starts
The binary file's max age/size restriction is reached
The JSON file is changed and new telemetry schemas are created
An hour passes
If no schema or no data folders are present, refer to the Troubleshooting section in the DOCA Telemetry Service Guide.
source_id is usually set to the machine hostname. source_tag is a line describing the collected counters, and it is often set as the provider's name or the name of user-counters.
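To locate the files written for a given source, the directory layout above can be split into its components. This is a minimal sketch under stated assumptions: describe_data_file is a hypothetical helper, the example path (date, hash, hostname, and file name) is made up, and the file name is left unsplit because this guide does not specify how {source_tag} and {timestamp} are separated.

```python
from pathlib import PurePosixPath

DATA_ROOT = PurePosixPath("/opt/mellanox/doca/services/telemetry/data")


def describe_data_file(path):
    """Split a binary file path into the components of the layout shown above.

    Raises ValueError if the path is not under DATA_ROOT or does not have
    exactly the {year}/{mmdd}/{hash}/{source_id}/{file} structure.
    """
    relative = PurePosixPath(path).relative_to(DATA_ROOT)
    year, mmdd, hash_dir, source_id, file_name = relative.parts
    return {
        "year": year,
        "mmdd": mmdd,
        "hash": hash_dir,
        "source_id": source_id,
        "file_name": file_name,  # {source_tag}{timestamp}.bin, left unsplit
    }


# Made-up example path that follows the documented layout.
example_path = (
    "/opt/mellanox/doca/services/telemetry/data/"
    "2024/0131/0a1b2c/dpu-host/os_inspector_1706700000.bin"
)
info = describe_data_file(example_path)
print(info["source_id"])  # prints dpu-host
```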
Reading the binary data can be done from within the DTS container using the following command:
crictl exec -it <Container ID> /opt/mellanox/collectx/bin/clx_read -s /data/schema /data/path/to/datafile.bin
The data written locally should be shown in a JSON format. You should expect a large output.
For general troubleshooting, refer to DOCA Troubleshooting.
In addition to the troubleshooting section of the NVIDIA DOCA Container Deployment Guide, the following tip applies when running this service alongside DTS:
When running both containers, you must first run DOCA Telemetry Service, wait a few seconds, and then run DOCA OS Inspector.