DOCA Management Service Guide
This guide provides instructions on how to use the DOCA Management Service on top of NVIDIA® BlueField® networking platform or NVIDIA® ConnectX® SmartNICs.
DMS is currently supported at alpha level.
DOCA Management Service (DMS) is a one-stop shop for the user to configure and operate NVIDIA BlueField and ConnectX devices. DMS governs all scripts/tools of NVIDIA with an easy and industry-standard API created by the OpenConfig community. The user can configure BlueField or ConnectX for any mode, whether locally (ssh) or remotely (grpc). This makes it easy to migrate and bootstrap any customer for any NVIDIA network device.
DMS exposes configurable BlueField/ConnectX parameters over the external interface to support a management station in an automated configuration of the NVIDIA Network Adapters. The exposed interface presents a uniform approach for BF/CX device configuration and keeps hidden details about the internal tools used for the configuration of BlueField or ConnectX features.
DMS uses a client-server architecture. A daemon handles the discovery of resources and waits to receive commands from clients. The user can use DMSc (DMS Client), which is delivered as part of DMS, or use or create any other client.
Please refer to the OpenConfig site for an explanation of the OpenConfig protocol.
The Yang models describe a config tree that is easy to navigate, and any "config leaf" can be found using XPath capabilities. Most gNMI/gNOI protocols are shared with the OpenConfig community and use the gRPC protocol to transfer commands.
The DOCA Yang model is experimental.
The gNMI Subscribe mechanism for streaming telemetry is not currently supported.
DMS can run either on the host machine where BlueField or ConnectX devices are installed or on BlueField Arm itself (when BlueField is operating in DPU mode).
DMS requires DOCA to be installed on the target system, where DMS Service will be running:
DMS for Host - requires DOCA for Host package to be installed on the host system (with doca-networking or doca-all profiles).
DMS for DPU (BlueField Arm) - requires DOCA Image to be installed on BlueField Arm.
Please follow these instructions to install DOCA: DOCA Installation Guide for Linux.
DMS supports only Linux-based environments today.
DMS has 3 major components:
DMSD – Server – DMS server inside the BlueField or on the host with an NVIDIA PCIe device
DMSC – Client – DOCA provides an OpenConfig client. Customers can choose to use this client, any other open-source client, or develop their own (gRPC-based) client.
Yang files – Yang model files contain the data model used to configure the BlueField device, including NVIDIA-specific extensions to common OpenConfig YANG models.
OpenConfig consists of 2 main protocols:
gNMI – gRPC Network Management Interface, a protocol to configure a network device.
gNOI – gRPC Network Operations Interface, a protocol to perform operational commands on a network device (e.g., provision, upgrade, reboot).
The following is an architectural diagram of DMS:
The following diagram presents the DMS mode of operation, as the DMS client can operate from anywhere:
Both DMS client and server components are deployed on the Host
Both DMS client and server components are deployed on DPU (BlueField Arm)
DMS server component is deployed on the Host, while DMS client is deployed remotely (connecting to DMS server over management network)
DMS server component is deployed on DPU (BlueField Arm), while DMS client is deployed remotely (connecting to DMS server over management network)
DMSD is a systemd service installed on the DPU by default with the BFB-Bundle and can be enabled or disabled using systemctl. DMSD can be accessed using the dmscli command by providing the dmsd user password (default is the root OS password). A systemd template is provided in the host packages.
To see the full list of flags, use the help flag (i.e., dmsd -help or dmsd -h).
General Flags
-bind_address <string> – Bind to <address>:<port> or just :<port> (default is :9339). Can be localhost for a local use case, or an IP address for a remote use case.
-v <value> – Log level for V logs
-target_pci <string> – The target PCIe address (e.g., 03:00). Auto-selected if only one NVIDIA network device is present; otherwise, the PCIe address must be specified.
Provisioning Flags
-target_pci <string> – The target PCIe address (e.g., 03:00). Auto-selected if only one NVIDIA network device is present; otherwise, the PCIe address must be specified.
-image_folder <string> – Specifies the image install folder. Images can be copied directly to the folder to avoid transfer over the network. The default folder created is /tmp/dms.
-chunk_size_ack <uint> – The chunk size of the image, in bytes, after which a transfer response is sent (default: 12000000)
-exec_timeout <uint> – The maximum execution timeout, in seconds, for a command that is not responding (not printing to stdout); 0 (default) is unlimited
Config Flags
-init_config <string> – File containing gNMI requests to run on DMS start. By default, DMS adds any gNMI set request to a file (file format is req.json).
-image_folder <string> – Do not record the gNMI set requests while running; do not change the init config file.
Security Flags
-auth string – this flag has 3 options:
Shadow
Zero-touch; the admin is not required to create a dedicated additional user for DMS (the OS user is reused)
The hashed password is read in real time on each client request
Use flags -username and -shadow
Example: -username root -shadow /etc/shadow
To disable: -noauth flag
Credentials
Admin must set a strong password
Use flags -username and -password
Example: -username root -password 123456
To disable: -noauth flag
The password flag can be left empty to invoke a password prompt at daemon startup
Certificate File
The most secure option, based on (m)TLS
Example: -ca /tmp/ca.crt -ca_key /tmp/ca.key
To disable: -notls option
DMS Traffic Encryption
The connection is secured by transport layer security (TLS) by default.
TLS is a protocol that provides encryption and authentication for network communication.
DMS supports TLS for both gNMI and gNOI interfaces, using self-signed or CA-signed certificates.
For secure communication, the server and client need to provide the following:
-ca – path to the CA certificate
-tls_key_file – path to the TLS private key
-tls_cert_file – path to the TLS certificate (public key)
To achieve TLS communication (server authentication only), configure:
Server – dmsd -ca /tmp/ca.crt -tls_key_file /tmp/target.key -tls_cert_file /tmp/target.crt
Client – use the --skip-verify flag to skip client authentication
To achieve mTLS communication (server and client authentication), configure:
Server – dmsd -ca /tmp/ca.crt -tls_key_file /tmp/target.key -tls_cert_file /tmp/target.crt
Client – dmsc --tls-ca /tmp/ca.crt --tls-key /tmp/client.key --tls-cert /tmp/client.crt
To achieve insecure communication (no encryption), configure:
Server – use the -tls_enabled=false parameter
Client – use the --insecure flag
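The three transport configurations above can be summarized as a small lookup that returns the argument list for each side. This is a minimal sketch, not part of DMS: the function names and mode labels are hypothetical, the flags are the ones listed above, and the server certificate path /tmp/target.crt is an assumption made for illustration.

```python
def dmsd_args(mode, ca="/tmp/ca.crt", key="/tmp/target.key", cert="/tmp/target.crt"):
    """Return the dmsd argument list for a transport mode (hypothetical helper)."""
    if mode == "insecure":
        return ["dmsd", "-tls_enabled=false"]
    if mode in ("tls", "mtls"):
        # The server side is configured identically for TLS and mTLS.
        return ["dmsd", "-ca", ca, "-tls_key_file", key, "-tls_cert_file", cert]
    raise ValueError(f"unknown mode: {mode}")


def dmsc_args(mode, ca="/tmp/ca.crt", key="/tmp/client.key", cert="/tmp/client.crt"):
    """Return the dmsc argument list for a transport mode (hypothetical helper)."""
    if mode == "insecure":
        return ["dmsc", "--insecure"]
    if mode == "tls":
        return ["dmsc", "--skip-verify"]
    if mode == "mtls":
        return ["dmsc", "--tls-ca", ca, "--tls-key", key, "--tls-cert", cert]
    raise ValueError(f"unknown mode: {mode}")


print(" ".join(dmsc_args("mtls")))
```

Returning a list rather than a single string lets the result be passed directly to subprocess.run without shell quoting.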
gNMI Command
In DMSC, the gNMI part is powered by the GNMIC project.
For more information, please refer to GNMIC documentation.
dmsc -a localhost:9339 -u root -p <password> --file /opt/mellanox/doca/service/dms/yang <command>
Prompt mode with autocomplete options can be invoked using the prompt command. It can be accessed using the dmscli command by providing the dmsd user password (default is the root OS password).
Get Supported Paths
dmsc --file /opt/mellanox/doca/service/dms/yang path --types --descr
/interfaces/interface[name=*]/config/enabled (type=boolean)
This leaf contains the configured, desired state of the interface.
Systems that implement the IF-MIB use the value of this leaf in the 'running' datastore to set IF-MIB.ifAdminStatus to 'up' or 'down' after an ifEntry has been initialized, as described in RFC 2863.
Changes in this leaf in the 'running' datastore are reflected in ifAdminStatus, but if ifAdminStatus is changed over SNMP, this leaf is not affected.
/interfaces/interface[name=*]/config/mtu (type=uint16)
Set the max transmission unit size in octets for the physical interface. If this is not set, the mtu is set to the operational default -- e.g., 1514 bytes on an Ethernet interface.
/interfaces/interface[name=*]/config/type (type=identityref)
The type of the interface.
When an interface entry is created, a server MAY initialize the type leaf with a valid value, e.g., if it is possible to derive the type from the name of the interface.
If a client tries to set the type of an interface to a value that can never be used by the system, e.g., if the type is not supported or if the type does not match the name of the interface, the server MUST reject the request. A NETCONF server MUST reply with an rpc-error with the error-tag 'invalid-value' in this case.
/interfaces/interface[name=*]/ethernet/nvidia/config/inter-packet-gap (type=uint8)
Inter packet gap configuration, in 4B unit
/interfaces/interface[name=*]/ethernet/nvidia/config/rate-limit (type=uint16)
The percentage of bandwidth, in permile units, to be used on the port.
/interfaces/interface[name=*]/name (type=leafref)
References the name of the interface
/interfaces/interface[name=*]/nvidia/cc/config/priority[id=*]/id (type=leafref)
/interfaces/interface[name=*]/nvidia/cc/config/priority[id=*]/np_enabled (type=boolean)
Enable CC NP for a given priority on the interface
/interfaces/interface[name=*]/nvidia/cc/config/priority[id=*]/rp_enabled (type=boolean)
Enable CC RP for a given priority on the interface
/interfaces/interface[name=*]/nvidia/cc/slot[id=*]/config/enabled (type=boolean)
Enable a CC algo slot execution.
/interfaces/interface[name=*]/nvidia/cc/slot[id=*]/id (type=leafref)
CC algo slot ID.
/interfaces/interface[name=*]/nvidia/cc/slot[id=*]/param[id=*]/config/value (type=algo_param_value)
Parameter value within the CC algo slot.
/interfaces/interface[name=*]/nvidia/cc/slot[id=*]/param[id=*]/id (type=leafref)
Parameter ID within the CC algo slot.
/interfaces/interface[name=*]/nvidia/qos/config/pfc (type=boolean)
Enables PFC
/interfaces/interface[name=*]/nvidia/qos/config/priority[id=*]/id (type=prio)
Priority id.
/interfaces/interface[name=*]/nvidia/qos/config/trust-mode (type=identityref)
Trust mode for the interface QoS.
/interfaces/interface[name=*]/nvidia/roce/config/adaptive-retransmission (type=boolean)
Enable adaptive retransmission
/interfaces/interface[name=*]/nvidia/roce/config/adaptive-routing-force (type=boolean)
Force adaptive routing even if feature was not negotiated between a requestor and responder.
/interfaces/interface[name=*]/nvidia/roce/config/rtt-resp-dscp (type=uint8)
Defines the DSCP fixed value used if mode is set to FIXED.
/interfaces/interface[name=*]/nvidia/roce/config/rtt-resp-dscp-mode (type=identityref)
Defines the method for setting DSCP in RTT response packets.
/interfaces/interface[name=*]/nvidia/roce/config/slow-restart (type=boolean)
Enable slow restart when congestion
/interfaces/interface[name=*]/nvidia/roce/config/slow-restart-idle (type=boolean)
Enable slow restart when idle
/interfaces/interface[name=*]/nvidia/roce/config/tos (type=tos)
ToS value for RoCE traffic.
/interfaces/interface[name=*]/nvidia/roce/config/tx-window (type=boolean)
Enable transmission window
/nvidia/cc/config/user-programmable (type=boolean)
Enables user-programmable CC functionality.
/nvidia/mode/config/mode (type=identityref)
Mode can take one one of several predefined values representing operational modes of DPU.
/nvidia/roce/config/adaptive-routing (type=boolean)
Enable adaptive routing between a requestor and responder.
/nvidia/roce/config/multipath-dscp (type=identityref)
Multipath on transmit, set the DSCP bit to hold the MP eligible info
/nvidia/roce/config/tx-sched-locality-mode (type=identityref)
Transmission scheduler adaptation to locality
The following is a list of values for identityref type paths:
/interfaces/interface[name=*]/config/type (type=identityref)
"infiniband", "ethernetCsmacd"
/interfaces/interface[name=*]/nvidia/qos/config/trust-mode (type=identityref)
"QOS_TRUST_MODE_PORT", "QOS_TRUST_MODE_PCP", "QOS_TRUST_MODE_DSCP"
/interfaces/interface[name=*]/nvidia/roce/config/rtt-resp-dscp-mode (type=identityref)
"RTT_RESP_DSCP_DEFAULT", "RTT_RESP_DSCP_FIXED", "RTT_RESP_DSCP_RTT_REQUEST"
/nvidia/mode/config/mode (type=identityref)
"SEPARATE", "DPU", "NIC"
/nvidia/roce/config/multipath-dscp (type=identityref)
"MULTIPATH_DSCP_DISABLED", "MULTIPATH_DSCP_0", "MULTIPATH_DSCP_1", "MULTIPATH_DSCP_2"
/nvidia/roce/config/tx-sched-locality-mode (type=identityref)
"TX_SCHED_LOCALITY_DEFAULT", "TX_SCHED_LOCALITY_STATIC", "TX_SCHED_LOCALITY_ACCUMULATIVE", "TX_SCHED_LOCALITY_DISABLED"
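A client may want to check an identityref value locally before sending it. The following is a minimal sketch of such a check, using only the values listed above; the table and function names are hypothetical, and DMS itself remains the authority on what it accepts.

```python
import re

# Allowed values per identityref path, copied from the list above.
# Keys are written without list keys such as [name=*].
IDENTITYREF_VALUES = {
    "/interfaces/interface/config/type": {"infiniband", "ethernetCsmacd"},
    "/interfaces/interface/nvidia/qos/config/trust-mode": {
        "QOS_TRUST_MODE_PORT", "QOS_TRUST_MODE_PCP", "QOS_TRUST_MODE_DSCP"},
    "/interfaces/interface/nvidia/roce/config/rtt-resp-dscp-mode": {
        "RTT_RESP_DSCP_DEFAULT", "RTT_RESP_DSCP_FIXED", "RTT_RESP_DSCP_RTT_REQUEST"},
    "/nvidia/mode/config/mode": {"SEPARATE", "DPU", "NIC"},
    "/nvidia/roce/config/multipath-dscp": {
        "MULTIPATH_DSCP_DISABLED", "MULTIPATH_DSCP_0",
        "MULTIPATH_DSCP_1", "MULTIPATH_DSCP_2"},
    "/nvidia/roce/config/tx-sched-locality-mode": {
        "TX_SCHED_LOCALITY_DEFAULT", "TX_SCHED_LOCALITY_STATIC",
        "TX_SCHED_LOCALITY_ACCUMULATIVE", "TX_SCHED_LOCALITY_DISABLED"},
}


def strip_keys(path):
    """Remove list keys such as [name=p0] so the path matches the table."""
    return re.sub(r"\[[^\]]*\]", "", path)


def is_listed_value(path, value):
    """Return True if value is listed for the identityref path."""
    allowed = IDENTITYREF_VALUES.get(strip_keys(path))
    if allowed is None:
        raise KeyError(f"not a listed identityref path: {path}")
    return value in allowed


print(is_listed_value("/interfaces/interface[name=p0]/nvidia/qos/config/trust-mode",
                      "QOS_TRUST_MODE_DSCP"))
```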
Get Request
Get requests are served in real time, without a cache. The get command requires the Yang XPath, as in the following:
dmsc <flags> get --path /interfaces/interface[name=p0]/config/mtu
[
  {
    "source": "localhost:9339",
    "timestamp": 1712485149723248511,
    "time": "2024-04-07T10:19:09.723248511Z",
    "updates": [
      {
        "Path": "interfaces/interface[name=p0]/config/mtu",
        "values": {
          "interfaces/interface/config/mtu": "1500"
        }
      }
    ]
  }
]
Parameters are inserted in the path within square brackets, as with the interface name (p0) in [name=p0].
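A script that consumes the get output usually needs only the leaf value. The following is a minimal sketch that parses the response shown above with the standard json module; the function name is hypothetical, and the response layout is assumed to match the example exactly.

```python
import json

# Response text copied from the get example above.
SAMPLE_RESPONSE = """
[
  {
    "source": "localhost:9339",
    "timestamp": 1712485149723248511,
    "time": "2024-04-07T10:19:09.723248511Z",
    "updates": [
      {
        "Path": "interfaces/interface[name=p0]/config/mtu",
        "values": {"interfaces/interface/config/mtu": "1500"}
      }
    ]
  }
]
"""


def leaf_values(response_text):
    """Map each update's keyed Path to the single value reported for it."""
    result = {}
    for notification in json.loads(response_text):
        for update in notification.get("updates", []):
            values = list(update["values"].values())
            # A single-leaf request carries exactly one value per update.
            if len(values) == 1:
                result[update["Path"]] = values[0]
    return result


# The value arrives as a string in the example, so it is converted here.
mtu = int(leaf_values(SAMPLE_RESPONSE)["interfaces/interface[name=p0]/config/mtu"])
print(mtu)
```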
Get requests also work on a subtree (subPath), as follows:
dmsc <flags> get --path /nvidia/roce
[
  {
    "source": "127.0.0.1:9339",
    "timestamp": 1728471432988295603,
    "time": "2024-10-09T13:57:12.988295603+03:00",
    "updates": [
      {
        "Path": "nvidia/roce",
        "values": {
          "nvidia/roce": {
            "config": {
              "adaptive-routing": "false",
              "multipath-dscp": "MULTIPATH_DSCP_DEFAULT",
              "tx-sched-locality-mode": "TX_SCHED_LOCALITY_ACCUMULATIVE"
            }
          }
        }
      }
    ]
  }
]
Failing to provide a mandatory param for decoding a leaf leads to that leaf being skipped. The entire request fails when the first leaf fails.
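A subtree response nests its leaves under the requested path. The following is a minimal sketch that flattens the response from the subtree example above into one entry per leaf; the function names are hypothetical, and the layout is assumed to match the example.

```python
import json

# Response text copied from the subtree get example above.
SUBTREE_RESPONSE = """
[
  {
    "source": "127.0.0.1:9339",
    "timestamp": 1728471432988295603,
    "time": "2024-10-09T13:57:12.988295603+03:00",
    "updates": [
      {
        "Path": "nvidia/roce",
        "values": {
          "nvidia/roce": {
            "config": {
              "adaptive-routing": "false",
              "multipath-dscp": "MULTIPATH_DSCP_DEFAULT",
              "tx-sched-locality-mode": "TX_SCHED_LOCALITY_ACCUMULATIVE"
            }
          }
        }
      }
    ]
  }
]
"""


def flatten(prefix, node):
    """Recursively turn nested dicts into {"a/b/c": leaf_value} entries."""
    if isinstance(node, dict):
        leaves = {}
        for key, child in node.items():
            leaves.update(flatten(f"{prefix}/{key}", child))
        return leaves
    return {prefix: node}


def subtree_leaves(response_text):
    """Collect every leaf found in all updates of a get response."""
    leaves = {}
    for notification in json.loads(response_text):
        for update in notification.get("updates", []):
            for path, subtree in update["values"].items():
                leaves.update(flatten(path, subtree))
    return leaves


for path, value in sorted(subtree_leaves(SUBTREE_RESPONSE).items()):
    print(path, "=", value)
```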
Set Request
Set requests happen immediately, invoking tools to configure the OS.
Set commands require the Yang XPath, as in the following:
dmsc <flags> set --update /interfaces/interface[name=p0]/config/mtu:::int:::9216
{
  "source": "localhost:9339",
  "time": "1970-01-01T00:00:00Z",
  "results": [
    {
      "operation": "UPDATE",
      "path": "interfaces/interface[name=p0]/config/mtu"
    }
  ]
}
Parameters are inserted in the path within square brackets, as with the interface name (p0) in [name=p0].
The path, value type, and value must be separated by the ::: delimiter, as in <path>:::<type>:::<value>.
Currently, only the --update flag is supported in set.
The updates of some leaves only take effect after a system reboot. Refer to gNOI system reboot for information.
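The set example above passes the path, the value type, and the value as one argument joined by ":::". The following is a minimal sketch that assembles that argument and the surrounding command; the helper names are hypothetical, and only the --update form shown above is used.

```python
DELIMITER = ":::"


def update_arg(path, value_type, value):
    """Join path, type, and value in the <path>:::<type>:::<value> form."""
    if DELIMITER in path or DELIMITER in value_type:
        raise ValueError("path and type must not contain the delimiter")
    return DELIMITER.join([path, value_type, str(value)])


def set_command(flags, path, value_type, value):
    """Build the dmsc argument list for a single --update set request."""
    return ["dmsc", *flags, "set", "--update", update_arg(path, value_type, value)]


cmd = set_command(["-a", "localhost:9339"],
                  "/interfaces/interface[name=p0]/config/mtu", "int", 9216)
print(" ".join(cmd))
```

Passing the list to subprocess.run avoids shell interpretation of the square brackets in the path.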
It is also possible to invoke a list of commands from a JSON file:
dmsc <flags> set --request-file req.json
req.json example:
{
  "updates": [
    {
      "path": "/interfaces/interface[name=p0]/config/mtu",
      "value": 9216,
      "encoding": "uint"
    },
    {
      "path": "/interfaces/interface[name=p0]/config/enabled",
      "value": true,
      "encoding": "bool"
    }
  ]
}
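A request file like the req.json example above can be generated rather than written by hand. The following is a minimal sketch that covers only the two encodings shown in the example ("uint" and "bool"); the function names are hypothetical, and other encodings are deliberately rejected rather than guessed.

```python
import json


def make_update(path, value):
    """Build one entry of the "updates" list for a request file."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        encoding = "bool"
    elif isinstance(value, int) and value >= 0:
        encoding = "uint"
    else:
        raise TypeError(f"no encoding shown in the example for {value!r}")
    return {"path": path, "value": value, "encoding": encoding}


def build_request(updates):
    """Return the request file content for (path, value) pairs."""
    return json.dumps({"updates": [make_update(p, v) for p, v in updates]}, indent=2)


request_text = build_request([
    ("/interfaces/interface[name=p0]/config/mtu", 9216),
    ("/interfaces/interface[name=p0]/config/enabled", True),
])
print(request_text)
```

The printed text can be written to a file and passed to the --request-file flag.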
gNOI Commands
In DMSc, the gNOI part is powered by the GNOIC project. For full documentation, refer to the GNOIC docs.
dmsc -a localhost --port 9339 --tls-cert client.crt --tls-key client.key <command>
Prompt mode with autocomplete options can be invoked using the prompt command.
All commands are blocking unless specified otherwise.
Currently, gNOI commands are only supported on the host (not the BlueField).
OS
The following subsections present actions for provisioning a new DOCA image (BFB) or firmware on BlueField.
Install
This command transmits the file from the client to the server and authenticates the file's validity:
dmsc <flags> os install --version <free_text_version> --pkg <bfb|cfg|fw path>
dmsc <flags> os install --version 2_9_0 --pkg DOCA_2.9.0_Ubuntu.bfb
dmsc <flags> os install --version 2_9_0 --pkg config.cfg
dmsc <flags> os install --version 1_3_5_custom.bfb --pkg custom.bfb
The file is saved to the folder specified by the -image_folder flag (default /tmp/dms) if the file authenticates successfully. The file's extension is autodetected and appended automatically if none is provided in the --version field. Users may copy the file to the folder manually and invoke the command with the file extension to authenticate the file. No file transfer is initiated if the file already exists in the folder under the specified version and extension.
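The naming rule above can be illustrated with a short sketch that predicts the file name under which a package would be stored. It rests on assumptions: the extension is taken from the package file name (the text says only that it is autodetected), the extension set is limited to the three package kinds in the usage line, and the function name is hypothetical.

```python
# Package kinds named in the install usage line: bfb, cfg, fw.
KNOWN_EXTENSIONS = (".bfb", ".cfg", ".fw")


def stored_name(version, pkg_path):
    """Predict the stored file name for an install request (assumed rule)."""
    # A version that already carries an extension is used as-is.
    if version.endswith(KNOWN_EXTENSIONS):
        return version
    # Otherwise, assume the extension follows the package file name.
    for ext in KNOWN_EXTENSIONS:
        if pkg_path.endswith(ext):
            return version + ext
    raise ValueError(f"cannot determine extension for {pkg_path}")


print(stored_name("2_9_0", "DOCA_2.9.0_Ubuntu.bfb"))
print(stored_name("1_3_5_custom.bfb", "custom.bfb"))
```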
Activate
The activate command deploys the BFB bundle/firmware to the hardware:
dmsc <flags> os activate --version 2_9_0 # Invoke all files under 2_9_0 name
dmsc <flags> os activate --version "2_9_0.bfb;0_0_1.cfg;24_29_0046.fw"
The --version flag provides a version to search for in the folder specified by the -image_folder flag (default /tmp/dms). If no extension is provided, the command uses all files under the version name.
To activate separate files, provide them to the --version flag separated by semicolons.
After running the command to activate firmware, firmware reset is automatically invoked.
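When several files are activated together, the --version value is a single semicolon-separated string. The following is a minimal sketch that builds that string and the command around it; the function name is hypothetical and the file names are the ones from the example above.

```python
def activate_command(flags, versions):
    """Build the dmsc argument list for activating one or more versions."""
    if not versions:
        raise ValueError("at least one version is required")
    for version in versions:
        if ";" in version:
            raise ValueError(f"version must not contain ';': {version}")
    # Separate files are passed as one --version value joined by semicolons.
    return ["dmsc", *flags, "os", "activate", "--version", ";".join(versions)]


cmd = activate_command([], ["2_9_0.bfb", "0_0_1.cfg", "24_29_0046.fw"])
print(cmd[-1])
```

Because the arguments are kept as a list, the semicolons need no shell quoting when the list is passed to subprocess.run.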
Verify
The verify command retrieves the firmware and BFB bundle versions:
dmsc <flags> os verify
The return value consists of both versions separated by semi-colon.
Currently, the BFB bundle can only be retrieved if it was installed via DMS.
System
The following subsections provide actions for rebooting the BFB bundle/firmware on the BlueField.
Reboot Status
To check whether the BlueField is rebooting:
dmsc <flags> system reboot-status
The value returned is false if the system is active and true if the system is rebooting. If the status cannot be retrieved, the status appears as a failure and the message field indicates the issue.
The --reboot_status_check <string> flag checks whether a firmware reboot is needed:
If set to fast (default), a quick but less accurate test occurs (any config can trigger this flag)
If set to strict, a more accurate but slower test occurs
If set to none, the firmware check is skipped
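A provisioning script may need to wait until reboot-status reports that the system is active again. The following is a minimal polling sketch. The is_rebooting callable is hypothetical: it stands for whatever code runs dmsc <flags> system reboot-status and converts the result to a boolean, since the exact output format is not described here. The timeout and interval values are arbitrary.

```python
import time


def wait_until_active(is_rebooting, timeout_s=600.0, interval_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until is_rebooting() returns False; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if not is_rebooting():
            return True  # status is false: the system is active
        if clock() >= deadline:
            return False  # still rebooting when the timeout expired
        sleep(interval_s)


# Usage with a stand-in status source that reports two reboots, then active.
states = iter([True, True, False])
print(wait_until_active(lambda: next(states), sleep=lambda _seconds: None))
```

The sleep and clock parameters exist only so the loop can be exercised without real delays.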
Reboot
To reboot the BlueField Arm and firmware:
dmsc <flags> system reboot --delay <uint>s --subcomponent <string> --method <string>
This command is non-blocking and returns immediately.
The --delay flag specifies the time interval to wait before invoking the reset.
The subcomponent and method are optional. By default, the reboot executes with the lowest reset level and type available.
Currently, DMS supports the --subcomponent ARM --method <WARM|POWERDOWN> flags.
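The reboot options above can be validated before the command is issued. The following is a minimal sketch that accepts only the subcomponent and methods listed as currently supported; the function name is hypothetical, and the delay is treated as a required whole number of seconds, which is an assumption based on the <uint>s form in the usage line.

```python
SUPPORTED_SUBCOMPONENTS = {"ARM"}
SUPPORTED_METHODS = {"WARM", "POWERDOWN"}


def reboot_command(flags, delay_s, subcomponent=None, method=None):
    """Build the dmsc argument list for a system reboot request."""
    if not isinstance(delay_s, int) or delay_s < 0:
        raise ValueError("delay must be a non-negative whole number of seconds")
    cmd = ["dmsc", *flags, "system", "reboot", "--delay", f"{delay_s}s"]
    if subcomponent is not None:
        if subcomponent not in SUPPORTED_SUBCOMPONENTS:
            raise ValueError(f"unsupported subcomponent: {subcomponent}")
        cmd += ["--subcomponent", subcomponent]
    if method is not None:
        if method not in SUPPORTED_METHODS:
            raise ValueError(f"unsupported method: {method}")
        cmd += ["--method", method]
    return cmd


print(" ".join(reboot_command(["-a", "localhost:9339"], 10, "ARM", "WARM")))
```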