NVIDIA DOCA PSP Gateway Application Guide
This document describes the usage of the NVIDIA DOCA PSP Gateway sample application on top of an NVIDIA® BlueField® networking platform or NVIDIA® ConnectX® SmartNIC.
DOCA PSP Gateway is supported at alpha level.
The DOCA PSP Gateway application leverages the BlueField or ConnectX hardware capability for fully offloaded secure network communication using the PSP security protocol. The application demonstrates how to exchange keys between application instances and insert rules controlling PSP encryption and decryption using the DOCA Flow library.
The application exchanges keys using an unencrypted gRPC channel. If your environment requires the protection of encryption keys, you must modify the application to create the gRPC channel using the applicable certificates.
The PSP Gateway application supports only the PSP tunnel protocol. The PSP transport protocol is not supported by the application in this release, although it is supported by the underlying DOCA Flow library.
The PSP Gateway application supports only IPv4 inner and IPv6 outer headers. Other combinations are not supported by the application in the current release, although they are supported by the underlying DOCA Flow library.
The application can be configured to establish outbound PSP tunnel connections through individual command-line arguments, or through a text file whose path is given on the command line. The connections are established on demand by default, but can also be configured to connect at startup.
The DOCA PSP Gateway is designed to run with three ports:
A secure (encrypted) uplink netdev (i.e., p0)
An unsecured (plaintext) netdev representor (VF or SF)
An out-of-band (OOB) management port, used to communicate with peer instances using standard sockets
Whether the DOCA PSP Gateway is deployed to a BlueField DPU or a ConnectX NIC, the functionality is the same. The out-of-band (OOB) network device carries PSP parameters between peers, the uplink port carries secure (encrypted) traffic, and the VF carries the unencrypted traffic.
When the application is deployed to a DPU, the operation of the PSP encryption protocol is entirely transparent to the Host. All the resources required to manage the PSP connections are physically located on the DPU.
When the application is deployed to the host, the operation of the PSP encryption protocol is the responsibility of the host, and resources are allocated from the host. However, the operation of the PSP encryption protocol is entirely transparent to any virtual machines and containers attached to the VF network devices.
The creation of PSP tunnel connections requires two-way communication between peers. Each "sender" must request a unique security parameters index (SPI) and encryption key from the intended "receiver". The receiver derives sequential SPIs and encryption keys using the hardware resources inside the BlueField or ConnectX device, which manages a secret pair of master keys to produce the SPIs and encryption keys.
One key architectural benefit of PSP over similar protocols (e.g., IPsec) is that the receiver does not incur any additional resource utilization whenever it creates a new SPI and encryption key. This is because the decryption key associated with the SPI is computed on the fly, based on the SPI and master key, for each received packet. This lack of requirement for additional context memory for each additional decryption rule is partly responsible for the ability of the PSP protocol to scale to many thousands of peers.
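The stateless property described above can be illustrated with a small sketch. It is not the PSP key derivation: the real derivation is performed inside the NIC with a cryptographic function, whereas the hypothetical toy_prf below is a non-cryptographic mixing function used only to make the example self-contained. The ToyReceiver type and its members are likewise illustrative names, not part of the application.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a keyed pseudo-random function (splitmix64-style mixing).
// NOT cryptographically secure and NOT the derivation used by PSP hardware.
inline uint64_t toy_prf(uint64_t master_key, uint32_t spi)
{
    uint64_t z = master_key ^ (0x9E3779B97F4A7C15ULL * (static_cast<uint64_t>(spi) + 1));
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical receiver: its only state is the master key and the next SPI.
struct ToyReceiver {
    uint64_t master_key;
    uint32_t next_spi;

    explicit ToyReceiver(uint64_t mk) : master_key(mk), next_spi(1) {}

    // Hand out a fresh (SPI, key) pair to a sender. Nothing is stored per SPI.
    uint32_t issue_spi(uint64_t &key_out)
    {
        const uint32_t spi = next_spi++;
        key_out = toy_prf(master_key, spi);
        return spi;
    }

    // On packet receipt, recompute the key from the SPI carried in the header.
    uint64_t key_for_packet(uint32_t spi) const { return toy_prf(master_key, spi); }
};
```

Because key_for_packet() recomputes the key from the SPI alone, issuing thousands of SPIs does not grow the receiver's state, which is the scaling property the paragraph above attributes to PSP.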
Startup vs. On-Demand Tunnel Creation
The default mode of operation is on-demand tunnel creation. That is, when a packet is received from the unsecured port for which the flow pipeline does not have an encryption rule, the packet misses to RSS, where the CPU must decide how to handle the packet. If the destination IP address in the packet belongs to a known peer’s virtual network, the CPU uses gRPC on the OOB network connection to attempt a key exchange with the peer. If the key exchange succeeds and a new encryption flow is created, the packet is resubmitted to the pipeline, where it is encrypted and sent just like any subsequent packet with the same destination IP address.
The following diagram illustrates this sequence (the "Slow Path"), for Virtual Machine V1 which intends to send a packet to Virtual Machine V2. In this case, V1 is hosted on physical host H1 and V2 on physical host H2. The first packet sent (1) results in a miss (2), so the packet is retained (3) while the keys are exchanged in both directions (4-8). Then the pipeline is updated (9) and the original packet is resubmitted (10). From there, the packet follows the same logic as the fast path, below.
Once the tunnel is established, packets received from the VF (1) match a rule (2) and are encrypted and sent (3-4) without any intervention from the CPU ("Fast Path").
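The slow-path and fast-path decisions described above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the application's code: ToyGateway, Path, and the exchange_keys callback (standing in for the gRPC key exchange) are hypothetical names, and reporting an unknown destination as Dropped is an assumption, since the text does not say what happens in that case.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>

enum class Path { Fast, Slow, Dropped };

struct ToyGateway {
    std::map<std::string, std::string> virt_to_svc;         // known peers: virtual IP -> OOB address
    std::set<std::string> encrypt_rules;                    // destinations that already have an egress rule
    std::function<bool(const std::string &)> exchange_keys; // hypothetical stand-in for the gRPC exchange

    Path send(const std::string &dst_vip)
    {
        if (encrypt_rules.count(dst_vip))
            return Path::Fast;                               // rule hit: no CPU involvement
        const auto peer = virt_to_svc.find(dst_vip);         // miss: the CPU decides
        if (peer == virt_to_svc.end() || !exchange_keys(peer->second))
            return Path::Dropped;                            // assumption: unknown or unreachable peer
        encrypt_rules.insert(dst_vip);                       // install the new encryption flow
        return Path::Slow;                                   // packet is resubmitted and encrypted
    }
};
```

Only the first packet to a given destination takes the slow path; later calls to send() with the same destination return Path::Fast without invoking the callback.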
In the case of on-startup tunnel creation, the application's main thread repeatedly attempts to perform the key exchange for each of the peers specified on the command line until the list is completed. Each peer is connected only once and, if a connection to one peer fails, the loop continues onto the next peer and retries the failed connection after all the others have been attempted.
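The retry order described above (attempt every peer once, then come back to the failures) might be sketched as follows. The function connect_all, the try_connect callback, and the max_passes bound are assumptions introduced for illustration; the application itself keeps retrying from its main loop rather than stopping after a fixed number of passes.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Try every pending peer once per pass. Peers that fail are re-queued and
// retried only after all other peers have been attempted.
// Returns the peers in the order in which they were successfully connected.
std::vector<std::string> connect_all(std::deque<std::string> pending,
                                     const std::function<bool(const std::string &)> &try_connect,
                                     unsigned max_passes)
{
    std::vector<std::string> connected;
    for (unsigned pass = 0; pass < max_passes && !pending.empty(); ++pass) {
        std::deque<std::string> failed;
        for (const std::string &peer : pending) {
            if (try_connect(peer))
                connected.push_back(peer); // each peer is connected only once
            else
                failed.push_back(peer);    // retried on the next pass
        }
        pending.swap(failed);
    }
    return connected;
}
```

With peers A, B, and C, where B fails on its first attempt, the connection order is A, C, B.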
Sampling
The PSP gateway application supports the sample-at-receiver (S) bit in the PSP header. If sampling is enabled, then packets marked with the S bit are mirrored to the RSS queues and logged to the console. In addition, on transmit, a random subset of packets (1 out of 2^N for command-line parameter N) will have the S bit set to 1, and those packets will also be mirrored to RSS.
Sampling packets on transmit is currently supported only following encryption. Sampling of egress packets before encryption will be supported in a future release.
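The "1 out of 2^N" transmit sampling can be expressed as a mask over random bits. The sketch below assumes a 16-bit random value and assumes that a packet is sampled when the masked bits are all zero; the helper names sample_mask and should_sample are hypothetical, and the value actually matched in hardware may differ.

```cpp
#include <cassert>
#include <cstdint>

// Mask selecting the low `log2_rate` bits of a 16-bit random value.
// log2_rate is the command-line parameter N (0 means sampling is disabled).
inline uint16_t sample_mask(unsigned log2_rate)
{
    assert(log2_rate <= 16);
    return static_cast<uint16_t>((1u << log2_rate) - 1u);
}

// A packet is sampled when the masked random bits are all zero, which
// happens for 1 out of 2^N uniformly distributed values.
inline bool should_sample(uint16_t random_meta, unsigned log2_rate)
{
    if (log2_rate == 0)
        return false; // sampling disabled
    return (random_meta & sample_mask(log2_rate)) == 0;
}
```

For N = 4, exactly 4096 of the 65536 possible 16-bit values are selected, that is, 1 in 16.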
Pipelines
Host-to-Network Flows
Traffic sent from the local, unsecured port (host-to-net) without sampling enabled travels through the pipeline as shown in the diagrams that follow. Note that the Ingress Root Pipe is the first destination for packets arriving from either the VF or the secured uplink port. However, the Egress ACL pipe is the first destination for packets sent via tx_burst on the PF (in the switch model's expert mode).
The Empty Pipe is a vestigial transition from the Default Domain, in which the Ingress Root Pipe is created, to the Secure Egress Domain, where the Egress ACL pipe performs encryption.
This pipe may be removed in a future release.
If sampling is enabled, the host-to-net pipeline is modified as shown in the following:
Here, an Egress Sampling Pipe is added between the Egress ACL Pipe and the Secured Port. It performs a match of the random metadata, masked according to command-line parameters, and then:
On match, the following actions occur:
Packet modifications:
The S bit in the PSP header is set to true.
The pkt_meta field is set to a sentinel value to indicate to CPU software why the packet was sent to RSS.
The original packet is forwarded to RSS.
The mirror action forwards the packet to the secured port.
On miss, the following actions occur:
No packet modifications are made.
The packet is forwarded to a vestigial pipe which can then forward the packet to the wire.
Info: A fwd_miss cannot target a port.
Note: This pipe may be removed in a future release.
Network-to-Host Flows
When a packet arrives from the secured port, the following flows are executed.
As before, the Ingress Root Pipe is the first destination; here, it matches on the secured port ID and on an IPv6 outer L3 type. Matching packets flow to the decryption pipe, which matches the outer UDP port number against 1000, the constant specified in the PSP specification. On match, the packet is decrypted, but not yet decapsulated. Then the Ingress ACL pipe checks the following:
PSP_Syndrome – did the packet decrypt correctly and pass its ICV check?
PSP SPI and inner IP src address – was this packet encrypted with the key associated with the given source?
If the packet passes the syndrome and ACL check, it is forwarded to the VF. Otherwise, the Syndrome Stats pipe counts the occurrences of the different bits in the PSP Syndrome word.
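The two checks above amount to a simple decision, modeled here in software. This is only an illustration of the logic, not of how DOCA Flow expresses it; IngressAcl, Verdict, and the member names are hypothetical, and addresses are shown as plain 32-bit integers.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

enum class Verdict { ForwardToVf, ToSyndromeStats };

// Hypothetical software model of the Ingress ACL decision.
struct IngressAcl {
    // Allowed (SPI, inner IPv4 source address) pairs, one per issued tunnel.
    std::set<std::pair<uint32_t, uint32_t>> allowed;

    void permit(uint32_t spi, uint32_t inner_src_ip) { allowed.insert({spi, inner_src_ip}); }

    Verdict check(uint32_t psp_syndrome, uint32_t spi, uint32_t inner_src_ip) const
    {
        // A non-zero syndrome means decryption or the ICV check failed.
        if (psp_syndrome != 0)
            return Verdict::ToSyndromeStats;
        // The SPI must be the one issued for this inner source address.
        if (allowed.count({spi, inner_src_ip}) == 0)
            return Verdict::ToSyndromeStats;
        return Verdict::ForwardToVf;
    }
};
```

A packet that decrypts cleanly but carries an SPI issued to a different source address is rejected, which is what prevents one peer from impersonating another.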
When sampling is enabled, the Ingress Sampling Pipe is inserted before the ACL. Unlike the Egress Sampling Pipe, no randomness is involved; the match criterion is the sample-at-receiver (S) bit in the PSP header. On a match, the incoming packet is mirrored to RSS with pkt_meta indicating the reason for forwarding the packet to RSS. On match or miss, the next pipe is the Ingress ACL Pipe.
DOCA Libraries
This application leverages the following DOCA libraries:
Refer to their respective programming guides for more information.
Please refer to the NVIDIA DOCA Installation Guide for Linux for details on how to install BlueField-related software.
The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application.
For more information about the applications as well as development and compilation tips, refer to the DOCA Applications page.
The sources of the application can be found under the application's directory: /opt/mellanox/doca/applications/psp_gateway/.
Compiling All Applications
All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.
To build all the applications together, run:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
doca_psp_gateway is created under /tmp/build/psp_gateway/.
Compiling Only the Current Application
To directly build only the PSP Gateway application:
cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_psp_gateway=true
ninja -C /tmp/build
doca_psp_gateway is created under /tmp/build/psp_gateway/.
Alternatively, users can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:
Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:
Set enable_all_applications to false
Set enable_psp_gateway to true
Run the following compilation commands:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
Info: doca_psp_gateway is created under /tmp/build/psp_gateway/.
Troubleshooting
Refer to the NVIDIA DOCA Troubleshooting Guide for any issue encountered with the compilation of the application.
Prerequisites
The PSP gateway application is based on DOCA Flow. Therefore, the user is required to allocate huge pages:
echo '2048' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
On some OSs (e.g., RockyLinux, OpenEuler, CentOS 8.2), the default huge page size on the BlueField (and Arm hosts) is larger than 2MB, and is often 512MB instead. The user can find out the size of the huge pages using the following command:
$ grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 4
HugePages_Free: 4
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 524288 kB
Hugetlb: 6291456 kB
Given that the guiding principle is to allocate 4GB of RAM, in such cases instead of allocating 2048 pages, the user should allocate the matching amount (8 pages):
echo '8' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
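The page counts used above follow from dividing the 4GB target by the huge page size reported in /proc/meminfo. The helper below is a hypothetical illustration of that arithmetic, with both quantities expressed in kB as in the Hugepagesize field.

```cpp
#include <cassert>
#include <cstdint>

// Number of huge pages needed to reserve `target_kb` of memory, rounded up.
inline uint64_t hugepages_needed(uint64_t target_kb, uint64_t hugepage_size_kb)
{
    assert(hugepage_size_kb > 0);
    return (target_kb + hugepage_size_kb - 1) / hugepage_size_kb;
}
```

With a 4GB target (4194304 kB), a 2048 kB page size yields 2048 pages and a 524288 kB page size yields 8 pages, matching the two commands shown in this section.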
Application Execution
The PSP Gateway application is provided in source form. Therefore, a compilation is required before the application can be executed.
Application usage instructions:
Usage: doca_psp_gateway [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                  Print a help synopsis
  -v, --version               Print program version information
  -l, --log-level             Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level             Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>           Parse all command flags from an input json file

Program Flags:
  -p, --pci-addr              PCI BDF of the device in BB:DD.F format
  -r, --repr                  Device representor list in vf[x-y]pf[x-y] format
  -m, --core-mask             EAL Core Mask
  -d, --decap-dmac            mac_dst addr of the decapped packets
  -n, --nexthop-dmac          next-hop mac_dst addr of the encapped packets
  -s, --svc-addr              Service address of locally running gRPC server; port number optional
  -t, --tunnel                Remote host tunnel(s), formatted 'mac-addr,phys-ip,virt-ip'
  -f, --tunnels-file          Specifies the location of the tunnels-file. Format: rpc-addr:virt-addr,virt-addr,...
  -c, --cookie                Enable use of PSP virtualization cookies
  -a, --disable-ingress-acl   Allows any ingress packet that successfully decrypts
  --sample-rate               Sets the log2 sample rate: 0: disabled, 1: 50%, ... 16: 1.5e-3%
  -x, --max-tunnels           Specify the max number of PSP tunnels
  -o, --crypt-offset          Specify the PSP crypt offset
  --psp-version               Specify the PSP version for outgoing connections (0 or 1)
  -z, --static-tunnels        Create tunnels at startup using the given local IP addr
  -k, --debug-keys            Enable debug keys

This usage printout can be printed to the command line using the -h (or --help) options:
./doca_psp_gateway -h
Info: For additional information, refer to section "Command Line Flags".
CLI example for running the application on the BlueField or host:
./doca_psp_gateway -p 03:00.0 -r vf0 -d 11:22:33:44:55:66 -t 10.1.1.55:192.168.1.55
The PCIe address identifier (-p flag) should match the address of the desired PCIe device.
The -d flag indicates the MAC address that should be applied to incoming packets upon decap. It should match the MAC address of the virtual function specified by the -r argument.
The -t flag indicates the mapping of the virtual IP address 192.168.1.55 to the out-of-band network address 10.1.1.55.
The application also supports a JSON-based deployment mode, in which all command-line arguments are provided through a JSON file:
./doca_psp_gateway --json [json_file]
For example:
./doca_psp_gateway --json psp_gateway_params.json
Note: Before execution, ensure that the JSON file contains the correct configuration parameters, especially the PCIe addresses necessary for the deployment.
Command Line Flags
Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content
General flags | h | help | Prints a help synopsis | N/A
General flags | v | version | Prints program version information | N/A
General flags | l | log-level | Sets the log level for the application: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE |
General flags | N/A | sdk-log-level | Sets the SDK log level for the program: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE |
General flags | j | json | Parse all command flags from an input JSON file | N/A
Program flags | p | pci-addr | PCIe BDF of the device in BB:DD.F format |
Program flags | r | repr | Device representor list in vf[x-y]pf[x-y] format |
Program flags | m | core-mask | EAL core mask |
Program flags | d | decap-dmac | mac_dst address of the decapped packets |
Program flags | n | nexthop-dmac | Next-hop mac_dst address of the encapped packets |
Program flags | s | svc-addr | Service address of locally running gRPC server; port number optional |
Program flags | t | tunnel | Remote host tunnel(s), formatted rpc-addr:virt-addr |
Program flags | f | tunnels-file | Specifies the location of the tunnels-file. Format: rpc-addr:virt-addr,virt-addr,... |
Program flags | c | cookie | Enable use of PSP virtualization cookies |
Program flags | a | disable-ingress-acl | Allows any ingress packet that successfully decrypts |
Program flags | N/A | sample-rate | Sets the log2 sample rate: 0: disabled, 1: 50%, ... 16: 1.5e-3% |
Program flags | x | max-tunnels | Specify the max number of PSP tunnels |
Program flags | o | crypt-offset | Specify the PSP crypt offset |
Program flags | N/A | psp-version | Specify the PSP version for outgoing connections (0 or 1) |
Program flags | z | static-tunnels | Create tunnels at startup using the given local IP address |
Program flags | k | debug-keys | Enable debug keys |
Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.
Tunnel Mappings File
A text file which maps an OOB network address to a list of virtual IP addresses behind that physical address can be specified on the command line. The format is as follows:
# (Comments are allowed)
# Format:
# svc-oob-ip-addr:virt-addr,virt-addr,...
# Specify a service address of 10.1.1.55 which hosts virtual addresses 192.168.1.101 and others.
10.1.1.55:192.168.1.101,192.168.1.102,192.168.1.103,192.168.1.104
# Specify a service address of 10.1.1.56 which hosts virtual addresses 192.168.1.201 and others.
10.1.1.56:192.168.1.201,192.168.1.202,192.168.1.203,192.168.1.204
When a packet from the VF does not match any existing flows, this table defines the physical host which should provide the tunnel to the given (virtual) destination.
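As an illustration of the file format and of the lookup it enables, the following sketch parses such text into a map from virtual address to service address. It is not the application's parser: parse_tunnels is a hypothetical function, it splits each line on the first colon (so it assumes an IPv4 service address without a port number), and it does no validation of the addresses themselves.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parse tunnels-file text into a map from virtual address to the service
// (OOB) address of the peer that hosts it. Blank lines and lines starting
// with '#' are skipped. Returns false on a malformed line.
inline bool parse_tunnels(const std::string &text, std::map<std::string, std::string> &virt_to_svc)
{
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#')
            continue;
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 >= line.size())
            return false;
        const std::string svc = line.substr(0, colon);
        std::istringstream virts(line.substr(colon + 1));
        std::string virt;
        while (std::getline(virts, virt, ',')) {
            if (virt.empty())
                return false;
            virt_to_svc[virt] = svc; // a miss on `virt` is served by peer `svc`
        }
    }
    return true;
}
```

Given the example file above, looking up 192.168.1.202 in the resulting map yields 10.1.1.56, the peer to contact for the key exchange.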
Troubleshooting
Refer to the NVIDIA DOCA Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.
Main loop code flow
Initialize the logger facility.
The standard logger and the SDK logger are created, and the SDK logger default log level is selected.
doca_log_backend_create_standard();
doca_log_backend_create_with_file_sdk(stdout, &sdk_log);
doca_log_backend_set_sdk_level(sdk_log, DOCA_LOG_LEVEL_WARNING);
The signal handler is connected to enable a clean shutdown.
signal(SIGINT, signal_handler);
signal(SIGTERM, signal_handler);
Parse application arguments. The main function invokes psp_gw_argp_exec(), which initializes the arg parser resources and registers DOCA general parameters, and then registers the PSP application-specific parameters. Then the parser is invoked.
doca_argp_init();
psp_gw_register_params();
doca_argp_start();
DPDK initialization. Call rte_eal_init() to initialize EAL resources with the provided EAL flags for not probing the ports (-a00:0.0).
rte_eal_init(n_eal_args, (char **)eal_args);
Initialize devices and ports.
Open DOCA devices with input PCIe addresses/interface names.
Probe DPDK port from each opened device.
open_doca_device_with_pci(...); // not part of doca_flow; see doca/samples/common.c
doca_dpdk_port_probe(...);
The MAC and IP addresses of the PF are queried and logged.
rte_eth_macaddr_get(...);
doca_devinfo_get_ipv6_addr(...);
DOCA_LOG_INFO("Port %d: Detected PF mac addr: %s, IPv6 addr: %s, total ports: %d", ...);
Initialize and start DPDK ports. Initialize DPDK ports, including mempool allocation. No hairpin queues are created.
dpdk_queues_and_ports_init(); // not part of doca_flow; see doca/applications/common/dpdk_utils.c
Initialize DOCA Flow objects used by the PSP Gateway application. The DOCA Flow library is initialized with the string "switch,hws,isolated,expert", because it is desirable for the application to act as an intermediary between the uplink physical port and some number of VF representors (switch mode), and hws (hardware steering mode) and isolated mode are mandatory for switch mode. The optional expert flag prevents DOCA Flow from automating certain packet operations and gives more control to the application, as described in the DOCA Flow page.
PSP_GatewayFlows psp_flows(&pf_dev, vf_port_id, &app_config);
psp_flows.init();
Initialize DOCA Flow library.
Start the ports.
Allocate shared resources (PSP crypto objects and Mirror actions).
Create the ingress and egress pipes.
Create the gRPC service.
PSP_GatewayImpl psp_svc(&app_config, &psp_flows);
Launch the L-Core threads to handle RSS packets.
rte_eal_remote_launch(lcore_pkt_proc_func, &lcore_params, lcore_id);
Launch the gRPC service.
This implementation uses InsecureServerCredentials. Update as needed.
grpc::ServerBuilder builder;
builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
builder.RegisterService(&psp_svc);
auto server_instance = builder.BuildAndStart();
Wait for traffic. If configured to connect at startup, process the list of remaining connections. Then display the flow pipe counters.
while (!force_quit) {
    psp_svc.try_connect(remotes_to_connect, local_vf_addr);
    ...
    psp_flows.show_static_flow_counts();
    psp_svc.show_flow_counts();
}
Wait in a loop until the user terminates the program.
PSP Gateway cleanup:
Destroy DPDK ports and queues.
dpdk_queues_and_ports_fini();
DPDK finish.
dpdk_fini();
Calls rte_eal_cleanup() to release the initialized EAL resources.
Arg parser destroy.
doca_argp_destroy()
Miss-packet code flow
The L-Core launch routine from the main loop pointed to the lcore_pkt_proc_func routine.
The force_quit flag is polled to respond to the signal handler.
while (!*params->force_quit) { ... }
The rte_eth_rx_burst function polls the PF queue for received packets.
nb_rx_packets = rte_eth_rx_burst(port_id, queue_id, rx_packets, MAX_RX_BURST_SIZE);
Inside handle_packet(), the packet metadata is inspected to detect whether this packet is sampled on ingress, sampled on egress, or a miss packet.
uint32_t pkt_meta = rte_flow_dynf_metadata_get(packet);
Sampled packets are simply logged using the rte_pktmbuf_dump function.
Miss packets are passed to the handle_miss_packet method of the gRPC service. This method handles cases where an application attached to the VF wishes to send a packet to another virtual address, but a PSP tunnel must first be established by exchanging SPI and key information between hosts.
The service acts as a gRPC client, and the appropriate server is looked up from the config->net_config.hosts vector, which is comprised of hosts passed via the -t tunnels arguments or the -f tunnels file argument.
Once the client connection exists, the request_tunnel_to_host method takes care of invoking the RequestTunnelParams operation defined in the schema.
Optionally, this function generates a corresponding set of tunnel parameters appropriate for the server host to send traffic back via generate_tunnel_params().
doca_flow_crypto_psp_spi_key_bulk_generate(bulk_key_gen);
doca_flow_crypto_psp_spi_key_bulk_get(bulk_key_gen, 0, &spi, key);
doca_flow_crypto_psp_spi_key_wipe(bulk_key_gen, 0);
The RPC operation is invoked, and if successful, create_tunnel_flow is called to create the egress flow:
status = stub->RequestTunnelParams(&context, request, &response);
The create_tunnel_flow method translates the resulting Protobuf objects to application-specific data structures and passes them to the add_encrypt_entry method of the flows object. Here, the PSP SPI and key are programmed into an available crypto_id index as follows.
Note: SPI and crypto_id are two independent concepts:
The SPI value in the PSP packet header indicates to the receiver which key was used by the sender to encrypt the data. Each receiver computes an SPI and key to provide to a sender. Since each receiver is responsible for tracking its next SPI, multiple receivers may provide the same SPI to a sender, so one sender may send the same SPI to multiple different peers. This is allowed, as each of the receiving peers has its own decryption key to handle that SPI.
The crypto_id acts as an index into the bucket of PSP keys allocated by DOCA Flow. The doca_flow_shared_resource_cfg() function writes a given PSP encryption key to a given slot in the bucket of keys in NIC memory. These slots can be overwritten as needed by the application.
There is no explicit association between crypto_id and SPI. The doca_flow_shared_resource_cfg() function writes a key at the slot provided by the crypto_id argument, then the flow pipe entry actions.crypto.crypto_id references this key, and actions.crypto_encap.encap_data includes a PSP header with the desired SPI.
struct doca_flow_shared_resource_cfg res_cfg = {};
res_cfg.domain = DOCA_FLOW_PIPE_DOMAIN_SECURE_EGRESS;
res_cfg.psp_cfg.key_cfg.key_type = DOCA_FLOW_CRYPTO_KEY_256;
res_cfg.psp_cfg.key_cfg.key = (uint32_t *)encrypt_key;
doca_flow_shared_resource_cfg(DOCA_FLOW_SHARED_RESOURCE_PSP, session->crypto_id, &res_cfg);
A flow pipe entry which references the newly programmed PSP encryption key (via its index crypto.crypto_id) must be inserted. Additionally, this pipe entry must specify all the outer Ethernet, IP, UDP, and PSP header fields to insert.
format_encap_data(session, actions.crypto_encap.encap_data);
actions.crypto.action_type = DOCA_FLOW_CRYPTO_ACTION_ENCRYPT;
actions.crypto.resource_type = DOCA_FLOW_CRYPTO_RESOURCE_PSP;
actions.crypto.crypto_id = session->crypto_id;
...
doca_flow_pipe_add_entry(pipe_queue, pipe, match, actions, mon, fwd, flags, &status, entry);
...
doca_flow_entries_process(port, 0, DEFAULT_TIMEOUT_US, num_of_entries);
The original packet received via rte_eth_rx_burst is sent back through the newly updated pipelines via rte_eth_tx_burst. Since the port_id argument is that of the PF, and since DOCA Flow has been initialized in expert mode, the packet is transferred to the root of the egress domain (the "empty pipe" before egress_acl_pipe).
nsent = rte_eth_tx_burst(port_id, queue_id, &packet, 1);
Tunnel parameter request handling
The gRPC service provided by the PSP Gateway implements the RequestTunnelParams operation referenced above. A client uses this operation to request an SPI and key to encrypt traffic to send to the server's NIC device. The request indicates the virtual remote address for which the tunnel will be created.
This operation begins by generating a new SPI and key inside generate_tunnel_params() as described previously.
The operation creates an ACL entry permitting the new SPI and the remote virtual address using the add_ingress_acl_entry method of the Flows object.
doca_flow_match match = {};
match.parser_meta.psp_syndrome = 0;
match.tun.type = DOCA_FLOW_TUN_PSP;
match.tun.psp.spi = RTE_BE32(session->spi_ingress);
match.inner.l3_type = DOCA_FLOW_L3_TYPE_IP4;
match.inner.ip4.src_ip = session->src_vip;
...
doca_flow_pipe_add_entry(pipe_queue, pipe, match, actions, mon, fwd, flags, &status, entry);
...
doca_flow_entries_process(port, 0, DEFAULT_TIMEOUT_US, num_of_entries);
If the request included parameters for traffic in the reverse direction (traffic to encrypt and send to the client), these parameters are translated and passed to the Flows object by calling create_tunnel_flow described above.