DOCA SHA
This guide provides instructions on building and developing applications that calculate message digests using the SHA1, SHA2-256, or SHA2-512 algorithms.
The DOCA SHA library is currently supported at alpha level.
The library provides an API for executing SHA operations on DOCA buffers, where the buffers reside in either local memory (i.e., within the same host) or host memory accessible by the NVIDIA® BlueField®-2 device (remote memory). Using DOCA SHA, complex cryptographic hash operations can be easily executed in an optimized, hardware-accelerated manner.
NVIDIA® BlueField®-3 does not support this library because it has no SHA acceleration engine.
This document is intended for software developers wishing to accelerate their applications' SHA calculations, which are typically used in digital signature schemes or hash-based message authentication code (HMAC) calculations.
This library follows the architecture of a DOCA Core context. It is recommended to read the relevant DOCA Core sections before proceeding.
DOCA SHA-based applications can run either on the host machine or on the BlueField-2 DPU target.
DOCA SHA calculations from the host to BlueField and vice versa can only be run when the DPU is configured in DPU mode.
DOCA SHA is a DOCA Core Context. This library leverages the DOCA Core architecture to expose asynchronous tasks/events offloaded to hardware.
SHA can be used to calculate message digest as illustrated in the following diagrams:
SHA from local memory to local memory:
Using the DPU to perform SHA calculation on memory shared between the host and the DPU:
Using the host to perform SHA calculation on memory shared between the host and the DPU:
Objects
Device and Representor
The library requires a DOCA device to operate. The device is used to access memory and perform the actual SHA calculation. See DOCA Core Device Discovery.
For the same BlueField DPU, it does not matter which device is used (i.e., PF/VF/SF) as these devices utilize the same hardware component. If there are multiple DPUs, then it is possible to create a SHA instance per DPU, providing each instance with a device from a different DPU.
To access non-local memory (i.e., from the host to DPU or vice versa), the DPU side of the application must choose a device with an appropriate representor (see DOCA Core Device Representor Discovery). The device must stay valid for as long as the SHA instance is not destroyed.
Memory Buffers
The SHA task requires at least two DOCA buffers containing the destination and the source.
The destination is always a single doca_buf. The source can be a single doca_buf or a linked list of doca_bufs. All destination and source doca_bufs can be allocated from doca_buf_inventory.
For the usage of doca_buf_inventory, please refer to the DOCA Core Inventory Types table.
Buffers must not be modified or read during the SHA operation. For information on what kind of memory is supported, refer to the table in section "Buffer Support".
To start using the library, users must go through a configuration phase as described in DOCA Core Context Configuration Phase.
This section describes how to configure and start the context to allow the execution of tasks and retrieval of events.
Configurations
The context can be configured to match the application use case.
To find if a configuration is supported or its min/max value, refer to section "Device Support".
Mandatory Configurations
These configurations must be set by the application before attempting to start the context:
At least one task/event type must be configured. See configuration of tasks and/or events in sections "Tasks" and "Events" respectively for information.
A device with appropriate support must be provided upon creation
Device Support
DOCA SHA requires a device to operate. For information on choosing a device, see DOCA Core Device Discovery.
As device capabilities may change in the future (see DOCA Core Device Support) it is recommended to select your device using the following methods:
doca_sha_cap_task_hash_get_supported
doca_sha_cap_task_partial_hash_get_supported
Devices may differ in capabilities such as:
The maximum number of tasks
The maximum source buffer size
The minimum destination buffer size
The maximum supported number of elements in a DOCA linked-list buffer
Whether SHA1, SHA2-256, or SHA2-512 is supported
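The capabilities listed above can be checked against an application's needs before the context is configured. The following is a minimal sketch, assuming the limits have already been read from the device. The struct names, field names, and the sha_request_fits helper are hypothetical and are not part of the DOCA SHA API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the device capabilities listed above.
 * In a real application each field would be filled from the device's
 * capability queries; these names are illustrative only. */
struct sha_caps {
    bool     hash_supported;  /* the wanted task type is supported */
    uint32_t max_tasks;       /* maximum number of tasks */
    uint64_t max_src_size;    /* maximum source buffer size, in bytes */
    uint32_t min_dst_size;    /* minimum destination buffer size, in bytes */
    uint32_t max_list_elems;  /* maximum elements in a linked-list buffer */
};

/* Hypothetical description of what the application intends to do. */
struct sha_request {
    uint32_t num_tasks;       /* tasks the application wants in flight */
    uint64_t src_size;        /* largest source buffer it will submit */
    uint32_t dst_size;        /* size of its destination buffers */
    uint32_t src_list_elems;  /* longest source linked list (1 = single buffer) */
};

/* Returns true only if every requested value is within the device limits. */
bool sha_request_fits(const struct sha_caps *caps, const struct sha_request *req)
{
    assert(caps != NULL && req != NULL);
    return caps->hash_supported
        && req->num_tasks <= caps->max_tasks
        && req->src_size <= caps->max_src_size
        && req->dst_size >= caps->min_dst_size
        && req->src_list_elems >= 1
        && req->src_list_elems <= caps->max_list_elems;
}
```

Collecting the limits once and validating against them keeps the device-selection logic in one place, so a device that fails any single check can be skipped during discovery.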
Buffer Support
Tasks support buffers with the following features:
Buffer Type | Source Buffer | Destination Buffer
Local mmap buffer | Yes | Yes
Mmap from PCIe export buffer | Yes | Yes
Mmap from RDMA export buffer | No | No
Linked list buffer | Yes | No
This section describes execution on the CPU using DOCA Core Progress Engine.
Tasks
DOCA SHA exposes asynchronous tasks that leverage DPU hardware according to DOCA Core architecture.
SHA Task
The SHA task doca_sha_task_hash allows one-shot SHA calculation using buffers as described in section "Buffer Support". One-shot means that the source buffer is used as the whole input, so the SHA operation is complete once this task's completion event arrives.
Task Configuration
Description | API to Set Configuration | API to Query Support
Enable the task | |
Number of tasks | |
Maximum source buffer size | – |
Maximum source buffer list size | – |
Minimum destination buffer size | – |
Task Input
Common input as described in DOCA Core Task.
Name | Description | Notes
Source buffer | Buffer pointing to the memory to be used for SHA calculation | Only the data residing in the data segment is used
Destination buffer | Buffer pointing to the memory used for writing the SHA calculation result | The SHA result is appended to the tail segment
SHA algorithm type | SHA algorithm to be used in SHA calculation | Must be one of the supported algorithms (SHA1, SHA2-256, or SHA2-512)
Task Output
Common output as described in DOCA Core Task.
Task Completion Success
After the task completes successfully, the following happens:
The SHA calculation of data from the source buffer is successfully completed and the result is written to the destination buffer
The destination buffer data segment is extended to include the SHA result data
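Because the result is appended to the destination buffer's tail segment, the tail must have room for the digest. The sketch below uses the digest lengths fixed by the SHA standard (20, 32, and 64 bytes). The enum and helper names are hypothetical and are not the DOCA identifiers, and the minimum destination size reported by the device may be larger than the bare digest length, so the queried value should take precedence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative enum; these are not the DOCA algorithm identifiers. */
enum sha_alg { SHA_ALG_SHA1, SHA_ALG_SHA256, SHA_ALG_SHA512 };

/* Digest length in bytes, as defined by the SHA standard. */
size_t sha_digest_len(enum sha_alg alg)
{
    switch (alg) {
    case SHA_ALG_SHA1:   return 20;
    case SHA_ALG_SHA256: return 32;
    case SHA_ALG_SHA512: return 64;
    }
    assert(!"unknown SHA algorithm");
    return 0;
}

/* True if a tail segment of tail_room bytes can hold the digest. */
bool dst_tail_fits(size_t tail_room, enum sha_alg alg)
{
    return tail_room >= sha_digest_len(alg);
}

/* Data-segment length of the destination after a successful task,
 * given its length before the task was submitted. */
size_t dst_data_len_after(size_t data_len_before, enum sha_alg alg)
{
    return data_len_before + sha_digest_len(alg);
}
```

For example, an empty destination buffer with 64 bytes of tail room can receive any of the three digests, and its data length afterwards equals the digest length.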
Task Completion Failure
If the task fails midway:
The context may enter stopping state if a fatal error occurs
The source and destination doca_buf objects are not modified
The destination buffer contents may be modified
Task Limitations
The operation is not atomic
Once the task is submitted, the source and destination should not be read/written to
Other limitations are described in DOCA Core Task
Partial-SHA Task
The partial-SHA task doca_sha_task_partial_hash allows stateful SHA calculation for a collection of messages, using buffers as described in section "Buffer Support".
Stateful means that the input data is composed of many segments (which may be non-contiguous in memory or may arrive at different times), so its SHA calculation requires more than one one-shot SHA operation to finish. During any stateful operation, other independent SHA tasks can also be executed.
Task Configuration
Description | API to Set Configuration | API to Query Support
Enable the task | |
Number of tasks | |
Maximum source buffer size | – |
Maximum source buffer list size | – |
Minimum destination buffer size | – |
SHA block size | |
Task Input
Common input as described in DOCA Core Task.
Name | Description | Notes
Source buffer | Buffer pointing to the memory to be used for SHA calculation | Only the data residing in the data segment is used. The data length of every segment except the last must be a multiple of the SHA block size (see "SHA block size" under Task Configuration).
Destination buffer | Buffer pointing to the memory used for writing the SHA calculation result | The SHA result is appended to the tail segment. This buffer must not be modified during the whole calculation process.
SHA algorithm type | SHA algorithm to be used in SHA calculation | Must be one of the supported algorithms (SHA1, SHA2-256, or SHA2-512)
Whether the current source buffer is the last segment | Indicates whether the current source buffer is the last data segment to be used for the partial-SHA calculation | Use doca_sha_task_partial_hash_set_is_final_buf
Set source buffer | Sets the subsequent source segment buffer after the initial one | Use doca_sha_task_partial_hash_set_src
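The block-size rule for non-final segments can be enforced when the input is split. The following is a minimal sketch, assuming the block size has already been obtained for the chosen algorithm (for reference, the SHA standard defines 64-byte blocks for SHA1 and SHA2-256 and 128-byte blocks for SHA2-512). The segment struct and plan_segments helper are hypothetical and are not part of the DOCA SHA API.

```c
#include <assert.h>
#include <stddef.h>

/* One planned source segment: where it starts, how long it is, and
 * whether it must be flagged as the final segment. */
struct segment {
    size_t offset;
    size_t len;
    int    is_final;
};

/* Splits total_len bytes into segments of at most max_seg bytes so that
 * every non-final segment length is a multiple of block_size.
 * Returns the number of segments written to out, or 0 if the arguments
 * are invalid or out_cap is too small. An empty input yields a single
 * zero-length final segment. */
size_t plan_segments(size_t total_len, size_t max_seg, size_t block_size,
                     struct segment *out, size_t out_cap)
{
    assert(out != NULL);
    if (block_size == 0 || max_seg < block_size)
        return 0;

    /* Largest multiple of block_size that does not exceed max_seg. */
    size_t step = max_seg - (max_seg % block_size);
    size_t n = 0;
    size_t off = 0;

    do {
        if (n == out_cap)
            return 0;
        size_t remaining = total_len - off;
        size_t len = remaining > step ? step : remaining;
        out[n].offset = off;
        out[n].len = len;
        out[n].is_final = (off + len == total_len);
        off += len;
        n++;
    } while (off < total_len);

    return n;
}
```

With a 300-byte message, a 150-byte segment limit, and a 64-byte block size, this plan yields two 128-byte segments followed by a 44-byte final segment.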
Task Output
Common output as described in DOCA Core Task.
Task Completion Success
After the task completes successfully, the following happens:
The SHA calculation of data from the source buffer is successfully completed and the result is written to the destination buffer
The destination buffer data segment is extended to include the SHA result data
Task Completion Failure
If the task fails midway:
The context may enter stopping state if a fatal error occurs
The source and destination doca_buf objects are not modified
The destination buffer contents may be modified
Task Limitations
The operation is not atomic
Once the task is submitted, the source and destination should not be read/written to
Other limitations are described in DOCA Core Task
Events
DOCA SHA exposes asynchronous events to notify about changes that happen unexpectedly according to the DOCA Core architecture.
The only events SHA exposes are common events as described in DOCA Core Event.
The DOCA SHA library follows the context state machine as described in DOCA Core Context State Machine.
The following section describes moving states and what is allowed in each state.
Idle
In this state, it is expected that the application either:
Destroys the context
Starts the context
Allowed operations:
Configuring the context according to section "Configurations"
Starting the context
It is possible to reach this state as follows:
Previous State | Transition Action
None | Create the context
Running | Call stop after making sure all tasks have been freed
Stopping | Call progress until all tasks are completed and freed
Starting
This state cannot be reached.
Running
In this state, it is expected that the application:
Allocates and submits tasks
Calls progress to complete tasks and/or receive events
Allowed operations:
Allocating previously configured task
Submitting a task
Calling stop
It is possible to reach this state as follows:
Previous State | Transition Action
Idle | Call start after configuration
Stopping
In this state, it is expected that the application:
Calls progress to complete all inflight tasks (tasks complete with failure)
Frees any completed tasks
Allowed operations:
Calling progress
It is possible to reach this state as follows:
Previous State | Transition Action
Running | Call progress and fatal error occurs
Running | Call stop without freeing all tasks
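The transitions described for the Idle, Running, and Stopping states can be summarized as a small transition function. This is only an illustrative model of the tables in this section. The enum names and the ctx_next helper are hypothetical and do not correspond to DOCA Core types.

```c
#include <assert.h>

/* States described in this section (Starting is omitted because it
 * cannot be reached). */
enum ctx_state { CTX_IDLE, CTX_RUNNING, CTX_STOPPING };

/* Application actions that can trigger a transition. */
enum ctx_action {
    ACT_START,          /* start after configuration */
    ACT_STOP,           /* stop requested by the application */
    ACT_PROGRESS,       /* progress call without a fatal error */
    ACT_PROGRESS_FATAL  /* progress call during which a fatal error occurs */
};

/* Returns the state that follows state s when action a is applied while
 * unfreed_tasks tasks have not yet been completed and freed. */
enum ctx_state ctx_next(enum ctx_state s, enum ctx_action a,
                        unsigned unfreed_tasks)
{
    switch (s) {
    case CTX_IDLE:
        return a == ACT_START ? CTX_RUNNING : CTX_IDLE;
    case CTX_RUNNING:
        if (a == ACT_STOP)
            return unfreed_tasks == 0 ? CTX_IDLE : CTX_STOPPING;
        if (a == ACT_PROGRESS_FATAL)
            return CTX_STOPPING;
        return CTX_RUNNING;
    case CTX_STOPPING:
        if (a == ACT_PROGRESS && unfreed_tasks == 0)
            return CTX_IDLE;
        return CTX_STOPPING;
    }
    assert(!"unknown context state");
    return s;
}
```

In this model, calling stop with tasks still outstanding moves the context to Stopping, and it returns to Idle only after a progress call finds that all tasks have been completed and freed.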
DOCA SHA only supports datapath on the CPU. See section "Execution Phase".
This section describes DOCA SHA samples based on the DOCA SHA library.
The samples in this section illustrate how to use the DOCA SHA API to do the following:
Calculate the SHA of the contents of a buffer and write the result to another buffer
Split the contents of a buffer into a collection of segments, perform a partial-SHA calculation over these segments, and write the result to another buffer
All the DOCA samples described in this section are governed under the BSD-3 software license agreement.
Running the Samples
Refer to the following documents:
NVIDIA DOCA Installation Guide for Linux for details on how to install BlueField-related software.
NVIDIA DOCA Troubleshooting for any issue you may encounter with the installation, compilation, or execution of DOCA samples.
To build a given sample:
cd /opt/mellanox/doca/samples/doca_sha/<sample_name>
meson /tmp/build
ninja -C /tmp/build
Info: The binary doca_<sample_name> is created under /tmp/build/.
Sample (e.g., doca_sha_create) usage:
Usage: doca_sha_create [DOCA Flags] [Program Flags]
DOCA Flags:
  -h, --help         Print a help synopsis
  -v, --version      Print program version information
  -l, --log-level    Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level    Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>  Parse all command flags from an input json file
Program Flags:
  -d, --data         user data
For additional information per sample, use the -h option:
/tmp/build/doca_<sample_name> -h
Samples
SHA Create
This sample illustrates how to perform SHA calculation with DOCA SHA.
The sample logic includes:
Locating DOCA device.
Initializing required DOCA Core structures.
Setting the task_pool configuration for doca_sha_task_hash.
Populating DOCA memory map with two relevant buffers.
Allocating element in DOCA buffer inventory for each buffer.
Allocating and initializing a doca_sha_task_hash.
Submitting the task.
Retrieving task result once it is done.
Reference:
/opt/mellanox/doca/samples/doca_sha/sha_create/sha_create_sample.c
/opt/mellanox/doca/samples/doca_sha/sha_create/sha_create_main.c
/opt/mellanox/doca/samples/doca_sha/sha_create/meson.build
SHA-Partial Create
This sample illustrates how to perform partial-SHA calculation for a collection of data segments with DOCA SHA.
The sample logic includes:
Locating DOCA device.
Initializing the required DOCA Core structures.
Setting the task_pool configuration for doca_sha_task_partial_hash.
Chopping the source data into a collection of data segments according to the selected SHA algorithm's block size.
Populating DOCA memory map with needed buffers for all source data segments and destination buffer.
Allocating element in DOCA buffer inventory for the first source buffer and destination buffer.
Allocating and initializing a doca_sha_task_partial_hash with the first source buffer and the destination buffer.
Iteratively repeating the following sub-steps until all data segments are consumed:
    Submitting the doca_sha_task_partial_hash.
    Waiting for the submitted task to finish.
    Allocating a doca_buf for the next source segment and using doca_sha_task_partial_hash_set_src to set it as the source buffer of the above allocated task.
    If it is the final segment, using doca_sha_task_partial_hash_set_is_final_buf to mark it in the allocated task.
Retrieving the result of the final iteration in the destination buffer as the full partial-SHA calculation result.
Destroying all SHA and DOCA Core structures.
Reference:
/opt/mellanox/doca/samples/doca_sha/sha_partial_create/sha_partial_create_sample.c
/opt/mellanox/doca/samples/doca_sha/sha_partial_create/sha_partial_create_main.c
/opt/mellanox/doca/samples/doca_sha/sha_partial_create/meson.build
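The iterate, set-source, mark-final flow of the partial-SHA sample can be illustrated without hardware. The sketch below is not the sample's code and does not call DOCA. It replaces the SHA engine with a 64-bit FNV-1a hash (a stand-in chosen for brevity, not a cryptographic hash) and uses hypothetical toy_* names, only to show that feeding segments through one reused stateful task gives the same result as hashing the whole input at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a stateful hash task. The state persists across
 * submissions, much as one partial-SHA task is reused for all segments. */
struct toy_hash_task {
    uint64_t       state;     /* running hash state */
    const uint8_t *src;       /* current source segment */
    size_t         src_len;   /* length of the current segment */
    int            is_final;  /* set for the last segment */
    int            done;      /* set once the final segment is processed */
};

static void toy_task_init(struct toy_hash_task *t)
{
    t->state = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    t->src = NULL;
    t->src_len = 0;
    t->is_final = 0;
    t->done = 0;
}

static void toy_task_set_src(struct toy_hash_task *t,
                             const uint8_t *src, size_t len)
{
    t->src = src;
    t->src_len = len;
}

static void toy_task_set_is_final(struct toy_hash_task *t)
{
    t->is_final = 1;
}

/* "Submits" the task: folds the current segment into the running state. */
static void toy_task_submit(struct toy_hash_task *t)
{
    for (size_t i = 0; i < t->src_len; i++) {
        t->state ^= t->src[i];
        t->state *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
    if (t->is_final)
        t->done = 1;
}

/* Hashes data in segments of at most seg_len bytes, reusing one task. */
uint64_t toy_hash_segmented(const uint8_t *data, size_t len, size_t seg_len)
{
    assert(data != NULL && seg_len > 0);
    struct toy_hash_task t;
    toy_task_init(&t);

    size_t off = 0;
    do {
        size_t remaining = len - off;
        size_t n = remaining > seg_len ? seg_len : remaining;
        toy_task_set_src(&t, data + off, n);
        if (off + n == len)
            toy_task_set_is_final(&t); /* mark the last segment */
        toy_task_submit(&t);
        off += n;
    } while (off < len);

    assert(t.done);
    return t.state;
}

/* One-shot variant: the whole input is a single, final segment. */
uint64_t toy_hash_one_shot(const uint8_t *data, size_t len)
{
    return toy_hash_segmented(data, len, len > 0 ? len : 1);
}
```

Calling toy_hash_segmented with any segment length returns the same value as toy_hash_one_shot on the same input, which is the property the partial-SHA sample relies on when it retrieves the result of the final iteration.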