cuPQC-PK Usage
The first step in using the cuPQC-PK library is defining the post-quantum cryptographic operation you wish to perform. We specify operations by combining cuPQC-PK operators to form a cuPQC-PK descriptor. A well-defined cuPQC-PK descriptor must include the algorithm, security category, cryptographic function, and block dimension.
As an example, we’ll construct a descriptor for ML-KEM-512 key generation.
// Include the cuPQC-PK library header
#include <pk.hpp>

// cuPQC-PK descriptor:
// ML-KEM-512 key generation with 128 threads per block
using MLKEM512Key = decltype(ML_KEM_512()
                             + Function<function::Keygen>()
                             + Block()
                             + BlockDim<128>());
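To make the operator-composition idea concrete, here is a minimal sketch of how adding tag types with `+` and capturing the result with `decltype` can produce a compile-time descriptor. It is a toy model and not cuPQC-PK's implementation: the `toy` namespace and every name in it (`MlKem512`, `Keygen`, `BlockDim`, `Descriptor`, `threads_of`, `block_dim`) are hypothetical and only echo the document's names for readability.

```cpp
#include <type_traits>

namespace toy {

// Hypothetical operator tags. They carry no data, only type information.
struct MlKem512 {};
struct Keygen {};
template <unsigned N>
struct BlockDim {};

// A descriptor is an empty type that records which operators were combined.
template <class... Ops>
struct Descriptor {};

// operator + operator: start a new descriptor.
template <class A, class B>
constexpr Descriptor<A, B> operator+(A, B) { return {}; }

// descriptor + operator: append to the existing list (the more specialized overload).
template <class... Ops, class B>
constexpr Descriptor<Ops..., B> operator+(Descriptor<Ops...>, B) { return {}; }

// Trait: threads contributed by one operator (0 unless it is BlockDim<N>).
template <class Op>
struct threads_of : std::integral_constant<unsigned, 0> {};
template <unsigned N>
struct threads_of<BlockDim<N>> : std::integral_constant<unsigned, N> {};

// Trait: block dimension of a whole descriptor, computed at compile time.
template <class D>
struct block_dim;
template <class... Ops>
struct block_dim<Descriptor<Ops...>>
    : std::integral_constant<unsigned, (0u + ... + threads_of<Ops>::value)> {};

}  // namespace toy

// Only the resulting type matters, so the expression inside decltype is never evaluated.
using ToyKeygen = decltype(toy::MlKem512() + toy::Keygen() + toy::BlockDim<128>());

static_assert(std::is_empty<ToyKeygen>::value, "descriptor objects carry no state");
static_assert(toy::block_dim<ToyKeygen>::value == 128, "trait is known at compile time");
```

Because the whole configuration lives in the type, traits such as sizes and block dimensions in a design like this can be read as constants at compile time, which is consistent with how the descriptor is used in the rest of this page.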
Executing Operations
Descriptors are instantiated as objects inside your CUDA kernels. Creating the descriptor object has no computational cost; it serves only as a configuration handle.
The descriptor provides an execute(...) method that performs the requested cryptographic operation.
The arguments for this function depend on the algorithm and function; see the device functions API reference for details.
To demonstrate cuPQC-PK execution, we continue the previous example and write a kernel for batched key generation, in which each thread block computes one public/secret key pair:
// Include the cuPQC-PK library header
#include <pk.hpp>

// cuPQC-PK descriptor:
// ML-KEM-512 key generation with 128 threads per block
using MLKEM512Key = decltype(ML_KEM_512()
                             + Function<function::Keygen>()
                             + Block()
                             + BlockDim<128>());

__global__ void keygen_kernel(uint8_t* public_keys, uint8_t* secret_keys, uint8_t* workspace, uint8_t* randombytes)
{
    // These functions require a fixed amount of shared memory, for which the API provides a trait
    __shared__ uint8_t smem_ptr[MLKEM512Key::shared_memory_size];

    // Each block works on its own slice of every batched buffer
    int block = blockIdx.x;
    auto public_key = public_keys + block * MLKEM512Key::public_key_size;
    auto secret_key = secret_keys + block * MLKEM512Key::secret_key_size;
    auto entropy    = randombytes + block * MLKEM512Key::entropy_size;
    auto work       = workspace   + block * MLKEM512Key::workspace_size;

    // Execute the key generation
    MLKEM512Key().execute(public_key, secret_key, entropy, work, smem_ptr);
}
In keygen_kernel, each block computes a single public_key and secret_key. Many cuPQC-PK algorithms also require
a global memory workspace and a pre-filled buffer of cryptographic entropy. Techniques for allocating these buffers are shown
in the next section.
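The pointer arithmetic in keygen_kernel implies a flat layout in which block `b` owns the bytes from `b * size` up to `(b + 1) * size` of each buffer. The host-side sketch below applies the same arithmetic to pull one key out of a batched buffer after it has been copied back. The helper `extract_record` and the constant names are hypothetical. The sizes 800 and 1632 are the ML-KEM-512 encapsulation and decapsulation key lengths from FIPS 203, used here as stand-ins for `MLKEM512Key::public_key_size` and `MLKEM512Key::secret_key_size`, which real code should use instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ML-KEM-512 key sizes in bytes (FIPS 203). Assumption: these match the
// descriptor traits; prefer the traits in real cuPQC-PK code.
constexpr std::size_t kPublicKeySize = 800;
constexpr std::size_t kSecretKeySize = 1632;

// Hypothetical helper: copy the item-th fixed-size record out of a flat
// batched buffer, mirroring the kernel's "base + block * record_size".
inline std::vector<std::uint8_t> extract_record(const std::vector<std::uint8_t>& flat,
                                                std::size_t item,
                                                std::size_t record_size)
{
    const std::size_t offset = item * record_size;
    assert(offset + record_size <= flat.size() && "item index out of range");
    return std::vector<std::uint8_t>(flat.data() + offset,
                                     flat.data() + offset + record_size);
}
```

For example, `extract_record(public_keys, 2, kPublicKeySize)` would return the public key written by the block with `blockIdx.x == 2`, assuming the buffer was filled by a kernel using the layout above.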
Launching Kernels
To launch a kernel that uses the cuPQC-PK API, we need to know the block size and the required amount of global memory workspace. Both are fixed by the cuPQC-PK descriptor that we constructed earlier. The host code uses API functions templated on the cuPQC-PK descriptor to allocate the global memory workspace and the entropy buffer.
#include <vector>  // std::vector for the host-side key buffers

// The vectors are taken by reference so the generated keys reach the caller.
// Each must already hold batch * key-size bytes.
void ml_kem_keygen(std::vector<uint8_t>& public_keys, std::vector<uint8_t>& secret_keys, const unsigned int batch)
{
    /*
     * Set up for cuPQC-PK ML-KEM key generation:
     * allocate the device workspace and the entropy buffer.
     */
    auto length_public_key = MLKEM512Key::public_key_size;
    auto length_secret_key = MLKEM512Key::secret_key_size;
    auto workspace         = make_workspace<MLKEM512Key>(batch);
    auto randombytes       = get_entropy<MLKEM512Key>(batch);

    /*
     * Allocate device memory for public and secret keys
     */
    uint8_t* d_public_key = nullptr;
    uint8_t* d_secret_key = nullptr;
    cudaMalloc(&d_public_key, length_public_key * batch); // Buffers hold uint8_t, so length * batch is a byte count
    cudaMalloc(&d_secret_key, length_secret_key * batch);

    // Launch the kernel: one block per key pair
    keygen_kernel<<<batch, MLKEM512Key::BlockDim>>>(d_public_key, d_secret_key, workspace, randombytes);

    /*
     * Transfer generated keys to the host for communication or storage
     */
    cudaMemcpy(public_keys.data(), d_public_key, length_public_key * batch, cudaMemcpyDeviceToHost);
    cudaMemcpy(secret_keys.data(), d_secret_key, length_secret_key * batch, cudaMemcpyDeviceToHost);

    /*
     * Free the device memory associated with the cryptographic process.
     */
    cudaFree(d_public_key);
    cudaFree(d_secret_key);
    destroy_workspace(workspace);
    release_entropy(randombytes);
}
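Note that ml_kem_keygen copies into `public_keys.data()` and `secret_keys.data()` without resizing, so the caller has to size both vectors beforehand. A minimal sketch of that step follows. The `KeyBuffers` struct and `make_key_buffers` helper are hypothetical and not part of cuPQC-PK; the per-key sizes are passed in as parameters, and in real code they would presumably come from `MLKEM512Key::public_key_size` and `MLKEM512Key::secret_key_size`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two host-side batched key buffers.
struct KeyBuffers {
    std::vector<std::uint8_t> public_keys;
    std::vector<std::uint8_t> secret_keys;
};

// Hypothetical helper: allocate zero-initialized host buffers large enough
// for `batch` key pairs, given the per-key sizes in bytes.
inline KeyBuffers make_key_buffers(unsigned int batch,
                                   std::size_t public_key_size,
                                   std::size_t secret_key_size)
{
    assert(batch > 0 && "batch must be at least 1");
    KeyBuffers buffers;
    buffers.public_keys.resize(public_key_size * batch);
    buffers.secret_keys.resize(secret_key_size * batch);
    return buffers;
}
```

A caller could then pass `buffers.public_keys`, `buffers.secret_keys`, and the same `batch` value to ml_kem_keygen, so that the host buffers match the sizes used in the cudaMemcpy calls.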
Compilation
To compile a program that includes pk.hpp and uses the cuPQC-PK API, pass the location of the
cuPQC-PK headers to the compiler and link against the provided static library. Details on using cuPQC-PK in your project are in
the Getting Started guide. The cuPQC-PK library is built with link-time optimization (LTO),
so the corresponding flags (-dlto and an -arch value for your target GPU architecture) are required when compiling.
nvcc -std=c++17 -dlto -arch=sm_80 \
-I<path_to_cupqc_sdk_include> -I<path_to_commondx> \
-L<path_to_static_lib> -lcupqc-pk \
cupqc_pk_intro_example.cu -o cupqc_pk_intro_example
For detailed installation and compilation instructions, see the Getting Started guide. For information on supported features, see SDK Features.