SDK Features

The cuPQC SDK provides device-side libraries that integrate directly into CUDA kernels. This section details the functional capabilities, supported algorithms, and key features of each library:

cuPQC-PK: Post-quantum key encapsulation and digital signatures

cuPQC-Hash: Cryptographic hash functions and Merkle tree operations

cuPQC-PK: Post-Quantum Cryptography

The cuPQC-PK library provides GPU-accelerated implementations of NIST-standardized post-quantum cryptographic algorithms. As a device-side library, cuPQC-PK is called from inside your CUDA kernels, so you can fuse cryptographic operations with your application logic to improve performance and reduce latency.

The library implements side-channel protections and supports high-throughput batched operations, which suits applications that need both quantum-resistant security and high throughput.

Supported Algorithms

The cuPQC-PK library supports ML-KEM (Module-Lattice-Based Key Encapsulation Mechanism, FIPS-203) for quantum-resistant key establishment, and ML-DSA (Module-Lattice-Based Digital Signature Algorithm, FIPS-204) for post-quantum digital signatures and authentication.

Algorithm Family	Parameter Set	NIST Security Category
ML-KEM	ML-KEM-512	1 (≈ AES-128)
ML-KEM	ML-KEM-768	3 (≈ AES-192)
ML-KEM	ML-KEM-1024	5 (≈ AES-256)
ML-DSA	ML-DSA-44	2 (≈ SHA-256)
ML-DSA	ML-DSA-65	3 (≈ AES-192)
ML-DSA	ML-DSA-87	5 (≈ AES-256)

ML-KEM (FIPS-203)

ML-KEM, or Module-Lattice-Based Key Encapsulation Mechanism, is standardized under FIPS-203 by NIST. The library includes highly optimized implementations of the following ML-KEM operations:

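Key Generation: Generates an encapsulation (public) key and a decapsulation (secret) key.
Encapsulation: Uses the encapsulation key to produce a shared secret and the ciphertext that carries it.
Decapsulation: Uses the decapsulation key to recover the shared secret from the ciphertext.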

The library supports multiple parameter sets to cater to various security levels and performance requirements:

ML-KEM-512
ML-KEM-768
ML-KEM-1024
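When batching ML-KEM operations, the buffer sizes follow directly from the parameter set. The sketch below records the byte lengths that FIPS 203 specifies for each parameter set and derives output buffer sizes for a batch. The struct, constant, and function names are hypothetical and are not part of the cuPQC API; the code is plain host-side C++ meant only to illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Byte lengths per FIPS 203. All names here are hypothetical helpers,
// not cuPQC identifiers.
struct MlKemSizes {
    std::size_t encaps_key;     // public (encapsulation) key
    std::size_t decaps_key;     // secret (decapsulation) key
    std::size_t ciphertext;
    std::size_t shared_secret;
};

constexpr MlKemSizes kMlKem512{800, 1632, 768, 32};
constexpr MlKemSizes kMlKem768{1184, 2400, 1088, 32};
constexpr MlKemSizes kMlKem1024{1568, 3168, 1568, 32};

// Bytes needed to store the key pairs produced by `batch` key generations.
inline std::size_t keygen_output_bytes(const MlKemSizes& p, std::size_t batch) {
    assert(batch > 0 && "a batch holds at least one operation");
    return batch * (p.encaps_key + p.decaps_key);
}

// Bytes needed to store the ciphertexts and shared secrets produced by
// `batch` encapsulations.
inline std::size_t encaps_output_bytes(const MlKemSizes& p, std::size_t batch) {
    assert(batch > 0 && "a batch holds at least one operation");
    return batch * (p.ciphertext + p.shared_secret);
}
```

For example, `keygen_output_bytes(kMlKem768, 1000)` evaluates to 3,584,000 bytes, which is the amount of device memory a batch of 1000 ML-KEM-768 key pairs would occupy if stored contiguously.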

ML-DSA (FIPS-204)

ML-DSA, or Module-Lattice-Based Digital Signature Algorithm, is standardized under FIPS-204 by NIST. The library offers optimized implementations of the following ML-DSA operations:

Key Generation: Generates a public key and private key pair for signing.
Signing: Produces a digital signature over a given message using the private key.
Verification: Checks a signature against the message and the signer's public key.

As with ML-KEM, the library supports several parameter sets to address different security needs:

ML-DSA-44
ML-DSA-65
ML-DSA-87
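ML-DSA signature lengths are not multiples of common alignment boundaries, which matters when laying out a batch of signatures in one buffer. The sketch below records the byte lengths from FIPS 204 and computes a padded per-signature stride. The padded layout is an assumption made for illustration, not something the cuPQC documentation prescribes, and every name in the code is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Byte lengths per FIPS 204. Names are hypothetical, not cuPQC identifiers.
struct MlDsaSizes {
    std::size_t public_key;
    std::size_t private_key;
    std::size_t signature;
};

constexpr MlDsaSizes kMlDsa44{1312, 2560, 2420};
constexpr MlDsaSizes kMlDsa65{1952, 4032, 3309};
constexpr MlDsaSizes kMlDsa87{2592, 4896, 4627};

// Round `bytes` up to the next multiple of `alignment` (a power of two).
inline std::size_t round_up(std::size_t bytes, std::size_t alignment) {
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (bytes + alignment - 1) & ~(alignment - 1);
}

// Byte offset of signature `index` in a batch buffer where each slot is
// padded to `alignment` bytes (an assumed layout).
inline std::size_t signature_offset(const MlDsaSizes& p, std::size_t index,
                                    std::size_t alignment) {
    return index * round_up(p.signature, alignment);
}
```

With 16-byte alignment, an ML-DSA-65 signature of 3309 bytes occupies a 3312-byte slot, so the fourth signature in a batch (index 3) would start at byte 9936.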

Key Features

High Performance: Exploits the parallel processing power of NVIDIA GPUs to accelerate cryptographic operations.
Batched Operations: Supports batch processing to enhance throughput and efficiency.
Security: Implements the NIST-standardized ML-KEM (FIPS-203) and ML-DSA (FIPS-204) algorithms.
Side-Channel Security: Incorporates measures to protect cryptographic operations against side-channel attacks.
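The source does not say which side-channel measures the library uses. As a generic illustration of the kind of technique involved, the sketch below shows a comparison and a selection whose control flow does not depend on secret data. Both are standard constant-time idioms (ML-KEM decapsulation in FIPS 203, for instance, compares a re-encrypted ciphertext and then selects between two candidate secrets). The function names are hypothetical, and a compiler may still transform such code, so this is a sketch rather than a hardened implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compares two buffers while always reading every byte, so the running
// time does not reveal the position of the first mismatch.
inline bool constant_time_equal(const std::uint8_t* a, const std::uint8_t* b,
                                std::size_t len) {
    assert(a != nullptr && b != nullptr);
    std::uint8_t diff = 0;
    for (std::size_t i = 0; i < len; ++i) {
        diff |= static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return diff == 0;
}

// Returns x when choose_x is true and y otherwise, using a bit mask
// instead of a data-dependent branch.
inline std::uint8_t constant_time_select(std::uint8_t x, std::uint8_t y,
                                         bool choose_x) {
    // mask is 0xFF when choose_x is true, 0x00 otherwise.
    const std::uint8_t mask =
        static_cast<std::uint8_t>(0u - static_cast<unsigned>(choose_x));
    return static_cast<std::uint8_t>((x & mask) |
                                     (y & static_cast<std::uint8_t>(~mask)));
}
```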

cuPQC-Hash: Cryptographic Hash Functions

The cuPQC-Hash library provides GPU-optimized implementations of cryptographic hash functions designed for direct integration into CUDA kernels. As a device-side library, cuPQC-Hash enables you to perform hashing operations within your kernels, eliminating data movement overhead and maximizing throughput.

Supported Hash Functions

The cuPQC-Hash library supports the SHA-2, SHA-3, and SHAKE families as well as Poseidon2 variants.

Hash Family	Algorithm	Output Size (bits)
SHA-2	SHA2-224	224
SHA-2	SHA2-256	256
SHA-2	SHA2-384	384
SHA-2	SHA2-512	512
SHA-2	SHA2-512/224	224
SHA-2	SHA2-512/256	256
SHA-3	SHA3-224	224
SHA-3	SHA3-256	256
SHA-3	SHA3-384	384
SHA-3	SHA3-512	512
SHAKE	SHAKE128	Variable (XOF)
SHAKE	SHAKE256	Variable (XOF)
Poseidon2	BabyBear-16/24	Variable
Poseidon2	KoalaBear-16/24	Variable

Supported Merkle Tree Configurations

The cuPQC-Hash library provides two distinct computation modes optimized for different use cases:

Single-Block Mode (High Throughput): One thread block computes one complete Merkle tree. This mode enables efficient batching where multiple trees can be calculated in parallel across different blocks, maximizing throughput for applications that need to compute many independent Merkle trees.
Multi-Block Mode (Low Latency): Multiple thread blocks collaborate to compute a single large Merkle tree by generating subtrees that are combined. This parallel approach reduces latency for computing individual large trees by leveraging more GPU resources simultaneously.
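The relationship between the two modes can be sketched on the host: computing the roots of equal-sized subtrees independently and then building a small tree over those roots gives the same result as reducing all leaves in one pass. The code below is an illustration under stated assumptions. It uses a placeholder 64-bit mixing function instead of a cryptographic hash, runs sequentially rather than across thread blocks, and all names are hypothetical rather than cuPQC functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Digest = std::uint64_t;

// Placeholder 2-to-1 compression function. NOT cryptographic; a real tree
// would call one of the supported hash functions here.
inline Digest toy_hash_pair(Digest left, Digest right) {
    Digest h = left * 0x9E3779B97F4A7C15ULL;
    h ^= h >> 32;
    h += right;
    h *= 0xBF58476D1CE4E5B9ULL;
    h ^= h >> 29;
    return h;
}

// Reduces a power-of-two number of nodes to one root, level by level.
// Corresponds to one worker building a complete tree.
inline Digest merkle_root(std::vector<Digest> nodes) {
    assert(!nodes.empty() && (nodes.size() & (nodes.size() - 1)) == 0);
    while (nodes.size() > 1) {
        for (std::size_t i = 0; i < nodes.size() / 2; ++i) {
            nodes[i] = toy_hash_pair(nodes[2 * i], nodes[2 * i + 1]);
        }
        nodes.resize(nodes.size() / 2);
    }
    return nodes[0];
}

// Splits the leaves into equal subtrees, computes each subtree root
// independently (the part that separate workers could do in parallel),
// then combines the subtree roots into the final root.
inline Digest merkle_root_from_subtrees(const std::vector<Digest>& leaves,
                                        std::size_t subtree_leaves) {
    assert(subtree_leaves > 0 && leaves.size() % subtree_leaves == 0);
    std::vector<Digest> subtree_roots;
    for (std::size_t start = 0; start < leaves.size(); start += subtree_leaves) {
        const auto first = leaves.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(subtree_leaves);
        subtree_roots.push_back(merkle_root(std::vector<Digest>(first, last)));
    }
    return merkle_root(subtree_roots);
}
```

Because both the subtree size and the number of subtrees are powers of two, the subtree boundaries line up with the levels of the full tree, which is why the two functions return the same root for the same leaves.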

Both modes support power-of-two leaf counts, with specific functions available for different tree sizes and use cases:

Function	Tree Size (# of Leaves)	Subtree Size (# of Leaves)
Generate Tree	2² – 2²¹	N/A
Generate Sub Tree	2¹⁶ – 2²²	2¹⁰ – 2¹²
Generate Proof	2² – 2²²	N/A
Verify Proof	2² – 2²²	N/A
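To make the proof operations in the table concrete, the sketch below generates and checks a Merkle inclusion proof on the host. A proof for a tree with 2^k leaves is the list of k sibling digests along the path from the leaf to the root, so the largest supported tree (2^22 leaves) would need 22 siblings. The hash is a non-cryptographic placeholder and the function names are hypothetical; they only mirror the operation names above and are not the cuPQC API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Digest = std::uint64_t;

// Placeholder 2-to-1 compression function. NOT cryptographic.
inline Digest toy_hash_pair(Digest left, Digest right) {
    Digest h = left * 0x9E3779B97F4A7C15ULL;
    h ^= h >> 32;
    h += right;
    h *= 0xBF58476D1CE4E5B9ULL;
    h ^= h >> 29;
    return h;
}

// Collects the sibling digest at every level, from the leaves upward.
inline std::vector<Digest> generate_proof(std::vector<Digest> level,
                                          std::size_t leaf_index) {
    assert(level.size() >= 2 && (level.size() & (level.size() - 1)) == 0);
    assert(leaf_index < level.size());
    std::vector<Digest> proof;
    std::size_t index = leaf_index;
    while (level.size() > 1) {
        proof.push_back(level[index ^ 1]);  // sibling differs in the low bit
        for (std::size_t i = 0; i < level.size() / 2; ++i) {
            level[i] = toy_hash_pair(level[2 * i], level[2 * i + 1]);
        }
        level.resize(level.size() / 2);
        index /= 2;
    }
    return proof;
}

// Recomputes the root from a leaf and its sibling path.
inline Digest root_from_proof(Digest leaf, std::size_t leaf_index,
                              const std::vector<Digest>& proof) {
    Digest node = leaf;
    std::size_t index = leaf_index;
    for (Digest sibling : proof) {
        // An odd index means the current node is the right child.
        node = (index & 1) ? toy_hash_pair(sibling, node)
                           : toy_hash_pair(node, sibling);
        index /= 2;
    }
    return node;
}

// A proof is valid when the recomputed root matches the trusted root.
inline bool verify_proof(Digest leaf, std::size_t leaf_index,
                         const std::vector<Digest>& proof, Digest root) {
    return root_from_proof(leaf, leaf_index, proof) == root;
}
```

The verifier needs only the leaf, its index, the sibling path, and the trusted root; it never sees the other leaves.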

Supported Merkle Tree Hash Functions

The following hash algorithms are supported for Merkle Tree operations. The Precision column gives the I/O data type for each:

Hash Family	Algorithm	Precision
SHA-2	SHA2-256	uint8_t
SHA-2	SHA2-512	uint8_t
SHA-3	SHA3-256	uint8_t
SHA-3	SHA3-512	uint8_t
SHAKE	SHAKE128	uint8_t
SHAKE	SHAKE256	uint8_t
Poseidon2	BabyBear-16/24	uint32_t
Poseidon2	KoalaBear-16/24	uint32_t
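The difference in I/O type reflects what each hash family operates on: the SHA and SHAKE functions consume bytes, while Poseidon2 works on elements of a prime field that fit in 32 bits. The sketch below shows the BabyBear and KoalaBear primes and basic modular arithmetic on `uint32_t` values. It is a plain reference-style illustration with hypothetical names; the source does not describe how cuPQC represents field elements internally, and an optimized implementation would likely use a different reduction method.

```cpp
#include <cassert>
#include <cstdint>

// Field moduli. Both are below 2^31, so an element fits in a uint32_t.
constexpr std::uint32_t kBabyBearPrime = 2013265921u;   // 2^31 - 2^27 + 1
constexpr std::uint32_t kKoalaBearPrime = 2130706433u;  // 2^31 - 2^24 + 1

// (a + b) mod p for reduced inputs. The sum cannot overflow 32 bits
// because a and b are each below 2^31.
inline std::uint32_t field_add(std::uint32_t a, std::uint32_t b,
                               std::uint32_t p) {
    assert(a < p && b < p);
    const std::uint32_t sum = a + b;
    return sum >= p ? sum - p : sum;
}

// (a * b) mod p for reduced inputs, using a 64-bit intermediate product.
inline std::uint32_t field_mul(std::uint32_t a, std::uint32_t b,
                               std::uint32_t p) {
    assert(a < p && b < p);
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) % p);
}
```

For instance, `field_mul(p - 1, p - 1, p)` returns 1 for either prime, since `p - 1` represents -1 in the field.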

Key Features

High Performance: GPU-optimized implementations with batched operations deliver high throughput and keep the GPU well utilized.
Merkle Tree Support: High-performance Merkle Tree API for efficient proof generation and verification, compatible with supported hash functions.
Flexible Merkle Tree Modes: Choose between single-block mode for high-throughput batch processing of multiple trees, or multi-block mode for low-latency computation of individual large trees.
Comprehensive Algorithm Support: Includes NIST-standardized hash functions (SHA-2, SHA-3, SHAKE) for traditional cryptographic applications and Poseidon2 (BabyBear/KoalaBear fields) optimized for zero-knowledge proof systems.