SDK Features

The cuPQC SDK provides device-side libraries that integrate directly into CUDA kernels. This section details the functional capabilities, supported algorithms, and key features of each library:

  • cuPQC-PK: Post-quantum key encapsulation and digital signatures

  • cuPQC-Hash: Cryptographic hash functions and Merkle tree operations

cuPQC-PK: Post-Quantum Cryptography

The cuPQC-PK library provides GPU-accelerated implementations of NIST-standardized post-quantum cryptographic algorithms. Designed as a device-side library, cuPQC-PK integrates directly into your CUDA kernels, enabling you to fuse cryptographic operations with your application logic for maximum performance and minimal latency.

The library includes protections against side-channel attacks and supports high-throughput batched operations, which suits applications that need both quantum-resistant security and high throughput.

Supported Algorithms

The cuPQC-PK library supports ML-KEM (Module-Lattice-Based Key Encapsulation Mechanism, FIPS-203) for quantum-resistant key establishment, and ML-DSA (Module-Lattice-Based Digital Signature Algorithm, FIPS-204) for post-quantum digital signatures and authentication.

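| Algorithm Family | Parameter Set | NIST Security Category |
|------------------|---------------|------------------------|
| ML-KEM           | ML-KEM-512    | 1 (≈ AES-128)          |
| ML-KEM           | ML-KEM-768    | 3 (≈ AES-192)          |
| ML-KEM           | ML-KEM-1024   | 5 (≈ AES-256)          |
| ML-DSA           | ML-DSA-44     | 2 (≈ SHA-256)          |
| ML-DSA           | ML-DSA-65     | 3 (≈ AES-192)          |
| ML-DSA           | ML-DSA-87     | 5 (≈ AES-256)          |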

ML-KEM (FIPS-203)

ML-KEM, or Module-Lattice-Based Key Encapsulation Mechanism, is standardized under FIPS-203 by NIST. The library includes highly optimized implementations of the following ML-KEM operations:

  • Key Generation: Generates an encapsulation key and decapsulation key pair.

  • Encapsulation: Uses the encapsulation key to produce a shared secret and a ciphertext that encapsulates it.

  • Decapsulation: Uses the decapsulation key to recover the shared secret from the ciphertext.

The library supports multiple parameter sets to cater to various security levels and performance requirements:

  • ML-KEM-512

  • ML-KEM-768

  • ML-KEM-1024
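The choice of parameter set determines how much memory a batch of ML-KEM operations needs. The following is a minimal host-side C++ sketch, not cuPQC code: the byte sizes are those listed in FIPS 203, while `MlKemSizes`, `keygen_batch_bytes`, and `encaps_batch_bytes` are hypothetical names, and the back-to-back buffer layout is an assumption rather than a layout the library is known to require.

```cpp
#include <cassert>
#include <cstddef>

// Per-parameter-set byte sizes as specified in FIPS 203.
// The type and function names here are hypothetical, not part of cuPQC.
struct MlKemSizes {
    std::size_t encapsulation_key;
    std::size_t decapsulation_key;
    std::size_t ciphertext;
    std::size_t shared_secret;
};

constexpr MlKemSizes kMlKem512{800, 1632, 768, 32};
constexpr MlKemSizes kMlKem768{1184, 2400, 1088, 32};
constexpr MlKemSizes kMlKem1024{1568, 3168, 1568, 32};

// Bytes needed to store the key pairs from `batch` independent key
// generations, assuming the keys are packed back to back.
constexpr std::size_t keygen_batch_bytes(const MlKemSizes& s, std::size_t batch) {
    return batch * (s.encapsulation_key + s.decapsulation_key);
}

// Bytes needed to store the ciphertexts and shared secrets produced by
// `batch` independent encapsulations, under the same packing assumption.
constexpr std::size_t encaps_batch_bytes(const MlKemSizes& s, std::size_t batch) {
    return batch * (s.ciphertext + s.shared_secret);
}

static_assert(keygen_batch_bytes(kMlKem768, 1000) == 3584000,
              "1000 ML-KEM-768 key pairs occupy 3,584,000 bytes");
```

For example, `encaps_batch_bytes(kMlKem1024, 10)` evaluates to 16000, since each ML-KEM-1024 encapsulation yields a 1568-byte ciphertext and a 32-byte shared secret.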

ML-DSA (FIPS-204)

ML-DSA, or Module-Lattice-Based Digital Signature Algorithm, is standardized under FIPS-204 by NIST. The library offers optimized implementations of the following ML-DSA operations:

  • Key Generation: Generates a public/private key pair for signing and verification.

  • Signing: Produces a digital signature for a given message using the private key.

  • Verification: Checks a signature against the message and the signer's public key.

Similar to ML-KEM, the library supports various parameter sets to address different security needs:

  • ML-DSA-44

  • ML-DSA-65

  • ML-DSA-87
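Because the parameter set fixes the key and signature lengths, it can be convenient to expose them at compile time. The sketch below is one possible host-side C++ arrangement and not the cuPQC API: the byte sizes come from FIPS 204, whereas `MlDsa`, `MlDsaSizes`, and `signature_offset` are hypothetical names, and the packed signature layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag for the three ML-DSA parameter sets (not a cuPQC type).
enum class MlDsa { P44, P65, P87 };

// Byte sizes as specified in FIPS 204, selected at compile time.
template <MlDsa P> struct MlDsaSizes;

template <> struct MlDsaSizes<MlDsa::P44> {
    static constexpr std::size_t public_key = 1312;
    static constexpr std::size_t private_key = 2560;
    static constexpr std::size_t signature = 2420;
};

template <> struct MlDsaSizes<MlDsa::P65> {
    static constexpr std::size_t public_key = 1952;
    static constexpr std::size_t private_key = 4032;
    static constexpr std::size_t signature = 3309;
};

template <> struct MlDsaSizes<MlDsa::P87> {
    static constexpr std::size_t public_key = 2592;
    static constexpr std::size_t private_key = 4896;
    static constexpr std::size_t signature = 4627;
};

// Byte offset of the i-th signature in a batch, assuming signatures are
// stored contiguously with no padding between them.
template <MlDsa P>
constexpr std::size_t signature_offset(std::size_t i) {
    return i * MlDsaSizes<P>::signature;
}

static_assert(signature_offset<MlDsa::P44>(2) == 4840,
              "third ML-DSA-44 signature starts at byte 4840");
```

Selecting sizes through a template parameter keeps buffer arithmetic free of runtime branches, which is one reason a compile-time selection can be attractive in kernel code.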

Key Features

  • High Performance: Exploits the parallel processing power of NVIDIA GPUs to accelerate cryptographic operations.

  • Batched Operations: Supports batch processing to enhance throughput and efficiency.

  • Security: Implements the NIST-standardized ML-KEM (FIPS-203) and ML-DSA (FIPS-204) algorithms.

  • Side-Channel Security: Incorporates measures to protect cryptographic operations against side-channel attacks.

cuPQC-Hash: Cryptographic Hash Functions

The cuPQC-Hash library provides GPU-optimized implementations of cryptographic hash functions designed for direct integration into CUDA kernels. As a device-side library, cuPQC-Hash enables you to perform hashing operations within your kernels, eliminating data movement overhead and maximizing throughput.

Supported Hash Functions

The cuPQC-Hash library supports a comprehensive range of cryptographic hash functions including SHA-2, SHA-3, SHAKE, and Poseidon2 variants.

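| Hash Family | Algorithm       | Output Size (bits) |
|-------------|-----------------|--------------------|
| SHA-2       | SHA2-224        | 224                |
| SHA-2       | SHA2-256        | 256                |
| SHA-2       | SHA2-384        | 384                |
| SHA-2       | SHA2-512        | 512                |
| SHA-2       | SHA2-512/224    | 224                |
| SHA-2       | SHA2-512/256    | 256                |
| SHA-3       | SHA3-224        | 224                |
| SHA-3       | SHA3-256        | 256                |
| SHA-3       | SHA3-384        | 384                |
| SHA-3       | SHA3-512        | 512                |
| SHAKE       | SHAKE128        | Variable (XOF)     |
| SHAKE       | SHAKE256        | Variable (XOF)     |
| Poseidon2   | BabyBear-16/24  | Variable           |
| Poseidon2   | KoalaBear-16/24 | Variable           |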
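The difference between the fixed output sizes and the "Variable (XOF)" entries can be made concrete with a little arithmetic. The sketch below is a host-side C++ illustration based on the Keccak sponge parameters in FIPS 202 (a 1600-bit state, with a capacity of 256 bits for SHAKE128 and 512 bits for SHAKE256). All function and constant names are hypothetical, and nothing here reflects how cuPQC-Hash exposes these functions.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-output functions: digest length in bytes is the output size in bits / 8.
constexpr std::size_t digest_bytes(std::size_t output_bits) {
    return output_bits / 8;
}

// Keccak-based functions split a 1600-bit state into rate and capacity.
// The rate is the number of output bytes available per squeezed block.
constexpr std::size_t keccak_rate_bytes(std::size_t capacity_bits) {
    return (1600 - capacity_bits) / 8;
}

constexpr std::size_t kShake128RateBytes = keccak_rate_bytes(256);
constexpr std::size_t kShake256RateBytes = keccak_rate_bytes(512);

static_assert(kShake128RateBytes == 168, "SHAKE128 rate is 168 bytes");
static_assert(kShake256RateBytes == 136, "SHAKE256 rate is 136 bytes");

// Number of rate-sized blocks an extendable-output function (XOF) has to
// squeeze to deliver `out_bytes` bytes of output (rounded up).
constexpr std::size_t squeeze_blocks(std::size_t rate_bytes, std::size_t out_bytes) {
    return (out_bytes + rate_bytes - 1) / rate_bytes;
}
```

Under these parameters, a 32-byte SHAKE128 output fits in one squeezed block, while a 1000-byte SHAKE256 output spans eight blocks.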

Supported Merkle Tree Configurations

The cuPQC-Hash library provides two distinct computation modes optimized for different use cases:

  • Single-Block Mode (High Throughput): One thread block computes one complete Merkle tree. This mode enables efficient batching where multiple trees can be calculated in parallel across different blocks, maximizing throughput for applications that need to compute many independent Merkle trees.

  • Multi-Block Mode (Low Latency): Multiple thread blocks collaborate to compute a single large Merkle tree by generating subtrees that are combined. This parallel approach reduces latency for computing individual large trees by leveraging more GPU resources simultaneously.
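The two modes produce the same root, because a tree over a power-of-two number of leaves can be assembled from the roots of equally sized subtrees. The sketch below demonstrates this on the host in C++ under stated assumptions: `hash_pair` is a toy, non-cryptographic stand-in for a real hash function, leaves are treated as already-hashed nodes, and `merkle_root` and `merkle_root_from_subtrees` are hypothetical names unrelated to the cuPQC-Hash API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Node = std::uint64_t;

// Toy, NON-cryptographic mixing function used only to make the example
// self-contained. A real tree would use one of the supported hash functions.
inline Node hash_pair(Node left, Node right) {
    Node h = left * 0x9E3779B97F4A7C15ULL;
    h ^= (h >> 32);
    h += right;
    h *= 0xC2B2AE3D27D4EB4FULL;
    h ^= (h >> 29);
    return h;
}

// Single-tree reduction: hash adjacent pairs level by level until one node
// remains. Requires a non-zero power-of-two number of nodes.
inline Node merkle_root(std::vector<Node> level) {
    assert(!level.empty() && (level.size() & (level.size() - 1)) == 0);
    while (level.size() > 1) {
        std::vector<Node> parent(level.size() / 2);
        for (std::size_t i = 0; i < parent.size(); ++i) {
            parent[i] = hash_pair(level[2 * i], level[2 * i + 1]);
        }
        level = std::move(parent);
    }
    return level[0];
}

// Subtree-style reduction: reduce each chunk of `subtree_leaves` leaves
// independently (the part that separate thread blocks could do in parallel),
// then build the top of the tree from the subtree roots.
inline Node merkle_root_from_subtrees(const std::vector<Node>& leaves,
                                      std::size_t subtree_leaves) {
    assert(subtree_leaves != 0 && leaves.size() % subtree_leaves == 0);
    std::vector<Node> roots;
    for (std::size_t start = 0; start < leaves.size(); start += subtree_leaves) {
        auto first = leaves.begin() + static_cast<std::ptrdiff_t>(start);
        auto last = first + static_cast<std::ptrdiff_t>(subtree_leaves);
        roots.push_back(merkle_root(std::vector<Node>(first, last)));
    }
    return merkle_root(roots);
}
```

Calling `merkle_root_from_subtrees(leaves, 8)` on 64 leaves yields the same value as `merkle_root(leaves)`; only the way the work is partitioned differs.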

Both modes support power-of-two leaf counts, with specific functions available for different tree sizes and use cases:

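| Function          | Tree Size (# of Leaves) | Subtree Size (# of Leaves) |
|-------------------|-------------------------|----------------------------|
| Generate Tree     | 2^2 – 2^21              | N/A                        |
| Generate Sub Tree | 2^16 – 2^22             | 2^10 – 2^12                |
| Generate Proof    | 2^2 – 2^22              | N/A                        |
| Verify Proof      | 2^2 – 2^22              | N/A                        |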
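To illustrate what tree generation, proof generation, and proof verification compute, here is a minimal host-side C++ sketch. It is not the cuPQC-Hash API: `hash_pair` is a toy, non-cryptographic placeholder for a real hash, leaves are treated as already-hashed nodes, and `valid_leaf_count`, `generate_tree`, `generate_proof`, and `verify_proof` are hypothetical names chosen to mirror the table rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Node = std::uint64_t;

// Toy, NON-cryptographic stand-in for a real hash function.
inline Node hash_pair(Node left, Node right) {
    Node h = left * 0x9E3779B97F4A7C15ULL;
    h ^= (h >> 32);
    h += right;
    h *= 0xC2B2AE3D27D4EB4FULL;
    h ^= (h >> 29);
    return h;
}

// True when n is a power of two within [2^min_log2, 2^max_log2].
constexpr bool valid_leaf_count(std::size_t n, unsigned min_log2, unsigned max_log2) {
    return n != 0 && (n & (n - 1)) == 0 &&
           n >= (std::size_t{1} << min_log2) && n <= (std::size_t{1} << max_log2);
}

// Builds every level of the tree: levels[0] holds the leaves and
// levels.back() holds the single root node.
inline std::vector<std::vector<Node>> generate_tree(const std::vector<Node>& leaves) {
    std::vector<std::vector<Node>> levels;
    levels.push_back(leaves);
    while (levels.back().size() > 1) {
        const std::vector<Node>& cur = levels.back();
        std::vector<Node> parent(cur.size() / 2);
        for (std::size_t i = 0; i < parent.size(); ++i) {
            parent[i] = hash_pair(cur[2 * i], cur[2 * i + 1]);
        }
        levels.push_back(std::move(parent));
    }
    return levels;
}

// Collects the sibling of the path node at each level, from the leaf level
// up to the level just below the root.
inline std::vector<Node> generate_proof(const std::vector<std::vector<Node>>& levels,
                                        std::size_t index) {
    std::vector<Node> proof;
    for (std::size_t depth = 0; depth + 1 < levels.size(); ++depth) {
        proof.push_back(levels[depth][index ^ 1]);
        index >>= 1;
    }
    return proof;
}

// Recomputes the root from a leaf and its sibling path. The low bit of the
// index says whether the current node is a right child (1) or left child (0).
inline bool verify_proof(Node leaf, std::size_t index,
                         const std::vector<Node>& proof, Node root) {
    Node acc = leaf;
    for (Node sibling : proof) {
        acc = (index & 1) ? hash_pair(sibling, acc) : hash_pair(acc, sibling);
        index >>= 1;
    }
    return acc == root;
}
```

A proof for a tree with 2^k leaves contains k sibling nodes, so verification costs k hash evaluations regardless of how many leaves the tree has.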

Supported Merkle Tree Hash Functions

The following hash algorithms are supported for Merkle Tree operations, with their respective I/O data types:

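| Hash Family | Algorithm       | Precision (I/O Data Type) |
|-------------|-----------------|---------------------------|
| SHA-2       | SHA2-256        | uint8_t                   |
| SHA-2       | SHA2-512        | uint8_t                   |
| SHA-3       | SHA3-256        | uint8_t                   |
| SHA-3       | SHA3-512        | uint8_t                   |
| SHAKE       | SHAKE128        | uint8_t                   |
| SHAKE       | SHAKE256        | uint8_t                   |
| Poseidon2   | BabyBear-16/24  | uint32_t                  |
| Poseidon2   | KoalaBear-16/24 | uint32_t                  |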
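The split between `uint8_t` and `uint32_t` reflects what each hash family operates on: the SHA-2, SHA-3, and SHAKE functions consume and produce bytes, whereas Poseidon2 works on elements of a prime field that fit in 32 bits. The host-side C++ sketch below illustrates this under stated assumptions. The BabyBear and KoalaBear moduli are the commonly published values (2^31 - 2^27 + 1 and 2^31 - 2^24 + 1); the type alias and the `add_mod` and `mul_mod` helpers are hypothetical and say nothing about how cuPQC-Hash represents or reduces field elements internally.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Byte-oriented hashes: a SHA2-256 digest is 32 bytes of uint8_t.
using Sha2_256Digest = std::array<std::uint8_t, 32>;

// Field moduli for the two Poseidon2 variants; each fits in a uint32_t.
constexpr std::uint32_t kBabyBearPrime = 2013265921u;   // 2^31 - 2^27 + 1
constexpr std::uint32_t kKoalaBearPrime = 2130706433u;  // 2^31 - 2^24 + 1

static_assert(kBabyBearPrime == (1u << 31) - (1u << 27) + 1u, "BabyBear modulus");
static_assert(kKoalaBearPrime == (1u << 31) - (1u << 24) + 1u, "KoalaBear modulus");

// Modular addition of two reduced field elements (a, b < p). The sum is
// widened to 64 bits so that it cannot overflow before the reduction.
constexpr std::uint32_t add_mod(std::uint32_t a, std::uint32_t b, std::uint32_t p) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) + b) % p);
}

// Modular multiplication of two reduced field elements, using a 64-bit
// intermediate product. Production code would typically use a faster
// reduction (for example Montgomery form); this version favors clarity.
constexpr std::uint32_t mul_mod(std::uint32_t a, std::uint32_t b, std::uint32_t p) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) % p);
}
```

As a quick check, `mul_mod(p - 1, p - 1, p)` equals 1 for either modulus, because p - 1 represents -1 in the field.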

Key Features

  • High Performance: GPU-optimized implementations with batched operations deliver exceptional throughput and maximize GPU utilization.

  • Merkle Tree Support: High-performance Merkle Tree API for efficient proof generation and verification, compatible with supported hash functions.

  • Flexible Merkle Tree Modes: Choose between single-block mode for high-throughput batch processing of multiple trees, or multi-block mode for low-latency computation of individual large trees.

  • Comprehensive Algorithm Support: Includes NIST-standardized hash functions (SHA-2, SHA-3, SHAKE) for traditional cryptographic applications and Poseidon2 (BabyBear/KoalaBear fields) optimized for zero-knowledge proof systems.