cuHash Library
The first step in using cuHash is defining the hash operation we wish to perform. We specify an operation by adding together cuHash operators to form a hash descriptor. A well-defined cuHash descriptor must include the algorithm, the security category, and either the thread or the warp configuration. Note that for the Poseidon2-BabyBear and Merkle Tree algorithms, the security category is not required.
As an example, we construct a SHA-3 256 hashing object using the warp configuration, that is, a SHA-3 hash computed by a full warp.
//cuHash header
#include <cuhash.hpp>
//We use the cuPQC SDK namespace.
using namespace cupqc;
//cuHash descriptor
using SHA3_256_WARP = decltype(SHA3_256() + Warp());
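To make the `decltype(A() + B())` idiom less opaque, here is a minimal host-only sketch of how adding tag objects can produce a descriptor type. Every name in it (`ToySHA3_256`, `ToyWarp`, `ToyThread`, `ToyDescriptor`) is hypothetical and invented for illustration; it is not how cuHash is implemented, only one way such an interface could be built.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for cuHash operators (not the real ones).
struct ToySHA3_256 { static constexpr unsigned digest_size = 32; };  // 256 bits
struct ToyWarp     { static constexpr unsigned threads = 32; };      // one warp
struct ToyThread   { static constexpr unsigned threads = 1; };       // one thread

// The descriptor records, as a type, which operators were combined.
template <class Algorithm, class Execution>
struct ToyDescriptor {
    static constexpr unsigned digest_size = Algorithm::digest_size;
    static constexpr unsigned threads     = Execution::threads;
};

// operator+ does no runtime work; only its return type matters.
template <class Execution>
constexpr ToyDescriptor<ToySHA3_256, Execution> operator+(ToySHA3_256, Execution) {
    return {};
}

// Same shape as the SHA3_256_WARP alias above, built from the toy types.
using TOY_SHA3_256_WARP   = decltype(ToySHA3_256() + ToyWarp());
using TOY_SHA3_256_THREAD = decltype(ToySHA3_256() + ToyThread());

static_assert(std::is_same<TOY_SHA3_256_WARP,
                           ToyDescriptor<ToySHA3_256, ToyWarp>>::value,
              "the sum of two tags is a descriptor type");
static_assert(TOY_SHA3_256_THREAD::threads == 1, "thread configuration");
```

Because the whole configuration lives in the type, properties such as the digest size are available at compile time, which is what lets the kernel later in this page write `SHA3_256_WARP::digest_size`.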
Executing cuHash
These cuHash descriptor types are instantiated into objects inside user kernels.
The cuHash descriptor object provides a few relevant methods: reset(), update(...), finalize(), and digest(...).
The reset() method resets the hash object to its initial state. The update(...) method takes a pointer to a buffer of data and its length, and hashes the data.
The finalize() method finalizes the hash object so that the digest can be read; in the example below it takes no arguments.
The digest(...) method takes a pointer to an output buffer and its length, and writes the hash of the data into that buffer.
For more information on these methods, see the methods API reference.
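The call order of these methods can be illustrated with a small host-side stand-in. This is a sketch under stated assumptions: `ToyHash` is a hypothetical class that borrows the four method names, and it uses the non-cryptographic FNV-1a function in place of SHA-3. Whether cuHash accepts several update(...) calls per message is not stated on this page, so treat that part as a property of the toy and check the methods API reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a hash object. FNV-1a (64-bit) replaces SHA-3
// here and is NOT cryptographically secure.
class ToyHash {
public:
    static constexpr std::size_t digest_size = 8;

    // Return to the initial state so the object can hash a new message.
    void reset() {
        state_ = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
        finalized_ = false;
    }

    // Absorb len bytes; the toy allows repeated calls before finalize().
    void update(const std::uint8_t* data, std::size_t len) {
        assert(!finalized_);
        for (std::size_t i = 0; i < len; ++i) {
            state_ ^= data[i];
            state_ *= 1099511628211ull;  // FNV-1a 64-bit prime
        }
    }

    // Mark the message as complete; no further input is accepted.
    void finalize() { finalized_ = true; }

    // Copy the result into out, which must hold at least digest_size bytes.
    void digest(std::uint8_t* out, std::size_t len) const {
        assert(finalized_ && len >= digest_size);
        for (std::size_t i = 0; i < digest_size; ++i) {
            out[i] = static_cast<std::uint8_t>(state_ >> (8 * i));
        }
    }

private:
    std::uint64_t state_ = 14695981039346656037ull;
    bool finalized_ = false;
};
```

The sequence reset(), update(...), finalize(), digest(...) is the same one used by the kernel in the next example; calling reset() again makes the object ready for another message.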
To demonstrate cuHash execution, we continue with the previous example and write a kernel for a batched hash operation in which each thread block computes a single hash using a warp:
//cuHash header
#include <cuhash.hpp>
//We use the cuPQC SDK namespace.
using namespace cupqc;
//cuHash descriptor
using SHA3_256_WARP = decltype(SHA3_256() + Warp());
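__global__ void hash_sha3_kernel(uint8_t* digest, const uint8_t* msg, size_t inbuf_len)
{
    // Instantiate the descriptor; the threads of the warp cooperate on one hash.
    SHA3_256_WARP hash {};
    hash.reset();
    hash.update(msg, inbuf_len);
    hash.finalize();
    // Write the result to the output buffer.
    hash.digest(digest, SHA3_256_WARP::digest_size);
}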
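As written, the kernel reads the same `msg` pointer in every block. To hash a different message per block, one possible approach (an assumption here, not something this page prescribes) is to pack equal-length messages back to back, so that block `b` reads at byte offset `b * inbuf_len` and writes its digest at `b * digest_size`; inside the kernel, `b` would be `blockIdx.x`. The host-side sketch below models that layout. `BatchLayout` and `pack_messages` are hypothetical helpers, not part of cuHash.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed layout: messages of equal length stored contiguously, and digests
// stored contiguously in the same block order.
struct BatchLayout {
    std::size_t msg_len;      // bytes per message
    std::size_t digest_size;  // bytes per digest (32 for SHA3-256)

    // Byte offset of the message that block `block` would hash.
    std::size_t msg_offset(std::size_t block) const { return block * msg_len; }

    // Byte offset at which block `block` would write its digest.
    std::size_t digest_offset(std::size_t block) const { return block * digest_size; }
};

// Pack equal-length messages into one buffer suitable for a single cudaMemcpy.
std::vector<std::uint8_t> pack_messages(
    const std::vector<std::vector<std::uint8_t>>& msgs, std::size_t msg_len)
{
    std::vector<std::uint8_t> packed;
    packed.reserve(msgs.size() * msg_len);
    for (const auto& m : msgs) {
        if (m.size() != msg_len) {
            throw std::invalid_argument("all messages must have the same length");
        }
        packed.insert(packed.end(), m.begin(), m.end());
    }
    return packed;
}
```

With this layout the launch would use one block per message, and the output buffer would need `msgs.size() * digest_size` bytes. Variable-length messages would need a separate offsets array, which this sketch does not cover.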
Launching a cuHash kernel
Launching a kernel with the cuHash API is straightforward. The host function below takes a message buffer, copies it to the device, launches the kernel, and copies the resulting digest back into the digest buffer. Each thread block performs one hash, so it makes the most sense to use blocks with 32 threads, as this is the size of a warp.
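void hash_sha3(std::vector<uint8_t>& digest, std::vector<uint8_t>& msg)
{
    // digest must already be sized to the digest length by the caller.
    uint8_t* d_msg;
    uint8_t* d_digest;
    // Allocate device buffers and copy the message to the device.
    cudaMalloc(reinterpret_cast<void**>(&d_msg), msg.size());
    cudaMalloc(reinterpret_cast<void**>(&d_digest), digest.size());
    cudaMemcpy(d_msg, msg.data(), msg.size(), cudaMemcpyHostToDevice);
    // One block of 32 threads: a single warp computes the hash.
    hash_sha3_kernel<<<1, 32>>>(d_digest, d_msg, msg.size());
    // Copy the digest back; this blocking copy also waits for the kernel to finish.
    cudaMemcpy(digest.data(), d_digest, digest.size(), cudaMemcpyDeviceToHost);
    cudaFree(d_msg);
    cudaFree(d_digest);
}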
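To check the result on the host, it helps to print the digest as a hexadecimal string and compare it with a known test vector. The helper below is a small standard-library sketch; `to_hex` is a hypothetical name and not part of cuHash.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Render a byte buffer as lowercase hexadecimal, two characters per byte.
std::string to_hex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

A caller would size `digest` to 32 bytes (the SHA3-256 output length) before passing it to `hash_sha3`, then print `to_hex(digest)`. For the three-byte message "abc", the published SHA3-256 test vector is 3a985da74fe225b2045c172d6bd390bd855f086e3e9d525b46bfe24511431532.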
Compilation
To compile a program that includes cuhash.hpp and uses the cuHash API, users need to pass the location of the cuPQC SDK headers and link with the provided static library. Details on how to use cuHash in your project can be found in the Quick Installation Guide. The cuHash library is built with link-time optimization (LTO), so the appropriate flags must be used when compiling.
nvcc -std=c++17 -dlto -arch=sm_80 -L<path_to_static_lib> -lcuhash -I<path_to_cupqc_sdk_include> -I<path_to_commondx> <cuhash_intro_example.cu> -o cuhash_intro_example
For more information on the currently supported functionality, see Requirements and Functionality.