Examples

| Group | Subgroup | Example | Description |
|---|---|---|---|
| Introduction Examples | | introduction_example | cuBLASDx API introduction example |
| Simple GEMM Examples | Basic Example | simple_gemm_fp32 | Performs fp32 GEMM |
| | | simple_gemm_cfp16 | Performs complex fp16 GEMM |
| | | simple_gemm_fp8 | Performs fp8 GEMM |
| | Extra Examples | simple_gemm_leading_dimensions | Performs GEMM with non-default leading dimensions |
| | | simple_gemm_std_complex_fp32 | Performs GEMM with std::complex as the data type |
| | | simple_gemm_mixed_precision | Performs GEMM with different data types for matrices |
| | | simple_gemm_transform | Performs GEMM with transform operators |
| | | simple_gemm_custom_layout | Performs GEMM with custom matrix layouts |
| NVRTC Examples | | nvrtc_gemm | Performs GEMM; kernel is compiled using NVRTC |
| GEMM Performance | | single_gemm_performance | Benchmark for a single GEMM |
| | | fused_gemm_performance | Benchmark for 2 GEMMs fused into a single kernel |
| Advanced Examples | Fusion | fused_gemm | Performs 2 GEMMs in a single kernel |
| | | gemm_fft | Performs GEMM and FFT in a single kernel |
| | | gemm_fft_fp16 | Performs GEMM and FFT in a single kernel (half-precision complex type) |
| | | gemm_fft_performance | Benchmark for GEMM and FFT fused into a single kernel |
| | Deep Learning | scaled_dot_prod_attn | Scaled dot product attention using cuBLASDx |
| | | scaled_dot_prod_attn_batched | Multi-head attention using cuBLASDx |
| | Other | multiblock_gemm | Proof-of-concept for a single large GEMM using multiple CUDA blocks |
| | | batched_gemm_fp64 | Manual batching in a single CUDA block |
| | | blockdim_gemm_fp16 | BLAS execution with different block dimensions |