Transformer Engine

2.4.0

C/C++ API

The C/C++ API lets you call the custom kernels defined in the libtransformer_engine.so library directly from C/C++, without going through Python.

Headers

transformer_engine.h
activation.h
cast_transpose_noop.h
- nvte_transpose_with_noop()
- nvte_cast_transpose_with_noop()
cast.h
cudnn.h
- transformer_engine
  - transformer_engine::nvte_cudnn_handle_init()
fused_attn.h
fused_rope.h
- nvte_fused_rope_forward()
- nvte_fused_rope_backward()
gemm.h
multi_tensor.h
normalization.h
padding.h
- nvte_multi_padding()
permutation.h
recipe.h
softmax.h
swizzle.h
- nvte_swizzle_scaling_factors()
transpose.h
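
Before wiring these headers into a build, it can be useful to confirm that the libtransformer_engine.so you have installed actually exports the entry points listed above. The following is a minimal sketch, not part of the Transformer Engine API: the helper name `find_exported_symbols` is hypothetical, the library path is an assumption you must supply, and only the POSIX `dlopen`/`dlsym`/`dlclose` calls are used. A real application would normally include the headers and link against the library instead of probing it at runtime.

```cpp
#include <dlfcn.h>  // POSIX dynamic loading: dlopen, dlsym, dlclose

#include <string>
#include <vector>

// Hypothetical helper (not provided by Transformer Engine).
// Returns the subset of `symbols` that the shared library at `lib_path`
// exports. Returns an empty vector if the library cannot be loaded, for
// example because the path is wrong or its CUDA dependencies are missing.
std::vector<std::string> find_exported_symbols(
    const std::string& lib_path, const std::vector<std::string>& symbols) {
  std::vector<std::string> found;

  // RTLD_LAZY defers function resolution; RTLD_LOCAL keeps the library's
  // symbols out of the global namespace of the calling process.
  void* handle = dlopen(lib_path.c_str(), RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) {
    return found;
  }

  for (const std::string& name : symbols) {
    // A non-null address means the symbol was found in the library
    // (or in one of its dependencies).
    if (dlsym(handle, name.c_str()) != nullptr) {
      found.push_back(name);
    }
  }

  dlclose(handle);
  return found;
}
```

Calling `find_exported_symbols(path, {"nvte_multi_padding", "nvte_fused_rope_forward"})` with the path of your installed library returns the names that were resolved. The C-linkage `nvte_*` names can be looked up this way; the namespaced `transformer_engine::nvte_cudnn_handle_init()` has a mangled C++ symbol name and would not match its plain spelling. On older glibc versions the program may need to be linked with `-ldl`.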

© Copyright 2022-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.