Transformer Engine

2.12.0

C/C++ API

The C/C++ API lets you call the custom kernels defined in the libtransformer_engine.so library directly from C/C++, without going through Python.

Headers

transformer_engine.h
activation.h
cast_transpose_noop.h
- nvte_transpose_with_noop()
- nvte_cast_transpose_with_noop()
cast.h
cudnn.h
- transformer_engine
  - transformer_engine::nvte_cudnn_handle_init()
fused_attn.h
fused_rope.h
gemm.h
multi_tensor.h
normalization.h
padding.h
- nvte_multi_padding()
- nvte_multi_unpadding()
permutation.h
recipe.h
softmax.h
swizzle.h
transpose.h
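
Before building against these headers, it can help to confirm that the shared library on a given machine actually exports the entry points you intend to call. The following is a minimal sketch, not part of Transformer Engine: `probe_symbols` is a hypothetical helper name, and it relies only on the POSIX `dlopen`/`dlsym` interface. It assumes that the `nvte_` functions listed above are exported with unmangled C names; functions inside the `transformer_engine` namespace, such as `nvte_cudnn_handle_init()`, would have C++-mangled names and are not covered by this check.

```cpp
#include <dlfcn.h>  // POSIX dynamic loading: dlopen, dlsym, dlclose

#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not a Transformer Engine API): reports, for each
// requested symbol name, whether it can be resolved in the shared library
// at lib_path. If the library cannot be loaded, every entry is false.
std::map<std::string, bool> probe_symbols(const std::string& lib_path,
                                          const std::vector<std::string>& names) {
  std::map<std::string, bool> found;
  for (const auto& name : names) {
    found[name] = false;
  }

  // RTLD_LAZY defers function binding; RTLD_LOCAL keeps the library's
  // symbols out of the global namespace of the calling process.
  void* handle = dlopen(lib_path.c_str(), RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) {
    return found;  // library missing or not loadable on this machine
  }

  for (const auto& name : names) {
    // Assumption: the symbol is exported under its plain C name.
    found[name] = (dlsym(handle, name.c_str()) != nullptr);
  }

  dlclose(handle);
  return found;
}
```

A caller might pass the library name `"libtransformer_engine.so"` together with names taken from the list above, for example `nvte_multi_padding` and `nvte_multi_unpadding`, and report any entry that comes back false. On older glibc versions the program may need to be linked with `-ldl`. In normal use you would instead include the relevant header and link against the library directly; this probe is only a quick sanity check of an installation.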


© Copyright 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.