
CUDA Graph Best Practice for PyTorch

  • GitHub

Table of Contents

  • Home

CUDA Graph Basics

  • Introduction
  • CUDA Graph
  • Constraints
  • Quantitative Benefits

PyTorch CUDA Graphs

  • PyTorch CUDA Graphs
  • PyTorch CUDA Graph Integration
  • Transformer Engine and Megatron-LM CUDA Graph Support
  • Best Practices for PyTorch CUDA Graphs
  • Writing Sync-Free Code
  • Handling Dynamic Patterns
  • Quick Checklist

Examples

  • Examples
  • RNN-T (RNN Transducer)
  • Stable Diffusion v2
  • GPT-3 175B
  • Llama 2 70B LoRA
  • Llama 3.1 405B

Troubleshooting

  • Troubleshooting CUDA Graphs
  • Debugging Strategies
  • Capture Failures
  • Numerical Errors
  • Memory Issues
  • Process Hang
  • Performance Issues

Reference

  • Reference
  • Contributing

Contributing

We welcome contributions to improve this documentation!

Guidelines

For detailed contribution guidelines, including development setup and documentation standards, see CONTRIBUTING.md in the repository.

Quick Links

  • GitHub Repository

  • Contribution Guidelines

  • Report an Issue


Copyright © 2025, NVIDIA Corporation.