GPU Detection and PyTorch Setup
This guide helps you resolve GPU detection issues when using PyTorch, CUDA, and machine learning frameworks like Unsloth on H100 instances.
Platform: AWS H100 instances, NGC containers
Problem
Users running machine learning frameworks on H100 instances report import errors, or find that PyTorch does not detect the GPUs.
Typical symptoms include:
- RuntimeError: CUDA error: no CUDA-capable device is detected
- torch.cuda.is_available() == False
- Unsloth import fails or defaults to CPU
This typically occurs when the runtime environment lacks a GPU driver, CUDA runtime, or CUDA-enabled PyTorch build that supports H100 GPUs.
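The symptoms above can be confirmed from Python before changing anything. The following is a minimal sketch; the helper name diagnose_torch_cuda is hypothetical and is not part of PyTorch or Unsloth.

```python
def diagnose_torch_cuda():
    """Return a dict describing whether PyTorch is importable and sees a GPU."""
    report = {"torch_installed": False, "cuda_available": False, "device_count": 0}

    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return report

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in diagnose_torch_cuda().items():
        print(f"{key}: {value}")
```

If torch_installed is False, the problem is the Python environment rather than the GPU. If it is True but cuda_available is False, look at the driver and the PyTorch build.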
Prerequisites
- H100 or compatible NVIDIA GPU instance (for example, AWS H100)
- Docker and NVIDIA Container Toolkit installed
- Internet access to pull NGC containers or PyTorch wheels
Solution
Option 1: Use Official NGC PyTorch Container (Recommended)
The NGC PyTorch container includes verified CUDA and cuDNN versions for H100 GPUs.
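As an illustration, the container launch can be assembled as follows. This is a sketch under stated assumptions: the helper ngc_run_command is hypothetical, and the tag shown is the older 23.10-py3 release that this guide names in its Workaround section, since no recommended tag is given here. Check the NGC catalog for a current tag.

```python
import shlex

NGC_IMAGE = "nvcr.io/nvidia/pytorch"


def ngc_run_command(tag):
    """Build a `docker run` command that exposes all GPUs to the container."""
    return [
        "docker", "run",
        "--gpus", "all",  # needs the NVIDIA Container Toolkit on the host
        "--rm", "-it",    # interactive session, container removed on exit
        f"{NGC_IMAGE}:{tag}",
    ]


if __name__ == "__main__":
    # Assumption: example tag only; substitute the tag you intend to run.
    print(shlex.join(ngc_run_command("23.10-py3")))
```

The script only prints the command, so it can be reviewed before it is run in a shell.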
Option 2: Manual Setup in Existing Environment
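If you need to use your current Python environment, install a CUDA-enabled PyTorch build (for H100, the cu124 wheels) and then install Unsloth into that same environment.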
Common Issues
PyTorch installed without CUDA support
Symptom: torch.cuda.is_available() returns False, or importing torch raises ModuleNotFoundError: No module named 'torch'
Fix: Reinstall PyTorch from a CUDA-enabled wheel index (cu124 for H100).
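One way to tell a CPU-only wheel apart from a driver problem is to inspect torch.version.cuda, which is None when PyTorch was built without CUDA. The sketch below also assembles a reinstall command; the helper names are hypothetical, and the cu124 index URL is an assumption based on the cu124 wheels this guide mentions.

```python
import sys

# Assumption: PyTorch wheel index for CUDA 12.4 builds.
CUDA_WHEEL_INDEX = "https://download.pytorch.org/whl/cu124"


def is_cpu_only_build():
    """Return True if PyTorch is installed but was built without CUDA.

    Returns None when PyTorch cannot be imported at all.
    """
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for wheels built without CUDA support.
    return torch.version.cuda is None


def reinstall_command():
    """Build a pip command for the interpreter that is running this script."""
    return [
        sys.executable, "-m", "pip", "install", "--force-reinstall",
        "torch", "--index-url", CUDA_WHEEL_INDEX,
    ]


if __name__ == "__main__":
    print("CPU-only build:", is_cpu_only_build())
    print("Suggested command:", " ".join(reinstall_command()))
```

Using sys.executable keeps the install tied to the interpreter that will later import torch.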
Missing dependencies (xformers, torchao)
Error examples:
unsloth ... requires xformers>=0.0.27.post2
unsloth-zoo ... requires torchao>=0.13.0
Fix: Install the required extras (xformers and torchao) at or above the versions named in the error message.
These dependencies are optional. Unsloth may work without them depending on your use case.
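Before installing anything, it can help to see which versions are already present. The following sketch uses only the standard library; the helper installed_versions is hypothetical, and the minimum versions are copied from the error examples above.

```python
from importlib import metadata

# Minimum versions taken from the error messages quoted in this guide.
REQUIRED = {"xformers": "0.0.27.post2", "torchao": "0.13.0"}


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: installed={version}, required>={REQUIRED[name]}")
```

The script reports versions side by side and leaves the comparison to the reader, since post-release suffixes such as .post2 do not compare correctly as plain strings.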
GPU visible in nvidia-smi but Python cannot import torch
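Cause: PyTorch is installed in a different Python environment from the one you are running, so the active interpreter cannot find it.
Fix: Reinstall PyTorch with the pip that belongs to the interpreter you actually run.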
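To see which environment is actually in use, the interpreter can report its own path and where, if anywhere, it resolves torch. This is a minimal sketch; locate_torch is a hypothetical helper name.

```python
import importlib.util
import sys


def locate_torch():
    """Report the active interpreter and where, if anywhere, it finds torch."""
    spec = importlib.util.find_spec("torch")
    return {
        "interpreter": sys.executable,
        "environment": sys.prefix,
        # spec.origin is the path of the package's __init__.py when found.
        "torch_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in locate_torch().items():
        print(f"{key}: {value}")
```

If torch_location is None, install PyTorch using the interpreter path printed on the first line, so the package lands in the environment that will import it.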
Verification
After completing the setup:
- nvidia-smi lists the H100 GPU with no Xid or driver errors
- torch.cuda.is_available() returns True
- from unsloth import FastLanguageModel runs without error
- Model training or inference runs utilize the GPU (visible via the nvidia-smi process list)
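The Python-side items of this checklist can be run together. The sketch below is one possible arrangement, not an official test; verify_gpu_stack is a hypothetical helper, and the small matrix multiply is only a stand-in for real training or inference work.

```python
def verify_gpu_stack():
    """Run the checklist's Python-side checks and return a dict of results."""
    results = {"cuda_available": False, "gpu_tensor_op": False, "unsloth_import": False}

    try:
        import torch
    except ImportError:
        return results

    results["cuda_available"] = torch.cuda.is_available()
    if results["cuda_available"]:
        # A small matmul confirms the device accepts work, not just that it is listed.
        x = torch.randn(256, 256, device="cuda")
        y = x @ x
        torch.cuda.synchronize()
        results["gpu_tensor_op"] = y.is_cuda

    try:
        from unsloth import FastLanguageModel  # noqa: F401
        results["unsloth_import"] = True
    except Exception:
        # Unsloth may raise errors other than ImportError when no GPU is usable.
        results["unsloth_import"] = False

    return results


if __name__ == "__main__":
    for check, passed in verify_gpu_stack().items():
        print(f"{check}: {'ok' if passed else 'FAILED'}")
```

The nvidia-smi items still need to be checked from the shell while a workload is running.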
Workaround
If the container or CUDA installation fails, revert to CPU-only mode temporarily. This applies to plain PyTorch code only, since Unsloth itself requires a supported GPU.
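For plain PyTorch code, a CPU fallback might look like the sketch below. The helper select_device is hypothetical. Setting CUDA_VISIBLE_DEVICES to an empty string hides all GPUs from CUDA, but only if it happens before CUDA is initialized in the process.

```python
import os


def select_device(force_cpu=False):
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    if force_cpu:
        # Must be set before the first CUDA call in this process to take effect.
        os.environ["CUDA_VISIBLE_DEVICES"] = ""
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", select_device())
```

The returned string can be passed to torch.device or to a tensor's .to() call so the same script runs on either kind of machine.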
Alternatively, use an older NGC PyTorch container (for example, 23.10-py3) for temporary compatibility.
Additional Resources
H100 GPUs typically require a recent driver (550 or later recommended) and PyTorch cu124 wheels. Unsloth does not support AMD or CPU-only execution; it requires NVIDIA CUDA or Intel XPU.
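The driver recommendation can be checked programmatically. The following sketch assumes nvidia-smi is on the PATH and uses its --query-gpu option; the helper names are hypothetical, and the threshold of 550 is the recommendation stated above rather than a hard limit.

```python
import shutil
import subprocess

MIN_DRIVER_MAJOR = 550  # recommendation stated in this guide


def driver_major(version):
    """Extract the major component from a driver string such as "550.54.15"."""
    return int(version.strip().split(".")[0])


def query_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # one line per GPU; all share the same driver


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi not available or returned no driver version")
    else:
        ok = driver_major(version) >= MIN_DRIVER_MAJOR
        print(f"driver {version}: {'meets' if ok else 'below'} the recommendation")
```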