---
title: GPU Detection and PyTorch Setup
description: >-
  Resolve GPU detection issues with PyTorch, CUDA, and Unsloth on H100
  instances.
---

This guide helps you resolve GPU detection issues when using PyTorch, CUDA, and machine learning frameworks like Unsloth on H100 instances.

**Platform**: AWS H100 instances, NGC containers

## Problem

Users running machine learning frameworks on H100 instances report errors when importing modules, or find that PyTorch does not detect the GPUs.

**Typical symptoms include:**

* `RuntimeError: CUDA error: no CUDA-capable device is detected`
* `torch.cuda.is_available() == False`
* Unsloth import fails or defaults to CPU

This occurs when the runtime environment cannot reach the GPU (for example, the driver or container GPU access is missing) or when the installed PyTorch build lacks CUDA support for H100 GPUs.

## Prerequisites

* H100 or compatible NVIDIA GPU instance (for example, AWS H100)
* Docker and NVIDIA Container Toolkit installed
* Internet access to pull NGC containers or PyTorch wheels

## Solution

### Option 1: Use the Official NGC PyTorch Container (Recommended)

The NGC PyTorch container includes verified CUDA and cuDNN versions for H100 GPUs.
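Because Option 1 depends on Docker and the NVIDIA Container Toolkit, a quick pre-flight check can confirm that the required executables are on `PATH` before you launch the container. This is a minimal sketch under stated assumptions: it expects a toolkit release that ships the `nvidia-ctk` CLI, `missing_tools` is a hypothetical helper name, and finding the binaries does not prove that Docker is configured to use the NVIDIA runtime.

```python
from __future__ import annotations

import shutil

# Executables this check looks for.
# Assumption: the installed NVIDIA Container Toolkit release ships the nvidia-ctk CLI.
REQUIRED_TOOLS = ("docker", "nvidia-ctk", "nvidia-smi")


def missing_tools(tools: tuple[str, ...] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("docker, nvidia-ctk, and nvidia-smi found on PATH")
```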
#### Launch the container

Replace `/path/to/project` with your local project path:

```bash
sudo docker run --rm -it \
  --gpus all \
  -v /path/to/project:/workspace \
  -w /workspace \
  nvcr.io/nvidia/pytorch:24.09-py3
```

#### Install and test your framework

Inside the container, install and verify your ML framework:

```bash
pip install -U unsloth unsloth-zoo

# Confirm that Unsloth imports cleanly inside the container
python - <<'PY'
from unsloth import FastLanguageModel
print("Unsloth import OK")
PY
```

### Option 2: Manual Setup in Existing Environment

If you need to use your current Python environment, follow these steps.

#### Check GPU visibility

```bash
# Driver-level check: lists the GPUs if the driver can reach them
nvidia-smi || echo "no NVIDIA GPU detected"
# GPUs exposed by the NVIDIA container runtime (typically unset outside containers)
echo "$NVIDIA_VISIBLE_DEVICES"
# Loaded kernel driver version (first line only)
cat /proc/driver/nvidia/version 2>/dev/null | head -1
```

#### Verify PyTorch sees the GPU

```python
import torch

print("cuda_available:", torch.cuda.is_available())
print("torch_cuda:", torch.version.cuda)  # None on CPU-only builds
print("device_count:", torch.cuda.device_count())
print("name0:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)
```

#### Install CUDA-enabled PyTorch

If `nvidia-smi` lists the GPU but PyTorch reports a CPU-only build (`torch_cuda: None`), install the CUDA wheels:

```bash
pip uninstall -y torch torchvision torchaudio
pip install --index-url https://download.pytorch.org/whl/cu124 torch torchvision torchaudio
```

#### Install your ML framework

```bash
pip install -U unsloth unsloth-zoo
```

#### Test GPU detection

```bash
python -c "import unsloth, torch; print('GPU available:', torch.cuda.is_available())"
```

## Common Issues

### PyTorch installed without CUDA support

**Symptom**: `torch.cuda.is_available()` returns `False` and `torch.version.cuda` is `None`.

**Fix**: Reinstall PyTorch with CUDA support:

```bash
pip uninstall -y torch torchvision torchaudio
pip install --index-url https://download.pytorch.org/whl/cu124 \
  torch torchvision torchaudio
```

### Missing dependencies (xformers, torchao)

**Error examples**:

* `unsloth ... requires xformers>=0.0.27.post2`
* `unsloth-zoo ... requires torchao>=0.13.0`

**Fix**: Install the missing packages:

```bash
pip install xformers torchao
```

These dependencies are optional: depending on your use case, Unsloth may work without them.

### GPU visible in nvidia-smi but Python cannot import torch

**Symptom**: `ModuleNotFoundError: No module named 'torch'`

**Cause**: PyTorch is installed in a different Python environment than the interpreter you are running (for example, another virtualenv or conda environment earlier on `PATH`).

**Fix**: Check which interpreter is active, then reinstall PyTorch with that interpreter:

```bash
which python3
python3 -m pip install --force-reinstall --index-url https://download.pytorch.org/whl/cu124 torch
```

## Verification

After completing the setup, confirm that:

* `nvidia-smi` lists the H100 GPU with no Xid or driver errors
* `torch.cuda.is_available()` returns `True`
* `from unsloth import FastLanguageModel` runs without error
* During training or inference, your Python process appears in the `nvidia-smi` process list

## Workaround

If the container or CUDA installation fails, revert to CPU-only mode temporarily:

```bash
pip uninstall -y unsloth unsloth-zoo
pip install unsloth-cpu
```

Alternatively, use an older NGC PyTorch container (for example, `23.10-py3`) for temporary compatibility.

## Additional Resources

* [NVIDIA NGC PyTorch Containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
* [PyTorch CUDA Wheels](https://download.pytorch.org/whl/)

For H100 GPUs, use a recent NVIDIA driver (550 or newer is recommended) together with PyTorch cu124 wheels. Unsloth does not support AMD or CPU-only paths; it requires NVIDIA CUDA or Intel XPU.
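The driver and wheel recommendations above can be checked from Python. The sketch below is a minimal example under stated assumptions: it expects `nvidia-smi` on `PATH` with support for `--query-gpu=driver_version --format=csv,noheader`, the 550 threshold and expected CUDA version `12.4` simply mirror this guide's recommendations, and the helper names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess

# Thresholds mirroring this guide's recommendations (assumptions; adjust as needed).
MIN_DRIVER_MAJOR = 550
EXPECTED_TORCH_CUDA = "12.4"  # value of torch.version.cuda for cu124 wheels


def parse_driver_major(version: str) -> int:
    """Return the major number of a driver version string such as '550.54.15'."""
    return int(version.strip().split(".")[0])


def query_driver_versions() -> list[str]:
    """Return one driver version string per GPU via nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def torch_cuda_version() -> str | None:
    """Return torch.version.cuda, or None if torch is missing or a CPU-only build."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


def main() -> None:
    drivers = query_driver_versions()
    if not drivers:
        print("nvidia-smi unavailable or reported no GPUs")
    for version in drivers:
        if parse_driver_major(version) >= MIN_DRIVER_MAJOR:
            status = "OK"
        else:
            status = f"older than {MIN_DRIVER_MAJOR}"
        print(f"driver {version}: {status}")

    cuda = torch_cuda_version()
    if cuda is None:
        print("PyTorch missing or CPU-only build")
    elif cuda != EXPECTED_TORCH_CUDA:
        print(f"PyTorch built for CUDA {cuda}, not {EXPECTED_TORCH_CUDA}")
    else:
        print(f"PyTorch built for CUDA {cuda}")


if __name__ == "__main__":
    main()
```

Inside an NGC container, the reported CUDA version follows the container release rather than the cu124 wheels, so a mismatch message there is informational rather than an error.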