GPU Detection and PyTorch Setup | NVIDIA Brev Documentation

This guide helps you resolve GPU detection issues when using PyTorch, CUDA, and machine learning frameworks like Unsloth on H100 instances.

Platform: AWS H100 instances, NGC containers

Problem

Users running machine learning frameworks on H100 instances report errors when importing modules or when GPUs are not detected by PyTorch.

Typical symptoms include:

RuntimeError: CUDA error: no CUDA-capable device is detected
torch.cuda.is_available() returns False
Unsloth import fails or defaults to CPU

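This usually happens when the GPU is not exposed to the runtime (for example, a container started without --gpus all), when PyTorch is a CPU-only build, or when the CUDA version PyTorch was built with does not match the installed driver and H100 requirements.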
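These causes can be told apart from a few observations. The following Python sketch uses a hypothetical `diagnose` helper (not part of PyTorch or Unsloth) to map the results of the checks in Option 2 to the most likely cause. The categories are a simplification, not an exhaustive list.

```python
from typing import Optional


def diagnose(nvidia_smi_ok: bool, torch_installed: bool,
             torch_cuda: Optional[str], cuda_available: bool) -> str:
    """Hypothetical helper: return a label for the most likely cause."""
    if not nvidia_smi_ok:
        # The driver or the container runtime does not expose the GPU.
        return "gpu-not-visible"
    if not torch_installed:
        # nvidia-smi works, but this interpreter cannot import torch.
        return "torch-missing"
    if torch_cuda is None:
        # torch.version.cuda is None for CPU-only PyTorch builds.
        return "cpu-only-torch"
    if not cuda_available:
        # A CUDA build that still cannot use the device, for example
        # because the driver is too old for the wheel's CUDA version.
        return "cuda-init-failed"
    return "ok"


if __name__ == "__main__":
    # nvidia-smi works and torch imports, but torch.version.cuda is None.
    print(diagnose(True, True, None, False))  # cpu-only-torch
```

Each label lines up with a fix in this guide: gpu-not-visible with the container flags and driver checks, cpu-only-torch with the CUDA wheel reinstall, and torch-missing with the environment fix under Common Issues.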

Prerequisites

H100 or compatible NVIDIA GPU instance (for example, AWS H100)
Docker and NVIDIA Container Toolkit installed
Internet access to pull NGC containers or PyTorch wheels

Solution

Option 1: Use Official NGC PyTorch Container (Recommended)

The NGC PyTorch container includes verified CUDA and cuDNN versions for H100 GPUs.

Launch the container

Replace <PATH> with your local project path:

$ sudo docker run --rm -it \
>    --gpus all \
>    -v <PATH>:/workspace \
>    -w /workspace \
>    nvcr.io/nvidia/pytorch:24.09-py3
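Before installing anything, it can help to confirm that the container sees the GPU at all. A minimal Python sketch, assuming Docker and the NVIDIA Container Toolkit are installed; the helper names are hypothetical, and you may need `sudo` as in the command above:

```python
import shutil
import subprocess
from typing import List

IMAGE = "nvcr.io/nvidia/pytorch:24.09-py3"


def smoke_test_cmd(image: str = IMAGE) -> List[str]:
    """Build a `docker run` command that only runs nvidia-smi in the image."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def run_smoke_test(image: str = IMAGE) -> bool:
    """Return True if nvidia-smi exits successfully inside the container."""
    if shutil.which("docker") is None:
        print("docker not found on PATH")
        return False
    result = subprocess.run(smoke_test_cmd(image), capture_output=True, text=True)
    print(result.stdout or result.stderr)
    return result.returncode == 0


if __name__ == "__main__":
    # Only print the command here; call run_smoke_test() to actually run it.
    print(" ".join(smoke_test_cmd()))
```

Running nvidia-smi as the container command separates GPU passthrough problems from Python or PyTorch problems: if it fails here, fix the Docker or toolkit setup first.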

Install and test your framework

Inside the container, install and verify your ML framework:

$ pip install -U unsloth unsloth-zoo
$ python - <<'PY'
> from unsloth import FastLanguageModel
> print("Unsloth import OK")
> PY

Option 2: Manual Setup in Existing Environment

If you need to use your current Python environment:

Check GPU visibility

$ nvidia-smi || echo "no NVIDIA detected"
$ echo $NVIDIA_VISIBLE_DEVICES
$ cat /proc/driver/nvidia/version 2>/dev/null | head -1
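The same three checks can be run from Python, which is handy inside a notebook. A sketch with hypothetical helper names, using only the standard library:

```python
import os
import shutil
import subprocess
from pathlib import Path
from typing import Dict, Optional

DRIVER_VERSION_FILE = Path("/proc/driver/nvidia/version")


def gpu_visibility() -> Dict[str, Optional[str]]:
    """Collect what the shell checks above would show."""
    report: Dict[str, Optional[str]] = {
        "nvidia_smi": shutil.which("nvidia-smi"),  # None if not on PATH
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "driver_line": None,
    }
    if DRIVER_VERSION_FILE.exists():
        # The first line names the kernel module and driver version.
        with DRIVER_VERSION_FILE.open() as fh:
            report["driver_line"] = fh.readline().strip()
    return report


def nvidia_smi_ok(timeout: float = 10.0) -> bool:
    """Return True if nvidia-smi runs and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for key, value in gpu_visibility().items():
        print(f"{key}: {value}")
    print("nvidia-smi ok:", nvidia_smi_ok())
```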

Verify PyTorch sees the GPU

import torch
print("cuda_available:", torch.cuda.is_available())
print("torch_cuda:", torch.version.cuda)
print("device_count:", torch.cuda.device_count())
print("name0:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)
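The check above stops with ModuleNotFoundError when torch is missing. A minimal sketch of a more defensive variant, using a hypothetical `torch_report` helper and only the torch calls shown above, so it runs in any environment:

```python
import importlib.util
from typing import Any, Dict


def torch_report() -> Dict[str, Any]:
    """Summarize the PyTorch build and GPU state without crashing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        # Equivalent to: ModuleNotFoundError: No module named 'torch'
        return {"torch_installed": False}
    import torch

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": str(torch.__version__),
        "torch_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "device_count": torch.cuda.device_count(),
        "name0": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```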

Install CUDA-enabled PyTorch

If nvidia-smi lists the GPU but PyTorch is a CPU-only build (torch.version.cuda is None and torch.cuda.is_available() returns False), install the CUDA 12.4 wheels:

$ pip uninstall -y torch torchvision torchaudio
$ pip install --index-url https://download.pytorch.org/whl/cu124 torch torchvision torchaudio
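After reinstalling, the version string is a quick way to confirm which build is active. Wheels from the download.pytorch.org index typically carry a local version suffix such as `+cpu` or `+cu124`. This sketch (hypothetical `build_variant` helper) reads that suffix and falls back to `torch.version.cuda` when there is none:

```python
from typing import Optional


def build_variant(torch_version: str) -> Optional[str]:
    """Return the local version suffix ("cpu", "cu124", ...) or None if absent."""
    _, sep, local = torch_version.partition("+")
    return local if sep else None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this interpreter")
    else:
        variant = build_variant(str(torch.__version__))
        # Without a suffix, torch.version.cuda is the reliable signal
        # (it is None for CPU-only builds).
        print("variant:", variant or "unknown", "| torch.version.cuda:", torch.version.cuda)
```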

Install your ML framework

$ pip install -U unsloth unsloth-zoo

Test GPU detection

$ python -c "import unsloth, torch; print('GPU available:', torch.cuda.is_available())"

Common Issues

PyTorch installed without CUDA support

Symptom: torch.cuda.is_available() returns False and torch.version.cuda is None, or import torch fails with ModuleNotFoundError: No module named 'torch'

Fix: Reinstall PyTorch with CUDA support:

$ pip uninstall -y torch torchvision torchaudio
$ pip install --index-url https://download.pytorch.org/whl/cu124 \
>     torch torchvision torchaudio

Missing dependencies (xformers, torchao)

Error examples:

unsloth ... requires xformers>=0.0.27.post2
unsloth-zoo ... requires torchao>=0.13.0

Fix: Install required extras:

$ pip install xformers torchao

These dependencies are optional. Unsloth may work without them depending on your use case.
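When pip reports several unmet requirements, the specifiers can be collected from its messages. This sketch assumes the `<package> ... requires <spec>` wording shown above, which may differ between pip versions; the helper names are hypothetical:

```python
import re
import sys
from typing import List

# Package name, optionally followed by a version specifier such as >=0.13.0.
REQUIRES = re.compile(r"\brequires\s+([A-Za-z0-9_.\-]+(?:[<>=!~]=?[^\s,]+)?)")


def missing_requirements(messages: List[str]) -> List[str]:
    """Extract unique requirement specifiers from pip messages, in order."""
    specs: List[str] = []
    for line in messages:
        match = REQUIRES.search(line)
        if match and match.group(1) not in specs:
            specs.append(match.group(1))
    return specs


def pip_install_cmd(specs: List[str]) -> List[str]:
    # sys.executable -m pip installs into the interpreter running this script.
    return [sys.executable, "-m", "pip", "install", *specs]


if __name__ == "__main__":
    msgs = [
        "unsloth ... requires xformers>=0.0.27.post2",
        "unsloth-zoo ... requires torchao>=0.13.0",
    ]
    print(pip_install_cmd(missing_requirements(msgs)))
```

Passing each specifier as a separate argument avoids the shell treating `>=` as a redirection, which is why such specifiers otherwise need quoting on the command line.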

GPU visible in nvidia-smi but Python cannot import torch

Cause: torch is installed for a different Python interpreter or virtual environment than the one you are running.

Fix: Reinstall in the correct environment:

$ which python3
$ python3 -m pip install --force-reinstall torch
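To see the mismatch directly, the following sketch prints the interpreter that is running and the path from which it would import torch; the helper name is hypothetical. Reinstalling with that same interpreter's `-m pip`, as in the command above, installs into the environment that Python actually uses.

```python
import importlib.util
import sys
from typing import Dict, Optional


def torch_location() -> Dict[str, Optional[str]]:
    """Report the running interpreter and where it finds torch (if at all)."""
    spec = importlib.util.find_spec("torch")
    return {
        "python": sys.executable,
        "prefix": sys.prefix,  # points to the virtual environment when one is active
        "torch": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in torch_location().items():
        print(f"{key}: {value}")
    print("reinstall with:", sys.executable, "-m pip install --force-reinstall torch")
```

If `python` here differs from the output of `which python3`, the shell and your script are using different environments.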

Verification

After completing the setup:

nvidia-smi lists the H100 GPU without driver errors, and the kernel log (for example, dmesg) shows no Xid errors
torch.cuda.is_available() returns True
from unsloth import FastLanguageModel runs without error
Training or inference jobs use the GPU: the Python process appears in the nvidia-smi process list and GPU memory usage rises
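The GPU checks can also be scripted. This sketch uses the `--query-gpu` and `--format=csv` options of nvidia-smi to list each GPU with its driver version and memory in use. The helper names are hypothetical, and it assumes `memory.used` is reported as an integer number of MiB.

```python
import shutil
import subprocess
from typing import List, Tuple

# nounits drops the " MiB" suffix so memory.used parses as an integer.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,memory.used",
         "--format=csv,noheader,nounits"]


def parse_gpus(csv_text: str) -> List[Tuple[str, str, int]]:
    """Parse one 'name, driver_version, memory.used' row per GPU."""
    gpus: List[Tuple[str, str, int]] = []
    for line in csv_text.strip().splitlines():
        name, driver, mem_mib = (field.strip() for field in line.split(","))
        gpus.append((name, driver, int(mem_mib)))
    return gpus


def query_gpus() -> List[Tuple[str, str, int]]:
    """Run the query; return an empty list if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return []
    return parse_gpus(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for name, driver, mem_mib in query_gpus():
        print(f"{name} | driver {driver} | {mem_mib} MiB used")
```

Run it while a training or inference job is active: memory.used well above zero on the H100 indicates the job is running on the GPU.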

Workaround

If the container or CUDA installation fails, temporarily fall back to CPU-only mode:

$ pip uninstall -y unsloth unsloth-zoo
$ pip install unsloth-cpu

Alternatively, use an older NGC PyTorch container (for example, 23.10-py3) for temporary compatibility.
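For plain PyTorch code, a common pattern is to choose the device at runtime so the same script keeps running while the GPU setup is being fixed. This is a minimal sketch with a hypothetical `pick_device` helper. It does not make Unsloth itself run on CPU, since Unsloth requires NVIDIA CUDA or Intel XPU (see Additional Resources).

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can use a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # No PyTorch at all; report "cpu" so callers can still decide what to do.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("using device:", device)
    # With PyTorch installed you would then move work to it, for example:
    # model = model.to(device); batch = batch.to(device)
```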

Additional Resources

H100 GPUs typically require a recent NVIDIA driver (version 550 or later is recommended) and PyTorch cu124 wheels. Unsloth does not support AMD or CPU-only paths; it requires NVIDIA CUDA or Intel XPU.
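To check the driver against that recommendation, this sketch parses the first dotted version number from the line printed by `cat /proc/driver/nvidia/version` (or from the driver_version field of nvidia-smi). The helper names are hypothetical, and the sample line in the test is an illustrative example of that file's format.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

MIN_DRIVER_MAJOR = 550  # recommended minimum from the note above
_VERSION = re.compile(r"\b(\d+)\.(\d+)(?:\.(\d+))?\b")


def parse_driver_version(line: str) -> Optional[Tuple[int, ...]]:
    """Return the first dotted version on the line, e.g. (550, 54, 15)."""
    match = _VERSION.search(line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(version: Optional[Tuple[int, ...]],
                  minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """Compare only the major driver version against the minimum."""
    return version is not None and version[0] >= minimum


def installed_driver() -> Optional[Tuple[int, ...]]:
    """Read the driver version from /proc, or None if no NVIDIA driver is loaded."""
    path = Path("/proc/driver/nvidia/version")
    if not path.exists():
        return None
    with path.open() as fh:
        return parse_driver_version(fh.readline())


if __name__ == "__main__":
    version = installed_driver()
    if version is None:
        print("no NVIDIA driver version found")
    else:
        status = "OK" if meets_minimum(version) else "below recommended"
        print("driver", ".".join(map(str, version)), status)
```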