GPU Detection and PyTorch Setup


This guide helps you resolve GPU detection issues when using PyTorch, CUDA, and machine learning frameworks like Unsloth on H100 instances.

Platform: AWS H100 instances, NGC containers

Problem

Users running machine learning frameworks on H100 instances report import errors, or find that PyTorch does not detect the GPUs.

Typical symptoms include:

  • RuntimeError: CUDA error: no CUDA-capable device is detected
  • torch.cuda.is_available() == False
  • Unsloth import fails or defaults to CPU

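This typically occurs when the runtime environment is missing one of the pieces PyTorch needs to reach an H100: a working NVIDIA driver, GPU access inside the container, or a CUDA-enabled PyTorch build.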

Prerequisites

  • H100 or compatible NVIDIA GPU instance (for example, AWS H100)
  • Docker and NVIDIA Container Toolkit installed
  • Internet access to pull NGC containers or PyTorch wheels

Solution

Option 1: Use the NGC PyTorch Container

The NGC PyTorch container includes verified CUDA and cuDNN versions for H100 GPUs.

1

Launch the container

Replace <PATH> with your local project path:

$ sudo docker run --rm -it \
> --gpus all \
> -v <PATH>:/workspace \
> -w /workspace \
> nvcr.io/nvidia/pytorch:24.09-py3
2

Install and test your framework

Inside the container, install and verify your ML framework:

$ pip install -U unsloth unsloth-zoo
$ python - <<'PY'
> from unsloth import FastLanguageModel
> print("Unsloth import OK")
> PY
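
If you launch the container from a script rather than by hand, the same command can be assembled programmatically. The following is a minimal sketch: the helper name `build_docker_command` is hypothetical (it is not part of Docker or NGC), and the flags and image tag are copied from the launch step above.

```python
import shlex
from pathlib import Path

# Image tag taken from the launch step above.
NGC_IMAGE = "nvcr.io/nvidia/pytorch:24.09-py3"


def build_docker_command(project_path: str, image: str = NGC_IMAGE) -> list:
    """Return the argument list for the `docker run` command shown above.

    Hypothetical helper for illustration only.
    """
    # Bind mounts need an absolute host path, so resolve it first.
    host_path = Path(project_path).expanduser().resolve()
    return [
        "sudo", "docker", "run", "--rm", "-it",
        "--gpus", "all",                  # expose every GPU to the container
        "-v", f"{host_path}:/workspace",  # mount the project directory
        "-w", "/workspace",               # start in the mounted directory
        image,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command instead of executing it.
    print(shlex.join(build_docker_command(".")))
```

Printing the command rather than running it lets you review the resolved mount path before starting the container.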

Option 2: Manual Setup in Existing Environment

If you need to use your current Python environment:

1

Check GPU visibility

$ nvidia-smi || echo "no NVIDIA detected"
$ echo $NVIDIA_VISIBLE_DEVICES
$ cat /proc/driver/nvidia/version 2>/dev/null | head -1
2

Verify PyTorch sees the GPU

import torch
print("cuda_available:", torch.cuda.is_available())
print("torch_cuda:", torch.version.cuda)
print("device_count:", torch.cuda.device_count())
print("name0:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)
3

Install CUDA-enabled PyTorch

If nvidia-smi shows the GPU but PyTorch reports a CPU-only build (torch.version.cuda is None), install the CUDA wheels:

$ pip uninstall -y torch torchvision torchaudio
$ pip install --index-url https://download.pytorch.org/whl/cu124 torch torchvision torchaudio
4

Install your ML framework

$ pip install -U unsloth unsloth-zoo
5

Test GPU detection

$ python -c "import unsloth, torch; print('GPU available:', torch.cuda.is_available())"
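
The visibility and PyTorch checks from the steps above can also be collected in one pass, which is convenient when attaching output to a support ticket. This is a rough sketch, not an official diagnostic tool: the function name `collect_gpu_diagnostics` and the report keys are made up for illustration, and the script only reads state without changing anything.

```python
import importlib.util
import os
import shutil
import subprocess


def collect_gpu_diagnostics() -> dict:
    """Gather the driver, container, and PyTorch checks into one dictionary."""
    report = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "nvidia_smi_ok": False,
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "driver_proc_file": os.path.exists("/proc/driver/nvidia/version"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "device_count": 0,
    }

    # Run nvidia-smi only if it exists; a non-zero exit code means driver trouble.
    if report["nvidia_smi_on_path"]:
        try:
            result = subprocess.run(["nvidia-smi"], capture_output=True, timeout=30)
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass

    # Import torch lazily so the script still works when it is missing or broken.
    if report["torch_installed"]:
        try:
            import torch

            report["torch_cuda_build"] = torch.version.cuda
            report["cuda_available"] = torch.cuda.is_available()
            report["device_count"] = torch.cuda.device_count()
        except Exception as exc:  # a broken install can fail in several ways
            report["torch_import_error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in collect_gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_ok` is true while `cuda_available` is false, the problem is usually on the PyTorch side rather than the driver side.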

Common Issues

PyTorch installed without CUDA support

Symptom: torch.cuda.is_available() returns False, or importing torch fails with ModuleNotFoundError: No module named 'torch'.

Fix: Reinstall PyTorch with CUDA support:

$ pip uninstall -y torch torchvision torchaudio
$ pip install --index-url https://download.pytorch.org/whl/cu124 \
> torch torchvision torchaudio
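
Before reinstalling, it can help to confirm which of the failure modes you are actually in. The sketch below separates them using only `torch.version.cuda` and `torch.cuda.is_available()`; the function names and the returned labels are illustrative, not part of PyTorch.

```python
from typing import Optional


def classify_torch_state(installed: bool, cuda_build: Optional[str], cuda_available: bool) -> str:
    """Map the three observations to a short label (labels are illustrative)."""
    if not installed:
        return "missing"    # torch cannot be imported in this interpreter
    if cuda_build is None:
        return "cpu_only"   # the wheel was built without a CUDA runtime
    if not cuda_available:
        return "no_device"  # CUDA build, but the driver or GPU is not reachable
    return "ok"


def probe() -> str:
    """Inspect the current interpreter and classify its PyTorch install."""
    try:
        import torch
    except ImportError:
        return classify_torch_state(False, None, False)
    return classify_torch_state(True, torch.version.cuda, torch.cuda.is_available())


if __name__ == "__main__":
    print(probe())
```

Only the `cpu_only` result calls for the reinstall above; `no_device` points back to the driver or to GPU access inside the container.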

Missing dependencies (xformers, torchao)

Error examples:

  • unsloth ... requires xformers>=0.0.27.post2
  • unsloth-zoo ... requires torchao>=0.13.0

Fix: Install required extras:

$ pip install xformers torchao

These dependencies are optional. Unsloth may work without them depending on your use case.
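
To see which of these packages are present, and at what version, before installing anything, a short standard-library check is enough. This is a sketch: `installed_version` is a hypothetical helper name, and the package list simply repeats the names mentioned in this guide.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the error messages and install commands above.
    for name in ("torch", "xformers", "torchao", "unsloth", "unsloth-zoo"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Compare the printed versions against the minimums quoted in the error message you received.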

GPU visible in nvidia-smi but Python cannot import torch

Cause: PyTorch is installed in a different Python environment than the interpreter you are running, so that interpreter cannot find it.

Fix: Reinstall in the correct environment:

$ which python3
$ python3 -m pip install --force-reinstall torch
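
Running the following with the same interpreter you use for training shows which environment is active and whether torch is discoverable from it. It is a small sketch that uses only the standard library; `describe_environment` is a hypothetical name.

```python
import importlib.util
import sys


def describe_environment() -> dict:
    """Report the active interpreter and where (if anywhere) torch would load from."""
    # find_spec locates the package without importing it.
    spec = importlib.util.find_spec("torch")
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        # True when running inside a venv/virtualenv.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "torch_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch_location` is None here but `pip show torch` reports a package, then pip and python are pointing at different environments.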

Verification

After completing the setup:

  • nvidia-smi lists the H100 GPU with no Xid or driver errors
  • torch.cuda.is_available() returns True
  • from unsloth import FastLanguageModel runs without error
  • Model training or inference uses the GPU (the process appears in the nvidia-smi process list)
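
The last check can be scripted by asking nvidia-smi for its compute process list in CSV form. The sketch below assumes the `--query-compute-apps` and `--format=csv,noheader,nounits` options of nvidia-smi; the helper names are hypothetical, and the parser skips any line it does not recognize.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> list:
    """Parse 'pid, name, used_memory' lines into dictionaries."""
    apps = []
    for line in csv_text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        # Skip blank lines and anything that is not a process row.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        memory = parts[-1]
        apps.append({
            "pid": int(parts[0]),
            # Re-join the middle fields in case the process name contains a comma.
            "name": ", ".join(parts[1:-1]),
            # Memory may be reported as a non-numeric placeholder.
            "used_memory_mib": int(memory) if memory.isdigit() else None,
        })
    return apps


def gpu_processes() -> list:
    """Return the compute processes nvidia-smi reports, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return []
    return parse_compute_apps(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    print(gpu_processes())
```

Run it while training or inference is active; an empty list at that point suggests the job is not using the GPU.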

Workaround

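If the container or CUDA installation fails, you can fall back to CPU-only mode temporarily. Because Unsloth itself requires NVIDIA CUDA or Intel XPU, first confirm that the unsloth-cpu package is available for your environment: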

$ pip uninstall -y unsloth unsloth-zoo
$ pip install unsloth-cpu

Alternatively, use an older NGC PyTorch container (for example, 23.10-py3) for temporary compatibility.

Additional Resources

H100 GPUs typically require modern drivers (>= 550 recommended) and PyTorch cu124 wheels. Unsloth does not support AMD or CPU-only execution paths; it requires NVIDIA CUDA or Intel XPU.
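
The driver recommendation can be checked from the version string that nvidia-smi or /proc/driver/nvidia/version reports. This is a minimal sketch: the helper names are hypothetical, the threshold of 550 is the recommendation stated above, and the regular expression assumes the usual major.minor[.patch] driver version layout.

```python
import re
from typing import Optional

MIN_DRIVER_MAJOR = 550  # recommended minimum stated above


def driver_major(version_text: str) -> Optional[int]:
    """Extract the driver's major version from text such as '550.54.15'."""
    # Look for a number of three or more digits followed by '.minor'.
    match = re.search(r"\b(\d{3,})\.\d+", version_text)
    return int(match.group(1)) if match else None


def meets_recommendation(version_text: str) -> bool:
    """True when a driver version is found and is at least MIN_DRIVER_MAJOR."""
    major = driver_major(version_text)
    return major is not None and major >= MIN_DRIVER_MAJOR


if __name__ == "__main__":
    # Sample string for illustration; pass in your own driver version text.
    sample = "550.54.15"
    print(driver_major(sample), meets_recommendation(sample))
```

Pass in either the output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader` or the first line of /proc/driver/nvidia/version.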