Demo Cluster Requirements
Meet the following requirements to set up a demo cluster with the NeMo microservices platform and run the Beginner Platform Tutorials.
Note
The demo cluster requirements are specific to running the getting started tutorials with the Llama 3.1 8B Instruct model. If you want to try with a larger model, you need to re-evaluate and adjust the requirements accordingly.
System Requirements
The following are the common requirements for running the Beginner Platform Tutorials.
A single-node NVIDIA GPU cluster on a Linux host with cluster-admin permissions.
At least 300 GB of free disk space.
Two NVIDIA GPUs (B200 80B, A100 80 GB, or H100 80 GB) with no other workloads running on them:
One GPU for model fine-tuning.
One GPU for a meta/llama-3.1-8b-instruct NIM microservice for inference.
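The hardware requirements above can be spot-checked from a shell before you start. The following is a minimal sketch, not an official validation tool. The helper names count_gpus_with_memory and free_gib are hypothetical, the 80000 MiB threshold is an assumed stand-in for "80 GB class" GPUs, and checking free space under $HOME is an assumption; pass the path where your container and volume data is actually stored.

```shell
#!/usr/bin/env bash

# Counts GPUs whose total memory is at least $1 MiB.
# Reads lines shaped like "NVIDIA H100 80GB HBM3, 81559" on stdin.
count_gpus_with_memory() {
  awk -F', ' -v min="$1" '$NF + 0 >= min { n++ } END { print n + 0 }'
}

# Prints the free space, in whole GiB, of the file system that holds path $1.
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Only query the GPUs when the NVIDIA driver utilities are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpus="$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits \
    | count_gpus_with_memory 80000)"
  echo "GPUs with at least 80000 MiB of memory: $gpus (need 2)"
else
  echo "nvidia-smi not found; install the NVIDIA GPU driver first"
fi

# Assumption: $HOME is on the file system that must hold the 300 GB.
echo "Free space under $HOME: $(free_gib "$HOME") GiB (need at least 300 GB)"
```

The script only reports what it finds. It does not verify that the GPUs are idle, so confirm separately that no other workloads are running on them.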
Software Requirements
NVIDIA developed and tested this tutorial using minikube with the following prerequisites.
Minikube version 1.33 or later.
Docker 27 or later.
NVIDIA Container Toolkit v1.16.2 or later. Refer to Installing the NVIDIA Container Toolkit.
NVIDIA GPU Driver 560.35.03 or later. Refer to the Driver Installation Guide.
Kubernetes CLI, kubectl. Refer to Install and Set Up kubectl on Linux in the Kubernetes documentation.
Helm CLI, helm. Refer to the Helm documentation.
Hugging Face CLI. Refer to the Hugging Face Hub CLI user guide and the Hugging Face Hub installation guide. If you aren’t using a virtual environment and aren’t running as the root user, make sure that you add the $HOME/.local/bin directory to your PATH environment variable.
export PATH="$HOME/.local/bin:$PATH"
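The minimum versions listed above can be compared against what is installed on the host. The following is a minimal sketch under a few assumptions: the helpers version_ge and report are hypothetical names, the comparison relies on GNU sort -V, and the parsing of the nvidia-ctk --version output assumes that the version number follows the word "version" on the first matching line.

```shell
#!/usr/bin/env bash

# Succeeds when version $1 is greater than or equal to version $2.
# Uses GNU `sort -V` (version sort) to find the smaller of the two.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints a one-line verdict: $1 = tool name, $2 = detected version, $3 = minimum.
# A leading "v" is stripped; an empty version means the tool was not detected.
report() {
  local name="$1" found="${2#v}" minimum="$3"
  if [ -z "$found" ]; then
    echo "$name: not detected (need $minimum or later)"
  elif version_ge "$found" "$minimum"; then
    echo "$name: $found OK"
  else
    echo "$name: $found is older than $minimum"
  fi
}

report "minikube" "$(minikube version --short 2>/dev/null)" 1.33
report "Docker" "$(docker version --format '{{.Server.Version}}' 2>/dev/null)" 27
# Assumption: the toolkit CLI prints "... version X.Y.Z" in its version output.
report "NVIDIA Container Toolkit" \
  "$(nvidia-ctk --version 2>/dev/null | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)" 1.16.2
report "NVIDIA GPU driver" \
  "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)" 560.35.03
```

A tool that is missing, or a Docker daemon that is not running, shows up as "not detected" instead of stopping the script.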
The minikube cluster setup tutorial uses the following minikube features:
minikube ingress.
Standard storage class using host path volumes provided by the default storage provisioner.
The host file system for the host path volumes must support file locking. During customization with NeMo Customizer, NeMo Operator starts an entity handler pod that runs the Hugging Face CLI. The CLI requires a file system, such as EXT4, that supports file locking.
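Whether a directory sits on a file system that supports file locking can be probed with the flock utility from util-linux. The following is a rough sketch, not the check that NeMo Customizer or the Hugging Face CLI performs. The helper name supports_file_locking is hypothetical, and the default target directory is an assumption; pass the directory that backs your host path volumes as the first argument.

```shell
#!/usr/bin/env bash

# Succeeds when an exclusive, non-blocking lock can be taken on a
# scratch file created in directory $1. Returns 2 if the scratch file
# cannot be created there.
supports_file_locking() {
  local dir="$1" probe
  probe="$(mktemp "$dir/lock-probe.XXXXXX")" || return 2
  flock --exclusive --nonblock "$probe" true
  local status=$?
  rm -f "$probe"
  return "$status"
}

# Assumption: fall back to the temporary directory when no path is given.
target="${1:-${TMPDIR:-/tmp}}"

# Show the file system type when findmnt is available (for example, ext4).
if command -v findmnt >/dev/null 2>&1; then
  echo "File system type for $target: $(findmnt -n -o FSTYPE -T "$target")"
fi

if supports_file_locking "$target"; then
  echo "$target: file locking works"
else
  echo "$target: file locking failed or the directory is not writable"
fi
```

A passing probe shows only that flock-style locks work in that directory at the time of the test. Network and overlay file systems can behave differently under load.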