Quickstart Installation#
The nemo-platform Python package installs both a CLI and a Python SDK, which provide convenient access to NeMo Platform.
Prerequisites#
Ensure you have the following prerequisites ready:
Python 3.11 or higher
pip package manager
Docker 28.3.0 or higher
An NGC API key, which you can obtain for free at NVIDIA NGC
Your system has at least 16 GB of available disk space and 8 GB of RAM
Note
The default quickstart (nmp quickstart) uses cloud-only mode (NVIDIA Build API) for inference and requires 16 GB disk and 8 GB RAM. In this mode, Customization and Safe Synthesizer are not available.
If you run nmp quickstart configure and select host-gpu mode, local GPUs are used for inference, fine-tuning, and privacy-compliant synthetic data generation. You will need substantially more resources:
Additional tens to hundreds of GBs of disk space for downloaded model files
16-32 GB of system RAM (recommended 32 GB)
A GPU with 40+ GB VRAM (80 GB recommended for larger models)
Refer to Hardware and Software Requirements for NeMo Platform for detailed hardware requirements.
Installing the CLI and SDK#
The CLI is the easiest and fastest way to deploy NeMo Platform locally and get started. Install both the CLI and SDK with:
pip install nemo-platform
Note
This will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
If you previously installed the nemo-microservices package, uninstall it first to avoid conflicts:
pip uninstall nemo-microservices
You can verify a successful installation and view a list of all available commands in the CLI with:
nmp --help
Refer to the CLI reference for more details.
Installing NeMo Platform#
GPU workloads require configuration
By default, quickstart uses cloud-only mode (NVIDIA Build API) for inference. Local GPUs are not used, and Customization and Safe Synthesizer are not available.
To enable local inference, fine-tuning, and privacy-compliant synthetic data generation, configure quickstart before starting:
nmp quickstart configure
# Select "Host GPU" when prompted for deployment mode
After configuring for host-gpu mode, GPU device IDs are detected and stored in your quickstart config. For GPU selection and verification, expand GPU Configuration below.
GPU Configuration
GPU Configuration Overview
When running GPU workloads locally (such as inference, Customization, and Safe Synthesizer), you configure which GPU devices are available via nmp quickstart configure. The chosen GPU list is stored in ~/.config/nmp/quickstart.yaml as reserved_gpu_device_ids (a comma-separated string, e.g. "0,1,2") and passed into the container at startup.
Note
This configuration creates a shared GPU pool for services that run GPU workloads, such as the jobs service (for training, evaluation, and other GPU jobs) and the models service (for local NIM deployments). The platform will not over-schedule workloads - if all configured GPUs are in use, new workloads will wait until a GPU becomes available.
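The wait-until-available behavior described above can be pictured with a small, purely illustrative sketch (this is not the platform's actual implementation): a pool hands out device IDs and blocks new workloads when every configured GPU is busy.

```python
import queue


class GpuPool:
    """Illustrative shared GPU pool: workloads block until a device is free."""

    def __init__(self, device_ids):
        self._free = queue.Queue()
        for dev in device_ids:
            self._free.put(dev)

    def acquire(self, timeout=None):
        # Blocks when all configured GPUs are in use.
        return self._free.get(timeout=timeout)

    def release(self, dev):
        self._free.put(dev)


# A pool configured with two GPUs, as a reserved list of "0,1" would imply.
pool = GpuPool([0, 1])
a = pool.acquire()
b = pool.acquire()
# A third workload would now wait; release one GPU so it can proceed.
pool.release(a)
c = pool.acquire()
print(c)  # the released device is handed to the waiting workload
```

The real scheduler coordinates across services (jobs, models), but the core contract is the same: no over-scheduling, and waiting workloads proceed as devices free up.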
To see available GPU device IDs on your system:
nvidia-smi --list-gpus
Understanding Quickstart Configuration Files
When you run nmp quickstart configure, the following files are created or updated:
| File | Purpose |
|---|---|
| ~/.config/nmp/config.yaml | CLI context configuration (clusters, users, workspaces). |
| ~/.config/nmp/quickstart.yaml | Quickstart settings (image, NGC key, GPU mode, GPU device IDs when using host-gpu). |
When you select host-gpu, the detected (or filtered) GPU list is written to reserved_gpu_device_ids in ~/.config/nmp/quickstart.yaml.
Using All GPUs (Default)
When you select host-gpu during nmp quickstart configure, all detected NVIDIA GPUs are used by default. The list is stored automatically; no extra steps are required.
# 1. Configure for GPU mode
nmp quickstart configure
# Select "host-gpu" when prompted (GPUs are detected and saved)
# 2. Start quickstart - all detected GPUs are available
nmp quickstart up
If you have the CUDA_VISIBLE_DEVICES environment variable set when you run configure, the default list is the intersection of that list with the detected GPUs (integer device IDs only), so you can limit GPUs from the shell.
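As a rough sketch of that intersection rule (the actual configure logic may differ), keeping only the integer entries of CUDA_VISIBLE_DEVICES that match detected GPUs might look like this; the function name and shape are illustrative, not part of the CLI:

```python
import os


def default_gpu_list(detected_ids, env=os.environ):
    """Illustrative default: intersect CUDA_VISIBLE_DEVICES with detected GPUs.

    Non-integer entries (e.g. GPU UUIDs) are ignored, matching the note that
    only integer device IDs are considered.
    """
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # No restriction from the shell: use all detected GPUs.
        return sorted(detected_ids)
    requested = []
    for part in visible.split(","):
        part = part.strip()
        if part.isdigit():
            requested.append(int(part))
    # Keep shell ordering, but drop IDs that were not actually detected.
    return [d for d in requested if d in set(detected_ids)]


# With 4 detected GPUs and CUDA_VISIBLE_DEVICES="0,2,GPU-abc", only 0 and 2 remain.
print(default_gpu_list([0, 1, 2, 3], env={"CUDA_VISIBLE_DEVICES": "0,2,GPU-abc"}))
```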
Using Specific GPUs
To use a subset of GPUs (for example, to reserve some for other applications):
Run nmp quickstart configure and select host-gpu (detection pre-populates the list).
When asked "Save configuration?", choose "Configure advanced options".
In GPU device IDs, enter a comma-separated list (e.g. 0,1 or 0, 1, 2) and save.
The value is stored in ~/.config/nmp/quickstart.yaml as reserved_gpu_device_ids: "0,1" (or your chosen list).
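The stored value is just a comma-separated string, so a consumer of the config file might parse it along these lines (an illustrative helper, not part of the SDK):

```python
def parse_reserved_gpu_device_ids(value):
    """Parse a reserved_gpu_device_ids string such as "0,1" or "0, 1, 2"."""
    # int() tolerates surrounding whitespace, so "0, 1, 2" parses cleanly.
    return [int(part) for part in value.split(",") if part.strip()]


print(parse_reserved_gpu_device_ids("0, 1, 2"))  # [0, 1, 2]
```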
Important
If host-gpu was enabled previously but reserved_gpu_device_ids was never set (for example, an older config file), nmp quickstart up will fail and ask you to re-run nmp quickstart configure to set the GPU list.
Verifying GPU Detection
After starting quickstart, verify GPUs were detected:
docker logs nmp-quickstart 2>&1 | grep -i gpu
You should see a log line containing: SharedResourceManager: Creating GPU pool with N GPU(s)
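If you want to check the pool size programmatically rather than eyeball the logs, the expected line can be matched with a small regex. This is a sketch against the log format quoted above; the exact wording could change between versions:

```python
import re

# Matches the startup line quoted above, capturing the GPU count.
POOL_RE = re.compile(r"SharedResourceManager: Creating GPU pool with (\d+) GPU\(s\)")


def gpu_pool_size(log_text):
    """Return the GPU count from quickstart startup logs, or None if absent."""
    match = POOL_RE.search(log_text)
    return int(match.group(1)) if match else None


sample = "2024-01-01 INFO SharedResourceManager: Creating GPU pool with 2 GPU(s)"
print(gpu_pool_size(sample))  # 2
```

You could feed it the output of `docker logs nmp-quickstart` to assert the pool matches your configured device list.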
If you see “No GPUs detected”:
Ensure you ran nmp quickstart configure and selected host-gpu
Verify Docker can access GPUs:
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
Note
The GPU pool is shared between services that run GPU workloads (jobs service, models service). The platform coordinates GPU allocation to prevent over-scheduling—if all GPUs are in use, new workloads will wait until a GPU becomes available.
nmp quickstart up
This downloads and starts the quickstart version of the NeMo Platform. This step can take a few minutes, depending on your network and hardware.
What happens next depends on the deployment mode you configured:
Cloud-only mode (default)
Models are available immediately after startup via the NVIDIA Build API. List available models and start chatting:
nmp models list
nmp chat nvidia/llama-3.3-nemotron-super-49b-v1
Note
In cloud-only mode, Customization and Safe Synthesizer are not available. To use these features, re-run nmp quickstart configure and select Host GPU.
Host-GPU mode
All platform features are available: inference, Customization, and Safe Synthesizer.
The platform starts but no model is deployed by default. Before you can chat, you must deploy a model. See Deploy Models for instructions.
Once a model is deployed, list available models and start chatting:
nmp models list
nmp chat <your-model>
You have set up NeMo Platform and are ready to get building.
Using a custom image with quickstart#
In some cases you may need to use a custom image with quickstart (for example, a specific version or an internal build). Use the --image flag to do so.
nmp quickstart up --image=my-override:1.0.0
Recreating your quickstart instance#
If you encounter errors after updating the nemo-platform package, schema changes between versions may be causing conflicts. On startup, NeMo Platform should run migrations on your data; if you still hit errors and are only using the installation for testing, you can tear down the old installation and remove its data and configuration.
# Step 1: Destroy the existing quickstart environment (removes all data and config)
nmp quickstart destroy
# Step 2: Reconfigure and restart
nmp quickstart configure
nmp quickstart up
After restarting, re-run any setup steps (creating providers, deploying models, etc.) as your previous environment state has been removed.
Python SDK#
You have also installed the Python SDK, which enables easy use of NeMo Platform from your code. Here is a sample:
from nemo_platform import NeMoPlatform
# Initialize the client
client = NeMoPlatform(
base_url="http://localhost:8080",
workspace="default"
)
# List models
models = client.models.list()
print(models.data)
An asynchronous Python client is also available. Refer to the full SDK reference for more details, or start with one of the example applications.
Initializing the CLI and SDK#
The CLI and SDK each support several ways to specify the platform URL and workspace. This section explains the options and defines the convention used throughout this documentation.
CLI initialization#
The CLI reads connection settings from its config file (~/.config/nmp/config.yaml). There are three common patterns:
1. After nmp quickstart up (default)
nmp quickstart up configures the CLI automatically. All commands use http://localhost:8080 and the default workspace with no extra steps:
nmp models list
2. Connecting to a remote deployment
Use nmp config set to point the CLI at a different deployment:
nmp config set --base-url http://my-cluster:8080 --workspace default
nmp models list
3. Auth-enabled cluster
Use nmp auth login to authenticate. This sets the base URL and stores credentials in the config:
nmp auth login --base-url https://nmp.example.com
nmp models list # subsequent commands use stored credentials
You can also override URL and workspace for a single command without changing the config:
nmp --server http://my-cluster:8080 --workspace my-workspace models list
SDK initialization options#
The SDK constructor accepts these key parameters:
| Parameter | Description |
|---|---|
| base_url | URL of the NeMo Platform deployment |
| workspace | Workspace to operate in |
| access_token | Bearer token for authentication |
There are three common initialization patterns:
1. Explicit URL and workspace (most common for scripts)
import os
from nemo_platform import NeMoPlatform
client = NeMoPlatform(
base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
workspace="default",
)
2. With authentication enabled (for remote clusters)
When connecting to a cluster with auth enabled, pass an access token:
import os
from nemo_platform import NeMoPlatform
client = NeMoPlatform(
base_url=os.environ.get("NMP_BASE_URL", "http://my-cluster:8080"),
workspace="default",
access_token=os.environ.get("NMP_ACCESS_TOKEN"),
)
3. Config-based initialization (uses CLI context)
Calling NeMoPlatform() with no arguments reads your active CLI context from ~/.config/nmp/config.yaml. This is the same context set by nmp auth login and is most convenient when the CLI is already configured:
from nemo_platform import NeMoPlatform
# Uses base_url, workspace, and auth from the active CLI context
client = NeMoPlatform()
Documentation convention#
Throughout this documentation, code examples use:
import os
from nemo_platform import NeMoPlatform
client = NeMoPlatform(
base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
workspace="default",
)
NMP_BASE_URL — Set this environment variable to point to your deployment. If it is not set, http://localhost:8080 is used (the default quickstart address).
workspace="default" — Examples use the default workspace. Replace with your own workspace name as needed.
Note
Environment variables like NMP_BASE_URL and NMP_ACCESS_TOKEN are not set automatically. You must set them in your shell or CI environment. The SDK looks for them by name when constructing the client.
Where a doc intentionally uses a different init (for example, auth-focused pages use NeMoPlatform() so the client picks up your logged-in identity), a note explains why.