Quickstart Installation#
The nemo-platform Python package installs both a CLI and a Python SDK, which provide convenient access to NeMo Platform.
Prerequisites#
Ensure you have the following prerequisites ready:
Python 3.11 or higher
pip package manager
Docker 28.3.0 or higher
An NGC API key, which you can obtain for free at NVIDIA NGC
Your system has at least 16 GB of available disk space and 8 GB of RAM
Note
The default quickstart (nmp quickstart) uses cloud-only mode (NVIDIA Build API) for inference and requires 16 GB disk and 8 GB RAM. In this mode, Customization and Safe Synthesizer are not available.
If you run nmp quickstart configure and select host-gpu mode, local GPUs are used for inference, fine-tuning, and privacy-compliant synthetic data generation. You will need substantially more resources:
Additional tens to hundreds of GBs of disk space for downloaded model files
16-32 GB of system RAM (recommended 32 GB)
A GPU with 40+ GB VRAM (80 GB recommended for larger models)
Refer to Hardware and Software Requirements for NeMo Platform for detailed hardware requirements.
Installing the CLI and SDK#
The CLI is the easiest and fastest way to deploy NeMo Platform locally and get started. Install both the CLI and SDK with:
pip install nemo-platform
Note
This will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
If you previously installed the nemo-microservices package, uninstall it first to avoid conflicts:
pip uninstall nemo-microservices
You can verify a successful installation and view a list of all available commands in the CLI with:
nmp --help
Refer to the CLI reference for more details.
Installing NeMo Platform#
GPU workloads require configuration
By default, quickstart uses cloud-only mode (NVIDIA Build API) for inference. Local GPUs are not used, and Customization and Safe Synthesizer are not available.
To enable local inference, fine-tuning, and privacy-compliant synthetic data generation, configure quickstart before starting:
nmp quickstart configure
# Select "Host GPU" when prompted for deployment mode
After configuring for host-gpu mode, GPU device IDs are detected and stored in your quickstart config. For GPU selection and verification, expand GPU Configuration below.
GPU Configuration
GPU Configuration Overview
When running GPU workloads locally (such as inference, Customization, and Safe Synthesizer), you configure which GPU devices are available via nmp quickstart configure. The chosen GPU list is stored in ~/.config/nmp/quickstart.yaml as reserved_gpu_device_ids (a comma-separated string, e.g. "0,1,2") and passed into the container at startup.
Note
This configuration creates a shared GPU pool for services that run GPU workloads, such as the jobs service (for training, evaluation, and other GPU jobs) and the models service (for local NIM deployments). The platform will not over-schedule workloads - if all configured GPUs are in use, new workloads will wait until a GPU becomes available.
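The wait-until-available behavior described above can be pictured with a small, purely illustrative sketch (this is not the platform's actual implementation): a pool hands out device IDs and blocks new workloads when every configured GPU is busy.

```python
import queue


class GpuPool:
    """Illustrative shared GPU pool: workloads block until a device is free."""

    def __init__(self, device_ids):
        self._free = queue.Queue()
        for dev in device_ids:
            self._free.put(dev)

    def acquire(self, timeout=None):
        # Blocks when all configured GPUs are in use.
        return self._free.get(timeout=timeout)

    def release(self, dev):
        self._free.put(dev)


# A pool configured with two GPUs, as a reserved list of "0,1" would imply.
pool = GpuPool([0, 1])
a = pool.acquire()
b = pool.acquire()
# A third workload would now wait; release one GPU so it can proceed.
pool.release(a)
c = pool.acquire()
print(c)  # the released device is handed to the waiting workload
```

The real scheduler coordinates across services (jobs, models), but the core contract is the same: no over-scheduling, and waiting workloads proceed as devices free up.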
To see available GPU device IDs on your system:
nvidia-smi --list-gpus
Understanding Quickstart Configuration Files
When you run nmp quickstart configure, the following files are created or updated:
| File | Purpose |
|---|---|
| ~/.config/nmp/config.yaml | CLI context configuration (clusters, users, workspaces). |
| ~/.config/nmp/quickstart.yaml | Quickstart settings (image, NGC key, GPU mode, GPU device IDs when using host-gpu). |
When you select host-gpu, the detected (or filtered) GPU list is written to reserved_gpu_device_ids in ~/.config/nmp/quickstart.yaml.
Using All GPUs (Default)
When you select host-gpu during nmp quickstart configure, all detected NVIDIA GPUs are used by default. The list is stored automatically; no extra steps are required.
# 1. Configure for GPU mode
nmp quickstart configure
# Select "host-gpu" when prompted (GPUs are detected and saved)
# 2. Start quickstart - all detected GPUs are available
nmp quickstart up
If you have the CUDA_VISIBLE_DEVICES environment variable set when you run configure, the default list is the intersection of that list with the detected GPUs (integer device IDs only), so you can limit GPUs from the shell.
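As a rough sketch of that intersection rule (the actual configure logic may differ), keeping only the integer entries of CUDA_VISIBLE_DEVICES that match detected GPUs might look like this; the function name and shape are illustrative, not part of the CLI:

```python
import os


def default_gpu_list(detected_ids, env=os.environ):
    """Illustrative default: intersect CUDA_VISIBLE_DEVICES with detected GPUs.

    Non-integer entries (e.g. GPU UUIDs) are ignored, matching the note that
    only integer device IDs are considered.
    """
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # No restriction from the shell: use all detected GPUs.
        return sorted(detected_ids)
    requested = []
    for part in visible.split(","):
        part = part.strip()
        if part.isdigit():
            requested.append(int(part))
    # Keep shell ordering, but drop IDs that were not actually detected.
    return [d for d in requested if d in set(detected_ids)]


# With 4 detected GPUs and CUDA_VISIBLE_DEVICES="0,2,GPU-abc", only 0 and 2 remain.
print(default_gpu_list([0, 1, 2, 3], env={"CUDA_VISIBLE_DEVICES": "0,2,GPU-abc"}))
```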
Using Specific GPUs
To use a subset of GPUs (for example, to reserve some for other applications):
Run nmp quickstart configure and select host-gpu (detection pre-populates the list).
When asked "Save configuration?", choose "Configure advanced options".
In GPU device IDs, enter a comma-separated list (e.g. 0,1 or 0, 1, 2) and save.
The value is stored in ~/.config/nmp/quickstart.yaml as reserved_gpu_device_ids: "0,1" (or your chosen list).
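The stored value is just a comma-separated string, so a consumer of the config file might parse it along these lines (an illustrative helper, not part of the SDK):

```python
def parse_reserved_gpu_device_ids(value):
    """Parse a reserved_gpu_device_ids string such as "0,1" or "0, 1, 2"."""
    # int() tolerates surrounding whitespace, so "0, 1, 2" parses cleanly.
    return [int(part) for part in value.split(",") if part.strip()]


print(parse_reserved_gpu_device_ids("0, 1, 2"))  # [0, 1, 2]
```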
Important
If host-gpu was enabled previously but reserved_gpu_device_ids was never set (for example, an older config file), nmp quickstart up will fail and ask you to re-run nmp quickstart configure to set the GPU list.
Verifying GPU Detection
After starting quickstart, verify GPUs were detected:
docker logs nmp-quickstart 2>&1 | grep -i gpu
You should see a log line containing: SharedResourceManager: Creating GPU pool with N GPU(s)
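If you want to check the pool size programmatically rather than eyeball the logs, the expected line can be matched with a small regex. This is a sketch against the log format quoted above; the exact wording could change between versions:

```python
import re

# Matches the startup line quoted above, capturing the GPU count.
POOL_RE = re.compile(r"SharedResourceManager: Creating GPU pool with (\d+) GPU\(s\)")


def gpu_pool_size(log_text):
    """Return the GPU count from quickstart startup logs, or None if absent."""
    match = POOL_RE.search(log_text)
    return int(match.group(1)) if match else None


sample = "2024-01-01 INFO SharedResourceManager: Creating GPU pool with 2 GPU(s)"
print(gpu_pool_size(sample))  # 2
```

You could feed it the output of `docker logs nmp-quickstart` to assert the pool matches your configured device list.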
If you see “No GPUs detected”:
Ensure you ran nmp quickstart configure and selected host-gpu
Verify Docker can access GPUs:
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
Note
The GPU pool is shared between services that run GPU workloads (jobs service, models service). The platform coordinates GPU allocation to prevent over-scheduling—if all GPUs are in use, new workloads will wait until a GPU becomes available.
nmp quickstart up
This downloads and starts the quickstart version of the NeMo Platform. This step can take a few minutes, depending on your network and hardware.
What happens next depends on the deployment mode you configured:
Cloud-only mode (default)
Models are available immediately after startup via the NVIDIA Build API. List available models and start chatting:
nmp models list
nmp chat nvidia/llama-3.3-nemotron-super-49b-v1
Note
In cloud-only mode, Customization and Safe Synthesizer are not available. To use these features, re-run nmp quickstart configure and select Host GPU.
Host-GPU mode
All platform features are available: inference, Customization, and Safe Synthesizer.
The platform starts but no model is deployed by default. Before you can chat, you must deploy a model. See Deploy Models for instructions.
Once a model is deployed, list available models and start chatting:
nmp models list
nmp chat <your-model>
You have set up NeMo Platform and are ready to get building.
Using a custom image with quickstart#
In some cases you may need to use a custom image with quickstart (for example, a specific version or an internal build). Use the --image flag to do so.
nmp quickstart up --image=my-override:1.0.0
Recreating your quickstart instance#
If you encounter errors after updating the nemo-platform package, schema changes between versions may be causing conflicts. On startup, NeMo Platform should run migrations on your data; if you still hit errors and are only using the installation for testing, you can tear down the old installation and remove its data and configuration.
# Step 1: Destroy the existing quickstart environment (removes all data and config)
nmp quickstart destroy
# Step 2: Reconfigure and restart
nmp quickstart configure
nmp quickstart up
After restarting, re-run any setup steps (creating providers, deploying models, etc.) as your previous environment state has been removed.
Python SDK#
You have also installed the Python SDK, which enables easy use of NeMo Platform from your code. Here is a sample:
from nemo_platform import NeMoPlatform
# Initialize the client
client = NeMoPlatform(
base_url="http://localhost:8080",
workspace="default"
)
# List models
models = client.models.list()
print(models.data)
An asynchronous Python client is also available. Refer to the full SDK reference for more details, or start with one of the example applications.
Initializing the CLI and SDK#
The CLI and SDK each support several ways to specify the platform URL and workspace. This section explains the options and defines the convention used throughout this documentation.
CLI initialization#
The CLI reads connection settings from its config file (~/.config/nmp/config.yaml). There are three common patterns:
1. After nmp quickstart up (default)
nmp quickstart up configures the CLI automatically. All commands use http://localhost:8080 and the default workspace with no extra steps:
nmp models list
2. Connecting to a remote deployment
Use nmp config set to point the CLI at a different deployment:
nmp config set --base-url http://my-cluster:8080 --workspace default
nmp models list
3. Auth-enabled cluster
Use nmp auth login to authenticate. This sets the base URL and stores credentials in the config:
nmp auth login --base-url https://nmp.example.com
nmp models list # subsequent commands use stored credentials
You can also override URL and workspace for a single command without changing the config:
nmp --server http://my-cluster:8080 --workspace my-workspace models list
SDK initialization options#
The SDK constructor accepts these key parameters:
| Parameter | Description |
|---|---|
| base_url | URL of the NeMo Platform deployment |
| workspace | Workspace to operate in |
| access_token | Bearer token for authentication |
There are three common initialization patterns:
1. Explicit URL and workspace (most common for scripts)
import os
from nemo_platform import NeMoPlatform
client = NeMoPlatform(
base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
workspace="default",
)
2. With authentication enabled (for remote clusters)
When connecting to a cluster with auth enabled, pass an access token:
import os
from nemo_platform import NeMoPlatform
client = NeMoPlatform(
base_url=os.environ.get("NMP_BASE_URL", "http://my-cluster:8080"),
workspace="default",
access_token=os.environ.get("NMP_ACCESS_TOKEN"),
)
3. Config-based initialization (uses CLI context)
Calling NeMoPlatform() with no arguments reads your active CLI context from ~/.config/nmp/config.yaml. This is the same context set by nmp auth login and is most convenient when the CLI is already configured:
from nemo_platform import NeMoPlatform
# Uses base_url, workspace, and auth from the active CLI context
client = NeMoPlatform()
Documentation convention#
Throughout this documentation, code examples use:
import os
from nemo_platform import NeMoPlatform
client = NeMoPlatform(
base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
workspace="default",
)
NMP_BASE_URL — Set this environment variable to point to your deployment. If it is not set, http://localhost:8080 is used (the default quickstart address).
workspace="default" — Examples use the default workspace. Replace with your own workspace name as needed.
Note
Environment variables like NMP_BASE_URL and NMP_ACCESS_TOKEN are not set automatically. You must set them in your shell or CI environment. The SDK looks for them by name when constructing the client.
Where a doc intentionally uses a different init (for example, auth-focused pages use NeMoPlatform() so the client picks up your logged-in identity), a note explains why.