Model Download#
NIM LLM downloads model artifacts at container startup using a manifest-driven approach. The model manifest describes which files to download and where to download them from. The following model sources are supported:
| Source | URI Scheme |
|---|---|
| NGC | `ngc://` |
| Hugging Face | `hf://` |
| Amazon S3 | `s3://` |
| Google Cloud Storage | `gs://` |
| ModelScope | `modelscope://` |
| Local Storage | `local://` |
NGC#
URI Format#
ngc://{org}/{team}/{model}:{version}?file={filename}
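The template above can be unpacked with plain shell parameter expansion. The following is only an illustrative sketch with a made-up org, team, model, and version, not something NIM requires you to do:

```shell
# Hypothetical values; a real URI uses your org, team, model, and version.
uri="ngc://myorg/myteam/mymodel:1.0?file=config.json"

rest="${uri#ngc://}"            # strip the scheme
file="${rest#*\?file=}"         # file query parameter
path="${rest%%\?*}"             # org/team/model:version
version="${path##*:}"
path="${path%:*}"
org="${path%%/*}"
team_model="${path#*/}"
team="${team_model%%/*}"
model="${team_model#*/}"

echo "$org $team $model $version $file"
```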
Authentication#
NGC uses API key authentication. NIM checks the following environment variables in order of priority:
| Variable | Priority | Description |
|---|---|---|
| `NGC_CLI_API_KEY` | 1 (highest) | Backward compatibility with the NGC CLI. |
| `NGC_API_KEY` | 2 | Recommended for container deployments. |
Both Personal API Keys (starting with nvapi-) and Legacy API Keys are supported.
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
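If you want to sanity-check which kind of key you are exporting, a small sketch based on the `nvapi-` prefix rule above (the `key_type` helper is hypothetical; NIM performs its own validation):

```shell
# Sketch only: classify a key by its prefix. NIM accepts both key types.
key_type() {
  case "$1" in
    nvapi-*) echo "personal" ;;   # Personal API Key
    *)       echo "legacy" ;;     # Legacy API Key
  esac
}

key_type "nvapi-xxxxxxxx"     # prints: personal
key_type "0123456789abcdef"   # prints: legacy
```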
Downloading from NGC#
Use the following steps to download a model from NGC.
Set the required environment variables:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Downloading Behind a Corporate Proxy#
Use the following steps to download a model from NGC behind a corporate proxy.
Set the required environment variables:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
export HTTPS_PROXY=http://proxy.corp.example.com:8080
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
-e HTTPS_PROXY \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Hugging Face#
URI Format#
hf://{org}/{model}:{revision}?file={filename}
Authentication#
Hugging Face uses a bearer token for authentication:
| Variable | Description |
|---|---|
| `HF_TOKEN` | API token for private or gated models. Recommended even for public models to avoid rate limiting. |
Additional Environment Variables#
The following environment variables configure cache location and hub endpoint for Hugging Face downloads:
| Variable | Default | Description |
|---|---|---|
| `HF_HOME` | | Root directory for the local Hugging Face cache. |
| `HF_ENDPOINT` | `https://huggingface.co` | Base URL of the Hugging Face Hub API. Set this to redirect downloads to a private Enterprise Hub instance, a mirror, or a local proxy server. |
Hugging Face-Compatible Proxies#
NIM supports Hugging Face-compatible proxy servers (such as Olah) for local caching:
export HF_ENDPOINT="http://localhost:8090"
Downloading from Hugging Face#
Use the following steps to download a model from Hugging Face.
Set the required environment variables:
export HF_TOKEN="hf_your_token_here"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e HF_TOKEN \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Amazon S3#
URI Format#
s3://{bucket}/{key}
Authentication#
NIM uses the standard AWS credential provider chain, which discovers credentials in the following order of precedence:
| Credential Source | Priority | Use Case |
|---|---|---|
| Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) | 1 (highest) | Container deployments |
| AWS credentials file (`~/.aws/credentials`) | 2 | Local development |
| IAM instance profile | 3 | EC2 instances |
| ECS container credentials | 4 | ECS tasks |
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_REGION=us-east-1
S3-Compatible Storage#
To use an S3-compatible service (MinIO, Ceph, Oracle OCI, and so on), set the AWS_ENDPOINT_URL environment variable to redirect S3 API calls to your provider:
export AWS_ENDPOINT_URL=http://minio.internal:9000
When no custom endpoint is set, NIM uses standard AWS S3 endpoints with virtual-hosted-style addressing by default. Some S3-compatible services require path-style addressing instead. The following variables control the addressing style:
| Variable | Values | Default | Description |
|---|---|---|---|
| | `virtual`, `path` | `virtual` | Selects the S3 addressing style explicitly. |
| `AWS_S3_USE_PATH_STYLE` | `true`, `false` | `false` | Forces path-style addressing. Required for services such as MinIO and Ceph. |
Note
NIM automatically detects Oracle OCI Object Storage endpoints and enables path-style addressing without additional configuration.
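To see what the two styles mean concretely, here is a sketch that builds both URL forms for a hypothetical bucket and key, following the standard AWS S3 endpoint layout:

```shell
# Hypothetical bucket, key, and region.
bucket="my-models-bucket"
key="models/config.json"
region="us-east-1"

# Virtual-hosted style: the bucket is part of the hostname.
virtual_url="https://${bucket}.s3.${region}.amazonaws.com/${key}"
# Path style: the bucket is the first path segment (MinIO, Ceph, etc.).
path_url="https://s3.${region}.amazonaws.com/${bucket}/${key}"

echo "$virtual_url"
echo "$path_url"
```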
Mirroring NGC Models to S3#
S3 is generic object storage without model-aware upload tools. NIM provides the mirror s3 command to copy NGC models to S3 with the correct percent-encoded keys required by NIM_REPOSITORY_OVERRIDE:
Set the NGC API key:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
Run the mirror command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_DEFAULT_REGION \
${NIM_LLM_IMAGE} mirror s3 \
--manifest model_manifest.yaml \
--bucket my-models-bucket
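The mirror command handles key encoding for you. Purely to illustrate what percent-encoding looks like, a minimal bash encoder sketch (hypothetical input; not NIM's exact implementation):

```shell
# Minimal percent-encoder (unreserved characters pass through unchanged).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;   # e.g. "/" -> %2F
    esac
  done
  printf '%s\n' "$out"
}

urlencode "mymodel:1.0/weights.bin"   # prints: mymodel%3A1.0%2Fweights.bin
```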
Downloading from S3#
Use the following steps to download a model from S3.
Set the required environment variables:
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
export AWS_REGION=us-east-1
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_REGION \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Redirecting NGC Downloads to S3#
NIM_REPOSITORY_OVERRIDE redirects NGC URIs in the manifest to your S3 bucket at runtime:
export NIM_REPOSITORY_OVERRIDE="s3://my-models-bucket"
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Note
When using NIM_REPOSITORY_OVERRIDE with S3, model assets must be stored with percent-encoded keys. Use the mirror s3 command to upload models with the correct encoding.
Google Cloud Storage#
URI Format#
gs://{bucket}/{key}
Authentication#
GCS uses Google’s Application Default Credentials (ADC), which discovers credentials from the following sources:
- GOOGLE_APPLICATION_CREDENTIALS environment variable (path to a service account key file)
- Vertex AI managed environment credentials
- Google Compute Engine metadata service
- Cloud Run environment
- gcloud CLI user credentials
For example, to set GOOGLE_APPLICATION_CREDENTIALS:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
Mirroring NGC Models to GCS#
GCS is generic object storage without model-aware upload tools. NIM provides the mirror gcs command to copy NGC models to GCS with the correct percent-encoded keys required by NIM_REPOSITORY_OVERRIDE:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v /path/to/sa.json:/credentials/sa.json:ro \
-e NGC_API_KEY \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
${NIM_LLM_IMAGE} mirror gcs \
--manifest model_manifest.yaml \
--bucket my-models-bucket
Downloading from GCS#
Use the following steps to download a model from GCS.
Set the required environment variables:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v /path/to/sa.json:/credentials/sa.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Redirecting NGC Downloads to GCS#
Set the required environment variables:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_REPOSITORY_OVERRIDE="gs://my-models-bucket"
Start the container:
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
-v /path/to/sa.json:/credentials/sa.json:ro \
-p 8000:8000 \
${NIM_LLM_IMAGE}
ModelScope#
URI Format#
modelscope://{org}/{model}:{revision}?file={filename}
Note
ModelScope uses master as the default branch, unlike Hugging Face which uses main.
Authentication#
ModelScope authentication uses the following environment variable:
| Variable | Description |
|---|---|
| `MODELSCOPE_API_TOKEN` | API token for private models. Recommended even for public models to avoid rate limiting. |
ModelScope-Compatible Proxies#
NIM supports custom ModelScope endpoints for local caching:
export MODELSCOPE_ENDPOINT="http://your-modelscope-proxy:port"
Additional Environment Variables#
The following environment variables configure ModelScope-specific behavior:
| Variable | Default | Description |
|---|---|---|
| `MODELSCOPE_CACHE` | | ModelScope cache directory. |
Downloading from ModelScope#
Use the following steps to download a model from ModelScope.
Set the required environment variables:
export MODELSCOPE_API_TOKEN="your_token_here"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e MODELSCOPE_API_TOKEN \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Local Storage#
URI Format#
local://{absolute_path}
The following examples show valid local URI formats:
- `local:///mnt/models/llama-3.1-8b` references a model directory.
- `local:///mnt/models/llama-3.1-8b?file=config.json` references a specific file.
Note
Only the local:// URI scheme is supported for local filesystem access. Alternative notations (such as file:// or bare paths) are not supported.
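A small sketch of scheme validation consistent with the note above (the `is_local_uri` helper is hypothetical, not part of NIM):

```shell
# Hypothetical helper: accept local:// with an absolute path, reject the rest.
is_local_uri() {
  case "$1" in
    local:///*) return 0 ;;
    *)          return 1 ;;
  esac
}

is_local_uri "local:///mnt/models/llama-3.1-8b" && echo "supported"
is_local_uri "file:///mnt/models/llama-3.1-8b"  || echo "unsupported"
is_local_uri "/mnt/models/llama-3.1-8b"         || echo "unsupported"
```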
How It Works#
Unlike remote sources, Local Storage does not download files. Instead, it creates symlinks in the workspace pointing directly to the source files. This means:
- Zero additional disk usage.
- Near-instant setup (milliseconds for symlink creation vs. minutes or hours for network transfers).
- Source files must remain accessible for the duration of the NIM deployment.
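The symlink behavior can be demonstrated with throwaway paths (a sketch; the actual workspace layout NIM creates may differ):

```shell
# Throwaway source and workspace directories stand in for the model store
# and the NIM workspace.
src="$(mktemp -d)"
ws="$(mktemp -d)"
echo '{}' > "$src/config.json"

ln -s "$src/config.json" "$ws/config.json"   # no bytes are copied

ls -l "$ws/config.json"    # shows a symlink into $src
cat "$ws/config.json"      # reads through the link: {}
```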
Preparing a Local Model Store#
The create-model-store command downloads a model from a remote source (such as NGC) and creates a properly formatted local model store:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v $(pwd)/model-store:/opt/nim/model-store \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} create-model-store \
--model-cache-path /opt/nim/.cache \
--model-store /opt/nim/model-store \
--profile $NIM_MODEL_PROFILE
Using a Local Model#
Use the following steps to work with a local model source.
Set the required environment variable:
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the command:
docker run --rm --gpus all \
-v /mnt/models:/mnt/models:ro \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Common Configuration#
The following environment variables apply across all model sources.
Cache Path#
| Variable | Default | Description |
|---|---|---|
| `NIM_CACHE_PATH` | `/opt/nim/.cache` | Directory used for model assets in the container. Must be a valid, writable path. |
Repository Override#
NIM_REPOSITORY_OVERRIDE redirects model downloads from the default NGC source to an alternative storage location at runtime.
The following protocols are supported as override targets:
| Protocol | Description |
|---|---|
| `ngc://` | NGC-compatible proxy servers. |
| `s3://` | AWS S3 and S3-compatible storage. |
| `gs://` | Google Cloud Storage. |
| `http://`, `https://` | HTTP/HTTPS servers. |
export NIM_REPOSITORY_OVERRIDE="s3://my-bucket/nim-models"
Note
NIM_REPOSITORY_OVERRIDE supports object storage (S3/GCS) and HTTP(S) servers. It does not support model registries (hf://, modelscope://) as override targets. To use Hugging Face or ModelScope models, specify the corresponding URIs directly in the model manifest.
NIM_REPOSITORY_OVERRIDE vs NIM_MODEL_PATH#
These two environment variables serve different purposes for sourcing models from cloud storage:
| Aspect | NIM_REPOSITORY_OVERRIDE | NIM_MODEL_PATH |
|---|---|---|
| Purpose | Mirror NGC models to your own storage | Serve custom/fine-tuned models directly |
| Manifest | Uses built-in NGC manifest | No manifest required |
| Upload format | Percent-encoded flat keys | Standard directory structure |
| Checksum verification | Yes (from NGC manifest) | No |
| Use case | Air-gapped deployments, NGC mirrors | Custom models, fine-tuned weights |
When to use NIM_REPOSITORY_OVERRIDE: You want to serve an NGC model but download it from your own S3/GCS bucket instead of NGC (for air-gapped environments, lower latency, or cost optimization). Refer to the Examples section.
When to use NIM_MODEL_PATH: You have a custom or fine-tuned model that was never published on NGC and want to serve it directly from cloud storage. Refer to the Examples section.
Disabling Model Download#
The following environment variable controls model download behavior:
| Variable | Default | Description |
|---|---|---|
| | | Disable model download on container startup. Useful for multi-node scenarios where only one node needs to download. |
Proxy Support#
For environments behind corporate firewalls, NIM honors the standard proxy environment variables:
export HTTPS_PROXY=http://proxy.mycorp.com:8080
export HTTP_PROXY=http://proxy.mycorp.com:8080
export NO_PROXY=localhost,127.0.0.1
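As a rough illustration of how NO_PROXY exclusions work, a simplified exact-match sketch (real HTTP clients also match domain suffixes and may support CIDR ranges; `bypass_proxy` is a hypothetical helper, not part of NIM):

```shell
# Hypothetical helper: exact-match lookup of a host in NO_PROXY.
bypass_proxy() {
  local host="$1" entry entries
  IFS=',' read -ra entries <<< "$NO_PROXY"
  for entry in "${entries[@]}"; do
    [ "$entry" = "$host" ] && return 0
  done
  return 1
}

NO_PROXY=localhost,127.0.0.1
bypass_proxy "localhost"      && echo "direct connection"
bypass_proxy "ngc.nvidia.com" || echo "through proxy"
```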
CLI Commands#
download-to-cache#
Downloads model profiles to the NIM cache. By default, assets are downloaded from the source repository; pass --use-cache to check for cached assets first.
usage: download-to-cache [-h] [--all] [--profiles [PROFILES ...]]
[--lora] [--model-uri MODEL_URI]
[--manifest-file MANIFEST_FILE]
[--model-cache-path MODEL_CACHE_PATH]
[--use-cache] [--verify-checksums]
The following arguments are available:
| Argument | Description |
|---|---|
| `--all` | Download all profiles to cache. |
| `--profiles` | Profile(s) to download. If omitted, the optimal profile is downloaded. |
| `--lora` | Download the default LoRA profile. |
| `--model-uri` | Model URI to download. Supported schemes: `ngc://`, `hf://`, `s3://`, `gs://`, `modelscope://`, `local://`. |
| `--manifest-file` | Manifest file path. |
| `--model-cache-path` | Directory path of model cache. |
| `--use-cache` | Check for cached assets before downloading from the repo. |
| `--verify-checksums` | Verify downloaded files match the checksums in the manifest (enabled by default). |
create-model-store#
Creates a properly formatted model store directory from a cached model profile.
usage: create-model-store [-h] --profile PROFILE --model-store MODEL_STORE
[--model-cache-path MODEL_CACHE_PATH]
[--use-cache]
The following arguments are available:
| Argument | Description |
|---|---|
| `--profile` | Profile hash to create a model directory for (required). |
| `--model-store` | Directory path where the model profile is extracted and copied to (required). |
| `--model-cache-path` | Directory path of model cache. |
| `--use-cache` | Check for cached assets before downloading from the repo. |
mirror#
Mirrors model profiles from NGC to a destination object storage bucket. This command downloads profiles to the local cache and uploads them with the percent-encoded keys required by NIM_REPOSITORY_OVERRIDE.
usage: mirror {s3,gcs} [-h] -b BUCKET
[--profiles [PROFILES ...]]
[--manifest-file MANIFEST_FILE]
[--model-cache-path MODEL_CACHE_PATH]
The following arguments are available:
| Argument | Description |
|---|---|
| `{s3,gcs}` | Destination storage type (positional, required). |
| `-b`, `--bucket` | Destination bucket name (required). |
| `--profiles` | Profile(s) to mirror. If omitted, the default profile is mirrored. |
| `--manifest-file` | Manifest file path. |
| `--model-cache-path` | Directory path of model cache. |
Examples#
NGC#
Basic NGC Download#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Download Behind Corporate Proxy#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export HTTPS_PROXY=http://proxy.corp.example.com:8080
export NO_PROXY=localhost,127.0.0.1
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e HTTPS_PROXY \
-e NO_PROXY \
${NIM_LLM_IMAGE} download-to-cache --all
Hugging Face#
Basic Hugging Face Download#
export HF_TOKEN="${YOUR_HF_TOKEN}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e HF_TOKEN \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Using Hugging Face Proxy (Olah)#
export HF_ENDPOINT="http://localhost:8090"
export HF_TOKEN="${YOUR_HF_TOKEN}"
docker run --rm --gpus all \
--network=host \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e HF_ENDPOINT \
-e HF_TOKEN \
${NIM_LLM_IMAGE}
Amazon S3#
Download with NIM_REPOSITORY_OVERRIDE#
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
export AWS_REGION=us-east-1
export NIM_REPOSITORY_OVERRIDE=s3://my-models-bucket
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_REGION \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Mirror NGC to S3#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
export AWS_DEFAULT_REGION=us-east-1
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_DEFAULT_REGION \
${NIM_LLM_IMAGE} mirror s3 \
--manifest model_manifest.yaml \
--bucket my-models-bucket
S3-Compatible Storage (MinIO)#
export AWS_ENDPOINT_URL=http://minio.internal:9000
export AWS_ACCESS_KEY_ID="${YOUR_MINIO_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_MINIO_SECRET_KEY}"
export AWS_REGION=us-east-1
export AWS_S3_USE_PATH_STYLE=true
export NIM_REPOSITORY_OVERRIDE=s3://my-bucket
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e AWS_ENDPOINT_URL \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_REGION \
-e AWS_S3_USE_PATH_STYLE \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Google Cloud Storage#
Download with NIM_REPOSITORY_OVERRIDE#
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_REPOSITORY_OVERRIDE=gs://my-models-bucket
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
-v /path/to/sa.json:/credentials/sa.json:ro \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Mirror NGC to GCS#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v /path/to/sa.json:/credentials/sa.json:ro \
-e NGC_API_KEY \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
${NIM_LLM_IMAGE} mirror gcs \
--manifest model_manifest.yaml \
--bucket my-models-bucket
Direct GCS URI in Manifest#
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v /path/to/sa.json:/credentials/sa.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Local Storage#
Using NIM_MODEL_PATH (No Manifest)#
docker run --rm --gpus all \
-v /mnt/models/llama-3.1-8b:/opt/nim/model:ro \
-e NIM_MODEL_PATH=/opt/nim/model \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Air-Gapped Deployment#
docker run --rm --gpus all \
--network none \
-v /opt/models/llama-3.1-8b:/opt/nim/model:ro \
-e NIM_MODEL_PATH=/opt/nim/model \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Create Model Store from NGC#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v $(pwd)/model-store:/opt/nim/model-store \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} create-model-store \
--model-cache-path /opt/nim/.cache \
--model-store /opt/nim/model-store \
--profile $NIM_MODEL_PROFILE
ModelScope#
Basic ModelScope Download#
export MODELSCOPE_API_TOKEN="${YOUR_MODELSCOPE_TOKEN}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e MODELSCOPE_API_TOKEN \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE