Model Download#
NIM LLM downloads model artifacts at container startup using a manifest-driven approach. The model manifest describes which files to download and where to download them from. The following model sources are supported:
| Source | URI Scheme |
|---|---|
| NGC | `ngc://` |
| Hugging Face | `hf://` |
| Amazon S3 | `s3://` |
| Google Cloud Storage | `gs://` |
| ModelScope | `modelscope://` |
| Local Storage | `local://` |
NGC#
URI Format#
ngc://{org}/{team}/{model}:{version}?file={filename}
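The template above can be unpacked with plain shell parameter expansion. The following is only an illustrative sketch with a made-up org, team, model, and version, not something NIM requires you to do:

```shell
# Hypothetical values; a real URI uses your org, team, model, and version.
uri="ngc://myorg/myteam/mymodel:1.0?file=config.json"

rest="${uri#ngc://}"            # strip the scheme
file="${rest#*\?file=}"         # file query parameter
path="${rest%%\?*}"             # org/team/model:version
version="${path##*:}"
path="${path%:*}"
org="${path%%/*}"
team_model="${path#*/}"
team="${team_model%%/*}"
model="${team_model#*/}"

echo "$org $team $model $version $file"
```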
Authentication#
NGC uses API key authentication. NIM checks the following environment variables in order of priority:
| Variable | Priority | Description |
|---|---|---|
| `NGC_CLI_API_KEY` | 1 (highest) | Backward compatibility with the NGC CLI. |
| `NGC_API_KEY` | 2 | Recommended for container deployments. |
Both Personal API Keys (starting with nvapi-) and Legacy API Keys are supported.
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
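If you want to sanity-check which kind of key you are exporting, a small sketch based on the `nvapi-` prefix rule above (the `key_type` helper is hypothetical; NIM performs its own validation):

```shell
# Sketch only: classify a key by its prefix. NIM accepts both key types.
key_type() {
  case "$1" in
    nvapi-*) echo "personal" ;;   # Personal API Key
    *)       echo "legacy" ;;     # Legacy API Key
  esac
}

key_type "nvapi-xxxxxxxx"     # prints: personal
key_type "0123456789abcdef"   # prints: legacy
```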
Downloading from NGC#
Use the following steps to download a model from NGC.
Set the required environment variables:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Downloading Behind a Corporate Proxy#
Use the following steps to download a model from NGC behind a corporate proxy.
Set the required environment variables:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
export HTTPS_PROXY=http://proxy.corp.example.com:8080
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
-e HTTPS_PROXY \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Hugging Face#
URI Format#
hf://{org}/{model}:{revision}?file={filename}
Authentication#
Hugging Face uses a bearer token for authentication:
| Variable | Description |
|---|---|
| `HF_TOKEN` | API token for private or gated models. Recommended even for public models to avoid rate limiting. |
Additional Environment Variables#
The following environment variables configure cache location and hub endpoint for Hugging Face downloads:
| Variable | Default | Description |
|---|---|---|
| `HF_HOME` | | Root directory for the local Hugging Face cache. |
| `HF_ENDPOINT` | `https://huggingface.co` | Base URL of the Hugging Face Hub API. Set this to redirect downloads to a private Enterprise Hub instance, a mirror, or a local proxy server. |
Hugging Face-Compatible Proxies#
NIM supports Hugging Face-compatible proxy servers (such as Olah) for local caching:
export HF_ENDPOINT="http://localhost:8090"
Downloading from Hugging Face#
Use the following steps to download a model from Hugging Face.
Set the required environment variables:
export HF_TOKEN="hf_your_token_here"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e HF_TOKEN \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Amazon S3#
URI Format#
s3://{bucket}/{key}
Authentication#
NIM uses the standard AWS credential provider chain, which discovers credentials in the following order of precedence:
| Credential Source | Priority | Use Case |
|---|---|---|
| Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) | 1 (highest) | Container deployments |
| AWS credentials file (`~/.aws/credentials`) | 2 | Local development |
| IAM instance profile | 3 | EC2 instances |
| ECS container credentials | 4 | ECS tasks |
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_REGION=us-east-1
S3-Compatible Storage#
To use an S3-compatible service (MinIO, Ceph, Oracle OCI, and so on), set the AWS_ENDPOINT_URL environment variable to redirect S3 API calls to your provider:
export AWS_ENDPOINT_URL=http://minio.internal:9000
When no custom endpoint is set, NIM uses standard AWS S3 endpoints with virtual-hosted-style addressing by default. Some S3-compatible services require path-style addressing instead. The following variables control the addressing style:
| Variable | Values | Default | Description |
|---|---|---|---|
| | `virtual`, `path` | `virtual` | Selects the S3 addressing style explicitly. |
| `AWS_S3_USE_PATH_STYLE` | `true`, `false` | `false` | Forces path-style addressing. Required for services such as MinIO and Ceph. |
Note
NIM automatically detects Oracle OCI Object Storage endpoints and enables path-style addressing without additional configuration.
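To see what the two styles mean concretely, here is a sketch that builds both URL forms for a hypothetical bucket and key, following the standard AWS S3 endpoint layout:

```shell
# Hypothetical bucket, key, and region.
bucket="my-models-bucket"
key="models/config.json"
region="us-east-1"

# Virtual-hosted style: the bucket is part of the hostname.
virtual_url="https://${bucket}.s3.${region}.amazonaws.com/${key}"
# Path style: the bucket is the first path segment (MinIO, Ceph, etc.).
path_url="https://s3.${region}.amazonaws.com/${bucket}/${key}"

echo "$virtual_url"
echo "$path_url"
```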
Mirroring NGC Models to S3#
S3 is generic object storage without model-aware upload tools. NIM provides the mirror s3 command to copy NGC models to S3 with the correct percent-encoded keys required by NIM_REPOSITORY_OVERRIDE:
Set the NGC API key:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
Run the mirror command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_DEFAULT_REGION \
${NIM_LLM_IMAGE} mirror s3 \
--manifest model_manifest.yaml \
--bucket my-models-bucket
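The mirror command handles key encoding for you. Purely to illustrate what percent-encoding looks like, a minimal bash encoder sketch (hypothetical input; not NIM's exact implementation):

```shell
# Minimal percent-encoder (unreserved characters pass through unchanged).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;   # e.g. "/" -> %2F
    esac
  done
  printf '%s\n' "$out"
}

urlencode "mymodel:1.0/weights.bin"   # prints: mymodel%3A1.0%2Fweights.bin
```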
Downloading from S3#
Use the following steps to download a model from S3.
Set the required environment variables:
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
export AWS_REGION=us-east-1
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_REGION \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Redirecting NGC Downloads to S3#
NIM_REPOSITORY_OVERRIDE redirects NGC URIs in the manifest to your S3 bucket at runtime:
export NIM_REPOSITORY_OVERRIDE="s3://my-models-bucket"
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Note
When using NIM_REPOSITORY_OVERRIDE with S3, model assets must be stored with percent-encoded keys. Use the mirror s3 command to upload models with the correct encoding.
Google Cloud Storage#
URI Format#
gs://{bucket}/{key}
Authentication#
GCS uses Google’s Application Default Credentials (ADC), which discovers credentials from the following sources:
- GOOGLE_APPLICATION_CREDENTIALS environment variable (path to a service account key file)
- Vertex AI managed environment credentials
- Google Compute Engine metadata service
- Cloud Run environment
- gcloud CLI user credentials
For example, to set GOOGLE_APPLICATION_CREDENTIALS:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
Mirroring NGC Models to GCS#
GCS is generic object storage without model-aware upload tools. NIM provides the mirror gcs command to copy NGC models to GCS with the correct percent-encoded keys required by NIM_REPOSITORY_OVERRIDE:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v /path/to/sa.json:/credentials/sa.json:ro \
-e NGC_API_KEY \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
${NIM_LLM_IMAGE} mirror gcs \
--manifest model_manifest.yaml \
--bucket my-models-bucket
Downloading from GCS#
Use the following steps to download a model from GCS.
Set the required environment variables:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v /path/to/sa.json:/credentials/sa.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Redirecting NGC Downloads to GCS#
Set the required environment variables:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_REPOSITORY_OVERRIDE="gs://my-models-bucket"
Start the container:
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
-v /path/to/sa.json:/credentials/sa.json:ro \
-p 8000:8000 \
${NIM_LLM_IMAGE}
ModelScope#
URI Format#
modelscope://{org}/{model}:{revision}?file={filename}
Note
ModelScope uses master as the default branch, unlike Hugging Face which uses main.
Authentication#
ModelScope authentication uses the following environment variable:
| Variable | Description |
|---|---|
| `MODELSCOPE_API_TOKEN` | API token for private models. Recommended even for public models to avoid rate limiting. |
ModelScope-Compatible Proxies#
NIM supports custom ModelScope endpoints for local caching:
export MODELSCOPE_ENDPOINT="http://your-modelscope-proxy:port"
Additional Environment Variables#
The following environment variables configure ModelScope-specific behavior:
| Variable | Default | Description |
|---|---|---|
| `MODELSCOPE_CACHE` | | ModelScope cache directory. |
Downloading from ModelScope#
Use the following steps to download a model from ModelScope.
Set the required environment variables:
export MODELSCOPE_API_TOKEN="your_token_here"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the download command:
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e MODELSCOPE_API_TOKEN \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Local Storage#
URI Format#
local://{absolute_path}
The following examples show valid local URI formats:
- `local:///mnt/models/llama-3.1-8b` references a model directory.
- `local:///mnt/models/llama-3.1-8b?file=config.json` references a specific file.
Note
Only the local:// URI scheme is supported for local filesystem access. Alternative notations (such as file:// or bare paths) are not supported.
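A small sketch of scheme validation consistent with the note above (the `is_local_uri` helper is hypothetical, not part of NIM):

```shell
# Hypothetical helper: accept local:// with an absolute path, reject the rest.
is_local_uri() {
  case "$1" in
    local:///*) return 0 ;;
    *)          return 1 ;;
  esac
}

is_local_uri "local:///mnt/models/llama-3.1-8b" && echo "supported"
is_local_uri "file:///mnt/models/llama-3.1-8b"  || echo "unsupported"
is_local_uri "/mnt/models/llama-3.1-8b"         || echo "unsupported"
```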
How It Works#
Unlike remote sources, Local Storage does not download files. Instead, it creates symlinks in the workspace pointing directly to the source files. This means:
- Zero additional disk usage.
- Near-instant setup (milliseconds for symlink creation vs. minutes or hours for network transfers).
- Source files must remain accessible for the duration of the NIM deployment.
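The symlink behavior can be demonstrated with throwaway paths (a sketch; the actual workspace layout NIM creates may differ):

```shell
# Throwaway source and workspace directories stand in for the model store
# and the NIM workspace.
src="$(mktemp -d)"
ws="$(mktemp -d)"
echo '{}' > "$src/config.json"

ln -s "$src/config.json" "$ws/config.json"   # no bytes are copied

ls -l "$ws/config.json"    # shows a symlink into $src
cat "$ws/config.json"      # reads through the link: {}
```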
Preparing a Local Model Store#
The create-model-store command downloads a model from a remote source (such as NGC) and creates a properly formatted local model store:
export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v $(pwd)/model-store:/opt/nim/model-store \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} create-model-store \
--model-cache-path /opt/nim/.cache \
--model-store /opt/nim/model-store \
--profile $NIM_MODEL_PROFILE
Using a Local Model#
Use the following steps to work with a local model source.
Set the required environment variable:
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
Run the command:
docker run --rm --gpus all \
-v /mnt/models:/mnt/models:ro \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Common Configuration#
The following environment variables apply across all model sources.
Cache Path#
| Variable | Default | Description |
|---|---|---|
| `NIM_CACHE_PATH` | `/opt/nim/.cache` | Directory used for model assets in the container. Must be a valid, writable path. |
Repository Override#
NIM_REPOSITORY_OVERRIDE redirects model downloads from the default NGC source to an alternative storage location at runtime.
The following protocols are supported as override targets:
| Protocol | Description |
|---|---|
| `ngc://` | NGC-compatible proxy servers. |
| `s3://` | AWS S3 and S3-compatible storage. |
| `gs://` | Google Cloud Storage. |
| `http://`, `https://` | HTTP/HTTPS servers. |
export NIM_REPOSITORY_OVERRIDE="s3://my-bucket/nim-models"
Note
NIM_REPOSITORY_OVERRIDE supports object storage (S3/GCS) and HTTP(S) servers. It does not support model registries (hf://, modelscope://) as override targets. To use Hugging Face or ModelScope models, specify the corresponding URIs directly in the model manifest.
NIM_REPOSITORY_OVERRIDE vs NIM_MODEL_PATH#
These two environment variables serve different purposes for sourcing models from cloud storage:
| Aspect | NIM_REPOSITORY_OVERRIDE | NIM_MODEL_PATH |
|---|---|---|
| Purpose | Mirror NGC models to your own storage | Serve custom/fine-tuned models directly |
| Manifest | Uses built-in NGC manifest | No manifest required |
| Upload format | Percent-encoded flat keys | Standard directory structure |
| Checksum verification | Yes (from NGC manifest) | No |
| Use case | Air-gapped deployments, NGC mirrors | Custom models, fine-tuned weights |
When to use NIM_REPOSITORY_OVERRIDE: You want to serve an NGC model but download it from your own S3/GCS bucket instead of NGC (for air-gapped environments, lower latency, or cost optimization). Refer to the Examples section.
When to use NIM_MODEL_PATH: You have a custom or fine-tuned model that was never published on NGC and want to serve it directly from cloud storage. Refer to the Examples section.
Disabling Model Download#
The following environment variable controls model download behavior:
| Variable | Default | Description |
|---|---|---|
| | | Disable model download on container startup. Useful for multi-node scenarios where only one node needs to download. |
Proxy Support#
For environments behind corporate firewalls, NIM honors the standard proxy environment variables:
export HTTPS_PROXY=http://proxy.mycorp.com:8080
export HTTP_PROXY=http://proxy.mycorp.com:8080
export NO_PROXY=localhost,127.0.0.1
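As a rough illustration of how NO_PROXY exclusions work, a simplified exact-match sketch (real HTTP clients also match domain suffixes and may support CIDR ranges; `bypass_proxy` is a hypothetical helper, not part of NIM):

```shell
# Hypothetical helper: exact-match lookup of a host in NO_PROXY.
bypass_proxy() {
  local host="$1" entry entries
  IFS=',' read -ra entries <<< "$NO_PROXY"
  for entry in "${entries[@]}"; do
    [ "$entry" = "$host" ] && return 0
  done
  return 1
}

NO_PROXY=localhost,127.0.0.1
bypass_proxy "localhost"      && echo "direct connection"
bypass_proxy "ngc.nvidia.com" || echo "through proxy"
```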
CLI Commands#
download-to-cache#
Downloads model profiles to the NIM cache. By default, assets are downloaded from the source repository; pass --use-cache to check for cached assets first.
usage: download-to-cache [-h] [--all] [--profiles [PROFILES ...]]
[--lora] [--model-uri MODEL_URI]
[--manifest-file MANIFEST_FILE]
[--model-cache-path MODEL_CACHE_PATH]
[--use-cache] [--verify-checksums]
The following arguments are available:
| Argument | Description |
|---|---|
| `--all` | Download all profiles to cache. |
| `--profiles` | Profile(s) to download. If omitted, the optimal profile is downloaded. |
| `--lora` | Download the default LoRA profile. |
| `--model-uri` | Model URI to download. Supported schemes: `ngc://`, `hf://`, `s3://`, `gs://`, `modelscope://`, `local://`. |
| `--manifest-file` | Manifest file path. |
| `--model-cache-path` | Directory path of model cache. |
| `--use-cache` | Check for cached assets before downloading from the repo. |
| `--verify-checksums` | Verify downloaded files match the checksums in the manifest (enabled by default). |
create-model-store#
Creates a properly formatted model store directory from a cached model profile.
usage: create-model-store [-h] --profile PROFILE --model-store MODEL_STORE
[--model-cache-path MODEL_CACHE_PATH]
[--use-cache]
The following arguments are available:
| Argument | Description |
|---|---|
| `--profile` | Profile hash to create a model directory for (required). |
| `--model-store` | Directory path where the model profile is extracted and copied to (required). |
| `--model-cache-path` | Directory path of model cache. |
| `--use-cache` | Check for cached assets before downloading from the repo. |
mirror#
Mirrors model profiles from NGC to a destination object storage bucket. This command downloads profiles to the local cache and uploads them with the percent-encoded keys required by NIM_REPOSITORY_OVERRIDE.
usage: mirror {s3,gcs} [-h] -b BUCKET
[--profiles [PROFILES ...]]
[--manifest-file MANIFEST_FILE]
[--model-cache-path MODEL_CACHE_PATH]
The following arguments are available:
| Argument | Description |
|---|---|
| `{s3,gcs}` | Destination storage type (positional, required). |
| `-b`, `--bucket` | Destination bucket name (required). |
| `--profiles` | Profile(s) to mirror. If omitted, the default profile is mirrored. |
| `--manifest-file` | Manifest file path. |
| `--model-cache-path` | Directory path of model cache. |
Examples#
NGC#
Basic NGC Download#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Download Behind Corporate Proxy#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export HTTPS_PROXY=http://proxy.corp.example.com:8080
export NO_PROXY=localhost,127.0.0.1
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e HTTPS_PROXY \
-e NO_PROXY \
${NIM_LLM_IMAGE} download-to-cache --all
Hugging Face#
Basic Hugging Face Download#
export HF_TOKEN="${YOUR_HF_TOKEN}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e HF_TOKEN \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Using Hugging Face Proxy (Olah)#
export HF_ENDPOINT="http://localhost:8090"
export HF_TOKEN="${YOUR_HF_TOKEN}"
docker run --rm --gpus all \
--network=host \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e HF_ENDPOINT \
-e HF_TOKEN \
${NIM_LLM_IMAGE}
Amazon S3#
Download with NIM_REPOSITORY_OVERRIDE#
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
export AWS_REGION=us-east-1
export NIM_REPOSITORY_OVERRIDE=s3://my-models-bucket
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_REGION \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Mirror NGC to S3#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
export AWS_DEFAULT_REGION=us-east-1
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e NGC_API_KEY \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_DEFAULT_REGION \
${NIM_LLM_IMAGE} mirror s3 \
--manifest model_manifest.yaml \
--bucket my-models-bucket
S3-Compatible Storage (MinIO)#
export AWS_ENDPOINT_URL=http://minio.internal:9000
export AWS_ACCESS_KEY_ID="${YOUR_MINIO_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_MINIO_SECRET_KEY}"
export AWS_REGION=us-east-1
export AWS_S3_USE_PATH_STYLE=true
export NIM_REPOSITORY_OVERRIDE=s3://my-bucket
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e AWS_ENDPOINT_URL \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_REGION \
-e AWS_S3_USE_PATH_STYLE \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Google Cloud Storage#
Download with NIM_REPOSITORY_OVERRIDE#
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_REPOSITORY_OVERRIDE=gs://my-models-bucket
docker run --rm -it --gpus all \
-e NIM_REPOSITORY_OVERRIDE \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
-v /path/to/sa.json:/credentials/sa.json:ro \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Mirror NGC to GCS#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v /path/to/sa.json:/credentials/sa.json:ro \
-e NGC_API_KEY \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
${NIM_LLM_IMAGE} mirror gcs \
--manifest model_manifest.yaml \
--bucket my-models-bucket
Direct GCS URI in Manifest#
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v /path/to/sa.json:/credentials/sa.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
Local Storage#
Using NIM_MODEL_PATH (No Manifest)#
docker run --rm --gpus all \
-v /mnt/models/llama-3.1-8b:/opt/nim/model:ro \
-e NIM_MODEL_PATH=/opt/nim/model \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Air-Gapped Deployment#
docker run --rm --gpus all \
--network none \
-v /opt/models/llama-3.1-8b:/opt/nim/model:ro \
-e NIM_MODEL_PATH=/opt/nim/model \
-p 8000:8000 \
${NIM_LLM_IMAGE}
Create Model Store from NGC#
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-v $(pwd)/model-store:/opt/nim/model-store \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} create-model-store \
--model-cache-path /opt/nim/.cache \
--model-store /opt/nim/model-store \
--profile $NIM_MODEL_PROFILE
ModelScope#
Basic ModelScope Download#
export MODELSCOPE_API_TOKEN="${YOUR_MODELSCOPE_TOKEN}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
docker run --rm --gpus all \
-v $(pwd)/model-cache:/opt/nim/.cache \
-e MODELSCOPE_API_TOKEN \
-e NIM_MODEL_PROFILE \
${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE