Model Download#

NIM LLM downloads model artifacts at container startup using a manifest-driven approach. The model manifest describes which files to download and where to download them from. The following model sources are supported:

| Source | URI Scheme |
|---|---|
| NGC | ngc:// |
| Hugging Face | hf:// |
| Amazon S3 | s3:// |
| Google Cloud Storage | gs:// |
| ModelScope | modelscope:// |
| Local Storage | local:// |

NGC#

URI Format#

ngc://{org}/{team}/{model}:{version}?file={filename}
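
The URI components can be pulled apart with ordinary shell parameter expansion. The following sketch uses a hypothetical org, team, model, and file; it only illustrates the anatomy of the scheme:

```shell
# Hypothetical NGC URI, decomposed with parameter expansion.
uri="ngc://myorg/myteam/mymodel:1.2?file=config.json"

rest="${uri#ngc://}"        # myorg/myteam/mymodel:1.2?file=config.json
file="${rest#*\?file=}"     # config.json
path="${rest%%\?*}"         # myorg/myteam/mymodel:1.2
version="${path#*:}"        # 1.2
org="${path%%/*}"           # myorg
team="${path#*/}"; team="${team%%/*}"       # myteam
model="${path##*/}"; model="${model%%:*}"   # mymodel

echo "$org/$team/$model @ $version -> $file"
```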

Authentication#

NGC uses API key authentication. NIM checks the following environment variables in order of priority:

| Variable | Priority | Description |
|---|---|---|
| NGC_CLI_API_KEY | 1 (highest) | Backward compatibility with NGC CLI. |
| NGC_API_KEY | 2 | Recommended for container deployments. |

Both Personal API Keys (starting with nvapi-) and Legacy API Keys are supported.

export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"

Downloading from NGC#

Use the following steps to download a model from NGC.

  1. Set the required environment variables:

    export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
    export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
    
  2. Run the download command:

    docker run --rm --gpus all \
      -v $(pwd)/model-cache:/opt/nim/.cache \
      -e NGC_API_KEY \
      -e NIM_MODEL_PROFILE \
      ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
    

Downloading Behind a Corporate Proxy#

Use the following steps to download a model from NGC behind a corporate proxy.

  1. Set the required environment variables:

    export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
    export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
    export HTTPS_PROXY=http://proxy.corp.example.com:8080
    
  2. Run the download command:

    docker run --rm --gpus all \
      -v $(pwd)/model-cache:/opt/nim/.cache \
      -e NGC_API_KEY \
      -e NIM_MODEL_PROFILE \
      -e HTTPS_PROXY \
      ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
    

Hugging Face#

URI Format#

hf://{org}/{model}:{revision}?file={filename}

Authentication#

Hugging Face uses a bearer token for authentication:

| Variable | Description |
|---|---|
| HF_TOKEN | API token for private or gated models. Recommended even for public models to avoid rate limiting. |

Additional Environment Variables#

The following environment variables configure cache location and hub endpoint for Hugging Face downloads:

| Variable | Default | Description |
|---|---|---|
| HF_HOME | ~/.cache/huggingface | Root directory for the local Hugging Face cache. When both HF_HOME and NIM_CACHE_PATH are set, HF_HOME must equal $NIM_CACHE_PATH/huggingface/hub. |
| HF_ENDPOINT | https://huggingface.co | Base URL of the Hugging Face Hub API. Set this to redirect downloads to a private Enterprise Hub instance, a mirror, or a local proxy server. |
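
When you relocate the cache, derive HF_HOME from NIM_CACHE_PATH rather than setting the two variables independently, so the consistency requirement above holds automatically. A minimal sketch, using the documented default cache path:

```shell
# Keep the Hugging Face cache inside the NIM cache, as required when
# both variables are set.
export NIM_CACHE_PATH="/opt/nim/.cache"
export HF_HOME="$NIM_CACHE_PATH/huggingface/hub"

echo "$HF_HOME"   # /opt/nim/.cache/huggingface/hub
```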

Hugging Face-Compatible Proxies#

NIM supports Hugging Face-compatible proxy servers (such as Olah) for local caching:

export HF_ENDPOINT="http://localhost:8090"

Downloading from Hugging Face#

Use the following steps to download a model from Hugging Face.

  1. Set the required environment variables:

    export HF_TOKEN="hf_your_token_here"
    export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
    
  2. Run the download command:

    docker run --rm --gpus all \
      -v $(pwd)/model-cache:/opt/nim/.cache \
      -e HF_TOKEN \
      -e NIM_MODEL_PROFILE \
      ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
    

Amazon S3#

URI Format#

s3://{bucket}/{key}

Authentication#

NIM uses the standard AWS credential provider chain, which discovers credentials in the following order of precedence:

| Credential Source | Priority | Use Case |
|---|---|---|
| Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) | 1 (highest) | Container deployments |
| AWS credentials file (~/.aws/credentials) | 2 | Local development |
| IAM instance profile | 3 | EC2 instances |
| ECS container credentials | 4 | ECS tasks |

export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_REGION=us-east-1

S3-Compatible Storage#

To use an S3-compatible service (MinIO, Ceph, Oracle OCI, and so on), set the AWS_ENDPOINT_URL environment variable to redirect S3 API calls to your provider:

export AWS_ENDPOINT_URL=http://minio.internal:9000

When no custom endpoint is set, NIM uses standard AWS S3 endpoints with virtual-hosted-style addressing by default. Some S3-compatible services require path-style addressing instead. The following variables control the addressing style:

| Variable | Values | Default | Description |
|---|---|---|---|
| AWS_S3_ADDRESSING_STYLE | virtual, path | virtual | Selects the S3 addressing style explicitly. |
| AWS_S3_USE_PATH_STYLE | True, False | False | Forces path-style addressing. Required for services such as MinIO and Ceph. |

Note

NIM automatically detects Oracle OCI Object Storage endpoints and enables path-style addressing without additional configuration.
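
The two styles place the bucket name in different parts of the request URL. The sketch below builds both forms for a hypothetical bucket and key so the difference is visible:

```shell
bucket="my-models-bucket"
key="v1/model.bin"
region="us-east-1"

# Virtual-hosted style: the bucket is part of the hostname (AWS default).
virtual_url="https://${bucket}.s3.${region}.amazonaws.com/${key}"

# Path style: the bucket is the first path segment (MinIO, Ceph, and similar).
path_url="https://s3.${region}.amazonaws.com/${bucket}/${key}"

echo "$virtual_url"
echo "$path_url"
```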

Mirroring NGC Models to S3#

S3 is generic object storage without model-aware upload tools. NIM provides the mirror s3 command to copy NGC models to S3 with the correct percent-encoded keys required by NIM_REPOSITORY_OVERRIDE:

  1. Set the NGC API key:

    export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
    
  2. Run the mirror command:

    docker run --rm --gpus all \
      -v $(pwd)/model-cache:/opt/nim/.cache \
      -e NGC_API_KEY \
      -e AWS_ACCESS_KEY_ID \
      -e AWS_SECRET_ACCESS_KEY \
      -e AWS_DEFAULT_REGION \
      ${NIM_LLM_IMAGE} mirror s3 \
        --manifest model_manifest.yaml \
        --bucket my-models-bucket
    

Downloading from S3#

Use the following steps to download a model from S3.

  1. Set the required environment variables:

    export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
    export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
    export AWS_REGION=us-east-1
    export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
    
  2. Run the download command:

    docker run --rm --gpus all \
      -v $(pwd)/model-cache:/opt/nim/.cache \
      -e AWS_ACCESS_KEY_ID \
      -e AWS_SECRET_ACCESS_KEY \
      -e AWS_REGION \
      -e NIM_MODEL_PROFILE \
      ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
    

Redirecting NGC Downloads to S3#

NIM_REPOSITORY_OVERRIDE redirects NGC URIs in the manifest to your S3 bucket at runtime:

export NIM_REPOSITORY_OVERRIDE="s3://my-models-bucket"
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"

docker run --rm -it --gpus all \
  -e NIM_REPOSITORY_OVERRIDE \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

Note

When using NIM_REPOSITORY_OVERRIDE with S3, model assets must be stored with percent-encoded keys. Use the mirror s3 command to upload models with the correct encoding.
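
The correct key layout is produced by the mirror s3 command, so you rarely encode keys by hand. As a rough illustration of what percent-encoding does to a model path, here is a minimal bash encoder that maps everything outside the URL-unreserved character set to %XX escapes; it demonstrates the idea, not the precise scheme NIM uses:

```shell
# Minimal percent-encoder (illustration only; bash-specific).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9._~-]) out+="$c" ;;                 # unreserved: pass through
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;   # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

urlencode "myorg/mymodel:1.2"   # myorg%2Fmymodel%3A1.2
```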

Google Cloud Storage#

URI Format#

gs://{bucket}/{key}

Authentication#

GCS uses Google’s Application Default Credentials (ADC), which discovers credentials from the following sources:

  • GOOGLE_APPLICATION_CREDENTIALS environment variable (path to a service account key file)

  • Vertex AI managed environment credentials

  • Google Compute Engine metadata service

  • Cloud Run environment

  • gcloud CLI user credentials

For example, to set GOOGLE_APPLICATION_CREDENTIALS:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"

Mirroring NGC Models to GCS#

GCS is generic object storage without model-aware upload tools. NIM provides the mirror gcs command to copy NGC models to GCS with the correct percent-encoded keys required by NIM_REPOSITORY_OVERRIDE:

export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -v /path/to/sa.json:/credentials/sa.json:ro \
  -e NGC_API_KEY \
  -e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
  ${NIM_LLM_IMAGE} mirror gcs \
    --manifest model_manifest.yaml \
    --bucket my-models-bucket

Downloading from GCS#

Use the following steps to download a model from GCS.

  1. Set the required environment variables:

    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
    export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
    
  2. Run the download command:

    docker run --rm --gpus all \
      -v $(pwd)/model-cache:/opt/nim/.cache \
      -v /path/to/sa.json:/credentials/sa.json:ro \
      -e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
      -e NIM_MODEL_PROFILE \
      ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
    

Redirecting NGC Downloads to GCS#

  1. Set the required environment variables:

    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
    export NIM_REPOSITORY_OVERRIDE="gs://my-models-bucket"
    
  2. Start the container:

    docker run --rm -it --gpus all \
      -e NIM_REPOSITORY_OVERRIDE \
      -e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
      -v /path/to/sa.json:/credentials/sa.json:ro \
      -p 8000:8000 \
      ${NIM_LLM_IMAGE}
    

ModelScope#

URI Format#

modelscope://{org}/{model}:{revision}?file={filename}

Note

ModelScope uses master as the default branch, unlike Hugging Face, which uses main.

Authentication#

ModelScope authentication uses the following environment variable:

| Variable | Description |
|---|---|
| MODELSCOPE_API_TOKEN | API token for private models. Recommended even for public models to avoid rate limiting. |

ModelScope-Compatible Proxies#

NIM supports custom ModelScope endpoints for local caching:

export MODELSCOPE_ENDPOINT="http://your-modelscope-proxy:port"

Additional Environment Variables#

The following environment variables configure ModelScope-specific behavior:

| Variable | Default | Description |
|---|---|---|
| MODELSCOPE_CACHE | $NIM_CACHE_PATH/modelscope/hub | ModelScope cache directory. Must be consistent with NIM_CACHE_PATH when both are set. |

Downloading from ModelScope#

Use the following steps to download a model from ModelScope.

  1. Set the required environment variables:

    export MODELSCOPE_API_TOKEN="your_token_here"
    export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
    
  2. Run the download command:

    docker run --rm --gpus all \
      -v $(pwd)/model-cache:/opt/nim/.cache \
      -e MODELSCOPE_API_TOKEN \
      -e NIM_MODEL_PROFILE \
      ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
    

Local Storage#

URI Format#

local://{absolute_path}

The following examples show valid local URI formats:

  • local:///mnt/models/llama-3.1-8b — references a model directory.

  • local:///mnt/models/llama-3.1-8b?file=config.json — references a specific file.

Note

Only the local:// URI scheme is supported for local filesystem access. Alternative notations (such as file:// or bare paths) are not supported.

How It Works#

Unlike remote sources, Local Storage does not download files. Instead, it creates symlinks in the workspace pointing directly to the source files. This means:

  • Zero additional disk usage.

  • Near-instant setup (milliseconds for symlink creation vs. minutes or hours for network transfers).

  • Source files must remain accessible for the duration of the NIM deployment.
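
The symlink behavior can be reproduced with plain shell commands. The paths below are hypothetical; the point is that the workspace entry is a pointer, not a copy:

```shell
# Simulate a source model file and a workspace that links to it.
mkdir -p /tmp/nim-local-demo/source /tmp/nim-local-demo/workspace
echo "weights" > /tmp/nim-local-demo/source/model.bin

# Link instead of copying, as NIM does for local:// sources.
ln -sf /tmp/nim-local-demo/source/model.bin /tmp/nim-local-demo/workspace/model.bin

readlink /tmp/nim-local-demo/workspace/model.bin   # /tmp/nim-local-demo/source/model.bin
cat /tmp/nim-local-demo/workspace/model.bin        # weights
```

If the source file is moved or unmounted, the symlink dangles and reads fail, which is why the source must stay accessible for the life of the deployment.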

Preparing a Local Model Store#

The create-model-store command downloads a model from a remote source (such as NGC) and creates a properly formatted local model store:

export NGC_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxx"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -v $(pwd)/model-store:/opt/nim/model-store \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE \
  ${NIM_LLM_IMAGE} create-model-store \
    --model-cache-path /opt/nim/.cache \
    --model-store /opt/nim/model-store \
    --profile $NIM_MODEL_PROFILE

Using a Local Model#

Use the following steps to work with a local model source.

  1. Set the required environment variable:

    export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"
    
  2. Run the command:

    docker run --rm --gpus all \
      -v /mnt/models:/mnt/models:ro \
      -v $(pwd)/model-cache:/opt/nim/.cache \
      -e NIM_MODEL_PROFILE \
      ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE
    

Common Configuration#

The following environment variables apply across all model sources.

Cache Path#

| Variable | Default | Description |
|---|---|---|
| NIM_CACHE_PATH | /opt/nim/.cache | Directory used for model assets in the container. Must be a valid, writable path. |
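
Because the cache path must be valid and writable, a quick pre-flight check on the host directory you intend to mount can save a failed start. A sketch, assuming the host directory name used in the download examples:

```shell
# Verify the host directory backing NIM_CACHE_PATH before mounting it.
cache_dir="$(pwd)/model-cache"
mkdir -p "$cache_dir"

if [ -d "$cache_dir" ] && [ -w "$cache_dir" ]; then
  echo "cache directory OK: $cache_dir"
else
  echo "cache directory is missing or not writable: $cache_dir" >&2
  exit 1
fi
```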

Repository Override#

NIM_REPOSITORY_OVERRIDE redirects model downloads from the default NGC source to an alternative storage location at runtime.

The following protocols are supported as override targets:

| Protocol | Description |
|---|---|
| ngc:// | NGC-compatible proxy servers. |
| s3:// | AWS S3 and S3-compatible storage. |
| gs:// | Google Cloud Storage. |
| http://, https:// | HTTP/HTTPS servers. |

export NIM_REPOSITORY_OVERRIDE="s3://my-bucket/nim-models"

Note

NIM_REPOSITORY_OVERRIDE supports object storage (S3/GCS) and HTTP(S) servers. It does not support model registries (hf://, modelscope://) as override targets. To use Hugging Face or ModelScope models, specify the corresponding URIs directly in the model manifest.

NIM_REPOSITORY_OVERRIDE vs NIM_MODEL_PATH#

These two environment variables serve different purposes for sourcing models from cloud storage:

| Aspect | NIM_REPOSITORY_OVERRIDE | NIM_MODEL_PATH |
|---|---|---|
| Purpose | Mirror NGC models to your own storage | Serve custom/fine-tuned models directly |
| Manifest | Uses built-in NGC manifest | No manifest required |
| Upload format | Percent-encoded flat keys | Standard directory structure |
| Checksum verification | Yes (from NGC manifest) | No |
| Use case | Air-gapped deployments, NGC mirrors | Custom models, fine-tuned weights |

When to use NIM_REPOSITORY_OVERRIDE: You want to serve an NGC model but download it from your own S3/GCS bucket instead of NGC (for air-gapped environments, lower latency, or cost optimization). Refer to the Examples section.

When to use NIM_MODEL_PATH: You have a custom or fine-tuned model that was never on NGC and want to serve it directly from cloud storage. Refer to the Examples section.

Disabling Model Download#

The following environment variable controls model download behavior:

| Variable | Default | Description |
|---|---|---|
| NIM_DISABLE_MODEL_DOWNLOAD | False | Disable model download on container startup. Useful for multi-node scenarios where only one node needs to download. |
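
In a multi-node deployment sharing one cache volume, you can derive the flag from a node index so that only the first node downloads. NODE_RANK is a hypothetical orchestration variable here; substitute whatever your scheduler provides:

```shell
# Sketch: only node 0 downloads; the others rely on the shared cache.
if [ "${NODE_RANK:-0}" -eq 0 ]; then
  export NIM_DISABLE_MODEL_DOWNLOAD=False
else
  export NIM_DISABLE_MODEL_DOWNLOAD=True
fi

echo "NIM_DISABLE_MODEL_DOWNLOAD=$NIM_DISABLE_MODEL_DOWNLOAD"
```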

Proxy Support#

For environments behind corporate firewalls, NIM honors the standard proxy environment variables:

export HTTPS_PROXY=http://proxy.mycorp.com:8080
export HTTP_PROXY=http://proxy.mycorp.com:8080
export NO_PROXY=localhost,127.0.0.1

CLI Commands#

download-to-cache#

Downloads model profiles to the NIM cache. Assets are downloaded from the source even if they are already cached, unless --use-cache is passed.

usage: download-to-cache [-h] [--all] [--profiles [PROFILES ...]]
                         [--lora] [--model-uri MODEL_URI]
                         [--manifest-file MANIFEST_FILE]
                         [--model-cache-path MODEL_CACHE_PATH]
                         [--use-cache] [--verify-checksums]

The following arguments are available:

| Argument | Description |
|---|---|
| --all | Download all profiles to cache. |
| --profiles [PROFILES ...] | Profile(s) to download. If omitted, the optimal profile is downloaded. |
| --lora | Download the default LoRA profile. Cannot be combined with --profiles or --all. |
| --model-uri MODEL_URI | Model URI to download. Supported schemes: ngc://, hf://, s3://, gs://, modelscope://, local://. |
| --manifest-file MANIFEST_FILE | Manifest file path. |
| --model-cache-path MODEL_CACHE_PATH | Directory path of model cache. |
| --use-cache | Check for cached assets before downloading from the repo. |
| --verify-checksums | Verify downloaded files match the checksums in the manifest (enabled by default). |

create-model-store#

Creates a properly formatted model store directory from a cached model profile.

usage: create-model-store [-h] --profile PROFILE --model-store MODEL_STORE
                          [--model-cache-path MODEL_CACHE_PATH]
                          [--use-cache]

The following arguments are available:

| Argument | Description |
|---|---|
| --profile PROFILE | Profile hash to create a model directory for (required). |
| --model-store MODEL_STORE | Directory path where the model profile is extracted and copied to (required). |
| --model-cache-path MODEL_CACHE_PATH | Directory path of model cache. |
| --use-cache | Check for cached assets before downloading from the repo. |

mirror#

Mirrors model profiles from NGC to a destination object storage bucket. This command downloads profiles to the local cache and uploads them with the percent-encoded keys required by NIM_REPOSITORY_OVERRIDE.

usage: mirror {s3,gcs} [-h] -b BUCKET
                        [--profiles [PROFILES ...]]
                        [--manifest-file MANIFEST_FILE]
                        [--model-cache-path MODEL_CACHE_PATH]

The following arguments are available:

| Argument | Description |
|---|---|
| s3 or gcs | Destination storage type (positional, required). |
| -b BUCKET, --bucket BUCKET | Destination bucket name (required). |
| --profiles [PROFILES ...] | Profile(s) to mirror. If omitted, the default profile is mirrored. |
| --manifest-file MANIFEST_FILE | Manifest file path. |
| --model-cache-path MODEL_CACHE_PATH | Directory path of model cache. |

Examples#

NGC#

Basic NGC Download#

export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE \
  ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE

Download Behind Corporate Proxy#

export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export HTTPS_PROXY=http://proxy.corp.example.com:8080
export NO_PROXY=localhost,127.0.0.1

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -e NGC_API_KEY \
  -e HTTPS_PROXY \
  -e NO_PROXY \
  ${NIM_LLM_IMAGE} download-to-cache --all

Hugging Face#

Basic Hugging Face Download#

export HF_TOKEN="${YOUR_HF_TOKEN}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -e HF_TOKEN \
  -e NIM_MODEL_PROFILE \
  ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE

Using Hugging Face Proxy (Olah)#

export HF_ENDPOINT="http://localhost:8090"
export HF_TOKEN="${YOUR_HF_TOKEN}"

docker run --rm --gpus all \
  --network=host \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -e HF_ENDPOINT \
  -e HF_TOKEN \
  ${NIM_LLM_IMAGE}

Amazon S3#

Download with NIM_REPOSITORY_OVERRIDE#

export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
export AWS_REGION=us-east-1
export NIM_REPOSITORY_OVERRIDE=s3://my-models-bucket

docker run --rm -it --gpus all \
  -e NIM_REPOSITORY_OVERRIDE \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  -e AWS_REGION \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

Mirror NGC to S3#

export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export AWS_ACCESS_KEY_ID="${YOUR_AWS_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_AWS_SECRET_KEY}"
export AWS_DEFAULT_REGION=us-east-1

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -e NGC_API_KEY \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  -e AWS_DEFAULT_REGION \
  ${NIM_LLM_IMAGE} mirror s3 \
    --manifest model_manifest.yaml \
    --bucket my-models-bucket

S3-Compatible Storage (MinIO)#

export AWS_ENDPOINT_URL=http://minio.internal:9000
export AWS_ACCESS_KEY_ID="${YOUR_MINIO_ACCESS_KEY}"
export AWS_SECRET_ACCESS_KEY="${YOUR_MINIO_SECRET_KEY}"
export AWS_REGION=us-east-1
export AWS_S3_USE_PATH_STYLE=true
export NIM_REPOSITORY_OVERRIDE=s3://my-bucket

docker run --rm -it --gpus all \
  -e NIM_REPOSITORY_OVERRIDE \
  -e AWS_ENDPOINT_URL \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  -e AWS_REGION \
  -e AWS_S3_USE_PATH_STYLE \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

Google Cloud Storage#

Download with NIM_REPOSITORY_OVERRIDE#

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_REPOSITORY_OVERRIDE=gs://my-models-bucket

docker run --rm -it --gpus all \
  -e NIM_REPOSITORY_OVERRIDE \
  -e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
  -v /path/to/sa.json:/credentials/sa.json:ro \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

Mirror NGC to GCS#

export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -v /path/to/sa.json:/credentials/sa.json:ro \
  -e NGC_API_KEY \
  -e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
  ${NIM_LLM_IMAGE} mirror gcs \
    --manifest model_manifest.yaml \
    --bucket my-models-bucket

Direct GCS URI in Manifest#

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -v /path/to/sa.json:/credentials/sa.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/credentials/sa.json \
  -e NIM_MODEL_PROFILE \
  ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE

Local Storage#

Using NIM_MODEL_PATH (No Manifest)#

docker run --rm --gpus all \
  -v /mnt/models/llama-3.1-8b:/opt/nim/model:ro \
  -e NIM_MODEL_PATH=/opt/nim/model \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

Air-Gapped Deployment#

docker run --rm --gpus all \
  --network none \
  -v /opt/models/llama-3.1-8b:/opt/nim/model:ro \
  -e NIM_MODEL_PATH=/opt/nim/model \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

Create Model Store from NGC#

export NGC_API_KEY="${YOUR_NGC_API_KEY}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -v $(pwd)/model-store:/opt/nim/model-store \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE \
  ${NIM_LLM_IMAGE} create-model-store \
    --model-cache-path /opt/nim/.cache \
    --model-store /opt/nim/model-store \
    --profile $NIM_MODEL_PROFILE

ModelScope#

Basic ModelScope Download#

export MODELSCOPE_API_TOKEN="${YOUR_MODELSCOPE_TOKEN}"
export NIM_MODEL_PROFILE="${YOUR_NIM_MODEL_PROFILE}"

docker run --rm --gpus all \
  -v $(pwd)/model-cache:/opt/nim/.cache \
  -e MODELSCOPE_API_TOKEN \
  -e NIM_MODEL_PROFILE \
  ${NIM_LLM_IMAGE} download-to-cache --profile $NIM_MODEL_PROFILE