Model-Free NIM#

Run any supported model without a model-specific container image.

By default, NIM containers ship with a built-in model manifest that defines which model to serve. In model-free mode, you point a generic NIM container at any supported model, such as a Hugging Face repo, an S3 bucket, or a local directory. NIM then generates a manifest at startup and serves that model.

Model-free NIM is useful in the following scenarios:

  • Flexible single-container deployments: One container image can pass security review and serve any supported model.

  • Day-zero model support: You can serve newly released models without waiting for a model-specific NIM container.

  • Custom and fine-tuned models: You can serve your own models from any supported source.

Note

Regardless of where the model is hosted, NIM cannot serve a model whose architecture is unsupported by vLLM.

Configuring the Model#

Supply the model via either of these methods. If both are provided, the CLI positional argument takes precedence.
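The precedence rule can be illustrated with a small shell sketch. This is illustrative only, not NIM's actual startup code; the helper name resolve_model is hypothetical:

```shell
#!/bin/sh
# Illustrative only: resolve the model the way model-free NIM documents it.
# $1             - optional vLLM CLI positional argument (highest precedence)
# NIM_MODEL_PATH - environment-variable fallback
resolve_model() {
  if [ -n "$1" ]; then
    echo "$1"             # CLI positional argument wins
  else
    echo "$NIM_MODEL_PATH"
  fi
}
```

For example, with NIM_MODEL_PATH set and a positional argument supplied, the positional argument is used.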

NIM_MODEL_PATH Environment Variable#

Use NIM_MODEL_PATH to point the container to the model at runtime:

docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

vLLM CLI Positional Argument#

Use the vLLM positional argument to pass the model path at runtime:

docker run --gpus=all \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE} \
  hf://meta-llama/Llama-3.1-8B-Instruct

Supported Model Sources#

| Prefix | Source | Example | Authentication |
|---|---|---|---|
| hf:// | Hugging Face Hub | hf://meta-llama/Llama-3.1-8B-Instruct | HF_TOKEN |
| ngc:// | NVIDIA NGC | ngc://nim/meta/llama-3.3-70b-instruct:hf | NGC_API_KEY |
| s3:// | AWS S3 / S3-compatible | s3://my-bucket/my-org/my-model | AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (plus additional S3 variables) |
| modelscope:// | ModelScope Hub | modelscope://LLM-Research/Llama-3.2-1B-Instruct:d3e55134 | MODELSCOPE_API_TOKEN |
| gs:// | Google Cloud Storage | gs://my-bucket/my-org/my-model | GOOGLE_APPLICATION_CREDENTIALS (or ADC) |
| (absolute path) | Local directory | /mnt/models/my-llama | None |

For details on downloading models from each source (including URI formats, proxy configuration, and cache management), refer to Model Download.
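As a quick reference, the mapping from URI prefix to required credential can be expressed as a shell helper. This is an illustrative sketch, not part of NIM; the function name required_credential is hypothetical, and the variable names come from the table above:

```shell
#!/bin/sh
# Map a model URI to the environment variable(s) NIM expects for authentication.
# Illustrative helper based on the supported-sources table; not part of NIM.
required_credential() {
  case "$1" in
    hf://*)         echo "HF_TOKEN" ;;
    ngc://*)        echo "NGC_API_KEY" ;;
    s3://*)         echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY" ;;
    modelscope://*) echo "MODELSCOPE_API_TOKEN" ;;
    gs://*)         echo "GOOGLE_APPLICATION_CREDENTIALS (or ADC)" ;;
    /*)             echo "None (local directory)" ;;
    *)              echo "Unknown prefix" ; return 1 ;;
  esac
}
```

For example, required_credential hf://meta-llama/Llama-3.1-8B-Instruct prints HF_TOKEN.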

Configuring Deployment Options#

Model-free NIM generates profiles for combinations of tensor parallelism (TP), pipeline parallelism (PP), and LoRA. To select a deployment configuration, use either of the following methods. If both are provided, vLLM CLI arguments take precedence.

NIM_MODEL_PROFILE Environment Variable#

Run list-model-profiles to see available profiles, then select one:

docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_MODEL_PROFILE=vllm-bf16-tp2-pp1 \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

vLLM CLI Arguments#

Use vLLM CLI arguments to pass additional runtime options with the model path:

docker run --gpus=all \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE} \
  hf://meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2

The following vLLM CLI arguments are supported:

| Argument | Purpose | Default |
|---|---|---|
| --tensor-parallel-size | Number of GPUs | 1 |
| --pipeline-parallel-size | Number of nodes | 1 |
| --enable-lora | Enable LoRA adapter support | Disabled |

Listing Profiles#

Use the list-model-profiles command to view the profiles generated for a given model:

docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e HF_TOKEN=<yourtoken> \
  ${NIM_LLM_IMAGE} \
  list-model-profiles

For more details on profile selection, refer to Model Profiles and Selection.
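The output of list-model-profiles can be filtered with standard tools. The following sketch extracts profile IDs from saved output; the sample lines in the test match the format shown in the example output later on this page, though the exact format may vary between releases:

```shell
#!/bin/sh
# Extract 64-hex-character profile IDs from captured list-model-profiles output.
# Assumes the output was saved with a command like:
#   docker run ... ${NIM_LLM_IMAGE} list-model-profiles > profiles.txt
extract_profile_ids() {
  # Profile IDs appear as the first token after a "- " bullet.
  sed -nE 's/^[[:space:]]*-[[:space:]]([0-9a-f]{64}) .*/\1/p' "$1"
}
```

This deliberately matches only the leading ID on each bullet, not the checksum embedded in the profile name.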

S3-Specific Environment Variables#

| Variable | Required | Purpose |
|---|---|---|
| AWS_ACCESS_KEY_ID | Yes | AWS access key |
| AWS_SECRET_ACCESS_KEY | Yes | AWS secret key |
| AWS_REGION or AWS_DEFAULT_REGION | Yes | AWS region (e.g., us-east-1) |
| AWS_ENDPOINT_URL | Only for S3-compatible endpoints | Custom endpoint (e.g., http://localhost:9000 for MinIO) |
| AWS_S3_USE_PATH_STYLE | Only for S3-compatible endpoints | Set to true for MinIO or other path-style S3 endpoints |

For complete authentication and configuration details for each source, refer to Model Download.
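As a concrete example, an S3-compatible MinIO endpoint might be configured as follows. The credentials and endpoint here are placeholders, not defaults you should rely on:

```shell
# Environment for a MinIO (S3-compatible) model source; all values are placeholders.
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_REGION=us-east-1                    # a region is still required, even for MinIO
export AWS_ENDPOINT_URL=http://localhost:9000  # MinIO endpoint
export AWS_S3_USE_PATH_STYLE=true              # path-style addressing for MinIO
```

Pass each of these to the container with -e, as in the S3 example later on this page.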

Air-Gap Deployments#

How model-free NIM behaves in an air-gap environment depends on the type of NIM_MODEL_PATH value you use.

Local path (/abs/path/to/model)#

NIM reads the model directory directly with no network access. No manifest regeneration occurs, and no credentials are needed. This is the simplest air-gap workflow — ensure the model directory is pre-staged and mounted into the container.

Remote URI (ngc://, hf://, s3://, and so on)#

NIM generates a runtime manifest from the URI on the first deployment and automatically saves a copy inside NIM_CACHE_PATH. On subsequent restarts — including in a strict air-gap environment — NIM finds the cached manifest in the same cache volume and reuses it without any outbound network or authentication calls. No additional environment variables are required.

This enables the following workflow for air-gap redeployment:

  1. First deploy (network-connected): Run the container with credentials and the remote URI. NIM downloads the model, generates the manifest, and saves a copy to NIM_CACHE_PATH.

  2. Transfer: Ensure the NIM cache is on a PVC or persistent volume that survives pod restarts, or transfer the cache directory to the air-gap environment.

  3. Redeploy (air-gapped): Mount the same cache volume. NIM finds the cached manifest and skips regeneration. No credentials or network access are required.

Tip

To force manifest regeneration after an upstream model update, delete nim_runtime_manifest.yaml from your persistent cache directory (NIM_CACHE_PATH) before restarting.
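A guarded version of that cleanup can look like the following sketch. It assumes NIM_CACHE_PATH points at your persistent cache directory; the manifest filename comes from the tip above, and the helper name force_manifest_regen is hypothetical:

```shell
#!/bin/sh
# Remove the cached runtime manifest so NIM regenerates it on the next start.
# NIM_CACHE_PATH must point at the persistent cache directory.
force_manifest_regen() {
  manifest="${NIM_CACHE_PATH:?NIM_CACHE_PATH is not set}/nim_runtime_manifest.yaml"
  if [ -f "$manifest" ]; then
    rm "$manifest"
    echo "removed"
  else
    echo "absent"
  fi
}
```

Run this before restarting the container; note that regeneration requires network access and credentials, so it is not suitable inside a strict air-gap environment.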

For a complete walkthrough, refer to Air-Gap Deployment.

Examples#

Hugging Face Model with TP=2#

Use this example to inspect available profiles for a Hugging Face model and then run the model with TP=2.

  1. Set the model path:

    export MODEL=hf://meta-llama/Llama-3.2-1B
    
  2. List the available profiles:

    docker run --gpus=all \
      -v $(pwd)/local_cache:/opt/nim/.cache \
      -e HF_TOKEN \
      -e NIM_MODEL_PATH=$MODEL \
      ${NIM_LLM_IMAGE} \
      list-model-profiles
    

Example output:

- Compatible with system and runnable:
  - c214460d2ad7a379660126062912d2aeecaa74a3ce14ab9966cd135de49a73f2 (vllm-tp1-pp1-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73) [requires >=13 GB/gpu]
- With LoRA support:
  - 289b03eb8c26104f416dd0a1055004e31fd9e4b0f84fe2e59754a3ceb710976a (vllm-tp1-pp1-feat_lora-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73) [requires >=13 GB/gpu]
  3. Start NIM by using one of the following methods:

  • Select a profile with NIM_MODEL_PROFILE:

    docker run --gpus=all \
      -v $(pwd)/local_cache:/opt/nim/.cache \
      -e HF_TOKEN \
      -e NIM_MODEL_PROFILE=vllm-tp1-pp1-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73 \
      -e NIM_MODEL_PATH=$MODEL \
      -p 8000:8000 \
      ${NIM_LLM_IMAGE}
    
  • Override the profile through a vLLM CLI argument:

    docker run --gpus=all \
      -v $(pwd)/local_cache:/opt/nim/.cache \
      -e HF_TOKEN \
      -p 8000:8000 \
      ${NIM_LLM_IMAGE} \
      $MODEL --tensor-parallel-size 2
    

Model Hosted in S3 Using Default Profile from Automatic Profile Selection#

Use this example to serve a model from S3 and let NIM select the default compatible profile automatically.

  1. Set the model path:

    export MODEL=s3://my-bucket/my-org/my-fine-tuned-model
    
  2. Start NIM with the required S3 credentials:

    docker run --gpus=all \
      -v $(pwd)/local_cache:/opt/nim/.cache \
      -e NIM_MODEL_PATH=$MODEL \
      -e AWS_ACCESS_KEY_ID=<key> \
      -e AWS_SECRET_ACCESS_KEY=<secret> \
      -e AWS_REGION=us-east-1 \
      -p 8000:8000 \
      ${NIM_LLM_IMAGE}
    

Local Model with TP=8 Profile Specified#

Use this example to serve a local model and select a TP=8 profile explicitly.

  1. Set the model path:

    export MODEL=/mnt/models/my-120b-model
    
  2. Start NIM with the selected profile:

    docker run --gpus=all \
      -v $(pwd)/local_cache:/opt/nim/.cache \
      -v /mnt/models:/mnt/models \
      -e NIM_MODEL_PROFILE=<tp8_profile> \
      -e NIM_MODEL_PATH=$MODEL \
      -p 8000:8000 \
      ${NIM_LLM_IMAGE}
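Once any of the containers above is running, a simple poll confirms the server is ready. This sketch assumes the server listens on port 8000 and exposes a /v1/health/ready endpoint; adjust the URL if your deployment differs:

```shell
#!/bin/sh
# Poll a readiness URL until it responds or the attempt budget runs out.
# Usage: wait_ready <url> [max_attempts]   (illustrative helper, not part of NIM)
wait_ready() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sf "$url" > /dev/null; then
      return 0          # server responded with a success status
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1              # gave up
}

# Example: wait_ready http://localhost:8000/v1/health/ready 30
```

The -f flag makes curl treat HTTP error statuses as failures, so the loop keeps waiting while the server is still starting up.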