Model-Free NIM#

Run any supported model without a model-specific container image.

Overview#

By default, NIM containers ship with a baked-in model manifest that defines which model to serve. Model-free mode lets you point a generic NIM container at any model — a Hugging Face repo, an NGC model, an S3 bucket, or a local directory — and NIM will generate a manifest at startup and serve that model.

Model-free NIM is useful for:

  • Flexible single-container deployments — one container image passes security review and serves any supported model.

  • Day-zero model support — serve newly released models without waiting for a model-specific NIM container.

  • Custom and fine-tuned models — serve your own models from any supported source.

Note

Regardless of where the model is hosted, if its architecture is unsupported by vLLM, it will also be unsupported by NIM.

Configuring the Model#

Supply the model via either of these methods. If both are provided, the CLI positional argument takes precedence.

NIM_MODEL_PATH Environment Variable#

docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

vLLM CLI Positional Argument#

docker run --gpus=all \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE} \
  hf://meta-llama/Llama-3.1-8B-Instruct
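Once the container is up (with either method), you can verify the deployment through NIM's OpenAI-compatible API. A minimal sketch, assuming the default port mapping shown above; query /v1/models first to confirm the exact served model name:

```shell
# Assumes the container launched above is ready on localhost:8000.
BASE_URL=${BASE_URL:-http://localhost:8000}
MODEL_ID=meta-llama/Llama-3.1-8B-Instruct

# Minimal chat-completion request body.
PAYLOAD=$(printf '{"model": "%s", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}' "$MODEL_ID")

# List the served model(s), then send one completion request.
curl -s "$BASE_URL/v1/models"
curl -s "$BASE_URL/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD"
```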

Supported Model Sources#

| Prefix | Source | Example | Authentication |
|--------|--------|---------|----------------|
| hf:// | Hugging Face Hub | hf://meta-llama/Llama-3.1-8B-Instruct | HF_TOKEN |
| ngc:// | NVIDIA NGC | ngc://nim/meta/llama-3.3-70b-instruct:hf | NGC_API_KEY |
| s3:// | AWS S3 / S3-compatible | s3://my-bucket/my-org/my-model | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY (plus additional S3 variables) |
| modelscope:// | ModelScope Hub | modelscope://LLM-Research/Llama-3.2-1B-Instruct:d3e55134 | MODELSCOPE_API_TOKEN |
| gs:// | Google Cloud Storage | gs://my-bucket/my-org/my-model | GOOGLE_APPLICATION_CREDENTIALS (or ADC) |
| (absolute path) | Local directory | /mnt/models/my-llama | None |

For details on downloading models from each source (including URI formats, proxy configuration, and cache management), refer to Model Download.
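The prefix also determines which credentials the container expects at startup. When scripting launches, that mapping can be checked up front; a sketch of the table above (required_credentials is a hypothetical helper for illustration, not a NIM command):

```shell
# Hypothetical helper: map a model URI to the credential variable(s)
# the corresponding source requires, per the table above.
required_credentials() {
  case "$1" in
    hf://*)         echo "HF_TOKEN" ;;
    ngc://*)        echo "NGC_API_KEY" ;;
    s3://*)         echo "AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY" ;;
    modelscope://*) echo "MODELSCOPE_API_TOKEN" ;;
    gs://*)         echo "GOOGLE_APPLICATION_CREDENTIALS" ;;
    /*)             echo "" ;;   # local directory: no authentication
    *)              echo "unrecognized model source: $1" >&2; return 1 ;;
  esac
}

required_credentials hf://meta-llama/Llama-3.1-8B-Instruct   # → HF_TOKEN
```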

Configuring Deployment Options#

Model-free NIM generates profiles for combinations of tensor parallelism (TP), pipeline parallelism (PP), and LoRA. To select a deployment configuration, use either of the following methods. If both are provided, vLLM CLI arguments take precedence.

NIM_MODEL_PROFILE Environment Variable#

Run list-model-profiles to see available profiles, then select one:

docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_MODEL_PROFILE=vllm-bf16-tp2-pp1 \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

vLLM CLI Arguments#

docker run --gpus=all \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE} \
  hf://meta-llama/Llama-3.1-8B-Instruct \
  -tp 2

The following vLLM CLI arguments are supported:

| Argument | Purpose | Default |
|----------|---------|---------|
| -tp, --tensor-parallel-size | Number of GPUs | 1 |
| -pp, --pipeline-parallel-size | Number of nodes | 1 |
| --enable-lora | Enable LoRA adapter support | Disabled |
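Taken together, the parallelism arguments determine the GPU footprint: per the table above, a configuration occupies TP GPUs on each of PP nodes, so TP × PP GPUs in total. A quick sketch of that arithmetic:

```shell
# GPU footprint of a parallel configuration: TP GPUs per node, PP nodes.
tp=2   # --tensor-parallel-size
pp=1   # --pipeline-parallel-size
total_gpus=$((tp * pp))
echo "total GPUs: $total_gpus"
```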

Listing Profiles#

Use the list-model-profiles command to view the profiles generated for a given model:

docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e HF_TOKEN=<yourtoken> \
  ${NIM_LLM_IMAGE} \
  list-model-profiles

For more details on profile selection, refer to Model Profiles and Selection.

S3-Specific Environment Variables#

| Variable | Required | Purpose |
|----------|----------|---------|
| AWS_ACCESS_KEY_ID | Yes | AWS access key |
| AWS_SECRET_ACCESS_KEY | Yes | AWS secret key |
| AWS_REGION or AWS_DEFAULT_REGION | Yes | AWS region (e.g., us-east-1) |
| AWS_ENDPOINT_URL | Only for S3-compatible | Custom endpoint (e.g., http://localhost:9000 for MinIO) |
| AWS_S3_USE_PATH_STYLE | Only for S3-compatible | Set to true for MinIO or path-style S3 endpoints |

For complete authentication and configuration details for each source, refer to Model Download.
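For example, serving from a local MinIO server (an assumption for illustration: MinIO listening on localhost:9000 with the bucket already populated) would set the S3-compatible variables like this:

```shell
export AWS_ACCESS_KEY_ID=<key>
export AWS_SECRET_ACCESS_KEY=<secret>
export AWS_REGION=us-east-1
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_S3_USE_PATH_STYLE=true
export NIM_MODEL_PATH=s3://my-bucket/my-org/my-model
```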

Examples#

Hugging Face Model with TP=2#

export MODEL=hf://meta-llama/Llama-3.2-1B

# List available profiles
docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -e HF_TOKEN \
  -e NIM_MODEL_PATH=$MODEL \
  ${NIM_LLM_IMAGE} \
  list-model-profiles

# Example output:
# - Compatible with system and runnable:
#   - c214460d2ad7a379660126062912d2aeecaa74a3ce14ab9966cd135de49a73f2 (vllm-tp1-pp1-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73) [requires >=13 GB/gpu]
#   - With LoRA support:
#     - 289b03eb8c26104f416dd0a1055004e31fd9e4b0f84fe2e59754a3ceb710976a (vllm-tp1-pp1-feat_lora-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73) [requires >=13 GB/gpu]

# Run with profile selected via NIM_MODEL_PROFILE
docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -e HF_TOKEN \
  -e NIM_MODEL_PROFILE=vllm-tp1-pp1-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73 \
  -e NIM_MODEL_PATH=$MODEL \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

# OR run with profile selected via vLLM CLI argument override
docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -e HF_TOKEN \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE} \
  $MODEL -tp 2
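When scripting deployments, the profile name can be pulled out of the listing rather than copied by hand. A sketch, assuming output in the format shown above (the exact format may vary by NIM version):

```shell
# In practice, capture the listing first:
#   docker run ... ${NIM_LLM_IMAGE} list-model-profiles > profiles.txt
# Here the sample output from above stands in for a live run.
cat > profiles.txt <<'EOF'
- Compatible with system and runnable:
  - c214460d2ad7a379660126062912d2aeecaa74a3ce14ab9966cd135de49a73f2 (vllm-tp1-pp1-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73) [requires >=13 GB/gpu]
EOF

# First profile name (the parenthesized token), usable as NIM_MODEL_PROFILE.
PROFILE=$(grep -oE '\(vllm-[^)]*\)' profiles.txt | head -n 1 | tr -d '()')
echo "$PROFILE"
```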

S3-Hosted Model with Automatic Profile Selection#

export MODEL=s3://my-bucket/my-org/my-fine-tuned-model

docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -e NIM_MODEL_PATH=$MODEL \
  -e AWS_ACCESS_KEY_ID=<key> \
  -e AWS_SECRET_ACCESS_KEY=<secret> \
  -e AWS_REGION=us-east-1 \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

Local Model with a TP=8 Profile#

export MODEL=/mnt/models/my-120b-model

docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -v /mnt/models:/mnt/models \
  -e NIM_MODEL_PROFILE=<tp8_profile> \
  -e NIM_MODEL_PATH=$MODEL \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}
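Before mounting a local directory, it can be worth checking that it looks like a standard Hugging Face-format checkpoint. A minimal pre-flight sketch, assuming the usual layout (config.json plus *.safetensors or *.bin weight files); check_model_dir is a hypothetical helper, not part of NIM:

```shell
# Hypothetical pre-flight check for a local model directory.
check_model_dir() {
  dir="$1"
  [ -f "$dir/config.json" ] || { echo "missing config.json in $dir" >&2; return 1; }
  if ! ls "$dir"/*.safetensors >/dev/null 2>&1 && ! ls "$dir"/*.bin >/dev/null 2>&1; then
    echo "no weight files (*.safetensors or *.bin) in $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}

# Run against the host path before starting the container:
#   check_model_dir /mnt/models/my-120b-model
```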