Model-Free NIM#
Run any supported model without a model-specific container image.
Overview#
By default, NIM containers ship with a baked-in model manifest that defines which model to serve. Model-free mode lets you point a generic NIM container at any model — a Hugging Face repo, an NGC model, an S3 bucket, or a local directory — and NIM will generate a manifest at startup and serve that model.
Model-free NIM is useful for:
- **Flexible single-container deployments:** one container image passes security review and can serve any supported model.
- **Day-zero model support:** serve newly released models without waiting for a model-specific NIM container.
- **Custom and fine-tuned models:** serve your own models from any supported source.
Note
Regardless of where the model is hosted, NIM can only serve model architectures that vLLM supports; an architecture that is unsupported by vLLM is also unsupported by NIM.
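One way to check ahead of time is to inspect the `architectures` field of the model's `config.json` and compare it against the architectures your vLLM version supports. The snippet below is an illustrative sketch only: the supported-architecture set shown is a small sample, not the full vLLM registry, so consult the vLLM documentation for the authoritative list.

```python
import json

# Small sample of vLLM-supported architectures -- illustrative only,
# not the full registry; check your vLLM version's documentation.
SAMPLE_SUPPORTED = {"LlamaForCausalLM", "MistralForCausalLM", "Qwen2ForCausalLM"}

def architectures_supported(config_json: str, supported: set) -> bool:
    """Return True if every architecture declared in config.json is supported."""
    config = json.loads(config_json)
    archs = config.get("architectures", [])
    return bool(archs) and all(a in supported for a in archs)

# A Llama checkpoint's config.json declares LlamaForCausalLM
llama_config = json.dumps({"architectures": ["LlamaForCausalLM"]})
print(architectures_supported(llama_config, SAMPLE_SUPPORTED))  # True
```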
Configuring the Model#
Supply the model via either of these methods. If both are provided, the CLI positional argument takes precedence.
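The precedence rule can be sketched as a small resolver. Here `cli_model` and `env_model` stand in for the positional argument and `NIM_MODEL_PATH`; this is an illustration of the rule, not NIM's actual startup code:

```python
from typing import Optional

def resolve_model(cli_model: Optional[str], env_model: Optional[str]) -> str:
    """The CLI positional argument wins over NIM_MODEL_PATH when both are set."""
    if cli_model:
        return cli_model
    if env_model:
        return env_model
    raise ValueError("No model specified: set NIM_MODEL_PATH or pass a positional argument")

print(resolve_model("hf://meta-llama/Llama-3.1-8B-Instruct", "s3://my-bucket/other-model"))
# -> hf://meta-llama/Llama-3.1-8B-Instruct (the CLI argument wins)
```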
NIM_MODEL_PATH Environment Variable#
```bash
docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}
```
vLLM CLI Positional Argument#
```bash
docker run --gpus=all \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE} \
  hf://meta-llama/Llama-3.1-8B-Instruct
```
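Once the container reports it is ready, you can exercise the OpenAI-compatible API it serves on port 8000. The snippet below only constructs the request (sending it requires a running container, and the `model` field must match the model being served):

```python
import json
from urllib import request

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32,
}
req = request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the container is up:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```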
Supported Model Sources#
| Prefix | Source | Authentication |
|---|---|---|
| `hf://` | Hugging Face Hub | `HF_TOKEN` |
| `ngc://` | NVIDIA NGC | `NGC_API_KEY` |
| `s3://` | AWS S3 / S3-compatible | AWS credential variables (see below) |
| `ms://` | ModelScope Hub | ModelScope token (refer to Model Download) |
| `gs://` | Google Cloud Storage | Google Cloud credentials (refer to Model Download) |
| (absolute path) | Local directory | None |
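The URI prefix is what routes the download to the right source. A classifier over the prefix might look like the following sketch; it covers a subset of the prefixes listed above and is illustrative, not NIM's internal code:

```python
def classify_model_source(uri: str) -> str:
    """Map a model URI to its source based on the prefix."""
    prefixes = {
        "hf://": "Hugging Face Hub",
        "ngc://": "NVIDIA NGC",
        "s3://": "AWS S3 / S3-compatible",
        "gs://": "Google Cloud Storage",
    }
    for prefix, source in prefixes.items():
        if uri.startswith(prefix):
            return source
    if uri.startswith("/"):  # absolute paths are treated as local directories
        return "Local directory"
    raise ValueError(f"Unrecognized model source: {uri}")

print(classify_model_source("hf://meta-llama/Llama-3.1-8B-Instruct"))  # Hugging Face Hub
print(classify_model_source("/mnt/models/my-120b-model"))              # Local directory
```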
For details on downloading models from each source (including URI formats, proxy configuration, and cache management), refer to Model Download.
Configuring Deployment Options#
Model-free NIM generates profiles for combinations of tensor parallelism (TP), pipeline parallelism (PP), and LoRA. To select a deployment configuration, use either of the following methods. If both are provided, vLLM CLI arguments take precedence.
NIM_MODEL_PROFILE Environment Variable#
Run `list-model-profiles` to see the available profiles, then select one:
```bash
docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_MODEL_PROFILE=vllm-bf16-tp2-pp1 \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}
```
vLLM CLI Arguments#
```bash
docker run --gpus=all \
  -e HF_TOKEN=<yourtoken> \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE} \
  hf://meta-llama/Llama-3.1-8B-Instruct \
  -tp 2
```
The following vLLM CLI arguments are supported:
| Argument | Purpose | Default |
|---|---|---|
| `-tp` / `--tensor-parallel-size` | Number of GPUs | 1 |
| `-pp` / `--pipeline-parallel-size` | Number of nodes | 1 |
| `--enable-lora` | Enable LoRA adapter support | Disabled |
Listing Profiles#
Use the `list-model-profiles` command to view the profiles generated for a given model:
```bash
docker run --gpus=all \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e HF_TOKEN=<yourtoken> \
  ${NIM_LLM_IMAGE} \
  list-model-profiles
```
For more details on profile selection, refer to Model Profiles and Selection.
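Profile names encode the deployment configuration: backend, optional precision, TP/PP degrees, and optional features such as LoRA. A sketch of parsing such names, based on the profile strings that appear in this document (the naming scheme is inferred from those examples, not an official specification):

```python
import re

def parse_profile(name: str) -> dict:
    """Extract tp/pp degrees and the LoRA flag from a profile name
    such as 'vllm-bf16-tp2-pp1' or 'vllm-tp1-pp1-feat_lora-<hash>'."""
    tp = re.search(r"-tp(\d+)", name)
    pp = re.search(r"-pp(\d+)", name)
    return {
        "tp": int(tp.group(1)) if tp else 1,
        "pp": int(pp.group(1)) if pp else 1,
        "lora": "feat_lora" in name,
    }

print(parse_profile("vllm-bf16-tp2-pp1"))
# {'tp': 2, 'pp': 1, 'lora': False}
print(parse_profile("vllm-tp1-pp1-feat_lora-0bdd169f"))
# {'tp': 1, 'pp': 1, 'lora': True}
```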
S3-Specific Environment Variables#
| Variable | Required | Purpose |
|---|---|---|
| `AWS_ACCESS_KEY_ID` | Yes | AWS access key |
| `AWS_SECRET_ACCESS_KEY` | Yes | AWS secret key |
| `AWS_REGION` | Yes | AWS region (e.g., `us-east-1`) |
| `AWS_ENDPOINT_URL` | Only for S3-compatible stores | Custom endpoint URL |
Additional variables for S3-compatible endpoints, along with complete authentication and configuration details for each source, are covered in Model Download.
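Before launching, it can be worth checking that the required variables are present. A minimal pre-flight sketch (the required set follows the table above; this is an illustration, not part of NIM):

```python
REQUIRED_S3_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")

def missing_s3_vars(env: dict) -> list:
    """Return the required S3 variables that are absent or empty in `env`."""
    return [v for v in REQUIRED_S3_VARS if not env.get(v)]

# Forgetting the secret key is caught before the container starts
env = {"AWS_ACCESS_KEY_ID": "AKIA...", "AWS_REGION": "us-east-1"}
print(missing_s3_vars(env))  # ['AWS_SECRET_ACCESS_KEY']
```

In practice you would pass `dict(os.environ)` rather than a hand-built dict.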
Examples#
Hugging Face Model with TP=2#
```bash
export MODEL=hf://meta-llama/Llama-3.2-1B

# List available profiles
docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -e HF_TOKEN \
  -e NIM_MODEL_PATH=$MODEL \
  ${NIM_LLM_IMAGE} \
  list-model-profiles

# Example output:
# - Compatible with system and runnable:
#   - c214460d2ad7a379660126062912d2aeecaa74a3ce14ab9966cd135de49a73f2 (vllm-tp1-pp1-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73) [requires >=13 GB/gpu]
# - With LoRA support:
#   - 289b03eb8c26104f416dd0a1055004e31fd9e4b0f84fe2e59754a3ceb710976a (vllm-tp1-pp1-feat_lora-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73) [requires >=13 GB/gpu]

# Run with profile selected via NIM_MODEL_PROFILE
docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -e HF_TOKEN \
  -e NIM_MODEL_PROFILE=vllm-tp1-pp1-0bdd169fb413e457cef3feda64108b085f73d16b6252a860e5e9ee85f533de73 \
  -e NIM_MODEL_PATH=$MODEL \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}

# OR run with profile selected via vLLM CLI argument override
docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -e HF_TOKEN \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE} \
  $MODEL -tp 2
```
Model Hosted in S3 Using Automatic Profile Selection#
```bash
export MODEL=s3://my-bucket/my-org/my-fine-tuned-model

docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -e NIM_MODEL_PATH=$MODEL \
  -e AWS_ACCESS_KEY_ID=<key> \
  -e AWS_SECRET_ACCESS_KEY=<secret> \
  -e AWS_REGION=us-east-1 \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}
```
Local Model with a TP=8 Profile Specified#
```bash
export MODEL=/mnt/models/my-120b-model

docker run --gpus=all \
  -v $(pwd)/local_cache:/opt/nim/.cache \
  -v /mnt/models:/mnt/models \
  -e NIM_MODEL_PROFILE=<tp8_profile> \
  -e NIM_MODEL_PATH=$MODEL \
  -p 8000:8000 \
  ${NIM_LLM_IMAGE}
```
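As a back-of-the-envelope check for the example above: a 120B-parameter model in bf16 needs roughly 2 bytes per parameter for the weights alone, so TP=8 places about 30 GB of weights on each GPU, before accounting for KV cache and activation overhead. A quick sketch of that arithmetic:

```python
def weight_gb_per_gpu(params_billions: float, bytes_per_param: int, tp: int) -> float:
    """Approximate weight memory per GPU; ignores KV cache and activations."""
    total_gb = params_billions * 1e9 * bytes_per_param / 1e9  # total GB of weights
    return total_gb / tp

# 120B parameters, bf16 (2 bytes/param), tensor parallelism of 8
print(weight_gb_per_gpu(120, 2, 8))  # 30.0
```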