CLI Reference

NIM LLM Command-Line Interface

NIM for LLMs is invoked as a container entrypoint. The first positional argument selects the action to perform; any remaining arguments are passed through to the underlying backend (vLLM).

Synopsis

docker run <image> [ACTION] [OPTIONS...]

When no action is specified, nim-serve is used by default.

Actions

nim-serve

Start the NIM inference server (the default action). Any remaining arguments are forwarded to vLLM. Pass --dry-run to print the resolved configuration without launching the server.

docker run <image>
docker run <image> nim-serve --max-model-len 4096
docker run <image> nim-serve --dry-run
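A production launch usually also needs GPU access, a published port, credentials, and a persistent cache. The sketch below assumes the common NIM conventions of an NGC_API_KEY environment variable and a container cache at /opt/nim/.cache; verify both against your NIM release before relying on them.

```shell
# Hypothetical full launch; --gpus, -e, -v, and -p are standard
# docker run flags, while NGC_API_KEY and /opt/nim/.cache are
# assumed NIM conventions, not confirmed by this reference.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  <image> nim-serve --max-model-len 4096
```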

nim-serve-ray

Start the NIM server with Ray for multi-node tensor-parallel inference. Requires --head on the leader node or --head-address <host>:<port> on each worker node.

docker run <image> nim-serve-ray --head --port 6379
docker run <image> nim-serve-ray --head-address host:6379
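A two-node launch might look like the following. The hostname leader-host and the ordering (leader first, then workers joining it) are assumptions about the typical Ray workflow, not behavior documented here.

```shell
# On the leader node (assumed to be started first):
docker run --gpus all <image> nim-serve-ray --head --port 6379

# On each worker node, pointing at the leader's Ray address:
docker run --gpus all <image> nim-serve-ray --head-address leader-host:6379
```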

list-model-profiles

List all available model profiles for the current hardware.

docker run <image> list-model-profiles

download-to-cache

Download the selected model profile to the local cache without starting the server. Accepts --all to download all profiles, or -p <profile> for a specific one.

docker run <image> download-to-cache
docker run <image> download-to-cache -p <profile_id>
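To keep downloads across container runs, mount a host directory over the container cache. The /opt/nim/.cache path and the NGC_API_KEY variable below are assumed NIM conventions and may differ between releases.

```shell
# Persist the model cache on the host (paths are illustrative):
docker run -e NGC_API_KEY \
  -v ~/.cache/nim:/opt/nim/.cache \
  <image> download-to-cache -p <profile_id>
```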

create-model-store

Materialize a cached model profile into a model store directory. Requires -p <profile> and -m <path>.

docker run <image> create-model-store -p <profile_id> -m /path/to/store
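Since the store is written inside the container, mount a host directory at the -m path so the result survives the run. The cache path is an assumed NIM convention; the store paths are illustrative.

```shell
# Materialize the profile into a host-mounted store directory:
docker run \
  -v ~/.cache/nim:/opt/nim/.cache \
  -v "$(pwd)/model-store":/model-store \
  <image> create-model-store -p <profile_id> -m /model-store
```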

show-build-info

Print build metadata, such as git commit SHAs and component versions.

docker run <image> show-build-info

mirror

Mirror cached model profiles to an S3-compatible bucket or GCS bucket.

docker run <image> mirror s3 -b mybucket
docker run <image> mirror gcs -b my-gcs-bucket
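For an S3-compatible target, credentials typically come from the standard AWS SDK environment variables; passing them through as below is an assumption about how the mirror action authenticates, not documented behavior.

```shell
# Forward standard AWS credentials into the container (assumption):
docker run \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION \
  -v ~/.cache/nim:/opt/nim/.cache \
  <image> mirror s3 -b mybucket
```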