CLI Reference#
NIM LLM Command-Line Interface#
NIM for LLMs is invoked as a container entrypoint: the first positional argument selects the action to perform, and any remaining arguments are passed through to the underlying inference backend (vLLM).
Synopsis#
docker run <image> [ACTION] [OPTIONS...]
When no action is specified, nim-serve is used by default.
Actions#
nim-serve#
Start the NIM inference server. This is the default action.
Any remaining arguments are forwarded to vLLM. Pass
--dry-run to preview the resolved configuration
without starting the server.
docker run <image>
docker run <image> nim-serve --max-model-len 4096
docker run <image> nim-serve --dry-run
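In practice the server also needs GPU access, credentials, and usually a persistent model cache. A fuller invocation might look like the sketch below; the NGC_API_KEY variable name and the /opt/nim/.cache mount point are assumptions based on common NIM container conventions, so verify them against your release documentation.

```shell
# Sketch of a typical invocation (env var name and cache path are assumed):
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  -p 8000:8000 \
  <image> nim-serve --max-model-len 4096
```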
nim-serve-ray#
Start the NIM server under Ray for multi-node
tensor-parallel inference. Requires --head on the
leader node or --head-address <host:port> on each worker node.
docker run <image> nim-serve-ray --head --port 6379
docker run <image> nim-serve-ray --head-address host:6379
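A two-node deployment can be sketched as follows, assuming the leader is reachable from the workers and all nodes run the same image. The hostname leader-host and the host-networking choice are illustrative, not prescribed by NIM.

```shell
# On the leader node (leader-host is an illustrative hostname):
docker run --gpus all --network host <image> nim-serve-ray --head --port 6379

# On each worker node, pointing at the leader's Ray port:
docker run --gpus all --network host <image> nim-serve-ray --head-address leader-host:6379
```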
list-model-profiles#
List all available model profiles for the current hardware.
docker run <image> list-model-profiles
download-to-cache#
Download the selected model profile to the local cache
without starting the server. Pass --all to
download every profile, or -p <profile> for a
specific one.
docker run <image> download-to-cache
docker run <image> download-to-cache -p <profile_id>
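download-to-cache pairs naturally with a mounted cache directory, so a later nim-serve run can start without re-downloading. A sketch of that workflow, where the /opt/nim/.cache mount point and NGC_API_KEY variable are assumed container conventions:

```shell
# Pre-populate a host-side cache, then serve from it later:
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  <image> download-to-cache -p <profile_id>

docker run --rm --gpus all \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  <image> nim-serve
```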
create-model-store#
Materialize a cached model profile into a model store
directory. Requires -p <profile> and -m <path>.
docker run <image> create-model-store -p <profile_id> -m /path/to/store
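Since -m names a path inside the container, you typically mount a host directory at that path so the materialized store survives the container. The paths below are illustrative:

```shell
# Mount a host directory and materialize the cached profile into it:
docker run --rm \
  -v "$PWD/model-store:/model-store" \
  <image> create-model-store -p <profile_id> -m /model-store
```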
show-build-info#
Print build metadata (git SHAs, versions).
docker run <image> show-build-info
mirror#
Mirror cached model profiles to an S3-compatible or Google Cloud Storage (GCS) bucket.
docker run <image> mirror s3 -b mybucket
docker run <image> mirror gcs -b my-gcs-bucket