# nemo_curator.utils.vllm_utils

Shared vLLM setup utilities.

These helpers centralise the boilerplate that every vLLM-based inference stage
needs: finding a free port, initialising a `vllm.LLM` engine with
automatic port-collision retry, and resolving a HuggingFace model ID to a
local snapshot path.

They were extracted from the Nemotron-Parse inference stage, which was the
first stage in NeMo Curator to be tested at scale (320x H100).  Future stages
that use vLLM (video, text, audio) should import from here rather than
duplicating this logic.  See GitHub issue #1720 for the roadmap to wire these
utilities into other modalities.

## Module Contents

### Functions

| Name                                                                                  | Description                                                              |
| ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| [`create_vllm_llm`](#nemo_curator-utils-vllm_utils-create_vllm_llm)                   | Create a `vllm.LLM` instance with automatic port-collision retry.        |
| [`pick_free_port`](#nemo_curator-utils-vllm_utils-pick_free_port)                     | Return a free TCP port on the local machine.                             |
| [`resolve_local_model_path`](#nemo_curator-utils-vllm_utils-resolve_local_model_path) | Resolve an HF model ID to a local snapshot path.                         |

### API

<Anchor id="nemo_curator-utils-vllm_utils-create_vllm_llm">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.vllm_utils.create_vllm_llm(
        model_path: str,
        max_num_seqs: int = 64,
        enforce_eager: bool = False,
        dtype: str = 'bfloat16',
        trust_remote_code: bool = True,
        limit_mm_per_prompt: dict | None = None,
        max_port_retries: int = 3
    ) -> 'vllm.LLM'
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Create a `vllm.LLM` instance with automatic port-collision retry.

  vLLM selects a MASTER\_PORT for the distributed backend at startup.  On a
  busy node the chosen port may already be in use, causing an
  `EADDRINUSE` `RuntimeError`.  This helper picks a fresh free port on
  each attempt so that transient collisions are handled transparently.

  ## Parameters

  - `model_path`: Local path or HuggingFace model ID to load.
  - `max_num_seqs`: Maximum number of sequences vLLM processes concurrently.
  - `enforce_eager`: Disable CUDA graph capture (slower but uses less memory).
  - `dtype`: Model weight dtype passed to vLLM (e.g. `"bfloat16"`).
  - `trust_remote_code`: Whether to trust remote code in the model repository.
  - `limit_mm_per_prompt`: Multimodal token limits per prompt (e.g. `{"image": 1}`).
    Defaults to `{"image": 1}` when `None`.
  - `max_port_retries`: Number of port-pick attempts before re-raising the error.
</Indent>
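
The retry behaviour described for `create_vllm_llm` can be illustrated without vLLM itself. The following is a minimal sketch, not the module's actual implementation: `create_with_port_retry` and `engine_factory` are hypothetical names, and passing the port through the `MASTER_PORT` environment variable and detecting collisions by inspecting the error message are assumptions made for illustration.

```python
import os
import socket
from typing import Callable, TypeVar

T = TypeVar("T")


def _free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))
        return sock.getsockname()[1]


def create_with_port_retry(engine_factory: Callable[[], T], max_port_retries: int = 3) -> T:
    """Call engine_factory, choosing a fresh MASTER_PORT before each attempt.

    Hypothetical helper: only "address already in use" RuntimeErrors are
    retried. Any other error, or a collision on the final attempt, is re-raised.
    """
    for attempt in range(1, max_port_retries + 1):
        # Assumption: the engine reads its port from the MASTER_PORT variable.
        os.environ["MASTER_PORT"] = str(_free_port())
        try:
            return engine_factory()
        except RuntimeError as err:
            message = str(err)
            is_collision = "EADDRINUSE" in message or "address already in use" in message.lower()
            if not is_collision or attempt == max_port_retries:
                raise
    # Reached only when the loop body never ran.
    raise ValueError("max_port_retries must be at least 1")
```

In real use, `engine_factory` would be something like a lambda that constructs the engine. Errors unrelated to port collisions are deliberately not retried, so genuine configuration problems surface immediately.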

<Anchor id="nemo_curator-utils-vllm_utils-pick_free_port">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.vllm_utils.pick_free_port() -> int
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Return a free TCP port on the local machine.
</Indent>
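
A common way to write such a helper is to bind a socket to port 0 and let the operating system choose. The sketch below shows that technique using only the standard library. It is an assumption about the general approach, not a copy of the module's code, and `find_free_port` is a hypothetical name.

```python
import socket


def find_free_port(host: str = "") -> int:
    """Return a TCP port that was free at the moment of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to assign any currently unused port.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an IPv4 socket.
        return sock.getsockname()[1]
```

With this approach the socket is closed before the function returns, so another process can claim the port in the meantime. A caller that needs robustness should therefore be prepared to retry with a new port.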

<Anchor id="nemo_curator-utils-vllm_utils-resolve_local_model_path">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.vllm_utils.resolve_local_model_path(
        model_path: str
    ) -> str
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Resolve an HF model ID to a local snapshot path.

  Uses `local_files_only=True` so that workers on compute nodes never
  attempt to reach the internet.  The model must be pre-downloaded (e.g.
  via `huggingface-cli download`) before submitting the job.

  ## Parameters

  - `model_path`: HuggingFace model ID or an already-local path. If the path is
    already a local directory, it is returned unchanged.
</Indent>
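
The offline resolution described for `resolve_local_model_path` might look roughly like the sketch below. It assumes the lookup goes through `huggingface_hub.snapshot_download`, which accepts `local_files_only=True`. The source does not confirm which call the module uses, and `resolve_model_path_offline` is a hypothetical name.

```python
import os


def resolve_model_path_offline(model_path: str) -> str:
    """Return a local directory for model_path without touching the network."""
    # An existing local directory is returned unchanged.
    if os.path.isdir(model_path):
        return model_path

    # Imported lazily so purely local paths work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # local_files_only=True restricts the lookup to the local cache and raises
    # an error if the model was not downloaded beforehand.
    return snapshot_download(repo_id=model_path, local_files_only=True)
```

Because the cache lookup fails rather than downloading, a missing model shows up as an immediate error on the worker instead of a stalled network request.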