nemo_curator.utils.vllm_utils

Shared vLLM setup utilities.

These helpers centralise the boilerplate that every vLLM-based inference stage needs: finding a free port, initialising a vllm.LLM engine with automatic port-collision retry, and resolving a HuggingFace model ID to a local snapshot path.

They were extracted from the Nemotron-Parse inference stage, which was the first stage in NeMo Curator to be tested at scale (320x H100). Future stages that use vLLM (video, text, audio) should import from here rather than duplicating this logic. See GitHub issue #1720 for the roadmap to wire these utilities into other modalities.

Module Contents

Functions

| Name | Description |
| --- | --- |
| create_vllm_llm | Create a vllm.LLM instance with automatic port-collision retry. |
| pick_free_port | Return a free TCP port on the local machine. |
| resolve_local_model_path | Resolve an HF model ID to a local snapshot path. |

API

nemo_curator.utils.vllm_utils.create_vllm_llm(
model_path: str,
max_num_seqs: int = 64,
enforce_eager: bool = False,
dtype: str = 'bfloat16',
trust_remote_code: bool = True,
limit_mm_per_prompt: dict | None = None,
max_port_retries: int = 3
) -> 'vllm.LLM'

Create a vllm.LLM instance with automatic port-collision retry.

vLLM selects a MASTER_PORT for the distributed backend at startup. On a busy node the chosen port may already be in use, causing an EADDRINUSE RuntimeError. This helper picks a fresh free port on each attempt so that transient collisions are handled transparently.

Parameters

model_path: Local path or HuggingFace model ID to load.

max_num_seqs: Maximum number of sequences vLLM processes concurrently.

enforce_eager: Disable CUDA graph capture (slower but uses less memory).

dtype: Model weight dtype passed to vLLM (e.g. "bfloat16").

trust_remote_code: Whether to trust remote code in the model repository.

limit_mm_per_prompt: Multimodal token limits per prompt (e.g. {"image": 1}). Defaults to {"image": 1} when None.

max_port_retries: Number of port-pick attempts before re-raising the error.
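
The retry behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: the names create_with_port_retry, factory, and pick_port are hypothetical, and the sketch assumes that the engine reads MASTER_PORT from the environment and that a collision surfaces as a RuntimeError whose message mentions EADDRINUSE or "address already in use". A stand-in factory replaces the real vllm.LLM constructor so the example runs without a GPU.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def create_with_port_retry(
    factory: Callable[[], T],
    pick_port: Callable[[], int],
    max_port_retries: int = 3,
) -> T:
    """Call factory(), choosing a fresh MASTER_PORT before every attempt."""
    for attempt in range(1, max_port_retries + 1):
        # Assumption: the engine picks up MASTER_PORT from the environment.
        os.environ["MASTER_PORT"] = str(pick_port())
        try:
            return factory()
        except RuntimeError as err:
            message = str(err).lower()
            collision = "eaddrinuse" in message or "address already in use" in message
            # Unrelated errors propagate at once; collisions propagate on the last attempt.
            if not collision or attempt == max_port_retries:
                raise
    raise ValueError("max_port_retries must be at least 1")


# Stand-in for the engine constructor: fails twice with a port collision, then succeeds.
ports = iter([29500, 29501, 29502])
calls = []


def flaky_factory() -> str:
    calls.append(os.environ["MASTER_PORT"])
    if len(calls) < 3:
        raise RuntimeError("EADDRINUSE: address already in use")
    return "engine"


result = create_with_port_retry(flaky_factory, lambda: next(ports))
```

Each attempt sees a different port, so a collision on one attempt does not affect the next. Errors that are not port collisions are re-raised immediately rather than retried.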

nemo_curator.utils.vllm_utils.pick_free_port() -> int

Return a free TCP port on the local machine.
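
A common way to obtain a free port is to bind a socket to port 0 and read back the port the operating system assigned. The sketch below shows that technique using only the standard library; the function name pick_free_port_sketch is hypothetical, and the actual helper may be implemented differently.

```python
import socket


def pick_free_port_sketch() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the kernel to assign any currently free port.
        sock.bind(("", 0))
        # getsockname() returns (host, port) for an AF_INET socket.
        return sock.getsockname()[1]


port = pick_free_port_sketch()
print(port)
```

The port is released as soon as the socket closes, so another process can claim it before the caller binds to it. That window is why a retry loop around engine creation is still useful.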

nemo_curator.utils.vllm_utils.resolve_local_model_path(
model_path: str
) -> str

Resolve an HF model ID to a local snapshot path.

Uses local_files_only=True so that workers on compute nodes never attempt to reach the internet. The model must be pre-downloaded (e.g. via huggingface-cli download) before submitting the job.

Parameters

model_path: HuggingFace model ID or an already-local path. If the path is already a local directory it is returned unchanged.
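
The resolution logic can be sketched roughly as follows. The function name resolve_local_model_path_sketch is hypothetical, and the use of huggingface_hub.snapshot_download is an assumption: the description above only states that local_files_only=True is used, not which function receives it.

```python
import os
import tempfile


def resolve_local_model_path_sketch(model_path: str) -> str:
    """Return a local directory for model_path without touching the network."""
    if os.path.isdir(model_path):
        # Already a local directory: return it unchanged.
        return model_path
    # Imported lazily so the local-directory branch works without huggingface_hub.
    from huggingface_hub import snapshot_download

    # Assumption: snapshot_download is the lookup used. With local_files_only=True
    # it only consults the local cache and raises if the snapshot is missing.
    return snapshot_download(repo_id=model_path, local_files_only=True)


# Demonstrate the local-directory branch with a temporary directory.
with tempfile.TemporaryDirectory() as local_dir:
    resolved = resolve_local_model_path_sketch(local_dir)
    unchanged = resolved == local_dir
```

Passing a model ID that has not been pre-downloaded raises an error instead of starting a download, which matches the offline requirement for compute nodes.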