> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/curator/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/curator/llms-full.txt.

# nemo_curator.utils.vllm_utils

Shared vLLM setup utilities.

These helpers centralise the boilerplate that every vLLM-based inference stage needs: finding a free port, initialising a `vllm.LLM` engine with automatic port-collision retry, and resolving a HuggingFace model ID to a local snapshot path.

They were extracted from the Nemotron-Parse inference stage, which was the first stage in NeMo Curator to be tested at scale (320x H100). Future stages that use vLLM (video, text, audio) should import from here rather than duplicating this logic. See GitHub issue #1720 for the roadmap to wire these utilities into other modalities.

## Module Contents

### Functions

| Name | Description |
| --- | --- |
| [`create_vllm_llm`](#nemo_curator-utils-vllm_utils-create_vllm_llm) | Create a `vllm.LLM` instance with automatic port-collision retry. |
| [`pick_free_port`](#nemo_curator-utils-vllm_utils-pick_free_port) | Return a free TCP port on the local machine. |
| [`resolve_local_model_path`](#nemo_curator-utils-vllm_utils-resolve_local_model_path) | Resolve an HF model ID to a local snapshot path. |

### API

```python
nemo_curator.utils.vllm_utils.create_vllm_llm(
    model_path: str,
    max_num_seqs: int = 64,
    enforce_eager: bool = False,
    dtype: str = 'bfloat16',
    trust_remote_code: bool = True,
    limit_mm_per_prompt: dict | None = None,
    max_port_retries: int = 3
) -> 'vllm.LLM'
```

Create a `vllm.LLM` instance with automatic port-collision retry.

vLLM selects a MASTER\_PORT for the distributed backend at startup.
On a busy node the chosen port may already be in use, causing an `EADDRINUSE` `RuntimeError`. This helper picks a fresh free port on each attempt so that transient collisions are handled transparently.

## Parameters

- model\_path: Local path or HuggingFace model ID to load.
- max\_num\_seqs: Maximum number of sequences vLLM processes concurrently.
- enforce\_eager: Disable CUDA graph capture (slower but uses less memory).
- dtype: Model weight dtype passed to vLLM (e.g. `"bfloat16"`).
- trust\_remote\_code: Whether to trust remote code in the model repository.
- limit\_mm\_per\_prompt: Multimodal token limits per prompt (e.g. `{"image": 1}`). Defaults to `{"image": 1}` when `None`.
- max\_port\_retries: Number of port-pick attempts before re-raising the error.

```python
nemo_curator.utils.vllm_utils.pick_free_port() -> int
```

Return a free TCP port on the local machine.

```python
nemo_curator.utils.vllm_utils.resolve_local_model_path(
    model_path: str
) -> str
```

Resolve an HF model ID to a local snapshot path.

Uses `local_files_only=True` so that workers on compute nodes never attempt to reach the internet. The model must be pre-downloaded (e.g. via `huggingface-cli download`) before submitting the job.

## Parameters

- model\_path: HuggingFace model ID or an already-local path. If the path is already a local directory it is returned unchanged.
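
The port-picking and retry behaviour described for `pick_free_port` and `create_vllm_llm` can be illustrated with a minimal standard-library sketch. This is not the module's actual implementation: `pick_free_port_sketch`, `create_with_port_retry`, and the `factory` callback are hypothetical names introduced here, and detecting a collision by matching `EADDRINUSE` or "address already in use" in the error message is an assumption. In a real stage the factory would presumably make the port visible to vLLM (for example through `MASTER_PORT`) and then construct the engine.

```python
import socket
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def pick_free_port_sketch() -> int:
    """Ask the OS for an unused TCP port by binding to port 0 (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the kernel choose a free port
        return sock.getsockname()[1]


def _looks_like_port_collision(err: RuntimeError) -> bool:
    # Assumption: the collision surfaces as a RuntimeError whose message
    # mentions EADDRINUSE or "address already in use".
    message = str(err).lower()
    return "eaddrinuse" in message or "address already in use" in message


def create_with_port_retry(factory: Callable[[int], T], max_port_retries: int = 3) -> T:
    """Call factory(port) with a fresh port per attempt (hypothetical helper).

    Other errors propagate immediately; the last collision error is
    re-raised once all attempts are used up.
    """
    if max_port_retries < 1:
        raise ValueError("max_port_retries must be at least 1")
    last_error: Optional[RuntimeError] = None
    for _ in range(max_port_retries):
        port = pick_free_port_sketch()
        try:
            return factory(port)
        except RuntimeError as err:
            if not _looks_like_port_collision(err):
                raise
            last_error = err
    assert last_error is not None
    raise last_error
```

The port returned by the first function is only free at the moment it is probed; another process can claim it before the engine binds, which is why the retry loop picks a new port on every attempt rather than reusing the first one.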