nemo_curator.core.serve.dynamo.vllm

View as Markdown

Dynamo vLLM worker launch helpers for aggregated and disaggregated serving.

Module Contents

Functions

NameDescription
_launch_disagg_roleLaunch the N workers for a single disagg role (prefill or decode).
_launch_vllm_workerSpawn one python -m dynamo.vllm actor, pinned to bundle node_rank.
_worker_subprocess_env-
_write_actor_overrides_file-
aggregated_model_uses_exact_kv_eventsTrue if this aggregated model should publish ZMQ KV events.
build_worker_kv_events_configJSON blob for --kv-events-config.
dynamo_runtime_envMerge the user’s runtime_env with the Dynamo-vLLM package pin.
ensure_actor_overrides_on_all_nodesWrite the actor-venv --override file at a fixed path on every alive node.
launch_disagg_replicasPlan PGs and launch every worker actor for one disagg model.
launch_replicasPlan PGs and launch every worker actor for one non-disagg model.
merge_model_runtime_envsMerge every model’s runtime_env onto the Dynamo-vLLM pin for the shared frontend actor.
plan_disagg_shapePlan a single-bundle PG spec for one disagg worker.
resolve_disagg_role_configResolve (num_replicas, engine_kwargs) for one disagg role.

Data

DEFAULT_KV_TRANSFER_CONFIG

DYNAMO_VLLM_RUNTIME_ENV

_ACTOR_VENV_OVERRIDES_PATH

_DISAGG_KV_EVENTS_PORT_SEED

_DISAGG_NIXL_PORT_SEED

API

nemo_curator.core.serve.dynamo.vllm._launch_disagg_role(
model_config: nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig,
base_env: dict[str, str],
role: typing.Literal['prefill', 'decode'],
num_workers: int,
engine_kwargs: dict[str, typing.Any],
publishes_kv_events: bool,
namespace: str,
request_plane: str,
event_plane: str,
component: str,
kv_transfer_config: str,
worker_index_start: int,
runtime_dir: str,
actor_name_prefix: str,
topology: list[dict[str, typing.Any]] | None
) -> tuple[list[ray.util.placement_group.PlacementGroup], list[nemo_curator.core.serve.subprocess_mgr.ManagedSubprocess], list[dict[str, typing.Any]], int]

Launch the N workers for a single disagg role (prefill or decode).

nemo_curator.core.serve.dynamo.vllm._launch_vllm_worker(
model_config: nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig,
base_env: dict[str, str],
pg: ray.util.placement_group.PlacementGroup,
spec: nemo_curator.core.serve.placement.ReplicaBundleSpec,
replica_index: int,
node_rank: int,
master_addr: str | None,
namespace: str,
request_plane: str,
event_plane: str,
runtime_dir: str,
actor_name_prefix: str,
router_mode: str | None,
router_kv_events: bool
) -> nemo_curator.core.serve.subprocess_mgr.ManagedSubprocess

Spawn one python -m dynamo.vllm actor, pinned to bundle node_rank.

Rank 0 is the “real” worker (model registration + scheduler + KV events publisher). Rank >0 is --headless — no scheduler, so KV events are always disabled for it even if rank 0 publishes.

nemo_curator.core.serve.dynamo.vllm._worker_subprocess_env(
base_env: dict[str, str],
runtime_dir: str
) -> dict[str, str]
nemo_curator.core.serve.dynamo.vllm._write_actor_overrides_file(
path: str,
body: str
) -> None
nemo_curator.core.serve.dynamo.vllm.aggregated_model_uses_exact_kv_events(
model_config: nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig,
router_mode: str | None,
router_kv_events: bool
) -> bool

True if this aggregated model should publish ZMQ KV events.

nemo_curator.core.serve.dynamo.vllm.build_worker_kv_events_config(
model_config: nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig,
pg: ray.util.placement_group.PlacementGroup,
bundle_index: int,
port_seed: int,
enabled: bool
) -> str

JSON blob for --kv-events-config.

Always passed explicitly. Without this, Dynamo’s args.py auto-creates a KVEventsConfig bound to tcp://*:20080 when prefix_caching is enabled (vLLM >=0.16 default), causing every worker on the same node to fight over the same port.

nemo_curator.core.serve.dynamo.vllm.dynamo_runtime_env(
model_config: nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig
) -> dict[str, typing.Any]

Merge the user’s runtime_env with the Dynamo-vLLM package pin.

nemo_curator.core.serve.dynamo.vllm.ensure_actor_overrides_on_all_nodes(
ignore_head_node: bool = False
) -> None

Write the actor-venv --override file at a fixed path on every alive node.

The file pins ray=={ray.__version__} (read from the driver) so the actor venv keeps the same ray patch as the cluster head — Ray rejects any mismatch.

Must run inside an active Ray context, before any worker spawned with :data:DYNAMO_VLLM_RUNTIME_ENV lands. The runtime_env_agent on each worker reads the file from the node-local filesystem; a single driver-side write doesn’t reach remote nodes.

Re-call after cluster topology changes (autoscale, node restart) — this is one-shot and not auto-triggered.

nemo_curator.core.serve.dynamo.vllm.launch_disagg_replicas(
model_config: nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig,
base_env: dict[str, str],
namespace: str,
request_plane: str,
event_plane: str,
runtime_dir: str,
actor_name_prefix: str,
topology: list[dict[str, typing.Any]] | None = None,
worker_index_offset: int = 0
) -> tuple[list[ray.util.placement_group.PlacementGroup], list[nemo_curator.core.serve.subprocess_mgr.ManagedSubprocess], list[dict[str, typing.Any]]]

Plan PGs and launch every worker actor for one disagg model.

Each role (prefill/decode) becomes its own pool of single-bundle PGs so roles can scale independently. Only the prefill pool publishes KV events (decode reads them via Nixl). KV transfer defaults to NixlConnector with kv_both unless the user overrides via DynamoVLLMModelConfig.kv_transfer_config.

worker_index_offset lets the caller thread a global counter across multiple disagg models so their port seeds don’t overlap — without it, the first worker of every model lands on the same Nixl/KV-events seed and same-node placement risks a bind race.

nemo_curator.core.serve.dynamo.vllm.launch_replicas(
model_config: nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig,
base_env: dict[str, str],
namespace: str,
request_plane: str,
event_plane: str,
runtime_dir: str,
actor_name_prefix: str,
router_mode: str | None,
router_kv_events: bool,
topology: list[dict[str, typing.Any]] | None = None
) -> tuple[list[ray.util.placement_group.PlacementGroup], list[nemo_curator.core.serve.subprocess_mgr.ManagedSubprocess], list[dict[str, typing.Any]]]

Plan PGs and launch every worker actor for one non-disagg model.

Returns (replica_pgs, worker_actors, manifest_entries); callers own the returned handles and are responsible for teardown.

nemo_curator.core.serve.dynamo.vllm.merge_model_runtime_envs(
models: list[nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig]
) -> dict[str, typing.Any]

Merge every model’s runtime_env onto the Dynamo-vLLM pin for the shared frontend actor.

nemo_curator.core.serve.dynamo.vllm.plan_disagg_shape(
tp_size: int,
role: typing.Literal['prefill', 'decode'],
worker_index: int,
model_name: str,
topology: list[dict[str, typing.Any]] | None = None
) -> nemo_curator.core.serve.placement.ReplicaBundleSpec

Plan a single-bundle PG spec for one disagg worker.

Disagg does not support multi-node TP — each role’s TP group must fit on one node. Raise early if plan_replica_bundle_shape hands back a multi-bundle (multi-node) spec.

nemo_curator.core.serve.dynamo.vllm.resolve_disagg_role_config(
model_config: nemo_curator.core.serve.dynamo.config.DynamoVLLMModelConfig,
role: typing.Literal['prefill', 'decode']
) -> tuple[int, dict[str, typing.Any]]

Resolve (num_replicas, engine_kwargs) for one disagg role.

Role-level engine_kwargs merges over the model-wide engine_kwargs so users can override only what they need per role (for example a smaller TP on decode).

nemo_curator.core.serve.dynamo.vllm.DEFAULT_KV_TRANSFER_CONFIG: dict[str, Any] = {'kv_connector': 'NixlConnector', 'kv_role': 'kv_both'}
nemo_curator.core.serve.dynamo.vllm.DYNAMO_VLLM_RUNTIME_ENV: dict[str, Any] = {'uv': {'packages': ['ai-dynamo[vllm]'], 'uv_pip_install_options': ['--override'...
nemo_curator.core.serve.dynamo.vllm._ACTOR_VENV_OVERRIDES_PATH = Path(tempfile.gettempdir()) / 'nemo_curator_dynamo_actor_overrides.txt'
nemo_curator.core.serve.dynamo.vllm._DISAGG_KV_EVENTS_PORT_SEED = 20081
nemo_curator.core.serve.dynamo.vllm._DISAGG_NIXL_PORT_SEED = 20097