nemo_curator.core.serve.placement
Ray placement-group construction and bundle operations.
Covers two concerns:
- Planning — turning a TP size + cluster topology into a
  ReplicaBundleSpec (single-node STRICT_PACK or multi-node
  STRICT_SPREAD with an equal per-node split).
- Construction + bundle-scoped operations — build_pg / build_replica_pg
  create detached, named PGs and wait until ready;
  get_bundle_node_ip / get_free_port_in_bundle discover where a bundle
  actually landed; remove_named_pgs_with_prefix reaps orphans left by a
  prior driver session.
Subprocess lifecycle (actors, graceful stop, CUDA/env propagation)
lives in subprocess_mgr. Backend-specific PGs (e.g. the Dynamo
etcd+NATS+frontend bundle) live in the backend’s own subpackage.
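For orientation, a minimal sketch of how these pieces compose. Except for plan_replica_bundles, which is a stand-in name for the planning helper (its real name is not shown on this page), the names come from this module, but every signature below is an illustrative assumption:

```python
# Signatures and argument names are assumptions for illustration;
# plan_replica_bundles is a stand-in name for the planning helper.
from nemo_curator.core.serve.placement import (
    build_replica_pg,
    get_bundle_node_ip,
    get_free_port_in_bundle,
    remove_named_pgs_with_prefix,
)

spec = plan_replica_bundles(tp_size=8)                      # planning (stand-in name)
pg = build_replica_pg(name="curator-replica-0", spec=spec)  # detached, named PG
master_addr = get_bundle_node_ip(pg, bundle_index=0)        # node hosting rank 0
master_port = get_free_port_in_bundle(pg, bundle_index=0)   # free port on that node

# After a driver restart, reap PGs a previous session left behind.
remove_named_pgs_with_prefix("curator-replica-")
```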
Module Contents
Classes
Functions
API
Bundle shape + strategy for a single model replica.
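Concretely, the spec can be pictured as a small dataclass; the field names here are assumptions, only the class name and the two strategies come from this page:

```python
from dataclasses import dataclass

@dataclass
class ReplicaBundleSpec:
    # Field names are illustrative assumptions.
    bundles: list[dict[str, float]]  # e.g. [{"GPU": 4}] or [{"GPU": 2}, {"GPU": 2}]
    strategy: str                    # "STRICT_PACK" or "STRICT_SPREAD"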
Return per-node GPU topology: [{"node_id", "num_gpus", "is_head"}, ...].
Uses total node resources, not current availability — topology shape is a static property of the cluster. Ray’s PG scheduler handles dynamic capacity.
Parameters:
- Ray node ID to tag as head in the output (for
  CURATOR_IGNORE_RAY_HEAD_NODE filtering). Defaults to the node bearing
  the node:__internal_head__ resource marker; falls back to the
  driver’s own node id if no marker is found (matches the behaviour
  used by backends/utils.py).
- Pre-fetched ray.nodes() result, to avoid a redundant call.
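A sketch of how such a topology can be derived from ray.nodes(), honouring the note above about using total rather than available resources; the function name is illustrative:

```python
import ray

def node_gpu_topology(nodes=None):
    # Illustrative name. Reads each node's *total* resources so the
    # result is a static property of the cluster, independent of load.
    nodes = ray.nodes() if nodes is None else nodes
    topology = []
    for node in nodes:
        if not node["Alive"]:
            continue
        resources = node["Resources"]
        topology.append({
            "node_id": node["NodeID"],
            "num_gpus": int(resources.get("GPU", 0)),
            # Ray marks the head node with this internal resource.
            "is_head": "node:__internal_head__" in resources,
        })
    return topology
```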
Schedule remote_fn into pg’s bundle bundle_index and return the result.
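Pinning work to one bundle goes through Ray’s PlacementGroupSchedulingStrategy; a minimal sketch with an illustrative probe task, assuming pg is an existing placement group:

```python
import ray
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

@ray.remote(num_cpus=0)
def probe():
    # Runs on whichever node hosts the chosen bundle.
    return ray.get_runtime_context().get_node_id()

# Given an existing placement group `pg`, run `probe` in bundle 0.
result = ray.get(
    probe.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(
            placement_group=pg,
            placement_group_bundle_index=0,
        )
    ).remote()
)
```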
Create a detached, named PG and wait until ready; clean up on failure.
Create a detached, named PG for one replica and wait until ready.
PG is created with lifetime="detached" so it survives driver
disconnects between server.start(), pipeline.run(), and
server.stop(). The caller-supplied name is used for orphan
cleanup via remove_named_pgs_with_prefix.
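A sketch of this construction path under the stated constraints (detached lifetime, caller-supplied name, cleanup on failure); the function name and timeout value are assumptions:

```python
import ray
from ray.util import placement_group, remove_placement_group

def build_pg_sketch(name, bundles, strategy, timeout_s=180.0):
    # Detached so the PG outlives driver disconnects between
    # server.start(), pipeline.run(), and server.stop().
    pg = placement_group(bundles, strategy=strategy, name=name,
                         lifetime="detached")
    try:
        # Block until every bundle is placed, or fail fast.
        ray.get(pg.ready(), timeout=timeout_s)
    except Exception:
        # Don't leak a half-created detached PG: it would hold
        # reservations with no owner left to clean it up.
        remove_placement_group(pg)
        raise
    return pg
```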
Return the routable IP of the node hosting pg’s bundle bundle_index.
Used to resolve the master address for multi-node TP after pg.ready():
the rank-0 actor is scheduled into this same bundle, so its peers can
connect to this IP.
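One way to implement this lookup is to schedule a tiny task into the bundle and ask it for its own node IP; a sketch with illustrative names:

```python
import ray
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

@ray.remote(num_cpus=0)
def _node_ip():
    return ray.util.get_node_ip_address()

def bundle_node_ip(pg, bundle_index):
    # The task runs on the node that actually hosts the bundle, so the
    # returned IP is routable for peers connecting to rank 0.
    return ray.get(
        _node_ip.options(
            scheduling_strategy=PlacementGroupSchedulingStrategy(
                placement_group=pg,
                placement_group_bundle_index=bundle_index,
            )
        ).remote()
    )
```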
Find a free port on the node hosting pg’s bundle bundle_index.
The remote task is scheduled into the target bundle via
PlacementGroupSchedulingStrategy, so port availability is checked on
the same node where the consuming actor will bind.
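The same pattern works for ports: bind port 0 inside the target bundle so the OS returns a port that is currently free on that node (the usual check-then-bind race still applies). A sketch, with illustrative names:

```python
import socket

import ray
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

@ray.remote(num_cpus=0)
def _free_port():
    # Binding to port 0 makes the OS pick an unused port on this node.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

def free_port_in_bundle(pg, bundle_index):
    return ray.get(
        _free_port.options(
            scheduling_strategy=PlacementGroupSchedulingStrategy(
                placement_group=pg,
                placement_group_bundle_index=bundle_index,
            )
        ).remote()
    )
```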
Pick the bundle shape for one replica given the current cluster topology.
- Single-node: if any node has >= tp_size GPUs, return one bundle of
  size tp_size with STRICT_PACK.
- Multi-node: find the smallest nnodes such that tp_size % nnodes == 0
  and at least nnodes nodes have >= tp_size / nnodes GPUs each. Return
  nnodes equal bundles with STRICT_SPREAD. vLLM requires an even
  per-node split (a 1+3 split for TP=4 fails with a CUDA device ordinal
  error), so asymmetric splits are never considered.
When CURATOR_IGNORE_RAY_HEAD_NODE is set, the head node is filtered
out of topology and every bundle gets
[{"ray.io/node-type": "worker"}] as a label selector.
Remove all placement groups in the current namespace whose name starts with prefix.
Requires a live Ray connection on the current driver. Intended for orphan
cleanup after a driver restart: since PGs are namespace-scoped and named,
a reconnecting driver (with matching namespace=) can find and reap
leftover state from a prior session. Removing a PG forcibly kills all
actors scheduled into it, releasing the reserved resources.
Returns the number of PGs removed.
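A sketch of the reaping loop over Ray’s placement-group table; ray.util.get_placement_group resolves a name within the current namespace, and the helper name here is illustrative:

```python
from ray.util import (
    get_placement_group,
    placement_group_table,
    remove_placement_group,
)

def reap_pgs_with_prefix(prefix: str) -> int:
    # Illustrative name for the helper described above.
    removed = 0
    for info in placement_group_table().values():
        name = info.get("name") or ""
        if info.get("state") == "REMOVED" or not name.startswith(prefix):
            continue
        try:
            # Name lookup is scoped to the current namespace.
            pg = get_placement_group(name)
        except ValueError:
            continue  # named PG belongs to a different namespace
        # Forcibly kills actors scheduled into the PG and frees bundles.
        remove_placement_group(pg)
        removed += 1
    return removed
```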