nemo_rl.weight_sync.ipc_weight_synchronizer#
IPC (ZMQ) weight synchronizer for colocated vLLM generation.
Handles weight transfer between a colocated policy and vLLM generation backend using ZMQ IPC sockets and CUDA IPC handles. This is the primary transport for colocated vLLM deployments.
Lifecycle per sync:
policy.offload_before_refit() – free GPU memory for weight staging
generation.prepare_for_generation(tags=["weights"]) – allocate buffers
policy.stream_weights_via_ipc_zmq() – send weights via ZMQ
generation.update_weights_via_ipc_zmq() – receive weights
policy.offload_after_refit() – restore optimizer state
generation.prepare_for_generation(tags=["kv_cache"]) – rebuild KV cache
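The ordering of the lifecycle steps can be sketched with stand-in objects. This is a minimal illustration, not the library's implementation: `StubPolicy`, `StubGeneration`, and `sync_once` are hypothetical names, and the empty argument lists on the streaming and update calls are assumptions, since this page does not show their signatures. The stubs only record call order and do not model any overlap between the send and receive sides.

```python
class StubPolicy:
    """Hypothetical stand-in for the policy; it only records calls."""

    def __init__(self, log: list[str]) -> None:
        self.log = log

    def offload_before_refit(self) -> None:
        self.log.append("policy.offload_before_refit")

    def stream_weights_via_ipc_zmq(self) -> None:
        self.log.append("policy.stream_weights_via_ipc_zmq")

    def offload_after_refit(self) -> None:
        self.log.append("policy.offload_after_refit")


class StubGeneration:
    """Hypothetical stand-in for the generation backend; it only records calls."""

    def __init__(self, log: list[str]) -> None:
        self.log = log

    def prepare_for_generation(self, tags: list[str]) -> None:
        self.log.append(f"generation.prepare_for_generation:{','.join(tags)}")

    def update_weights_via_ipc_zmq(self) -> None:
        self.log.append("generation.update_weights_via_ipc_zmq")


def sync_once(policy: StubPolicy, generation: StubGeneration) -> None:
    """Run the documented steps in the documented order (sequentially)."""
    policy.offload_before_refit()                          # free GPU memory
    generation.prepare_for_generation(tags=["weights"])    # allocate buffers
    policy.stream_weights_via_ipc_zmq()                    # send side
    generation.update_weights_via_ipc_zmq()                # receive side
    policy.offload_after_refit()                           # restore optimizer state
    generation.prepare_for_generation(tags=["kv_cache"])   # rebuild KV cache


log: list[str] = []
sync_once(StubPolicy(log), StubGeneration(log))
for step in log:
    print(step)
```

The two `prepare_for_generation` calls bracket the transfer: the first runs before any weights move, and the second runs only after the policy has restored its optimizer state.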
Module Contents#
Classes#
IPCWeightSynchronizer | Weight synchronizer using ZMQ IPC for colocated vLLM deployments.
API#
- class nemo_rl.weight_sync.ipc_weight_synchronizer.IPCWeightSynchronizer(policy: Any, generation: Any, refit_buffer_size_gb: Optional[int] = None)
Bases: nemo_rl.weight_sync.interfaces.WeightSynchronizer
Weight synchronizer using ZMQ IPC for colocated vLLM deployments.
Both the policy and generation workers run on the same GPUs. Weights are transferred via CUDA IPC handles over ZMQ sockets, avoiding any network overhead.
- Parameters:
policy – Policy object implementing ColocatablePolicyInterface.
generation – Generation object implementing GenerationInterface (concretely a VllmGeneration instance).
refit_buffer_size_gb – Fixed buffer size in GB for weight staging. If None, buffer size is computed dynamically from free GPU memory.
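The two modes of `refit_buffer_size_gb` can be illustrated with a small helper. This is a sketch under stated assumptions, not the logic of `_compute_buffer_size`: the function name `staging_buffer_bytes`, the `free_fraction` default of 0.3, and the treatment of one GB as 1024**3 bytes are all hypothetical choices made for the example.

```python
from typing import Optional

# Assumption: "GB" is interpreted as 1024**3 bytes for this illustration.
BYTES_PER_GB = 1024**3


def staging_buffer_bytes(
    refit_buffer_size_gb: Optional[int],
    free_gpu_bytes: int,
    free_fraction: float = 0.3,  # hypothetical share of free memory to use
) -> int:
    """Return a staging buffer size in bytes.

    A fixed size wins when given; otherwise the size is derived from the
    amount of free GPU memory passed in by the caller.
    """
    if refit_buffer_size_gb is not None:
        return refit_buffer_size_gb * BYTES_PER_GB
    return int(free_gpu_bytes * free_fraction)


# Fixed mode: the free-memory argument is ignored.
fixed = staging_buffer_bytes(4, free_gpu_bytes=10 * BYTES_PER_GB)

# Dynamic mode: half of a (tiny, made-up) 1000-byte free pool.
dynamic = staging_buffer_bytes(None, free_gpu_bytes=1000, free_fraction=0.5)
```

In a real process the free-memory figure could come from `torch.cuda.mem_get_info()`, which returns a `(free, total)` pair in bytes. The helper takes it as a plain argument so the example runs without a GPU.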
Initialization
- sync_weights(*, timer: Optional[nemo_rl.utils.timer.Timer] = None, kv_scales: Optional[dict[str, float]] = None)
- property is_stale: bool#
- mark_stale() → None#
- init_communicator() → None#
- shutdown() → None#
- _compute_buffer_size() → int#
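This page lists `is_stale`, `mark_stale()`, and `sync_weights()` without describing how they interact. The sketch below shows one common pattern such an interface suggests: a flag set after the policy changes and cleared by a successful sync. The class `StaleFlagSketch`, the helper `ensure_fresh`, and the choice to start in the stale state are assumptions for illustration, not documented behavior of `IPCWeightSynchronizer`.

```python
class StaleFlagSketch:
    """Hypothetical stand-in that mimics only the stale-flag surface."""

    def __init__(self) -> None:
        self._stale = True   # assumption: nothing has been synced yet
        self.sync_count = 0  # bookkeeping for the example only

    @property
    def is_stale(self) -> bool:
        return self._stale

    def mark_stale(self) -> None:
        # Called after the policy's weights change (e.g. a training step).
        self._stale = True

    def sync_weights(self) -> None:
        # A real synchronizer would transfer weights here.
        self.sync_count += 1
        self._stale = False


def ensure_fresh(synchronizer: StaleFlagSketch) -> None:
    """Sync only when the generation side may hold outdated weights."""
    if synchronizer.is_stale:
        synchronizer.sync_weights()


sync = StaleFlagSketch()
ensure_fresh(sync)   # first call syncs
ensure_fresh(sync)   # second call is a no-op
sync.mark_stale()    # pretend a training step just ran
ensure_fresh(sync)   # syncs again
```

Guarding the sync behind the flag avoids repeating a transfer when generation is requested several times between training steps.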