nemo_rl.weight_sync.ipc_weight_synchronizer#

IPC (ZMQ) weight synchronizer for colocated vLLM generation.

Handles weight transfer between a colocated policy and vLLM generation backend using ZMQ IPC sockets and CUDA IPC handles. This is the primary transport for colocated vLLM deployments.

Lifecycle per sync:

  1. policy.offload_before_refit() – free GPU for weight staging

  2. generation.prepare_for_generation(tags=[“weights”]) – allocate buffers

  3. policy.stream_weights_via_ipc_zmq() – send weights via ZMQ generation.update_weights_via_ipc_zmq() – receive weights

  4. policy.offload_after_refit() – restore optimizer state

  5. generation.prepare_for_generation(tags=[“kv_cache”]) – rebuild KV cache

Module Contents#

Classes#

IPCWeightSynchronizer

Weight synchronizer using ZMQ IPC for colocated vLLM deployments.

API#

class nemo_rl.weight_sync.ipc_weight_synchronizer.IPCWeightSynchronizer(
policy: Any,
generation: Any,
refit_buffer_size_gb: Optional[int] = None,
)#

Bases: nemo_rl.weight_sync.interfaces.WeightSynchronizer

Weight synchronizer using ZMQ IPC for colocated vLLM deployments.

Both the policy and generation workers run on the same GPUs. Weights are transferred via CUDA IPC handles over ZMQ sockets, avoiding any network overhead.

Parameters:
  • policy – Policy object implementing ColocatablePolicyInterface.

  • generation – Generation object implementing GenerationInterface (concretely a VllmGeneration instance).

  • refit_buffer_size_gb – Fixed buffer size in GB for weight staging. If None, buffer size is computed dynamically from free GPU memory.

Initialization

sync_weights(
*,
timer: Optional[nemo_rl.utils.timer.Timer] = None,
kv_scales: Optional[dict[str, float]] = None,
) None#
property is_stale: bool#
mark_stale() None#
init_communicator() None#
shutdown() None#
_compute_buffer_size() int#