nemo_rl.weight_sync.http_weight_synchronizer

HTTP weight synchronizer for colocated SGLang generation.

Handles weight transfer between a colocated policy and an SGLang generation backend using HTTP streaming. Because SGLang exposes an HTTP endpoint for weight updates, the policy streams weights directly to the SGLang servers.

Lifecycle per sync:

  1. policy.offload_before_refit() – free GPU memory for weight staging

  2. generation.prepare_for_generation(tags=["weights"]) – allocate buffers

  3. generation.invalidate_kv_cache() – clear the stale KV cache

  4. policy.stream_weights_via_http() – push weights via HTTP

  5. policy.offload_after_refit() – restore optimizer state

  6. generation.prepare_for_generation(tags=["kv_cache"]) – rebuild the KV cache
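The ordering of these six steps matters: the KV cache is invalidated before new weights arrive and rebuilt only after the post-refit offload. The sketch below makes that order explicit. `RecordingPolicy`, `RecordingGeneration`, and `refit` are hypothetical stand-ins that only log calls; any arguments the real methods take are omitted, as in the list above.

```python
class RecordingPolicy:
    """Hypothetical stand-in for a colocated policy; it only records calls."""

    def __init__(self, log: list[str]) -> None:
        self.log = log

    def offload_before_refit(self) -> None:
        self.log.append("offload_before_refit")

    def stream_weights_via_http(self) -> None:
        self.log.append("stream_weights_via_http")

    def offload_after_refit(self) -> None:
        self.log.append("offload_after_refit")


class RecordingGeneration:
    """Hypothetical stand-in for an SGLang generation backend."""

    def __init__(self, log: list[str]) -> None:
        self.log = log

    def prepare_for_generation(self, tags: list[str]) -> None:
        self.log.append(f"prepare_for_generation({','.join(tags)})")

    def invalidate_kv_cache(self) -> None:
        self.log.append("invalidate_kv_cache")


def refit(policy, generation) -> None:
    """Run the six lifecycle steps in the documented order."""
    policy.offload_before_refit()                          # 1. free GPU memory
    generation.prepare_for_generation(tags=["weights"])    # 2. allocate buffers
    generation.invalidate_kv_cache()                       # 3. drop stale KV cache
    policy.stream_weights_via_http()                       # 4. push weights via HTTP
    policy.offload_after_refit()                           # 5. restore optimizer state
    generation.prepare_for_generation(tags=["kv_cache"])   # 6. rebuild KV cache


log: list[str] = []
refit(RecordingPolicy(log), RecordingGeneration(log))
print(log)
```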

Module Contents

Classes

HTTPWeightSynchronizer – Weight synchronizer using HTTP for colocated SGLang deployments.

API

class nemo_rl.weight_sync.http_weight_synchronizer.HTTPWeightSynchronizer(policy: Any, generation: Any)

Bases: nemo_rl.weight_sync.interfaces.WeightSynchronizer

Weight synchronizer using HTTP for colocated SGLang deployments.

Both the policy and generation workers run on the same GPUs. Weights are streamed to SGLang servers via their HTTP weight-update API.

Parameters:
  • policy – Policy object implementing ColocatablePolicyInterface.

  • generation – SGLangGeneration instance exposing get_sglang_url_to_gpu_uuids().
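The return type of get_sglang_url_to_gpu_uuids() is not spelled out here. Assuming it returns a dict that maps each SGLang server URL to the list of GPU UUIDs that server occupies, a colocated policy worker could find the server sharing its own GPU through a reverse index. `build_gpu_to_url` and the sample URLs and UUIDs are hypothetical.

```python
def build_gpu_to_url(url_to_gpu_uuids: dict[str, list[str]]) -> dict[str, str]:
    """Invert an (assumed) URL -> GPU-UUID map into GPU-UUID -> URL."""
    gpu_to_url: dict[str, str] = {}
    for url, gpu_uuids in url_to_gpu_uuids.items():
        for uuid in gpu_uuids:
            # This sketch assumes each GPU belongs to exactly one server.
            if uuid in gpu_to_url:
                raise ValueError(
                    f"GPU {uuid} is claimed by {gpu_to_url[uuid]} and {url}"
                )
            gpu_to_url[uuid] = url
    return gpu_to_url


# Hypothetical layout: two servers, each spanning two GPUs.
url_map = {
    "http://10.0.0.1:30000": ["GPU-aaaa", "GPU-bbbb"],
    "http://10.0.0.1:30001": ["GPU-cccc", "GPU-dddd"],
}
gpu_to_url = build_gpu_to_url(url_map)
print(gpu_to_url["GPU-cccc"])  # → http://10.0.0.1:30001
```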


sync_weights(
    *,
    timer: Optional[nemo_rl.utils.timer.Timer] = None,
    kv_scales: Optional[dict[str, float]] = None,
) → None
property is_stale: bool
mark_stale() → None
init_communicator() → None
shutdown() → None
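The API lists is_stale, mark_stale(), and a keyword-only sync_weights(), but does not describe how they interact. The toy class below mirrors those names under an assumption: mark_stale() flags the generation weights as out of date, and a successful sync_weights() clears the flag. `ToySynchronizer` is hypothetical and is not the NeMo RL implementation. The weight push is replaced by a counter, timer is typed as Any to keep the block self-contained, and kv_scales is only stored.

```python
from typing import Any, Optional


class ToySynchronizer:
    """Hypothetical stand-in mirroring HTTPWeightSynchronizer's public surface."""

    def __init__(self, policy: Any, generation: Any) -> None:
        self.policy = policy
        self.generation = generation
        self._stale = True  # assumption: nothing has been synced yet
        self.sync_count = 0
        self.last_kv_scales: Optional[dict[str, float]] = None

    @property
    def is_stale(self) -> bool:
        return self._stale

    def mark_stale(self) -> None:
        # Assumed to be called once the policy weights have changed.
        self._stale = True

    def init_communicator(self) -> None:
        # No-op in this toy; the real method may perform setup.
        pass

    def sync_weights(
        self,
        *,
        timer: Optional[Any] = None,
        kv_scales: Optional[dict[str, float]] = None,
    ) -> None:
        # A real implementation would run the refit lifecycle here.
        self.sync_count += 1
        self.last_kv_scales = kv_scales
        self._stale = False

    def shutdown(self) -> None:
        # No-op in this toy.
        pass


sync = ToySynchronizer(policy=None, generation=None)
print(sync.is_stale)   # → True (never synced)
sync.sync_weights()    # keyword-only arguments may all be omitted
print(sync.is_stale)   # → False
sync.mark_stale()      # e.g. after a training step
sync.sync_weights(kv_scales={"layer0": 1.0})  # hypothetical scale key
print(sync.is_stale)   # → False
```

The bare `*` in the signature makes timer and kv_scales keyword-only. A positional call such as `sync.sync_weights(None)` raises TypeError, so call sites must name each option.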