nemo_rl.weight_sync.http_weight_synchronizer
HTTP weight synchronizer for colocated SGLang generation.
Handles weight transfer between a colocated policy and an SGLang generation backend using HTTP streaming. SGLang exposes an HTTP endpoint for weight updates, so the policy streams weights directly to the SGLang servers.
Lifecycle per sync:
1. policy.offload_before_refit() – free GPU for weight staging
2. generation.prepare_for_generation(tags=["weights"]) – allocate buffers
3. generation.invalidate_kv_cache() – clear stale KV cache
4. policy.stream_weights_via_http() – push weights via HTTP
5. policy.offload_after_refit() – restore optimizer state
6. generation.prepare_for_generation(tags=["kv_cache"]) – rebuild KV cache
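The ordering of the lifecycle steps can be sketched with stand-in objects. This is a minimal illustration, not the library's implementation: `Recorder` and `sync_once` are hypothetical names introduced here, and the calls take only the arguments shown in the lifecycle listing (the real methods may require more).

```python
class Recorder:
    """Hypothetical stand-in that records every method call made on it."""

    def __init__(self, name, log):
        self._name = name
        self._log = log

    def __getattr__(self, method):
        # Only reached for attributes not set in __init__, i.e. the
        # lifecycle method names; each call is appended to the shared log.
        def call(**kwargs):
            self._log.append((f"{self._name}.{method}", kwargs))
        return call


def sync_once(policy, generation):
    """Run the six lifecycle steps in the documented order."""
    policy.offload_before_refit()                          # free GPU for weight staging
    generation.prepare_for_generation(tags=["weights"])    # allocate buffers
    generation.invalidate_kv_cache()                       # clear stale KV cache
    policy.stream_weights_via_http()                       # push weights via HTTP
    policy.offload_after_refit()                           # restore optimizer state
    generation.prepare_for_generation(tags=["kv_cache"])   # rebuild KV cache


log = []
sync_once(Recorder("policy", log), Recorder("generation", log))
for name, kwargs in log:
    print(name, kwargs)
```

Recording the calls this way makes it easy to check in a unit test that the KV cache is invalidated before new weights arrive and rebuilt only after the policy has finished its post-refit step.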
Module Contents
Classes
HTTPWeightSynchronizer | Weight synchronizer using HTTP for colocated SGLang deployments.
API
- class nemo_rl.weight_sync.http_weight_synchronizer.HTTPWeightSynchronizer(policy: Any, generation: Any)
Bases: nemo_rl.weight_sync.interfaces.WeightSynchronizer
Weight synchronizer using HTTP for colocated SGLang deployments.
Both the policy and generation workers run on the same GPUs. Weights are streamed to SGLang servers via their HTTP weight-update API.
- Parameters:
policy – Policy object implementing ColocatablePolicyInterface.
generation – SGLangGeneration instance exposing get_sglang_url_to_gpu_uuids().
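Because both constructor parameters are typed as `Any`, a caller may want to verify up front that the generation object exposes the expected method. The sketch below uses a structural check for that. `ExposesSGLangUrls`, `FakeGeneration`, and `check_generation` are hypothetical names, and the return type `dict[str, list[str]]` is an assumption inferred from the method name, not something this page documents.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ExposesSGLangUrls(Protocol):
    """Hypothetical protocol: anything with get_sglang_url_to_gpu_uuids()."""

    def get_sglang_url_to_gpu_uuids(self) -> dict[str, list[str]]:
        # Return type is assumed from the method name, not documented.
        ...


class FakeGeneration:
    """Stand-in for an SGLangGeneration instance, for illustration only."""

    def get_sglang_url_to_gpu_uuids(self):
        return {"http://localhost:30000": ["GPU-0000"]}


def check_generation(generation):
    """Raise early if the object lacks the method the synchronizer needs."""
    if not isinstance(generation, ExposesSGLangUrls):
        raise TypeError(
            "generation must expose get_sglang_url_to_gpu_uuids()"
        )
    return generation


generation = check_generation(FakeGeneration())
print(generation.get_sglang_url_to_gpu_uuids())
```

A `runtime_checkable` protocol only tests that the method exists; it does not validate the signature or the returned mapping.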
Initialization
- sync_weights(*, timer: Optional[nemo_rl.utils.timer.Timer] = None, kv_scales: Optional[dict[str, float]] = None)
- property is_stale: bool
- mark_stale() → None
- init_communicator() → None
- shutdown() → None
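This page lists `is_stale`, `mark_stale()`, and `sync_weights()` without describing how they interact. One plausible reading is that a training loop marks the weights stale after each policy update and syncs lazily before the next generation step. The sketch below illustrates that pattern with a stand-in class. `SketchSynchronizer` and `generate_step` are hypothetical, and the assumptions that a synchronizer starts stale and that `sync_weights()` clears the flag are not confirmed by the source.

```python
class SketchSynchronizer:
    """Hypothetical stand-in mirroring the method names listed above."""

    def __init__(self):
        # Assumption: start stale so the first generation step syncs.
        self._stale = True
        self.sync_count = 0

    @property
    def is_stale(self):
        return self._stale

    def mark_stale(self):
        self._stale = True

    def sync_weights(self, *, timer=None, kv_scales=None):
        # A real synchronizer would stream weights here; this one only
        # counts the call and (by assumption) clears the stale flag.
        self.sync_count += 1
        self._stale = False


def generate_step(sync):
    """Sync only when the generation backend holds outdated weights."""
    if sync.is_stale:
        sync.sync_weights()


sync = SketchSynchronizer()
generate_step(sync)   # stale at start, so this syncs
generate_step(sync)   # weights are fresh, so the sync is skipped
sync.mark_stale()     # e.g. after an optimizer step changes the policy
generate_step(sync)   # syncs again
print(sync.sync_count)
```

Under these assumptions, lazy syncing avoids repeating the full offload, stream, and rebuild cycle when several generation steps run between policy updates.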