`nemo_rl.weight_sync.http_weight_synchronizer`

HTTP weight synchronizer for colocated SGLang generation.

Handles weight transfer between a colocated policy and an SGLang generation backend using HTTP streaming. SGLang exposes an HTTP endpoint for weight updates, so the policy streams weights directly to the SGLang servers.
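To illustrate what "streaming" means here, the sketch below groups named weight tensors into size-bounded batches, so each HTTP request carries only part of the model rather than the whole state dict. It is a minimal sketch under stated assumptions: the `iter_weight_batches` helper, the byte limit, and representing each tensor only by its byte size are hypothetical and are not taken from the real `stream_weights_via_http()`.

```python
from typing import Iterable, Iterator


def iter_weight_batches(
    weights: Iterable[tuple[str, int]], max_batch_bytes: int
) -> Iterator[list[str]]:
    """Group (name, nbytes) pairs into batches of at most max_batch_bytes.

    Hypothetical helper: a single tensor larger than the limit is sent alone.
    """
    batch: list[str] = []
    batch_bytes = 0
    for name, nbytes in weights:
        # Flush the current batch if adding this tensor would exceed the limit.
        if batch and batch_bytes + nbytes > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(name)
        batch_bytes += nbytes
    if batch:
        yield batch


# Each yielded batch would correspond to one HTTP request to an SGLang server.
weights = [("embed", 400), ("layer0.q", 300), ("layer0.k", 300), ("lm_head", 900)]
for batch in iter_weight_batches(weights, max_batch_bytes=700):
    print(batch)
```

Bounding each request keeps the extra staging memory on a shared GPU small, which matters because policy and generation occupy the same devices.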

Lifecycle per sync:

1. policy.offload_before_refit(): free GPU memory for weight staging.
2. generation.prepare_for_generation(tags=["weights"]): allocate weight buffers.
3. generation.invalidate_kv_cache(): clear the stale KV cache.
4. policy.stream_weights_via_http(): push weights over HTTP.
5. policy.offload_after_refit(): restore optimizer state.
6. generation.prepare_for_generation(tags=["kv_cache"]): rebuild the KV cache.
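The six lifecycle steps above can be sketched as a single function that calls them in order. This is not the real `sync_weights` body: the argument lists of these methods are not documented on this page and are omitted, and `_RecordingStub` is a hypothetical test double that only records which methods were called.

```python
from typing import Any, Callable


class _RecordingStub:
    """Hypothetical stand-in that logs every method call made on it, in order."""

    def __init__(self, name: str, log: list[str]) -> None:
        self._name = name
        self._log = log

    def __getattr__(self, method: str) -> Callable[..., None]:
        def _call(*args: Any, **kwargs: Any) -> None:
            tags = kwargs.get("tags")
            suffix = f"({tags})" if tags else "()"
            self._log.append(f"{self._name}.{method}{suffix}")
        return _call


def sync_weights_sketch(policy: Any, generation: Any) -> None:
    """Run the documented per-sync lifecycle in order (sketch only)."""
    policy.offload_before_refit()                          # 1. free GPU memory for staging
    generation.prepare_for_generation(tags=["weights"])   # 2. allocate weight buffers
    generation.invalidate_kv_cache()                       # 3. clear the stale KV cache
    policy.stream_weights_via_http()                       # 4. push weights over HTTP
    policy.offload_after_refit()                           # 5. restore optimizer state
    generation.prepare_for_generation(tags=["kv_cache"])  # 6. rebuild the KV cache


log: list[str] = []
sync_weights_sketch(_RecordingStub("policy", log), _RecordingStub("generation", log))
print("\n".join(log))
```

Recording calls this way makes the ordering constraint testable: weight buffers exist before any weights are streamed, and the KV cache is rebuilt only after the policy has finished its post-refit offload.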

Module Contents

Classes

HTTPWeightSynchronizer

Weight synchronizer using HTTP for colocated SGLang deployments.

API

class nemo_rl.weight_sync.http_weight_synchronizer.HTTPWeightSynchronizer(policy: Any, generation: Any)

Bases: nemo_rl.weight_sync.interfaces.WeightSynchronizer

Weight synchronizer using HTTP for colocated SGLang deployments.

Both the policy and generation workers run on the same GPUs. Weights are streamed to SGLang servers via their HTTP weight-update API.

Parameters:

policy – Policy object implementing ColocatablePolicyInterface.
generation – SGLangGeneration instance exposing get_sglang_url_to_gpu_uuids().
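Because the policy and the SGLang servers share GPUs, the URL-to-GPU-UUID mapping lets a policy worker find the server that lives on its own device. The sketch below assumes get_sglang_url_to_gpu_uuids() returns `dict[str, list[str]]`; that return shape, the `invert_url_map` and `find_server_for_gpu` helpers, and the example URLs and UUIDs are all hypothetical.

```python
from typing import Optional


def invert_url_map(url_to_gpu_uuids: dict[str, list[str]]) -> dict[str, str]:
    """Build a GPU UUID -> server URL lookup, rejecting GPUs claimed by two servers."""
    gpu_to_url: dict[str, str] = {}
    for url, uuids in url_to_gpu_uuids.items():
        for uuid in uuids:
            if uuid in gpu_to_url:
                raise ValueError(
                    f"GPU {uuid} is assigned to both {gpu_to_url[uuid]} and {url}"
                )
            gpu_to_url[uuid] = url
    return gpu_to_url


def find_server_for_gpu(
    url_to_gpu_uuids: dict[str, list[str]], gpu_uuid: str
) -> Optional[str]:
    """Return the URL of the SGLang server colocated with gpu_uuid, or None."""
    return invert_url_map(url_to_gpu_uuids).get(gpu_uuid)


mapping = {
    "http://10.0.0.1:30000": ["GPU-aaaa", "GPU-bbbb"],  # one server spanning two GPUs
    "http://10.0.0.1:30001": ["GPU-cccc"],
}
print(find_server_for_gpu(mapping, "GPU-bbbb"))  # http://10.0.0.1:30000
```

Inverting the mapping once and failing loudly on duplicates catches a misconfigured placement before any weights are sent to the wrong server.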


sync_weights(*, timer: Optional[nemo_rl.utils.timer.Timer] = None, kv_scales: Optional[dict[str, float]] = None) → None

property is_stale: bool

mark_stale() → None

init_communicator() → None

shutdown() → None
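The members above are listed without descriptions. A common pattern, assumed here rather than confirmed by this page, is that mark_stale() is called after each policy update, sync_weights() clears the flag, and callers sync only while is_stale is true. `StaleTrackingSynchronizer` is a hypothetical stand-in with the same member names, not the real class; it also shows that `timer` and `kv_scales` can only be passed by keyword because of the bare `*` in the signature.

```python
from typing import Optional


class StaleTrackingSynchronizer:
    """Hypothetical stand-in mirroring the documented members (not the real class)."""

    def __init__(self) -> None:
        self._stale = True  # assumption: nothing has been pushed to generation yet
        self.sync_count = 0

    @property
    def is_stale(self) -> bool:
        return self._stale

    def mark_stale(self) -> None:
        # Assumed to be called whenever the policy weights change.
        self._stale = True

    def init_communicator(self) -> None:
        # Assumption: an HTTP transport may need no persistent setup.
        pass

    def sync_weights(
        self,
        *,  # timer and kv_scales are keyword-only
        timer: Optional[object] = None,  # real annotation: Optional[nemo_rl.utils.timer.Timer]
        kv_scales: Optional[dict[str, float]] = None,
    ) -> None:
        self.sync_count += 1
        self._stale = False

    def shutdown(self) -> None:
        pass


sync = StaleTrackingSynchronizer()
sync.init_communicator()
for policy_updated in [True, False, True]:
    if policy_updated:
        sync.mark_stale()    # a training step changed the policy weights
    if sync.is_stale:
        sync.sync_weights()  # push weights only when generation is behind
sync.shutdown()
print(sync.sync_count)
```

Gating on the flag skips redundant transfers when the policy has not changed since the last sync.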
