nemo_rl.models.generation.vllm.vllm_backend#
Module Contents#
Classes#
API#
- class nemo_rl.models.generation.vllm.vllm_backend.VllmInternalWorkerExtension#
- init_collective(rank_prefix: int, ip: str, port: int, world_size: int, train_world_size: int)#
Initialize the collective communication group used for weight updates.
- report_device_id() → str#
Retrieve the UUID of the current CUDA device.
- get_zmq_address()#
Get the ZMQ address for the current device.
- maybe_init_zmq()#
Initialize the ZMQ socket if it doesn’t exist.
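Taken together, get_zmq_address and maybe_init_zmq suggest a lazy, per-device socket setup. Below is a minimal sketch of that pattern; the ipc:// naming scheme, the ZmqMixin class, and the injected socket factory are all hypothetical stand-ins (the real extension uses pyzmq and the CUDA device UUID from report_device_id):

```python
class ZmqMixin:
    """Sketch of lazy ZMQ socket setup keyed on a per-device address.

    The address scheme and socket factory are hypothetical; the real
    worker extension binds an actual pyzmq socket.
    """

    def __init__(self, device_uuid, socket_factory):
        self._device_uuid = device_uuid
        self._socket_factory = socket_factory
        self._socket = None  # created lazily on first use

    def get_zmq_address(self):
        # Hypothetical scheme: one IPC endpoint per CUDA device,
        # named by the device UUID so concurrent workers never collide.
        return f"ipc:///tmp/vllm_refit_{self._device_uuid}.sock"

    def maybe_init_zmq(self):
        # Create and bind the socket only if it doesn't exist yet;
        # repeated calls return the same socket.
        if self._socket is None:
            self._socket = self._socket_factory(self.get_zmq_address())
        return self._socket


worker = ZmqMixin("GPU-1234", lambda addr: {"bound_to": addr})
print(worker.get_zmq_address())  # → ipc:///tmp/vllm_refit_GPU-1234.sock
```

Making initialization idempotent lets callers invoke maybe_init_zmq defensively before every transfer without paying for repeated socket creation.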
- prepare_refit_info(state_dict_info: dict[str, Any]) → None#
Prepare state dict metadata for weight refitting and IPC streaming.
- Parameters:
state_dict_info (dict) – A dictionary mapping each tensor name to its metadata, e.g. {tensor_name: (shape, dtype)}
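The sender builds this metadata from its state dict before streaming any weights, so the receiver knows every tensor's shape and dtype up front. A minimal sketch, using SimpleNamespace objects as stand-ins for torch tensors (only .shape and .dtype matter here):

```python
from types import SimpleNamespace

# Stand-ins for torch tensors; a real state dict would hold torch.Tensor
# objects whose .shape and .dtype carry the same information.
state_dict = {
    "embed.weight": SimpleNamespace(shape=(32000, 4096), dtype="torch.bfloat16"),
    "lm_head.weight": SimpleNamespace(shape=(32000, 4096), dtype="torch.bfloat16"),
}

# Metadata sent ahead of the actual weights so the receiving side can
# prepare for IPC streaming before any tensor data arrives.
state_dict_info = {name: (t.shape, t.dtype) for name, t in state_dict.items()}
print(state_dict_info["embed.weight"])  # → ((32000, 4096), 'torch.bfloat16')
```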
- _maybe_process_fp8_kv_cache() → None#
Process weights after loading for FP8 KV cache (static scales).
- static _split_policy_and_draft_weights(weights: list[tuple[str, torch.Tensor]])#
Split trainer-owned draft weights from policy weights.
This path is only used for the Eagle3 online-training flow, where the trainer exports draft parameters under a “draft.” prefix before sending them to vLLM. This implementation is specific to the Eagle model; for MTP, similar logic can be added to this function to split the weights and send them to the drafter. The “draft.” prefix is added in https://github.com/isomap/RL/blob/d3a5e1396d00f82fb888d9ec6800687a23bb4017/nemo_rl/models/policy/workers/megatron_policy_worker.py#L967-L997
- _load_draft_weights(draft_weights: list[tuple[str, torch.Tensor]])#
Load the draft-model weights into the drafter.
- update_weights_via_ipc_zmq() → bool#
Receive and update model weights via ZMQ IPC socket.
- Returns:
True if weights were successfully updated.
- Return type:
bool
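The receive side of such an IPC update is essentially a loop that consumes streamed (name, payload) messages until a sentinel, then reports success. The sketch below uses a FakeSocket stand-in and a pickled-tuple wire format, both hypothetical; the real extension speaks to a pyzmq socket and copies payloads into live model weights:

```python
import pickle

class FakeSocket:
    """Stand-in for a ZMQ socket; yields pre-canned messages on recv()."""

    def __init__(self, messages):
        self._messages = iter(messages)

    def recv(self):
        return next(self._messages)


def update_weights_via_ipc(socket):
    """Receive (name, payload) pairs until a None sentinel arrives.

    Returns True if at least one weight was updated. The wire format
    (pickled tuples, None as end-of-stream marker) is an assumption.
    """
    updated = {}
    while True:
        msg = pickle.loads(socket.recv())
        if msg is None:  # sentinel: sender has finished streaming
            break
        name, payload = msg
        updated[name] = payload  # real code would copy into the model weight
    return len(updated) > 0


msgs = [pickle.dumps(("layer.weight", b"\x00\x01")), pickle.dumps(None)]
print(update_weights_via_ipc(FakeSocket(msgs)))  # → True
```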
- update_weights_from_collective() → bool#
Update the model weights from collective communication.
- cleanup() None#
Shut down and clean up resources.
- start_gpu_profiling() None#
Start GPU profiling.
- stop_gpu_profiling() None#
Stop GPU profiling.