`nemo_rl.algorithms.async_utils.trajectory_collector`#

Module Contents#

Classes#

AsyncTrajectoryCollector

Collects trajectories asynchronously and adds them to the replay buffer.

Data#

TokenizerType

API#

nemo_rl.algorithms.async_utils.trajectory_collector.TokenizerType#: None

class nemo_rl.algorithms.async_utils.trajectory_collector.AsyncTrajectoryCollector( policy_generation: nemo_rl.models.generation.interfaces.GenerationInterface, tokenizer: nemo_rl.algorithms.async_utils.trajectory_collector.TokenizerType, task_to_env: dict[str, nemo_rl.environments.interfaces.EnvironmentInterface], master_config: nemo_rl.algorithms.grpo.MasterConfig, replay_buffer: Any, start_step: int = 0, teacher_worker_groups: Optional[dict[str, Any]] = None, alias_to_group_alias: Optional[dict[str, str]] = None, on_policy_distillation_cfg: Optional[dict[str, Any]] = None, )#

Collects trajectories asynchronously and adds them to the replay buffer.

Initialization

_calculate_target_weights(generation_weight_version: int) → list[int]#

Calculate target weight versions for given generation weight version.

The returned list enumerates the weight versions a generation server can target. The collector loops over these versions to find a training step that still needs trajectories. If all target versions are exhausted, the generation server remains idle until the next weight update.

Example:

generation_weight_version = 10
max_trajectory_age_steps = 4

Returns:

[11, 12, 13, 14]  # this generation server can create trajectories for training steps 11, 12, 13, and 14
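The example above can be reproduced with a minimal sketch. This is an illustration inferred only from the documented example, not the library's implementation: the standalone function name is hypothetical, and `max_trajectory_age_steps` is passed explicitly here, whereas the real method takes only `generation_weight_version` and presumably reads the age limit from configuration.

```python
def calculate_target_weights(
    generation_weight_version: int,
    max_trajectory_age_steps: int,
) -> list[int]:
    """Hypothetical stand-in for _calculate_target_weights.

    Assumption: targets start at the step after the generation weight
    version and span max_trajectory_age_steps consecutive steps, which
    matches the documented example (10, 4) -> [11, 12, 13, 14].
    """
    first_target = generation_weight_version + 1
    return list(range(first_target, first_target + max_trajectory_age_steps))


targets = calculate_target_weights(
    generation_weight_version=10,
    max_trajectory_age_steps=4,
)
print(targets)
```

Under this assumption, a larger `max_trajectory_age_steps` lets one generation server serve more future training steps before it has to idle and wait for a weight update.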

_get_next_target_for_generation( generation_weight_version: int, ) → Optional[int]#: Get the next target weight that needs generation (if any).

set_weight_version(version: int) → None#

_should_pause_for_generation_limits() → bool#: Check if collection should be paused due to generation limits.

start_collection( dataloader: torchdata.stateful_dataloader.StatefulDataLoader, ) → None#: Start collecting trajectories from dataloader.

is_data_exhausted() → bool#: Check if collection stopped because the dataloader ran out of data.

get_status() → dict#: Return a snapshot of the collector’s internal state for driver-side diagnostics.

_collection_loop()#: Run the collection loop in a background thread.

_process_batch( batch: nemo_rl.distributed.batched_data_dict.BatchedDataDict[nemo_rl.data.interfaces.DatumSpec], ) → None#: Process a single batch and generate for one target weight.

get_weight_version() → int#

pause() → None#: Pause trajectory collection.

resume() → None#: Resume trajectory collection.

prepare_for_refit() → None#

Pause new generation starts and optionally wait for pending generations.

For backends with an async engine, in-flight weight updates allow ongoing generations to continue with their current KV caches while weights are updated. This significantly improves async performance.

For non-async engines, waits for all pending generations to complete before refit.
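The two refit modes described above can be sketched with standard-library threading primitives. Everything here is an assumption for illustration: the `GenerationGate` class, the `in_flight_weight_updates` flag, and `start_generation` are hypothetical names, and the real collector may coordinate its threads differently.

```python
import threading
import time


class GenerationGate:
    """Hypothetical sketch of the pause/wait behaviour around a refit."""

    def __init__(self, in_flight_weight_updates: bool) -> None:
        # Assumed flag: True for async engines that can update weights
        # while generations are still running.
        self.in_flight_weight_updates = in_flight_weight_updates
        self._can_start = threading.Event()
        self._can_start.set()
        self._pending: list[threading.Thread] = []
        self._lock = threading.Lock()

    def start_generation(self, fn) -> threading.Thread:
        # Block new generation starts while a refit is being prepared.
        self._can_start.wait()
        thread = threading.Thread(target=fn)
        with self._lock:
            # Start before tracking so a tracked thread is always joinable.
            thread.start()
            self._pending.append(thread)
        return thread

    def prepare_for_refit(self) -> None:
        # Always stop new starts; only drain pending work when the engine
        # cannot update weights in flight.
        self._can_start.clear()
        if not self.in_flight_weight_updates:
            self.wait_for_pending_generations()

    def wait_for_pending_generations(self) -> None:
        with self._lock:
            pending, self._pending = self._pending, []
        for thread in pending:
            thread.join()

    def resume_after_refit(self) -> None:
        self._can_start.set()


finished: list[int] = []
gate = GenerationGate(in_flight_weight_updates=False)
gate.start_generation(lambda: (time.sleep(0.05), finished.append(1)))
gate.prepare_for_refit()  # non-async path: returns only after the worker is done
gate.resume_after_refit()
```

With `in_flight_weight_updates=True`, `prepare_for_refit` in this sketch returns immediately after blocking new starts, which mirrors the documented performance benefit of not draining in-flight generations.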

resume_after_refit() → None#: Resume new generation starts after refit is complete.

wait_for_pending_generations() → None#: Wait for all in-flight generation threads to complete.

get_dataloader_state() → dict#: Get the current dataloader state for checkpointing.

get_efficiency_metrics() → dict[str, float]#

Return accumulated efficiency metrics (sum of durations per category).

Called by the driver process each step to merge collector-side metrics.

_cleanup_finished_threads() → None#

_maybe_release_target(target_weight_version: int) → None#

Release a target’s reservation once all its workers have completed.

A worker counts as “completed” whether or not it managed to buffer a trajectory. The reservation is released exactly when the number of completed workers reaches the number spawned for the target and no more workers are being spawned for the same target. Safe to call repeatedly: the reservation is discarded at most once and the per-target counters are dropped on release (so a later re-reservation starts from a clean slate and the dicts don’t grow unbounded).
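The release rule above can be illustrated with a small bookkeeping class. This is a sketch under stated assumptions rather than the collector's actual data structures: `TargetReservations` and all of its attribute and method names are hypothetical.

```python
import threading


class TargetReservations:
    """Hypothetical per-target counters mirroring the documented release rule."""

    def __init__(self) -> None:
        # Re-entrant lock, because the helper methods call maybe_release
        # while already holding it.
        self._lock = threading.RLock()
        self.reserved: set[int] = set()
        self.spawning: set[int] = set()    # targets still spawning workers
        self.spawned: dict[int, int] = {}  # workers spawned per target
        self.completed: dict[int, int] = {}  # workers completed per target

    def reserve(self, target: int) -> None:
        with self._lock:
            self.reserved.add(target)
            self.spawning.add(target)
            self.spawned[target] = 0
            self.completed[target] = 0

    def worker_spawned(self, target: int) -> None:
        with self._lock:
            self.spawned[target] = self.spawned.get(target, 0) + 1

    def spawning_done(self, target: int) -> bool:
        with self._lock:
            self.spawning.discard(target)
            return self.maybe_release(target)

    def worker_completed(self, target: int) -> bool:
        # Counts as completed whether or not a trajectory was buffered.
        with self._lock:
            self.completed[target] = self.completed.get(target, 0) + 1
            return self.maybe_release(target)

    def maybe_release(self, target: int) -> bool:
        """Return True only on the single call that releases the reservation."""
        with self._lock:
            if target not in self.reserved or target in self.spawning:
                return False
            if self.completed.get(target, 0) < self.spawned.get(target, 0):
                return False
            self.reserved.discard(target)
            # Drop the counters so a re-reservation starts clean and the
            # dicts do not grow without bound.
            self.spawned.pop(target, None)
            self.completed.pop(target, None)
            return True


reservations = TargetReservations()
reservations.reserve(11)
reservations.worker_spawned(11)
reservations.worker_spawned(11)
early = reservations.worker_completed(11)   # still spawning: no release
reservations.spawning_done(11)              # one worker outstanding: no release
released = reservations.worker_completed(11)
repeat = reservations.maybe_release(11)     # safe to call again: no-op
```

In this sketch, the second `worker_completed` call is the only one that releases the reservation, and the repeated `maybe_release` call does nothing.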

_compute_teacher_logprobs( input_ids: torch.Tensor, agent_refs: list[dict[str, Any]], input_lengths: Optional[torch.Tensor] = None, ) → tuple[torch.Tensor, float]#

Compute teacher logprobs for non-colocated teachers.

Groups samples by teacher, fans out in parallel, stitches results.

Parameters:

input_ids – [B, S] tokenized input tensor
agent_refs – list of B agent reference dicts
input_lengths – [B] per-sample lengths (required for sequence packing)

Returns:

([B, S] teacher logprobs tensor, total_time_seconds)
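The group, fan-out, and stitch pattern described above can be sketched as follows. To stay dependency-free, this illustration uses nested lists in place of `torch.Tensor` and plain callables in place of teacher worker groups. The function name and the idea that each sample maps to a teacher name are assumptions; the structure of the real `agent_refs` dicts is not documented here.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Rows = list[list[float]]


def compute_by_teacher(
    rows: Rows,
    teacher_names: list[str],
    teachers: dict[str, Callable[[Rows], Rows]],
) -> tuple[Rows, float]:
    """Hypothetical sketch: route each row to its teacher, then restore order."""
    start = time.perf_counter()

    # Group: collect the original row indices handled by each teacher.
    groups: dict[str, list[int]] = defaultdict(list)
    for index, name in enumerate(teacher_names):
        groups[name].append(index)

    def run(name: str) -> tuple[list[int], Rows]:
        indices = groups[name]
        return indices, teachers[name]([rows[i] for i in indices])

    # Fan out: one task per teacher, executed in parallel threads.
    stitched: Rows = [[] for _ in rows]
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        for indices, results in pool.map(run, list(groups)):
            # Stitch: write each result back at its original batch position.
            for index, result in zip(indices, results):
                stitched[index] = result

    return stitched, time.perf_counter() - start


# Toy "teachers" that transform each row; real teachers would return logprobs.
toy_teachers = {
    "a": lambda batch: [[x * 2 for x in row] for row in batch],
    "b": lambda batch: [[-x for x in row] for row in batch],
}
output, elapsed = compute_by_teacher(
    rows=[[1, 2], [3, 4], [5, 6]],
    teacher_names=["a", "b", "a"],
    teachers=toy_teachers,
)
```

Tracking the original indices per group is what lets the results return in batch order even though each teacher sees only its own subset.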

_run_prompt_group_worker( repeated_batch: nemo_rl.distributed.batched_data_dict.BatchedDataDict[nemo_rl.data.interfaces.DatumSpec], generation_weight_version: int, target_weight_version: int, prompt_idx: int, ) → None#
