nemo_rl.models.generation.vllm.utils
Module Contents
Functions
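| format_prompt_for_vllm_generation | Format a list of prompts for vllm generation (which requires a specific format for its own generate method). |
| aggregate_spec_decode_counters | Aggregate speculative decoding counters from multiple workers. |
| compute_spec_decode_metrics | Compute delta and derived metrics for speculative decoding. |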
API
- nemo_rl.models.generation.vllm.utils.format_prompt_for_vllm_generation(
- data: nemo_rl.distributed.batched_data_dict.BatchedDataDict[nemo_rl.models.generation.interfaces.GenerationDatumSpec],
- sample_idx: Optional[int] = None,
Format a list of prompts for vllm generation (which requires a specific format for its own generate method).
See https://docs.vllm.ai/en/v0.9.1/features/multimodal_inputs.html for the prompt format for multimodal inputs.
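To make the "specific format" concrete, here is a minimal sketch of the text-only case, not the NeMo RL implementation. It assumes the batch carries right-padded token IDs plus per-sample lengths, and that each prompt is passed to vLLM as a dict with a `prompt_token_ids` key (one of the prompt forms vLLM accepts). The helper name `format_prompts_sketch`, its plain-list arguments, and the behavior of returning a single dict when `sample_idx` is given are all assumptions for illustration. Multimodal inputs follow the format described at the URL above and are not covered here.

```python
from typing import Any, Optional


def format_prompts_sketch(
    input_ids: list[list[int]],
    input_lengths: list[int],
    sample_idx: Optional[int] = None,
) -> Any:
    """Hypothetical stand-in for illustration only."""

    def one(i: int) -> dict[str, Any]:
        # Assumption: sequences are right-padded, so slicing to the true
        # length removes padding and leaves only real prompt tokens.
        return {"prompt_token_ids": input_ids[i][: input_lengths[i]]}

    if sample_idx is not None:
        # Assumption: a single index yields a single prompt dict.
        return one(sample_idx)
    return [one(i) for i in range(len(input_ids))]


batch_ids = [[11, 12, 13, 0, 0], [21, 22, 0, 0, 0]]
batch_lengths = [3, 2]
prompts = format_prompts_sketch(batch_ids, batch_lengths)
print(prompts)
```

Stripping padding before handing prompts to the engine matters because the engine would otherwise treat pad tokens as part of the prompt.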
- nemo_rl.models.generation.vllm.utils.aggregate_spec_decode_counters(
- worker_metrics: list[dict[str, float | list[float]]],
Aggregate speculative decoding counters from multiple workers.
Combines spec decode metrics collected from data-parallel (DP) leader workers into a single aggregated counter dictionary.
- Parameters:
worker_metrics – List of metric dictionaries from each worker. Each dict maps metric names to float values or lists of floats (for per-position metrics).
- Returns:
Dictionary mapping metric names to their aggregated float values. Per-position metrics use (name, position) tuples as keys.
Example:
    >>> metrics_from_workers = policy_generation.get_metrics()
    >>> counters = aggregate_spec_decode_counters(metrics_from_workers)
    >>> print(counters.get("vllm:spec_decode_num_drafts", 0))
    1234.0
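The aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes "aggregated" means summed across workers, and that a list-valued metric is expanded into `(name, position)` keys by list index. The function name `aggregate_counters_sketch` and the per-position metric name in the sample data are hypothetical choices for the example.

```python
from collections import defaultdict
from typing import Union

CounterKey = Union[str, tuple[str, int]]


def aggregate_counters_sketch(
    worker_metrics: list[dict[str, Union[float, list[float]]]],
) -> dict[CounterKey, float]:
    """Hypothetical stand-in: sum counters across workers."""
    totals: dict[CounterKey, float] = defaultdict(float)
    for metrics in worker_metrics:
        for name, value in metrics.items():
            if isinstance(value, list):
                # Per-position metric: one (name, position) key per index.
                for position, item in enumerate(value):
                    totals[(name, position)] += float(item)
            else:
                totals[name] += float(value)
    return dict(totals)


PER_POS = "vllm:spec_decode_num_accepted_tokens_per_pos"  # assumed name
workers = [
    {"vllm:spec_decode_num_drafts": 10.0, PER_POS: [8.0, 5.0]},
    {"vllm:spec_decode_num_drafts": 6.0, PER_POS: [4.0, 1.0]},
]
counters = aggregate_counters_sketch(workers)
print(counters["vllm:spec_decode_num_drafts"])  # 16.0
print(counters[(PER_POS, 0)])  # 12.0
```

Tuple keys keep per-position values in the same flat dictionary as scalar counters, which matches the key type `str | tuple[str, int]` used by the counter snapshots in this module.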
- nemo_rl.models.generation.vllm.utils.compute_spec_decode_metrics(
- start_counters: dict[str | tuple[str, int], float],
- end_counters: dict[str | tuple[str, int], float],
Compute delta and derived metrics for speculative decoding.
Calculates the difference between two counter snapshots and derives acceptance rate and acceptance length metrics for logging.
- Parameters:
start_counters – Counter snapshot taken before generation.
end_counters – Counter snapshot taken after generation.
- Returns:
Dictionary of metrics suitable for logging to wandb/tensorboard. Keys are prefixed with "vllm/" for namespace consistency. Includes:
- vllm/spec_num_drafts: Total number of draft batches
- vllm/spec_num_draft_tokens: Total draft tokens generated
- vllm/spec_num_accepted_tokens: Total tokens accepted
- vllm/spec_acceptance_length: Average accepted tokens per draft + 1
- vllm/spec_acceptance_rate: Ratio of accepted to draft tokens
- vllm/{metric}-{position}: Per-position acceptance counts
- vllm/spec_acceptance_rate-pos-{position}: Per-position acceptance rates
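The derived metrics listed above can be worked through with a small sketch. It is an illustration only and makes several assumptions: the raw counter names other than `vllm:spec_decode_num_drafts` are guesses, the per-position rate is taken as accepted count divided by number of drafts, and the `{metric}` part of a per-position key is the raw name with its `vllm:` prefix dropped. The function name `spec_decode_metrics_sketch` is hypothetical.

```python
from typing import Union

CounterKey = Union[str, tuple[str, int]]

DRAFTS = "vllm:spec_decode_num_drafts"
DRAFT_TOKENS = "vllm:spec_decode_num_draft_tokens"  # assumed name
ACCEPTED = "vllm:spec_decode_num_accepted_tokens"  # assumed name
PER_POS = "vllm:spec_decode_num_accepted_tokens_per_pos"  # assumed name


def spec_decode_metrics_sketch(
    start: dict[CounterKey, float], end: dict[CounterKey, float]
) -> dict[str, float]:
    """Hypothetical stand-in: delta between snapshots plus derived ratios."""
    delta = {key: value - start.get(key, 0.0) for key, value in end.items()}
    drafts = delta.get(DRAFTS, 0.0)
    draft_tokens = delta.get(DRAFT_TOKENS, 0.0)
    accepted = delta.get(ACCEPTED, 0.0)

    metrics = {
        "vllm/spec_num_drafts": drafts,
        "vllm/spec_num_draft_tokens": draft_tokens,
        "vllm/spec_num_accepted_tokens": accepted,
    }
    if drafts > 0:
        # Average accepted tokens per draft, plus 1.
        metrics["vllm/spec_acceptance_length"] = 1.0 + accepted / drafts
    if draft_tokens > 0:
        metrics["vllm/spec_acceptance_rate"] = accepted / draft_tokens

    for key, value in delta.items():
        if isinstance(key, tuple):
            name, position = key
            short = name.split(":", 1)[-1]  # assumed key naming
            metrics[f"vllm/{short}-{position}"] = value
            if drafts > 0:
                # Assumption: per-position rate = accepted at position / drafts.
                metrics[f"vllm/spec_acceptance_rate-pos-{position}"] = value / drafts
    return metrics


start = {DRAFTS: 100.0, DRAFT_TOKENS: 300.0, ACCEPTED: 150.0, (PER_POS, 0): 80.0}
end = {DRAFTS: 140.0, DRAFT_TOKENS: 420.0, ACCEPTED: 210.0, (PER_POS, 0): 110.0}
result = spec_decode_metrics_sketch(start, end)
print(result["vllm/spec_acceptance_length"])  # 2.5
print(result["vllm/spec_acceptance_rate"])  # 0.5
```

With these numbers the window contains 40 drafts, 120 draft tokens, and 60 accepted tokens, so the acceptance length is 1 + 60/40 = 2.5 and the acceptance rate is 60/120 = 0.5. Working from deltas rather than raw counters isolates one generation step, since the underlying counters are cumulative.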