`nemo_rl.environments.penguin`#

Module Contents#

Classes#

`PenguinConfig`
`Penguin`	Integration wrapper around Penguin that hooks into the existing NeMo RL resource management via Ray, so that NeMo RL remains the single source of truth for resource management. This environment class is not meant to be used for training itself.

Functions#

`setup_penguin_config`
`penguin_example_to_nemo_rl_datum_spec`

API#

class nemo_rl.environments.penguin.PenguinConfig#

Bases: typing.TypedDict

model_name: str#

base_urls: List[str]#

initial_global_config_dict: Dict[str, Any]#
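Since `PenguinConfig` is a `typing.TypedDict`, a config is an ordinary dictionary with the three keys listed above. The following is a minimal sketch only: it declares a local stand-in class that mirrors the documented fields so the snippet runs without NeMo RL installed, and the model name, URL, and empty global config are hypothetical placeholder values, not defaults taken from the library.

```python
from typing import Any, Dict, List, TypedDict


class PenguinConfig(TypedDict):
    """Local stand-in mirroring the three documented fields (assumption:
    in real code you would import this from nemo_rl.environments.penguin)."""

    model_name: str
    base_urls: List[str]
    initial_global_config_dict: Dict[str, Any]


# All values below are hypothetical placeholders for illustration.
cfg: PenguinConfig = {
    "model_name": "example-model",
    "base_urls": ["http://localhost:8000/v1"],
    "initial_global_config_dict": {},
}

# TypedDict does not validate at runtime, so check for missing keys by hand.
missing = set(PenguinConfig.__annotations__) - set(cfg)
if missing:
    raise KeyError(f"PenguinConfig is missing keys: {sorted(missing)}")
```

A static type checker will flag a misspelled or missing key in `cfg`. The explicit check at the end covers the case where the dictionary is assembled at runtime, for example from a parsed config file.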

class nemo_rl.environments.penguin.Penguin(cfg: nemo_rl.environments.penguin.PenguinConfig)#

Bases: nemo_rl.environments.interfaces.EnvironmentInterface

Integration wrapper around Penguin that hooks into the existing NeMo RL resource management via Ray, so that NeMo RL remains the single source of truth for resource management. This environment class is not meant to be used for training itself.

Initialization

health_check() → bool#

async run_rollouts(penguin_examples: list[dict], tokenizer: transformers.PreTrainedTokenizerBase, timer_prefix: str) → list[dict]#

_postprocess_penguin_to_nemo_rl_result(penguin_result: dict, tokenizer: transformers.PreTrainedTokenizerBase) → dict#

shutdown() → None#

abstractmethod step(message_log_batch, metadata)#

abstractmethod global_post_process_and_metrics(batch)#

nemo_rl.environments.penguin.setup_penguin_config(config, tokenizer) → None#

nemo_rl.environments.penguin.penguin_example_to_nemo_rl_datum_spec(penguin_example: dict, idx: int) → nemo_rl.data.interfaces.DatumSpec#
