nemo_rl.environments.nemo_gym#

Module Contents#

Classes#

NemoGymConfig

NemoGym

This environment class is not itself used for training. It is an integration wrapper around NeMo-Gym that hooks into the existing NeMo RL resource management via Ray, so NeMo RL remains the single source of truth for resource management.

Functions#

setup_nemo_gym_config

nemo_gym_example_to_nemo_rl_datum_spec

API#

class nemo_rl.environments.nemo_gym.NemoGymConfig#

Bases: typing.TypedDict

model_name: str#

None

base_urls: List[str]#

None

initial_global_config_dict: Dict[str, Any]#

None
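As a minimal sketch of the shape of this config, the snippet below defines a local `TypedDict` that mirrors the three documented fields and fills it in. `NemoGymConfigSketch` and `make_config` are hypothetical names used only for illustration, and the model name, URL, and empty `initial_global_config_dict` are placeholder assumptions, not values taken from NeMo RL or NeMo-Gym.

```python
from typing import Any, Dict, List, TypedDict


class NemoGymConfigSketch(TypedDict):
    """Local stand-in mirroring the three documented fields of NemoGymConfig."""

    model_name: str
    base_urls: List[str]
    initial_global_config_dict: Dict[str, Any]


def make_config(model_name: str, base_urls: List[str]) -> NemoGymConfigSketch:
    """Build a config dict; the empty global config is a placeholder assumption."""
    return NemoGymConfigSketch(
        model_name=model_name,
        base_urls=list(base_urls),  # copy so later caller mutations do not leak in
        initial_global_config_dict={},
    )


# Placeholder values; substitute the real model name and endpoint URLs.
cfg = make_config("my-policy-model", ["http://localhost:8000/v1"])
```

Because a `TypedDict` is a plain `dict` at runtime, a config built this way can be passed wherever a mapping with these keys is expected; a static type checker is what enforces the field types.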

class nemo_rl.environments.nemo_gym.NemoGym(cfg: nemo_rl.environments.nemo_gym.NemoGymConfig)#

Bases: nemo_rl.environments.interfaces.EnvironmentInterface

This environment class is not itself used for training. It is an integration wrapper around NeMo-Gym that hooks into the existing NeMo RL resource management via Ray, so NeMo RL remains the single source of truth for resource management.

Initialization

health_check() → bool#

async run_rollouts(
    nemo_gym_examples: list[dict],
    tokenizer: transformers.PreTrainedTokenizerBase,
    timer_prefix: str,
) → list[dict]#

_postprocess_nemo_gym_to_nemo_rl_result(
    nemo_gym_result: dict,
    tokenizer: transformers.PreTrainedTokenizerBase,
) → dict#

shutdown() → None#

abstractmethod step(message_log_batch, metadata)#

abstractmethod global_post_process_and_metrics(batch)#
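Since `run_rollouts` is declared `async`, a caller has to await it or drive it from an event loop. The sketch below illustrates that calling pattern only. `StubNemoGym` and `run_once` are hypothetical stand-ins that copy the documented method names and parameters; the echo behavior is a placeholder, the tokenizer is typed as `Any` to avoid importing `transformers`, and the way the real class is constructed and invoked (for example through Ray) is not shown.

```python
import asyncio
from typing import Any


class StubNemoGym:
    """Hypothetical stand-in with the same method names as the documented class."""

    def health_check(self) -> bool:
        return True

    async def run_rollouts(
        self,
        nemo_gym_examples: list[dict],
        tokenizer: Any,
        timer_prefix: str,
    ) -> list[dict]:
        # Placeholder behavior: yield to the event loop once, then echo each
        # example back together with its position in the input list.
        await asyncio.sleep(0)
        return [{"idx": i, "example": ex} for i, ex in enumerate(nemo_gym_examples)]

    def shutdown(self) -> None:
        pass


def run_once(env: StubNemoGym, examples: list[dict]) -> list[dict]:
    """Check health, run one batch of rollouts, and always shut down afterwards."""
    if not env.health_check():
        raise RuntimeError("environment is not healthy")
    try:
        # asyncio.run drives the coroutine to completion from synchronous code.
        return asyncio.run(
            env.run_rollouts(examples, tokenizer=None, timer_prefix="demo")
        )
    finally:
        env.shutdown()


results = run_once(StubNemoGym(), [{"prompt": "a"}, {"prompt": "b"}])
```

Code that is already running inside an event loop would use `await env.run_rollouts(...)` instead of `asyncio.run`, which raises an error when called from a running loop.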
nemo_rl.environments.nemo_gym.setup_nemo_gym_config(config, tokenizer) → None#

nemo_rl.environments.nemo_gym.nemo_gym_example_to_nemo_rl_datum_spec(
    nemo_gym_example: dict,
    idx: int,
) → nemo_rl.data.interfaces.DatumSpec#