# nemo_rl.environments.rewards

## Module Contents

### Functions
| Function | Description |
|---|---|
| `math_expression_reward` | Reward the agent when the answer within the `<{tag}>` tags is the same expression as the ground truth. |
| `format_reward` | Reward the agent when the response follows the format `<{think_tag}>…</{think_tag}><{answer_tag}>…</{answer_tag}>`. |
| `exact_answer_alphanumeric_reward` | Reward the agent when the answer within the `<{answer_tag}>` tags is the same as the ground truth (case-insensitive). |
| `bbox_giou_reward` | Given `[x1, y1, x2, y2]` normalized bounding box coordinates within the `<{answer_tag}>` tags, compute the GIoU between the ground truth and the response. |
| `combine_reward_functions` | Return a callable that takes `(ground_truth, response)` and applies multiple reward functions in sequence. |
## Data

## API
- nemo_rl.environments.rewards.math_verify_func

  `math_metric(...)`

- nemo_rl.environments.rewards.boxed

  None
- nemo_rl.environments.rewards.math_expression_reward(ground_truth: str, response: str, tag: str = 'answer')
Reward the agent when the answer within the <{tag}> tags is the same expression as the ground truth.
The `tag` is customizable and must be specified as part of the user COT prompt text file.
- nemo_rl.environments.rewards.format_reward(ground_truth: str, response: str, think_tag: str = 'think', answer_tag: str = 'answer')
Reward the agent when the response follows the format `<{think_tag}>…</{think_tag}><{answer_tag}>…</{answer_tag}>`.
The `think_tag` and `answer_tag` are customizable and must be specified as part of the user COT prompt text file.
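A minimal sketch of a format check along these lines. The exact pattern (one reasoning block followed by one answer block, optional whitespace between them) is an assumption about the expected COT layout, not the library's actual regex:

```python
import re


def simple_format_reward(
    response: str, think_tag: str = "think", answer_tag: str = "answer"
) -> tuple[float, bool]:
    """Reward 1.0 when the whole response is <think>...</think><answer>...</answer>
    (tag names configurable, optional whitespace between the two blocks)."""
    pattern = rf"<{think_tag}>(.*?)</{think_tag}>\s*<{answer_tag}>(.*?)</{answer_tag}>"
    ok = re.fullmatch(pattern, response.strip(), re.DOTALL) is not None
    return (1.0, True) if ok else (0.0, False)
```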
- nemo_rl.environments.rewards.exact_answer_alphanumeric_reward(ground_truth: str, response: str, answer_tag: str = 'answer')
Reward the agent when the answer within the <{answer_tag}> tags is the same as the ground truth (case-insensitive).
The `answer_tag` is customizable and must be specified as part of the user COT prompt text file.
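A sketch of one plausible reading of "alphanumeric, case-insensitive" matching: lowercase both strings and drop non-alphanumeric characters before comparing. The normalization rule is an assumption, not confirmed behavior:

```python
import re


def simple_exact_alnum_reward(
    ground_truth: str, response: str, answer_tag: str = "answer"
) -> tuple[float, bool]:
    """Hypothetical sketch: extract the tagged answer, then compare after
    lowercasing and stripping every non-alphanumeric character."""
    match = re.search(rf"<{answer_tag}>(.*?)</{answer_tag}>", response, re.DOTALL)
    if match is None:
        return 0.0, False

    def normalize(text: str) -> str:
        # Lowercase and keep only [a-z0-9].
        return re.sub(r"[^a-z0-9]", "", text.lower())

    correct = normalize(match.group(1)) == normalize(ground_truth)
    return (1.0, True) if correct else (0.0, False)
```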
- nemo_rl.environments.rewards.bbox_giou_reward(ground_truth: str, response: str, giou_penalty_thres: float = 10.0, answer_tag: str = 'answer')
Given [x1, y1, x2, y2] normalized bounding box coordinates within the <{answer_tag}> tags, compute the GIoU between the ground truth and the response.
The `answer_tag` is customizable and must be specified as part of the user COT prompt text file.
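For reference, the standard GIoU computation for two `[x1, y1, x2, y2]` boxes is IoU minus the fraction of the smallest enclosing box not covered by the union, giving a value in (-1, 1]. This sketch shows only the geometric core; the actual reward also parses the boxes out of the tagged answer and applies `giou_penalty_thres`, which is not modeled here:

```python
def giou(box_a: list[float], box_b: list[float]) -> float:
    """Generalized IoU for [x1, y1, x2, y2] boxes in normalized coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union area.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest box enclosing both inputs.
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    # GIoU = IoU - (enclosing area not covered by the union) / enclosing area.
    return iou - (c_area - union) / c_area if c_area > 0 else iou
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes: the farther apart they are, the more negative the score.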
- nemo_rl.environments.rewards.combine_reward_functions(reward_functions: list[tuple[Callable[[str, str], tuple[float, bool]], float]])
Returns a callable function that takes (ground_truth, response) and collects multiple reward functions in sequence.
The reward functions are weighted by the second element of the tuple. This information can be provided in the YAML config file and resolved in the VLMEnvironment class.
- Parameters:
  reward_functions (list[tuple[Callable[[str, str], tuple[float, bool]], float]]) – A list of reward functions and their weights.
- Returns:
A callable function that takes (ground_truth, response) and collects multiple reward functions in sequence
- Return type:
Callable[[str, str], tuple[float, bool]]
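A minimal sketch of such a combinator, assuming the weighted sum shown in the signature. How the boolean success flags are aggregated is an assumption here (logical AND), not confirmed API behavior:

```python
from typing import Callable

# Each reward function maps (ground_truth, response) -> (reward, success).
RewardFn = Callable[[str, str], tuple[float, bool]]


def combine_rewards(reward_functions: list[tuple[RewardFn, float]]) -> RewardFn:
    """Hypothetical sketch: run each sub-reward in sequence, sum the rewards
    scaled by their weights, and report success only if every sub-reward succeeds."""

    def combined(ground_truth: str, response: str) -> tuple[float, bool]:
        total = 0.0
        all_ok = True
        for reward_fn, weight in reward_functions:
            reward, ok = reward_fn(ground_truth, response)
            total += weight * reward
            all_ok = all_ok and ok
        return total, all_ok

    return combined
```

In this shape, the (function, weight) pairs map naturally onto entries in a YAML config, matching the note above about resolving them in the VLMEnvironment class.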