nemo_rl.environments.dapo_math_verifier

Module Contents

Functions

last_boxed_only_string

Extract the last LaTeX boxed expression from a string.

remove_boxed

Remove the LaTeX boxed command from a string.

normalize_final_answer

Normalize a final answer to a quantitative reasoning question.

is_correct_minerva

Check if the solution is correct according to Minerva criteria.

is_correct_strict_box

Check if the prediction is correct using strict boxed answer criteria.

verify

Verify if the solution is correct.

compute_score

Compute the reward score for a solution.

Data

SUBSTITUTIONS

REMOVED_EXPRESSIONS

API

nemo_rl.environments.dapo_math_verifier.last_boxed_only_string(string: str) → Optional[str]

Extract the last LaTeX boxed expression from a string.

Parameters:

string – Input string containing LaTeX code

Returns:

The last boxed expression or None if not found
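As an illustration of what "last boxed expression" means, here is a minimal sketch that finds the final `\boxed{` and walks forward to its matching closing brace. The name `last_boxed_sketch` is hypothetical and this is not the module's implementation; it only mirrors the documented contract (return the boxed span, or None if there is none).

```python
from typing import Optional


def last_boxed_sketch(text: str) -> Optional[str]:
    """Return the last boxed span in text, or None (illustrative only)."""
    start = text.rfind("\\boxed{")
    if start < 0:
        return None
    depth = 0
    # Walk forward from the opening command and track brace nesting.
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None  # braces never balanced


print(last_boxed_sketch(r"so \boxed{1} and then \boxed{\frac{3}{4}}."))
# \boxed{\frac{3}{4}}
```

Counting brace depth, rather than stopping at the first `}`, is what keeps nested commands such as `\frac{3}{4}` intact.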

nemo_rl.environments.dapo_math_verifier.remove_boxed(s: str) → str

Remove the LaTeX boxed command from a string.

Parameters:

s – String with format "\boxed{content}"

Returns:

The content inside the boxed command
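A minimal sketch of the unwrapping step, assuming the input has exactly the documented "\boxed{content}" shape. The helper name `remove_boxed_sketch` is hypothetical, and raising `ValueError` on malformed input is an assumption of this sketch; how the real function handles such input is not stated here.

```python
def remove_boxed_sketch(s: str) -> str:
    """Strip a leading boxed command and its closing brace (illustrative only)."""
    prefix = "\\boxed{"
    # Assumption: malformed input is rejected rather than passed through.
    if not (s.startswith(prefix) and s.endswith("}")):
        raise ValueError(f"not a boxed expression: {s!r}")
    return s[len(prefix):-1]


print(remove_boxed_sketch(r"\boxed{\frac{3}{4}}"))
# \frac{3}{4}
```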

nemo_rl.environments.dapo_math_verifier.SUBSTITUTIONS

[('an ', ''), ('a ', ''), ('.$', '$'), ('\$', ''), ('\ ', ''), (' ', ''), ('mbox', 'text'), (',\t…

nemo_rl.environments.dapo_math_verifier.REMOVED_EXPRESSIONS

['square', 'ways', 'integers', 'dollars', 'mph', 'inches', 'hours', 'km', 'units', '\ldots', 'sue',…

nemo_rl.environments.dapo_math_verifier.normalize_final_answer(final_answer: str) → str

Normalize a final answer to a quantitative reasoning question.

Parameters:

final_answer – The answer string to normalize

Returns:

Normalized answer string
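To give a feel for this kind of normalization, the sketch below applies a few (before, after) replacement pairs and then deletes a few unit words. The lists `SUBS_SKETCH` and `REMOVED_SKETCH` are hypothetical, partial stand-ins built from entries visible in `SUBSTITUTIONS` and `REMOVED_EXPRESSIONS`. That the real function applies them with plain `str.replace`, and in this order, is an assumption, and any further normalization steps are not covered.

```python
# Partial, illustrative lists: NOT the module's full constants.
SUBS_SKETCH = [("an ", ""), ("a ", ""), (" ", ""), ("mbox", "text")]
REMOVED_SKETCH = ["square", "ways", "integers", "dollars", "mph",
                  "inches", "hours", "km", "units"]


def normalize_sketch(final_answer: str) -> str:
    """Apply replacement pairs, then drop unit-like words (illustrative only)."""
    for before, after in SUBS_SKETCH:
        final_answer = final_answer.replace(before, after)
    for expr in REMOVED_SKETCH:
        final_answer = final_answer.replace(expr, "")
    return final_answer.strip()


print(normalize_sketch("12 inches"))  # 12
print(normalize_sketch("5 dollars"))  # 5
```

Because the replacements are plain substring operations, entries such as 'a ' can also match inside longer words, which is worth keeping in mind when comparing normalized strings.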

nemo_rl.environments.dapo_math_verifier.is_correct_minerva(
solution_str: str,
gt: str,
gt_need_extract: bool = False,
answer_pattern: str = '(?i)Answer\\s*:\\s*([^\\n]+)',
) → tuple[bool, str]

Check if the solution is correct according to Minerva criteria.

Parameters:
  • solution_str – The solution string to check

  • gt – The ground truth answer

  • gt_need_extract – Whether the ground truth needs extraction

  • answer_pattern – Regex pattern to extract the answer

Returns:

Tuple of (is_correct, normalized_prediction)
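The default `answer_pattern` can be tried on its own with Python's `re` module. The sketch below shows only what the pattern captures from a sample solution; the sample text is made up, and which match the function uses when several are present is not stated here.

```python
import re

# The default pattern from the signature, written as a raw string.
ANSWER_PATTERN = r"(?i)Answer\s*:\s*([^\n]+)"

solution = "We add the terms.\nanswer:  42\nDone."
matches = re.findall(ANSWER_PATTERN, solution)
print(matches)  # ['42']
```

The `(?i)` flag makes the match case-insensitive, `\s*` absorbs whitespace around the colon, and the capture group takes everything up to the end of the line.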

nemo_rl.environments.dapo_math_verifier.is_correct_strict_box(
pred: str,
gt: str,
pause_tokens_index: Optional[list[int]] = None,
) → tuple[int, Optional[str]]

Check if the prediction is correct using strict boxed answer criteria.

Parameters:
  • pred – The prediction string

  • gt – The ground truth answer

  • pause_tokens_index – Indices of pause tokens

Returns:

Tuple of (score, extracted_prediction)

nemo_rl.environments.dapo_math_verifier.verify(
solution_str: str,
answer: str,
strict_box_verify: bool = False,
pause_tokens_index: Optional[list[int]] = None,
) → bool

Verify if the solution is correct.

Parameters:
  • solution_str – The solution string to verify

  • answer – The ground truth answer

  • strict_box_verify – Whether to use strict box verification

  • pause_tokens_index – Indices of pause tokens

Returns:

True if the solution is correct, False otherwise

nemo_rl.environments.dapo_math_verifier.compute_score(
solution_str: str,
ground_truth: str,
strict_box_verify: bool = False,
pause_tokens_index: Optional[list[int]] = None,
) → float

Compute the reward score for a solution.

Parameters:
  • solution_str – The solution string

  • ground_truth – The ground truth answer

  • strict_box_verify – Whether to use strict box verification

  • pause_tokens_index – Indices of pause tokens

Returns:

Reward score (1.0 for correct, 0.0 for incorrect)
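The 1.0 / 0.0 reward convention can be pictured as a thin wrapper around a boolean verdict. In the sketch below, `exact_match` is a hypothetical toy verifier (it compares the last non-empty line to the target) and `score_sketch` is a hypothetical name; neither reflects how the module itself verifies solutions.

```python
from typing import Callable


def exact_match(solution_str: str, ground_truth: str) -> bool:
    """Toy verifier (hypothetical): compare the last non-empty line to the target."""
    lines = [ln.strip() for ln in solution_str.splitlines() if ln.strip()]
    return bool(lines) and lines[-1] == ground_truth.strip()


def score_sketch(
    solution_str: str,
    ground_truth: str,
    verify_fn: Callable[[str, str], bool] = exact_match,
) -> float:
    """Map a boolean verdict to the 1.0 / 0.0 reward convention."""
    return 1.0 if verify_fn(solution_str, ground_truth) else 0.0


print(score_sketch("Work...\n42", "42"))  # 1.0
print(score_sketch("Work...\n41", "42"))  # 0.0
```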