nemo_rl.data.processors#

Contains data processors for evaluation.

Module Contents#

Functions#

helpsteer3_data_processor

Process a HelpSteer3 preference datum into a DatumSpec for GRPO training.

sft_processor

Process a datum dictionary for SFT training.

preference_preprocessor

Process a datum dictionary for RM/DPO training.

math_data_processor

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for the Math Environment.

math_gdpo_data_processor

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

math_hf_data_processor

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

vlm_hf_data_processor

Process a datum dictionary (directly loaded from response_datasets/<dataset_name>.py) into a DatumSpec for the VLM Environment.

_construct_multichoice_prompt

Construct prompt from question and options.

multichoice_qa_processor

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for multiple-choice problems.

nemo_gym_data_processor

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for Nemo Gym.

register_processor

Register a data processor function under a name in the processor registry.

Data#

API#

nemo_rl.data.processors.TokenizerType#

None

nemo_rl.data.processors.helpsteer3_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a HelpSteer3 preference datum into a DatumSpec for GRPO training.

This function converts HelpSteer3 preference data to work with GRPO by:

  1. Using the context as the prompt

  2. Using the preferred completion as the target response

  3. Creating a reward signal based on preference scores
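The three steps above can be sketched in plain Python. This is a hypothetical sketch, not the actual processor: the field names (`context`, `response1`, `response2`, `overall_preference`) are assumptions based on the HelpSteer3 dataset schema, and the reward derivation shown is only one plausible choice.

```python
# Hypothetical sketch of the preference-to-GRPO conversion described above.
# Field names ("context", "response1", "response2", "overall_preference") are
# assumptions based on the HelpSteer3 schema; the real processor may differ.
from typing import Any


def helpsteer3_to_grpo_sketch(datum: dict[str, Any]) -> dict[str, Any]:
    # 1. Use the context as the prompt.
    prompt = datum["context"]

    # 2. Pick the preferred completion as the target response. In HelpSteer3,
    #    overall_preference < 0 favors response1, > 0 favors response2.
    pref = datum["overall_preference"]
    chosen = datum["response1"] if pref <= 0 else datum["response2"]

    # 3. Derive a reward signal from the preference strength.
    reward = abs(pref)

    return {"prompt": prompt, "target": chosen, "reward": reward}


example = {
    "context": [{"role": "user", "content": "What is 2+2?"}],
    "response1": "4",
    "response2": "5",
    "overall_preference": -2,  # strongly prefers response1
}
print(helpsteer3_to_grpo_sketch(example)["target"])  # 4
```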

nemo_rl.data.processors.sft_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer,
max_seq_length: int,
idx: int,
add_bos: bool = True,
add_eos: bool = True,
add_generation_prompt: bool = False,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary for SFT training.
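The `add_bos`, `add_eos`, and `add_generation_prompt` flags follow the usual chat-templating conventions. The toy function below is not the actual `sft_processor` implementation; it only illustrates the conventional effect of each flag on the rendered sequence, using placeholder marker strings.

```python
# Toy illustration of the add_bos / add_eos / add_generation_prompt flags.
# Not the actual sft_processor; marker strings are placeholders.
def render_sft_example(
    messages: list[dict[str, str]],
    add_bos: bool = True,
    add_eos: bool = True,
    add_generation_prompt: bool = False,
) -> str:
    text = "".join(m["content"] for m in messages)
    if add_bos:
        text = "<bos>" + text        # prepend a beginning-of-sequence marker
    if add_eos:
        text = text + "<eos>"        # append an end-of-sequence marker
    if add_generation_prompt:
        text = text + "<assistant>"  # open an assistant turn for generation
    return text


msgs = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]
print(render_sft_example(msgs))  # <bos>HiHello<eos>
```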

nemo_rl.data.processors.preference_preprocessor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.PreferenceDatumSpec#

Process a datum dictionary for RM/DPO training.

Examples

>>> from transformers import AutoTokenizer
>>> from nemo_rl.data.interfaces import TaskDataSpec
>>> from nemo_rl.data.processors import preference_preprocessor
>>>
>>> # Initialize tokenizer and task spec
>>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
>>> ## set a passthrough chat template for simplicity
>>> tokenizer.chat_template = "{% for message in messages %}{{ message['content'] }}{% endfor %}"
>>> task_spec = TaskDataSpec(task_name="test_preference")
>>>
>>> datum = {
...     "context": [{"role": "user", "content": "What is 2+2?"}],
...     "completions": [
...         {"rank": 0, "completion": [{"role": "assistant", "content": "4"}]},
...         {"rank": 1, "completion": [{"role": "assistant", "content": "5"}]}
...     ]
... }
>>>
>>> processed = preference_preprocessor(datum, task_spec, tokenizer, max_seq_length=128, idx=0)
>>> len(processed["message_log_chosen"])
2
>>> processed["message_log_chosen"][0]["content"]
'<|begin_of_text|>What is 2+2?'
>>> processed["message_log_chosen"][-1]["content"]
'4<|eot_id|>'
>>> processed["message_log_rejected"][-1]["content"]
'5<|eot_id|>'
>>>
>>> # context can also contain multiple turns
>>> datum = {
...     "context": [{"role": "user", "content": "I have a question."}, {"role": "assistant", "content": "Sure!"}, {"role": "user", "content": "What is 2+2?"}],
...     "completions": [
...         {"rank": 0, "completion": [{"role": "assistant", "content": "4"}]},
...         {"rank": 1, "completion": [{"role": "assistant", "content": "5"}]}
...     ]
... }
>>> processed = preference_preprocessor(datum, task_spec, tokenizer, max_seq_length=128, idx=0)
>>> len(processed["message_log_chosen"])
4
>>> processed["message_log_chosen"][1]["content"]
'Sure!'
>>> processed["message_log_chosen"][-1]["content"]
'4<|eot_id|>'
>>> processed["message_log_rejected"][-1]["content"]
'5<|eot_id|>'

nemo_rl.data.processors.math_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for the Math Environment.

nemo_rl.data.processors.math_gdpo_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

nemo_rl.data.processors.math_hf_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

nemo_rl.data.processors.vlm_hf_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
processor: transformers.AutoProcessor,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from response_datasets/<dataset_name>.py) into a DatumSpec for the VLM Environment.

nemo_rl.data.processors._construct_multichoice_prompt(
prompt: str,
question: str,
options: dict[str, str],
) → str#

Construct prompt from question and options.
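A sketch of what this construction typically looks like. The exact template used by `_construct_multichoice_prompt` is not documented here; the function below only illustrates the `(prompt, question, options) -> str` shape, with an assumed layout of one lettered option per line.

```python
# Hypothetical sketch of multiple-choice prompt construction; the real
# template may differ. Options are rendered one per line in key order.
def construct_multichoice_prompt_sketch(
    prompt: str, question: str, options: dict[str, str]
) -> str:
    option_lines = "\n".join(f"{key}. {text}" for key, text in options.items())
    return f"{prompt}\n{question}\n{option_lines}"


text = construct_multichoice_prompt_sketch(
    "Answer with the letter of the correct option.",
    "What is 2+2?",
    {"A": "3", "B": "4", "C": "5"},
)
print(text)
```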

nemo_rl.data.processors.multichoice_qa_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for multiple-choice problems.

nemo_rl.data.processors.nemo_gym_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int | None,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for Nemo Gym.

nemo_rl.data.processors.PROCESSOR_REGISTRY: Dict[str, nemo_rl.data.interfaces.TaskDataProcessFnCallable]#

'cast(…)'

nemo_rl.data.processors.register_processor(
processor_name: str,
processor_function: nemo_rl.data.interfaces.TaskDataProcessFnCallable,
) → None#
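`register_processor` adds an entry to `PROCESSOR_REGISTRY`, keyed by name. The minimal sketch below mirrors that registry/register pair; the duplicate-name check is an assumption about the real implementation, and `my_processor` is a hypothetical user-defined function.

```python
# Minimal sketch of a name-keyed processor registry, mirroring the
# PROCESSOR_REGISTRY / register_processor pair above. The duplicate-name
# ValueError is an assumption, not confirmed behavior.
from typing import Callable, Dict

PROCESSOR_REGISTRY: Dict[str, Callable] = {}


def register_processor(processor_name: str, processor_function: Callable) -> None:
    if processor_name in PROCESSOR_REGISTRY:
        raise ValueError(f"Processor {processor_name!r} is already registered")
    PROCESSOR_REGISTRY[processor_name] = processor_function


def my_processor(datum_dict, task_data_spec, tokenizer, max_seq_length, idx):
    ...  # build and return a DatumSpec for the custom task


register_processor("my_processor", my_processor)
print("my_processor" in PROCESSOR_REGISTRY)  # True
```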