nemo_rl.data.processors#

Contains data processors for evaluation.

Module Contents#

Functions#

helpsteer3_data_processor

Process a HelpSteer3 preference datum into a DatumSpec for GRPO training.

sft_processor

Process a datum dictionary for SFT training.

preference_preprocessor

Process a datum dictionary for RM/DPO training.

math_data_processor

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for the Math Environment.

math_gdpo_data_processor

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

math_hf_data_processor

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

vlm_hf_data_processor

Process a datum dictionary (directly loaded from response_datasets/<dataset_name>.py) into a DatumSpec for the VLM Environment.

_construct_multichoice_prompt

Construct prompt from question and options.

multichoice_qa_processor

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for multiple-choice problems.

nemo_gym_data_processor

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for Nemo Gym.

register_processor

Register a data processor function under a name in the processor registry.

Data#

API#

nemo_rl.data.processors.TokenizerType#

None

nemo_rl.data.processors.helpsteer3_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a HelpSteer3 preference datum into a DatumSpec for GRPO training.

This function converts HelpSteer3 preference data to work with GRPO by:

  1. Using the context as the prompt

  2. Using the preferred completion as the target response

  3. Creating a reward signal based on preference scores
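The three steps above can be sketched in plain Python. This is a hypothetical sketch, not the actual processor: the field names (`context`, `response1`, `response2`, `overall_preference`) are assumptions based on the HelpSteer3 dataset schema, and the reward derivation shown is only one plausible choice.

```python
# Hypothetical sketch of the preference-to-GRPO conversion described above.
# Field names ("context", "response1", "response2", "overall_preference") are
# assumptions based on the HelpSteer3 schema; the real processor may differ.
from typing import Any


def helpsteer3_to_grpo_sketch(datum: dict[str, Any]) -> dict[str, Any]:
    # 1. Use the context as the prompt.
    prompt = datum["context"]

    # 2. Pick the preferred completion as the target response. In HelpSteer3,
    #    overall_preference < 0 favors response1, > 0 favors response2.
    pref = datum["overall_preference"]
    chosen = datum["response1"] if pref <= 0 else datum["response2"]

    # 3. Derive a reward signal from the preference strength.
    reward = abs(pref)

    return {"prompt": prompt, "target": chosen, "reward": reward}


example = {
    "context": [{"role": "user", "content": "What is 2+2?"}],
    "response1": "4",
    "response2": "5",
    "overall_preference": -2,  # strongly prefers response1
}
print(helpsteer3_to_grpo_sketch(example)["target"])  # 4
```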

nemo_rl.data.processors.sft_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer,
max_seq_length: int,
idx: int,
add_bos: bool = True,
add_eos: bool = True,
add_generation_prompt: bool = False,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary for SFT training.
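The `add_bos`, `add_eos`, and `add_generation_prompt` flags follow the usual chat-templating conventions. The toy function below is not the actual `sft_processor` implementation; it only illustrates the conventional effect of each flag on the rendered sequence, using placeholder marker strings.

```python
# Toy illustration of the add_bos / add_eos / add_generation_prompt flags.
# Not the actual sft_processor; marker strings are placeholders.
def render_sft_example(
    messages: list[dict[str, str]],
    add_bos: bool = True,
    add_eos: bool = True,
    add_generation_prompt: bool = False,
) -> str:
    text = "".join(m["content"] for m in messages)
    if add_bos:
        text = "<bos>" + text        # prepend a beginning-of-sequence marker
    if add_eos:
        text = text + "<eos>"        # append an end-of-sequence marker
    if add_generation_prompt:
        text = text + "<assistant>"  # open an assistant turn for generation
    return text


msgs = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]
print(render_sft_example(msgs))  # <bos>HiHello<eos>
```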

nemo_rl.data.processors.preference_preprocessor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.PreferenceDatumSpec#

Process a datum dictionary for RM/DPO training.

Examples

>>> from transformers import AutoTokenizer
>>> from nemo_rl.data.interfaces import TaskDataSpec
>>> from nemo_rl.data.processors import preference_preprocessor
>>>
>>> # Initialize tokenizer and task spec
>>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
>>> ## set a passthrough chat template for simplicity
>>> tokenizer.chat_template = "{% for message in messages %}{{ message['content'] }}{% endfor %}"
>>> task_spec = TaskDataSpec(task_name="test_preference")
>>>
>>> datum = {
...     "context": [{"role": "user", "content": "What is 2+2?"}],
...     "completions": [
...         {"rank": 0, "completion": [{"role": "assistant", "content": "4"}]},
...         {"rank": 1, "completion": [{"role": "assistant", "content": "5"}]}
...     ]
... }
>>>
>>> processed = preference_preprocessor(datum, task_spec, tokenizer, max_seq_length=128, idx=0)
>>> len(processed["message_log_chosen"])
2
>>> processed["message_log_chosen"][0]["content"]
'<|begin_of_text|>What is 2+2?'
>>> processed["message_log_chosen"][-1]["content"]
'4<|eot_id|>'
>>> processed["message_log_rejected"][-1]["content"]
'5<|eot_id|>'
>>>
>>> # context can also contain multiple turns
>>> datum = {
...     "context": [{"role": "user", "content": "I have a question."}, {"role": "assistant", "content": "Sure!"}, {"role": "user", "content": "What is 2+2?"}],
...     "completions": [
...         {"rank": 0, "completion": [{"role": "assistant", "content": "4"}]},
...         {"rank": 1, "completion": [{"role": "assistant", "content": "5"}]}
...     ]
... }
>>> processed = preference_preprocessor(datum, task_spec, tokenizer, max_seq_length=128, idx=0)
>>> len(processed["message_log_chosen"])
4
>>> processed["message_log_chosen"][1]["content"]
'Sure!'
>>> processed["message_log_chosen"][-1]["content"]
'4<|eot_id|>'
>>> processed["message_log_rejected"][-1]["content"]
'5<|eot_id|>'

nemo_rl.data.processors.math_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for the Math Environment.

nemo_rl.data.processors.math_gdpo_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

nemo_rl.data.processors.math_hf_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

nemo_rl.data.processors.vlm_hf_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
processor: transformers.AutoProcessor,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from response_datasets/<dataset_name>.py) into a DatumSpec for the VLM Environment.

nemo_rl.data.processors._construct_multichoice_prompt(
prompt: str,
question: str,
options: dict[str, str],
) → str#

Construct prompt from question and options.
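A sketch of what this construction typically looks like. The exact template used by `_construct_multichoice_prompt` is not documented here; the function below only illustrates the `(prompt, question, options) -> str` shape, with an assumed layout of one lettered option per line.

```python
# Hypothetical sketch of multiple-choice prompt construction; the real
# template may differ. Options are rendered one per line in key order.
def construct_multichoice_prompt_sketch(
    prompt: str, question: str, options: dict[str, str]
) -> str:
    option_lines = "\n".join(f"{key}. {text}" for key, text in options.items())
    return f"{prompt}\n{question}\n{option_lines}"


text = construct_multichoice_prompt_sketch(
    "Answer with the letter of the correct option.",
    "What is 2+2?",
    {"A": "3", "B": "4", "C": "5"},
)
print(text)
```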

nemo_rl.data.processors.multichoice_qa_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for multiple-choice problems.

nemo_rl.data.processors.nemo_gym_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int | None,
idx: int,
) → nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for Nemo Gym.

nemo_rl.data.processors.PROCESSOR_REGISTRY: Dict[str, nemo_rl.data.interfaces.TaskDataProcessFnCallable]#

'cast(…)'

nemo_rl.data.processors.register_processor(
processor_name: str,
processor_function: nemo_rl.data.interfaces.TaskDataProcessFnCallable,
) → None#
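`register_processor` adds an entry to `PROCESSOR_REGISTRY`, keyed by name. The minimal sketch below mirrors that registry/register pair; the duplicate-name check is an assumption about the real implementation, and `my_processor` is a hypothetical user-defined function.

```python
# Minimal sketch of a name-keyed processor registry, mirroring the
# PROCESSOR_REGISTRY / register_processor pair above. The duplicate-name
# ValueError is an assumption, not confirmed behavior.
from typing import Callable, Dict

PROCESSOR_REGISTRY: Dict[str, Callable] = {}


def register_processor(processor_name: str, processor_function: Callable) -> None:
    if processor_name in PROCESSOR_REGISTRY:
        raise ValueError(f"Processor {processor_name!r} is already registered")
    PROCESSOR_REGISTRY[processor_name] = processor_function


def my_processor(datum_dict, task_data_spec, tokenizer, max_seq_length, idx):
    ...  # build and return a DatumSpec for the custom task


register_processor("my_processor", my_processor)
print("my_processor" in PROCESSOR_REGISTRY)  # True
```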