nemo_rl.data.processors#
Contains data processors for evaluation.
Module Contents#
Functions#
| Function | Description |
|---|---|
| `helpsteer3_data_processor` | Process a HelpSteer3 preference datum into a DatumSpec for GRPO training. |
| `sft_processor` | Process a datum dictionary for SFT training. |
| `preference_preprocessor` | Process a datum dictionary for RM/DPO training. |
| `math_data_processor` | Process a datum dictionary (directly loaded from dataset) into a DatumSpec for the Math Environment. |
| `math_gdpo_data_processor` | Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment. |
| `math_hf_data_processor` | Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment. |
| `vlm_hf_data_processor` | Process a datum dictionary (directly loaded from response_datasets/<dataset_name>.py) into a DatumSpec for the VLM Environment. |
| `_construct_multichoice_prompt` | Construct prompt from question and options. |
| `multichoice_qa_processor` | Process a datum dictionary (directly loaded from dataset) into a DatumSpec for multiple-choice problems. |
| `nemo_gym_data_processor` | Process a datum dictionary (directly loaded from dataset) into a DatumSpec for Nemo Gym. |
|
Data#
API#
- nemo_rl.data.processors.TokenizerType#
None
- nemo_rl.data.processors.helpsteer3_data_processor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- tokenizer: nemo_rl.data.processors.TokenizerType,
- max_seq_length: int,
- idx: int,
Process a HelpSteer3 preference datum into a DatumSpec for GRPO training.
This function converts HelpSteer3 preference data to work with GRPO by:
- Using the context as the prompt
- Using the preferred completion as the target response
- Creating a reward signal based on preference scores
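As a rough illustration, the conversion described above could look like the following sketch. The function name, field names, and reward scheme are illustrative assumptions modeled on the `preference_preprocessor` example below, not the actual nemo_rl implementation:

```python
from typing import Any


def helpsteer3_to_grpo_example(datum: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: map a HelpSteer3-style preference record to a
    prompt/response/reward triple for GRPO-style training."""
    # Use the conversation context as the prompt.
    prompt = datum["context"]
    # Pick the preferred completion (lower rank = more preferred here).
    ranked = sorted(datum["completions"], key=lambda c: c["rank"])
    chosen = ranked[0]["completion"]
    # Derive a reward signal from the preference ranking: best gets 1.0.
    reward = 1.0 if ranked[0]["rank"] == 0 else 0.0
    return {"prompt": prompt, "response": chosen, "reward": reward}


datum = {
    "context": [{"role": "user", "content": "What is 2+2?"}],
    "completions": [
        {"rank": 1, "completion": [{"role": "assistant", "content": "5"}]},
        {"rank": 0, "completion": [{"role": "assistant", "content": "4"}]},
    ],
}
out = helpsteer3_to_grpo_example(datum)
```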
- nemo_rl.data.processors.sft_processor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- tokenizer,
- max_seq_length: int,
- idx: int,
- add_bos: bool = True,
- add_eos: bool = True,
- add_generation_prompt: bool = False,
Process a datum dictionary for SFT training.
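To illustrate what the `add_bos`, `add_eos`, and `add_generation_prompt` flags control, here is a minimal sketch. The placeholder tokens and the rendering logic are assumptions for illustration; the real processor uses the tokenizer's own special tokens and chat template:

```python
def render_sft_example(
    messages: list[dict[str, str]],
    add_bos: bool = True,
    add_eos: bool = True,
    add_generation_prompt: bool = False,
) -> str:
    """Hypothetical sketch of how the SFT flags could shape a training string."""
    text = "".join(m["content"] for m in messages)
    if add_bos:
        # Prepend a beginning-of-sequence marker.
        text = "<bos>" + text
    if add_eos:
        # Append an end-of-sequence marker so the model learns to stop.
        text = text + "<eos>"
    if add_generation_prompt:
        # Leave the string open for the model to generate the next turn.
        text = text + "<assistant>"
    return text


rendered = render_sft_example([{"role": "user", "content": "hi"}])
```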
- nemo_rl.data.processors.preference_preprocessor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- tokenizer,
- max_seq_length: int,
- idx: int,
Process a datum dictionary for RM/DPO training.
Examples:

```python
>>> from transformers import AutoTokenizer
>>> from nemo_rl.data.interfaces import TaskDataSpec
>>> from nemo_rl.data.processors import preference_preprocessor
>>>
>>> # Initialize tokenizer and task spec
>>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
>>> ## set a passthrough chat template for simplicity
>>> tokenizer.chat_template = "{% for message in messages %}{{ message['content'] }}{% endfor %}"
>>> task_spec = TaskDataSpec(task_name="test_preference")
>>>
>>> datum = {
...     "context": [{"role": "user", "content": "What is 2+2?"}],
...     "completions": [
...         {"rank": 0, "completion": [{"role": "assistant", "content": "4"}]},
...         {"rank": 1, "completion": [{"role": "assistant", "content": "5"}]}
...     ]
... }
>>>
>>> processed = preference_preprocessor(datum, task_spec, tokenizer, max_seq_length=128, idx=0)
>>> len(processed["message_log_chosen"])
2
>>> processed["message_log_chosen"][0]["content"]
'<|begin_of_text|>What is 2+2?'
>>> processed["message_log_chosen"][-1]["content"]
'4<|eot_id|>'
>>> processed["message_log_rejected"][-1]["content"]
'5<|eot_id|>'
>>>
>>> # context can also contain multiple turns
>>> datum = {
...     "context": [{"role": "user", "content": "I have a question."}, {"role": "assistant", "content": "Sure!"}, {"role": "user", "content": "What is 2+2?"}],
...     "completions": [
...         {"rank": 0, "completion": [{"role": "assistant", "content": "4"}]},
...         {"rank": 1, "completion": [{"role": "assistant", "content": "5"}]}
...     ]
... }
>>> processed = preference_preprocessor(datum, task_spec, tokenizer, max_seq_length=128, idx=0)
>>> len(processed["message_log_chosen"])
4
>>> processed["message_log_chosen"][1]["content"]
'Sure!'
>>> processed["message_log_chosen"][-1]["content"]
'4<|eot_id|>'
>>> processed["message_log_rejected"][-1]["content"]
'5<|eot_id|>'
```
- nemo_rl.data.processors.math_data_processor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- tokenizer: nemo_rl.data.processors.TokenizerType,
- max_seq_length: int,
- idx: int,
Process a datum dictionary (directly loaded from dataset) into a DatumSpec for the Math Environment.
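A minimal sketch of what such a conversion might produce, assuming the datum carries a problem statement and a ground-truth answer. The input field names (`problem`, `expected_answer`) and the output keys are illustrative assumptions, not the actual DatumSpec schema:

```python
from typing import Any


def math_datum_example(datum: dict[str, Any], max_seq_length: int) -> dict[str, Any]:
    """Hypothetical sketch: turn a math dataset record into a prompt plus
    ground-truth answer for a verification environment."""
    # Wrap the problem statement as the user turn of a conversation.
    messages = [{"role": "user", "content": datum["problem"]}]
    return {
        "message_log": messages,
        # The environment later checks model output against this answer.
        "extra_env_info": {"ground_truth": datum["expected_answer"]},
        "length": min(len(datum["problem"]), max_seq_length),
    }


spec = math_datum_example(
    {"problem": "What is 2+2?", "expected_answer": "4"}, max_seq_length=128
)
```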
- nemo_rl.data.processors.math_gdpo_data_processor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- tokenizer: nemo_rl.data.processors.TokenizerType,
- max_seq_length: int,
- idx: int,
Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.
- nemo_rl.data.processors.math_hf_data_processor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- tokenizer: nemo_rl.data.processors.TokenizerType,
- max_seq_length: int,
- idx: int,
Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.
- nemo_rl.data.processors.vlm_hf_data_processor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- processor: transformers.AutoProcessor,
- max_seq_length: int,
- idx: int,
Process a datum dictionary (directly loaded from response_datasets/<dataset_name>.py) into a DatumSpec for the VLM Environment.
- nemo_rl.data.processors._construct_multichoice_prompt(
- prompt: str,
- question: str,
- options: dict[str, str],
Construct prompt from question and options.
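A plausible reconstruction of this helper, assuming the options are a mapping from answer letters to option text; the exact template the library uses may differ:

```python
def construct_multichoice_prompt(prompt: str, question: str, options: dict[str, str]) -> str:
    """Hypothetical sketch: join an instruction prompt, a question, and
    lettered answer options into one multiple-choice prompt string."""
    lines = [prompt, question]
    # Emit options in letter order, e.g. "A. ...", "B. ...".
    for letter in sorted(options):
        lines.append(f"{letter}. {options[letter]}")
    return "\n".join(lines)


prompt_text = construct_multichoice_prompt(
    "Answer with the letter of the correct option.",
    "What is 2+2?",
    {"A": "3", "B": "4"},
)
```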
- nemo_rl.data.processors.multichoice_qa_processor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- tokenizer: nemo_rl.data.processors.TokenizerType,
- max_seq_length: int,
- idx: int,
Process a datum dictionary (directly loaded from dataset) into a DatumSpec for multiple-choice problems.
- nemo_rl.data.processors.nemo_gym_data_processor(
- datum_dict: dict[str, Any],
- task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
- tokenizer: nemo_rl.data.processors.TokenizerType,
- max_seq_length: int | None,
- idx: int,
Process a datum dictionary (directly loaded from dataset) into a DatumSpec for Nemo Gym.
- nemo_rl.data.processors.PROCESSOR_REGISTRY: Dict[str, nemo_rl.data.interfaces.TaskDataProcessFnCallable]#
'cast(…)'
- nemo_rl.data.processors.register_processor(
- processor_name: str,
- processor_function: nemo_rl.data.interfaces.TaskDataProcessFnCallable,
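The registry pattern above can be sketched as follows. This is an illustrative stand-in, not the actual nemo_rl code; in particular, whether duplicate names raise an error is an assumption:

```python
from typing import Any, Callable, Dict

# Illustrative stand-in for the module's PROCESSOR_REGISTRY: a mapping from
# processor names to processing callables.
PROCESSOR_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_processor(processor_name: str, processor_function: Callable[..., Any]) -> None:
    """Sketch of a registry helper; duplicate handling is an assumption."""
    if processor_name in PROCESSOR_REGISTRY:
        raise ValueError(f"processor {processor_name!r} is already registered")
    PROCESSOR_REGISTRY[processor_name] = processor_function


def my_processor(datum_dict, task_data_spec, tokenizer, max_seq_length, idx):
    # Trivial processor used only to demonstrate registration.
    return {"idx": idx}


register_processor("my_processor", my_processor)
```

A downstream consumer could then look the processor up by name, e.g. `PROCESSOR_REGISTRY["my_processor"]`, when wiring a dataset config to its processing function.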