nemo_rl.data.datasets.utils
Module Contents
Functions
assert_no_double_bos | Assert that there are no double starting BOS tokens in the message.
pil_to_base64 | Converts a PIL Image object to a base64 encoded string.
load_dataset_from_path | Load a dataset from a JSON file or a Hugging Face dataset.
get_extra_kwargs | Get extra kwargs from the data config.
Data
TokenizerType
API
- nemo_rl.data.datasets.utils.TokenizerType
None
- nemo_rl.data.datasets.utils.assert_no_double_bos(token_ids: torch.Tensor, tokenizer: nemo_rl.data.datasets.utils.TokenizerType)
Assert that there are no double starting BOS tokens in the message.
- Parameters:
token_ids – Tensor of token IDs for the message.
tokenizer – Tokenizer that defines the BOS token.
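The check can be pictured with a minimal sketch in plain Python. The helper name `has_double_bos` and the use of a plain sequence are illustrative assumptions, not the module's code: the documented function takes a `torch.Tensor` and a tokenizer, and asserts rather than returning a flag.

```python
from typing import Optional, Sequence


def has_double_bos(token_ids: Sequence[int], bos_token_id: Optional[int]) -> bool:
    """Return True if the sequence starts with two consecutive BOS tokens.

    Hypothetical helper: works on a plain sequence instead of a tensor.
    """
    if bos_token_id is None:
        # A tokenizer without a BOS token cannot produce a double BOS.
        return False
    return (
        len(token_ids) >= 2
        and token_ids[0] == bos_token_id
        and token_ids[1] == bos_token_id
    )


# Example with a made-up BOS ID of 1.
single = [1, 15, 27]
double = [1, 1, 15, 27]
assert not has_double_bos(single, bos_token_id=1)
assert has_double_bos(double, bos_token_id=1)
```

A doubled BOS typically appears when a chat template has already inserted the token and the tokenizer adds it again, which is why only the first two positions need to be inspected.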
- nemo_rl.data.datasets.utils.pil_to_base64(image: PIL.Image.Image, format: str = 'PNG') → str
Converts a PIL Image object to a base64 encoded string.
- Parameters:
image – The PIL Image object to convert.
format – The image format (e.g., “PNG”, “JPEG”). Defaults to “PNG”.
- Returns:
A base64 encoded string representation of the image.
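A conversion of this kind can be sketched with the standard library alone. The names `image_to_base64` and `base64_to_bytes` are hypothetical, and the sketch assumes only that the image object exposes a PIL-style `save(fp, format=...)` method; it is not the module's implementation.

```python
import base64
import io


def image_to_base64(image, format: str = "PNG") -> str:
    """Serialize an image-like object to a base64 string.

    Hypothetical sketch: `image` is any object with a PIL-style
    `save(fp, format=...)` method, such as `PIL.Image.Image`.
    """
    buffer = io.BytesIO()
    image.save(buffer, format=format)  # write the encoded image bytes in memory
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def base64_to_bytes(encoded: str) -> bytes:
    """Recover the encoded image bytes from the base64 string."""
    return base64.b64decode(encoded)
```

With Pillow installed, passing an image such as `Image.new("RGB", (2, 2))` should produce a string that `base64_to_bytes` turns back into the encoded file bytes. Writing to an in-memory buffer avoids touching the filesystem.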
- nemo_rl.data.datasets.utils.load_dataset_from_path(data_path: str, data_split: Optional[str] = 'train')
Load a dataset from a JSON file or a Hugging Face dataset.
- Parameters:
data_path – The path to the dataset.
data_split – The split to load from the dataset.
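One plausible way to tell the two cases apart is by file extension. The sketch below is an assumption about how such a dispatch could work, not the module's code: `resolve_load_args` is a hypothetical helper that only builds the arguments for `datasets.load_dataset`, so it runs without the `datasets` package.

```python
import os
from typing import Any, Optional


def resolve_load_args(
    data_path: str, data_split: Optional[str] = "train"
) -> tuple[str, dict[str, Any]]:
    """Return the path argument and keyword arguments for `datasets.load_dataset`.

    Hypothetical helper: the extension-based dispatch is an assumption.
    """
    suffix = os.path.splitext(data_path)[-1].lower()
    if suffix in (".json", ".jsonl"):
        # Local JSON or JSON Lines file: use the generic "json" builder.
        return "json", {"data_files": data_path, "split": data_split}
    # Otherwise treat the path as a Hugging Face dataset name or directory.
    return data_path, {"split": data_split}


path, kwargs = resolve_load_args("data/train.jsonl")
# With the `datasets` package installed, the call would then be:
#   from datasets import load_dataset
#   dataset = load_dataset(path, **kwargs)
```

Keeping the argument resolution separate from the actual load makes the branching easy to test without network access.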
- nemo_rl.data.datasets.utils.get_extra_kwargs(data_config: dict, keys: list[str]) → dict
Get extra kwargs from the data config.
Keys that are not present in the data config are ignored.
- Parameters:
data_config – The data config.
keys – The keys to get from the data config.
- Returns:
A dict of the requested keys that are present in the data config, with their values.
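The documented behavior amounts to filtering a dict by a list of keys. A minimal sketch, using the hypothetical name `pick_extra_kwargs` and a made-up config:

```python
from typing import Any


def pick_extra_kwargs(data_config: dict[str, Any], keys: list[str]) -> dict[str, Any]:
    """Copy the requested keys that exist in the config; skip the rest."""
    return {key: data_config[key] for key in keys if key in data_config}


# Made-up config for illustration.
config = {"dataset_name": "example", "prompt_file": "prompt.txt", "seed": 42}
extra = pick_extra_kwargs(config, ["prompt_file", "seed", "system_prompt_file"])
# "system_prompt_file" is absent from the config, so it is silently dropped.
```

Because missing keys are skipped rather than set to `None`, the result can be unpacked into a call with `**extra` and the callee's own defaults still apply.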