nemo_rl.data.datasets.response_datasets.oasst#

Module Contents#

Classes#

OasstDataset

Simple wrapper around the OASST dataset.

Functions#

parse_conversations

Recusive function that returns all the sub converstaions in a list starting from node tree_obj.

get_data_records

Data#

API#

nemo_rl.data.datasets.response_datasets.oasst.SYSTEM_PROMPT = <Multiline-String>#
nemo_rl.data.datasets.response_datasets.oasst.parse_conversations(tree_obj, first: bool = False)#

Recusive function that returns all the sub converstaions in a list starting from node tree_obj.

Parameters:

tree_obj (obj) – current conversation node

Returns:

a list of sub conversation threads including the current conversation node

nemo_rl.data.datasets.response_datasets.oasst.get_data_records(objs, task_name: str = 'oasst')#
class nemo_rl.data.datasets.response_datasets.oasst.OasstDataset(
split_validation_size: float = 0.05,
seed: int = 42,
**kwargs,
)#

Bases: nemo_rl.data.datasets.raw_dataset.RawDataset

Simple wrapper around the OASST dataset.

Parameters:
  • split_validation_size – Size of the validation data, default is 0.05

  • seed – Seed for train/validation split when split_validation_size > 0, default is 42

Initialization