Dataset
- class nemo_microservices.types.Dataset(*args: Any, **kwargs: Any)
Bases: BaseModel
- files_url: str
The location where the artifact files are stored.
This can be a URL pointing to NDS, Hugging Face, S3, or any other accessible resource location.
- id: str | None = None
The ID of the entity.
Except for namespaces, this is always a semantically prefixed, base58-encoded uuid4: [<prefix>-base58(uuid4())].
- created_at: datetime | None = None
Timestamp for when the entity was created.
- custom_fields: Dict[str, str] | None = None
A set of user-defined string key-value pairs that can be used for any purpose.
- description: str | None = None
The description of the entity.
- format: str | None = None
The dataset format. This refers to the schema of the dataset, not its file format. Examples include SQuAD and BEIR.
- hf_endpoint: str | None = None
For Hugging Face URLs, the endpoint to use.
By default, this is the Data Store URL. For the Hugging Face Hub, set it to “https://huggingface.co”.
- limit: int | None = None
The maximum number of items to be used from the dataset.
- name: str | None = None
The name of the entity.
Must be unique within the namespace. If not specified, it defaults to the automatically generated id.
- namespace: str | None = None
The namespace of the entity.
This can be missing for namespace entities or in deployments that don’t use namespaces.
- project: str | None = None
The URN of the project associated with this entity.
- split: str | None = None
The split of the dataset. Examples include train, validation, test, etc.
- updated_at: datetime | None = None
Timestamp for when the entity was last updated.
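The id field is described as [<prefix>-base58(uuid4())]. The sketch below shows what that format looks like in practice. It is not the service's own generator: the base58 alphabet (the common Bitcoin alphabet) and the "dataset" prefix are assumptions, and split_entity_id is a hypothetical helper included only to show that the encoding is reversible.

```python
import uuid

# Assumed alphabet: the widely used Bitcoin base58 alphabet.
# The reference above does not say which alphabet the service uses.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58, keeping leading zero bytes as '1' characters."""
    num = int.from_bytes(data, "big")
    chars = []
    while num > 0:
        num, rem = divmod(num, 58)
        chars.append(BASE58_ALPHABET[rem])
    # Each leading zero byte maps to the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return BASE58_ALPHABET[0] * leading_zeros + "".join(reversed(chars))


def make_entity_id(prefix: str) -> str:
    """Build an ID of the form <prefix>-base58(uuid4())."""
    return f"{prefix}-{base58_encode(uuid.uuid4().bytes)}"


def split_entity_id(entity_id: str) -> tuple[str, uuid.UUID]:
    """Hypothetical inverse: recover the prefix and the UUID from an ID."""
    # base58 never contains '-', so the last '-' separates prefix and payload.
    prefix, _, encoded = entity_id.rpartition("-")
    num = 0
    for ch in encoded:
        num = num * 58 + BASE58_ALPHABET.index(ch)
    # A UUID is always 16 bytes, which restores any leading zero bytes.
    return prefix, uuid.UUID(bytes=num.to_bytes(16, "big"))


# "dataset" is an illustrative prefix, not one confirmed by the reference.
example_id = make_entity_id("dataset")
```

Decoding back to 16 bytes makes the round trip exact. Per the field description, namespace entities are the exception and do not follow this format.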
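Of the fields above, only files_url lacks a default. The sketch below mirrors the documented fields in a plain dataclass to show which values are typically set for a Hugging Face Hub dataset versus one in the Data Store. DatasetSketch and to_payload are hypothetical stand-ins, not the SDK's Dataset class (which is a BaseModel). The hf:// URLs and the names are illustrative assumptions.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class DatasetSketch:
    """Hypothetical stdlib stand-in mirroring the documented Dataset fields."""

    files_url: str  # the only field without a default
    id: Optional[str] = None
    created_at: Optional[datetime] = None
    custom_fields: Optional[Dict[str, str]] = None
    description: Optional[str] = None
    format: Optional[str] = None  # dataset schema (e.g. SQuAD), not file format
    hf_endpoint: Optional[str] = None  # None means the Data Store URL default
    limit: Optional[int] = None  # maximum number of items to use
    name: Optional[str] = None
    namespace: Optional[str] = None
    project: Optional[str] = None
    split: Optional[str] = None
    updated_at: Optional[datetime] = None

    def to_payload(self) -> dict:
        """Return only the fields that were set, skipping None values."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Dataset hosted on the Hugging Face Hub: hf_endpoint must point at the Hub.
hub_ds = DatasetSketch(
    files_url="hf://datasets/example-org/example-dataset",
    hf_endpoint="https://huggingface.co",
    format="SQuAD",
    split="validation",
    limit=100,
)

# Dataset in the Data Store: hf_endpoint is left unset to use the default.
store_ds = DatasetSketch(
    files_url="hf://datasets/default/my-dataset",
    name="my-dataset",
    namespace="default",
)
```

Omitting unset fields keeps each payload limited to the choices the caller actually made. Server-populated fields such as id, created_at, and updated_at stay out of it.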