Dataset

class nemo_microservices.types.Dataset(*args: Any, **kwargs: Any)

Bases: BaseModel

files_url: str

The location where the artifact files are stored.

This can be a URL pointing to NDS, Hugging Face, S3, or any other accessible resource location.
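As a rough illustration of the kinds of locations `files_url` may hold, the sketch below inspects the URL scheme with the standard library. The helper name `files_url_scheme` and all three example URLs (including the `hf://` form) are assumptions made for illustration; they are not taken from this reference.

```python
from urllib.parse import urlparse


def files_url_scheme(files_url: str) -> str:
    """Return the lowercased URL scheme, or raise if none is present."""
    scheme = urlparse(files_url).scheme.lower()
    if not scheme:
        raise ValueError(f"files_url has no scheme: {files_url!r}")
    return scheme


# Hypothetical locations; the bucket, host, and dataset names are placeholders.
examples = [
    "hf://datasets/default/my-dataset",
    "s3://my-bucket/path/to/data",
    "https://example.com/data/train.jsonl",
]
schemes = [files_url_scheme(u) for u in examples]
print(schemes)
```

A check like this only looks at the syntax of the string; it does not confirm that the location is reachable.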

id: str | None = None

The ID of the entity.

With the exception of namespaces, this is always a semantically-prefixed base58-encoded uuid4 [<prefix>-base58(uuid4())].
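The ID pattern described above can be sketched in pure Python. This is only an illustration of the `<prefix>-base58(uuid4())` shape: the alphabet (the common Bitcoin-style base58 alphabet), the helper names, and the `dataset` prefix are assumptions, and the service may generate IDs differently.

```python
import uuid

# Assumption: Bitcoin-style base58 alphabet (omits 0, O, I, and l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58, keeping leading zero bytes as '1'."""
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * pad + out


def make_entity_id(prefix: str) -> str:
    """Build an ID of the form <prefix>-base58(uuid4())."""
    return f"{prefix}-{base58_encode(uuid.uuid4().bytes)}"


# "dataset" is a hypothetical prefix chosen for this example.
example_id = make_entity_id("dataset")
print(example_id)
```

In practice the ID is assigned automatically, so client code normally reads it rather than generating it.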

created_at: datetime | None = None

Timestamp for when the entity was created.

custom_fields: Dict[str, str] | None = None

A set of custom fields that the user can define and use for various purposes.

format: str | None = None

Specifies the dataset format, referring to the schema of the dataset rather than the file format. Examples include SQuAD, BEIR, etc.

hf_endpoint: str | None = None

For Hugging Face URLs, the endpoint that should be used.

By default, this is set to the Data Store URL. For the Hugging Face Hub, set this to “https://huggingface.co”.

limit: int | None = None

The maximum number of items to be used from the dataset.

name: str | None = None

The name of the entity.

Must be unique inside the namespace. If not specified, it will be the same as the automatically generated id.

namespace: str | None = None

The namespace of the entity.

This can be missing for namespace entities or in deployments that don’t use namespaces.

project: str | None = None

The URN of the project associated with this entity.

split: str | None = None

The split of the dataset. Examples include train, validation, test, etc.

updated_at: datetime | None = None

Timestamp for when the entity was last updated.
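To show how the fields fit together, here is a minimal stand-in written with a standard-library dataclass. `DatasetSketch` and `set_fields` are hypothetical names that only mirror the documented field names and defaults; this is not the real `nemo_microservices.types.Dataset` (a `BaseModel` subclass), and the example values are placeholders.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class DatasetSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    files_url: str  # the only field without a default
    id: Optional[str] = None
    created_at: Optional[datetime] = None
    custom_fields: Optional[Dict[str, str]] = None
    format: Optional[str] = None
    hf_endpoint: Optional[str] = None
    limit: Optional[int] = None
    name: Optional[str] = None
    namespace: Optional[str] = None
    project: Optional[str] = None
    split: Optional[str] = None
    updated_at: Optional[datetime] = None


def set_fields(ds: DatasetSketch) -> Dict[str, Any]:
    """Return only the fields that were explicitly given a value."""
    return {k: v for k, v in asdict(ds).items() if v is not None}


# Placeholder values: the hf:// URL form is an assumption.
example = DatasetSketch(
    files_url="hf://datasets/default/my-dataset",
    hf_endpoint="https://huggingface.co",
    split="train",
    limit=100,
)
payload = set_fields(example)
print(payload)
```

Every field except `files_url` defaults to `None`, so a reference to a dataset can be as small as a single URL, with `split` and `limit` narrowing what is read from it.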