aistore.sdk.dataset.dataset_config
Module Contents
Classes
API
Represents the configuration for managing a dataset, with a focus on how its data attributes are structured.
Parameters:
primary_attribute
The primary attribute of the dataset. The filename of each sample it defines determines the primary key used to look up any secondary_attributes.
secondary_attributes
A list of configurations for each attribute or feature in the dataset
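The relationship between the primary attribute and the secondary attributes can be illustrated with a minimal, library-independent sketch. It does not use the aistore SDK. The helper names (`sample_key`, `build_samples`), the example file names, and the choice of the filename stem as the lookup key are all assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def sample_key(filename: str) -> str:
    """Derive a lookup key from a sample filename (assumption: directory and extension are dropped)."""
    return PurePosixPath(filename).stem


def build_samples(
    primary_files: List[str],
    secondary_tables: Dict[str, Dict[str, str]],
) -> List[Dict[str, Optional[str]]]:
    """Join each primary sample with its secondary attributes using the filename-derived key."""
    samples = []
    for filename in primary_files:
        key = sample_key(filename)
        sample: Dict[str, Optional[str]] = {"__key__": key, "primary": filename}
        for name, table in secondary_tables.items():
            # A missing secondary attribute is recorded as None.
            sample[name] = table.get(key)
        samples.append(sample)
    return samples


# Hypothetical data: two primary samples and one secondary attribute ("cls").
primary_files = ["images/cat_001.jpg", "images/dog_002.jpg"]
labels = {"cat_001": "cat", "dog_002": "dog"}

samples = build_samples(primary_files, {"cls": labels})
```

In this sketch every secondary attribute is looked up by the key of the primary sample, so a secondary table needs no ordering of its own.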
staticmethod
Get a key string for an item in webdataset format
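As a rough illustration of what a per-item key string can look like, the sketch below zero-pads an integer index. The function name `format_key` and the default width of six digits are assumptions. The source does not state the exact format the SDK produces.

```python
def format_key(index: int, width: int = 6) -> str:
    """Return a zero-padded decimal key for a sample index (width is an assumed default)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return f"{index:0{width}d}"


key = format_key(42)
```

Fixed-width keys keep samples in numeric order when they are sorted as strings, which is one common reason to pad them.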
Generate a dataset in webdataset format
Parameters:
max_shard_items
The maximum number of items to include in a shard
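The effect of `max_shard_items` can be sketched as simple chunking: items are grouped into consecutive shards, each holding at most that many items. The helper `iter_shards` below is hypothetical and independent of the SDK. It only shows the grouping, not how shards are serialized.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_shards(items: Iterable[T], max_shard_items: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most max_shard_items items each."""
    if max_shard_items < 1:
        raise ValueError("max_shard_items must be at least 1")
    iterator = iter(items)
    while True:
        shard = list(islice(iterator, max_shard_items))
        if not shard:
            return
        yield shard


# Seven items with at most three per shard give shards of size 3, 3, and 1.
shards = list(iter_shards(range(7), max_shard_items=3))
```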
Write the dataset to a bucket in webdataset format and log any missing attributes.
Parameters:
skip_missing
Whether to skip samples that are missing one or more attributes. Defaults to True.
**kwargs
Additional arguments to pass to the webdataset writer
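The `skip_missing` behavior described above might be approximated as follows. This is a sketch under stated assumptions, not the SDK's implementation: the function `filter_samples`, the logger name, the use of `None` to mark a missing attribute, and the choice to keep incomplete samples when `skip_missing` is False are all hypothetical.

```python
import logging
from typing import Any, Dict, Iterable, Iterator

logger = logging.getLogger("dataset_sketch")


def filter_samples(
    samples: Iterable[Dict[str, Any]],
    skip_missing: bool = True,
) -> Iterator[Dict[str, Any]]:
    """Log samples with missing attributes and optionally drop them."""
    for sample in samples:
        # Assumption: a missing attribute is represented by a None value.
        missing = [name for name, value in sample.items() if value is None]
        if missing:
            logger.warning(
                "Sample %s is missing attributes: %s", sample.get("__key__"), missing
            )
            if skip_missing:
                continue
        yield sample


# Hypothetical samples: the second one has no "cls" attribute.
example_samples = [
    {"__key__": "a", "cls": "cat"},
    {"__key__": "b", "cls": None},
]
kept = list(filter_samples(example_samples))
```

With the default setting only complete samples are passed on to the writer, while the log still records which attributes were absent.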