aistore.sdk.bucket
Module Contents
Classes
API
Bases: AISSource
A class representing a bucket that contains user data.
Parameters:
Client for interfacing with AIS cluster
Name of bucket
Provider of bucket (one of “ais”, “aws”, “gcp”, …), defaults to “ais”
Namespace of bucket, defaults to None
The client used by this bucket.
The name of this bucket.
The namespace for this bucket.
The provider for this bucket.
Default query parameters to use with API calls from this bucket.
Verify the bucket provider is AIS
Return a data-model of the bucket
Returns: BucketModel
BucketModel representation
Returns job ID that can be used later to check the status of the asynchronous operation.
Parameters:
Destination bucket
Only copy objects with names starting with this prefix
Value to prepend to the name of copied objects
Dict mapping each extension to the extension that will replace it (e.g. {“jpg”: “txt”})
Determines if the copy should actually happen or not
Override existing destination bucket
GET the latest object version from the associated remote bucket
Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source
Number of concurrent workers for the copy job per target
- 0 (default): number of mountpaths
- -1: single thread, serial execution
Returns: str
Job ID (as str) that can be used to check the status of the operation
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
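The prefix filter, prepend value, and extension mapping described above together determine which objects are copied and what they are called at the destination. The following is a minimal plain-Python sketch of one plausible reading of those rules. The function `copied_name` and its argument names are hypothetical, and the order of steps (filter, then extension swap, then prepend) is an assumption; this is not the cluster's implementation.

```python
from typing import Dict, Optional


def copied_name(
    name: str,
    prefix_filter: str = "",
    prepend: str = "",
    ext: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the assumed destination name, or None if the object is filtered out."""
    # Only objects whose names start with the prefix filter are copied.
    if not name.startswith(prefix_filter):
        return None
    # Swap the extension if it appears in the mapping (e.g. {"jpg": "txt"}).
    if ext and "." in name:
        stem, old_ext = name.rsplit(".", 1)
        if old_ext in ext:
            name = f"{stem}.{ext[old_ext]}"
    # The prepend value is placed in front of the object name as-is.
    return prepend + name


for n in ["img/cat.jpg", "img/dog.png", "doc/readme.md"]:
    print(n, "->", copied_name(n, prefix_filter="img/", prepend="backup/", ext={"jpg": "txt"}))
```

A sketch like this can help predict the outcome of a copy before running it; the dry-run option described above serves the same purpose against a real cluster.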
Creates a bucket in the AIStore cluster. Can only create a bucket for the AIS provider on a localized cluster. Remote cloud buckets do not support creation.
Parameters:
Ignore error if the cluster already contains this bucket
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
aistore.sdk.errors.InvalidBckProvider: Invalid bucket provider for requested operation
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
Create a native bucket inventory (NBI) — a pre-computed snapshot of the bucket’s object listing stored as chunked inventory files.
Parameters:
Inventory name (must be unique per bucket). Auto-generated if empty.
Only inventory objects matching this prefix.
Comma-separated object properties to include. Default: “name,size,cached” (see Go createNBIHandler).
Number of object names per inventory chunk. 0 means use the server default. See Go api/apc/nbi.go: MinInvNamesPerChunk (2), DfltInvNamesPerChunk (20K), MaxInvNamesPerChunk (640K).
If True, remove any existing inventories for this bucket before creating a new one.
Returns: str
Job ID (xaction ID) for monitoring the inventory creation.
Destroys bucket in AIStore cluster. In all cases removes both the bucket’s content and the bucket’s metadata from the cluster. Note: AIS will not call the remote backend provider to delete the corresponding Cloud bucket (iff the bucket in question is, in fact, a Cloud bucket).
Parameters:
Ignore error if bucket does not exist
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
aistore.sdk.errors.InvalidBckProvider: Invalid bucket provider for requested operation
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
Destroy a native bucket inventory. If no name is specified, destroys all inventories for this bucket.
Parameters:
Inventory name to destroy. If empty, all inventories for this bucket are destroyed.
Evicts bucket in AIStore cluster. NOTE: only Cloud buckets can be evicted.
Parameters:
If true, evicts objects but keeps the bucket’s metadata (i.e., the bucket’s name and its properties)
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
aistore.sdk.errors.InvalidBckProvider: Invalid bucket provider for requested operation
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
Get the path representation of this bucket
Requests bucket properties.
Returns: CaseInsensitiveDict
Response header with the bucket properties
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
Returns bucket summary and information/properties.
Parameters:
Describes the presence of buckets and objects with respect to their existence or non-existence in the AIS cluster, using the enum FLTPresence. Defaults to FLT_EXISTS. Possible values:
- FLT_EXISTS: object or bucket exists inside and/or outside the cluster
- FLT_EXISTS_NO_PROPS: same as FLT_EXISTS, but no summary is returned
- FLT_PRESENT: bucket is present, or object is present and properly located
- FLT_PRESENT_NO_PROPS: same as FLT_PRESENT, but no summary is returned
- FLT_PRESENT_CLUSTER: objects present anywhere in the cluster in any form (replica, EC slices, misplaced)
- FLT_EXISTS_OUTSIDE: not present; exists outside the cluster
If True, returned bucket info will include remote objects as well
Only include objects with the given prefix in the bucket
Raises:
UnexpectedHTTPStatusCode: If the response status code is not as expected
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
ValueError: flt_presence is not one of the expected values
aistore.sdk.errors.AISError: All other types of errors with AIStore
Returns a list of all objects in bucket
Parameters:
Return only objects that start with the prefix
Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend. E.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: If “page_size” is greater than a backend maximum, the backend maximum number of objects is returned. Defaults to “0”: return the maximum number of objects.
Optional list of ListObjectFlag enums to include as flags in the request
Only list objects on this specific target node
Name of a native bucket inventory (NBI) to list from.
See list_objects for details.
Returns: List[BucketEntry]
List[BucketEntry]: list of objects in bucket
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
Implementation of the abstract method from AISSource that provides an iterator of all the objects in this bucket matching the specified prefix.
Parameters:
Limit objects selected by a given string prefix
Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “target_url”, “copies”.
List files contained in an archived object (*.tar, *.zip, *.tgz, etc.).
This is a convenience wrapper around list_all_objects that automatically enables the ARCH_DIR list-flag so the cluster opens the shard and returns its directory.
Parameters:
Object key of the shard inside this bucket (e.g. "my-archive.tar"). Can include a prefix path.
If True, the returned list includes the parent archive object itself. When False (default), only the entries inside the shard are returned.
Comma-separated list of object properties to request. Defaults to "" (no properties).
Same meaning as in list_all_objects: how many names per internal page.
Returns: List[BucketEntry]
List[BucketEntry]: Entries representing the shard (optionally) and every file stored inside it.
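To make the shape of the result concrete, here is a local analogue built with Python's standard `tarfile` module rather than a cluster call. It lists the files inside an in-memory tar shard and optionally includes the shard itself. The helpers `list_shard` and `make_tar` are hypothetical, and the `"<shard>/<file>"` naming of entries is an assumption about how archived files are reported, not something the text above states.

```python
import io
import tarfile
from typing import Dict, List


def list_shard(name: str, data: bytes, include_archive: bool = False) -> List[str]:
    """List entry names of a tar shard, optionally including the shard itself."""
    entries = [name] if include_archive else []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        # Assumed naming convention: "<shard>/<file inside the shard>".
        entries += [f"{name}/{m.name}" for m in tar.getmembers() if m.isfile()]
    return entries


def make_tar(files: Dict[str, bytes]) -> bytes:
    """Build a small in-memory tar archive for the demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for fname, content in files.items():
            info = tarfile.TarInfo(fname)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


shard = make_tar({"a.txt": b"alpha", "b.txt": b"beta"})
print(list_shard("my-archive.tar", shard))
print(list_shard("my-archive.tar", shard, include_archive=True))
```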
Returns a structure that contains a page of objects, job ID, and continuation token (to read the next page, if available).
Parameters:
Return only objects that start with the prefix
Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
Return at most “page_size” objects. The maximum number of objects in response depends on the bucket backend. E.g, AWS bucket cannot return more than 5,000 objects in a single page. NOTE: If “page_size” is greater than a backend maximum, the backend maximum objects are returned. Defaults to “0” - return maximum number of objects.
Job ID, required to get the next page of objects
Marks the object to start reading the next page
Optional list of ListObjectFlag enums to include as flags in the request.
Only list objects on this specific target node.
Name of a native bucket inventory (NBI) to list from. Lists objects from the named inventory snapshot instead of querying the remote backend. Requires a previously created inventory (see create_inventory). Alternatively, to list without specifying a name, pass flags=[ListObjectFlag.NBI] (valid only when exactly one inventory exists).
Returns: BucketList
the page of objects in the bucket and the continuation token to get the next page
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
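The page structure described above (entries, job ID, continuation token) implies a simple loop for reading a whole bucket: pass the job ID and token from each page into the next request until the token comes back empty. The sketch below shows that loop against a hypothetical in-memory stand-in, so it runs without a cluster. `FakeBucket`, `FakePage`, and `list_everything` are illustrative names, and the keyword names `uuid` and `continuation_token` are assumptions, since this page does not show the method signature.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakePage:
    """Stand-in for the page structure: entries, job ID, continuation token."""
    entries: List[str] = field(default_factory=list)
    uuid: str = ""
    continuation_token: str = ""


class FakeBucket:
    """Hypothetical in-memory bucket, used only to make the sketch runnable."""

    def __init__(self, names: List[str]):
        self._names = sorted(names)

    def list_objects(self, prefix="", page_size=0, uuid="", continuation_token=""):
        matching = [n for n in self._names if n.startswith(prefix)]
        # Resume after the object named by the continuation token.
        remaining = [n for n in matching if n > continuation_token]
        size = page_size or len(remaining)  # 0 means "as many as possible".
        page = remaining[:size]
        token = page[-1] if len(remaining) > size else ""
        return FakePage(entries=page, uuid=uuid or "job-1", continuation_token=token)


def list_everything(bucket, prefix: str = "", page_size: int = 0) -> List[str]:
    """Collect all pages by feeding back the job ID and continuation token."""
    entries, uuid, token = [], "", ""
    while True:
        page = bucket.list_objects(
            prefix=prefix, page_size=page_size, uuid=uuid, continuation_token=token
        )
        entries.extend(page.entries)
        if not page.continuation_token:  # An empty token means no more pages.
            return entries
        uuid, token = page.uuid, page.continuation_token


bucket = FakeBucket([f"obj-{i}" for i in range(5)] + ["other"])
print(list_everything(bucket, prefix="obj-", page_size=2))
```

Because `list_everything` only relies on duck typing, the same loop could in principle be pointed at a real bucket object, provided its method accepts these keyword names.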
Returns an iterator for all objects in bucket
Parameters:
Return only objects that start with the prefix
Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend. E.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: If “page_size” is greater than a backend maximum, the backend maximum number of objects is returned. Defaults to “0”: return the maximum number of objects.
Optional list of ListObjectFlag enums to include as flags in the request
Only list objects on this specific target node
Name of a native bucket inventory (NBI) to list from.
See list_objects for details.
Returns: ObjectIterator
object iterator
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
Generates full URLs for all objects in the bucket that match the specified prefix.
Parameters:
A string prefix to filter objects. Only objects with names starting with this prefix will be included. Defaults to an empty string (no filtering).
An optional ETL configuration. If provided, the URLs will include ETL processing parameters. Defaults to None.
Use the bucket’s client to make a request to the bucket endpoint on the AIS server
Parameters:
HTTP method to use, e.g. POST/GET/DELETE
Action string used to create an ActionMsg to pass to the server
Additional value parameter to pass in the ActionMsg
Optional parameters to pass in the request
Name parameter to pass in the ActionMsg
Returns: requests.Response
Response from the server
Factory constructor for an object in this bucket. Does not make any HTTP request, only instantiates an object in a bucket owned by the client.
Parameters:
Name of object
Properties of the object, as updated by head(), optionally pre-initialized.
Returns: Object
The object created.
Factory constructor for multiple objects belonging to this bucket.
Parameters:
Names of objects to include in the group
Range of objects to include in the group
String template defining objects to include in the group
Returns: ObjectGroup
The ObjectGroup created
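A string template is a compact way to name many objects at once. The sketch below expands one numeric brace range into explicit names in plain Python. It assumes a syntax of the form `shard-{000..004..2}.tar` (start, end, optional step, zero padding taken from the start value); the syntax the SDK actually accepts may differ, and `expand_template` is a hypothetical helper, not part of the library.

```python
import re
from typing import List

# Matches "{start..end}" or "{start..end..step}" with decimal numbers.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)(?:\.\.(\d+))?\}")


def expand_template(template: str) -> List[str]:
    """Expand the first "{start..end[..step]}" group into object names."""
    match = _RANGE.search(template)
    if match is None:
        return [template]  # No range group: the template names one object.
    start, end, step = match.group(1), match.group(2), match.group(3)
    width = len(start)  # Keep zero padding, e.g. "{001..003}".
    head, tail = template[: match.start()], template[match.end():]
    return [
        f"{head}{i:0{width}d}{tail}"
        for i in range(int(start), int(end) + 1, int(step or 1))
    ]


print(expand_template("shard-{000..004..2}.tar"))
```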
Puts files found in a given filepath as objects to a bucket in AIS storage.
Parameters:
Local filepath, can be relative or absolute
Only put files with names starting with this prefix
Shell-style wildcard pattern to filter files
Whether to use the file names only as object names and omit the path information
Optional string to use as a prefix in the object name for all objects uploaded. No delimiter (”/”, ”-”, etc.) is automatically applied between the prepend value and the object name.
Whether to recurse through the provided path directories
Option to only show expected behavior without an actual put operation
Whether to print upload info to standard output
Returns: List[str]
List of object names put to a bucket in AIS
Raises:
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.ReadTimeout: Timed out waiting for response from AIStore
ValueError: The path provided is not a valid directory
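The filtering and naming options above interact, so it can help to preview the resulting object names locally. The following sketch walks a directory and applies a prefix filter, a shell-style pattern, the basename option, and the prepend value, without uploading anything. `plan_put` and its argument names are hypothetical, and details such as matching the prefix and pattern against the file name only are assumptions rather than the SDK's documented behavior.

```python
import fnmatch
import os
import tempfile
from typing import List


def plan_put(path: str, prefix_filter: str = "", pattern: str = "*",
             basename: bool = False, prepend: str = "",
             recursive: bool = False) -> List[str]:
    """Return the object names a directory upload might produce, without uploading."""
    if not os.path.isdir(path):
        raise ValueError(f"{path} is not a valid directory")
    names = []
    for root, dirs, files in os.walk(path):
        if not recursive:
            dirs.clear()  # Stop os.walk from descending into subdirectories.
        for fname in files:
            if not (fname.startswith(prefix_filter) and fnmatch.fnmatch(fname, pattern)):
                continue
            rel = os.path.relpath(os.path.join(root, fname), path)
            obj = fname if basename else rel.replace(os.sep, "/")
            names.append(prepend + obj)  # No delimiter is inserted automatically.
    return sorted(names)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("train-0.txt", "train-1.log", os.path.join("sub", "train-2.txt")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("data")
    print(plan_put(tmp, prefix_filter="train", pattern="*.txt", prepend="v1/", recursive=True))
```

Against a real cluster, the dry-run option described above gives an authoritative preview; this sketch only illustrates how the options could combine.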
Renames bucket in AIStore cluster. Only works on AIS buckets. Returns job ID that can be used later to check the status of the asynchronous operation.
Parameters:
New bucket name for bucket to be renamed as
Returns: str
Job ID (as str) that can be used to check the status of the operation
Raises:
aistore.sdk.errors.AISError: All other types of errors with AIStore
aistore.sdk.errors.InvalidBckProvider: Invalid bucket provider for requested operation
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
Show native bucket inventory metadata.
Parameters:
Inventory name to query. If empty, returns all inventories for this bucket.
Returns: Dict[str, NBIInfo]
Dict[str, NBIInfo]: Mapping of inventory object name to its metadata.
Returns bucket summary (starts xaction job and polls for results).
Parameters:
Identifier for the bucket summary. Defaults to an empty string.
Prefix for objects to be included in the bucket summary. Defaults to an empty string (all objects).
If True, the summary includes cached entities. Defaults to True.
If True, the summary includes present entities. Defaults to True.
Raises:
UnexpectedHTTPStatusCode: If the response status code is not as expected
requests.ConnectionError: Connection error
requests.ConnectionTimeout: Timed out connecting to AIStore
requests.exceptions.HTTPError: Service unavailable
requests.RequestException: “There was an ambiguous exception that occurred while handling…”
requests.ReadTimeout: Timed out receiving response from AIStore
aistore.sdk.errors.AISError: All other types of errors with AIStore
Visits all selected objects in the source bucket and for each object, puts the transformed result to the destination bucket
Parameters:
Name of the ETL to be used for transformations
Destination bucket for transformations
Timeout of the ETL job (e.g. 5m for 5 minutes)
Only transform objects with names starting with this prefix
Value to prepend to the name of resulting transformed objects
Dict mapping each extension to the extension that will replace it (e.g. {“jpg”: “txt”})
Determines if the transformation should actually happen or not
Override existing destination bucket
GET the latest object version from the associated remote bucket
Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source
Number of concurrent workers for the transformation job per target
- 0 (default): number of mountpaths
- -1: single thread, serial execution
If True, continue processing objects even if some of them fail
List of ETL names to be used for the transformation pipeline
Returns: str
Job ID (as str) that can be used to check the status of the operation
Verify the bucket provider is a cloud provider
Write a dataset to a bucket in AIS in webdataset format using wds.ShardWriter. Logs any missing attributes.
Parameters:
Configuration dict specifying how to process and store each part of the dataset item
Skip samples that are missing one or more attributes, defaults to True
Optional keyword arguments to pass to the ShardWriter