aistore.sdk.bucket

Module Contents

Classes

Name | Description
Bucket | A class representing a bucket that contains user data.

API

class aistore.sdk.bucket.Bucket(
name: str,
client: aistore.sdk.request_client.RequestClient,
provider: typing.Union[aistore.sdk.provider.Provider, str] = Provider.AIS,
namespace: typing.Optional[aistore.sdk.types.Namespace] = None
)

Bases: AISSource

A class representing a bucket that contains user data.

Parameters:

client
RequestClient

Client for interfacing with AIS cluster

name
str

name of bucket

provider
str or Provider, defaults to Provider.AIS

Provider of bucket (one of “ais”, “aws”, “gcp”, …), defaults to “ais”

namespace
Namespace, defaults to None

Namespace of bucket, defaults to None

_provider
= Provider.parse(provider)
_qparam
= {QPARAM_PROVIDER: self.provider.value}
client
RequestClient

The client used by this bucket.

name
str

The name of this bucket.

namespace
Optional[Namespace]

The namespace for this bucket.

provider
Provider

The provider for this bucket.

qparam
Dict

Default query parameters to use with API calls from this bucket.

aistore.sdk.bucket.Bucket._get_uploaded_obj_name(
file,
root_path,
basename,
prepend
)
staticmethod
aistore.sdk.bucket.Bucket._verify_ais_bucket()

Verify the bucket provider is AIS

aistore.sdk.bucket.Bucket.as_model() -> aistore.sdk.types.BucketModel

Return a data-model of the bucket

Returns: BucketModel

BucketModel representation

aistore.sdk.bucket.Bucket.copy(
to_bck: aistore.sdk.bucket.Bucket,
prefix_filter: str = '',
prepend: str = '',
ext: typing.Optional[typing.Dict[str, str]] = None,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
num_workers: typing.Optional[int] = 0
) -> str

Copies objects from this bucket to the destination bucket. Returns a job ID that can be used later to check the status of the asynchronous operation.

Parameters:

to_bck
Bucket

Destination bucket

prefix_filter
str, defaults to ''

Only copy objects with names starting with this prefix

prepend
str, defaults to ''

Value to prepend to the name of copied objects

ext
Dict[str, str], defaults to None

Dict mapping each extension to the extension that will replace it (e.g. {“jpg”: “txt”})

dry_run
bool, defaults to False

If True, report what would be copied without actually performing the copy

force
bool, defaults to False

Override existing destination bucket

latest
bool, defaults to False

GET the latest object version from the associated remote bucket

sync
bool, defaults to False

Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source

num_workers
int, defaults to 0

Number of concurrent workers for the copy job per target

  • 0 (default): number of mountpaths
  • -1: single thread, serial execution

Returns: str

Job ID (as str) that can be used to check the status of the operation

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore
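
As a rough illustration of how the copy parameters fit together, the sketch below wraps `copy` in a small helper. The helper name `copy_with_prefix` is hypothetical, and `src_bck` and `dst_bck` are assumed to be `Bucket` instances obtained elsewhere; only the keyword arguments documented above are used.

```python
def copy_with_prefix(src_bck, dst_bck, prefix, prepend=""):
    """Start an asynchronous copy of the objects whose names start with `prefix`.

    Hypothetical helper: `src_bck` and `dst_bck` are assumed to be Bucket
    instances. Returns the job ID produced by Bucket.copy.
    """
    job_id = src_bck.copy(
        to_bck=dst_bck,
        prefix_filter=prefix,  # only copy objects starting with this prefix
        prepend=prepend,       # value prepended to each copied object's name
        num_workers=0,         # 0 (default): number of mountpaths
    )
    return job_id
```

Because the operation is asynchronous, the returned job ID only identifies the copy job; waiting for it to finish is a separate step that is not shown here.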
aistore.sdk.bucket.Bucket.create(
exist_ok = False
)

Creates a bucket in the AIStore cluster. Only buckets with the AIS provider on the local cluster can be created; remote cloud buckets do not support creation.

Parameters:

exist_ok
bool, defaults to False

Ignore error if the cluster already contains this bucket

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • aistore.sdk.errors.InvalidBckProvider: Invalid bucket provider for requested operation
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore

aistore.sdk.bucket.Bucket.create_inventory(
name: str = '',
prefix: str = '',
props: str = '',
names_per_chunk: int = 0,
force: bool = False
) -> str

Create a native bucket inventory (NBI) — a pre-computed snapshot of the bucket’s object listing stored as chunked inventory files.

Parameters:

name
str, defaults to ''

Inventory name (must be unique per bucket). Auto-generated if empty.

prefix
str, defaults to ''

Only inventory objects matching this prefix.

props
str, defaults to ''

Comma-separated object properties to include. Default: “name,size,cached” (see Go createNBIHandler).

names_per_chunk
int, defaults to 0

Number of object names per inventory chunk. 0 means use the server default. See Go api/apc/nbi.go: MinInvNamesPerChunk (2), DfltInvNamesPerChunk (20K), MaxInvNamesPerChunk (640K).

force
bool, defaults to False

If True, remove any existing inventories for this bucket before creating a new one.

Returns: str

Job ID (xaction ID) for monitoring the inventory creation.
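
The inventory workflow might be combined with listing roughly as follows. This is a sketch only: `build_and_list_inventory` is a hypothetical helper, and `wait_for_job` is a hypothetical callable supplied by the caller that blocks until the given job ID has finished. How that waiting is done is outside the scope of this module.

```python
def build_and_list_inventory(bucket, name, wait_for_job, prefix=""):
    """Create a named inventory, wait for it, then list objects from it.

    Hypothetical helper: `bucket` is assumed to be a Bucket instance and
    `wait_for_job` a caller-supplied callable taking the job ID.
    """
    job_id = bucket.create_inventory(name=name, prefix=prefix)
    wait_for_job(job_id)  # assumption: blocks until the inventory job completes
    # List from the inventory snapshot instead of querying the remote backend.
    return bucket.list_all_objects(prefix=prefix, inventory_name=name)
```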

aistore.sdk.bucket.Bucket.delete(
missing_ok = False
)

Destroys the bucket in the AIStore cluster. In all cases, removes both the bucket’s content and the bucket’s metadata from the cluster. Note: if the bucket is a Cloud bucket, AIS will not call the remote backend provider to delete the corresponding Cloud bucket.

Parameters:

missing_ok
bool, defaults to False

Ignore error if bucket does not exist

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • aistore.sdk.errors.InvalidBckProvider: Invalid bucket provider for requested operation
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore
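
A minimal sketch of how `delete` and `create` might be paired to reset an AIS bucket to an empty state. The helper name `recreate_bucket` is hypothetical, and `bucket` is assumed to be a `Bucket` with the AIS provider, since creation is not supported for remote cloud buckets.

```python
def recreate_bucket(bucket):
    """Drop the bucket if it exists, then create it again.

    Hypothetical helper: `bucket` is assumed to be an AIS-provider Bucket.
    """
    # missing_ok=True ignores the error raised when the bucket does not exist.
    bucket.delete(missing_ok=True)
    # exist_ok=False surfaces an error if the bucket unexpectedly still exists.
    bucket.create(exist_ok=False)
    return bucket
```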
aistore.sdk.bucket.Bucket.destroy_inventory(
name: str = ''
) -> None

Destroy a native bucket inventory. If no name is specified, destroys all inventories for this bucket.

Parameters:

name
str, defaults to ''

Inventory name to destroy. If empty, all inventories for this bucket are destroyed.

aistore.sdk.bucket.Bucket.evict(
keep_md: bool = False
)

Evicts bucket in AIStore cluster. NOTE: only Cloud buckets can be evicted.

Parameters:

keep_md
bool, defaults to False

If true, evicts objects but keeps the bucket’s metadata (i.e., the bucket’s name and its properties)

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • aistore.sdk.errors.InvalidBckProvider: Invalid bucket provider for requested operation
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore

aistore.sdk.bucket.Bucket.get_path() -> str

Get the path representation of this bucket

aistore.sdk.bucket.Bucket.head() -> requests.structures.CaseInsensitiveDict

Requests bucket properties.

Returns: CaseInsensitiveDict

Response header with the bucket properties

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore

aistore.sdk.bucket.Bucket.info(
flt_presence: int = FLTPresence.FLT_EXISTS,
bsumm_remote: bool = True,
prefix: str = ''
)

Returns bucket summary and information/properties.

Parameters:

flt_presence
FLTPresence, defaults to FLTPresence.FLT_EXISTS

Describes the presence of buckets and objects with respect to their existence or non-existence in the AIS cluster, using the enum FLTPresence. Defaults to FLT_EXISTS. Possible values:

  • FLT_EXISTS: object or bucket exists inside and/or outside cluster
  • FLT_EXISTS_NO_PROPS: same as FLT_EXISTS but no need to return summary
  • FLT_PRESENT: bucket is present or object is present and properly located
  • FLT_PRESENT_NO_PROPS: same as FLT_PRESENT but no need to return summary
  • FLT_PRESENT_CLUSTER: objects present anywhere and in any form in the cluster (replicas, EC slices, misplaced)
  • FLT_EXISTS_OUTSIDE: not present; exists outside cluster

bsumm_remote
bool, defaults to True

If True, returned bucket info will include remote objects as well

prefix
str, defaults to ''

Only include objects with the given prefix in the bucket

Raises:

  • UnexpectedHTTPStatusCode: If the response status code is not as expected
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore
  • ValueError: flt_presence is not one of the expected values
  • aistore.sdk.errors.AISError: All other types of errors with AIStore

aistore.sdk.bucket.Bucket.list_all_objects(
prefix: str = '',
props: str = '',
page_size: int = 0,
flags: typing.Optional[typing.List[aistore.sdk.list_object_flag.ListObjectFlag]] = None,
target: str = '',
inventory_name: str = ''
) -> typing.List[aistore.sdk.types.BucketEntry]

Returns a list of all objects in bucket

Parameters:

prefix
str, defaults to ''

Return only objects that start with the prefix

props
str, defaults to ''

Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.

page_size
int, defaults to 0

Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend. E.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: If “page_size” is greater than the backend maximum, the backend maximum number of objects is returned. Defaults to “0” - return the maximum number of objects.

flags
List[ListObjectFlag], defaults to None

Optional list of ListObjectFlag enums to include as flags in the request

target
str, defaults to ''

Only list objects on this specific target node

inventory_name
str, defaults to ''

Name of a native bucket inventory (NBI) to list from. See list_objects for details.

Returns: List[BucketEntry]

List[BucketEntry]: list of objects in bucket

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore
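
For illustration, the returned list can be aggregated directly. The sketch below sums object sizes under a prefix; `total_size` is a hypothetical helper, and it assumes each returned `BucketEntry` exposes a numeric `size` attribute when “size” is requested via `props`.

```python
def total_size(bucket, prefix=""):
    """Sum the sizes of all objects whose names start with `prefix`.

    Hypothetical helper: assumes each entry has a numeric `size` attribute.
    """
    entries = bucket.list_all_objects(prefix=prefix, props="name,size")
    return sum(entry.size for entry in entries)
```

Since `list_all_objects` materializes the whole listing in memory, an iterator-based variant may be preferable for very large buckets.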
aistore.sdk.bucket.Bucket.list_all_objects_iter(
prefix: str = '',
props: str = 'name,size'
) -> typing.Iterable[aistore.sdk.obj.object.Object]

Implementation of the abstract method from AISSource that provides an iterator of all the objects in this bucket matching the specified prefix.

Parameters:

prefix
str, defaults to ''

Limit objects selected by a given string prefix

props
str, defaults to 'name,size'

Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “target_url”, “copies”.

aistore.sdk.bucket.Bucket.list_archive(
archive_obj_name: str,
include_archive_obj: bool = False,
props: str = '',
page_size: int = 0
) -> typing.List[aistore.sdk.types.BucketEntry]

List files contained in an archived object (*.tar, *.zip, *.tgz, etc.).

This is a convenience wrapper around list_all_objects that automatically enables the ARCH_DIR list-flag so the cluster opens the shard and returns its directory.

Parameters:

archive_obj_name
str

Object key of the shard inside this bucket (e.g. "my-archive.tar"). Can include a prefix path.

include_archive_obj
bool, defaults to False

If True the returned list includes the parent archive object itself. When False (default) only the entries inside the shard are returned.

props
str, defaults to ''

Comma-separated list of object properties to request. Defaults to "" (no properties).

page_size
int, defaults to 0

Same meaning as in list_all_objects – how many names per internal page.

Returns: List[BucketEntry]

List[BucketEntry]: Entries representing the shard (optionally) and every file stored inside it.
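
A short sketch of how the archive listing might be consumed. The helper `archive_member_names` is hypothetical, and it assumes each returned `BucketEntry` exposes a `name` attribute; the exact form of the names returned for archived files is not specified here.

```python
def archive_member_names(bucket, archive_obj_name):
    """Return the names of the entries stored inside an archive object.

    Hypothetical helper: assumes each entry has a `name` attribute.
    """
    entries = bucket.list_archive(
        archive_obj_name,
        include_archive_obj=False,  # skip the parent archive object itself
    )
    return [entry.name for entry in entries]
```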

aistore.sdk.bucket.Bucket.list_objects(
prefix: str = '',
props: str = '',
page_size: int = 0,
uuid: str = '',
continuation_token: str = '',
flags: typing.Optional[typing.List[aistore.sdk.list_object_flag.ListObjectFlag]] = None,
target: str = '',
inventory_name: str = ''
) -> aistore.sdk.types.BucketList

Returns a structure that contains a page of objects, job ID, and continuation token (to read the next page, if available).

Parameters:

prefix
str, defaults to ''

Return only objects that start with the prefix

props
str, defaults to ''

Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.

page_size
int, defaults to 0

Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend. E.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: If “page_size” is greater than the backend maximum, the backend maximum number of objects is returned. Defaults to “0” - return the maximum number of objects.

uuid
str, defaults to ''

Job ID, required to get the next page of objects

continuation_token
str, defaults to ''

Marks the object to start reading the next page

flags
List[ListObjectFlag], defaults to None

Optional list of ListObjectFlag enums to include as flags in the request.

target
str, defaults to ''

Only list objects on this specific target node.

inventory_name
str, defaults to ''

Name of a native bucket inventory (NBI) to list from. Lists objects from the named inventory snapshot instead of querying the remote backend. Requires a previously created inventory (see create_inventory). Alternatively, to list without specifying a name pass flags=[ListObjectFlag.NBI] (valid only when exactly one inventory exists).

Returns: BucketList

The page of objects in the bucket and the continuation token to get the next page

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore
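
The paging contract described above (job ID plus continuation token) can be sketched as a simple loop. This is an illustration under stated assumptions: `iter_entries` is a hypothetical helper, and the returned `BucketList` is assumed to expose `entries`, `uuid`, and `continuation_token` attributes, with an empty token meaning there are no more pages.

```python
def iter_entries(bucket, prefix="", page_size=1000):
    """Yield entries one page at a time using list_objects pagination.

    Hypothetical helper: assumes the page object has `entries`, `uuid`
    and `continuation_token` attributes.
    """
    uuid = ""
    token = ""
    while True:
        page = bucket.list_objects(
            prefix=prefix,
            page_size=page_size,
            uuid=uuid,                  # job ID from the previous page
            continuation_token=token,   # where the next page starts
        )
        yield from (page.entries or [])
        uuid = page.uuid
        token = page.continuation_token
        if not token:  # assumption: empty token means the listing is done
            break
```

Writing this as a generator keeps only one page in memory at a time, which is the main reason to page manually rather than request everything at once.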
aistore.sdk.bucket.Bucket.list_objects_iter(
prefix: str = '',
props: str = '',
page_size: int = 0,
flags: typing.Optional[typing.List[aistore.sdk.list_object_flag.ListObjectFlag]] = None,
target: str = '',
inventory_name: str = ''
) -> aistore.sdk.obj.object_iterator.ObjectIterator

Returns an iterator for all objects in bucket

Parameters:

prefix
str, defaults to ''

Return only objects that start with the prefix

props
str, defaults to ''

Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.

page_size
int, defaults to 0

Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend. E.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: If “page_size” is greater than the backend maximum, the backend maximum number of objects is returned. Defaults to “0” - return the maximum number of objects.

flags
List[ListObjectFlag], defaults to None

Optional list of ListObjectFlag enums to include as flags in the request

target
str, defaults to ''

Only list objects on this specific target node

inventory_name
str, defaults to ''

Name of a native bucket inventory (NBI) to list from. See list_objects for details.

Returns: ObjectIterator

object iterator

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore
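
Because the result is an iterator, a caller can stop early without listing the entire bucket. A minimal sketch, assuming only that the returned `ObjectIterator` behaves like a standard Python iterator; `first_n_entries` is a hypothetical helper name.

```python
from itertools import islice


def first_n_entries(bucket, n, prefix=""):
    """Return at most `n` entries from the bucket listing.

    Hypothetical helper: relies only on list_objects_iter returning
    a regular Python iterator.
    """
    iterator = bucket.list_objects_iter(prefix=prefix, props="name,size")
    # islice stops consuming the iterator after n items.
    return list(islice(iterator, n))
```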
aistore.sdk.bucket.Bucket.list_urls(
prefix: str = '',
etl: typing.Optional[aistore.sdk.etl.ETLConfig] = None
) -> typing.Iterable[str]

Generates full URLs for all objects in the bucket that match the specified prefix.

Parameters:

prefix
str, defaults to ''

A string prefix to filter objects. Only objects with names starting with this prefix will be included. Defaults to an empty string (no filtering).

etl
Optional[ETLConfig], defaults to None

An optional ETL configuration. If provided, the URLs will include ETL processing parameters. Defaults to None.
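
As an illustration, the generated URLs can be filtered on the client side before being handed to a data loader. The helper `urls_with_suffix` is hypothetical and assumes only that `list_urls` yields plain strings.

```python
def urls_with_suffix(bucket, suffix, prefix=""):
    """Collect the URLs of objects under `prefix` that end with `suffix`.

    Hypothetical helper: assumes list_urls yields plain URL strings.
    """
    return [url for url in bucket.list_urls(prefix=prefix) if url.endswith(suffix)]
```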

aistore.sdk.bucket.Bucket.make_request(
method: str,
action: str,
value: typing.Optional[typing.Dict] = None,
params: typing.Optional[typing.Dict] = None,
name: str = ''
) -> requests.Response

Use the bucket’s client to make a request to the bucket endpoint on the AIS server

Parameters:

method
str

HTTP method to use, e.g. POST/GET/DELETE

action
str

Action string used to create an ActionMsg to pass to the server

value
dict, defaults to None

Additional value parameter to pass in the ActionMsg

params
dict, defaults to None

Optional parameters to pass in the request

name
str, defaults to ''

Name parameter to pass in the ActionMsg

Returns: requests.Response

Response from the server

aistore.sdk.bucket.Bucket.object(
obj_name: str,
props: typing.Optional[aistore.sdk.obj.object_props.ObjectProps] = None
) -> aistore.sdk.obj.object.Object

Factory constructor for an object in this bucket. Does not make any HTTP request, only instantiates an object in a bucket owned by the client.

Parameters:

obj_name
str

Name of object

props
ObjectProps, defaults to None

Properties of the object, as updated by head(), optionally pre-initialized.

Returns: Object

The object created.

aistore.sdk.bucket.Bucket.objects(
obj_names: typing.Optional[typing.List[str]] = None,
obj_range: typing.Optional[aistore.sdk.multiobj.ObjectRange] = None,
obj_template: typing.Optional[str] = None
) -> aistore.sdk.multiobj.ObjectGroup

Factory constructor for multiple objects belonging to this bucket.

Parameters:

obj_names
List[str], defaults to None

Names of objects to include in the group

obj_range
ObjectRange, defaults to None

Range of objects to include in the group

obj_template
str, defaults to None

String template defining objects to include in the group

Returns: ObjectGroup

The ObjectGroup created
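
The three selectors are alternative ways to describe the same kind of group. The sketch below picks between an explicit name list and a template; `select_objects` is a hypothetical helper, and its own check that exactly one selector is given is a choice made for this example rather than documented SDK behavior.

```python
def select_objects(bucket, names=None, template=None):
    """Build an object group from either a list of names or a template.

    Hypothetical helper: the one-selector-only check below is this
    example's own validation, not documented SDK behavior.
    """
    if (names is None) == (template is None):
        raise ValueError("provide exactly one of `names` or `template`")
    if names is not None:
        return bucket.objects(obj_names=list(names))
    return bucket.objects(obj_template=template)
```

A call such as `select_objects(bucket, names=["a.txt", "b.txt"])` would then return the group produced by `bucket.objects`.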

aistore.sdk.bucket.Bucket.put_files(
path: typing.Union[str, pathlib.Path],
prefix_filter: str = '',
pattern: str = '*',
basename: bool = False,
prepend: typing.Optional[str] = None,
recursive: bool = False,
dry_run: bool = False,
verbose: bool = True
) -> typing.List[str]

Puts files found in a given filepath as objects to a bucket in AIS storage.

Parameters:

path
str or Path

Local filepath, can be relative or absolute

prefix_filter
str, defaults to ''

Only put files with names starting with this prefix

pattern
str, defaults to '*'

Shell-style wildcard pattern to filter files

basename
bool, defaults to False

Whether to use the file names only as object names and omit the path information

prepend
str, defaults to None

Optional string to use as a prefix in the object name for all objects uploaded. No delimiter (“/”, “-”, etc.) is automatically applied between the prepend value and the object name.

recursive
bool, defaults to False

Whether to recurse through the provided path directories

dry_run
bool, defaults to False

Option to only show expected behavior without an actual put operation

verbose
bool, defaults to True

Whether to print upload info to standard output

Returns: List[str]

List of object names put to a bucket in AIS

Raises:

  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.ReadTimeout: Timed out receiving response from AIStore
  • ValueError: The path provided is not a valid directory
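
The interaction between `basename` and `prepend` is easiest to see with a small stand-alone model of the naming rule. The function below is not the SDK's implementation; `expected_object_name` is a hypothetical helper that approximates the behavior described above, assuming object names are otherwise derived from the file path relative to the upload root.

```python
from pathlib import Path


def expected_object_name(file, root_path, basename=False, prepend=None):
    """Approximate the object name an uploaded file would receive.

    Hypothetical model of the documented naming rule, not SDK code:
    assumes names are relative to `root_path` unless `basename` is set.
    """
    file = Path(file)
    # basename=True keeps only the file name and omits the path information.
    name = file.name if basename else file.relative_to(root_path).as_posix()
    # No delimiter is inserted between the prepend value and the name.
    return f"{prepend}{name}" if prepend else name
```

Under this model, passing `prepend="backup"` for `data/a/b.txt` with root `data` gives `backupa/b.txt`, so a caller who wants a separator would include it in the prepend value itself, for example `prepend="backup/"`.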
aistore.sdk.bucket.Bucket.rename(
to_bck_name: str
) -> str

Renames bucket in AIStore cluster. Only works on AIS buckets. Returns job ID that can be used later to check the status of the asynchronous operation.

Parameters:

to_bck_name
str

New name for the bucket

Returns: str

Job ID (as str) that can be used to check the status of the operation

Raises:

  • aistore.sdk.errors.AISError: All other types of errors with AIStore
  • aistore.sdk.errors.InvalidBckProvider: Invalid bucket provider for requested operation
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore

aistore.sdk.bucket.Bucket.show_inventory(
name: str = ''
) -> typing.Dict[str, aistore.sdk.types.NBIInfo]

Show native bucket inventory metadata.

Parameters:

name
str, defaults to ''

Inventory name to query. If empty, returns all inventories for this bucket.

Returns: Dict[str, NBIInfo]

Dict[str, NBIInfo]: Mapping of inventory object name to its metadata.

aistore.sdk.bucket.Bucket.summary(
uuid: str = '',
prefix: str = '',
cached: bool = True,
present: bool = True
)

Returns bucket summary (starts xaction job and polls for results).

Parameters:

uuid
str, defaults to ''

Identifier for the bucket summary. Defaults to an empty string.

prefix
str, defaults to ''

Prefix for objects to be included in the bucket summary. Defaults to an empty string (all objects).

cached
bool, defaults to True

If True, the summary includes cached entities. Defaults to True.

present
bool, defaults to True

If True, the summary includes present entities. Defaults to True.

Raises:

  • UnexpectedHTTPStatusCode: If the response status code is not as expected
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.exceptions.HTTPError: Service unavailable
  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ReadTimeout: Timed out receiving response from AIStore
  • aistore.sdk.errors.AISError: All other types of errors with AIStore

aistore.sdk.bucket.Bucket.transform(
etl_name: str,
to_bck: aistore.sdk.bucket.Bucket,
timeout: str = DEFAULT_ETL_TIMEOUT,
prefix_filter: str = '',
prepend: str = '',
ext: typing.Optional[typing.Dict[str, str]] = None,
force: bool = False,
dry_run: bool = False,
latest: bool = False,
sync: bool = False,
num_workers: typing.Optional[int] = 0,
cont_on_err: bool = False,
etl_pipeline: typing.Optional[typing.List[str]] = None
) -> str

Visits all selected objects in the source bucket and for each object, puts the transformed result to the destination bucket

Parameters:

etl_name
str

Name of the ETL to use for transformations

to_bck
Bucket

Destination bucket for transformations

timeout
str, defaults to DEFAULT_ETL_TIMEOUT

Timeout of the ETL job (e.g. 5m for 5 minutes)

prefix_filter
str, defaults to ''

Only transform objects with names starting with this prefix

prepend
str, defaults to ''

Value to prepend to the name of resulting transformed objects

ext
Dict[str, str], defaults to None

Dict mapping each extension to the extension that will replace it (e.g. {“jpg”: “txt”})

dry_run
bool, defaults to False

If True, report what would be transformed without actually performing the transformation

force
bool, defaults to False

Override existing destination bucket

latest
bool, defaults to False

GET the latest object version from the associated remote bucket

sync
bool, defaults to False

Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source

num_workers
int, defaults to 0

Number of concurrent workers for the transformation job per target

  • 0 (default): number of mountpaths
  • -1: single thread, serial execution

cont_on_err
bool, defaults to False

If True, continue processing objects even if some of them fail

etl_pipeline
List[str], defaults to None

List of ETL names to be used for the transformation pipeline

Returns: str

Job ID (as str) that can be used to check the status of the operation
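
A minimal sketch of a bucket-to-bucket transformation call using the parameters above. The helper name `transform_prefix` is hypothetical; `src_bck` and `dst_bck` are assumed to be `Bucket` instances, and the ETL named by `etl_name` is assumed to have been initialized already.

```python
def transform_prefix(src_bck, dst_bck, etl_name, prefix, ext=None, timeout="5m"):
    """Start an ETL job over objects whose names start with `prefix`.

    Hypothetical helper: assumes the ETL named `etl_name` already exists.
    Returns the job ID produced by Bucket.transform.
    """
    return src_bck.transform(
        etl_name=etl_name,
        to_bck=dst_bck,
        timeout=timeout,        # e.g. "5m" for 5 minutes
        prefix_filter=prefix,   # only transform objects with this prefix
        ext=ext,                # e.g. {"jpg": "txt"} to replace extensions
        cont_on_err=True,       # keep going if individual objects fail
    )
```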

aistore.sdk.bucket.Bucket.verify_cloud_bucket()

Verify the bucket provider is a cloud provider

aistore.sdk.bucket.Bucket.write_dataset(
config: aistore.sdk.dataset.dataset_config.DatasetConfig,
skip_missing: bool = True,
**kwargs
)

Writes a dataset to a bucket in AIS in webdataset format using wds.ShardWriter. Missing attributes are logged.

Parameters:

config
DatasetConfig

Configuration dict specifying how to process and store each part of the dataset item

skip_missing
bool, defaults to True

Skip samples that are missing one or more attributes, defaults to True

**kwargs
optional, defaults to {}

Optional keyword arguments to pass to the ShardWriter