aistore.sdk.retry_config

Module Contents

Classes

Name	Description
`ColdGetConf`	Configuration class for retrying HEAD requests to objects that are not present in the cluster when attempting a cold GET.
`RetryConfig`	Configuration class for managing both HTTP and network retries in AIStore.

Functions

Name	Description
`_log_and_raise_on_exhaust`	`retry_error_callback`: log the underlying error with full traceback, then re-raise it.

Data

NETWORK_RETRY_EXCEPTIONS

API

class aistore.sdk.retry_config.ColdGetConf(
    est_bandwidth_bps: int = DEFAULT_COLD_GET_EST_BPS,
    max_cold_wait: int = DEFAULT_COLD_GET_MAX_WAIT,
    enable_remote_head: bool = True
)

Dataclass

Configuration class for retrying HEAD requests to objects that are not present in the cluster when attempting a cold GET.

Attributes:

est_bandwidth_bps (int): Estimated bandwidth in bytes per second from the AIS cluster to backend buckets. Used to determine retry intervals for fetching remote objects. Raising this will decrease the initial time we expect object fetch to take. Defaults to 10 Gbps.
max_cold_wait (int): Within an individual retry, the maximum seconds to wait for an object’s write lock to be released before re-raising a ReadTimeoutError to be handled by the top-level RetryConfig. Defaults to 3 minutes.
enable_remote_head (bool): Whether to send a HEAD request to the backend bucket if no size information for an object exists locally. Used for retry optimization, but increases requests to the remote backend. Defaults to True.
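To make the effect of est_bandwidth_bps concrete, the sketch below assumes the simplest possible model: expected fetch time is object size divided by estimated bandwidth. The function expected_fetch_seconds and the sizes used are hypothetical and only illustrate why raising the estimate shortens the expected wait; the SDK's actual interval computation may differ.

```python
def expected_fetch_seconds(size_bytes: int, est_bandwidth_bps: int) -> float:
    """Hypothetical model: seconds to pull an object at the estimated bandwidth."""
    if est_bandwidth_bps <= 0:
        raise ValueError("est_bandwidth_bps must be positive")
    return size_bytes / est_bandwidth_bps


GIB = 1024 ** 3
object_size = 10 * GIB  # a 10 GiB remote object (illustrative value)

# Same object, two different bandwidth estimates (bytes per second).
slow = expected_fetch_seconds(object_size, 1 * GIB)
fast = expected_fetch_seconds(object_size, 5 * GIB)

print(f"1 GiB/s estimate: {slow:.1f} s, 5 GiB/s estimate: {fast:.1f} s")
```

Under this assumed model, a higher bandwidth estimate yields a shorter expected fetch time, which matches the documented direction of the setting.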

enable_remote_head

bool = field(default=True)

est_bandwidth_bps

int = field(default=DEFAULT_COLD_GET_EST_BPS)

max_cold_wait

int = field(default=DEFAULT_COLD_GET_MAX_WAIT)

class aistore.sdk.retry_config.RetryConfig(
    http_retry: urllib3.util.retry.Retry,
    network_retry: tenacity.Retrying,
    cold_get_conf: aistore.sdk.retry_config.ColdGetConf = ColdGetConf()
)

Dataclass

Configuration class for managing both HTTP and network retries in AIStore.

AIStore implements two types of retries to ensure reliability and fault tolerance:

HTTP Retry (urllib3.Retry) - Handles HTTP errors based on status codes (e.g., 429, 500, 502, 503, 504).
Network Retry (tenacity) - Recovers from connection failures, timeouts, and unreachable targets.

Why two types of retries?

AIStore uses redirects for GET/PUT operations.
If a target node is down, we must retry the request via the proxy instead of the same failing target.
network_retry ensures that the request is reattempted at the proxy level, preventing unnecessary failures.
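The idea of retrying at the proxy level can be sketched with a small simulation. Everything here is hypothetical (the HEALTHY table, proxy_redirect, fetch_from_target, get_via_proxy); it is not how the SDK or the cluster is implemented. It only shows the control flow: each new attempt starts again at the proxy rather than re-contacting the failed target.

```python
class TargetDown(Exception):
    """Raised when the redirected-to target cannot be reached."""


# Hypothetical cluster state: target t1 is down, t2 is healthy.
HEALTHY = {"t1": False, "t2": True}


def proxy_redirect(attempt: int) -> str:
    # Hypothetical proxy: the first redirect points at t1, later ones at t2,
    # standing in for a proxy that has noticed t1 is unavailable.
    return "t1" if attempt == 0 else "t2"


def fetch_from_target(target: str) -> str:
    if not HEALTHY[target]:
        raise TargetDown(target)
    return f"data from {target}"


def get_via_proxy(max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        target = proxy_redirect(attempt)  # every attempt restarts at the proxy
        try:
            return fetch_from_target(target)
        except TargetDown:
            continue  # do not hit the same failing target directly again
    raise TargetDown("all attempts failed")


result = get_via_proxy()
print(result)
```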

Attributes:

http_retry (urllib3.Retry): Defines retry behavior for transient HTTP errors.
network_retry (tenacity.Retrying): Configured tenacity.Retrying instance managing retries for network-related issues, such as connection failures, timeouts, or unreachable targets.
cold_get_conf (ColdGetConf): Configuration for retrying COLD GET requests, see ColdGetConf class.

Note on pickling (multi-process workloads): network_retry is a tenacity Retrying object that internally uses lambdas/closures and is not picklable. When this config crosses a process boundary (e.g. PyTorch DataLoader(num_workers > 0) under the forkserver/spawn start method, Ray, Dask, ProcessPoolExecutor), network_retry is dropped during serialization and rebuilt from RetryConfig.default() in the worker — any caller-customized tenacity policy is lost in workers. Other fields (http_retry, cold_get_conf) survive pickling unchanged. Single-process usage is unaffected.
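The drop-and-rebuild behavior described in this note can be illustrated with a stand-in class that uses only the standard library. SketchRetryConfig, http_retries, network_policy, and _default_policy are hypothetical names; this is a sketch of the general __getstate__/__setstate__ pattern, not the SDK's implementation.

```python
import pickle
from dataclasses import dataclass, field
from typing import Callable


def _default_policy() -> Callable[[int], bool]:
    # Stand-in for a default network retry policy: allow fewer than 3 attempts.
    return lambda attempt: attempt < 3


@dataclass
class SketchRetryConfig:
    """Hypothetical stand-in for RetryConfig; not the SDK implementation."""

    http_retries: int = 3
    # A lambda cannot be pickled, much like a configured tenacity.Retrying.
    network_policy: Callable[[int], bool] = field(default_factory=_default_policy)

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("network_policy", None)  # drop the unpicklable field
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.network_policy = _default_policy()  # rebuild from the default


custom = SketchRetryConfig(http_retries=7, network_policy=lambda attempt: attempt < 10)
restored = pickle.loads(pickle.dumps(custom))

print(restored.http_retries)       # picklable field survives
print(custom.network_policy(5))    # custom policy in the parent process
print(restored.network_policy(5))  # default policy after the round trip
```

In this sketch the plain field survives the round trip while the customized policy is silently replaced by the default, which is the trade-off the note warns about for multi-process workloads.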

cold_get_conf

ColdGetConf = field(default_factory=ColdGetConf)

http_retry

Retry

network_retry

Retrying

aistore.sdk.retry_config.RetryConfig.__getstate__()

aistore.sdk.retry_config.RetryConfig.__setstate__(
    state
)

aistore.sdk.retry_config.RetryConfig.default() -> aistore.sdk.retry_config.RetryConfig

staticmethod

Returns the default retry configuration for AIStore.

aistore.sdk.retry_config._log_and_raise_on_exhaust(
    retry_state
)

retry_error_callback: log the underlying error with full traceback, then re-raise it. Per-retry attempts stay concise via before_sleep_log; the call stack is emitted once, here, when retries are exhausted.
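The logging split described above (concise per-attempt messages, one full traceback on exhaustion) can be sketched without tenacity. The helper call_with_retries and the logger name retry_sketch are hypothetical; this is a hand-written retry loop standing in for the library, shown only to illustrate the pattern.

```python
import logging

logger = logging.getLogger("retry_sketch")


def call_with_retries(fn, attempts: int = 3):
    """Hypothetical retry loop; a stand-in for tenacity, not the SDK's code."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError as exc:
            last_exc = exc
            # Per-attempt message stays concise: no traceback here.
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
    # Retries exhausted: emit the full traceback once, then re-raise.
    logger.error("retries exhausted", exc_info=last_exc)
    raise last_exc


calls = []


def always_down():
    calls.append(1)
    raise ConnectionError("target unreachable")


final_error = None
try:
    call_with_retries(always_down, attempts=3)
except ConnectionError as exc:
    final_error = exc
```

The caller still receives the original exception, so existing error handling keeps working while the log carries the call stack exactly once.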

aistore.sdk.retry_config.NETWORK_RETRY_EXCEPTIONS = (ConnectTimeout, ReadTimeout, ChunkedEncodingError, RequestsConnectionError, AIS...
