
Copyright © 2026, NVIDIA Corporation.

aistore.sdk.retry_config

Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

Module Contents

Classes

Name | Description
ColdGetConf | Configuration class for retrying HEAD requests to objects that are not present in cluster when attempting a cold GET.
RetryConfig | Configuration class for managing both HTTP and network retries in AIStore.

Functions

Name | Description
_log_and_raise_on_exhaust | retry_error_callback: log the underlying error with full traceback, then re-raise it.

Data

NETWORK_RETRY_EXCEPTIONS

API

class aistore.sdk.retry_config.ColdGetConf(
est_bandwidth_bps: int = DEFAULT_COLD_GET_EST_BPS,
max_cold_wait: int = DEFAULT_COLD_GET_MAX_WAIT,
enable_remote_head: bool = True
)
Dataclass

Configuration class for retrying HEAD requests to objects that are not present in cluster when attempting a cold GET.

Attributes:

  • est_bandwidth_bps (int): Estimated bandwidth in bytes per second from the AIS cluster to backend buckets. Used to determine retry intervals for fetching remote objects. Raising this will decrease the initial time we expect object fetch to take. Defaults to 10 Gbps.
  • max_cold_wait (int): Within an individual retry, the maximum seconds to wait for an object's write lock to be released before re-raising a ReadTimeoutError to be handled by the top-level RetryConfig. Defaults to 3 minutes.
  • enable_remote_head (bool): Whether to send a HEAD request to the backend bucket if no size information for an object exists locally. Used for retry optimization, but increases requests to the remote backend. Defaults to True.
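To make the effect of est_bandwidth_bps concrete, here is a minimal sketch of how a bandwidth estimate could translate into an expected fetch time. It is not the SDK's code: the class ColdGetConfSketch, the function expected_fetch_seconds, the default values, and the "size divided by bandwidth, capped at max_cold_wait" model are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ColdGetConfSketch:
    """Stand-in that mirrors the documented field names; NOT the SDK class."""

    est_bandwidth_bps: int = 1_250_000_000  # hypothetical default, bytes per second
    max_cold_wait: int = 180                # hypothetical default, seconds
    enable_remote_head: bool = True


def expected_fetch_seconds(size_bytes: int, conf: ColdGetConfSketch) -> float:
    """Assumed model: transfer time = size / bandwidth, capped at max_cold_wait."""
    estimate = size_bytes / conf.est_bandwidth_bps
    return min(estimate, float(conf.max_cold_wait))


ten_gib = 10 * 1024**3
slow = ColdGetConfSketch(est_bandwidth_bps=100_000_000)
fast = ColdGetConfSketch(est_bandwidth_bps=1_000_000_000)

# A higher bandwidth estimate yields a shorter expected fetch time.
print(expected_fetch_seconds(ten_gib, slow))
print(expected_fetch_seconds(ten_gib, fast))
```

Under this assumed model, raising the bandwidth estimate shortens the initial wait, which matches the documented behavior of est_bandwidth_bps. The actual interval calculation in the SDK may differ.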

enable_remote_head: bool = field(default=True)
est_bandwidth_bps: int = field(default=DEFAULT_COLD_GET_EST_BPS)
max_cold_wait: int = field(default=DEFAULT_COLD_GET_MAX_WAIT)

class aistore.sdk.retry_config.RetryConfig(
http_retry: urllib3.util.retry.Retry,
network_retry: tenacity.Retrying,
cold_get_conf: aistore.sdk.retry_config.ColdGetConf = ColdGetConf()
)
Dataclass

Configuration class for managing both HTTP and network retries in AIStore.

AIStore implements two types of retries to ensure reliability and fault tolerance:

  1. HTTP Retry (urllib3.Retry) - Handles HTTP errors based on status codes (e.g., 429, 500, 502, 503, 504).
  2. Network Retry (tenacity) - Recovers from connection failures, timeouts, and unreachable targets.

Why two types of retries?

  • AIStore uses redirects for GET/PUT operations.
  • If a target node is down, we must retry the request via the proxy instead of the same failing target.
  • network_retry ensures that the request is reattempted at the proxy level, preventing unnecessary failures.
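The reasoning in the list above can be illustrated with a small standard-library simulation. It is only a sketch: make_proxy, with_network_retry, and TargetDown are hypothetical names, the "proxy" is a plain function that picks the next target in round-robin order, and none of this reflects how the SDK or the cluster actually selects targets.

```python
class TargetDown(ConnectionError):
    """Simulated network failure when a redirected target is unreachable."""


def make_proxy(targets):
    """Return a fake proxy that redirects each new request to the next target."""
    state = {"next": 0}

    def proxy_request():
        name, healthy = targets[state["next"] % len(targets)]
        state["next"] += 1
        if not healthy:
            raise TargetDown(name)
        return f"200 OK from {name}"

    return proxy_request


def with_network_retry(send_via_proxy, attempts=3):
    """Outer retry: on a network error, start the request over at the proxy."""
    last_error = None
    for _ in range(attempts):
        try:
            return send_via_proxy()
        except ConnectionError as err:
            # Retrying the same dead target would fail again, so the next
            # attempt goes back through the proxy for a fresh redirect.
            last_error = err
    raise last_error


proxy = make_proxy([("target-1", False), ("target-2", True)])
result = with_network_retry(proxy)
print(result)
```

An HTTP-level retry alone would keep resending to the redirected address. The outer loop is what gives the request a chance to land on a healthy node.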

Attributes:

  • http_retry (urllib3.Retry): Defines retry behavior for transient HTTP errors.
  • network_retry (tenacity.Retrying): Configured tenacity.Retrying instance managing retries for network-related issues, such as connection failures, timeouts, or unreachable targets.
  • cold_get_conf (ColdGetConf): Configuration for retrying COLD GET requests, see ColdGetConf class.

Note on pickling (multi-process workloads): network_retry is a tenacity Retrying object that internally uses lambdas/closures and is not picklable. When this config crosses a process boundary (e.g. PyTorch DataLoader(num_workers > 0) under the forkserver/spawn start method, Ray, Dask, ProcessPoolExecutor), network_retry is dropped during serialization and rebuilt from RetryConfig.default() in the worker — any caller-customized tenacity policy is lost in workers. Other fields (http_retry, cold_get_conf) survive pickling unchanged. Single-process usage is unaffected.
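The pickling behavior described in this note can be reproduced with a stand-in class. The sketch below uses only the standard library; RetryConfigSketch, its http_retries field, and the lambda-based policies are hypothetical and merely imitate the documented pattern of dropping the unpicklable field in __getstate__ and rebuilding it from default() in __setstate__.

```python
import pickle
from dataclasses import dataclass
from typing import Callable


def _default_policy() -> Callable[[int], bool]:
    # A lambda stands in for tenacity's internal closures; lambdas cannot be pickled.
    return lambda attempt: attempt < 3


@dataclass
class RetryConfigSketch:
    """Stand-in for illustration only; NOT the SDK's RetryConfig."""

    http_retries: int
    network_retry: Callable[[int], bool]

    @staticmethod
    def default() -> "RetryConfigSketch":
        return RetryConfigSketch(http_retries=3, network_retry=_default_policy())

    def __getstate__(self):
        # Drop the unpicklable field before serialization.
        state = self.__dict__.copy()
        state.pop("network_retry", None)
        return state

    def __setstate__(self, state):
        # Restore picklable fields, then rebuild the policy from the defaults.
        self.__dict__.update(state)
        self.network_retry = RetryConfigSketch.default().network_retry


custom = RetryConfigSketch(http_retries=7, network_retry=lambda attempt: attempt < 10)
restored = pickle.loads(pickle.dumps(custom))

print(restored.http_retries)      # picklable field survives the round trip
print(custom.network_retry(5))    # custom policy still allows attempt 5
print(restored.network_retry(5))  # rebuilt default policy does not
```

The practical consequence is the one the note states: a customized network policy has to be re-applied inside each worker process if it matters there.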

cold_get_conf: ColdGetConf = field(default_factory=ColdGetConf)
http_retry: Retry
network_retry: Retrying

aistore.sdk.retry_config.RetryConfig.__getstate__()
aistore.sdk.retry_config.RetryConfig.__setstate__(
state
)
aistore.sdk.retry_config.RetryConfig.default() -> aistore.sdk.retry_config.RetryConfig
staticmethod

Returns the default retry configuration for AIStore.

aistore.sdk.retry_config._log_and_raise_on_exhaust(
retry_state
)

retry_error_callback: log the underlying error with full traceback, then re-raise it. Per-retry attempts stay concise via before_sleep_log; the call stack is emitted once, here, when retries are exhausted.
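The logging pattern described here (short messages per attempt, one full traceback when retries run out, then re-raise) can be sketched without tenacity. The helper call_with_retries and the logger name retry_sketch are hypothetical; the real callback receives a tenacity retry state, which this stand-in does not model.

```python
import logging

logger = logging.getLogger("retry_sketch")


def call_with_retries(func, attempts=3):
    """Retry func; log briefly per failure and log the traceback once at the end."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError as err:
            if attempt == attempts:
                # Exhausted: emit the full traceback once, then re-raise
                # the original error unchanged.
                logger.error(
                    "retries exhausted after %d attempts", attempt, exc_info=True
                )
                raise
            # Concise per-attempt message, no traceback.
            logger.warning("attempt %d failed: %s; retrying", attempt, err)
```

Keeping the traceback out of the per-attempt messages avoids repeating the same call stack for every retry while still preserving it for the final failure.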

aistore.sdk.retry_config.NETWORK_RETRY_EXCEPTIONS = (ConnectTimeout, ReadTimeout, ChunkedEncodingError, RequestsConnectionError, AIS...