
# aistore.sdk.retry_config

Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

## Module Contents

### Classes

| Name                                                   | Description                                                                                                      |
| ------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------- |
| [`ColdGetConf`](#aistore-sdk-retry_config-ColdGetConf) | Configuration class for retrying HEAD requests to objects that are not present in the cluster when attempting a cold GET. |
| [`RetryConfig`](#aistore-sdk-retry_config-RetryConfig) | Configuration class for managing both HTTP and network retries in AIStore.                                       |

### Functions

| Name                                                                               | Description                                                           |
| ---------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| [`_log_and_raise_on_exhaust`](#aistore-sdk-retry_config-_log_and_raise_on_exhaust) | `retry_error_callback`: log the underlying error with full traceback, then re-raise it. |

### Data

[`NETWORK_RETRY_EXCEPTIONS`](#aistore-sdk-retry_config-NETWORK_RETRY_EXCEPTIONS)

### API

```python
class aistore.sdk.retry_config.ColdGetConf(
    est_bandwidth_bps: int = DEFAULT_COLD_GET_EST_BPS,
    max_cold_wait: int = DEFAULT_COLD_GET_MAX_WAIT,
    enable_remote_head: bool = True
)
```

Dataclass

Configuration class for retrying HEAD requests to objects that are not present in the cluster when attempting a cold GET.

**Attributes:**

* `est_bandwidth_bps` (int): Estimated bandwidth, in bytes per second, from the AIS cluster to backend buckets. Used to determine retry intervals when fetching remote objects; raising this value shortens the initial estimate of how long an object fetch should take. Defaults to 10 Gbps.
* `max_cold_wait` (int): Within an individual retry, the maximum number of seconds to wait for an object's write lock to be released before re-raising a `ReadTimeoutError` to be handled by the top-level `RetryConfig`. Defaults to 3 minutes.
* `enable_remote_head` (bool): Whether to send a HEAD request to the backend bucket when no size information for an object exists locally. Used to optimize retries, at the cost of additional requests to the remote backend. Defaults to True.
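To make the units concrete: `est_bandwidth_bps` is in bytes per second, so if the 10 Gbps default is read as 10 gigabits per second it corresponds to 1.25 × 10^9 bytes per second. The sketch below does that conversion and computes a naive size / bandwidth estimate of how long a cold GET might take. Both helpers are hypothetical, and the simple division is an assumption chosen to illustrate the "higher bandwidth, shorter expected fetch" relationship; the SDK's actual retry-interval computation may differ.

```python
# Hypothetical helpers illustrating the units behind `est_bandwidth_bps`;
# they are not part of the aistore SDK.

BITS_PER_BYTE = 8


def gbps_to_bytes_per_second(gbps: float) -> int:
    """Convert gigabits per second (10**9 bits/s) to bytes per second."""
    return int(gbps * 10**9 / BITS_PER_BYTE)


def naive_fetch_seconds(object_size_bytes: int, est_bandwidth_bps: int) -> float:
    """Assumed estimate: expected fetch time = object size / bandwidth.

    A larger bandwidth estimate gives a shorter expected fetch time.
    """
    if est_bandwidth_bps <= 0:
        raise ValueError("est_bandwidth_bps must be positive")
    return object_size_bytes / est_bandwidth_bps


ten_gbps = gbps_to_bytes_per_second(10)  # 1_250_000_000 bytes/s
one_gib = 1024**3  # a 1 GiB object
```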

```python
class aistore.sdk.retry_config.RetryConfig(
    http_retry: urllib3.util.retry.Retry,
    network_retry: tenacity.Retrying,
    cold_get_conf: aistore.sdk.retry_config.ColdGetConf = ColdGetConf()
)
```

Dataclass

Configuration class for managing both HTTP and network retries in AIStore.

AIStore implements two types of retries to ensure reliability and fault tolerance:

1. **HTTP Retry (urllib3.Retry)** - Handles HTTP errors based on status codes (e.g., 429, 500, 502, 503, 504).
2. **Network Retry (tenacity)** - Recovers from connection failures, timeouts, and unreachable targets.
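As a rough illustration of the division of labor between the two retry types, the sketch below routes an outcome to one layer or the other. The status codes come from the examples above; the exception classes are stdlib stand-ins (the SDK's real list is `NETWORK_RETRY_EXCEPTIONS`), and `retry_layer` is a hypothetical helper, not part of the SDK.

```python
# Hypothetical sketch: which retry layer would handle a given outcome.
# Status codes mirror the documented examples; the exception tuple is a
# stdlib stand-in, not the SDK's NETWORK_RETRY_EXCEPTIONS.
from typing import Optional

RETRYABLE_STATUS_CODES = frozenset({429, 500, 502, 503, 504})
NETWORK_ERRORS = (ConnectionError, TimeoutError)


def retry_layer(
    status_code: Optional[int] = None, error: Optional[BaseException] = None
) -> Optional[str]:
    """Return "network", "http", or None (not retried by either layer)."""
    if error is not None:
        # No HTTP response at all: connection failure, timeout, and so on.
        return "network" if isinstance(error, NETWORK_ERRORS) else None
    if status_code in RETRYABLE_STATUS_CODES:
        # A response arrived, but with a transient error status.
        return "http"
    return None
```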

**Why two types of retries?**

* AIStore uses **redirects** for GET/PUT operations.
* If a target node is down, we must retry the request via the proxy instead of the same failing target.
* `network_retry` ensures that the request is reattempted at the **proxy level**, preventing unnecessary failures.
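The redirect argument can be sketched with simulated stand-ins: every attempt goes back to the proxy for a fresh redirect instead of re-sending the request to the target that just failed. `get_with_proxy_retry`, `fake_proxy`, and `fake_fetch` are hypothetical names; the SDK itself drives this through the tenacity `network_retry` policy rather than a hand-written loop.

```python
# Simulated sketch (hypothetical names): on a network failure, return to the
# proxy so it can redirect elsewhere, rather than retrying the failed target.
from typing import Callable, TypeVar

T = TypeVar("T")


def get_with_proxy_retry(
    proxy_redirect: Callable[[], str],
    fetch_from_target: Callable[[str], T],
    max_attempts: int = 3,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    last_error = None
    for _ in range(max_attempts):
        target = proxy_redirect()  # every attempt starts at the proxy
        try:
            return fetch_from_target(target)
        except ConnectionError as err:  # target unreachable
            last_error = err
    raise last_error


# Demo: target "t1" is down, so the proxy's second redirect ("t2") succeeds.
_redirects = iter(["t1", "t2"])


def fake_proxy() -> str:
    return next(_redirects)


def fake_fetch(target: str) -> str:
    if target == "t1":
        raise ConnectionError(f"{target} unreachable")
    return f"object from {target}"


result = get_with_proxy_retry(fake_proxy, fake_fetch)
```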

**Attributes:**

* `http_retry` (urllib3.Retry): Defines retry behavior for transient HTTP errors.
* `network_retry` (tenacity.Retrying): Configured `tenacity.Retrying` instance managing retries for network-related issues, such as connection failures, timeouts, or unreachable targets.
* `cold_get_conf` (ColdGetConf): Configuration for retrying cold GET requests; see `ColdGetConf`.

**Note on pickling (multi-process workloads):**
`network_retry` is a tenacity `Retrying` object that internally uses
lambdas/closures and is not picklable. When this config crosses a
process boundary (e.g. PyTorch `DataLoader(num_workers > 0)` under the
`forkserver`/`spawn` start method, Ray, Dask, `ProcessPoolExecutor`),
`network_retry` is dropped during serialization and **rebuilt from
`RetryConfig.default()` in the worker**, so any caller-customized
tenacity policy is lost in workers. Other fields (`http_retry`,
`cold_get_conf`) survive pickling unchanged. Single-process usage is
unaffected.
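The drop-and-rebuild behavior in the pickling note follows Python's standard `__getstate__`/`__setstate__` protocol. The sketch below reproduces it on a hypothetical `ConfigSketch` dataclass, with a plain lambda standing in for the tenacity policy; it illustrates the pattern and is not the SDK's code. After a pickle round trip, the picklable field keeps its custom value while the callable reverts to the default, which is why a customized `network_retry` does not reach worker processes.

```python
import pickle
from dataclasses import dataclass, field
from typing import Callable


def default_policy() -> Callable[[int], float]:
    # Stand-in for the default network_retry: a closure, hence unpicklable.
    return lambda attempt: min(2.0**attempt, 10.0)


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for RetryConfig, not the SDK's implementation."""

    max_attempts: int = 3  # picklable field: survives unchanged
    policy: Callable[[int], float] = field(default_factory=default_policy)

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("policy", None)  # lambdas cannot be pickled, so drop it
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.policy = default_policy()  # rebuilt from the default, not the custom one


custom = ConfigSketch(max_attempts=7, policy=lambda attempt: 0.0)
restored = pickle.loads(pickle.dumps(custom))  # what a worker process would receive
```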

```python
aistore.sdk.retry_config.RetryConfig.__getstate__()
```

```python
aistore.sdk.retry_config.RetryConfig.__setstate__(
    state
)
```

```python
aistore.sdk.retry_config.RetryConfig.default() -> aistore.sdk.retry_config.RetryConfig
```

staticmethod

Returns the default retry configuration for AIStore.

```python
aistore.sdk.retry_config._log_and_raise_on_exhaust(
    retry_state
)
```

`retry_error_callback`: log the underlying error with full traceback,
then re-raise it. Per-retry attempts stay concise via `before_sleep_log`;
the call stack is emitted once, here, when retries are exhausted.
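The logging split described here (short messages per attempt, one full traceback on exhaustion) can be reproduced in a plain loop. This is a hypothetical stdlib sketch, not the SDK's tenacity-based code; in tenacity, the per-attempt half corresponds to `before_sleep_log` and the final half to a `retry_error_callback`.

```python
import logging

logger = logging.getLogger("retry_sketch")


def call_with_retries(fn, attempts: int = 3):
    """Hypothetical retry loop mirroring the concise/verbose logging split."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError as err:
            if attempt == attempts:
                # Exhausted: log once with the full traceback, then re-raise.
                logger.error("giving up after %d attempts", attempts, exc_info=True)
                raise
            # Per-attempt message stays short: no traceback attached.
            logger.warning("attempt %d failed: %s; retrying", attempt, err)
```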

```python
aistore.sdk.retry_config.NETWORK_RETRY_EXCEPTIONS = (ConnectTimeout, ReadTimeout, ChunkedEncodingError, RequestsConnectionError, AIS...
```