
Copyright © 2026, NVIDIA Corporation.

Blob Downloader


Blob downloader is AIStore’s facility for downloading large remote objects (BLOBs) using concurrent range-reads.
Instead of pulling a 10–100+ GiB object with a single sequential stream, blob downloader:

  • splits the object into chunks (configurable chunk size),
  • fetches those chunks in parallel from the remote backend (configurable number of workers),
  • writes them directly into AIStore’s chunked object layout, so that all of the target’s disks write in parallel and the node’s full disk write bandwidth is put to use.

The result is that, beyond a certain object size, blob downloader can deliver much higher throughput than a regular cold GET. In our internal benchmarks, a 4 GiB S3 object fetched with blob downloader was up to 4× faster than a monolithic cold GET.
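To make the chunking idea concrete, here is a minimal Python sketch of concurrent range-reads. It is not AIStore's implementation: `split_ranges`, `parallel_fetch`, and `fake_read_range` are hypothetical names, the "remote backend" is an in-memory byte string, and the chunks are joined in memory rather than written to a chunked on-disk layout.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, chunk_size):
    """Return inclusive (start, end) byte ranges covering `size` bytes."""
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]


def parallel_fetch(read_range, size, chunk_size, num_workers):
    """Fetch every range concurrently and reassemble the chunks in order."""
    ranges = split_ranges(size, chunk_size)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in input order, i.e. in offset order
        chunks = list(pool.map(lambda r: read_range(*r), ranges))
    return b"".join(chunks)


# Hypothetical stand-in for a remote object served by byte range
remote_object = bytes(range(256)) * 40  # 10,240 bytes


def fake_read_range(start, end):
    return remote_object[start:end + 1]


result = parallel_fetch(fake_read_range, len(remote_object),
                        chunk_size=4096, num_workers=4)
```

With a real backend, `read_range` would issue an HTTP range request; the chunk size and worker count play the same role as the `--chunk-size` and `--num-workers` options described under Usage.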

Blob downloader is also load‑aware: it consults AIStore’s internal load advisors to avoid overcommitting memory or disks, backing off when the node is under pressure and running at full speed when the system has headroom.

For a deeper dive into the internals and detailed benchmarks, see the blog post.


Usage

AIStore exposes blob download functionality through three distinct interfaces, each suited to different use cases.

  • Single object blob-download job – explicitly start a blob-download job for one or more objects.
  • Prefetch + blob-threshold – route large objects in the prefetch job through blob downloader.
  • Streaming GET – stream a large object from blob downloader while it is being cached in AIS.

1. Single object blob-download job

Use this when you want direct control over which objects are fetched with blob downloader and how (chunk size, number of workers).

Help and options:

$ ais blob-download --help

NAME:
   ais blob-download - (alias for "job start blob-download") Download a large object or multiple objects from remote storage, e.g.:
     - 'blob-download s3://ab/largefile --chunk-size=2mb --progress' - download one blob at a given chunk size
     - 'blob-download s3://ab --list "f1, f2" --num-workers=4 --progress' - run 4 concurrent readers to download 2 (listed) blobs
   When _not_ using '--progress' option, run 'ais show job' to monitor.

USAGE:
   ais blob-download BUCKET/OBJECT_NAME [command options]

OPTIONS:
   chunk-size value    Chunk size in IEC or SI units, or "raw" bytes (e.g.: 4mb, 1MiB, 1048576, 128k)
   num-workers value   Number of concurrent blob-downloading workers (readers); system default when omitted or zero (default: 0)
   list value          Comma-separated list of object or file names
   latest              Check and optionally synchronize the latest object version from the remote bucket
   progress            Show progress bar(s) in real time
   wait                Block until the job finishes (optionally use '--timeout' to limit waiting time)
   ...

Examples:

  • Single large object

    $ ais blob-download s3://my-bucket/large-model.bin \
        --chunk-size 4MiB \
        --num-workers 8 \
        --wait --progress

  • Multiple objects in one job

    $ ais blob-download s3://my-bucket \
        --list "obj1.tar,obj2.bin,obj3.dat" \
        --chunk-size 8MiB \
        --num-workers 4 \
        --wait --progress

2. Prefetch with blob-threshold

prefetch is AIStore’s multi‑object “warm‑up” job for remote buckets. When you add a blob size threshold, it automatically decides which objects are large enough to benefit from blob downloader:

  • Objects ≥ --blob-threshold are fetched via blob downloader (parallel range‑reads, chunked writes).
  • Objects < --blob-threshold are fetched with the normal cold GET path.

This lets you get the large‑object gains of blob downloader by just tuning prefetch’s knobs.

Example:

# Inspect a remote bucket
$ ais ls s3://my-bucket
NAME          SIZE       CACHED
model.ckpt    12.50GiB   no
dataset.tar   8.30GiB    no
config.json   4.20KiB    no

# Prefetch with 1 GiB threshold:
# - objects ≥ threshold use blob downloader (parallel chunks)
# - objects < threshold use standard cold GET
$ ais prefetch s3://my-bucket \
    --blob-threshold 1GiB \
    --blob-chunk-size 8MiB \
    --wait --progress
prefetch-objects[E-abc123]: prefetch entire bucket s3://my-bucket
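The routing rule can be pictured with a few lines of Python. This is only an illustration of the size comparison, not prefetch's actual code: `route_objects` is a hypothetical helper, and the sizes are the rounded values from the listing above.

```python
GiB = 1024 ** 3
KiB = 1024


def route_objects(sizes, blob_threshold):
    """Split object names into (via blob downloader, via cold GET) by size."""
    via_blob, via_cold_get = [], []
    for name, size in sizes.items():
        # Objects at or above the threshold go to blob downloader
        (via_blob if size >= blob_threshold else via_cold_get).append(name)
    return via_blob, via_cold_get


# Sizes from the example bucket listing, converted to bytes
sizes = {
    "model.ckpt": int(12.5 * GiB),
    "dataset.tar": int(8.3 * GiB),
    "config.json": int(4.2 * KiB),
}

via_blob, via_cold_get = route_objects(sizes, blob_threshold=1 * GiB)
```

With a 1 GiB threshold, the two multi-GiB objects land in `via_blob` and the small JSON file in `via_cold_get`.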

Key prefetch options:

  • --blob-threshold SIZE: turn blob downloader on for objects at/above SIZE.
  • --blob-chunk-size SIZE (if available in your build): override default blob chunk size for this prefetch.
  • --prefix / --list / --template: scope which objects are prefetched.

3. Streaming GET (Python SDK Only)

In addition to CLI jobs, blob downloader can be used to stream large objects while they are concurrently downloaded in the cluster. This is useful when you want to feed data directly into an application (for example, model loading or preprocessing) and still keep a local cached copy in AIS.

from aistore import Client
from aistore.sdk.blob_download_config import BlobDownloadConfig

# Set up AIS client and bucket
# ("AIS_ENDPOINT" is a placeholder: replace it with your cluster's endpoint URL)
client = Client("AIS_ENDPOINT")
bucket = client.bucket(name="my_bucket", provider="aws")

# Configure blob downloader (4 MiB chunks, 16 workers)
blob_cfg = BlobDownloadConfig(chunk_size="4MiB", num_workers="16")

# Stream large object using blob downloader settings
reader = bucket.object("my_large_object").get_reader(blob_download_config=blob_cfg)
data = reader.read_all()
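Note that `read_all()` returns the whole object as a single bytes value, which may be undesirable for multi-GiB objects. The sketch below shows one way to process a stream incrementally instead. It deliberately does not import the SDK: `consume_stream` is a hypothetical helper that accepts any iterable of byte chunks, and the assumption that the SDK reader can be iterated chunk by chunk should be checked against the Python SDK reference guide for your version.

```python
import hashlib


def consume_stream(chunks):
    """Hash a stream of byte chunks without buffering the whole object."""
    digest = hashlib.sha256()
    total = 0
    for chunk in chunks:
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()


# Hypothetical stand-in for a streaming reader: eight 1 KiB chunks
fake_stream = (bytes([i]) * 1024 for i in range(8))

total_bytes, checksum = consume_stream(fake_stream)
```

If the reader does yield chunks when iterated, the same helper could be called as `consume_stream(reader)`; memory use then stays bounded by the chunk size rather than the object size.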

Selecting an effective blob-threshold for prefetch

The ideal --blob-threshold depends on your cluster (CPU, disks, network), backend (S3/GCS/…​), and object size distribution.
Running full prefetch experiments for many candidate values can easily take hours. Instead, we recommend running a much shorter single‑object blob-download benchmark to pick a good starting point, and then using that value directly in your prefetch job.

To do this in practice, compare cold GET vs. blob-download on a single object:

  1. Pick a representative large remote object in your bucket (for example, a model shard or big archive).

  2. Evict it from AIStore to ensure a cold path:

    $ ais evict s3://my-bucket --list "large-model.bin"

  3. Measure cold GET time for that object:

    $ time ais get s3://my-bucket/large-model.bin /dev/null

  4. Evict the object again (the cold GET cached it), then measure blob-download time for the same object:

    $ ais evict s3://my-bucket --list "large-model.bin"

    $ time ais blob-download s3://my-bucket/large-model.bin --wait

  5. Repeat the above for a few object sizes (for example: 64 MiB, 256 MiB, 1 GiB, 4 GiB) until you see a pattern:

    • Below some size, cold GET is as fast or faster (blob overhead dominates).
    • Above that size, blob-download is consistently faster.

The crossover size where blob-download wins is your blob-threshold for prefetch: use that size as --blob-threshold when you run your real ais prefetch job. This single‑object comparison gives you a quick, reasonable approximation.
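Once the timings are collected, picking the crossover can be automated. The following is a minimal sketch under stated assumptions: `pick_blob_threshold` is a hypothetical helper, and the timings in `measurements` are invented purely to illustrate the shape of the data; they are not benchmark results.

```python
MiB = 1024 ** 2


def pick_blob_threshold(measurements):
    """Return the smallest size at and above which blob-download always wins.

    `measurements` is a list of
    (size_bytes, cold_get_seconds, blob_download_seconds) tuples.
    Returns None if blob-download does not win at the largest size.
    """
    threshold = None
    # Walk from the largest object down; stop at the first size where
    # cold GET is as fast or faster.
    for size, cold_s, blob_s in sorted(measurements, reverse=True):
        if blob_s < cold_s:
            threshold = size
        else:
            break
    return threshold


# Invented, illustrative timings (seconds) for four object sizes
measurements = [
    (64 * MiB, 0.9, 1.4),
    (256 * MiB, 3.1, 2.6),
    (1024 * MiB, 12.0, 6.5),
    (4096 * MiB, 47.0, 14.0),
]

threshold = pick_blob_threshold(measurements)
```

For these made-up numbers the helper returns 256 MiB, which would then be passed to prefetch as `--blob-threshold 256MiB`. Averaging several runs per size before feeding them in makes the result less sensitive to network noise.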

In our internal 1.56 TiB S3 benchmark, applying this method led us to a threshold of about 256 MiB. This value provided the best trade‑off for that specific cluster and workload and delivered roughly 2.3× faster end‑to‑end prefetch compared to a pure cold‑GET baseline.