GetBatch: Distributed Multi-Object Retrieval
GetBatch: Distributed Multi-Object Retrieval
GetBatch: Distributed Multi-Object Retrieval
Paper: GetBatch: Distributed Multi-Object Retrieval for ML Data Loading - implementation, benchmarks, and discussion.
GetBatch is AIStore’s high-performance API for retrieving multiple objects and/or archived files in a single request. Behind the scenes, GetBatch assembles the requested data from across the cluster and delivers the result as a continuous serialized stream.
Regardless of retrieval source (in-cluster objects, remote objects, or shard extractions), GetBatch always preserves the exact ordering of request entries in both streaming and buffered modes.
Other supported capabilities include:
GetBatch is implemented by the eXtended Action (job). Internally, the job is codenamed
x-moss. The respective API endpoint is:/v1/ml/moss.
Note: buffered mode always returns both metadata (that describes the output) and the resulting serialized archive containing all requested data items.
Note: for TAR.GZ, both .tgz and .tar.gz are accepted (and interchangeable) aliases - both denote the same exact format.
A Note on HTTP Semantics
GetBatch uses HTTP GET with a JSON request body, which is:
Rest of this document is structured as follows:
Table of Contents
GetBatch is typically called from:
api/ml.go client bindings, andThe respective Go and Python-based usage examples follow below, in the sections that include:
ML Training Pipelines
Distributed Shard Processing
Ordered Batch Retrieval
Whether or not batch sizes can be deemed “small” will ultimately (obviously) depend on the workload and cluster; any specific numbers in this document must be considered a “rule of thumb” rather than direct prescription or limitation of any kind.
apc.MossReqField Details:
apc.MossInEach entry in the in array can specify:
apc.MossRespStreaming mode (strm: true):
Buffered mode (strm: false):
MossResp JSON)Response Entry: apc.MossOut
GetBatch also offers robust integration with Python data pipelines, with official support in both the AIStore Python SDK and third-party libraries:
The Python SDK provides a Batch class that wraps the /v1/ml/moss endpoint with a Pythonic fluent API.
Key features:
MossReq, MossIn, MossOut) that mirror Go structs exactlyThe SDK mirrors Go structures, with minor naming conventions:
This mapping helps translate examples between Go and Python.
Basic usage:
Batch class methods:
add(obj, archpath=None, opaque=None, start=None, length=None) - Add object with advanced parametersget(raw=False, decode_as_stream=False, clear_batch=True) - Execute and return generator of (MossOut, bytes) tuplesclear() - Clear batch for reuseResponse structure:
streaming_get=True, default): Returns TAR stream, MossOut metadata inferred from requeststreaming_get=False): Returns server-validated MossOut with actual sizes, errorsSee Python SDK Batch API for complete documentation.
Lhotse is a speech/audio data toolkit used by NVIDIA NeMo and other frameworks. It includes native AIStore support via AISBatchLoader.
How it works:
AISBatchLoader collects all URLs from a CutSet batchUsage:
See also:
A complete, runnable example for batch loading audios from AIStore with Lhotse is available in here:
Archive extraction example:
Lhotse URLs like ais://mybucket/shard-0000.tar.gz/audio/sample_42.wav automatically:
shard-0000.tar.gz) + archpath (audio/sample_42.wav)archpath parameterArchitecture:
Key benefits:
See also:
NVIDIA NeMo is an end-to-end platform for building and training state-of-the-art AI models, including Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS).
GetBatch is now integrated into the Lhotse-based ASR dataloader. Instead of fetching individual audio files from AIStore during training, the dataloader now batches all required samples for an epoch or mini-batch into a single GetBatch request. This reduces network overhead and improves data loading throughput for large-scale ASR training.
Enabling GetBatch:
A single environment variable activates batch loading for ASR training pipelines using Lhotse+AIStore. No code changes required.
How it works:
This same pattern can be integrated in other NeMo training pipelines (NLP, TTS, multimodal) where datasets are stored in AIStore, providing similar performance benefits for data-intensive workloads.
See also:
The AIStore PyTorch Plugin provides AISBatchIterDataset, an iterable-style dataset that uses GetBatch API for efficient multi-worker data loading with automatic batching and streaming support.
Note: curl examples in this section are purely illustrative - do not copy/paste.
Result: batch.tar containing:
Result: extracted.tar containing:
The request bucket (
default-bucketin URL) is used when bucket is omitted in an in entry.
Result: TAR containing objects from three different buckets in one request.
Response metadata shows which items failed:
TAR contains:
Note: Actual throughput will vary significantly based on object sizes, network topology, storage backend, CPU capability, and cluster configuration. Numbers below are purely indicative ranges rather than guaranteed performance targets.
Note: Archived file extraction from compressed formats is CPU-intensive. A single target extracting from 1000s of TAR/ZIP shards will see significant CPU load.
First-byte latency: 50-500ms (can vary based on cluster size and load)
Streaming throughput: Wire-speed after first byte
When the same TAR shard appears in many batches — typical of sharded ML training datasets — linear scanning of the archive for each requested archpath entry becomes the dominant cost. The shard index subsystem builds and persists a per-shard index of offsets/lengths so that lom.NewArchpathReader can seek directly to the requested file.
GetBatch consults the index transparently on the read path (xact/xs/moss.go calls NewArchpathReader, which uses the fast path when an index exists). To prepare a bucket:
ais shard-index create <bucket> builds indexes for existing shards.ais://.sys-shardidx system bucket and are removed automatically when the underlying shard object is deleted.For batches that fan out across many archived files in different shards, this changes per-file extraction from O(archive size) to O(1) + read.
Memory: Bounded by DT capacity + load-based throttling
CPU: Varies by workload
coer: false)Behavior: First error aborts entire request
Use when:
Error response: HTTP 4xx/5xx, no partial data
coer: true) ✅ RecommendedBehavior: Continue processing, mark missing items
Use when:
Missing items:
__404__/bucket/object with size=0err_msg describing failure__404__/Soft error limit: Configurable per work item (default: 6)
get_batch.max_soft_errs (default: 6)Maximum transient errors per work item before aborting.
When to increase:
When to decrease:
get_batch.warmup_workers (default: 2)Pagecache warming pool size (best-effort read-ahead).
When to increase:
When to disable (set to -1):
To disable warmup/look-ahead operation:
GetBatch supports multiple archive formats via the mime field:
Recommendation: Use .tar for maximum throughput unless network bandwidth is constrained.
onob: false)Files in output TAR include bucket prefix:
onob: true)Files in output TAR omit bucket:
When extracting from shards with archpath:
Default (onob: false):
Object-only (onob: true):
GetBatch exposes Prometheus metrics for:
See: Monitoring GetBatch for detailed metrics, PromQL queries, and operational guidance.
CLI wll render
CtlMsgoutput on multiple lines when it includes multiple aggregated messages.
GetBatch works on in-cluster data: sharded (with shards of any kind — TAR, TAR.GZ, TAR.LZ4, ZIP), or plain monolithic objects, or chunked objects. As long as the requested data is stored on the cluster’s target disks, GetBatch will assemble and return it.
For TAR shards, GetBatch reads files via the shard index fast path when an index has been built for the shard — extracting an archpath entry becomes direct random access into the archive instead of a linear scan. The index is persisted in the ais://.sys-shardidx system bucket and is consulted transparently by both GET and GetBatch.
Remote (cold) GET is not supported.
When GetBatch is invoked on a remote bucket (s3://, gs://, az://, oc://) and one or more requested objects are not already stored on any target’s disks, GetBatch will not fetch them from the backend. Instead, each missing object is recorded as a soft-error placeholder (__404__/<objname> in the output archive), and the request as a whole may fail when the soft-error budget (max_soft_errs) is exceeded. The requested objects must be pre-loaded into the cluster (e.g., via ais bucket prefetch, an explicit ais get, or a separate warmup pass) before issuing GetBatch.
Cluster membership changes are disruptive to in-flight requests. Any event that mutates the cluster map (target restart, pod reschedule, scale-up, scale-down, primary proxy failover) will:
reason: starting x-rebalance[...] and surface as ErrNotFound: (prep-rx not done?) on the client.context deadline exceeded for several minutes.Clients should retry GetBatch after a brief backoff once the rebalance completes. For long-running training pipelines, treat GetBatch failures during/after a membership event as expected and rely on the client-side retry path.
Range reads not yet implemented.
Per-entry start/length fields in MossIn are reserved for future use; supplying them currently returns ErrNotImpl.
Shard extraction is sequential within each archive. When multiple files are requested from the same shard, they are extracted in sequence rather than in parallel. This is a performance limitation, not a correctness issue — and is largely mitigated when the shard index is enabled, which converts each per-file lookup from a linear scan into direct random access.