AIS Loader (aisloader)
AIS Loader (aisloader) is a tool to measure storage performance. It is a load generator that we constantly use to benchmark and stress-test AIStore or any S3-compatible backend.
In fact, aisloader can list, write, and read S3(**) buckets directly, which makes it a useful, convenient, and easy-to-use benchmark for comparing storage performance with and without AIStore in front of S3.
(**) aisloader can easily be extended further to work directly with any Cloud storage provider including, but not limited to, AIStore-supported GCP, OCI, and Azure.
In addition, aisloader generates synthetic workloads that mimic training and inference workloads. This capability makes it possible to run benchmarks in isolation (which is often preferable), avoiding compute-side bottlenecks (if any) and the associated complexity.
A large set of command-line switches makes it possible to realize almost any conceivable workload, with basic permutations always including:
Detailed protocol-level tracing statistics are also available; see the HTTP tracing section below for a brief introduction.
October 2025 update: aisloader retains its StatsD integration for operational and benchmarking use cases. While AIStore itself now uses Prometheus exclusively, aisloader (for now) continues to emit its runtime metrics via StatsD — allowing DevOps to collect and visualize performance data from large aisloader fleets.
To integrate aisloader with Prometheus-based observability stacks, run the official Prometheus StatsD Exporter (which would translate aisloader’s StatsD metrics into Prometheus format).
You can install aisloader using the following Bash script:
Alternatively, you can also build the tool directly from the source:
For usage, run: aisloader, aisloader usage, or aisloader --help.
For usage examples and extended commentary, see also:
This section presents two alternative, intentionally redundant views for usability: a concise alphabetical quick reference for fast lookups, and a grouped-by-category presentation with explanations and examples for deeper understanding.
For the most recently updated command-line options and examples, please run aisloader or aisloader usage.
Option categories, by parameter group: clusterParams, bucketParams, workloadParams, sizeCksumParams, archParams, namingParams, readParams, multipartParams, mpdStreamParams, etlParams, loaderParams, statsParams, miscParams.

The loads can run for a given period of time (option -duration <duration>) or until the specified amount of data is generated (option -totalputsize=<total size in KBs>).
If both options are provided, the test finishes on a whichever-comes-first basis.
Example 100% write into the bucket “abc” for 2 hours:
The above will run for two hours or until it writes around 4GB of data into the bucket, whichever comes first.
You can choose a percentage of writing (versus reading) by setting the option -pctput=<put percentage>.
Example with a mixed PUT=30% and GET=70% load:
Example 100% PUT:
The duration in both examples above is set to 5 minutes.
To test 100% read (-pctput=0), make sure to fill the bucket beforehand.
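As a rough illustration of what a percentage-based PUT/GET mix means, the sketch below picks an operation at random for each request. It is not aisloader's implementation; the function name `choose_op` and the use of a uniform random draw are assumptions made for this example only.

```python
import random


def choose_op(pctput: int, rng: random.Random) -> str:
    """Return "PUT" with probability pctput/100, otherwise "GET".

    Hypothetical helper: it only illustrates the idea behind -pctput.
    """
    if not 0 <= pctput <= 100:
        raise ValueError("pctput must be between 0 and 100")
    # randrange(100) is uniform over 0..99, so exactly pctput of the
    # 100 possible outcomes select PUT.
    return "PUT" if rng.randrange(100) < pctput else "GET"


rng = random.Random(42)
ops = [choose_op(30, rng) for _ in range(10_000)]
print(ops.count("PUT") / len(ops))  # roughly 0.3 for a large sample
```

With `pctput=0` every request is a GET, which is why the bucket must already contain data.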
The loader can read the entire object (default) or a range of object bytes.
To set the offset and length to read, use the options -readoff=<read offset (in bytes)> and -readlen=<length to read (in bytes)>.
For convenience, both options support size suffixes: k - for KiB, m - for MiB, and g - for GiB.
Example that reads a 32MiB segment at 1KB offset from each object stored in the bucket “abc”:
The test (above) will run for 5 minutes and will not “cleanup” after itself (next section).
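For reference, an offset/length pair corresponds to an inclusive byte range in standard HTTP range-request syntax. The sketch below shows that arithmetic for the 32MiB-at-1KiB example; the helper name `http_range` is hypothetical, and whether aisloader emits exactly this header is an assumption.

```python
def http_range(readoff: int, readlen: int) -> str:
    """Build an HTTP Range header value for `readlen` bytes at `readoff`.

    HTTP byte ranges are inclusive on both ends, hence the "- 1".
    """
    if readoff < 0 or readlen <= 0:
        raise ValueError("readoff must be >= 0 and readlen must be > 0")
    return f"bytes={readoff}-{readoff + readlen - 1}"


# 32 MiB segment starting at a 1 KiB offset
print(http_range(1024, 32 * 1024 * 1024))  # bytes=1024-33555455
```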
NOTE: -cleanup is a mandatory option defining whether to destroy bucket upon completion of the benchmark.
The option must be specified in the command line.
Example:
The first line in the example above fills the bucket “abc” with 16MiB of random data. The second uses the existing data to test read performance for 1 hour, and then removes all data.
If you just need to clean up old data prior to running a test, run the loader with 0 (zero) total put size and zero duration:
For the PUT workload the loader generates randomly-filled objects. But what about object sizing?
By default, object sizes are also selected randomly, in the range between 1MiB and 1GiB. To set preferred (or fixed) object size(s), use the options -minsize=<minimal object size in KiB> and -maxsize=<maximum object size in KiB>.
Before starting a test, it is possible to set mirror or EC properties on a bucket.
For background on local mirroring and erasure coding (EC), please see storage services.
To achieve that, use the option -bprops. For example:
The above example shows the values that are globally default. You can omit the defaults and specify only those values that you’d want to change. For instance, to enable erasure coding on the bucket “abc”:
This example sets the number of data and parity slices to 2 which, in turn, requires the cluster to have at least 5 target nodes: 2 for data slices, 2 for parity slices and one for the original object.
Once erasure coding is enabled, its properties data_slices and parity_slices cannot be changed on the fly.
Note that (n data_slices, m parity_slices) erasure coding requires at least (n + m + 1) target nodes in a cluster.
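The node-count rule above is simple arithmetic and easy to check before starting a benchmark. A minimal sketch follows; the function names are hypothetical and not part of any AIStore API.

```python
def min_targets_for_ec(data_slices: int, parity_slices: int) -> int:
    """Minimum number of target nodes for (n data, m parity) erasure coding.

    One target per data slice, one per parity slice, plus one for the
    original object: n + m + 1.
    """
    if data_slices < 1 or parity_slices < 1:
        raise ValueError("slice counts must be positive")
    return data_slices + parity_slices + 1


def can_enable_ec(num_targets: int, data_slices: int, parity_slices: int) -> bool:
    """True if a cluster with `num_targets` targets satisfies the rule."""
    return num_targets >= min_targets_for_ec(data_slices, parity_slices)


print(min_targets_for_ec(2, 2))  # 5
print(can_enable_ec(4, 2, 2))    # False
```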
Even though erasure coding and/or mirroring can be enabled/disabled and otherwise reconfigured at any point in time, specifically for the purposes of running benchmarks it is generally recommended to do it once prior to writing any data to the bucket in question.
The following sequence populates a bucket configured for both local mirroring and erasure coding, and then reads from it for 1h:
Parameters in aisloader that represent a number of bytes can be specified with a multiplicative suffix.
For example: 8M would specify 8 MiB.
The following multiplicative suffixes are supported: ‘t’ or ‘T’ - TiB, ‘g’ or ‘G’ - GiB, ‘m’ or ‘M’ - MiB, ‘k’ or ‘K’ - KiB.
Note that the suffix is entirely optional, and therefore an input such as 300 will be interpreted as 300 bytes.
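The suffix rules above can be sketched in a few lines. This is an illustrative parser only, written from the description in this section; aisloader's actual parser may accept additional forms, and the name `parse_size` is hypothetical.

```python
# Binary multipliers for the single-letter suffixes listed above.
_MULTIPLIERS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text: str) -> int:
    """Convert strings such as "8M" or "300" into a number of bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].lower()
    if suffix in _MULTIPLIERS:
        return int(text[:-1]) * _MULTIPLIERS[suffix]
    # No suffix: the value is already in bytes.
    return int(text)


print(parse_size("8M"))   # 8388608
print(parse_size("300"))  # 300
```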
To state the same slightly differently, cluster endpoint can be defined in two ways:
AIS_ENDPOINT environment variable, universally supported across all AIS clients, e.g.:
In addition, the environment can be used to specify client-side TLS (aka HTTPS) configuration:
See also:
AIStore supports packing many small files into shards (TAR, ZIP, TGZ, LZ4-TAR) to improve performance and reduce metadata overhead.
aisloader can benchmark both archive creation (PUT) and reading individual files from existing shards (GET).
When -arch.pct > 0 (PUT workloads that create shards) or -arch.list is set (read workloads), aisloader:
- [...] (LsArchDir)
- [...] (shard-987.tar/file-042.bin)
- [...] ?archpath= API parameter / MossIn.ArchPath for GetBatch

Without either flag, an archive bucket is treated as a flat object store: GETs and GetBatch fetch whole shards, not inner files.
The displayed statistics will show whether objects are plain or archived, e.g.:
All PUTs create shards; each shard contains 10 files between 1KB and 100KB.
30% of PUT operations create shards; the rest create plain objects.
-arch.list is required for read-only archpath workloads — without it, listing returns shard names only and GET / GetBatch fetch whole shards.
With -arch.list set, aisloader:
- [...] (photos.tar/00042.jpg)
- [...] ?archpath= to retrieve individual files inside shards

Example startup message:
Small-file workloads benefit greatly from sharding: fewer large objects → fewer metadata lookups → higher throughput.
Archived GETs add CPU overhead, especially for compressed formats (.tgz, .tar.lz4).
Throughput vs. operation rate tradeoff:
Example typical comparison:
Direct S3 access (-s3endpoint) does not support archive operations (sharding requires AIStore).

For more information about AIStore’s archive/shard support, see
aisloader can benchmark the MultipartDownloadStream API, which downloads a single object using multiple concurrent HTTP range requests. This improves single-object GET throughput for chunked objects by engaging multiple disks on the server side.
When -pctmpdstream is non-zero, the specified fraction of GET operations will use MultipartDownloadStream instead of regular single-stream GET. The remaining GETs use the standard path. Statistics for MPD stream operations are tracked separately (labeled GET-MPDSTREAM in the output).
100% read using multipart download stream with 32 workers:
Mixed workload — 50% regular GET, 50% MPD stream:
- -s3endpoint (direct S3 access)
- -get-batchsize
- -readoff / -readlen (range reads)

With version 2.1, aisloader can now benchmark Get-Batch operations using the --get-batchsize flag (range: 1-1000). The tool consumes TAR streams (see note below), validates archived file counts, and tracks Get-Batch-specific statistics. The --continue-on-err flag enables testing of soft-error handling behavior.
Supported serialization formats include: .tar (default), .tar.gz, .tar.lz4, and .zip.
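To make "consumes TAR streams and validates archived file counts" concrete, here is a minimal sketch using Python's standard `tarfile` module. It is not aisloader's code: the helper names and the in-memory sample archive are assumptions made for illustration, and the member names simply reuse the shard/file naming pattern shown earlier.

```python
import io
import tarfile


def count_tar_members(stream) -> int:
    """Read a TAR stream sequentially and count the regular files in it."""
    count = 0
    # "r|*" reads a non-seekable stream with transparent compression.
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        for member in tar:
            if member.isfile():
                count += 1
    return count


def make_tar(names) -> bytes:
    """Build a small in-memory TAR (stand-in for a batch response)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# A batch of 64 requested names; validate that all of them came back.
batch = [f"shard-{i:03d}.tar/file-{i:03d}.bin" for i in range(64)]
received = count_tar_members(io.BytesIO(make_tar(batch)))
print(received == len(batch))  # True
```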
The tool uses the [name-getter] abstraction (see https://github.com/NVIDIA/aistore/blob/main/bench/tools/aisloader/namegetter/ng.go) to enable efficient random reads across very large collections: objects and archived files.
The --epochs N flag enables full-dataset read passes, with different algorithms selected automatically based on dataset size:
PermAffinePrime: For datasets larger than 100k (by default) objects, an affine transformation with prime modulus provides memory-efficient pseudo-random access without storing full permutations. The algorithm fills batch requests completely and may span epoch boundaries.
PermShuffle: For datasets up to (default) 100k objects, Fisher-Yates shuffle with uint32 indices (50% memory reduction compared to previous implementation).
Selection Logic:
Command-line override to set the size threshold (instead of the default 100k): --perm-shuffle-max flag.
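The two strategies described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the name-getter package: the function names, the trial-division prime search, and the way `a` and `b` are drawn are all choices made for this example. The affine variant maps each index i to (a*i + b) mod p for a prime p >= n and skips results that fall outside the dataset, so each object is visited exactly once per epoch without storing a permutation.

```python
import math
import random
from array import array


def next_prime(n: int) -> int:
    """Smallest prime >= n (trial division; adequate for a sketch)."""
    candidate = max(n, 2)
    while True:
        if all(candidate % d for d in range(2, math.isqrt(candidate) + 1)):
            return candidate
        candidate += 1


def affine_prime_epoch(n: int, rng: random.Random):
    """Yield 0..n-1 in pseudo-random order using O(1) memory."""
    p = next_prime(n)
    a = rng.randrange(1, p)  # non-zero multiplier: a bijection modulo prime p
    b = rng.randrange(0, p)
    for i in range(p):
        x = (a * i + b) % p
        if x < n:            # skip the few values in [n, p)
            yield x


def shuffle_epoch(n: int, rng: random.Random) -> array:
    """Fisher-Yates shuffle over a compact array of unsigned ints."""
    perm = array("I", range(n))  # "I" is typically a 32-bit unsigned int
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i + 1)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def epoch_order(n: int, rng: random.Random, shuffle_max: int = 100_000):
    """Pick a strategy by dataset size (threshold is an assumed parameter)."""
    if n <= shuffle_max:
        return iter(shuffle_epoch(n, rng))
    return affine_prime_epoch(n, rng)


print(list(epoch_order(10, random.Random(0))))
```

The trade-off is memory versus randomness quality: the shuffle stores one index per object, while the affine walk stores only three integers but produces a more regular visiting pattern.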
For the most recently updated command-line options and examples, please run aisloader or aisloader usage.
1. Create a 10-seconds load of 50% PUT and 50% GET requests:
2. Time-based 100% PUT into ais bucket. Upon exit the bucket is destroyed:
3. Timed (for 1h) 100% GET from a Cloud bucket, no cleanup:
4. Mixed 30%/70% PUT and GET of variable-size objects to/from a Cloud bucket. PUT will generate random object names and is limited by the 10GB total size. Cleanup enabled - upon completion all generated objects and the bucket itself will be deleted:
5. PUT 1GB total into an ais bucket with cleanup disabled, object size = 1MB, duration unlimited:
6. 100% GET from an ais bucket:
7. PUT 2000 objects named as aisloader/hex({0..2000}{loaderid}):
8. Use random object names and loaderID to report statistics:
9. PUT objects with random name generation being based on the specified loaderID and the total number of concurrent aisloaders:
10. Same as above except that loaderID is computed by the aisloader as hash(loaderstring) & 0xff:
11. Print loaderID and exit (all 3 examples below) with the resulting loaderID shown on the right:
12. Destroy existing ais bucket. If the bucket is Cloud-based, delete all objects:
13. Generate load on a cluster listening on custom IP address and port:
14. Generate load on a cluster listening on custom IP address and port from environment variable:
15. Use HTTPS when connecting to a cluster:
16. PUT TAR files with random files inside into a cluster:
17. Generate load on tar2tf ETL. New ETL is started and then stopped at the end. TAR files are PUT to the cluster. Only available when cluster is deployed on Kubernetes.
18. Timed 100% GET directly from S3 bucket (notice ‘-s3endpoint’ command line):
19. PUT approx. 8000 files into s3 bucket directly, skip printing usage and defaults. Similar to the previous example, aisloader goes directly to a given S3 endpoint (‘-s3endpoint’), and AIStore is not being used:
20. Generate a list of object names (once), and then run aisloader without executing list-objects:
21. GetBatch example: read random batches each consisting of 64 archived files
Collecting is easy: aisloader supports at-runtime monitoring via Graphite using StatsD.
When starting up, aisloader will try to connect to the provided StatsD server (see: statsdip and statsdport options). Once the connection is established, the statistics from aisloader are sent in the following format:

- metric_type - can be: gauge, timer, counter
- hostname - is the hostname of the machine on which the loader is run
- loaderid - see: -loaderid option
- metric - can be: latency.*, get.*, put.*

Grafana helps visualize the collected statistics. It is convenient to use and provides numerous tools to measure and calculate different metrics.
We provide a simple script that sets up the Graphite and Grafana servers, each running inside a separate Docker container. To add new dashboards and panels, please follow: grafana tutorial.
When selecting a series in panel view, it should be in the format: stats.aisloader.<loader>.*.
Remember that metrics will not be visible (and you will not be able to select
them) until you start the loader.
Following is a brief illustrated sequence to enable detailed tracing, capture statistics, and toggle tracing on/off at runtime.
IMPORTANT NOTE:
The amount of generated (and extremely detailed) metrics can put a strain on your StatsD server. That’s exactly the reason for the runtime switch to toggle HTTP tracing on/off. The example below shows how to do it (in particular, see kill -HUP).

- [...] netcat listening on the default StatsD port 8125:
- [...] SIGHUP:

Note that other than --trace-http, all command-line options in this section are used for purely illustrative purposes.
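As an alternative to eyeballing raw netcat output, the datagrams can be decoded with a few lines of code. The sketch below parses the generic StatsD line format (`name:value|type`, where type is `c`, `g`, or `ms`); the sample metric name is hypothetical and is not taken from aisloader's actual output, and the `listen` helper is an assumed convenience that is defined but not invoked here.

```python
import socket

# Generic StatsD type codes and their conventional meanings.
STATSD_TYPES = {"c": "counter", "g": "gauge", "ms": "timer"}


def parse_statsd_line(line: str):
    """Split one StatsD line into (name, value, metric_type)."""
    name, _, rest = line.strip().partition(":")
    value, _, kind = rest.partition("|")
    kind = kind.split("|", 1)[0]  # drop an optional sample rate, e.g. "|@0.1"
    return name, float(value), STATSD_TYPES.get(kind, kind)


def listen(host: str = "127.0.0.1", port: int = 8125, max_packets: int = 10):
    """Print parsed metrics from UDP datagrams (8125 is the StatsD default)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        for _ in range(max_packets):
            payload, _addr = sock.recvfrom(65535)
            for line in payload.decode().splitlines():
                if line:
                    print(parse_statsd_line(line))


# Hypothetical metric name, for illustration only.
print(parse_statsd_line("aisloader.host-1.put.latency:12.5|ms"))
```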
For benchmarking production-level clusters, a single aisloader instance may not be able to generate enough load to saturate the cluster. In this case, multiple aisloader instances can be coordinated via the AISLoader Composer. See the README for setup instructions.
For AIS observability (including CLI, Prometheus, and Kubernetes integration), please see:
For StatsD compliant backends, see:
Finally, for another supported - and alternative to StatsD - monitoring via Prometheus integration, see: