AIStore & Amazon S3 Compatibility


AIStore (AIS) is a lightweight, distributed object storage system designed to scale linearly with each added storage node. It provides a uniform API across various storage backends while maintaining high performance for AI/ML and data analytics workloads.

AIS integrates with Amazon S3 on three fronts:

  1. Backend storage – via the backend provider abstraction, AIStore can be used to access (and cache or reliably store in-cluster) a remote cloud bucket such as s3://my-bucket (aws:// is accepted as an alias). This provides seamless access to existing S3 data.

  2. Front‑end compatibility – every gateway speaks the S3 REST API. The default endpoint is http(s)://gw-host:port/s3, but you can enable the S3-API-via-Root feature flag to serve requests at the cluster root (http(s)://gw-host:port/). The same API works uniformly across all bucket types—native ais://, cloud‑backed s3://, gs://, and more.

  3. Presigned request offload – AIS can receive a presigned S3 URL, execute it, and store the resulting object in the cluster. This lets you leverage S3’s authentication while using AIS for storage.


Which interface should I use?

AIS exposes a pure S3 surface for seamless compatibility and a native API for advanced, cluster‑aware workloads. The table below helps decide which path fits your scenario:

| Use the S3 compatibility API when you… | Use the native AIS API / CLI when you… |
| --- | --- |
| Need drop-in support for unmodified S3 tools & SDKs (`aws`, `boto3`, `s3cmd`, …) | Want cluster-wide batch jobs (`ais etl`, `ais prefetch`, `ais copy`, `ais archive`, …) |
| Rely on an existing S3-centric workflow or third-party app | Need fine-grained control-plane ops (`ais cluster`, `ais bucket props set`, node lifecycle) |
| Accept MD5-based ETag semantics (even though MD5 is slower and not crypto-secure) | Value AIS-native features: virtual directories, adaptive rate-limiting, WebSocket ETL, streaming cold-GET, etc. |
| Accept that some S3 features (CORS, website hosting, CloudFront) are not yet implemented | Care about advanced list-objects options (listing shards), remote clusters, and non-S3 buckets |
| Are okay with the slight performance overhead of the S3-to-AIS adaptation layer (MD5 hashing, XML marshaling) | Want full Prometheus visibility with AIS-rich metrics & labels |

Environment assumption – Local Playground

The CLI examples below use localhost:8080, which is the default endpoint when running AIS in the Local Playground.

For other deployment modes (including Kubernetes Playground, Docker Compose, bare-metal cluster, or Kubernetes for production deployments) — replace the host:port with any AIS gateway endpoint.

See Deployment Options and the main project Features list for a broader overview.
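Before configuring any client, it can help to confirm that a gateway is reachable. A minimal sketch, assuming the Local Playground endpoint and AIS's standard `/v1/health` liveness path:

```shell
AIS_ENDPOINT="http://localhost:8080"

# Liveness probe; prints a message either way, so the snippet is safe to paste
curl -fsS "$AIS_ENDPOINT/v1/health" >/dev/null \
  && echo "gateway is up" \
  || echo "gateway unreachable"
```

For other deployments, substitute any gateway's host:port for `localhost:8080`.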


Quick Start

Quick start with aws CLI

```shell
AWS_EP=http://localhost:8080/s3
aws --endpoint-url "$AWS_EP" s3 mb s3://demo
aws --endpoint-url "$AWS_EP" s3 cp README.md s3://demo/
aws --endpoint-url "$AWS_EP" s3 ls s3://demo
# Expected output:
# 2023-05-14 14:25      10493 s3://demo/README.md
```

Quick start with s3cmd

One‑liner (HTTP):

```shell
s3cmd put README.md s3://demo \
  --no-ssl \
  --host=localhost:8080/s3 \
  --host-bucket="localhost:8080/s3/%(bucket)"

# Expected output:
# upload: 'README.md' -> 's3://demo/README.md'  [1 of 1]
#  10493 of 10493   100% in    0s     4.20 MB/s  done
```

Tip: use a cluster-specific .s3cfg so you can drop the --host* flags. See the Example .s3cfg section below.


Configuring Clients

Finding the AIS endpoint

Choose any gateway’s host:port and append /s3, e.g. 10.10.0.1:51080/s3. All gateways accept reads and writes, so you can connect to any of them.

In fact, AIS gateways are completely equivalent, API-wise.


Checksum considerations

Amazon S3’s ETag is MD5 (or a multipart hash); AIS defaults to xxhash for better performance. To avoid client mismatch warnings, set MD5 per bucket:

```shell
ais bucket props set ais://demo checksum.type=md5   # per bucket
# OR cluster-wide default:
ais config cluster checksum.type=md5
```

Setting the checksum type to MD5 ensures compatibility with S3 clients that validate checksums, though it comes with a minor performance cost compared to xxhash.


HTTPS vs HTTP

By default, AIS uses HTTP, while many S3 clients expect HTTPS. Either enable TLS in ais.json (net.http.use_https=true), or tell the client to use plain HTTP (e.g., s3cmd --no-ssl); with a self-signed certificate, skip verification (--insecure and similar flags).
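For example (a sketch assuming the Local Playground endpoint; `--no-verify-ssl` and `--no-check-certificate` are the standard aws/s3cmd flags for skipping certificate verification):

```shell
AWS_EP_HTTP="http://localhost:8080/s3"
AWS_EP_HTTPS="https://localhost:8080/s3"

# Plain HTTP (the AIS default):
aws --endpoint-url "$AWS_EP_HTTP" s3 ls
s3cmd ls --no-ssl --host=localhost:8080/s3 --host-bucket="localhost:8080/s3/%(bucket)"

# HTTPS with a self-signed certificate, skipping verification:
aws --endpoint-url "$AWS_EP_HTTPS" --no-verify-ssl s3 ls
s3cmd ls --no-check-certificate --host=localhost:8080/s3 --host-bucket="localhost:8080/s3/%(bucket)"
```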


Using s3cmd with AIS

Interactive s3cmd --configure transcript

```console
$ s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Access Key: FAKEKEY
Secret Key: FAKESECRET
Default Region [US]:
Use HTTPS protocol [Yes]: n
HTTP Proxy server name:
New settings:
Access Key: FAKEKEY
Secret Key: FAKESECRET
Region: US
Use HTTPS protocol: False
HTTP Proxy server name:
Test access with supplied credentials? [Y/n] y
Please wait, attempting to list buckets...
```

During the test, enter your AIS endpoint when prompted:

```console
Hostname: localhost:8080/s3
Bucket host: localhost:8080/s3/%(bucket)
Success. Your access key and secret key worked fine :-)
...
Save settings? [y/N] y
Configuration saved to ~/.s3cfg
```

Example .s3cfg

Edit your ~/.s3cfg file to include these lines (replace with your actual gateway endpoint):

```ini
host_base = 10.10.0.1:51080/s3
host_bucket = 10.10.0.1:51080/s3/%(bucket)
access_key = FAKEKEY
secret_key = FAKESECRET
signature_v2 = False
use_https = False
```

This configuration allows you to run s3cmd commands without having to specify the host parameters each time.
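With the .s3cfg in place, the earlier one-liner reduces to plain s3cmd commands, e.g.:

```shell
s3cmd put README.md s3://demo
s3cmd ls s3://demo
s3cmd get s3://demo/README.md ./README.copy.md
```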

Multipart uploads with s3cmd

For large files, use multipart uploads to improve reliability and performance:

```shell
s3cmd put ./large.bin s3://demo --multipart-chunk-size-mb=8
```

The optimal chunk size depends on your network conditions and file size, but 8-16MB chunks work well for most cases.

Authentication (JWT) tips

When AIStore Authentication is enabled, each request must include a JWT Bearer token in the Authorization header. However, s3cmd’s built-in AWS signer overwrites any --add-header values, so you need to patch the client directly.

Edit the sign() method of the S3Request class in s3cmd/S3/S3.py.

Add the following line to override the Authorization header:

```diff
diff --git a/S3/S3.py b/S3/S3.py
index 26c516f..1a0d5e7 100644
--- a/S3/S3.py
+++ b/S3/S3.py
@@ -199,6 +199,7 @@ class S3Request(object):
         ## Sign the data.
         self.headers = sign_request_v4(self.method_string, hostname, resource_uri, self.params,
                                        bucket_region, self.headers, self.body)
+        self.headers["Authorization"] = "Bearer <TOKEN>"

     def get_triplet(self):
         self.update_timestamp()
```

Replace <TOKEN> with your actual JWT token. This modification ensures the token is included in every request.


Supported Operations

PUT / GET / HEAD

Regular verbs work with aws, s3cmd, or the native ais CLI:

```shell
# Using aws CLI
aws --endpoint-url "$AWS_EP" s3api put-object --bucket demo --key obj --body file.txt
aws --endpoint-url "$AWS_EP" s3api get-object --bucket demo --key obj output.txt
aws --endpoint-url "$AWS_EP" s3api head-object --bucket demo --key obj
# Output:
# {
#     "AcceptRanges": "bytes",
#     "LastModified": "Wed, 14 May 2025 14:30:22 GMT",
#     "ContentLength": 1024,
#     "ETag": "\"a1b2c3d4e5f60718293a4b5c6d7e8f90\"",
#     "ContentType": "text/plain"
# }

# Native AIS CLI equivalents
ais put file.txt ais://demo/obj
ais get ais://demo/obj output.txt
ais object show ais://demo/obj
```

Range reads

S3 API supports byte range requests for partial object downloads:

```shell
aws s3api get-object \
  --range bytes=0-99 \
  --bucket demo --key README.md \
  part.txt \
  --endpoint-url "$AWS_EP"
# Output:
# {
#     "AcceptRanges": "bytes",
#     "LastModified": "Wed, 14 May 2025 14:30:22 GMT",
#     "ContentRange": "bytes 0-99/10493",
#     "ContentLength": 100,
#     "ETag": "\"a1b2c3d4e5f60718293a4b5c6d7e8f90\"",
#     "ContentType": "text/plain"
# }
```

This would download only the first 100 bytes of the file.
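Since every gateway speaks plain HTTP(S), the same range read also works with curl against the S3 endpoint. A sketch, assuming the Local Playground endpoint and an existing s3://demo/README.md:

```shell
EP="http://localhost:8080/s3"

# Fetch only the first 100 bytes via a standard HTTP Range header
curl -s --max-time 5 -H 'Range: bytes=0-99' "$EP/demo/README.md" -o part.txt
```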


Multipart uploads with aws CLI

```shell
# Step 1: initiate
aws s3api create-multipart-upload --bucket demo --key big \
  --endpoint-url "$AWS_EP"
# Output:
# {
#     "Bucket": "demo",
#     "Key": "big",
#     "UploadId": "xu3DvVzJK"
# }

# Step 2: upload parts individually
aws s3api upload-part --bucket demo --key big \
  --part-number 1 --body part1.bin \
  --upload-id "YOUR-UPLOAD-ID" \
  --endpoint-url "$AWS_EP"
# Repeat for each part, incrementing the part number
# Output:
# {
#     "ETag": "\"a1b2c3d4e5f60718293a4b5c6d7e8f90\""
# }

# Step 3: complete (requires a JSON file listing all parts)
aws s3api complete-multipart-upload --bucket demo --key big \
  --upload-id "YOUR-UPLOAD-ID" --multipart-upload file://parts.json \
  --endpoint-url "$AWS_EP"
```

Example parts.json:

```json
{
  "Parts": [
    {
      "PartNumber": 1,
      "ETag": "etag-from-upload-part-response"
    },
    {
      "PartNumber": 2,
      "ETag": "etag-from-upload-part-response"
    }
  ]
}
```

Presigned S3 requests

Presigned URLs allow temporary access to objects without sharing credentials:

  1. Enable the feature:

     ```shell
     ais config cluster features S3-Presigned-Request
     ```

  2. Generate a URL (typically done on the system with AWS credentials):

     ```shell
     aws s3 presign s3://demo/README.md --endpoint-url https://s3.amazonaws.com
     # Returns a URL with authentication parameters in the query string
     ```

  3. Replace the host with AIS_ENDPOINT/s3 and add a header when using path-style URLs:

     ```shell
     curl -H 'ais-s3-signed-request-style: path' '<PRESIGNED_URL>' -o README.md
     ```

This allows AIS to handle the authenticated S3 request on behalf of the client.


Use Native Bucket Inventory

The older S3-specific inventory integration has been removed in v4.4.

Use native bucket inventory (NBI) for fast, inventory-backed listing of large remote buckets, including (but not limited to) S3 buckets.

Note that S3-compatible clients may request NBI-backed listing via Ais-Bucket-Inventory: true, and optionally select a specific inventory via Ais-Inv-Name.
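For instance, an S3-style list-objects request that opts into inventory-backed listing might look like the following sketch. The header names come from the text above; the endpoint assumes the Local Playground, and `my-inventory` is a hypothetical inventory name:

```shell
EP="http://localhost:8080/s3"

# List bucket "demo" using native bucket inventory, selecting a specific inventory
curl -s --max-time 5 \
  -H 'Ais-Bucket-Inventory: true' \
  -H 'Ais-Inv-Name: my-inventory' \
  "$EP/demo"
```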


Deleting nonexistent object

AIStore does not emulate S3’s silent-success delete semantics. In AIS, deleting a missing object is reported as “not found”, with a single exception: when the bucket has an S3 backend. This does not violate HTTP idempotency: a repeated DELETE still has the same intended effect on server state, even though the response differs.

When the bucket does have an S3 backend and the object is missing, AIS returns whatever the backend gives us, which for S3 is 204 (no error).

Apart from this single exception, returning an error on delete of a nonexistent object is an intentional semantic choice, consistent with the rest of AIS.
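The difference is easy to observe from the CLI. A sketch, assuming a native ais://demo bucket, an S3-backed s3://demo bucket, and AWS_EP set as in Quick Start:

```shell
AWS_EP="http://localhost:8080/s3"

# Native bucket (no S3 backend): reported as "not found"
ais object rm ais://demo/no-such-object

# S3-backed bucket: AIS relays the backend's response, i.e. 204 No Content (silent success)
aws --endpoint-url "$AWS_EP" s3api delete-object --bucket demo --key no-such-object
```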


Compatibility Matrix

| S3 feature | AIS | s3cmd | aws CLI |
| --- | --- | --- | --- |
| Create / destroy bucket | ✓ | `mb` / `rb` | `mb` / `rb` |
| PUT / GET / HEAD object | ✓ | `put` / `get` / `info` | `cp` / `head` |
| Range reads | ✓ |  | `get-object --range` |
| Multipart upload | ✓ | ✓ | ✓ |
| Copy object | S3 API only | partial |  |
| Inventory listing | ✓ |  |  |
| Authentication | JWT | modified |  |
| Presigned URLs | ✓ |  |  |

Not yet supported: Regions, CORS, Website hosting, CloudFront; full ACL parity (AIS uses its own ACL model).


Boto3 Examples

Python applications can use Boto3 (the AWS SDK for Python) to connect to AIStore. Since AIStore implements S3 API compatibility, most standard Boto3 S3 operations work with minimal changes.

Prerequisites

For Boto3 to work with AIStore, you need to patch Boto3’s redirect handling:

```python
# Import the patch before using boto3
from aistore.botocore_patch import botocore
import boto3
```
This patch modifies Boto3’s HTTP client behavior to handle AIStore’s redirect-based load balancing. For details, see the Boto3 compatibility documentation.

Client Initialization

```python
client = boto3.client(
    "s3",
    region_name="us-east-1",                 # Any valid region will work
    endpoint_url="http://localhost:8080/s3", # Your AIStore endpoint
    aws_access_key_id="YOUR_ACCESS_KEY",     # Can be dummy values when
    aws_secret_access_key="YOUR_SECRET_KEY", # AIS auth is disabled
)
```

Basic Bucket Operations

```python
# Create a bucket
client.create_bucket(Bucket="my-bucket")

# List all buckets
response = client.list_buckets()
bucket_names = [bucket["Name"] for bucket in response["Buckets"]]
print(f"Existing buckets: {bucket_names}")

# Delete a bucket
client.delete_bucket(Bucket="my-bucket")
```

Object Operations

```python
# Upload an object
client.put_object(
    Bucket="my-bucket",
    Key="sample.txt",
    Body="Hello, AIStore!",
)

# Download an object
response = client.get_object(Bucket="my-bucket", Key="sample.txt")
content = response["Body"].read().decode("utf-8")
print(f"Object content: {content}")

# Delete an object
client.delete_object(Bucket="my-bucket", Key="sample.txt")
```

Multipart Upload Example

For large files, you can use multipart uploads:

```python
from aistore.botocore_patch import botocore  # import the patch before using boto3
import boto3

# Initialize the client
client = boto3.client(
    "s3",
    region_name="us-east-1",
    endpoint_url="http://localhost:8080/s3",
    aws_access_key_id="FAKEKEY",
    aws_secret_access_key="FAKESECRET",
)

# Create the bucket if it doesn't exist
bucket_name = "multipart-demo"
try:
    client.head_bucket(Bucket=bucket_name)
except client.exceptions.ClientError:
    client.create_bucket(Bucket=bucket_name)

# Prepare data for multipart upload
object_key = "large-file.txt"
data = "x" * 10_000_000  # 10 MB of data
chunk_size = 5_000_000   # 5 MB chunks
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Initiate the multipart upload
response = client.create_multipart_upload(Bucket=bucket_name, Key=object_key)
upload_id = response["UploadId"]

# Upload individual parts
parts = []
for i, chunk in enumerate(chunks):
    part_num = i + 1  # Part numbers start at 1
    response = client.upload_part(
        Body=chunk,
        Bucket=bucket_name,
        Key=object_key,
        PartNumber=part_num,
        UploadId=upload_id,
    )
    parts.append({"PartNumber": part_num, "ETag": response["ETag"]})

# Complete the multipart upload
client.complete_multipart_upload(
    Bucket=bucket_name,
    Key=object_key,
    UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)

# Verify the upload succeeded
response = client.head_object(Bucket=bucket_name, Key=object_key)
print(f"Successfully uploaded {response['ContentLength']} bytes")
```

FAQs & Troubleshooting

| Symptom | Cause & Fix |
| --- | --- |
| MD5 sum mismatch | Set the bucket checksum to MD5: `ais bucket props set BUCKET checksum.type=md5` |
| SSL certificate problem | Self-signed cert ➜ use `--no-ssl` or `--insecure` |
| Presigned URL fails | Add header `ais-s3-signed-request-style: path` for path-style URLs |
| Authorization header missing (s3cmd + AuthN) | Patch `S3.py` to include the JWT as shown in Authentication |
| Upload fails with timeout | Try smaller multipart chunks (e.g., `--multipart-chunk-size-mb=5`) |
| Unable to list large S3 bucket | Use bucket inventory to list contents efficiently |
| Boto3/TensorFlow integration issues | See the Boto3 compatibility patch for redirects |

Further Reading