Copyright © 2026, NVIDIA Corporation.

Archives: read, write, and list


AIStore natively supports four archive/serialization formats across all APIs, batch jobs, and functional extensions: TAR, TGZ (TAR.GZ), TAR.LZ4, and ZIP.

Motivation

Archives address the small-file problem - performance degradation from random access to very large datasets containing many small files.

To quantify “very large” and “small-file”: the datasets we usually see in the field contain 10+ million files, with sizes ranging from 1 KB to 100 KB.

AIStore’s implementation allows unmodified clients and applications to work efficiently with archived datasets.

Key benefits:

  • Improved I/O performance via reduced metadata lookups and network roundtrips
  • Seamless integration with existing, unmodified workflows
  • Implicit dataset backup: each archive acts as a self-contained, immutable copy of the original files

Because each shard is self-contained and immutable, a sharded dataset is also easy to replicate, snapshot, or version without additional tooling.
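To make the small-file argument concrete, here is a minimal sketch that packs many small files into a handful of TAR shards using only Python's standard `tarfile` module. It does not use AIStore or its SDK; the helper names (`pack_shard`, `shard_dataset`), the shard naming scheme, and the shard size are all hypothetical choices made for illustration.

```python
import io
import tarfile


def pack_shard(files: dict[str, bytes]) -> bytes:
    """Serialize a batch of small files into one in-memory TAR shard."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def shard_dataset(files: dict[str, bytes], per_shard: int) -> dict[str, bytes]:
    """Group files into fixed-size batches and emit one TAR shard per batch."""
    names = sorted(files)
    shards = {}
    for start in range(0, len(names), per_shard):
        batch = {n: files[n] for n in names[start:start + per_shard]}
        # Hypothetical naming scheme: shard-0000.tar, shard-0001.tar, ...
        shards[f"shard-{start // per_shard:04d}.tar"] = pack_shard(batch)
    return shards


# 1,000 one-kilobyte samples become 10 objects instead of 1,000.
samples = {f"sample-{i:05d}.txt": b"x" * 1024 for i in range(1000)}
shards = shard_dataset(samples, per_shard=100)
print(len(samples), "files ->", len(shards), "shards")
```

With 100 files per shard, a client reading the whole dataset issues 10 requests rather than 1,000, which is where the reduction in metadata lookups and network roundtrips comes from.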

Supported Formats

  • TAR (.tar) - Unix archive format (since 1979) supporting USTAR, PAX, and GNU TAR variants
  • TGZ (.tgz, .tar.gz) - TAR with gzip compression
  • TAR.LZ4 (.tar.lz4) - TAR with lz4 compression
  • ZIP (.zip) - PKWARE ZIP format (since 1989)
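The extension-to-format mapping above can be expressed as a small lookup. The sketch below is illustrative only: `archive_format` is a hypothetical helper, not an AIStore function, and matching extensions case-insensitively is an assumption of this example rather than documented behavior.

```python
from typing import Optional

# Extension -> format name, copied from the list of supported formats.
_SUFFIXES = (
    (".tar.lz4", "TAR.LZ4"),
    (".tar.gz", "TGZ"),
    (".tgz", "TGZ"),
    (".tar", "TAR"),
    (".zip", "ZIP"),
)


def archive_format(name: str) -> Optional[str]:
    """Return the archive format implied by the name's extension, or None."""
    lowered = name.lower()  # assumption: treat extensions case-insensitively
    for suffix, fmt in _SUFFIXES:
        if lowered.endswith(suffix):
            return fmt
    return None


for example in ("train/shard-0001.tar", "shard.tar.gz", "shard.tgz",
                "shard.tar.lz4", "photos.zip", "notes.txt"):
    print(example, "->", archive_format(example))
```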

Operations

AIStore can natively read, write, append¹, and list archives. Operations include:

  • Regular GET and PUT requests:
    • Go API - see “ArchPath” parameter
    • Python SDK - see the corresponding parameter
    • Python SDK/Archive - see archive-related config
  • get-batch - efficient multi-object/multi-file retrieval
  • list-objects - “opens” archives and includes contained pathnames in results
  • dsort - distributed archive creation and transformation
  • aisloader - benchmarking with archive workloads
  • Concurrent multi-object transactions for bulk archive generation from selected objects

Default format: TAR is the system default when the serialization format is unspecified.
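Two of the operations above, reading a single archived file and listing a bucket with archives "opened", can be emulated locally to show what they mean. This is a sketch built on Python's standard `tarfile` module with a plain dictionary standing in for a bucket. It does not call AIStore; `list_expanded` and `get_archived` are hypothetical names, the `arch_path` argument only mirrors the idea behind the "ArchPath" parameter, and the `object/pathname` form of the listed entries is an assumption of this example.

```python
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an in-memory TAR from a name -> content mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_expanded(objects: dict[str, bytes]) -> list[str]:
    """List object names; for each .tar object, also list contained pathnames."""
    names = []
    for obj_name in sorted(objects):
        names.append(obj_name)
        if obj_name.endswith(".tar"):
            with tarfile.open(fileobj=io.BytesIO(objects[obj_name])) as tar:
                names.extend(
                    f"{obj_name}/{m.name}" for m in tar.getmembers() if m.isfile()
                )
    return names


def get_archived(objects: dict[str, bytes], obj_name: str, arch_path: str) -> bytes:
    """Return one file from inside an archive without unpacking the rest."""
    with tarfile.open(fileobj=io.BytesIO(objects[obj_name])) as tar:
        member = tar.extractfile(arch_path)  # raises KeyError if not found
        if member is None:
            raise KeyError(f"{arch_path} is not a regular file")
        return member.read()


bucket = {
    "readme.txt": b"hello",
    "shard-0000.tar": make_tar({"a/1.txt": b"one", "a/2.txt": b"two"}),
}
for entry in list_expanded(bucket):
    print(entry)
print(get_archived(bucket, "shard-0000.tar", "a/2.txt"))
```

The point of the design is that the client keeps addressing individual files by name while the store holds far fewer, larger objects.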


¹ APPEND is supported for TAR format only. The other formats (ZIP, TGZ, TAR.LZ4) were not designed for true append operations; appending to them can only be emulated by extracting all contents and recreating the archive, which significantly impacts performance.
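The difference between a true append and the extract-all-recreate emulation can be seen with Python's standard `tarfile` module, which itself offers append mode (`"a"`) only for uncompressed TAR. The sketch below is not AIStore's implementation; `append_tar` and `append_tgz` are hypothetical helpers, and the emulated path keeps only member names and contents, not other metadata.

```python
import io
import os
import tarfile
import tempfile


def add_bytes(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    """Write one in-memory file into an open archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def append_tar(path: str, name: str, data: bytes) -> None:
    """True append: the new member is written after the existing ones."""
    with tarfile.open(path, mode="a") as tar:
        add_bytes(tar, name, data)


def append_tgz(path: str, name: str, data: bytes) -> None:
    """Emulated append: read every member, then rewrite the whole archive."""
    with tarfile.open(path, mode="r:gz") as src:
        existing = [
            (m.name, src.extractfile(m).read())
            for m in src.getmembers() if m.isfile()
        ]
    with tarfile.open(path, mode="w:gz") as dst:
        for old_name, payload in existing:
            add_bytes(dst, old_name, payload)  # cost grows with archive size
        add_bytes(dst, name, data)


with tempfile.TemporaryDirectory() as tmp:
    tar_path = os.path.join(tmp, "shard.tar")
    tgz_path = os.path.join(tmp, "shard.tgz")
    with tarfile.open(tar_path, mode="w") as t:
        add_bytes(t, "a.txt", b"a")
    with tarfile.open(tgz_path, mode="w:gz") as t:
        add_bytes(t, "a.txt", b"a")

    append_tar(tar_path, "b.txt", b"b")
    append_tgz(tgz_path, "b.txt", b"b")

    with tarfile.open(tar_path) as t:
        tar_names = t.getnames()
    with tarfile.open(tgz_path) as t:
        tgz_names = t.getnames()

print(tar_names, tgz_names)
```

Both archives end up with the same members, but the TGZ path decompresses and recompresses everything already stored, so its cost scales with the size of the archive rather than the size of the appended file.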

See also

  • CLI: archive
  • aisloader: archive
  • Initial Sharding Tool (ishard)
  • Distributed Shuffle
  • Get-Batch