Archives: read, write, and list
AIStore natively supports four archive/serialization formats across all APIs, batch jobs, and functional extensions: TAR, TGZ (TAR.GZ), TAR.LZ4, and ZIP.
Archives address the small-file problem - performance degradation from random access to very large datasets containing many small files.
To put “very large” and “small-file” in perspective: the datasets we commonly see in the field contain 10+ million files, with file sizes ranging from 1KB to 100KB.
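The sharding idea behind this can be sketched with Python's standard `tarfile` module. Many small files are grouped into TAR shards of bounded size, so a reader streams a handful of large objects instead of issuing millions of small random reads. This is a minimal illustration under assumed inputs, not AIStore's API or its sharding tooling; `pack_into_shards` and its `max_shard_bytes` threshold are hypothetical.

```python
import io
import tarfile


def pack_into_shards(files, max_shard_bytes):
    """Group (name, payload) pairs into in-memory TAR shards.

    Hypothetical helper: a new shard starts once adding the next payload
    would push the current shard's total payload past max_shard_bytes.
    Returns a list of raw TAR archives (bytes).
    """
    shards = []
    buf = tar = None
    used = 0
    for name, payload in files:
        # Close the current shard if this file would overflow it.
        if tar is not None and used + len(payload) > max_shard_bytes:
            tar.close()  # writes the end-of-archive trailer into buf
            shards.append(buf.getvalue())
            tar = None
        if tar is None:
            buf = io.BytesIO()
            tar = tarfile.open(fileobj=buf, mode="w")
            used = 0
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
        used += len(payload)
    if tar is not None:  # flush the last, partially filled shard
        tar.close()
        shards.append(buf.getvalue())
    return shards


if __name__ == "__main__":
    # Hypothetical dataset: 1,000 files of 1 KiB each.
    files = [(f"sample-{i:05d}.bin", b"x" * 1024) for i in range(1000)]
    shards = pack_into_shards(files, max_shard_bytes=100 * 1024)
    print(len(shards), "shards")
```

With 1,000 files of 1 KiB each and a 100 KiB threshold, this yields 10 shards of 100 files each. Rolling over on cumulative payload size (ignoring the 512-byte TAR headers and padding) keeps the sketch simple; a real sharding tool might also cap the number of files per shard.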
AIStore’s implementation allows unmodified clients and applications to work efficiently with archived datasets.
Key benefits:
In addition to performance, sharded datasets provide a natural form of dataset backup: each shard is a self-contained, immutable representation of its original files, making it easy to replicate, snapshot, or version datasets without additional tooling.
- TAR (.tar) - Unix archive format (since 1979) supporting USTAR, PAX, and GNU TAR variants
- TGZ (.tgz, .tar.gz) - TAR with gzip compression
- TAR.LZ4 (.tar.lz4) - TAR with lz4 compression
- ZIP (.zip) - PKWARE ZIP format (since 1989)

AIStore can natively read, write, append¹, and list archives. Operations include:
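To make the formats concrete, the following Python sketch uses only the standard library's `tarfile` and `zipfile` modules to write, list, and read TAR, TGZ, and ZIP archives in memory. It illustrates the formats themselves, not AIStore's API. The helper names (`write_archive`, `list_archive`, `read_member`) and the sample `FILES` are hypothetical, and TAR.LZ4 is left out because Python's standard library has no LZ4 codec.

```python
import io
import tarfile
import zipfile

# Hypothetical sample content; any bytes would do.
FILES = {"a.txt": b"alpha", "dir/b.txt": b"bravo"}


def write_archive(fmt, files):
    """Serialize files into an in-memory archive: 'tar', 'tgz', or 'zip'."""
    buf = io.BytesIO()
    if fmt == "zip":
        with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for name, data in files.items():
                zf.writestr(name, data)
    else:
        mode = {"tar": "w", "tgz": "w:gz"}[fmt]
        with tarfile.open(fileobj=buf, mode=mode) as tf:
            for name, data in files.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_archive(fmt, blob):
    """Return member names in archive order."""
    if fmt == "zip":
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            return zf.namelist()
    # "r:*" detects plain or gzip-compressed TAR transparently.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tf:
        return tf.getnames()


def read_member(fmt, blob, name):
    """Read one member's bytes without extracting the others to disk."""
    if fmt == "zip":
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            return zf.read(name)
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tf:
        return tf.extractfile(name).read()
```

For example, `read_member("tgz", write_archive("tgz", FILES), "dir/b.txt")` returns `b"bravo"`.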
Default format: TAR is the system default when the serialization format is unspecified.
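The TAR default can be pictured as a simple fallback rule. In the sketch below, `resolve_format` and its extension table are hypothetical: an explicitly requested format wins, a recognized file extension is used next, and TAR is returned otherwise. Whether and how AIStore infers the format from an object name is an assumption here, not something this page specifies.

```python
# Hypothetical helper, not an AIStore API: resolve a serialization format,
# falling back to TAR (the documented default) when nothing else applies.
DEFAULT_FORMAT = "TAR"

# Assumed extension-to-format table, for illustration only.
SUFFIXES = (
    (".tar.lz4", "TAR.LZ4"),
    (".tar.gz", "TGZ"),
    (".tgz", "TGZ"),
    (".tar", "TAR"),
    (".zip", "ZIP"),
)


def resolve_format(name="", fmt=None):
    # 1. An explicitly requested format takes precedence.
    if fmt:
        return fmt.upper()
    # 2. Otherwise, try to recognize the file extension (case-insensitive).
    lowered = name.lower()
    for suffix, detected in SUFFIXES:
        if lowered.endswith(suffix):
            return detected
    # 3. Unspecified and unrecognized: use the default.
    return DEFAULT_FORMAT
```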
¹ APPEND is supported for the TAR format only. The other formats (ZIP, TGZ, TAR.LZ4) were not designed for true append operations; appending to them can only be emulated by extracting the entire archive and recreating it, which significantly impacts performance.
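The difference between a true TAR append and an extract-all-recreate emulation can be seen with Python's standard `tarfile` module, which has the same restriction: mode `"a"` is accepted only for uncompressed archives, and `"a:gz"` raises `ValueError`. In the sketch below (hypothetical helper names, not AIStore's implementation), appending to a TAR overwrites only the end-of-archive trailer, whereas the TGZ version must read every member into memory and recompress the entire archive.

```python
import io
import os
import tarfile
import tempfile


def add_bytes(tf, name, data):
    """Add an in-memory payload as a regular-file member."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))


def append_tar(path, name, data):
    # Uncompressed TAR: mode "a" positions the writer just before the
    # end-of-archive zero blocks, writes the new member over them, and
    # emits a fresh trailer on close. Existing members are not rewritten.
    with tarfile.open(path, mode="a") as tf:
        add_bytes(tf, name, data)


def append_tgz_emulated(path, name, data):
    # TGZ: tarfile rejects mode "a:gz", so "append" means reading every
    # member into memory and recompressing the whole archive.
    with tarfile.open(path, mode="r:gz") as src:
        members = []
        for m in src.getmembers():
            payload = src.extractfile(m).read() if m.isfile() else None
            members.append((m, payload))
    with tarfile.open(path, mode="w:gz") as dst:
        for m, payload in members:
            dst.addfile(m, io.BytesIO(payload) if payload is not None else None)
        add_bytes(dst, name, data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for ext, mode, append in (("tar", "w", append_tar),
                                  ("tgz", "w:gz", append_tgz_emulated)):
            path = os.path.join(tmp, "shard." + ext)
            with tarfile.open(path, mode=mode) as tf:
                add_bytes(tf, "first.txt", b"1")
            append(path, "second.txt", b"2")
            with tarfile.open(path) as tf:
                print(ext, tf.getnames())
```

The emulated path's cost grows with the size of the entire archive rather than with the size of the appended file, which is the performance penalty described in the footnote.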