On-disk layout

AIStore’s on-disk layout supports multiple remote backends, configurable namespaces, and AIS-to-AIS caching with full data recovery capabilities.

Here’s a simplified drawing depicting two providers, AIS and AWS, and two buckets, ABC and XYZ, respectively. In the picture, mpath is a single mountpath: a single disk or volume formatted with a local filesystem of choice, together with a local directory (mpath/):

[Figure: on-disk hierarchy]

Further, each bucket has a uniform structure with several system directories (e.g., %ec, which stores erasure-coded content) and, of course, user data under %ob (“object”) locations.

The same structure repeats on every AIS storage node and on every data drive of each clustered node.

With namespaces, the picture becomes only slightly more complicated. The following shows two AIS buckets, DEF and GHJ, under their respective user-defined namespaces called #namespace-local and #namespace-remote. Unlike a local namespace of this cluster, the remote one has to be prefixed with a UUID that uniquely identifies the other AIStore cluster hosting GHJ (in this example), from which this bucket’s content will be replicated or cached, either on demand or via the Prefetch API and similar.

[Figure: on-disk hierarchy with namespaces]

Example

Say we have a gs://llm-data bucket and an object “images/dog.jpeg” in it. Given two different bucket namespaces, the respective fully qualified names (FQNs) inside AIStore may look like:

/vdi/@gcp/#prod/llm-data/%ob/images/dog.jpeg
and
/vdh/@gcp/#dev/llm-data/%ob/images/dog.jpeg

where:

| Component | Example 1 | Example 2 | Meaning |
| --- | --- | --- | --- |
| Mountpath | /vdi | /vdh | Physical (mounted) device |
| Provider | gcp | gcp | Backend provider |
| Namespace | #prod | #dev | Account, profile, or user-defined alias |
| Bucket | llm-data | llm-data | Bucket name |
| Content type | %ob | %ob | Content kind: objects, EC slices, chunks, manifests |
| Object | images/dog.jpeg | images/dog.jpeg | Object name (preserves virtual directory structure) |
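
To make the breakdown above concrete, here is a minimal Python sketch that splits an FQN into the listed components. It is an illustration only, not AIStore's own parser: the names `ParsedFQN` and `parse_fqn` are hypothetical, the namespace is assumed to be optional (as in the first drawing), and only the local `#name` namespace form from the example is handled, not the UUID-prefixed remote form.

```python
from typing import NamedTuple


class ParsedFQN(NamedTuple):
    """Hypothetical container for the FQN components listed above."""
    mountpath: str
    provider: str
    namespace: str      # empty string when the FQN carries no namespace
    bucket: str
    content_type: str   # marker such as "%ob"
    object_name: str


def parse_fqn(fqn: str) -> ParsedFQN:
    """Split an FQN of the form
    <mountpath>/@<provider>/[#<namespace>/]<bucket>/%<type>/<object>.

    Simplifying assumption: the first path component that starts with "@"
    is the provider, so the mountpath itself must not contain one.
    """
    parts = fqn.split("/")
    for i, part in enumerate(parts):
        if part.startswith("@"):
            break
    else:
        raise ValueError(f"no @provider component in {fqn!r}")

    mountpath = "/".join(parts[:i]) or "/"
    provider = parts[i][1:]
    rest = parts[i + 1:]

    namespace = ""
    if rest and rest[0].startswith("#"):
        namespace, rest = rest[0], rest[1:]

    # What remains must be: bucket, content-type marker, object name.
    if len(rest) < 3 or not rest[1].startswith("%"):
        raise ValueError(f"unexpected layout after provider in {fqn!r}")

    bucket, content_type = rest[0], rest[1]
    object_name = "/".join(rest[2:])  # keeps the virtual directory structure
    return ParsedFQN(mountpath, provider, namespace, bucket,
                     content_type, object_name)


if __name__ == "__main__":
    for path in ("/vdi/@gcp/#prod/llm-data/%ob/images/dog.jpeg",
                 "/vdh/@gcp/#dev/llm-data/%ob/images/dog.jpeg"):
        print(parse_fqn(path))
```

Both example paths differ only in mountpath and namespace; the object name `images/dog.jpeg` comes back unchanged, slashes included.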

Content types

Within each bucket directory, AIS organizes content by type:

| Marker | Constant | Content |
| --- | --- | --- |
| %ob | fs.ObjCT | Object data |
| %wk | fs.WorkCT | Work/temporary files |
| %ec | fs.ECSliceCT | Erasure-coded slices |
| %mt | fs.ECMetaCT | Erasure-coded metadata |
| %ch | fs.ChunkCT | Chunked object data |
| %ut | fs.ChunkMetaCT | Chunked object metadata |
| %ds | fs.DsortFileCT | Distributed sort files |
| %dw | fs.DsortWorkCT | Distributed sort work |
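
As a rough illustration of how these markers slot into the hierarchy, the following Python sketch builds the per-type directory for a bucket on a given mountpath. The `CONTENT_TYPES` dictionary simply mirrors the markers listed above, and `content_dir` is a hypothetical helper, not an AIStore API. How files are named below each marker for the non-object types is not covered here, so the sketch stops at the directory level.

```python
import posixpath

# Markers and descriptions copied from the content-type listing above.
CONTENT_TYPES = {
    "%ob": "Object data",
    "%wk": "Work/temporary files",
    "%ec": "Erasure-coded slices",
    "%mt": "Erasure-coded metadata",
    "%ch": "Chunked object data",
    "%ut": "Chunked object metadata",
    "%ds": "Distributed sort files",
    "%dw": "Distributed sort work",
}


def content_dir(mountpath: str, provider: str, bucket: str,
                marker: str, namespace: str = "") -> str:
    """Return <mountpath>/@<provider>/[<namespace>/]<bucket>/<marker>.

    Hypothetical helper: `namespace`, when given, is expected to already
    include its leading "#" (for example "#prod").
    """
    if marker not in CONTENT_TYPES:
        raise ValueError(f"unknown content-type marker: {marker!r}")
    parts = [mountpath, "@" + provider]
    if namespace:
        parts.append(namespace)
    parts += [bucket, marker]
    return posixpath.join(*parts)


if __name__ == "__main__":
    # List every per-type directory for one bucket on one mountpath.
    for marker, description in CONTENT_TYPES.items():
        path = content_dir("/vdi", "gcp", "llm-data", marker, "#prod")
        print(f"{path:40} {description}")
```

For instance, `content_dir("/vdi", "gcp", "llm-data", "%ec", "#prod")` yields `/vdi/@gcp/#prod/llm-data/%ec`, the sibling of the `%ob` directory from the earlier example.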

See the source for the most up-to-date enumeration.

References

For the purposes of full disclosure and/or in-depth review, the following are initial references into AIS sources that also handle on-disk representation of object metadata:

and AIS control structures:

System Files

In addition to user data, AIStore itself stores, maintains, and uses a relatively small number of system files that serve a variety of purposes. A description of AIStore persistence would not be complete without listing those files and their respective purposes. For details, please refer to: