Native Bucket Inventory (NBI)
Native bucket inventory (NBI) lets AIS create and serve a local snapshot of a remote bucket namespace.
It is designed for very large, read-mostly buckets - for example, training and inference datasets - where repeatedly listing the remote backend is expensive, slow, and operationally noisy.
NBI was introduced in v4.3 and is stable as of v4.4.
Inventories are currently created manually: AIS does not yet provide built-in periodic refresh or automatic resynchronization.
This document occasionally references Go types and constants from the AIS codebase. For SDK usage, see the Go API and Python SDK documentation.
NBI replaces the older S3-specific inventory path.
Compared to that legacy approach, NBI is:
In practice, NBI turns repeated remote listing into a two-step workflow:
NBI applies to remote buckets only: cloud buckets (S3, GCS, Azure, OCI) and remote AIS buckets.
It is not currently supported for in-cluster ais:// buckets. The main reason is simple: NBI exists to avoid the latency and repeated backend traffic of remote listing. For local AIS buckets, namespace metadata is already local and served at native AIS speed, so an additional inventory layer would bring little practical benefit.
A point-in-time snapshot of an in-cluster bucket may still be a valid future use case, but that is not part of the current implementation.
At a high level, NBI has two phases:
The key design point is that inventories are stored as a flat, lexicographically ordered snapshot of object names plus selected properties. Listing behavior is derived from that stored snapshot.
When a user creates an inventory:
AIS verifies that no other create-inventory job is running for the same bucket, and enforces the current one-inventory-per-bucket limit.
If --force is specified, any existing inventory for the bucket is removed first.
Each target runs a create-inventory xaction: it lists the remote bucket, keeps only the object names that belong to it under the current cluster map, and writes them locally as inventory chunks.
In the current implementation, all targets independently list the remote backend. A future optimization may designate a single target to list once and distribute the results.
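The per-target filtering step can be illustrated with a minimal sketch. It assumes a generic rendezvous (highest-random-weight) hash over hypothetical target IDs as a stand-in for the cluster map; the actual AIS placement function is not described in this document, and the function names below are illustrative only.

```python
import hashlib

def owner(name, targets):
    """Pick the owning target for `name` (stand-in rendezvous hash, not the AIS one)."""
    def weight(tid):
        digest = hashlib.sha256(f"{tid}:{name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(targets, key=weight)

def build_local_inventory(remote_names, my_id, targets, names_per_chunk):
    """Keep only the names owned by `my_id`, sort them, and split them into chunks."""
    mine = sorted(n for n in remote_names if owner(n, targets) == my_id)
    return [mine[i:i + names_per_chunk] for i in range(0, len(mine), names_per_chunk)]

# Hypothetical cluster of three targets, each listing the same remote bucket.
targets = ["t1", "t2", "t3"]
remote = [f"train/shard-{i:04d}.tar" for i in range(10)]
inventories = {
    t: build_local_inventory(remote, t, targets, names_per_chunk=2) for t in targets
}
```

Every target sees the full remote listing but keeps a disjoint subset, so the union of all local inventories is the full bucket namespace.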
When a user lists with --inventory, AIS serves the request from NBI.
The flow is:
AIS sets the LsNBI flag internally and switches to the inventory-backed path.
The result is a normal AIS list response, but without walking the remote backend on every call.
S3-compatible clients may also request NBI-backed listing by sending the Ais-Bucket-Inventory: true header; Ais-Inv-Name may be used to select a specific inventory.
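As a small illustration, the two headers can be assembled as follows. The helper name nbi_headers is hypothetical; only the header names and the "true" value come from the text above. How the headers are attached depends on the S3 client in use.

```python
def nbi_headers(inv_name=None):
    """Build the request headers that ask for NBI-backed listing."""
    headers = {"Ais-Bucket-Inventory": "true"}
    if inv_name:
        # Optional: select a specific inventory by name.
        headers["Ais-Inv-Name"] = inv_name
    return headers

default_headers = nbi_headers()
named_headers = nbi_headers("inv-Tp4nR7kWx")
```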
The same ais ls without --inventory produces the same namespace view but re-walks the remote backend each time.
Inventories are stored as chunked objects in a designated system bucket - a special AIS bucket whose name starts with a dot (.).
System buckets are AIS infrastructure and are not intended for direct user access. The dot-prefix naming convention (.sys-*) is reserved. For details on naming rules, visibility, and future plans, see System Buckets.
The first and currently only system bucket is:
It is created automatically on the first inventory creation request. There is no need to create it manually.
Creating an inventory is an admin operation. When AIStore is deployed with the Authentication Server (AuthN), the request requires admin permissions. Clusters without AuthN do not enforce this check.
Use ais nbi create to create a native bucket inventory for a remote bucket:
inv-Tp4nR7kWx).
--force removes any existing inventory for the bucket first and then proceeds with creation.
--names-per-chunk is an advanced tuning knob that overrides the default chunk size.
Internally, the control message is:
Inventory creation validates and normalizes the embedded LsoMsg.
Current behavior includes:
continuation_token must be empty
start_after is not supported
LsNoDirs is always set internally
The default stored properties are:
name
size
cached
If properties are explicitly requested, AIS still adds cached.
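The rules above can be summarized in a short sketch. ListMsg is a hypothetical stand-in for the embedded LsoMsg, and rejecting invalid input with an exception is an assumption; the document states the constraints but not how violations are reported.

```python
from dataclasses import dataclass, field

DEFAULT_PROPS = ("name", "size", "cached")

@dataclass
class ListMsg:
    """Hypothetical, simplified stand-in for the embedded LsoMsg."""
    props: list = field(default_factory=list)
    continuation_token: str = ""
    start_after: str = ""
    no_dirs: bool = False

def normalize_for_inventory(msg):
    """Validate and normalize a list message for inventory creation."""
    if msg.continuation_token:
        raise ValueError("continuation_token must be empty")
    if msg.start_after:
        raise ValueError("start_after is not supported")
    props = list(msg.props) if msg.props else list(DEFAULT_PROPS)
    if "cached" not in props:
        props.append("cached")  # cached is always stored
    # The no-dirs flag is always set, regardless of what the caller passed.
    return ListMsg(props=props, no_dirs=True)

defaults = normalize_for_inventory(ListMsg())
explicit = normalize_for_inventory(ListMsg(props=["name", "checksum"]))
```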
Advanced tunables:
Default names_per_chunk: 2 * MaxPageSizeAIS (20K)
Minimum names_per_chunk: 2
Maximum names_per_chunk: 64 * MaxPageSizeAIS (640K)
Inventory creation is a distributed xaction, so it appears in regular job monitoring:
Each target stores only the subset of names that belongs to it under the current cluster map. The total is the size of the full bucket namespace captured by the inventory.
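The relationship between per-target counts, the total, and the number of chunks is simple arithmetic. The sketch below uses made-up per-target counts and takes 20K (2 * MaxPageSizeAIS) as names_per_chunk; it also assumes that every chunk except the last is filled completely, which this document does not state.

```python
import math

def chunk_count(num_names, names_per_chunk):
    """Number of chunks needed to hold num_names names."""
    return math.ceil(num_names / names_per_chunk)

# Hypothetical per-target name counts reported by a create-inventory job.
per_target = {"t1": 41_000, "t2": 39_500, "t3": 0}

total = sum(per_target.values())  # size of the captured namespace
chunks = {t: chunk_count(n, 20_000) for t, n in per_target.items()}
```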
Use ais show nbi to inspect existing inventories.
As with other ais show subcommands, ais show nbi is an alias for ais nbi show.
Default output shows a compact view, including object count:
Verbose output includes additional metadata:
For example:
Once an inventory has been created, use it by adding --inventory to a regular ais ls command:
--inventory now means NBI-backed listing.
This is still a normal list-objects request. The difference is that AIS sets the LsNBI flag internally and serves the request from pre-stored inventory chunks rather than walking the remote backend.
In the Python SDK, the corresponding flag is ListObjectFlag.NBI.
When listing via NBI, page size is best-effort and approximate.
Because NBI listing is distributed across targets, each target returns an approximate share of the requested page size, subject to local chunking and minimum bounds. This behavior is intentional: it reduces roundtrips and improves list latency.
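One way to picture the approximation: if the requested page size is divided among targets and rounded up, the combined result may not match the request exactly. The formula and the min_share parameter below are assumptions for illustration, not the AIS implementation.

```python
import math

def per_target_share(page_size, num_targets, min_share=1):
    """Approximate number of names each target is asked to return."""
    return max(min_share, math.ceil(page_size / num_targets))

share = per_target_share(1000, 3)
combined = share * 3  # may exceed the requested page size
small = per_target_share(10, 16, min_share=4)
```

With three targets and a requested page size of 1000, each target would be asked for 334 names under this model, so the combined batch can be slightly larger than the request.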
When inventory-backed listing is requested, AIS serves the request from pre-stored inventory chunks on each target rather than walking the remote backend.
The flow is:
A list-objects-page request with --inventory arrives at the proxy.
The proxy broadcasts the request to all targets.
Each target seeks to the first inventory name greater than the provided continuation token within its local, lexicographically ordered chunks.
Each target returns its next locally sorted batch, together with its own continuation token if more names remain.
The proxy merges the returned names into a single sorted stream and applies min-token logic:
Because continuation tokens are literal object names rather than opaque cursors, the process is deterministic, stateless on the proxy, and relatively easy to reason about.
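The seek-and-merge flow can be sketched in a few lines. This is a simplified model, not the AIS code: targets are plain sorted lists, and "min-token logic" is interpreted here as returning only names up to the smallest continuation token reported by any target that still has more names. That interpretation is an assumption.

```python
import bisect
import heapq

def target_page(names, token, limit):
    """Target side: seek past `token` in sorted local names, return (batch, next_token)."""
    start = bisect.bisect_right(names, token) if token else 0
    batch = names[start:start + limit]
    more = start + limit < len(names)
    return batch, (batch[-1] if more else "")

def proxy_page(cluster, token, limit):
    """Proxy side: merge per-target batches and cut the page at the minimum token."""
    replies = [target_page(names, token, limit) for names in cluster.values()]
    pending = [tok for _, tok in replies if tok]
    cutoff = min(pending) if pending else None
    merged = list(heapq.merge(*(batch for batch, _ in replies)))
    if cutoff is not None:
        # Names beyond the cutoff may be incomplete; they are re-read on the next page.
        merged = [n for n in merged if n <= cutoff]
    return merged, (cutoff or "")

def list_all(cluster, limit):
    """Client side: follow continuation tokens until the listing is exhausted."""
    out, token = [], ""
    while True:
        page, token = proxy_page(cluster, token, limit)
        out.extend(page)
        if not token:
            return out

cluster = {"t1": ["a", "d", "g"], "t2": ["b", "e", "h"], "t3": ["c", "f"]}
first_page, first_token = proxy_page(cluster, "", 2)
everything = list_all(cluster, 2)
```

Because the token is an object name, the proxy keeps no state between pages: any proxy can serve the next page given only the token.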
Repeated listing therefore avoids re-walking the remote backend.
Inventory creation always stores a flat snapshot. Non-recursive behavior, when requested during listing, is derived later as a view over that flat snapshot rather than stored as a separate hierarchical inventory format.
That distinction is important:
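A non-recursive view can be computed on the fly from the flat, sorted names. The sketch below is a generic illustration that assumes "/" as the delimiter and marks virtual directories with a trailing slash; how AIS actually presents such entries is not specified here.

```python
def non_recursive_view(sorted_names, prefix=""):
    """Collapse a flat, sorted listing into a single level under `prefix`."""
    view, last = [], None
    for name in sorted_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition("/")
        entry = prefix + head + sep  # trailing "/" marks a virtual directory
        if entry != last:  # names sharing a directory are adjacent in sorted order
            view.append(entry)
            last = entry
    return view

flat = ["a/1.bin", "a/2.bin", "a/b/3.bin", "c.txt"]
top_level = non_recursive_view(flat)
under_a = non_recursive_view(flat, "a/")
```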
AIS stores metadata alongside each inventory:
This metadata is surfaced by ais show nbi, especially in verbose mode.
When an inventory contains relatively few object names compared with the number of targets, some targets may end up with no local inventory entries.
That is expected. In such cases, a target still stores a valid local inventory object with zero size, zero chunks, and valid NBI metadata. This allows AIS to distinguish between:
That distinction matters for correct EOF behavior during distributed listing.
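A rough model of why a zero-size inventory object is useful: a target can tell "nothing stored here" apart from "no inventory at all". The type and state names below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LocalInventory:
    """Hypothetical summary of a target's local inventory object."""
    num_chunks: int
    size: int

def list_state(inv: Optional[LocalInventory]) -> str:
    """Classify what a target can contribute to an inventory-backed listing."""
    if inv is None:
        return "missing"    # no inventory object on this target
    if inv.num_chunks == 0:
        return "eof"        # valid inventory, but no names landed on this target
    return "has-names"

states = [
    list_state(None),
    list_state(LocalInventory(num_chunks=0, size=0)),
    list_state(LocalInventory(num_chunks=3, size=4096)),
]
```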
NBI is intentionally narrower than general remote listing.
A useful mental model is:
Those two phases do not accept the same controls.
Inventory creation is not an ad hoc ls request. It builds a reusable artifact that must be:
That is why several options that make sense for a live list request do not make sense during inventory generation.
In short, NBI is optimized for fast, repeated listing of a previously captured bucket snapshot.
A latency-vs-scale benchmark comparing NBI listing, regular AIS remote listing, and direct S3 access (boto3) from 1K to 80K objects is available at python/tests/perf/nbi/.