AIStore can scale from a single Linux machine to a rack-scale cluster or a managed Kubernetes installation in the cloud.
Before you pick a path, answer two quick questions:
For datasets below (ballpark) 50TB, a single host may suffice and should be considered a viable option.
Expecting growth or already past that mark? Plan for multi-node or cloud.
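The rule of thumb above can be sketched as a small shell helper. This is only an illustration: the function name and the `yes`/`no` growth flag are hypothetical, and the 50 TB cutoff is the ballpark figure from the text, not a hard limit.

```shell
#!/bin/sh
# Illustrative helper: map a dataset size (whole TB) to a deployment path.
# The 50 TB threshold is a ballpark, not a hard limit.
suggest_deployment() {
    size_tb="$1"
    expect_growth="${2:-no}"   # hypothetical flag: "yes" if growth is expected
    if [ "$size_tb" -lt 50 ] && [ "$expect_growth" != "yes" ]; then
        echo "single-host"
    else
        echo "multi-node-or-cloud"
    fi
}

suggest_deployment 20        # prints: single-host
suggest_deployment 20 yes    # prints: multi-node-or-cloud
suggest_deployment 120       # prints: multi-node-or-cloud
```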
Note that you can always start small: a single-host deployment, a 3-node cluster in the Cloud or on-premises, etc. AIStore supports many options to inter-connect existing clusters - the capability called unified namespace - or migrate existing datasets (on-demand or via supported storage services). For introductions and further pointers, please refer to the AIStore Overview.
AIStore runs on commodity Linux machines with no special requirements. It is expected that within a given cluster, all AIS targets are identical, hardware-wise.
gcc, sysstat, attr, util-linux
CROSS_COMPILE)
Mac is also supported, albeit in a limited (development-only) way.
Depending on your Linux distribution, you may or may not have GCC, sysstat, and/or attr packages. These packages must be installed.
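One way to check for the packages is to look for a representative command from each. The sketch below assumes the usual mapping (`gcc` from GCC, `iostat` from sysstat, `setfattr` from attr, `lsblk` from util-linux); the helper name `missing_tools` is hypothetical, and package names may differ across distributions.

```shell
#!/bin/sh
# Print each command from the argument list that is NOT on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed package-to-command mapping:
#   gcc -> gcc, sysstat -> iostat, attr -> setfattr, util-linux -> lsblk
missing="$(missing_tools gcc iostat setfattr lsblk)"
if [ -n "$missing" ]; then
    echo "missing commands:" $missing
else
    echo "all required commands found"
fi
```

Anything reported as missing can then be installed with the distribution's package manager.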
Speaking of distributions, our current default recommendation (based on our experience) is Ubuntu Server 24.04 LTS or Ubuntu Server 22.04 LTS. However, AIStore has no special dependencies, so virtually any distribution will work.
For the local filesystem, we currently recommend xfs. But again, this default recommendation should not be interpreted as a limitation: other fine choices include zfs, ext4, f2fs and more.
Since AIS itself provides n-way mirroring and erasure coding, hardware RAID is not recommended. But it can be used and will work.
The capability called extended attributes, or xattrs, is a long-time POSIX legacy supported by all mainstream filesystems without exceptions. Unfortunately, xattrs may not always be enabled in Linux kernel configurations - which can be easily verified by running the setfattr command.
If disabled, please make sure to enable xattrs in your Linux kernel configuration. To quickly check:
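A minimal sketch of such a check, assuming the `setfattr` and `getfattr` utilities from the attr package are installed. The attribute name `user.bar`, its value, and the function name are arbitrary choices for illustration.

```shell
#!/bin/sh
# Returns 0 if a user xattr can be written and read back in the given
# directory, 1 if not, 2 if setfattr/getfattr are not installed.
check_xattrs() {
    dir="${1:-.}"
    command -v setfattr >/dev/null 2>&1 || return 2
    command -v getfattr >/dev/null 2>&1 || return 2
    probe="$(mktemp "$dir/xattr-probe.XXXXXX")" || return 1
    rc=1
    if setfattr -n user.bar -v ttt "$probe" 2>/dev/null &&
       getfattr -n user.bar "$probe" >/dev/null 2>&1; then
        rc=0
    fi
    rm -f "$probe"
    return "$rc"
}

check_xattrs . && status=0 || status=$?
case "$status" in
    0) echo "xattrs: enabled" ;;
    1) echo "xattrs: NOT available in this directory" ;;
    2) echo "xattrs: cannot check (install the attr package)" ;;
esac
```

Run it from a directory on the filesystem you intend to use for AIS data, since the result depends on the filesystem as well as on the kernel.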
For developers, there’s also the macOS (aka Darwin) option. Certain capabilities related to querying the state and status of local hardware resources (memory, CPU, disks) may be missing. In fact, it is easy to review the specifics with a quick check of the sources:
Benchmarking and stress-testing are done on Linux only - another reason to consider Linux (and only Linux) for production deployments.
This section provides the fastest way to get an AIStore cluster running on your local machine. For more detailed steps, see the Local Playground section.
Docker shortcut: If you already have Docker or Podman, you can skip everything below and get a running cluster in seconds with make up - see Minimal Compose Deployment.
Follow the official Go installation instructions for your platform (use the Linux tab for AIStore deployments).
Set up your GOPATH environment variable when done.
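For instance, a sketch that keeps an already-set GOPATH and otherwise falls back to Go's default location (`$HOME/go`). Adding `$GOPATH/bin` to `PATH` is a common convention rather than something this guide mandates.

```shell
#!/bin/sh
# Keep an existing GOPATH; otherwise fall back to Go's default ($HOME/go).
export GOPATH="${GOPATH:-$HOME/go}"

# Make binaries installed under $GOPATH/bin reachable (add only once).
case ":$PATH:" in
    *":$GOPATH/bin:"*) ;;
    *) export PATH="$PATH:$GOPATH/bin" ;;
esac

echo "GOPATH=$GOPATH"
```

To make the setting persistent, the same lines can go into your shell profile.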
At this point, it may be a good idea to also run (and review):
That’s it! You now have a running AIStore deployment you can experiment with. Continue reading for more detailed setup options and advanced configurations.
If you’re looking for a speedy evaluation, want to experiment with supported features or get a feel for initial usage, or are doing development, running AIS from its GitHub source might be a good option.
Hence, we introduced (and keep maintaining) Local Playground - one of the several supported deployment options.
Some of the most popular deployment options are also summarized in this table. The list includes Local Playground, and its complementary guide here.
Local Playground is for development purposes and is not meant to provide optimal performance.
To run AIStore from source, one would typically need to have Go: compiler, linker, tools, and required packages. However:
The CROSS_COMPILE option (see below) can be used to build AIStore without having to install Go and its toolchain (requires Docker).
To install Go(lang) on Linux:
go1.<x.y>.linux-amd64.tar.gz from Go downloads
Next, if not done yet, export the GOPATH environment variable.
Here’s an additional 5-minute introduction that talks more in-depth about setting up the Go environment variables.
Once done, we can run AIS as follows (steps 1 through 4 below):
We want to clone the repository into the following path so we can access some of the associated binaries through the environment variables we set up earlier.
To preload dependencies, optionally, run go mod tidy (or same, make mod-tidy):
ais cli
NOTE: For a local deployment, we do not need production filesystem paths. For more information, read about configuration basics. If you need a physical disk or virtual block device, you must add them to the fspaths config. See running local playground with emulated disks for more information.
Many useful commands are provided via top Makefile (for details, see Make section below).
In particular, we can use make to deploy our very first cluster with 3 nodes (and 3 gateways):
This make command executes several make targets (not to be confused with AIS targets). In particular, it:
terminates (via make kill) any AIStore that may have been previously deployed in the local playground;
cleans up (via make clean);
builds the CLI (ais) and aisloader tools (that we are using all the time);
and, finally:
The cluster then can be observed as follows:
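A guarded sketch, assuming the `ais show cluster` command referenced elsewhere in this document. The wrapper function `show_cluster` is hypothetical; it only prevents a hard failure when the CLI is not installed or no cluster is reachable.

```shell
#!/bin/sh
# Show cluster state if the ais CLI is installed and a cluster is reachable;
# otherwise print a hint instead of failing.
show_cluster() {
    if ! command -v ais >/dev/null 2>&1; then
        echo "ais CLI not found in PATH"
        return 0
    fi
    ais show cluster ||
        echo "cluster not reachable at ${AIS_ENDPOINT:-the default endpoint}"
}

show_cluster
```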
clean_deploy.sh
Alternatively, or in addition to make deploy, one can also use:
With no arguments, this script also builds AIStore binaries (such as aisnode and ais CLI). You can pass in arguments to configure the same options that the make deploy command above uses.
aisloader tool
We can now run the aisloader tool to benchmark our new cluster.
Here’s a quick walkthrough (with more references included below).
deploy/dev/local/aisnode_config.sh as follows:
or, same:
See also:
for developers; cluster and node configuration; supported deployments: summary table and links.
AIStore (product and solution) is fully based on HTTP(S) utilizing the protocol both externally (to support both frontend interfaces and communications with remote backends) and internally, for intra-cluster streaming.
Connectivity-wise, what that means is that your local deployment at localhost:8080 can as easily run at any arbitrary HTTP(S) address.
Here’s the quick change you make to deploy Local Playground at (e.g.) 10.0.0.207, whereby the main gateway’s listening port would still remain 8080 default:
AIS comes with its own build system that we use to build both standalone binaries and container images for a variety of deployment options.
The very first make command you may want to execute could as well be:
This shows all subcommands, environment variables, and numerous usage examples, including:
For shutdown options, see ais cluster shutdown --help
aisnode executable with GCP and AWS backends
Use the TAGS environment variable to specify any/all supported build tags, which also include conditionally linked remote backends (see next).
Use the AIS_BACKEND_PROVIDERS environment variable to select remote backends, which include 3 (three) Cloud providers and ht://, namely: (aws, gcp, azure, ht)
For the complete list of supported build tags, please see conditional linkage.
aisnode with debug info
make kill - terminate local AIStore.
make restart - shut it down and immediately restart using the existing configuration.
make help - show make options and usage examples.
For even more development options and tools, please refer to:
The variables include AIS_ENDPOINT, AIS_AUTHN_TOKEN_FILE, and more.
In almost all cases, there’s an “AIS_” prefix (hint: git grep AIS_).
And in all cases, without exception, the variable takes precedence over the corresponding configuration, if one exists. For instance:
overrides the default endpoint as per ais config cli or (same) ais config cli --json
Endpoints are equally provided by each and every running AIS gateway (aka “proxy”) and each endpoint can be (equally) used to access the cluster. To find out what’s currently configured, run (e.g.):
where NODE is, effectively, any clustered proxy (that’ll show up if you type ais config node and press <TAB-TAB>).
Other variables, such as AIS_PRIMARY_EP and AIS_USE_HTTPS can prove to be useful at deployment time.
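The difference between exporting a variable for the whole session and overriding it for a single command can be sketched as follows. Both addresses are hypothetical, and `sh -c` stands in for any program (such as the CLI) that reads AIS_ENDPOINT from its environment.

```shell
#!/bin/sh
# Hypothetical addresses, for illustration only.
export AIS_ENDPOINT="http://localhost:8080"

# Stand-in for any program that reads AIS_ENDPOINT from its environment.
print_endpoint() { sh -c 'printf "%s\n" "$AIS_ENDPOINT"'; }

# Session-wide value, inherited by every child process.
session_value="$(print_endpoint)"

# One-off override: applies to this single command only.
one_off_value="$(AIS_ENDPOINT="https://10.0.1.138" sh -c 'printf "%s\n" "$AIS_ENDPOINT"')"

# The session-wide value is unchanged afterwards.
after_value="$(print_endpoint)"

echo "session: $session_value"
echo "one-off: $one_off_value"
echo "after:   $after_value"
```

The one-off form is handy for pointing a single command at a different gateway without touching the session or the saved configuration.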
For developers, the CLI command ais config cluster log.modules ec xs (for instance) selectively raises and/or reduces logging verbosity on a per-module basis, in this particular case for the modules EC (erasure coding) and xactions (batch jobs).
To list all log modules, type ais config cluster log (or ais config node NODE inherited log) and press <TAB-TAB>.
Finally, there’s also HTTPS configuration including X.509 certificates and options. For details, please refer to:
AIStore deploys anywhere anytime supporting multiple deployment options summarized and further referenced here.
All containerized deployments have their own separate Makefiles. With the exception of local playground, each specific build-able development (dev/) and production (prod/) option under the deploy folder contains a pair: {Dockerfile, Makefile}.
This separation is typically small in size and easily readable and maintainable.
Also supported is the option of not having Go installed and configured. To build AIS binaries without Go on your machine, make sure that you have docker and simply uncomment the CROSS_COMPILE line in the top Makefile.
The software is also minimally aware of the deployment type. In particular, to overcome certain limitations of Local Playground (single disk shared by multiple targets, etc.), we need to know the type, which can be:
The most recently updated enumeration can be found in the source.
The type shows up in the show cluster output - see example above.
For production deployments, we developed the AIS/K8s Operator. This dedicated GitHub repository contains:
This option has the unmatched convenience of requiring an absolute minimum time and resources - please see this README for details.
For development, health-checking a new deployment, or for any other (functional and performance testing) related reason you can run any/all of the included tests.
For example:
The go test above will create an AIS bucket, configure it as a two-way mirror, generate thousands of random objects, read them all several times, and then destroy the replicas and eventually the bucket as well.
Alternatively, if you happen to have an Amazon and/or Google Cloud account, make sure to specify the corresponding (S3 or GCS) bucket name when running go test commands.
For example, the following will download objects from your (presumably) S3 bucket and distribute them across AIStore:
To run all tests in the category short tests:
The command randomly shuffles existing short tests and then, depending on your platform, usually takes anywhere between 15 and 30 minutes. To terminate, press Ctrl-C at any time.
Ctrl-C or any other (kind of) abnormal termination of a running test may have a side effect of leaving some test data in the test bucket.
AIStore has been around for a while; the repository has accumulated quite a bit of information that can be immediately located as follows:
search command, e.g.: ais search copy
git grep, e.g.: git grep -n out-of-band -- "*.md"
Any of the above will work. In particular, for any keyword or text of any kind, you can easily look up examples and descriptions via a simple find or git grep command. For instance:
Alternatively, use a combination of find, xargs, and/or grep to search through existing texts of any kind, including source comments. For example:
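A self-contained sketch of the find, xargs, and grep combination. It builds a throwaway directory with made-up file names and contents so that it can run anywhere; in the repository you would point find at the source tree instead.

```shell
#!/bin/sh
# Self-contained demo: build a tiny throwaway docs tree, then search it.
docs="$(mktemp -d)"
mkdir -p "$docs/guide"
printf 'Handling out-of-band updates\n'  > "$docs/guide/a.md"
printf 'Nothing relevant here\n'         > "$docs/b.md"
printf 'out-of-band, but not markdown\n' > "$docs/c.txt"

# -print0 / -0 keep file names with spaces intact; -n prints line numbers.
matches="$(find "$docs" -type f -name '*.md' -print0 | xargs -0 grep -n 'out-of-band')"
echo "$matches"

rm -rf "$docs"
```

Only the markdown file containing the keyword is reported; the `.txt` file is filtered out by the `-name` pattern before grep ever sees it.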
In addition, there’s the user-friendly CLI. For example, to search for commands related to copy, you could:
For the CLI, remember to use the --help option, which will universally show specific supported options and usage examples. For example:
To quickly set up AIStore (with AWS and GCP backends) in a Google Colab notebook, use our ready-to-use notebook:
Important Notes:
For our development and testing, we use a local Kubernetes setup (e.g. Minikube, KinD), further documented here, to run the Kubernetes cluster on a single development machine. There’s a distinct advantage that AIStore extensions that require Kubernetes - such as Extract-Transform-Load, for example - can be developed rather efficiently.
So far, all examples in this getting-started document run a bunch of local web servers that listen for plain HTTP and collaborate to provide clustered storage.
There’s a separate document that tackles HTTPS topics that, in part, include:
As noted, the project utilizes GNU make to build and run locally and remotely.
Locally - with or without K8s.
As the very first step, run make help for help on:
aisnode) deployable as either a storage target or an AIS gateway (most of the time referred to as “proxy”);
For the complete list of AIS executables (with build and install instructions), see Tools and Utilities.
In particular, make provides a growing number of developer-friendly commands to:
Of course, local build is intended for development only. For production, there is a separate dedicated repository noted below.
In summary:
AIStore build supports conditional linkage of the supported remote backends: S3, GCS, Azure, OCI.
For the complete list of supported build tags, please see conditional linkage.
For the most recently updated list, please see 3rd party Backend providers.
To access remote data (and store it in-cluster), AIStore utilizes the respective provider’s SDK.
For Amazon S3, that would be aws-sdk-go-v2, for Azure - azure-storage-blob-go, and so on. Each SDK can be conditionally linked into the aisnode executable - the decision to link or not to link is made prior to deployment.
But not only supported remote backends are conditionally linked. Overall, the following list of commented examples presents almost all supported build tags (with maybe one minor exception):
In addition, to build AuthN, CLI, and/or aisloader, run:
make authn
make cli
make aisloader
respectively. With each of these make commands, you can also use MODE=debug - debug mode is universally supported.
The following applies to all containerized deployments:
To that end, each AIS node at startup loads and parses cgroup settings for the container and, if the number of CPUs is restricted, adjusts the number of allocated system threads for its goroutines.
This adjustment is accomplished via the Go runtime GOMAXPROCS variable. For in-depth information on CPU bandwidth control and scheduling in a multi-container environment, please refer to the CFS Bandwidth Control document.
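To illustrate the idea (this is not the AIS implementation), the sketch below derives a CPU count from a cgroup v2 `cpu.max` value, which has the form `<quota> <period>` or `max <period>`. Rounding up and clamping to the host CPU count are assumptions of this sketch, and the function name is hypothetical.

```shell
#!/bin/sh
# Derive a CPU count from a cgroup v2 "cpu.max" value ("<quota> <period>"
# or "max <period>"). Rounding up is an assumption of this sketch.
cpus_from_cpu_max() {
    quota="${1%% *}"
    period="${1##* }"
    host_cpus="$2"
    if [ "$quota" = "max" ]; then
        echo "$host_cpus"            # no CPU bandwidth limit
        return 0
    fi
    limit=$(( (quota + period - 1) / period ))   # ceil(quota / period)
    if [ "$limit" -lt 1 ]; then limit=1; fi
    if [ "$limit" -gt "$host_cpus" ]; then limit="$host_cpus"; fi
    echo "$limit"
}

host_cpus="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
cpu_max="$(cat /sys/fs/cgroup/cpu.max 2>/dev/null || echo "max 100000")"
echo "suggested GOMAXPROCS: $(cpus_from_cpu_max "$cpu_max" "$host_cpus")"
```

For example, a quota of 200000 with a period of 100000 corresponds to two CPUs' worth of bandwidth.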
Further, given the container’s cgroup/memory limitation, each AIS node adjusts the amount of memory available for itself.
Memory limits may affect dSort performance forcing it to “spill” the content associated with in-progress resharding into local drives. The same is true for erasure-coding which also requires memory to rebuild objects from slices, etc.
For technical details on AIS memory management, please see this readme.
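As a rough illustration of the memory adjustment described above (again, not the AIS implementation), the sketch below picks the smaller of the host's total memory and a cgroup v2 `memory.max` limit, where the literal `max` means no limit. The function name is hypothetical, and the `/proc/meminfo` lookup assumes Linux.

```shell
#!/bin/sh
# Pick the effective memory budget in bytes: the smaller of the host total
# and the cgroup limit; the literal "max" means no limit.
effective_mem_bytes() {
    limit="$1"
    host_total="$2"
    if [ "$limit" = "max" ] || [ "$limit" -gt "$host_total" ]; then
        echo "$host_total"
    else
        echo "$limit"
    fi
}

mem_max="$(cat /sys/fs/cgroup/memory.max 2>/dev/null || echo max)"
host_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null || true)"
host_bytes=$(( ${host_kb:-0} * 1024 ))
echo "memory budget (bytes): $(effective_mem_bytes "$mem_max" "$host_bytes")"
```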
Some will say that using AIS CLI with aistore is an order of magnitude more convenient than curl. Or two orders.
Must be a matter of taste, though, and so here are a few curl examples.
As always, the http://localhost:8080 address (below) simply indicates Local Playground and must be understood as a placeholder for an arbitrary aistore endpoint (AIS_ENDPOINT).