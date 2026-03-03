The CloudAI benchmark framework aims to develop an industry-standard benchmark for grading data center (DC) scale AI systems in the cloud. The primary motivation is to provide automated benchmarking across a variety of systems.

    git clone git@github.com:NVIDIA/cloudai.git
    cd cloudai
    uv run cloudai --help

Note: For instructions on setting up access for enroot, see Workloads Requirements Installation.

See the required Python version in the .python-version file and make sure you have it installed (for installation, see Install Custom Python Version). Follow these steps:

    git clone git@github.com:NVIDIA/cloudai.git
    cd cloudai
    python -m venv venv
    source venv/bin/activate
    pip install -e .

If your system Python version is not supported, you can install a custom version using the uv tool:

    curl -LsSf https://astral.sh/uv/install.sh | sh
    source $HOME/.local/bin/env
    uv venv --seed  # picks the python version from .python-version
                    # --seed installs pip and setuptools
    source .venv/bin/activate
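Before choosing between the two installation routes above, it can help to check whether the interpreter you already have matches the version pinned in .python-version. The following is a minimal sketch, not part of CloudAI; the function names are hypothetical, and it assumes the file's first line is a plain dotted numeric version such as 3.10 or 3.10.14.

```python
from __future__ import annotations

import sys
from pathlib import Path


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string such as '3.10' or '3.10.14' into a tuple of ints.

    Assumption: the version is purely numeric and dot-separated.
    """
    return tuple(int(part) for part in text.strip().split("."))


def interpreter_matches(required: str, current: tuple[int, ...]) -> bool:
    """Return True if `current` agrees with every component of `required`.

    A requirement of '3.10' therefore accepts any 3.10.x interpreter.
    """
    wanted = parse_version(required)
    return tuple(current[: len(wanted)]) == wanted


if __name__ == "__main__":
    version_file = Path(".python-version")
    lines = version_file.read_text().splitlines() if version_file.exists() else []
    if lines:
        required = lines[0]
        running = tuple(sys.version_info[:3])
        status = "ok" if interpreter_matches(required, running) else "mismatch"
        print(f"required {required}, running {sys.version.split()[0]}: {status}")
    else:
        print("no usable .python-version file in the current directory")
```

Run it from the root of the cloned repository. A "mismatch" result suggests following the uv route described above.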

CloudAI operates on three main schemas:

System Schema: Describes the system, including the scheduler type, node list, and global environment variables.

Test Schema: An instance of a test template with custom arguments and environment variables.

Test Scenario Schema: A set of tests with dependencies and additional descriptions about the test scenario.

These schemas enable CloudAI to be flexible and compatible with different systems and configurations.
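To make the relationship between the three schemas concrete, the following minimal sketch models them as Python dataclasses. All class and field names here (SystemSchema, WorkloadSchema, ScenarioEntry, ScenarioSchema, depends_on, and the example values) are hypothetical and chosen for illustration; they are not CloudAI's actual classes or TOML keys. The run_order method also assumes that a dependency simply means "run after", which may be simpler than what CloudAI supports.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class SystemSchema:
    """Describes the system: scheduler type, node list, global environment."""
    scheduler: str
    nodes: list[str]
    global_env_vars: dict[str, str] = field(default_factory=dict)


@dataclass
class WorkloadSchema:
    """A test: an instance of a test template with its own args and env vars."""
    name: str
    template: str
    cmd_args: dict[str, str] = field(default_factory=dict)
    env_vars: dict[str, str] = field(default_factory=dict)


@dataclass
class ScenarioEntry:
    """One test inside a scenario, plus the names of tests it depends on."""
    test: WorkloadSchema
    depends_on: list[str] = field(default_factory=list)


@dataclass
class ScenarioSchema:
    """A set of tests with dependencies and a free-text description."""
    name: str
    description: str
    entries: list[ScenarioEntry]

    def run_order(self) -> list[str]:
        # Map each test name to the set of tests that must come before it,
        # then let the standard library produce a valid ordering.
        graph = {e.test.name: set(e.depends_on) for e in self.entries}
        return list(TopologicalSorter(graph).static_order())


# Hypothetical example values, for illustration only.
system = SystemSchema(scheduler="slurm", nodes=["node01", "node02"])
first = WorkloadSchema(name="sleep_a", template="Sleep", cmd_args={"seconds": "5"})
second = WorkloadSchema(name="sleep_b", template="Sleep", cmd_args={"seconds": "10"})
scenario = ScenarioSchema(
    name="sleep_demo",
    description="Two sleep tests, the second waiting for the first.",
    entries=[ScenarioEntry(first), ScenarioEntry(second, depends_on=["sleep_a"])],
)
print(scenario.run_order())  # ['sleep_a', 'sleep_b']
```

The point of the split is that the same test definitions can be reused across scenarios, and the same scenario can be pointed at a different system description.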

Global options for the cloudai command:

--log-file <path>: file to write log output to; defaults to debug.log in the current directory. The file contains log entries of level DEBUG and higher.

--log-level <level>: logging level for standard output; defaults to INFO.
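The two options describe a common pattern: a log file that always captures everything from DEBUG upward, and a console stream filtered at a configurable level. The sketch below reproduces that behavior with Python's standard logging module. It is an illustration of the pattern only, not CloudAI's actual implementation; the function name configure_logging and the logger name cloudai_sketch are assumptions.

```python
import logging
import sys


def configure_logging(log_file: str = "debug.log", log_level: str = "INFO") -> logging.Logger:
    """Send DEBUG and higher to a file, and `log_level` and higher to stdout."""
    logger = logging.getLogger("cloudai_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # let every record reach the handlers
    logger.handlers.clear()         # avoid duplicate handlers on repeated calls
    logger.propagate = False

    # File handler: mirrors --log-file, always at DEBUG.
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)

    # Console handler: mirrors --log-level, applied to standard output only.
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(getattr(logging, log_level.upper()))

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

With the defaults, a call such as logger.debug("...") ends up only in the file, while logger.info("...") appears both in the file and on the console.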

The run mode executes workloads. It automatically installs prerequisites if they are not already met.

    cloudai run \
        --system-config conf/common/system/example_slurm_cluster.toml \
        --tests-dir conf/common/test \
        --test-scenario conf/common/test_scenario/sleep.toml

The dry-run mode simulates running experiments without actually executing them. This is useful for verifying configurations and testing experiment setups.

    cloudai dry-run \
        --system-config conf/common/system/example_slurm_cluster.toml \
        --tests-dir conf/common/test \
        --test-scenario conf/common/test_scenario/sleep.toml
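Because dry-run and run take the same three arguments, one natural workflow is to validate a setup with dry-run and start the real run only if that succeeds. The following is a minimal sketch of such a wrapper. The helper names build_command and dry_run_then_run are hypothetical, only the flags shown above are used, and the sketch assumes that cloudai exits with a non-zero status when a dry run fails.

```python
from __future__ import annotations

import shlex
import subprocess


def build_command(mode: str, system_config: str, tests_dir: str, test_scenario: str) -> list[str]:
    """Assemble the argument list for one cloudai mode."""
    return [
        "cloudai", mode,
        "--system-config", system_config,
        "--tests-dir", tests_dir,
        "--test-scenario", test_scenario,
    ]


def dry_run_then_run(system_config: str, tests_dir: str, test_scenario: str,
                     runner=subprocess.run) -> None:
    """Execute dry-run first; proceed to run only if it succeeds.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit code, so the loop stops before the real run.
    """
    for mode in ("dry-run", "run"):
        cmd = build_command(mode, system_config, tests_dir, test_scenario)
        print("$", shlex.join(cmd))
        runner(cmd, check=True)


# Example call (not executed here, since it requires cloudai on PATH):
# dry_run_then_run(
#     "conf/common/system/example_slurm_cluster.toml",
#     "conf/common/test",
#     "conf/common/test_scenario/sleep.toml",
# )
```

The runner parameter exists only so the sequencing can be exercised without launching real jobs, for example by passing a function that records the commands it receives.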

The generate-report mode generates reports under the scenario directory. It also runs automatically as part of the run mode once experiments have completed.

    cloudai generate-report \
        --system-config conf/common/system/example_slurm_cluster.toml \
        --tests-dir conf/common/test \
        --test-scenario conf/common/test_scenario/sleep.toml \
        --result-dir /path/to/result_directory

The install mode installs test prerequisites. For more details, refer to the Workloads Requirements Installation guide. It also runs automatically as part of the run mode if prerequisites are not met.

    cloudai install \
        --system-config conf/common/system/example_slurm_cluster.toml \
        --tests-dir conf/common/test \
        --test-scenario conf/common/test_scenario/sleep.toml

The uninstall mode is the opposite of the install mode: it removes installed test prerequisites.

    cloudai uninstall \
        --system-config conf/common/system/example_slurm_cluster.toml \
        --tests-dir conf/common/test \
        --test-scenario conf/common/test_scenario/sleep.toml

The list mode lists internal components available within CloudAI.

    cloudai list <component_type>

This mode verifies the correctness of system, test, and test scenario configuration files.