Projects

This is a high-level conceptual overview of Projects in NVIDIA AI Workbench. For a guide to features, see the corresponding How-To topic(s) (Desktop and CLI) and for detailed reference information, see the Deep Dive topics.

  • A Workbench Project is a Git repository that contains a set of configuration files that belong in specific places in the repository.

  • These configuration files, and the metadata in them, follow a specification (Deep Dive) that is readily interpretable by humans and applications.

  • AI Workbench reads these files to manage the repository and provide a containerized development environment.

  • When you make changes in the repository, AI Workbench detects them and updates the configuration files (if relevant).

This simple structure enables the automation and streamlining required for reproducibility and portability across systems.

Projects are structured to organize the development environment into a few high-level categories:

  • Project Specification: Metadata and details on the repository structure, base image and runtime configuration

  • Content: The code, data and model files used in an application or development workflow

  • Environment: Packages, scripts, and environment variables used to build or configure the container at runtime

[Figure: overview of a Workbench Project as a Git repository (overview-concepts-projects-git-repo.png)]

Project Specification

  • A small set of files in the Project follows a specification to provide a manifest of contents and a set of instructions.

  • AI Workbench uses the specification and reads the files to drive automation and expose capabilities in the UI.

  • When you make changes to the Project, AI Workbench detects them and edits the corresponding files to reflect relevant changes.

  • Changes are versioned, and AI Workbench re-reads the changed files to update its automation and the capabilities exposed in the UI.

The Project Specification supports metadata describing things such as:

  • Name and description

  • Manifest of the base image

  • Commands to run web applications installed in the environment

The main file for the specification is the spec.yaml file in the .project sub-folder. You can see an example spec.yaml file here.

This file specifies a few categories of metadata. Some important ones are:

  • Layout: Sub-folders and storage backend, e.g. Git for the code folder, and Git LFS for data folders.

  • Base Image: Reference to the base image and information on what is in it.

  • Runtime: Container-specific information like bind mounts, environment variables and number of GPUs.

  • Applications: Execution commands to configure and run applications that are installed in the container.
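As a rough illustration of how these categories fit together, the sketch below models a parsed spec.yaml as a plain Python dictionary. The section names layout, environment and execution are the ones this page refers to; every nested key and value is hypothetical and chosen only for illustration, so consult the specification for the real schema.

```python
from __future__ import annotations

# Hypothetical, simplified stand-in for a parsed spec.yaml.
# Only the three section names come from this page; nested keys are assumptions.
EXAMPLE_SPEC = {
    "layout": [
        {"path": "code/", "storage": "git"},
        {"path": "data/", "storage": "gitlfs"},
        {"path": "models/", "storage": "gitlfs"},
    ],
    "environment": {
        "base": {"registry": "registry.example.com", "image": "example/python:1.0"},
    },
    "execution": {
        "apps": [{"name": "JupyterLab", "port": 8888}],
        "resources": {"gpus_requested": 0},
        "mounts": [],
    },
}

REQUIRED_SECTIONS = ("layout", "environment", "execution")


def missing_sections(spec: dict) -> list[str]:
    """Return the required top-level sections that are absent from spec."""
    return [name for name in REQUIRED_SECTIONS if name not in spec]
```

A check like this is only a sanity test on the overall shape of the file; it says nothing about whether the values inside each section are valid.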

Warning

Never delete the .project folder. You can edit the entries in spec.yaml, but incorrect edits can break the Project.

Layout

AI Workbench requires some files to be in fixed locations. For example, spec.yaml must be in the .project folder, and environment configuration files like requirements.txt or variables.env must be in the top-level folder.
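The fixed locations mentioned above can be checked with a few lines of Python. This is a minimal sketch, not an AI Workbench feature: the helper name missing_fixed_files is hypothetical, and the list covers only the files named on this page, which may not all be mandatory in every Project.

```python
from __future__ import annotations

from pathlib import Path

# Fixed locations named on this page, relative to the Project root.
FIXED_FILES = (".project/spec.yaml", "requirements.txt", "variables.env")


def missing_fixed_files(project_root: str) -> list[str]:
    """Return the expected files that are not present under project_root."""
    root = Path(project_root)
    return [rel for rel in FIXED_FILES if not (root / rel).is_file()]
```

Calling missing_fixed_files on a Project directory returns an empty list when all three files are where this sketch expects them.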

However, most other things can be arbitrarily organized in a Project. To keep the versioning of various folders transparent, the Project Specification has a layout defining how different folders should be versioned.

AI Workbench currently supports three storage types:

  • git: For versioning code and configuration files.

  • gitlfs: For versioning relatively large files or folders with many files.

  • gitignore: For designating a folder that shouldn’t be versioned.

For example, when you create a new Project in AI Workbench, a Git repository is created on disk at the location $HOME/nvidia-workbench/<project_name> with a default layout of code, data and models. The code folder is initialized with git versioning, while the data and models folders are initialized with gitlfs.
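To make the idea of a layout concrete, the sketch below represents the default layout as a list of entries and looks up the storage type that applies to a given path. The entry keys (path, storage) and the longest-prefix rule are assumptions made for illustration; they are not taken from the specification.

```python
from __future__ import annotations

# Default layout described above; the dictionary keys are illustrative.
DEFAULT_LAYOUT = [
    {"path": "code/", "storage": "git"},
    {"path": "data/", "storage": "gitlfs"},
    {"path": "models/", "storage": "gitlfs"},
]


def storage_for(path: str, layout: list[dict]) -> str | None:
    """Return the storage type of the longest layout entry that contains path."""
    best = None
    for entry in layout:
        if path.startswith(entry["path"]):
            if best is None or len(entry["path"]) > len(best["path"]):
                best = entry
    return best["storage"] if best else None
```

With this rule, a more specific entry such as a gitignore sub-folder inside a gitlfs folder takes precedence over its parent, and a path outside every entry yields None.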

You can add folders to the repository or delete folders from it, but to apply the desired versioning you currently must edit the entries in the layout section of spec.yaml. See here for further details.

Base Image

Projects use a base image as the foundation of the development environment. The related metadata is in the environment section of spec.yaml.

Metadata for base images includes things like:

  • The registry the image is held in

  • CUDA version if it is installed

  • Installed applications and relevant commands

In general, AI Workbench expects a base image to have metadata directly attached to it in the form of container image labels that follow a separate specification. You can still use an unlabeled base image in a Project, but this requires manually editing various details in spec.yaml. See the Bring Your Own Container page for some guidelines and details on the procedure.

Applications

Development environments typically rely on or include applications, with JupyterLab and TensorBoard being common examples. Workbench Projects are designed to provide containerized web applications like these.

Installed applications must be properly configured for AI Workbench to manage and provide them. The relevant metadata is in the execution section of spec.yaml.

Some example fields are:

  • Application name, e.g. JupyterLab

  • Start and stop commands

  • A health check command

  • Port that it is available on
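The fields listed above can be pictured as a small record per application. The following sketch uses a Python dataclass; the class name AppEntry, its field names and the validation rules are hypothetical and do not reproduce the actual keys in spec.yaml.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AppEntry:
    """Hypothetical model of one application entry; field names are illustrative."""

    name: str
    start_command: str
    stop_command: str
    health_check_command: str
    port: int

    def problems(self) -> list[str]:
        """Return human-readable issues, or an empty list if the entry looks usable."""
        issues = []
        for field_name in ("name", "start_command", "stop_command", "health_check_command"):
            if not getattr(self, field_name).strip():
                issues.append(f"{field_name} is empty")
        if not 1 <= self.port <= 65535:
            issues.append(f"port {self.port} is out of range")
        return issues


# Placeholder commands for illustration; a real entry depends on the base image.
jupyterlab = AppEntry(
    name="JupyterLab",
    start_command="jupyter lab --port 8888 --no-browser",
    stop_command="jupyter server stop 8888",
    health_check_command="curl --fail http://localhost:8888/api",
    port=8888,
)
```

Here jupyterlab.problems() returns an empty list, while an entry with a blank start command or a port of 0 would report those issues.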

Runtime

In addition to the environment, Projects track things that dictate the container’s runtime configuration. Some examples are:

  • The number of GPUs requested to be passed into the container

  • Secrets such as API keys that are to be used

  • File mounts to be added to the container

These runtime settings are stored in the relevant parts of the execution section of spec.yaml and are editable through the Workbench UI. For example, if you change the number of GPUs or configure mounts from the UI (see How-To here), the corresponding fields in the spec.yaml file will be updated.

GPU Allocation on Windows

On Ubuntu, a GPU that has been mounted into a container is isolated from access by other containers; on WSL2 it currently is not. As a result, two different Project containers on Windows can access the same GPU, and all GPUs on a multi-GPU system are available to every Project.

Content

  • Keep clear divisions between different parts of an application or development workflow

  • Provide a default structure for new Projects but make it easy to change

  • Don’t mix configuration or environment files with the content of code, data and models

This is the most straightforward component of Projects. On creation, the default structure for a new Project is three folders for code, data, and models. These are all in the project directory within the container, which is mapped directly to the local file system at $HOME/nvidia-workbench/<project_name>. The folder structure inside the project folder can be easily changed.

The versioning of the content in the project folder is a combination of the details in the layout section of spec.yaml and the information in the Git configuration files like .gitignore. For example, the default folder data is versioned by gitlfs but it contains a sub-folder scratch that is included in the .gitignore file.

Environment

  • Define the environment with a consistent set of easy-to-edit files, e.g. requirements.txt and postBuild.sh.

  • AI Workbench combines that information with metadata from the Project Specification to render a Containerfile and build the container.

By convention, AI Workbench uses the following files to define and version the environment:

  • requirements.txt: A file to record Python dependencies

  • apt.txt: A file to record Debian package dependencies

  • variables.env: A file to record environment variables to be used in the container

  • preBuild.sh: A bash script to modify the environment in the base image before any further packages are installed

  • postBuild.sh: A bash script to modify the environment after packages are installed
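How these files might feed into a container build can be sketched as a small rendering function. This is an illustration only: the function name render_containerfile, the order of the apt and pip steps, the /tmp paths and the exact instructions are assumptions, not AI Workbench's actual template. The only ordering taken from this page is that preBuild.sh runs before packages are installed and postBuild.sh runs after.

```python
from __future__ import annotations


def render_containerfile(base_image: str, present_files: set[str]) -> str:
    """Render Containerfile text from a base image and the convention files that exist.

    The step order and instructions are assumptions for illustration.
    """
    lines = [f"FROM {base_image}"]
    if "preBuild.sh" in present_files:
        # Runs before any further packages are installed.
        lines += ["COPY preBuild.sh /tmp/preBuild.sh", "RUN bash /tmp/preBuild.sh"]
    if "apt.txt" in present_files:
        lines += [
            "COPY apt.txt /tmp/apt.txt",
            "RUN apt-get update && xargs -a /tmp/apt.txt apt-get install -y",
        ]
    if "requirements.txt" in present_files:
        lines += [
            "COPY requirements.txt /tmp/requirements.txt",
            "RUN pip install -r /tmp/requirements.txt",
        ]
    if "postBuild.sh" in present_files:
        # Runs after packages are installed.
        lines += ["COPY postBuild.sh /tmp/postBuild.sh", "RUN bash /tmp/postBuild.sh"]
    return "\n".join(lines) + "\n"
```

The sketch leaves out variables.env because this page describes it as holding environment variables to be used in the container rather than as a build step.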

With these conventions, changing the environment is simple. For example, you can update Python packages by directly editing the requirements.txt file and using the AI Workbench feature to rebuild and restart the container.

This workflow may be a change for Jupyter users accustomed to services like Google Colab, where the typical approach is to run pip install directly in the notebook each time it is run.

© Copyright 2023-2024, NVIDIA. Last updated on Jan 21, 2024.