Workbench Project Specification

The Workbench Project Specification is a crucial component of AI Workbench, designed to provide a structured and organized approach to managing AI development projects or AI applications. It serves a dual purpose, acting as both a medium of communication to users and a format that can be interpreted and manipulated by software.

The Workbench Project Specification can be thought of as a dynamic specification that requires a reference implementation to be fully instantiated. This is because part of the specification must be interpreted based on the environment in which it is executing. By creating a clear and consistent framework, the Workbench Project Specification helps streamline working with complex environments and applications across diverse compute resources.

First and foremost, a Workbench Project is a git repository. To make a git repository a Workbench Project, a metadata file must exist at .project/spec.yaml. This file contains important metadata about the project, what it contains, and how it is configured.

The repository must also contain a set of files to support git (.gitattributes, .gitignore). These files are essential for the proper functioning of the Workbench Project within the AI Workbench environment.

Additionally, the repository may contain files to represent the compute environment (requirements.txt, apt.txt, preBuild.bash, postBuild.bash, variables.env). These files help configure the environment in which the Workbench Project will run.
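For example, a minimal set of environment files might look like the following (the package names, versions, and variable values are illustrative, not prescriptive):

```
# requirements.txt -- Python packages installed with pip during the build
torch==2.1.0
pandas

# apt.txt -- Debian packages installed with apt during the build
curl
vim

# variables.env -- non-sensitive environment variables set in the container
TENSORBOARD_LOGS_DIRECTORY=/data/tensorboard/logs/
```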

The repository may also have a README.md markdown file. This file provides a brief introduction to the Workbench Project and can include important information such as the project’s purpose, usage, and any relevant details.

These files together represent the Workbench Project. AI Workbench's implementation also generates additional files to represent and maintain the dynamic state of the project.

The .project/configpacks file is versioned in the git repository and is used by AI Workbench to track one-time automation that has run on the Workbench Project repository. All other files generated by AI Workbench are stored in the Workbench Project runtime file location.

In addition to the project specification files, AI Workbench creates several files to track progress and output during the build and configuration stages. These files are essential for managing a project's runtime operation, but they should not be accidentally committed to the project repository. This behavior is specific to AI Workbench's implementation and interpretation of the Workbench Project Specification.

To store these files, AI Workbench uses a separate directory called project-runtime-info located under the workbench service's working directory (defaulting to ~/.nvwb). Each project has its own subdirectory within project-runtime-info, and the name of the subdirectory is derived from a hash of the project path.

The files stored in this directory are:

  • Containerfile - Generated on demand during a project build (equivalent to a Dockerfile)

  • build-output.success - Build logs from the last successful build

  • build-output.error - Build logs from the last failed build

  • build-output.building - Build logs from the build currently in progress

  • secrets.env - Stores the project’s secret environment variables and their values

  • rebuild.cache - The cache value of the last build of the container image

  • edit.cache - The cache value of the last edit of the container image made by the library. Used to detect manual changes that were made by the user (without updating the Containerfile)

  • <application>-start.log - Application start logs are captured in this file to aid in developer debugging scenarios.

  • cache/ - The directory that acts as the build context for the container build. It contains a copy of all the files used to build the container image, so editing the originals doesn't invalidate the build context. This makes it possible to edit the build environment without forcing a full rebuild of the container image.

  • mount/ - The directory stores the mount definitions for the project’s configured mounts, as well as mount directories and unmount scripts for each mount, if any.
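Putting this together, the runtime directory for a single project might look like the following sketch (the hashed directory name and the application name are illustrative):

```
~/.nvwb/project-runtime-info/
└── 6ff15c8a.../            # name derived from a hash of the project path
    ├── Containerfile
    ├── build-output.success
    ├── secrets.env
    ├── rebuild.cache
    ├── edit.cache
    ├── jupyterlab-start.log
    ├── cache/               # build context copy
    └── mount/               # mount definitions and scripts
```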

Here is one example of the Project Spec file:

```yaml
specVersion: v2
meta:
  name: example-project
  image: project-example-project
  description: An example project using PyTorch
  labels: []
  createdOn: "2024-01-04T23:32:17Z"
  defaultBranch: main
layout:
- path: code/
  type: code
  storage: git
- path: models/
  type: models
  storage: gitlfs
- path: data/
  type: data
  storage: gitlfs
- path: data/scratch/
  type: data
  storage: gitignore
environment:
  base:
    registry: nvcr.io
    image: nvidia/ai-workbench/pytorch:1.0.2
    build_timestamp: "20231212000523"
    name: PyTorch
    supported_architectures: []
    cuda_version: "12.2"
    description: A Pytorch 2.1 Base with CUDA 12.2
    entrypoint_script: ""
    labels:
    - cuda12.2
    - pytorch2.1
    apps:
    - name: jupyterlab
      type: jupyterlab
      class: webapp
      start_command: jupyter lab --allow-root --port 8888 --ip 0.0.0.0 --no-browser --NotebookApp.base_url=\$PROXY_PREFIX --NotebookApp.default_url=/lab --NotebookApp.allow_origin='*'
      health_check_command: '[ \$(echo url=\$(jupyter lab list | head -n 2 | tail -n 1 | cut -f1 -d'' '' | grep -v ''Currently'' | sed "s@/?@/lab?@g") | curl -o /dev/null -s -w ''%{http_code}'' --config -) == ''200'' ]'
      stop_command: jupyter lab stop 8888
      user_msg: ""
      icon_url: ""
      webapp_options:
        autolaunch: true
        port: "8888"
        proxy:
          trim_prefix: false
        url_command: jupyter lab list | head -n 2 | tail -n 1 | cut -f1 -d' ' | grep -v 'Currently'
    - name: tensorboard
      type: tensorboard
      class: webapp
      start_command: tensorboard --logdir \$TENSORBOARD_LOGS_DIRECTORY --path_prefix=\$PROXY_PREFIX --bind_all
      health_check_command: '[ \$(curl -o /dev/null -s -w ''%{http_code}'' http://localhost:\$TENSORBOARD_PORT\$PROXY_PREFIX/) == ''200'' ]'
      stop_command: ""
      user_msg: ""
      icon_url: ""
      webapp_options:
        autolaunch: true
        port: "6006"
        proxy:
          trim_prefix: false
        url: http://localhost:6006
    programming_languages:
    - python3
    icon_url: ""
    image_version: 1.0.3
    os: linux
    os_distro: ubuntu
    os_distro_release: "22.04"
    schema_version: v2
    user_info:
      uid: ""
      gid: ""
      username: ""
    package_managers:
    - name: apt
      binary_path: /usr/bin/apt
      installed_packages:
      - curl
      - git
      - git-lfs
      - vim
    - name: pip
      binary_path: /usr/local/bin/pip
      installed_packages:
      - jupyterlab==4.0.7
    package_manager_environment:
      name: ""
      target: ""
execution:
  apps: []
  resources:
    gpu:
      requested: 1
      sharedMemoryMB: 1024
  secrets: []
  mounts:
  - type: project
    target: /project/
    description: Project directory
    options: rw
  - type: volume
    target: /data/tensorboard/logs/
    description: Tensorboard Log Files
    options: volumeName=tensorboard-logs-volume
```

SpecVersion

This field gives the schema version number of the current project spec.

Meta

This section outlines the metadata for the project to correctly display on the NVIDIA AI Workbench application, including creation time, project name, and project description.

  • name - Name of the Project

  • image - Name of the Project container image. This image name is local to the machine and isn’t pushed to a container registry.

  • description - Description of the project

  • labels - List of labels for the Project

  • createdOn - RFC 3339 formatted string containing when the Project was created.

  • defaultBranch - Name of the default GIT branch
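Taken together, the meta section for a hypothetical project might look like this (all values are illustrative):

```yaml
meta:
  name: my-project
  image: project-my-project
  description: A short description shown in AI Workbench
  labels:
  - pytorch
  createdOn: "2024-01-04T23:32:17Z"
  defaultBranch: main
```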

Layout

This section provides information on the default layout and structure for the project, including core project directories as well as their storage backends. AI Workbench will reconcile configuration based on this if needed (e.g. adding a path to the .gitignore or .gitattributes files)

  • path - The relative path from the root of the repository, that is the target of the layout definition.

  • type - The type of data contained in the target directory. One of “code”, “model”, “data”. Currently unused.

  • storage - How the data in the target directory should be stored, eg. storage backend, if any.
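For example, a layout that keeps large model files in Git LFS while excluding a scratch directory from version control might look like this (paths are illustrative):

```yaml
layout:
- path: models/
  type: models
  storage: gitlfs
- path: data/scratch/
  type: data
  storage: gitignore
```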

Environment

This section provides information on the environment for the project, primarily specifying the base environment used. This data can be manually configured, but typically it is automatically populated from image labels applied to the base environment image that is selected during project creation.

Important

The base.cuda_version field tells AI Workbench what version of CUDA the host driver must support. If this value is not set correctly, you may experience runtime errors that AI Workbench fails to warn about when starting the container.

base:

  • registry - The container registry containing the base image.

  • image - The container image on top of which the Project container will be built.

  • build_timestamp - Timestamp of the base image build in Y/m/d/H/M/S format.

  • name - Name of the base container.

  • supported_architectures - List of supported architectures that the base environment image is built for.

  • cuda_version - Version of CUDA installed in the base environment.

  • description - Description of the base container.

  • entrypoint_script - Path to a script that AI Workbench will source in the entrypoint file that it automatically generates.

  • labels - List of labels for the base environment.

  • apps - Section containing a list of objects describing the applications already installed in the base environment.

  • name - Unique name of the application, displayed to the user.

  • type - The type of application, used to determine what application-specific automation will be run.

  • class - The class of application, used to determine which optional configuration options are available, i.e. web app, process, or native.

  • start_command - The shell command used to start the application.

  • health_check_command - The shell command used to check the health or status of the application. An exit code of zero means the application is running and healthy; a non-zero exit code means the application is not running or is unhealthy.

  • stop_command - The shell command used to stop the application.

  • user_msg - A string template with a message that will be displayed to the user when the application is started. For webapp class apps, you can use the placeholder {{URL}}, which will be populated with the URL of the web app after it starts.

  • icon_url - Link to the icon or image used for the application.

  • webapp_options - The following option(s) are available for webapp class apps

    • autolaunch - A boolean indicating if the AI Workbench should automatically open the application URL in a web browser after start.

    • port - The port that the application will be running on inside the container.

    • proxy - The following option(s) are available to manipulate reverse proxy behavior

      • trim_prefix - A boolean indicating if the AI Workbench reverse proxy should remove the application-specific URL prefix before forwarding the request to the application.

    • url - The static URL used to access the application in the container (e.g. http://localhost:9999).

    • url_command - The shell command used to get the URL for the application. The stdout from this command is treated as the URL. You must provide either a static url or a url_command for webapp class apps.

  • process_options - The following option(s) are available for process class apps

    • wait_until_finished - A boolean indicating if AI Workbench should wait for the application to finish or let it run in the background. If true, the Desktop app notifies the user when the application completes, and the CLI blocks until it completes.

  • native_options - Reserved for future use. Not currently in use.

  • programming_languages - List of programming languages installed in the base environment.

  • icon_url - Link to the icon or image used for the base environment.

  • image_version - The version number of the container image. This is used by AI Workbench to support automatically updating base environments.

  • os - Name of the base environment operating system.

  • os_distro - Name of the base environment operating system distribution.

  • os_distro_release - Release version of the base environment operating system distribution.

  • schema_version - Version of the base environment container label schema, which is related to the Project Spec Schema version.

  • user_info - Section containing an object describing what user the container processes should run as. If your container is built as root, leave this section blank. If your container includes a non-root user that you want apps and other processes in the Project to run as, specify the uid, gid, and username.

    • uid - The configured UID of the non-root user to use within the container.

    • gid - The configured GID of the non-root user to use within the container.

    • username - The username of the non-root user to use within the container.

  • package_managers - Section containing a list of objects containing information about the installed package managers. This information is used to install Project packages.

    • name - The name of the package manager.

    • binary_path - The complete path to the package manager binary. This can also be used to select a venv or conda environment.

    • installed_packages - A list of packages installed by the package manager. While this field is optional, it can be useful: packages listed here are considered installed and are visible in the AI Workbench Desktop app.

  • package_manager_environment - Section containing an object describing the package manager environment that should be activated before installing packages and starting applications. This is useful if your base environment was built using a virtual environment or conda.

    • name - The type of environment that should be activated.

    • target - Path to or name of the environment to activate.
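As an illustration, a web application entry using these fields might look like the following sketch (the app name, type, command, and port are hypothetical; real applications will differ):

```yaml
apps:
- name: my-dashboard
  type: custom          # illustrative type value
  class: webapp
  start_command: python /project/code/serve.py --port 8501
  health_check_command: '[ \$(curl -o /dev/null -s -w ''%{http_code}'' http://localhost:8501/) == ''200'' ]'
  stop_command: pkill -f serve.py
  user_msg: "Dashboard is running at {{URL}}"
  icon_url: ""
  webapp_options:
    autolaunch: true
    port: "8501"
    proxy:
      trim_prefix: false
    url: http://localhost:8501
```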

Execution

This section provides information on the runtime execution of the project, such as commands for starting and stopping user-defined applications, compute resources needed for the project, and any project secrets and/or mounts needing configuration.

  • apps - Section containing a list of objects describing the user-defined applications installed in the Project. The application object contains the same fields as “environment.base.apps”.

  • resources - Section containing an object describing what host resources are requested or required to run the Project.

    • gpu - The following option(s) are available.

      • requested - The number of requested GPUs. This is a soft limit; if no GPUs are available, the Project may be started without any GPUs.

      • sharedMemoryMB - The amount of shared memory (in MB) to be allocated to the project container.

  • secrets - Section containing a list of objects that describe what sensitive environment variables are required to be set before starting the Project. The values for the environment variables are not part of the Project’s spec and are configured at runtime.

    • variable - The sensitive environment variable that needs to be set.

    • description - A description of the sensitive environment variable.

  • mounts - Section containing a list of objects that describe what external data is used by the Project and where it should be located in the Project container. Any required values (like source directory) needed to configure a mount are not part of the Project’s spec and are configured at runtime.

    • type - The type of the mount, e.g. “project”, “host”, “volume”, or “tmp”.

    • target - The target directory, within the Project container, where the data should be located. Target locations must be absolute paths and include the trailing slash.

    • description - Description of the mount.

    • options - Any options that are used when configuring the mount.
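Combining these fields, an execution section that requests one GPU, declares one secret, and configures two mounts might look like the following (the secret variable name and mount paths are illustrative):

```yaml
execution:
  apps: []
  resources:
    gpu:
      requested: 1
      sharedMemoryMB: 1024
  secrets:
  - variable: HF_TOKEN
    description: Access token used to pull models at runtime
  mounts:
  - type: project
    target: /project/
    description: Project directory
    options: rw
  - type: host
    target: /data/datasets/
    description: Datasets mounted from the host
    options: ""
```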

How is this different from devcontainers?

In spirit, AI Projects are similar to devcontainers, but they have a different focus, set of concerns, and interfaces tailored to a different type of user and use case. We consider AI Projects a super-set of functionality that overlaps with devcontainers in some ways and diverges in others. In the future, it may become easy to convert or use a devcontainer representation alongside the Workbench Project representation.

Can I edit the spec file directly?

Yes - you can directly manipulate your spec.yaml if needed and AI Workbench will respond to the applied changes.

© Copyright 2024, NVIDIA Corporation. Last updated on Apr 29, 2024.