Datasets and Workspaces Overview

Artificial intelligence and machine learning products are powered by data, so giving workloads easy, reliable, and performant access to data is a core feature of Base Command. This lab introduces how Base Command manages data through datasets and workspaces, and how to create, access, and delete both.

In Base Command, a dataset is a shareable, read-only artifact that can be mounted for use in a Base Command job. Many machine learning and deep learning workloads expect a specific data layout, so understand those expectations before constructing a dataset; this ensures that its directory layout, file names, and data types can be reused consistently. Datasets can be created in several ways, each suited to where the data currently resides.
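Because the expected layout depends on the workload, it can help to check a local directory against those expectations before turning it into a dataset. The following minimal Python sketch assumes a hypothetical image-classification layout: top-level `train/` and `val/` directories whose files are all `.jpg`, `.jpeg`, or `.png`. The names `check_layout`, `REQUIRED_DIRS`, and `ALLOWED_SUFFIXES` are illustrative only and are not part of Base Command or its tooling.

```python
from pathlib import Path

# Hypothetical expectations for an image-classification workload
# (an assumption for illustration, not a Base Command requirement):
# the dataset root must contain these top-level directories...
REQUIRED_DIRS = ("train", "val")
# ...and every file beneath them must have one of these extensions.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_layout(root: str) -> list[str]:
    """Return human-readable problems found under root (an empty list means OK)."""
    base = Path(root)
    problems: list[str] = []
    for name in REQUIRED_DIRS:
        split = base / name
        if not split.is_dir():
            problems.append(f"missing directory: {name}/")
            continue
        # Collect every regular file at any depth under this split.
        files = [p for p in split.rglob("*") if p.is_file()]
        if not files:
            problems.append(f"no files under: {name}/")
        for path in files:
            if path.suffix.lower() not in ALLOWED_SUFFIXES:
                problems.append(f"unexpected file type: {path.relative_to(base)}")
    return problems
```

Calling `check_layout` on a directory before creating the dataset surfaces problems while the data is still easy to change; because a dataset is read-only once created, correcting its layout afterward typically means creating a new dataset.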

In Base Command, a workspace is a shareable, read-write storage endpoint that can also be mounted in a Base Command job. A workspace is intended for iterative work whose contents are likely to change frequently, for example data that is not yet ready to become a static dataset. This lab focuses on data-oriented use cases for workspaces, but because they are flexible, workspaces can also hold source code that is not yet under version control, or serve as shared storage that multiple teams use concurrently across different jobs. Workspace access control and sharing are covered in depth in other lab content.

© Copyright 2022-2023, NVIDIA. Last updated on Jan 10, 2023.