NVIDIA Base Command Platform Quickstart Guide

This document is for users and administrators of NVIDIA Base Command Platform and explains how to get started using the platform to run AI jobs.

NVIDIA Base Command™ Platform is a comprehensive platform for businesses, their data scientists, and IT teams that accelerates ROI for AI initiatives. It manages the end-to-end lifecycle of AI development, including workload management and resource sharing, through both a graphical user interface and command-line APIs with integrated monitoring and reporting dashboards. Offered as a cloud-hosted solution that continuously delivers NVIDIA innovations directly into your AI workflow, Base Command Platform works across on-prem and cloud resources with a single-pane-of-glass view into your AI development process.

The following is a description of the primary concepts of NVIDIA Base Command Platform.

Container Images

All applications running in NGC are containerized as Docker containers and execute in our Runtime environment. Containers are stored in the NGC Container Registry, nvcr.io, which is accessible from both the command-line interface (CLI) and the Web UI.

Datasets

Datasets are the data inputs to a job, mounted as read-only to the location specified in the job. Datasets can contain data or code. Datasets are covered in detail in the Datasets section.

Workspaces

Workspaces are shareable, persistent read-write storage that can be mounted in jobs for concurrent use. A workspace can also be mounted in read-only mode, which suits configuration, code, and input data because the job cannot corrupt or modify the contents. Mounting a workspace in read-write mode (the default) works well for a checkpoint folder.

Jobs

A job is the fundamental unit of computation: a container running on an NVIDIA Base Command Platform instance in an accelerated computing environment (ACE). A job is defined by the set of attributes specified at submission time. Chapters 8 and 10 of the NVIDIA Base Command Platform User Guide provide details about the architecture of Base Command Platform.


This section is for org or team administrators (with User Admin role) and describes the process for inviting (adding) users to NVIDIA Base Command Platform.

As the organization administrator, you must create user accounts to allow others to use the NVIDIA Base Command Platform within the organization.

  1. Log on to the NGC web UI and select the NGC Org associated with NVIDIA Base Command Platform.
  2. Click Organization > Users from the left navigation menu.

    image38.png

    This capability is available only to User Admins.

  3. Click Invite New User on the top right corner of the page.

    new-ngc-invite-user.png

  4. On the new page, fill out the User Information section. Enter the user's screen name in the First Name field, and the email address that should receive the invitation email.

    add-user.png

  5. In the Roles section, select the appropriate context (either the organization or a specific team) and the available roles shown in the boxes below. Click Add Role to the right to save your changes. You can add or remove multiple roles before creating the user.

    user-roles.png

    The following are brief descriptions of the user roles:

    Table 1. NVIDIA Base Command Platform Roles
    Role | Description
    Base Command Admin | Admin persona with the capabilities to manage all artifacts available in Base Command Platform. The capabilities of the Admin role include resource allocation and access management.
    Base Command Viewer | Admin persona with read-only access to jobs, workspaces, datasets, and results within the user’s org or team.
    Registry Admin | Registry Admin persona for managing NGC Private Registry artifacts, with the capability for Registry User Management. The capabilities of the Registry Admin role include the capabilities of all Registry roles.
    Registry Read | Registry User persona with capabilities to only consume the Private Registry artifacts.
    Registry User | Registry User persona with the capabilities to publish and consume the Private Registry artifacts.
    User Admin | User Admin persona with the capabilities to only manage users.

    Refer to the section Assigning Roles in the NVIDIA Base Command Platform User Guide for additional information.

  6. After adding roles, double-check all the fields and then click Create User on the top right. An invitation email will automatically be sent to the user.

    create-user-btn.png

  7. Users that still need to accept their invitation emails are displayed in the Pending Invitations list on the Users page.

    users-pending-invitations.png


Before using NVIDIA Base Command Platform, you must have an NVIDIA Base Command Platform account created by your organization administrator. You need an email address to set up an account. Activating an account depends on whether your email domain is mapped to your organization's single sign-on (SSO). Choose one of the following processes depending on your situation for activating your NVIDIA Base Command Platform account.

3.1. Joining an NGC Org or Team Using Single Sign-on

This section describes activating an account where the domain of your email address is mapped to an organization's single sign-on.

After NVIDIA or your organization administrator adds you to a new org or team within the organization, you will receive a welcome email that invites you to continue the activation and login process.

image17.png

  1. Click the link in the email to open your organization's single sign-on page.
  2. Sign in using your single sign-on credentials.

    The Set Your Organization screen appears.

    image33.png

    This screen appears any time you log in.

  3. Select the organization and team under which you want to log in and then click Continue.

    You can always change to a different organization or team you are a member of after logging in.

    The NGC web UI opens to the Base Command dashboard.

    bcp-dashboard.png

3.2. Joining an Org or Team with a New NVIDIA Account

This section describes activating a new account where the domain of your email address is not mapped to an organization's single sign-on.

After NVIDIA or your organization administrator sets up your NVIDIA Base Command account, you will receive a welcome email that invites you to continue the activation and login process.

image17.png

  1. Click the Sign In link to open the sign in dialog in your browser.

    create-an-account.png

  2. Fill out your information, create a password, agree to the Terms and Conditions, and click Create Account.

    You will need to verify your email.

    image6.png

    The verification email is sent.

    image3.png

  3. Open the email and then click Verify Email Address.

    image11.png

    image24.png

  4. Select your options for using recommended settings and receiving developer news and announcements, and then click Submit.
  5. Agree to the NVIDIA Account Terms of Use, select desired options, and then click Continue.

    account-tou.png

  6. Click Accept at the NVIDIA GPU Cloud Terms of Use screen.

    image32.png

  7. The Set Your Organization screen appears.

    image33.png

    This screen appears any time you log in.

  8. Select the organization and team under which you want to log in and click Continue.

    You can always change to a different organization or team you are a member of after logging in.

    The NGC web UI opens to the Base Command dashboard.

    bcp-dashboard.png

3.3. Joining an Org or Team with an Existing NVIDIA Account

This section describes activating an account where the domain of your email address is not mapped to an organization's single sign-on (SSO).

After NVIDIA or your organization administrator adds you to a new org or team within the organization, you will receive a welcome email that invites you to continue the activation and login process.

image17.png

  1. Click the Sign In link to open the sign in dialog in your browser.

    image42.png

  2. Enter your password and then click Log In.

    The Set Your Organization screen appears.

    image33.png

    This screen appears any time you log in.

  3. Select the organization and team under which you want to log in and click Continue.

    You can always change to a different organization or team you are a member of after logging in.

    The NGC web UI opens to the Base Command dashboard.

    bcp-dashboard.png


During the initial account setup, you are signed into your NVIDIA Base Command Platform account on the NGC web site. This section describes the sign in process that occurs at a later time. It also describes the web UI sections of NVIDIA Base Command Platform at a high level, including the UI areas for accessing available artifacts and actions available to various user roles.

  1. Open https://ngc.nvidia.com and click Continue next to the sign-on choice that applies to your account.
    • NVIDIA Account: Select this option if single sign-on (SSO) is not available.
    • Single Sign-on (SSO): Select this option to use your organization's SSO. You may need to verify with your organization or Base Command Platform administrator whether SSO is enabled.

    login-selection.png

  2. Continue to sign in using your organization’s single sign-on.
  3. Set the organization you wish to sign in under, then click Continue.

You can always change to a different org or team that you are a member of after logging in.

The following image and table describe the main features in the left navigation menu of the web site, including the controls for changing the org or team.

image31.png

Table 2. NGC Web UI Sections
ID | Description
1 | CATALOG: Click this menu to access a curated set of GPU-optimized software. It consists of containers, pre-trained models, Helm charts for Kubernetes deployments, and industry-specific AI toolkits with software development kits (SDKs) that are periodically released by NVIDIA and are read-only for a Base Command Platform user.
2 | PRIVATE REGISTRY: Click this menu to access the secure space to store and share custom containers, models, resources, and Helm charts within your enterprise.
3 | BASE COMMAND: Click this menu to access controls for creating and running Base Command Platform jobs.
4 | ORGANIZATION: (User Admins only) Click this menu to manage users and teams.
5 | User Info: Select this drop down list to view user information, select the org to operate under, and download the NGC CLI and API key, described later in this document.
6 | Team Selection: Select this drop down list to select which team to operate under.


This chapter introduces the NGC Base Command Platform CLI, which you install on your workstation to interface with Base Command Platform. It covers the general features of the CLI that apply to all commands, as well as the CLI modules that map to the Web UI areas described in the previous chapter.

The NGC Base Command Platform CLI is a command-line interface for managing content within the NGC Registry and for interfacing with the NVIDIA Base Command Platform. The CLI operates within a shell and lets you use scripts to automate commands.

With NGC Base Command Platform CLI, you can connect with:

  • NGC Catalog

  • NGC Private Registry

  • User Management (available to org or team User Admins only)

  • NVIDIA Base Command Platform workloads and entities

5.1. Installing NGC CLI

To install NGC CLI, perform the following:

  1. Log in to your NVIDIA Base Command Platform account on the NGC website (https://ngc.nvidia.com).
  2. In the top right corner, click your user account icon and select an org that belongs to the Base Command Platform account.
  3. From the user account menu, select Setup, then click Downloads under CLI from the Setup page.
  4. From the CLI Install page, click the Windows, Linux, or macOS tab, according to the platform from which you will be running NGC CLI.
  5. Follow the Install instructions that appear on the OS section that you selected.
  6. Verify the installation by entering ngc --version. The output should be NGC CLI x.y.z, where x.y.z indicates the version.
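The version check in the last step can also be scripted. The following is a minimal sketch that assumes the output has exactly the shape described above (NGC CLI x.y.z); the function name parse_ngc_version is hypothetical and is not part of the CLI.

```shell
# Hypothetical helper: pull x.y.z out of a line shaped like "NGC CLI x.y.z".
# It only parses text; it does not call ngc itself.
parse_ngc_version() {
    printf '%s\n' "$1" | sed -n 's/^NGC CLI \([0-9][0-9.]*\).*$/\1/p'
}

# Typical use after installation (prints nothing if the output has another shape):
#   parse_ngc_version "$(ngc --version)"
```

An empty result is a hint that the CLI is not on your PATH or that the output format differs from the one assumed here.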

5.2. Generating Your NGC API Key

This section describes how to obtain an API key needed to configure the CLI application so you can use the CLI to access locked container images from the NGC Catalog, access content from the NGC Private Registry, manage storage entities, and launch jobs.

The NGC API key is also used for docker login to manage container images in the NGC Private Registry with the docker client.

  1. Sign in to the NGC web UI.
    1. From a browser, go to https://ngc.nvidia.com/signin/email and then enter your email.
    2. Click Continue by the Sign in with Enterprise sign in option.
    3. Enter the credentials for your organization.
  2. In the top right corner, click your user account icon and then select an org that belongs to the NVIDIA Base Command Platform account.
  3. Click your user account icon again and select Setup.

    image13.png

  4. Click Get API key to open the Setup > API Key page.
  5. Click Get API Key to generate your API key. A warning message appears to let you know that your old API key will become invalid if you create a new key.
  6. Click Confirm to generate the key.

    Your API key appears.

    You only need to generate an API key once. NGC does not save your key, so store it in a secure place. (You can copy your API key to the clipboard by clicking the copy icon to the right of the API key.)

    Should you lose your API key, you can generate a new one from the NGC website. When you generate a new API Key, the old one is invalidated.
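Once you have a key, the docker login mentioned at the start of this section could look like the sketch below. It assumes the key is held in an environment variable named NGC_API_KEY (a name chosen here for illustration, not defined by the CLI) and relies on the NGC registry convention of using the literal string $oauthtoken as the username; verify both against your own setup before relying on them.

```shell
# Hypothetical helper: log the docker client in to nvcr.io with an NGC API key.
# NGC_API_KEY is an assumed variable name, not something the CLI defines.
ngc_docker_login() {
    if [ -z "${NGC_API_KEY:-}" ]; then
        echo "NGC_API_KEY is not set" >&2
        return 1
    fi
    # The username is the literal string $oauthtoken (single quotes stop the
    # shell from expanding it); the API key is the password, read from stdin.
    printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Passing the key on standard input keeps it out of the shell history and the process list.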

5.3. Getting Help Using NGC CLI

This section describes how to get help using NGC CLI.

5.3.1. Getting Help from the Command Line

To run an NGC CLI command, enter ngc followed by the appropriate options.

To see a description of available options and commands, use the option -h after any command or option.

Example 1: To view a list of all the available options for the ngc command, enter

$ ngc -h

Example 2: To view a description of all ngc batch commands and options, enter

$ ngc batch -h

Example 3: To view a description of the dataset commands, enter

$ ngc dataset -h

5.3.2. Viewing NGC CLI Documentation Online

The NGC Base Command Platform CLI documentation provides a reference for all the NGC Base Command Platform CLI commands and arguments. You can also access the CLI documentation from the NGC web UI by selecting Setup from the user drop down list and then clicking Documentation from the CLI pane.

5.4. Configuring the CLI for your Use

To make full use of NGC Base Command Platform CLI, you must configure it with your API key using the ngc config set command.

While there are options you can use for each command to specify org and team, as well as the output type and debug mode, you can also use the ngc config set command to establish these settings up front.

If you have an existing setup, you can check the current configuration using:

$ ngc config current

To configure the CLI for your use, issue the following:

$ ngc config set
Enter API key. Choices: [<VALID_APIKEY>, 'no-apikey']:
Enter CLI output format type [ascii]. Choices: [ascii, csv, json]:
Enter org [nv-eagledemo]. Choices: ['nv-eagledemo']:
Enter team [nvtest-repro]. Choices: ['nvtest-repro', 'no-team']:
Enter ace [nv-eagledemo-ace]. Choices: ['nv-eagledemo-ace', 'no-ace']:
Successfully saved NGC configuration to C:\Users\jsmith\.ngc\config

If you are a member of several orgs or teams, be sure to select the ones associated with NVIDIA Base Command Platform.

5.5. Running the Diagnostics

Diagnostic information is available which provides details to assist in isolating issues. You can provide this information when reporting issues with the CLI to NVIDIA support.

The following diagnostic information is available for the NGC Base Command Platform CLI user:

  • Current time

  • Operating system

  • Disk usage

  • Current directory size

  • Memory usage

  • NGC CLI installation

  • NGC CLI environment variables (whether set or not set)

  • NGC CLI configuration values

  • API gateway connectivity

  • API connectivity to the container registry and model registry

  • Data storage connectivity

  • Docker runtime information

  • External IP

  • User information (ID, name, and email)

  • User org roles

  • User team roles

Syntax

$ ngc diag [all,client,install,server,user]

where

all

Produces the maximum amount of diagnostic output.

client

Produces diagnostic output only for the client machine.

install

Produces diagnostic output only for the local installation.

server

Produces diagnostic output only for the remote server.

user

Produces diagnostic output only for the user configuration.


This section contains example workflows that demonstrate commonly used functionality, along with useful notes. If you have completed the sections so far (that is, you are onboarded and have configured the CLI), you can try any of the included commands using your own account.

6.1. Launching a Job from Existing Templates

  1. Click BASE COMMAND > Jobs in the left navigation menu and then click Create Job.
  2. Click the Templates tab.

    create-job-templates.png

  3. Click the menu icon for the template to use, then select Apply Template.

    apply-template.png

    The create a job page opens with the fields populated with the information from the job template.

  4. Verify the pre-filled fields, enter a unique name, then click Launch.

    launch-job.png

6.2. Cloning an Existing Job

You can clone jobs, which is useful when you want to start with an existing job and make small changes for a new job.

  1. Click Jobs from the left navigation menu, then click the ellipsis menu for the job you want to copy and select Clone Job from the menu.

    clone-job.png

    The create a job page opens with the fields populated with the information from the cloned job.

  2. Edit fields as needed to create a new job, enter a unique name in the Name field, then click Launch.

    The job should appear in the job dashboard.

To clone jobs via the CLI, use the --clone flag and add other flags to override any parameters being copied from the original job.

$ ngc batch run --clone <job-id> --instance dgx1v.32g.8.norm


This section describes various job management tasks.

7.1. Checking Job Name, ID, Status, and Results

Using the NGC Web UI

Log into the NGC website, then click Base Command > Jobs from the left navigation menu.

The Jobs page lists all the jobs that you have run and shows the status, job name and ID.

The Status column reports the following progress along with timestamps: Created -> Queued -> Starting -> Running -> Finish.

When a job is in the Queued state, the Status History tab in the Web UI shows the reason for the queued state. The job info command on CLI also displays this detail.

When the job has finished, click your job entry on the Jobs page. The Results and Log tabs both show the output produced by your job.

Using the CLI

After launching a job using the CLI, the output confirms a successful launch and shows the job details.

Example:

--------------------------------------------------
Job Information
  Id: 1854152
  Name: ngc-batch-simple-job-raid-dataset-mnt
  Number of Replicas: 1
  Job Type: BATCH
  Submitted By: John Smith
Job Container Information
  Docker Image URL: nvidia/pytorch:21.02-py3
...
Job Status
  Created at: 2021-03-19 18:13:12 UTC
  Status: CREATED
  Preempt Class: RUNONCE
----------------------------------------

The Job Status of CREATED indicates a job that was just launched.

You can monitor the status of the job by issuing:

$ ngc batch info <job-id>

This returns the same job information that is displayed after launching the job, with updated status information.
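When scripting around this command, it can be handy to reduce the job information to just its status. The sketch below assumes the default ascii output format and that the CLI prints each field on its own line, with the status appearing as "Status: VALUE" as in the example above. The function name job_status_from_info is hypothetical.

```shell
# Hypothetical helper: read job info text on stdin and print the Status value.
# Leading indentation is ignored because awk splits fields on whitespace.
job_status_from_info() {
    awk '$1 == "Status:" { print $2; exit }'
}

# Typical use:
#   ngc batch info <job-id> | job_status_from_info
```

A wrapper script could call this in a loop with a sleep between calls to wait for a job to leave the Queued state.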

To view the stdout/stderr of a running job, issue the following:

$ ngc batch attach <job-id>

All the NGC Base Command Platform CLI commands have additional options; issue ngc --help for details.

7.2. Monitoring Console Logs (joblog.log)

Job output (both STDOUT and STDERR) is captured in the joblog.log file.

For more information about result logging behavior, refer to the section Managing Results in the NVIDIA Base Command Platform User Guide.

Using the NGC Web UI

To view the logs for your job, select the job from the Jobs page, then select the Log tab. From here, you can view the joblog.log for each node:

job-log-output.png

Note:

If a multi-node job was run with array-type "MPI", only the log from the first node (replica 0) will contain content. The default behavior is to stream the output of STDOUT and STDERR from all nodes to the joblog.log file on the first node (replica 0). As a result, the remaining log files on the other nodes will be empty.


Using the CLI

Issue the following command:

$ ngc result download <job-id>

The joblog.log files and STDOUT/STDERR from all nodes are included with the results, which are downloaded to the current directory on your local disk in a folder named after the job ID.
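For multi-node jobs, where some per-node logs may be empty, a quick way to find the logs that actually contain output is to search the downloaded folder. This is a minimal sketch; the function name nonempty_joblogs is hypothetical, and it makes no assumption about the folder layout beyond the joblog.log file name.

```shell
# Hypothetical helper: list joblog.log files under a directory that have content.
# find searches recursively, so the exact layout of the results does not matter.
nonempty_joblogs() {
    find "$1" -type f -name joblog.log -size +0
}

# Typical use, from the directory where the results were downloaded:
#   nonempty_joblogs <job-id>
```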

To view the STDOUT/STDERR of a running job, issue the following:

$ ngc batch attach <job-id>

7.3. Downloading Results (interim and after completion)

Using the NGC Web UI

To download job results, do the following:

  1. Select the job from the Jobs page, then select the Results tab.
  2. From the Results page, select the file to download.

The file is downloaded to your Download folder.

Using the CLI

Issue the following:

$ ngc result download <job_id>

The results are downloaded to the current directory on your local disk in a folder labelled <job_id>.

7.4. Terminating Jobs

Using the NGC Web UI

To terminate a job from the NGC website, wait until the job appears in the Jobs page, then click the menu icon for the job and select Kill Job.

image51.png

Using the CLI

Note the job ID after launching the job, then issue the following:

$ ngc batch kill <job-id>

Example:

$ ngc batch kill 1854178

Submitted job kill request for Job ID: '1854178'

You can also kill several jobs with one command by listing multiple job IDs as a combination of comma-separated IDs and ranges; for example '1-5', '333', '1, 2', '1,10-15'.
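Before killing a range of jobs, it can help to see exactly which IDs a specification covers. The sketch below expands the notation described above into one ID per line. It illustrates the notation only and is not the CLI's own parser; the function name expand_job_ids is hypothetical, and the code assumes a POSIX sh or bash shell with the seq utility available.

```shell
# Hypothetical helper: expand a spec such as "1,10-12" into one job ID per line.
expand_job_ids() {
    # Drop spaces so that "1, 2" is handled like "1,2".
    _spec=$(printf '%s' "$1" | tr -d ' ')
    _old_ifs=$IFS
    IFS=,
    for _part in $_spec; do
        case $_part in
            '')  ;;                                      # skip empty items
            *-*) seq "${_part%-*}" "${_part#*-}" ;;      # range: first-last
            *)   printf '%s\n' "$_part" ;;               # single ID
        esac
    done
    IFS=$_old_ifs
}

# Typical use:
#   expand_job_ids '1,10-15'
```

Reviewing the expanded list first reduces the risk of terminating jobs that a wide range happens to include.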

7.5. Deleting Results

Results remain in the system consuming quota until removed:

$ ngc result remove <job_id>

Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.

NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.

Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.

NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.

NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.

NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.

No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.

Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.

THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

Trademarks

NVIDIA, the NVIDIA logo, and Base Command are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright

© 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

© Copyright 2023, NVIDIA. Last updated on Oct 3, 2023.