Step #1: Getting Started

Create a Workspace

Workspaces are shareable, persistent, mountable read-write file systems. They give users in a team an easy way to work together in a shared storage space. Workspaces are a good place to store code: they can easily be synced with git and can even be updated while a job is running, which is especially useful for interactive jobs. This means you can experiment rapidly in interactive mode without uploading new containers or datasets for each code change.

To create a workspace with the NGC CLI and share it with your team, enter the commands below, giving the workspace a unique name. Your team name is shown in the top right of the web UI.

Note

Use the --help flag to learn more about specific ngc commands.

    ngc workspace create --team nvbc-labs --name <unique_workspace_name>
    ngc workspace share --team nvbc-labs <workspace_name>
    ngc workspace list --team nvbc-labs
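
Because the workspace name must be unique, one option is to derive it from your username and a UTC timestamp. The sketch below only builds and prints the first two commands rather than running them. The `bert-` prefix and the `workspace_name` variable are illustrative choices, not part of the lab.

```shell
# Illustrative only: the "bert-" prefix and the naming scheme are assumptions.
team="nvbc-labs"

# A username plus a UTC timestamp makes a name collision unlikely.
workspace_name="bert-${USER:-user}-$(date -u +%Y%m%d%H%M%S)"

# Print the commands from this section with the generated name filled in.
echo "ngc workspace create --team ${team} --name ${workspace_name}"
echo "ngc workspace share --team ${team} ${workspace_name}"
```

Review the printed commands, then paste them into your terminal if the name looks right.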

Create and Run a Single GPU Job

Important

This part of the lab will need to be completed using the desktop that is accessible from the left-hand navigation pane.

Log into NVIDIA NGC by visiting https://ngc.nvidia.com/signin.
Expand the Base Command section by clicking the downward-facing chevron, then select Dashboard.
Click Create Job.
Select your Accelerated Computing Environment (ACE).
Set the instance size to dgxa100.80g.1.norm.
Select the bert-pytorch-hdf5-wiki-data dataset and set the mount point to /data.
Select the Workspaces tab, select your workspace bert-sample, and set the mount point to /workspace/mount.
Set the result mountpoint to /results.
Select the nv-launchpad-bc:dle-bert-pytorch container from the dropdown and the 21.11-py3 tag.

Enter the command below in Run Command to start up JupyterLab.

    jupyter lab --notebook-dir=/workspace/bert --ip=0.0.0.0 --no-browser --allow-root --NotebookApp.token='' --NotebookApp.allow_origin='*'

Adding ports here exposes them and automatically maps each one to a URL that is provided once the container starts running. To access JupyterLab, enter 8888 and click Add.
Rename the job to bert-pretrain-single-gpu.

Note

Once you have filled in all the required fields, you can copy the command at the bottom of the Create Job page and paste it into your terminal to run the job via the NGC CLI tool.
Click the down arrow next to Launch Job and select Launch and Template.
Name the template single-node-bert-test and click Submit.
You can find your job in the jobs list by clicking on Jobs under the Base Command section in the sidebar.
Click on your job in the list to open it.
Once your job has entered the Running state, you should see a green link in the overview section.
Click the link to open the JupyterLab.
Open a terminal in JupyterLab.

Begin the pretraining benchmark by running the command below.

    scripts/run_pretraining.sh

By default, the logs for the script are located in /workspace/mount/<timestamp>, where <timestamp> is the UTC time at which you started the pretraining script.
When ready, gracefully finish the job by selecting File > Shut Down at the top left in the JupyterLab window.
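
For reference, the settings chosen in the web UI above map roughly onto a single `ngc batch run` invocation. The sketch below prints such a command for review instead of submitting a job. Values in angle brackets are placeholders that this lab does not specify, and the `org/name:tag` form of the image path is an assumption. The command generated at the bottom of the Create Job page remains the authoritative version.

```shell
# A sketch only: angle-bracket values are placeholders, and the image path
# format (org/name:tag) is an assumption. Prefer the command the UI generates.
job_name="bert-pretrain-single-gpu"
instance="dgxa100.80g.1.norm"
image="nv-launchpad-bc/dle-bert-pytorch:21.11-py3"
jupyter_cmd="jupyter lab --notebook-dir=/workspace/bert --ip=0.0.0.0 --no-browser --allow-root --NotebookApp.token='' --NotebookApp.allow_origin='*'"

# Print the assembled command; nothing is submitted to NGC.
cat <<EOF
ngc batch run \\
  --name "${job_name}" \\
  --ace <your_ace> \\
  --instance ${instance} \\
  --image "${image}" \\
  --datasetid <dataset_id>:/data \\
  --workspace <workspace_name>:/workspace/mount:RW \\
  --result /results \\
  --port 8888 \\
  --commandline "${jupyter_cmd}"
EOF
```

Comparing this output with the UI-generated command is a quick way to check that every mount point and port was entered as intended.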

Create and Run an Eight GPU Job

Navigate back to the Create Job page and choose the Templates tab. Select your template and set the instance size to 8 GPUs.
Add port 6006 so that TensorBoard will be accessible.
Scroll to the bottom of the page and change the name to something more accurate like bert-pretrain-8-gpu.
Copy the CLI command and run it in your terminal.
Wait until the job is running. You can check its status with the command ngc batch info <job_id>; the job ID is printed when the job is first created.
Copy and paste the link into your browser to open JupyterLab.

As before, run the pretraining script inside the terminal in JupyterLab.

    scripts/run_pretraining.sh
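
Rather than re-running `ngc batch info` by hand while waiting for the job to start, the check can be wrapped in a small polling loop. This is a minimal sketch: the `wait_for_running` helper and the `NGC_INFO_CMD` override are hypothetical, and the loop assumes the status output contains the word RUNNING once the job has started, which you should confirm against the real output.

```shell
# Hypothetical helper: poll a job until its status output mentions RUNNING.
# Assumption: "ngc batch info <job_id>" prints the status as a word such as
# RUNNING. NGC_INFO_CMD lets you substitute another command for testing.
wait_for_running() {
    job_id="$1"
    max_checks="${2:-60}"     # give up after this many checks
    interval="${3:-30}"       # seconds to sleep between checks
    checks=0
    while [ "$checks" -lt "$max_checks" ]; do
        if ${NGC_INFO_CMD:-ngc batch info} "$job_id" | grep -q "RUNNING"; then
            echo "Job ${job_id} is running"
            return 0
        fi
        checks=$((checks + 1))
        sleep "$interval"
    done
    echo "Job ${job_id} did not reach RUNNING after ${max_checks} checks" >&2
    return 1
}

# Usage (replace 1234567 with the ID printed when the job was created):
#   wait_for_running 1234567
```

The check limit keeps the loop from running forever if the job fails or stays queued.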

Telemetry

From the job’s detail page, you can switch to the telemetry pane to view the resources being used by your job. Job telemetry is generated automatically by Base Command and reports GPU, Tensor Core, CPU, GPU memory, and IO usage for the job. Mousing over the graphs gives more detailed information. More information about telemetry, along with the general user and quickstart guides, is available in the Base Command documentation.

Experiment with Tuning and Profiling

Within the eight GPU job, you can pass different parameters to change how the scripts run. You can change parameters such as:

Batch Size
Precision
Max Sequence Length
Seed
Learning Rate

You can do this either by setting environment variables, as with num_gpus below, or by editing the pretraining script directly.

    num_gpus=1 scripts/run_pretraining.sh
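
An override like this works when a script reads its settings with the shell's `${name:-default}` expansion, the same form used by the `train_steps` lines later in this lab. The sketch below demonstrates the mechanism with a stand-in function. `show_settings` and the default values 8 and 42 are hypothetical and are not taken from scripts/run_pretraining.sh.

```shell
# Stand-in for a script that reads its settings from the environment.
# The defaults here (8 and 42) are made up for illustration.
show_settings() {
    num_gpus="${num_gpus:-8}"   # use the caller's value, else fall back to 8
    seed="${seed:-42}"          # same pattern for a second setting
    echo "num_gpus=${num_gpus} seed=${seed}"
}

# No override: both defaults apply.
( unset num_gpus seed; show_settings )              # prints: num_gpus=8 seed=42

# Override one value for a single invocation.
( unset num_gpus seed; num_gpus=1 show_settings )   # prints: num_gpus=1 seed=42
```

Setting the variable on the command line affects only that one run, so nothing in the script file changes.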

You can also use your own custom profilers to get more detailed information about how your model is performing. The steps below profile the pretraining script with the PyTorch Profiler for one iteration.

Open the scripts/run_pretraining.sh file in the text editor by clicking on it in the left sidebar.

Change line 15 as shown below.

    pretrain_script="run_pretraining_profile.py"

Open a second terminal in JupyterLab by clicking the blue + sign in the top left.
In that second terminal, start TensorBoard and point it at the profiler's log directory using the command below.
```
tensorboard --logdir /workspace/mount/profile_log/
```
Open the URL mapped to port 6006. This can be found on the job’s overview page or with the CLI command shown below.
```
ngc batch info <job_id>
```

Inference

After pretraining the model, you can fine-tune it to perform better on specific tasks, such as question answering on SQuAD.

Download the SQuAD dataset into your workspace using the command below.

    data/create_datasets_from_start.sh

Download a pretrained checkpoint that has been trained for more than 7,000 iterations into your workspace using the command below.
```
data/download_checkpoints.sh
```
Run the SQuAD fine-tuning script. If no argument is given, the script fine-tunes and evaluates the downloaded pretrained checkpoint. To use a checkpoint from your own pretraining run instead, pass its path as shown below. The evaluation reports how many inferences per second can be run.
```
scripts/run_squad.sh /workspace/mount/<timestamp>/checkpoints/<largest_checkpoint_iteration>
```
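
To fill in <largest_checkpoint_iteration>, you can let the shell pick the checkpoint with the highest step number. This sketch assumes checkpoint files are named like ckpt_<step>.pt, which this lab does not confirm, so adjust the pattern to match what you see in your checkpoints directory. The `latest_checkpoint` helper is hypothetical.

```shell
# Hypothetical helper: print the checkpoint file with the highest step number.
# Assumes files are named ckpt_<step>.pt (an assumption; verify with ls).
latest_checkpoint() {
    ckpt_dir="$1"
    best_step=-1
    best_file=""
    for f in "$ckpt_dir"/ckpt_*.pt; do
        [ -e "$f" ] || continue        # no matches: the glob stays literal
        step="${f##*/ckpt_}"           # strip the directory and the prefix
        step="${step%.pt}"             # strip the extension
        case "$step" in
            ''|*[!0-9]*) continue ;;   # skip names without a numeric step
        esac
        if [ "$step" -gt "$best_step" ]; then
            best_step="$step"
            best_file="$f"
        fi
    done
    # Print the winner; return non-zero if nothing matched.
    [ -n "$best_file" ] && echo "$best_file"
}

# Usage, with your own run directory substituted for RUN_DIR:
#   scripts/run_squad.sh "$(latest_checkpoint "$RUN_DIR/checkpoints")"
```

Comparing step numbers numerically avoids the trap of a plain alphabetical sort, which would place step 800 after step 1000.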

Once that is complete, you can test the resulting model interactively. You can use the file /workspace/mount/SQuAD/<timestamp>/pytorch_model.bin, but in this example we are using the checkpoint that has already been fine-tuned.

    python inference.py --bert_model "bert-large-uncased" \
        --init_checkpoint=/workspace/mount/checkpoints/bert_large_qa.pt \
        --config_file="bert_config.json" \
        --vocab_file=/workspace/bert/vocab/vocab \
        --question="What food does Harry like?" \
        --context="My name is Harry and I grew up in Canada. I love apples."

Create and Run a Multi-node Job

Create a new job as you did for the single-node job, but this time select the Multi-node tab at the top.
Select your ACE and instance size again.
Set the Multi-node Type to PyTorch and the Replica Count to 3.
Set all other options as you did for the single-node job and enter 86400 in the Total Runtime field. This value is the total runtime in seconds and is required for multi-node jobs.
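
The value 86400 is 24 hours expressed in seconds. If you want a different limit, the conversion can be done in the shell. The variable names here are illustrative.

```shell
# Convert a runtime in hours to the seconds expected by the Total Runtime field.
hours=24
total_runtime=$((hours * 60 * 60))
echo "${total_runtime}"   # prints 86400
```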

Once logged into JupyterLab, run the command below.

    scripts/run_pretraining_multinode.sh

Optional - If you would like to create a checkpoint with high accuracy comparable to the downloaded checkpoints, open the scripts/run_pretraining.sh file in the text editor.

Important

This is optional and will take days to complete.

Change the step counts on lines 22, 23, and 35 to match the recommended numbers in the comments.

Change line 22 to:

    train_steps=${train_steps:-7038}

Change line 23 to:

    save_checkpoint_steps=${save_checkpoint_steps:-200}

Change line 35 to:

    train_steps_phase2=${train_steps_phase2:-1563}

Run the pretraining script with the command below.

    scripts/run_pretraining.sh

Export Results

By default, the results of the runs are written to the /workspace/mount folder. This folder is inside your workspace, so it is persistent and can be mounted to both containers and local systems via the NGC CLI. Workspaces can also be mounted by multiple replicas and jobs, and can be used to synchronize operations.

You can download your model to your machine using the NGC CLI command below.

    ngc workspace download --file /SQuAD/<timestamp>/pytorch_model.bin <workspace_name>

Or you can mount the Workspace locally to browse the files using the command below.

    ngc workspace mount --mode RW <workspace_name> /tmp/workspace

Note

Additional information for all commands can always be found with the --help flag.

You can also upload the model so that others can easily download it using the NGC CLI.

Create an empty model entry in the registry using the command below.

    ngc registry model create --ace nv-launchpad-bc-iad1 \
        --application "BERT" --format pt --framework pytorch \
        --org nv-launchpad-bc --precision fp16 \
        --short-desc "Example BERT Model" \
        nv-launchpad-bc/nvbc-labs/bert_pytorch_example

Upload the first version of the model to its model entry using the command below.

    ngc registry model upload-version --source <path to the model file> nv-launchpad-bc/nvbc-labs/bert_pytorch_example:1