Step #1: MegaMolBART Model Training

If you would like to monitor the MegaMolBART model training process, please set up Weights and Biases (https://www.wandb.ai) access and obtain an API key.

This API key will be requested when launching the training job.
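If you prefer not to type the key at the interactive prompt, the wandb client also reads it from the `WANDB_API_KEY` environment variable. A minimal sketch (the key value below is a placeholder, not a real key):

```shell
# Optional: export the Weights and Biases API key ahead of time so the
# training job can pick it up non-interactively. WANDB_API_KEY is the
# environment variable read by the wandb client.
export WANDB_API_KEY="0123456789abcdef"   # placeholder; paste your own key
```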

  1. To launch the model training interactively on a single node, open the JupyterLab link in the left panel:

    clara-megamolbart-01.png

  2. This will open a new tab in the browser. Click on the “+” sign on the top of the left side panel, and then select the “Terminal” launcher icon.

    clara-megamolbart-02.png

  3. This will open a bash terminal tab within the JupyterLab tab.

    clara-megamolbart-03.png

  4. Now, execute the following command to copy the pre-configured files required to launch the training job.


    bash /data/training_files/move_files.sh
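As a quick sanity check, you can verify that the copy put the launcher script used in the next step in place. The paths come from this tutorial; the `if` wrapper itself is illustrative, not part of the official instructions:

```shell
# Verify that move_files.sh placed the quick-pretraining launcher where the
# next step expects it (path as given in this tutorial).
LAUNCHER=examples/chem/shell/megamolbart_pretrain_quick.sh
if [ -f "$LAUNCHER" ]; then
  echo "launcher found: $LAUNCHER"
else
  echo "launcher missing; rerun: bash /data/training_files/move_files.sh"
fi
```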


  5. For this example, the job will use a single GPU for training. Users can modify the quick pretraining launcher file at examples/chem/shell/megamolbart_pretrain_quick.sh to match the available compute resources. To launch the training job, execute the following command:


    bash examples/chem/shell/megamolbart_pretrain_quick.sh
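How many GPUs the job uses is controlled inside `megamolbart_pretrain_quick.sh` itself; the exact variable names are not shown in this tutorial, so inspect the script before editing. As a hypothetical sketch, one way to derive a sensible GPU count for such an edit:

```shell
# Hypothetical sketch: count the GPUs visible on the node, falling back to 1
# when nvidia-smi is unavailable. The real setting to change lives inside
# megamolbart_pretrain_quick.sh and may use a different variable name.
AVAILABLE_GPUS=$(nvidia-smi -L 2>/dev/null | wc -l)
NUM_GPUS=$(( AVAILABLE_GPUS > 0 ? AVAILABLE_GPUS : 1 ))
echo "configure the launcher for ${NUM_GPUS} GPU(s)"
```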


  6. Here is an example image of the terminal upon job launch. Notice that the terminal messages provide additional detailed information on the configuration of the job, its progress, and the locations of resulting files and logs.

    clara-megamolbart-04.png

  7. For this model training example, the dataset, a subset of the ZINC-15 dataset, is located at /data/zinc_csv_split. It is prepared as train, test, and validation sets containing 100,000, 5,000, and 5,000 compounds, respectively. The training job will prompt the user for the Weights and Biases (wandb.ai) API key for online logging and training-progress visualization.

    clara-megamolbart-05.png
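The split sizes above can be sanity-checked by counting CSV rows per split. The snippet below is a sketch that builds a tiny mock of the directory layout so it is self-contained; on the actual training node, point `DATA_DIR` at `/data/zinc_csv_split` instead. The per-split subdirectory names and the one-header-line-per-file CSV layout are assumptions, not confirmed by this tutorial:

```shell
# Sketch: count compounds per split, assuming each split directory holds
# CSV files with one header line each. A tiny mock dataset stands in for
# /data/zinc_csv_split here; replace DATA_DIR with the real path.
DATA_DIR=$(mktemp -d)
for split in train val test; do
  mkdir -p "$DATA_DIR/$split"
  printf 'zinc_id,smiles\nZINC000000001,CCO\nZINC000000002,c1ccccc1\n' \
    > "$DATA_DIR/$split/x000.csv"
done
for split in train val test; do
  files=$(find "$DATA_DIR/$split" -name '*.csv' | wc -l)
  lines=$(cat "$DATA_DIR/$split"/*.csv | wc -l)
  echo "$split: $((lines - files)) compounds"   # subtract header lines
done
```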

  8. Model training will continue from this point and may take a few hours to days, depending on the input dataset size, training parameters, and the compute resources allocated.

    clara-megamolbart-06.png

  9. As the primary objective of this exercise is to show how a user can launch a model training run, press CTRL + D to terminate the training process and close the terminal tab. The following are example plots of the model training run, as logged and plotted by Weights and Biases (https://www.wandb.ai).

    clara-megamolbart-07.png
    clara-megamolbart-08.png

© Copyright 2022, NVIDIA. Last updated on Sep 28, 2022.