BioNeMo_MegaMolBART Framework

The BioNeMo_MegaMolBART training stack is built on CUDA-X, PyTorch, PyTorch Lightning, and Apex (for training at scale), together with NVIDIA's NeMo framework for building and fine-tuning large language models.

[Figure: clara-megamolbart-09.png]

This framework enables distributed model training on multi-GPU and multi-node compute architectures in model-parallel, pipeline-parallel, and data-parallel configurations. Setting the desired training configuration is simple because BioNeMo_MegaMolBART uses configurable YAML files, as shown below:

[Figure: clara-megamolbart-10.png]
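The parallelism settings are typically expressed as a few keys in the model section of the YAML file. The sketch below is a rough illustration only; the key names (micro_batch_size, tensor_model_parallel_size, pipeline_model_parallel_size) follow common NeMo Megatron conventions and are assumptions here, not an excerpt from the shipped configuration files:

    # Illustrative only -- key names follow NeMo Megatron conventions and may
    # differ from the actual BioNeMo_MegaMolBART configuration files.
    model:
      micro_batch_size: 32             # per-GPU batch size; data parallelism scales with the number of GPUs
      tensor_model_parallel_size: 1    # model (tensor) parallel degree
      pipeline_model_parallel_size: 1  # pipeline parallel degree

Increasing the tensor or pipeline parallel degree splits each model replica across more GPUs, while the remaining GPUs are used for data parallelism.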

Additionally, the framework supports model checkpointing during training, so users can resume training from a previously trained model (or from the provided pre-trained model).
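Checkpoint handling is likewise driven by the configuration. The following is a minimal sketch assuming NeMo-style keys; exp_manager.resume_if_exists and restore_from_path are used here for illustration and may not match the shipped files exactly:

    # Illustrative sketch -- key names are assumptions in NeMo style, not an
    # excerpt from the BioNeMo_MegaMolBART configuration files.
    exp_manager:
      create_checkpoint_callback: True  # save checkpoints during training
      resume_if_exists: True            # resume from the latest checkpoint in the run directory
    restore_from_path: /path/to/megamolbart.nemo  # hypothetical path to a pre-trained model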

The library is organized into the following key components:

  • examples/chem: configuration files and training scripts

  • data: classes and functions for loading and augmenting datasets

  • models: the NeMo MegaMolBART model

  • tokenizer: the MegaMolBART tokenizer for processing SMILES input

  • vocab: the default vocabulary file and the regular expression used by the tokenizer

[Figure: clara-megamolbart-11.png]

The model configuration file (here, megamolbart_pretrain_large_span_aug.yaml) is in a hierarchical YAML format, as shown in the image. Users can set parameters such as the number of devices, the number of nodes, and the precision level.

[Figure: clara-megamolbart-12.png]
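In outline, the hierarchy groups related settings into top-level sections. The sketch below is illustrative (section and key names follow NeMo conventions) and is not a verbatim copy of megamolbart_pretrain_large_span_aug.yaml:

    # Illustrative sketch of the hierarchical layout -- not copied from the
    # shipped megamolbart_pretrain_large_span_aug.yaml.
    trainer:
      devices: 8      # GPUs per node
      num_nodes: 4    # nodes in the job
      precision: 16   # 16 for mixed precision, 32 for full precision
    model:
      # model architecture, data, and tokenizer settings
    exp_manager:
      # experiment logging and checkpointing settings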

Similarly, a Python pre-training script, megamolbart_pretrain.py, is provided, along with Slurm and shell scripts (for example, megamolbart_pretrain_slurm.sh) for launching training jobs in the respective environments.

[Figure: clara-megamolbart-13.png]
