Next Steps#

Now that you have launched the Docker container and entered it, this section walks you through the initial steps to take inside the container (such as configuration and downloading pre-trained model weights), the container's directory layout, and where to find tutorials.

NGC CLI Configuration#

NVIDIA NGC Command Line Interface (CLI) is a command-line tool for managing Docker containers in NGC. If the NGC CLI is not already installed in the container, download it by following the NGC CLI installation instructions (note that within the container, the AMD64 Linux version should be installed).

Once installed, run ngc config set to establish NGC credentials within the container.
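As a minimal sketch of the check described above, the snippet below reports whether an ngc executable is on the PATH before you attempt to configure credentials. The helper name ngc_status is illustrative only and is not part of the NGC CLI.

```shell
# Sketch: report whether the NGC CLI is available on the PATH.
# `ngc_status` is a hypothetical helper, not an NGC CLI command.
ngc_status() {
    if command -v ngc >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

if [ "$(ngc_status)" = "installed" ]; then
    echo "NGC CLI found; run 'ngc config set' to enter your credentials."
else
    echo "NGC CLI not found; install the AMD64 Linux version first."
fi
```

Because ngc config set prompts interactively for the API key, org, and team, the sketch only prints a reminder instead of invoking it.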

First-Time Setup#

First, run the launch script. On its first invocation, it creates a .env file and exits:

    ./launch.sh

Next, edit the .env file with the correct NGC parameters for your organization and team:

    NGC_CLI_API_KEY=<YOUR_API_KEY>
    NGC_CLI_ORG=<YOUR_ORG>
    NGC_CLI_TEAM=<YOUR_TEAM>
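Before moving on, it can help to confirm that the placeholders were actually replaced. The following is a minimal sketch, assuming the .env file uses plain KEY=value lines as shown above; check_ngc_env is a hypothetical helper and is not provided by launch.sh.

```shell
# Sketch: verify that a .env file defines the three NGC variables with real values.
# `check_ngc_env` is a hypothetical helper, not part of launch.sh.
check_ngc_env() {
    env_file="${1:-.env}"
    if [ ! -f "$env_file" ]; then
        echo "missing file: $env_file"
        return 1
    fi
    status=0
    for key in NGC_CLI_API_KEY NGC_CLI_ORG NGC_CLI_TEAM; do
        # Extract the text after KEY= (leading whitespace allowed); keep the last match.
        value="$(sed -n "s/^[[:space:]]*${key}=//p" "$env_file" | tail -n 1)"
        case "$value" in
            # Flag empty values and untouched <PLACEHOLDER> values.
            ""|"<"*">") echo "unset: $key"; status=1 ;;
        esac
    done
    return "$status"
}
```

Calling check_ngc_env .env prints one "unset: ..." line per variable that is empty or still holds a <...> placeholder, and returns a non-zero status if any were found.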

Download Model Weights#

You can now download all pre-trained model checkpoints from NGC with the following command:

    ./launch.sh download

This downloads all models to the workspace/bionemo/models directory. Optionally, you can persist the models by copying them to your mounted workspace so that they do not need to be downloaded again each time.
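The optional copy step could look like the sketch below. It assumes you already have a directory mounted into the container; persist_models is a hypothetical helper, and both paths are supplied by you.

```shell
# Sketch: copy downloaded checkpoints into a mounted directory so they persist.
# `persist_models` is a hypothetical helper; both paths are supplied by the caller.
persist_models() {
    src="$1"   # the models directory inside the container
    dst="$2"   # a directory on a volume mounted into the container
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dst"
    # The trailing "/." copies the contents of src rather than src itself.
    cp -R "$src"/. "$dst"/
}
```

For example, persist_models <MODELS_DIR> <MOUNTED_DIR>, where <MODELS_DIR> is the models directory named above and <MOUNTED_DIR> is a placeholder for your own mount point.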

Directory Structure#

Note that workspace/bionemo is the home directory for the container. Below are a few key components:

bionemo: Contains the core BioNeMo package, which includes base classes for BioNeMo data modules, tokenizers, models, etc.
examples: Contains example scripts, datasets, YAML files, and notebooks.
models: Contains all pre-trained model checkpoints in .nemo format.

Weights and Biases Setup (Optional)#

Training progress and charts for the models can be visualized through Weights and Biases. Set up your API key to enable logging.
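One way to confirm that a key is available is sketched below. WANDB_API_KEY is the environment variable read by the wandb client; how the container's training scripts pick it up is an assumption you should verify for your setup, and wandb_key_status is a hypothetical helper.

```shell
# Sketch: report whether a Weights and Biases API key is present in the environment.
# `wandb_key_status` is a hypothetical helper, not part of wandb or the container.
wandb_key_status() {
    if [ -n "${WANDB_API_KEY:-}" ]; then
        echo "set"
    else
        echo "unset"
    fi
}

echo "WANDB_API_KEY is $(wandb_key_status)"
```

If the key is unset, you can export WANDB_API_KEY=<YOUR_WANDB_KEY> in your shell or run wandb login, which prompts for the key.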

BioNeMo Framework Tutorials#

The best way to get started with BioNeMo Framework is with the tutorials. Listed below are example walkthroughs containing code snippets that you can run from within the container.

Tutorials are presented as notebooks (.ipynb format), which may contain code snippets in Python, Bash, YAML, and other formats. You can follow the instructions in these files, make appropriate code changes, and execute them in the container.

It is convenient to first launch the BioNeMo Framework container and copy the tutorial files to the container, either via the JupyterLab interface drag-and-drop or by mounting the files during the launch of the container (docker run -v ...).
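The mounting option can be sketched as follows. The snippet only prints the command so that you can inspect it first; the image name and container path are placeholders rather than values from this guide, mount_cmd is a hypothetical helper, and any GPU or port flags you normally pass at launch are omitted.

```shell
# Sketch: print a docker run command that mounts a local tutorials directory.
# `mount_cmd` is a hypothetical helper; image name and paths are placeholders.
mount_cmd() {
    host_dir="$1"        # directory on the host that holds the notebooks
    container_dir="$2"   # where the notebooks should appear in the container
    image="$3"           # the container image you launch
    printf 'docker run --rm -it -v %s:%s %s\n' "$host_dir" "$container_dir" "$image"
}

mount_cmd "$PWD/tutorials" /workspace/tutorials "<BIONEMO_IMAGE>"
```

Printing the command instead of running it keeps the sketch safe to try on a machine without Docker; once the paths look right, run the printed line together with your usual launch flags.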

Topic	Title
Model Pre-Training	Launching a MegaMolBART model pre-training with ZINC15 dataset
Custom Datasets	Setting up the ZINC15 dataset used for training MolMIM
Model Pre-Training	Launching a MolMIM model pre-training with ZINC15 dataset, both from scratch and starting from an existing checkpoint
Model Pre-Training	Launching an ESM-1nv model pre-training with UniRef50 dataset
Model Pre-Training	Launching an ESM-2nv model pre-training with curated data from UniRef50, UniRef90
Model Pre-Training	Pretraining a Geneformer model for representing single cell RNA-seq data
Geneformer Benchmarking	Benchmarking pre-trained Geneformer models against a baseline with cell type classification
Model Training	Launching an EquiDock model pre-training with DIPS or DB5 datasets
Inference	Performing Inference with MegaMolBART for Generative Chemistry and Predictive Modeling with RAPIDS
Inference	Performing Inference with MolMIM for Generative Chemistry and Predictive Modeling with RAPIDS
Inference	Performing Inference with ESM-1nv and Predictive Modeling with RAPIDS
Inference	Performing Inference with ESM-2nv and Predictive Modeling with RAPIDS
Inference	Performing Property-guided Molecular Optimization with MolMIM, which internally involves inference
Inference	Performing inference and cell clustering on CELLxGENE data with a pretrained Geneformer model
Model Finetuning	Overview of Finetuning pre-trained models in BioNeMo
Encoder Finetuning	Encoder Fine-tuning in BioNeMo: MegaMolBART
Downstream Tasks	Training a Retrosynthesis Model using USPTO50 Dataset
Downstream Tasks	Fine-tuning MegaMolBART for Solubility Prediction
Custom Datasets	Adding the OAS Dataset: Downloading and Preprocessing
Custom Datasets	Adding the OAS Dataset: Modifying the Dataset Class
Custom DataLoaders	Creating a Custom Dataloader
