Connecting to Weights & Biases

Weights & Biases (W&B) is a widely used tool for tracking and charting machine learning training jobs, including loss curves, resource usage, accuracy scores, and more. This makes it easy to check how a model learns over time and to compare multiple runs to find the best model for a given outcome.

W&B provides a Python API for sending training information to its servers. To use it, create a W&B access token, install the wandb Python package, and tell W&B which values to track.

Setup

First, create a W&B access token (API key). Go to https://wandb.ai and, if you do not already have an account, click Sign Up in the top right to create a free one. Once logged in, go to https://wandb.ai/settings and scroll to the bottom of the page to create a new API key. Every job that uses W&B needs this API key.

Python Package Installation

The container that runs your job on DGX Cloud Lepton needs the W&B Python package installed. Some NGC images, such as the NeMo Framework container (nvcr.io/nvidia/nemo), already include the package, while others, such as the PyTorch image (nvcr.io/nvidia/pytorch), do not. If your container does not have W&B installed, run the following command either as part of the entrypoint when the container starts or inside a running container.

pip3 install wandb

You can check if your container already has W&B installed with:

pip3 freeze | grep wandb

If the command returns nothing, W&B is not installed.
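If you prefer to run this check from Python, for example at the top of an entrypoint script, the following is a minimal sketch that uses only the standard library. The helper names wandb_version and ensure_wandb are hypothetical and are not part of W&B; ensure_wandb assumes the container has network access to a package index.

```python
import importlib
import importlib.metadata
import importlib.util
import subprocess
import sys


def wandb_version():
    """Return the installed wandb version string, or None if wandb is absent."""
    if importlib.util.find_spec("wandb") is None:
        return None
    try:
        return importlib.metadata.version("wandb")
    except importlib.metadata.PackageNotFoundError:
        # The module is importable but has no distribution metadata.
        return "unknown"


def ensure_wandb():
    """Install wandb with this interpreter's pip if it is missing (hypothetical helper)."""
    version = wandb_version()
    if version is None:
        # Use "python -m pip" so the package lands in the same environment
        # that will later run the training script.
        subprocess.check_call([sys.executable, "-m", "pip", "install", "wandb"])
        # Clear import-system caches so the freshly installed module is found.
        importlib.invalidate_caches()
        version = wandb_version()
    return version
```

Calling ensure_wandb() before importing wandb in your training script is a Python equivalent of running pip3 install wandb in the entrypoint, and it skips the install on images such as the NeMo Framework container that already ship the package.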

Example W&B Job

The following is a minimal example of a job that sends metrics to W&B using the API. The key points are:

  • Import the wandb module
  • Start a new run with wandb.init(), specifying the project name and any hyperparameters to record
  • Send metric values to W&B with wandb.log()

import wandb
import random

# start a new wandb run to track this script
wandb.init(
    # set the wandb project where this run will be logged
    project="my-awesome-project",

    # track hyperparameters and run metadata
    config={
        "learning_rate": 0.02,
        "architecture": "CNN",
        "dataset": "CIFAR-100",
        "epochs": 10,
    }
)

# simulate training
epochs = 10
offset = random.random() / 5
for epoch in range(2, epochs):
    acc = 1 - 2 ** -epoch - random.random() / epoch - offset
    loss = 2 ** -epoch + random.random() / epoch + offset

    # log metrics to wandb
    wandb.log({"acc": acc, "loss": loss})

# [optional] finish the wandb run, necessary in notebooks
wandb.finish()
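The example above sets the number of epochs twice, once in config and once as epochs = 10. A variant that reads hyperparameters back from the run object returned by wandb.init() keeps the two in sync and passes an explicit step so charts use the epoch as their x-axis. This is a sketch assuming the run object's config, log(), and finish() members behave as in current wandb releases; simulate_metrics and train are hypothetical names, and wandb is imported inside train() only so the metric helper can be exercised without the package.

```python
import random


def simulate_metrics(epoch, offset, rng):
    """Return fake accuracy and loss values for one epoch, as in the example above."""
    acc = 1 - 2 ** -epoch - rng.random() / epoch - offset
    loss = 2 ** -epoch + rng.random() / epoch + offset
    return {"acc": acc, "loss": loss}


def train():
    import wandb  # imported lazily; requires the wandb package and WANDB_API_KEY

    run = wandb.init(
        project="my-awesome-project",
        config={
            "learning_rate": 0.02,
            "architecture": "CNN",
            "dataset": "CIFAR-100",
            "epochs": 10,
        },
    )
    rng = random.Random()
    offset = rng.random() / 5
    # Read the epoch count back from the run's config instead of duplicating it.
    for epoch in range(2, run.config["epochs"]):
        # step must increase monotonically; the epoch number satisfies that here.
        run.log(simulate_metrics(epoch, offset, rng), step=epoch)
    run.finish()
```

Calling train() from your job's entrypoint produces the same kind of acc and loss charts as the original example.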

To authenticate with W&B, set the WANDB_API_KEY environment variable to your API key created earlier:

export WANDB_API_KEY=xxxxxxxxx
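Depending on the wandb version and whether the session is interactive, a missing key may only surface as a login prompt or an error once wandb.init() runs. A job can check for the key up front instead. This is a minimal sketch; require_wandb_api_key is a hypothetical helper, not part of W&B, and it only verifies that the variable is non-empty, not that the key is valid.

```python
import os


def require_wandb_api_key(env=None):
    """Return the W&B API key from the environment, failing fast if it is unset."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WANDB_API_KEY is not set; create a key at https://wandb.ai/settings "
            "and export it before starting the job."
        )
    return key
```

Calling require_wandb_api_key() before wandb.init() makes a misconfigured job stop immediately with a clear message rather than partway through setup.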

You can also set this environment variable in DGX Cloud Lepton when defining the job, instead of exporting it inside the container.

After running the example code, you should see a new project called my-awesome-project in your W&B account.

The same applies to your own W&B experiments: with WANDB_API_KEY set, wandb logs in automatically, so your code connects to your account without an interactive login step.

Integration with NVIDIA NeMo Framework

NVIDIA NeMo Framework supports W&B natively. To use W&B with NeMo Framework, set your W&B API key in an environment variable named WANDB_API_KEY, then refer to the NeMo Framework documentation on W&B integration for your specific job type.

Copyright © 2025, NVIDIA Corporation.