Model Training on Databricks
Databricks is a widely used platform for managing data, models, applications, and compute in the cloud. This guide shows how to use AutoModel for scalable, performant model training on Databricks.
The specific example here fine-tunes a Llama-3.2-1B model using the SQuAD dataset from Hugging Face, but any AutoModel functionality (for example, model pre-training, VLMs, other supported models) can also be run on Databricks.
Provision Compute
Let’s start by provisioning a Databricks classic compute cluster with the following setup:
- Databricks runtime: 18.0 LTS (Machine Learning version)
- Worker instance type: g6e.12xlarge on AWS (4x L40S GPUs per node)
- Number of workers: 2
- Global environment variable: GLOO_SOCKET_IFNAME=eth0 (see this for details)
- Cluster-scoped init script:
This will provision three compute nodes – one driver node we’ll attach a notebook to, and two worker nodes we’ll use for multi-node training.
Note that we’ve selected a small number of instances for demo purposes, but you can adjust the specific instance type and number of workers for your actual use case.
Train the Model
With the above compute resources provisioned, we’re ready to fine-tune a model using AutoModel.
AutoModel uses YAML file recipes to configure various settings for the training process (for example, model, dataset, loss function, optimizer, etc.). Here we’ll use this preconfigured recipe for fine-tuning a Llama-3.2-1B model using the SQuAD dataset from Hugging Face. In a notebook connected to our compute resource, download the configuration file:
Here’s what the model, dataset, and optimizer portions of the config file look like:
See the full file for complete details (!cat llama3_2_1b_squad.yaml).
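As an alternative to reading the whole file, the sketch below pulls out just the sections discussed here. It assumes that `model`, `dataset`, and `optimizer` are top-level keys in the recipe and that PyYAML is available on the cluster; both `load_recipe` and `pick_sections` are hypothetical helper names, not part of AutoModel.

```python
from typing import Any, Mapping, Sequence


def load_recipe(path: str) -> dict:
    """Parse a YAML recipe file into a plain dictionary."""
    import yaml  # PyYAML, imported lazily so the helper below has no dependencies

    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def pick_sections(
    config: Mapping[str, Any],
    names: Sequence[str] = ("model", "dataset", "optimizer"),  # assumed key names
) -> dict:
    """Return only the requested top-level sections that are present."""
    return {name: config[name] for name in names if name in config}


# Example usage in a notebook cell (file name taken from this guide):
#   recipe = load_recipe("llama3_2_1b_squad.yaml")
#   print(pick_sections(recipe))
```

Sections that do not exist in the file are silently skipped, so the helper is safe to try even if the recipe layout differs from what is assumed here.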
Finally, we’ll authenticate the VM running the notebook with Hugging Face so we can download the model and dataset:
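One possible way to do this from Python is sketched below. It assumes the access token has already been placed in the `HF_TOKEN` environment variable (how it gets there, for example from a secret store, is left out), and it uses the `login` function from the `huggingface_hub` package. The helper names `read_hf_token` and `hf_login` are hypothetical.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Optional[Mapping[str, str]] = None, var: str = "HF_TOKEN") -> str:
    """Fetch the Hugging Face token from the environment, failing loudly if absent."""
    env = os.environ if env is None else env
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {var} is not set or is empty")
    return token


def hf_login(token: str) -> None:
    """Authenticate the current machine with Hugging Face."""
    from huggingface_hub import login  # imported lazily; requires huggingface_hub

    login(token=token)


# Example usage in a notebook cell:
#   hf_login(read_hf_token())
```

Keeping the token out of the notebook source avoids accidentally committing or sharing it.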
Single-Node
Since AutoModel is installed via the init script, the automodel CLI is available on all nodes.
To run training on a single GPU, use this command:
In addition to specifying the configuration file, we also use these options:
- --step_scheduler.max_steps: Limits the number of training steps taken. Again, this is for example purposes; adapt it for your actual use case as needed.
- --checkpoint.checkpoint_dir: Tells AutoModel where to save model checkpoints from training. We recommend saving model checkpoints in a Databricks Unity Catalog volume.
- --checkpoint.staging_dir: Specifies a temporary staging location for model checkpoints. Files are saved to this location first and then moved to the final checkpoint_dir location. This is needed when saving checkpoints in Unity Catalog.
- --checkpoint.is_async: Uses asynchronous checkpointing.
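To keep the four options above in one place, they can be assembled programmatically before being appended to the training command. The sketch below is only an illustration: the `--key value` argument format, the lowercase boolean spelling, and every value shown (step count, volume path, staging path) are assumptions, and `build_overrides` is a hypothetical helper rather than part of AutoModel.

```python
from typing import Dict, List, Union

Value = Union[str, int, float, bool]


def build_overrides(options: Dict[str, Value]) -> List[str]:
    """Turn {"section.key": value} pairs into a flat list of CLI arguments."""
    args: List[str] = []
    for key, value in options.items():
        # bool must be checked before anything else because bool is a subclass of int
        if isinstance(value, bool):
            text = "true" if value else "false"  # assumed boolean spelling
        else:
            text = str(value)
        args.extend([f"--{key}", text])
    return args


# All values below are placeholders for illustration only.
overrides = build_overrides(
    {
        "step_scheduler.max_steps": 20,
        "checkpoint.checkpoint_dir": "/Volumes/<catalog>/<schema>/<volume>/checkpoints",
        "checkpoint.staging_dir": "/tmp/automodel-staging",
        "checkpoint.is_async": True,
    }
)
print(" ".join(overrides))
```

Building the list from a dictionary makes it easy to reuse the same settings for the single-GPU, multi-GPU, and multi-node runs.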
Looking at GPU metrics in Databricks, we see that the single GPU is well utilized (~95% utilization).

To utilize all four GPUs available on this g6e.12xlarge instance, add --nproc-per-node=4 to the same command:
The automodel CLI uses PyTorch’s Elastic Launch internally to spawn and coordinate multiple training processes on the VM. Each training process runs on a separate GPU, and we can now see all four GPUs are being used (~95% utilization for each GPU).
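For context on how each process ends up on its own GPU: torchrun sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables for every process it spawns, and training code conventionally uses `LOCAL_RANK` as the GPU index on the node. The sketch below shows that convention in isolation; it is not AutoModel's own code, and `process_info` and `device_for` are hypothetical helper names.

```python
import os
from typing import Dict, Mapping, Optional


def process_info(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Read the rank information that torchrun exports to each worker process."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global index across all nodes
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # index within this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }


def device_for(info: Mapping[str, int]) -> str:
    """Map a process to a GPU using the usual one-process-per-GPU convention."""
    return f"cuda:{info['local_rank']}"


# Outside torchrun the variables are unset, so the defaults describe a single process.
print(device_for(process_info({})))
```

With `--nproc-per-node=4`, the four processes on the node would see local ranks 0 through 3 and therefore select four different devices.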

Multi-Node
To scale further to multi-node training, we need to submit training jobs to all instances in our Databricks cluster.
First, each instance needs to be authenticated with Hugging Face to download the model and dataset:
Next, we use PySpark’s TorchDistributor to run the same training job across multiple instances like this:
TorchDistributor uses torchrun internally, so we point it at the recipe module directly (rather than the automodel CLI, which also wraps torchrun).
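A minimal sketch of such a launch is shown below, assuming the cluster from this guide (2 workers with 4 GPUs each). `TorchDistributor` and its `num_processes`, `local_mode`, and `use_gpu` parameters come from `pyspark.ml.torch.distributor`; the script path and its arguments are placeholders, since the exact recipe module location depends on the installed AutoModel version. The helper names are hypothetical.

```python
from typing import List


def total_processes(num_workers: int, gpus_per_node: int) -> int:
    """One training process per GPU across all worker nodes."""
    return num_workers * gpus_per_node


def run_distributed(
    train_script: str,
    script_args: List[str],
    num_workers: int = 2,    # matches the cluster provisioned in this guide
    gpus_per_node: int = 4,  # g6e.12xlarge has 4 GPUs
):
    """Launch the given script on the worker nodes through TorchDistributor."""
    from pyspark.ml.torch.distributor import TorchDistributor  # requires PySpark

    distributor = TorchDistributor(
        num_processes=total_processes(num_workers, gpus_per_node),
        local_mode=False,  # run on the workers rather than on the driver
        use_gpu=True,
    )
    # When given a file path, run() forwards the remaining arguments to the script.
    return distributor.run(train_script, *script_args)


# Example usage (placeholders, not real paths or flags):
#   run_distributed("<path-to-recipe-script>.py", ["<recipe-arguments>"])
```

Deriving `num_processes` from the worker count and GPUs per node keeps the launch consistent if the cluster size changes.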
We now see GPU utilization is ~95% for all GPUs on all worker nodes during training (8 GPUs in this particular case).
Track Experiments with MLflow
Databricks includes built-in MLflow integration for tracking experiments, logging metrics, and storing artifacts. To use MLflow with AutoModel on Databricks, add the MLflow configuration to your YAML file.
Configure MLflow
Edit your configuration file (e.g., llama3_2_1b_squad.yaml) to include the mlflow section:
For Databricks, the key configuration parameters are:
- tracking_uri: Set to "databricks" to use Databricks’ managed MLflow tracking server
- experiment_name: Name of your experiment (will appear in the Databricks workspace)
- artifact_location: Leave as null to use default Databricks artifact storage, or specify a Unity Catalog volume path like /Volumes/<catalog>/<schema>/<volume>/mlflow-artifacts
- tags: Add custom tags to organize and filter your runs
Databricks automatically handles authentication when tracking_uri is set to "databricks". No additional credentials are needed.
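To make the shape of the mlflow section concrete, the sketch below mirrors the four keys described above as a Python dictionary and runs a few sanity checks on it. The tag names and values are made up for illustration, representing tags as a key-value mapping is an assumption, and the checks only encode the two artifact storage options this guide mentions. `check_mlflow_section` is a hypothetical helper, not part of AutoModel or MLflow.

```python
from typing import List

# Mirrors the keys of the YAML `mlflow` section described in this guide.
mlflow_section = {
    "tracking_uri": "databricks",
    "experiment_name": "automodel-databricks-llama3-squad",
    "artifact_location": None,  # None corresponds to null in YAML
    "tags": {"model": "llama-3.2-1b", "dataset": "squad"},  # hypothetical tags
}


def check_mlflow_section(section: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems: List[str] = []
    if section.get("tracking_uri") != "databricks":
        problems.append("tracking_uri should be 'databricks' on Databricks")
    if not section.get("experiment_name"):
        problems.append("experiment_name must be a non-empty string")
    location = section.get("artifact_location")
    if location is not None and not str(location).startswith("/Volumes/"):
        problems.append("artifact_location should be null or a /Volumes/... path")
    if not isinstance(section.get("tags", {}), dict):
        problems.append("tags should be a mapping of tag names to values")
    return problems


print(check_mlflow_section(mlflow_section))
```

Running a check like this before launching a long multi-node job can catch a mistyped key or path early.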
Run Training with MLflow
Run training with MLflow tracking enabled using the same commands as before. The MLflow configuration will be read from your YAML file:
Single-node:
Multi-GPU:
Multi-node with TorchDistributor:
View Results
During training, you’ll see MLflow logging messages in your output:
To view your experiments and metrics:
- Navigate to the Experiments page in your Databricks workspace
- Find your experiment by name (e.g., automodel-databricks-llama3-squad)
- Click on a run to view metrics, parameters, and artifacts
The Databricks MLflow UI displays:
- Training and validation metrics over time
- Model parameters and hyperparameters
- Custom tags for filtering and comparison
- Artifacts and model checkpoints
- System metrics (GPU utilization, memory usage)
Store Artifacts in Unity Catalog
To store MLflow artifacts in Unity Catalog volumes, specify the artifact_location:
This ensures your artifacts are stored in a governed, versioned location within Unity Catalog.
Additional Configuration
You can override MLflow settings from the command line:
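As a rough illustration of how a dotted command-line override relates to the nested YAML, the sketch below applies a key such as `mlflow.experiment_name` to a configuration dictionary. It assumes MLflow settings follow the same dotted pattern as the `--checkpoint.*` options used earlier; `apply_override` is a hypothetical helper and not AutoModel's actual argument parser.

```python
import copy
from typing import Any


def apply_override(config: dict, dotted_key: str, value: Any) -> dict:
    """Return a copy of config with the nested key named by dotted_key set to value."""
    result = copy.deepcopy(config)  # leave the caller's dictionary untouched
    node = result
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Walk down one level, creating intermediate sections if they are missing
        node = node.setdefault(part, {})
    node[leaf] = value
    return result


base = {"mlflow": {"tracking_uri": "databricks", "experiment_name": "first-experiment"}}
updated = apply_override(base, "mlflow.experiment_name", "second-experiment")
print(updated["mlflow"]["experiment_name"])
```

Only the named key changes; sibling settings such as `tracking_uri` keep the values from the YAML file.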
For more details on MLflow configuration options and best practices, see the MLflow logging guide.
Conclusion
This guide showed how to use AutoModel for model training on Databricks-managed compute. It’s relatively straightforward to scale from single-GPU to multi-GPU to multi-node training to best suit your needs.
While the example here fine-tunes a Llama-3.2-1B model using the SQuAD dataset, any supported AutoModel functionality (like model pre-training, VLMs, etc.) can also run, and scale, on Databricks. Check out additional recipes and end-to-end examples to learn more.