> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/automodel/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/automodel/_mcp/server.

# Run on Your Local Workstation

Use this guide for local, single-node workflows on a workstation or an interactive Slurm allocation. For setup details, refer to our [Installation Guide](/get-started/installation).
For batch multi-node jobs, see the [Run on a Cluster](/job-launchers/slurm-cluster) or [SkyPilot](/job-launchers/skypilot) guides.

NeMo AutoModel uses recipes to run end-to-end workflows. If you're new to recipes, see the [Repository Structure](/get-started/repo-structure) guide.

## Quick Start: Choose Your Job Launch Option

* **CLI (recommended)**
  ```bash
  automodel examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
  ```

* **Direct recipe script**
  * Single GPU
    ```bash
    python nemo_automodel/recipes/llm/train_ft.py -c examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
    ```
  * Multi-GPU (single node)
    ```bash
    torchrun --nproc-per-node=2 nemo_automodel/recipes/llm/train_ft.py -c examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
    ```

## Run with AutoModel CLI (Single Node)

The AutoModel CLI is the preferred method for most users. It offers a unified interface to launch training scaling from a local workstation (this guide) to large clusters (see our [cluster guide](/job-launchers/slurm-cluster)).

### Basic Usage

The CLI follows this format:

```bash
automodel [--nproc-per-node N] <config.yaml> [--key.subkey=override ...]
```

A short alias `am` is also available. Both commands also work with `uv run` (e.g., `uv run automodel <config.yaml>`).

Where:

* `<config.yaml>`: Path to your YAML configuration file (must contain a `recipe._target_` key)
* `--nproc-per-node`: Optional override for the number of GPUs to use

The recipe class is specified inside the YAML via the `recipe._target_` key:

```yaml
recipe:
  _target_: nemo_automodel.recipes.llm.train_ft.TrainFinetuneRecipeForNextTokenPrediction
```

### Train on a Single GPU

For simple fine-tuning on a single GPU:

```bash
automodel examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
```

### Train on Multiple GPUs (Single Node)

For interactive single-node jobs, the CLI automatically detects the number of available GPUs and
uses `torchrun` for multi-GPU training. You can manually specify the number of GPUs using the `--nproc-per-node` option:

```bash
automodel --nproc-per-node 2 examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
```

If you don't specify `--nproc-per-node`, it will use all available GPUs on your system.

Looking for Slurm or cloud training? See [Run on a Cluster](/job-launchers/slurm-cluster) or [SkyPilot](/job-launchers/skypilot).

## Run with uv (Development Mode)

When you need more control over the environment or are actively developing with the codebase, you can use `uv` to run training scripts directly. This approach gives you direct access to the underlying Python scripts and is ideal for debugging or customization.

### Train on a Single GPU

```bash
uv run nemo_automodel/recipes/llm/train_ft.py -c examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
```

### Train on Multiple GPUs with Torchrun (Single Node)

For multi-GPU single-node training, use `torchrun` directly:

```bash
uv run torchrun --nproc-per-node=2 nemo_automodel/recipes/llm/train_ft.py -c examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
```

### Why Use uv?

uv provides several advantages for development and experimentation:

* **Automatic environment management**: uv automatically creates and manages virtual environments, ensuring consistent dependencies without manual setup.
* **Lock file synchronization**: Keeps your local environment perfectly synchronized with the project's `uv.lock` file.
* **No installation required**: Run scripts directly from the repository without installing packages system-wide.
* **Development flexibility**: Direct access to Python scripts for debugging, profiling, and customization.
* **Dependency isolation**: Each project gets its own isolated environment, preventing conflicts.

## Run with Torchrun

If you have NeMo AutoModel installed in your environment and prefer to run recipes directly without uv, you can use `torchrun` directly:

### Train on a Single GPU

```bash
python nemo_automodel/recipes/llm/train_ft.py -c examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
```

### Train on Multiple GPUs (Single Node)

```bash
torchrun --nproc-per-node=2 nemo_automodel/recipes/llm/train_ft.py -c examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
```

This approach requires that you have already installed NeMo AutoModel and its dependencies in your Python environment (see the [installation guide](/get-started/installation) for details).

## Customize Configuration Settings

All approaches use the same YAML configuration files. You can easily customize training by following the steps in this section.

1. **Override config values**: Use command-line arguments to directly replace default settings.
   For example, if you want to fine-tune `Qwen/Qwen3-0.6B` instead of `meta-llama/Llama-3.2-1B`, you can use:
   ```bash
   automodel config.yaml --model.pretrained_model_name_or_path Qwen/Qwen3-0.6B
   ```

2. **Edit the config file**: Modify the YAML directly for persistent changes.

3. **Create custom configs**: Copy and modify existing configurations from the `examples/` directory.

## When to Use Which Approach

**Use the AutoModel CLI when:**

* You want a simple, unified interface
* You are running locally on a single machine
* You don't need to modify the underlying code
* You prefer a higher-level abstraction

**Use uv when:**

* You're developing or debugging the codebase
* You want automatic dependency management
* You need maximum control over the execution
* You want to avoid manual environment setup
* You're experimenting with custom modifications

**Use Torchrun when:**

* You have a stable, pre-configured environment
* You prefer explicit control over Python execution
* You're working in environments where uv is not available
* You're integrating with existing PyTorch workflows

All approaches use the same configuration files and provide the same training capabilities on a single node. For multi-node training, see [Run on a Cluster](/job-launchers/slurm-cluster).