Optimize for Tokens/GPU Throughput#

Learn how to use the NeMo Microservices Customizer to create a LoRA (Low-Rank Adaptation) customization job optimized for higher tokens/GPU throughput and lower runtime. In this tutorial, you'll use LoRA to fine-tune a model and leverage the sequence packing feature to improve GPU utilization and decrease fine-tuning runtime. Sequence packing concatenates multiple shorter training examples into a single sequence up to the model's maximum sequence length, which reduces padding and increases the number of useful tokens processed per GPU.

Note

The time to complete this tutorial is approximately 30 minutes. In this tutorial, you run a customization job. Job duration increases with the number of model parameters and the dataset size.

Prerequisites#

  • SQuAD dataset uploaded to NeMo MS Datastore and registered in NeMo MS Entity Store. Refer to the LoRA model customization tutorial for details on how to upload and register the dataset.

  • (Optional) Weights & Biases account and API key for enhanced visualization.
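
The commands in this tutorial assume the Customizer service hostname is available in the CUST_HOSTNAME environment variable. A minimal sketch, with a placeholder value you should replace for your deployment:

    # Hostname of the NeMo Customizer service (placeholder value)
    export CUST_HOSTNAME=<your-customizer-hostname>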


Create LoRA Customization Jobs#

For this tutorial, you must create two LoRA customization jobs: one with sequence_packing_enabled set to true and another with the same field set to false.

Tip

For enhanced visualization, it's recommended to provide a W&B API key in the wandb-api-key HTTP header. Remove the wandb-api-key header from the request if WANDB_API_KEY is not set.
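
One way to handle this in a script is to build the header arguments conditionally, so the same command works with or without a key. A minimal bash sketch (the WANDB_HEADER_ARGS array name is illustrative):

    # Pass the W&B header to curl only when WANDB_API_KEY is set
    WANDB_HEADER_ARGS=()
    if [ -n "${WANDB_API_KEY:-}" ]; then
        WANDB_HEADER_ARGS=(--header "wandb-api-key: ${WANDB_API_KEY}")
    fi
    # Usage: curl "${WANDB_HEADER_ARGS[@]}" ...  (expands to nothing when unset)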

  1. Create the customization job with sequence packing enabled:

    export WANDB_API_KEY=<YOUR_WANDB_API_KEY>
    
    curl --location \
    "https://${CUST_HOSTNAME}/v1/customization/jobs" \
    --header "wandb-api-key: ${WANDB_API_KEY}" \
    --header 'Accept: application/json' \
    --header 'Content-Type: application/json' \
    --data '{
        "config": "meta/llama-3.1-8b-instruct",
        "dataset": {
            "name": "test-dataset"
        },
        "hyperparameters": {
            "sequence_packing_enabled": true,
            "training_type": "sft",
            "finetuning_type": "lora",
            "epochs": 10,
            "batch_size": 32,
            "learning_rate": 0.00001,
            "lora": {
                "adapter_dim": 16
            }
        }
    }' | jq
    
  2. Note the customization_id from the response; you will need it later. A sketch showing how to capture it programmatically follows these steps.

  3. Create another customization job with sequence packing disabled:

    export WANDB_API_KEY=<YOUR_WANDB_API_KEY>
    
    curl --location \
    "https://${CUST_HOSTNAME}/v1/customization/jobs" \
    --header "wandb-api-key: ${WANDB_API_KEY}" 
    --header 'Accept: application/json' \
    --header 'Content-Type: application/json' \
    --data '{
        "config": "meta/llama-3.1-8b-instruct",
        "dataset": {
            "name": "test-dataset"
        },
        "hyperparameters": {
            "sequence_packing_enabled": false,
            "training_type": "sft",
            "finetuning_type": "lora",
            "epochs": 10,
            "batch_size": 32,
            "learning_rate": 0.00001,
            "lora": {
                "adapter_dim": 16
            }
        }
    }' | jq
    
  4. Note the customization_id for this job as well; you will need it later.
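
If you prefer to capture each customization_id programmatically, you can save the request body to a local file (the job_packed.json name here is hypothetical) and extract the ID with jq. A minimal sketch, assuming the create response contains a top-level id field:

    # Create the job and capture its ID; assumes the response includes a top-level "id" field
    CUSTOMIZATION_ID_PACKED=$(curl --silent --location \
        "https://${CUST_HOSTNAME}/v1/customization/jobs" \
        --header "wandb-api-key: ${WANDB_API_KEY}" \
        --header 'Accept: application/json' \
        --header 'Content-Type: application/json' \
        --data @job_packed.json | jq -r '.id')

    echo "Sequence-packed job ID: ${CUSTOMIZATION_ID_PACKED}"

Repeat with the second request body to capture the ID of the job with sequence packing disabled.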

Monitor LoRA Customization Jobs#

Use the customization_id from each job to make a GET request for status details.

    export CUSTOMIZATION_ID=<customization_id>

    curl "https://${CUST_HOSTNAME}/v1/customization/jobs/${CUSTOMIZATION_ID}/status" | jq

The response includes timestamped training and validation loss values. The validation loss should be similar for both jobs.
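
To wait for a job to finish from the shell, you can poll the status endpoint in a loop. A minimal sketch, assuming the status response exposes a top-level status field and that values such as completed, failed, and cancelled are terminal:

    # Poll the job status every 60 seconds until it reaches a terminal state
    while true; do
        STATUS=$(curl --silent "https://${CUST_HOSTNAME}/v1/customization/jobs/${CUSTOMIZATION_ID}/status" \
            | jq -r '.status')
        echo "$(date -u '+%H:%M:%S') status: ${STATUS}"
        case "${STATUS}" in
            completed|failed|cancelled) break ;;
        esac
        sleep 60
    done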

View Jobs in Weights & Biases#

To enable W&B integration, include your WandB API key when creating a customization job in the call header. Then view your results at wandb.ai under the nvidia-nemo-customizer project.

Validation Loss Curves#

The validation loss curves for the two jobs should match closely.

[W&B chart: val_loss]

The sequence-packed version should complete significantly faster.

[W&B chart: runtime]

GPU Utilization#

The sequence-packed version should show higher GPU utilization.

[W&B chart: GPU utilization]

GPU Memory Allocation#

The sequence-packed version should show higher GPU memory allocation.

[W&B chart: GPU memory]

Sequence Packing Statistics#

Sequence packing statistics are available in the run config.

[W&B chart: sequence packing stats]

Note

The W&B integration is optional. When enabled, we’ll send training metrics to W&B using your API key. While we encrypt your API key and don’t log it internally, please review W&B’s terms of service before use.

Next Steps#

Now that you have created an optimized customization job, you can evaluate the output using NeMo Microservices Evaluator.