Checking Your Customization Job Metrics
After completing a customization job, you can monitor its performance through training and validation metrics. There are three ways to access these metrics:
Using the API
Through MLflow (optional)
Using Weights & Biases (optional)
Note
The time to complete this tutorial is approximately 10 minutes.
Available Metrics
Each customization job tracks two key metrics:
Training Loss: Calculated during training, logged every 10 steps
Validation Loss: Calculated during validation, logged every epoch
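From these logging intervals you can estimate how many loss values a job should report. The sketch below assumes one optimizer step per batch, no gradient accumulation or data-parallel sharding, and that a final partial batch still counts as a step. None of these are documented guarantees, and the 1,000-sample dataset is hypothetical.

```python
import math


def expected_metric_points(num_samples: int, batch_size: int, epochs: int,
                           log_every_n_steps: int = 10) -> dict:
    """Estimate how many loss values a customization job should report.

    Simplifying assumptions (not documented behavior): one optimizer step
    per batch, no gradient accumulation, and the last partial batch of each
    epoch still counts as a full step.
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "total_steps": total_steps,
        # One training-loss entry every `log_every_n_steps` steps
        "train_loss_points": total_steps // log_every_n_steps,
        # One validation-loss entry per epoch
        "val_loss_points": epochs,
    }


# Hypothetical 1,000-sample dataset with the batch size and epoch count
# used in the W&B example on this page
print(expected_metric_points(num_samples=1000, batch_size=16, epochs=10))
```

If the job reports noticeably fewer points than this estimate, it may still be running or may have stopped early.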
Prerequisites
A completed customization job and its job ID
(Optional) Access to NeMo with MLflow tracking enabled
(Optional) Weights & Biases account and API key for enhanced visualization
Viewing Your Metrics
Using the API
Get metrics with a simple API call:
```python
import os

from nemo_microservices import NeMoMicroservices

# Initialize the client; CUSTOMIZER_BASE_URL must be set in the environment
client = NeMoMicroservices(
    base_url=os.environ["CUSTOMIZER_BASE_URL"]
)

# Get job status with metrics
job_id = "your-customization-job-id"
job_status = client.customization.jobs.status(job_id)

print(f"Job ID: {job_status.id}")
print(f"Status: {job_status.status}")
print(f"Progress: {job_status.status_details.percentage_done}%")
print(f"Epochs completed: {job_status.status_details.epochs_completed}")

# Check for training metrics
if job_status.status_details.metrics:
    metrics = job_status.status_details.metrics.metrics

    # Display training loss
    train_losses = metrics.get("train_loss")
    if train_losses:
        print(f"Training loss values: {len(train_losses)} points")
        print(f"Latest training loss: {train_losses[-1]}")

    # Display validation loss
    val_losses = metrics.get("val_loss")
    if val_losses:
        print(f"Validation loss values: {len(val_losses)} points")
        print(f"Latest validation loss: {val_losses[-1]}")
```
Alternatively, with curl:

```bash
curl -s "${CUSTOMIZER_BASE_URL}/customization/jobs/${customizationID}/status" | jq
```
The response includes timestamped training and validation loss values.
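To go beyond the latest value, you can summarize the whole loss history, for example to find the epoch with the lowest validation loss. This is a minimal sketch that takes the `metrics` mapping from the Python example above. The exact shape of each loss entry is an assumption: the helper accepts a bare number, a mapping with a `"value"` key, or an object with a `value` attribute, so adjust `loss_value` to match what your response actually contains.

```python
from collections.abc import Mapping
from numbers import Real
from typing import Any


def loss_value(entry: Any) -> float:
    """Return the numeric loss from one metrics entry.

    Assumed entry shapes (adjust to your response): a bare number,
    a mapping such as {"step": 10, "value": 1.23, "timestamp": ...},
    or an object exposing a `value` attribute.
    """
    if isinstance(entry, Real):
        return float(entry)
    if isinstance(entry, Mapping):
        return float(entry["value"])
    return float(entry.value)


def summarize_losses(metrics: Mapping) -> dict:
    """Summarize training and validation loss histories."""
    train = [loss_value(e) for e in metrics.get("train_loss") or []]
    val = [loss_value(e) for e in metrics.get("val_loss") or []]
    summary = {"train_points": len(train), "val_points": len(val)}
    if train:
        summary["train_first"] = train[0]
        summary["train_last"] = train[-1]
    if val:
        # 0-based index of the lowest validation loss (one entry per epoch)
        best_idx = min(range(len(val)), key=val.__getitem__)
        summary["best_val_loss"] = val[best_idx]
        summary["best_val_index"] = best_idx
        # True if validation loss ended above its best value,
        # a common sign of overfitting in later epochs
        summary["val_rising_at_end"] = val[-1] > val[best_idx]
    return summary


# Made-up values for illustration only
sample = {"train_loss": [2.0, 1.5, 1.1], "val_loss": [{"value": 1.4}, {"value": 1.2}]}
print(summarize_losses(sample))
```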
Using MLflow
Access the MLflow UI (typically available through your cluster’s external URL)
Find your experiment using the customization ID
Select the run to view metrics under “Metrics”
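If you prefer to pull the same data programmatically, the MLflow tracking client can read metric histories. This sketch assumes the experiment name is exactly the customization ID and that the metric keys are `train_loss` and `val_loss`; both are assumptions, so check the experiment and metric names in the MLflow UI first. The function takes any `MlflowClient`-like object, which keeps it testable without a server.

```python
def fetch_mlflow_loss_history(client, experiment_name: str,
                              keys=("train_loss", "val_loss")) -> dict:
    """Return {run_id: {metric_key: [(step, value), ...]}} for one experiment.

    `client` is expected to be an mlflow.tracking.MlflowClient. The default
    metric keys are hypothetical; use the names shown in the MLflow UI.
    """
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        raise LookupError(f"No MLflow experiment named {experiment_name!r}")

    histories = {}
    for run in client.search_runs(experiment_ids=[experiment.experiment_id]):
        run_id = run.info.run_id
        histories[run_id] = {
            # Sort by step so the history reads in training order
            key: [(m.step, m.value)
                  for m in sorted(client.get_metric_history(run_id, key),
                                  key=lambda m: m.step)]
            for key in keys
        }
    return histories


# Usage (requires the mlflow package; the tracking URL is a placeholder):
#   from mlflow.tracking import MlflowClient
#   client = MlflowClient(tracking_uri="http://<your-mlflow-host>")
#   history = fetch_mlflow_loss_history(client, "<your-customization-id>")
```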
Using Weights & Biases
To enable W&B integration, pass your W&B API key in the `wandb-api-key` request header when you create a customization job:
```python
import os

from nemo_microservices import NeMoMicroservices

# Initialize the client; CUSTOMIZER_BASE_URL must be set in the environment
client = NeMoMicroservices(
    base_url=os.environ["CUSTOMIZER_BASE_URL"]
)

# Forward the W&B API key as a request header, if one is set
extra_headers = {}
wandb_api_key = os.getenv("WANDB_API_KEY")
if wandb_api_key:
    extra_headers["wandb-api-key"] = wandb_api_key

# Create a customization job with W&B integration
job = client.customization.jobs.create(
    config="meta/llama-3.1-8b-instruct",
    dataset={
        "name": "test-dataset"
    },
    hyperparameters={
        "training_type": "sft",
        "finetuning_type": "lora",
        "epochs": 10,
        "batch_size": 16,
        "learning_rate": 0.0001,
        "lora": {
            "adapter_dim": 8
        }
    },
    extra_headers=extra_headers
)

print("Created job with W&B integration:")
print(f"Job ID: {job.id}")
print(f"Status: {job.status}")
```
Alternatively, with curl:

```bash
curl --location "${CUSTOMIZER_BASE_URL}/customization/jobs" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header "wandb-api-key: ${WANDB_API_KEY}" \
  --data '{
    "config": "meta/llama-3.1-8b-instruct",
    "dataset": {"name": "test-dataset"},
    "hyperparameters": {
      "training_type": "sft",
      "finetuning_type": "lora",
      "epochs": 10,
      "batch_size": 16,
      "learning_rate": 0.0001,
      "lora": {
        "adapter_dim": 8
      }
    }
  }'
```
Then view your results at wandb.ai under the `nvidia-nemo-customizer` project.
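To compare runs outside the W&B dashboard, the W&B public API can stream a run's logged history. This is a sketch under assumptions: `api` is a `wandb.Api()` instance, the path is `"<your-entity>/nvidia-nemo-customizer"`, and the metric key `train_loss` is a hypothetical default, so confirm the real key on a run page first. Results are keyed by run ID because run names are not guaranteed to be unique.

```python
def fetch_wandb_loss_history(api, path: str, key: str = "train_loss") -> dict:
    """Return {run_id: [(step, value), ...]} for one metric across runs.

    `api` is expected to be a wandb.Api() instance and `path` an
    "<entity>/<project>" string. The metric key is a hypothetical default.
    """
    results = {}
    for run in api.runs(path):
        # scan_history streams every logged row that contains these keys;
        # "_step" is the step counter W&B records with each row
        rows = run.scan_history(keys=[key, "_step"])
        results[run.id] = [
            (row["_step"], row[key]) for row in rows if row.get(key) is not None
        ]
    return results


# Usage (requires the wandb package and a logged-in W&B account):
#   import wandb
#   history = fetch_wandb_loss_history(wandb.Api(), "<your-entity>/nvidia-nemo-customizer")
```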
Note
The W&B integration is optional. When enabled, we’ll send training metrics to W&B using your API key. While we encrypt your API key and don’t log it internally, please review W&B’s terms of service before use.