Create Job


Customization jobs are submitted to one of two backends. Choose the backend that matches your hardware and training goal, then build that backend’s job spec and submit it. For the full per-backend hyperparameter reference, see Training Configuration.

| Backend | Best for | Methods |
| --- | --- | --- |
| Automodel (default) | Production fine-tuning, larger models, multi-GPU scaling | SFT, distillation; LoRA, merged-LoRA, or full-weight |
| Unsloth | Memory-constrained single-GPU LoRA | SFT; LoRA or full-weight, with 4-bit / 8-bit loading |

Prerequisites

Before you can create a customization job, make sure that you have:

  • Obtained the base URL of your NeMo Platform.
  • Created a FileSet and Model Entity for your base model.
  • Uploaded a dataset as a FileSet.
  • Determined the training configuration you want to use for the customization job.
  • Verified that the platform has sufficient storage for the job. Full SFT jobs require approximately 3× the base model size in free disk space; LoRA jobs require approximately 1.5×. See ft-tut-understand-models for details. If you are also deploying the model from a base checkpoint fileset, plan for ~2.5× model size overall for LoRA.
  • Set the NMP_BASE_URL environment variable to your NeMo Platform endpoint.
    export NMP_BASE_URL="https://your-nemo-platform-url"
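
The storage guidance above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of the platform SDK: the helper name `estimate_required_storage_gb` is hypothetical, and mapping "full SFT" to the `all_weights` fine-tuning type is an assumption based on the output table later on this page. The multipliers are the approximate figures quoted in the prerequisites.

```python
# Approximate free-disk multipliers from the prerequisites, relative to base model size.
# Assumption: "full SFT" corresponds to finetuning_type "all_weights".
STORAGE_MULTIPLIER = {
    "all_weights": 3.0,
    "lora": 1.5,
}


def estimate_required_storage_gb(model_size_gb: float, finetuning_type: str) -> float:
    """Return the approximate free disk space (GB) a job needs (hypothetical helper)."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    if finetuning_type not in STORAGE_MULTIPLIER:
        # No multiplier is documented for other types (for example lora_merged),
        # so refuse to guess.
        raise ValueError(f"no documented multiplier for {finetuning_type!r}")
    return model_size_gb * STORAGE_MULTIPLIER[finetuning_type]


# Example with a hypothetical 4 GB base checkpoint.
for ft_type in ("lora", "all_weights"):
    needed = estimate_required_storage_gb(4.0, ft_type)
    print(f"{ft_type}: about {needed:.1f} GB free disk space")
```

Treat the result as a lower bound for planning; the figures in the prerequisites are approximations, and deployment adds further overhead.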

Submit an Automodel Job

Build an AutomodelJobInput spec and submit it to the automodel backend. The job runs on the platform’s GPU cluster, and create() returns a handle you can use to poll its status.

import os
from nemo_platform import NeMoPlatform
from nemo_automodel_plugin.schema import AutomodelJobInput

# Initialize the client
client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

# Build the job spec (SFT + LoRA)
spec = AutomodelJobInput(
    model="default/qwen3-1.7b",  # Base Model Entity (workspace/name)
    dataset={"training": "default/my-training-dataset"},
    training={
        "training_type": "sft",
        "finetuning_type": "lora",
        "lora": {"rank": 16, "alpha": 32},
        "max_seq_length": 2048,
    },
    schedule={"epochs": 3},
    batch={"global_batch_size": 32, "micro_batch_size": 1},
    optimizer={"learning_rate": 1e-4},
    parallelism={"num_gpus_per_node": 1},
    output={"name": "my-custom-model"},  # Optional: auto-generated if omitted
)

# Submit the job
job = client.customization.automodel.jobs.create(spec=spec, workspace="default", name="my-lora-job")

print(f"Submitted job: {job.job.name}")
print(f"Job status: {job.job.status}")

Example response:

{
  "name": "automodel-a1b2c3d4e5f6",
  "workspace": "default",
  "id": "platform-job-2k8i3i1HqJHHPVB5M6Bk9Z",
  "status": "queued",
  "spec": {
    "model": "default/qwen3-1.7b",
    "dataset": { "training": "default/my-training-dataset" },
    "training": {
      "training_type": "sft",
      "finetuning_type": "lora",
      "lora": { "rank": 16, "alpha": 32 },
      "max_seq_length": 2048
    },
    "schedule": { "epochs": 3 },
    "batch": { "global_batch_size": 32, "micro_batch_size": 1 },
    "optimizer": { "learning_rate": 0.0001 },
    "output": {
      "name": "my-custom-model",
      "type": "adapter",
      "fileset": "my-custom-model-a1b2c3d4e5f6"
    }
  }
}
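
The batch settings in the spec interact with the GPU count. As a rough sketch, assuming the Automodel backend follows the common data-parallel convention where the global batch equals micro batch × number of GPUs × gradient accumulation steps (this page does not confirm that), the implied accumulation count can be estimated as below. The helper name and the `num_nodes` parameter are hypothetical.

```python
def gradient_accumulation_steps(
    global_batch_size: int,
    micro_batch_size: int,
    num_gpus_per_node: int,
    num_nodes: int = 1,
) -> int:
    """Accumulation steps implied by the batch settings (hypothetical helper).

    Assumes global_batch_size = micro_batch_size * total_gpus * accumulation_steps.
    """
    samples_per_step = micro_batch_size * num_gpus_per_node * num_nodes
    if samples_per_step <= 0:
        raise ValueError("micro_batch_size and GPU counts must be positive")
    if global_batch_size % samples_per_step != 0:
        raise ValueError(
            "global_batch_size must be a multiple of "
            "micro_batch_size * num_gpus_per_node * num_nodes"
        )
    return global_batch_size // samples_per_step


# Values from the example spec: global 32, micro 1, one GPU.
steps = gradient_accumulation_steps(32, 1, 1)
print(f"Implied gradient accumulation steps: {steps}")
```

Under this assumption, the example spec processes one sample per forward pass and accumulates 32 of them before each optimizer update.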

Submit an Unsloth Job

The Unsloth backend runs on a single GPU and supports 4-bit / 8-bit quantized loading. Build a UnslothJobInput spec and submit it to the unsloth backend. Note that Unsloth uses its own field names (model.name, dataset.path, batch.per_device_train_batch_size).

import os
from nemo_platform import NeMoPlatform
from nemo_unsloth_plugin.schema import UnslothJobInput

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

spec = UnslothJobInput(
    model={"name": "default/qwen3-1.7b", "load_in_4bit": True},
    dataset={"path": "default/my-training-dataset", "apply_chat_template": True},
    training={"finetuning_type": "lora", "lora": {"rank": 16, "alpha": 16}},
    schedule={"epochs": 3},
    batch={"per_device_train_batch_size": 2, "gradient_accumulation_steps": 4},
    optimizer={"learning_rate": 2e-4},
    output={"save_method": "lora"},
)

job = client.customization.unsloth.jobs.create(spec=spec, workspace="default", name="my-unsloth-lora-job")

print(f"Submitted job: {job.job.name}")
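
The two example specs use different LoRA settings (rank 16 with alpha 32 for Automodel, rank 16 with alpha 16 for Unsloth). In the standard LoRA formulation, the low-rank update is scaled by alpha / rank, so these settings are not equivalent. The sketch below assumes both backends use that standard scaling, which this page does not state; `lora_scale` is a hypothetical helper.

```python
def lora_scale(rank: int, alpha: float) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA (alpha / rank)."""
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return alpha / rank


# LoRA settings taken from the two example specs on this page.
automodel_scale = lora_scale(rank=16, alpha=32)
unsloth_scale = lora_scale(rank=16, alpha=16)

print(f"Automodel example scale: {automodel_scale}")
print(f"Unsloth example scale:   {unsloth_scale}")
```

If you move a configuration from one backend to the other, keeping the alpha-to-rank ratio constant is one way to keep the adapter's effective update magnitude comparable.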

Knowledge Distillation (Automodel)

Knowledge distillation is an Automodel feature. Set training.training_type to "distillation" and provide a teacher_model that references a second Model Entity. The model field is the student model being trained.

# Assumes `client` is initialized as in the Automodel example above
from nemo_automodel_plugin.schema import AutomodelJobInput

spec = AutomodelJobInput(
    model="default/qwen3-1.7b",  # Student model
    dataset={"training": "default/my-training-dataset"},
    training={
        "training_type": "distillation",
        "finetuning_type": "lora",
        "teacher_model": "default/qwen3-4b",  # Teacher model
        "teacher_precision": "bf16",
        "distillation_ratio": 0.5,
        "distillation_temperature": 2.0,
    },
    schedule={"epochs": 2},
    batch={"global_batch_size": 64, "micro_batch_size": 1},
    optimizer={"learning_rate": 5e-5},
    parallelism={"num_gpus_per_node": 1},
)

job = client.customization.automodel.jobs.create(spec=spec, workspace="default", name="my-kd-job")

See Knowledge Distillation constraints for requirements on model compatibility, tokenizer, and GPU memory.
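
To build intuition for `distillation_ratio` and `distillation_temperature`, the sketch below computes a textbook (Hinton-style) distillation loss for a single token in plain Python. It is an illustration only: this page does not specify how the platform combines the terms, so the blend `ratio * soft_loss + (1 - ratio) * hard_loss` and the temperature-squared scaling are assumptions, and all function names are hypothetical.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    target_index: int,
    ratio: float = 0.5,
    temperature: float = 2.0,
) -> float:
    """Blend of soft-target KL loss and hard-label cross-entropy (assumed form)."""
    # Hard-label term: cross-entropy of the student against the ground-truth token.
    hard_loss = -math.log(softmax(student_logits)[target_index])
    # Soft-label term: KL from teacher to student at the given temperature,
    # scaled by T^2 as in the classic formulation.
    soft_loss = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    ) * temperature**2
    return ratio * soft_loss + (1.0 - ratio) * hard_loss


# Toy three-token vocabulary; the values are made up for illustration.
loss = distillation_loss([2.0, 1.0, 0.1], [3.0, 0.5, 0.2], target_index=0)
print(f"Blended loss: {loss:.4f}")
```

In this formulation, a ratio of 0 reduces to ordinary supervised fine-tuning, a ratio of 1 trains only against the teacher's distribution, and a higher temperature softens both distributions so that the student also learns from the teacher's lower-probability tokens.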


After Submission

A submitted job runs on the platform’s Jobs service. Manage its lifecycle — polling status, listing, and cancelling — through that service, regardless of which backend you submitted to. See Get Job Status, List Active Jobs, and Cancel a Job.
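
A polling loop for the job lifecycle can be sketched independently of the SDK. The helper below takes any zero-argument callable that returns the current status string, so it makes no assumption about the Jobs service method names (see Get Job Status for the actual call). The helper name `wait_for_job` and the terminal status names are assumptions; only "queued" appears in the example response on this page.

```python
import time
from typing import Callable

# Assumed terminal status names; check the Jobs service reference for the real values.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    terminal_statuses: frozenset = TERMINAL_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until a terminal status is seen or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(poll_interval_s)


# Demonstration with a fake status source instead of a real platform call.
fake_statuses = iter(["queued", "running", "completed"])
final = wait_for_job(lambda: next(fake_statuses), poll_interval_s=0.0, sleep=lambda _s: None)
print(f"Final status: {final}")
```

In real use, `get_status` would wrap the status call documented in Get Job Status, and the poll interval should be long enough to avoid unnecessary load on the service.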

For field-level details of the job spec and W&B or MLflow integration options, see Customization Job Reference.


Training Output

When training completes, the system automatically uploads the trained artifacts to a new FileSet (output.fileset) and creates an output based on the fine-tuning regime:

| training.finetuning_type | Output created |
| --- | --- |
| lora | Adapter attached to the parent Model Entity |
| lora_merged | New Model Entity with the adapter merged into the base weights |
| all_weights | New Model Entity with all model weights (complete fine-tuned model) |

LoRA Adapters

For LoRA jobs, the adapter is added to the parent Model Entity’s adapters list:

# After training completes, retrieve the model to see the adapter
model = client.models.retrieve(workspace="default", name="qwen3-1.7b")

for adapter in model.adapters or []:
    print(f"Adapter: {adapter.name}")
    print(f"  Fileset: {adapter.fileset}")
    print(f"  Enabled: {adapter.enabled}")

Adapters are enabled by default and are automatically loaded by NIMs serving this model with LoRA support.
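
When a parent model accumulates several adapters, it can help to filter for the ones that are enabled. The sketch below uses stand-in dataclasses in place of the SDK's response objects so that it runs without a platform connection. The class names and the `enabled_adapters` helper are hypothetical; only the `name`, `fileset`, `enabled`, and `adapters` attributes come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Adapter:
    """Stand-in for an adapter entry; mirrors the attributes used on this page."""
    name: str
    fileset: str
    enabled: bool = True


@dataclass
class Model:
    """Stand-in for a Model Entity; adapters may be None when no adapter exists."""
    name: str
    adapters: Optional[list[Adapter]] = None


def enabled_adapters(model: Model) -> list[Adapter]:
    """Return the adapters on the model that are currently enabled."""
    return [adapter for adapter in (model.adapters or []) if adapter.enabled]


# Sample data reusing the names from the example response on this page.
parent = Model(
    name="qwen3-1.7b",
    adapters=[
        Adapter("my-custom-model", "my-custom-model-a1b2c3d4e5f6"),
        Adapter("older-experiment", "older-experiment-fileset", enabled=False),
    ],
)
print([adapter.name for adapter in enabled_adapters(parent)])
```

The same list comprehension works on the object returned by `client.models.retrieve(...)`, provided its adapters expose the attributes shown above.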