Create Job | NVIDIA NeMo Platform

Customization jobs are submitted to one of two backends. Choose the backend that matches your hardware and training goal, then build that backend’s job spec and submit it. For the full per-backend hyperparameter reference, see Training Configuration.

Backend	Best for	Methods
Automodel (default)	Production fine-tuning, larger models, multi-GPU scaling	SFT, distillation; LoRA, merged-LoRA, or full-weight
Unsloth	Memory-constrained single-GPU training	SFT; LoRA with optional 4-bit / 8-bit loading, or unquantized full-weight

Prerequisites

Before you can create a customization job, make sure that you have:

Obtained the base URL of your NeMo Platform.
Created a FileSet and Model Entity for your base model.
Uploaded a dataset as a FileSet.
Determined the training configuration you want to use for the customization job.
Verified that the platform has sufficient storage for the job. Budget against the size of the downloaded base checkpoint: full SFT jobs need approximately 3× that size in free disk space, and LoRA jobs need approximately 1.5×. Budget for any retained deployment copies separately. See ft-tut-understand-models for details.
Set the NMP_BASE_URL environment variable to your NeMo Platform endpoint.

$ export NMP_BASE_URL="https://your-nemo-platform-url"
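The storage prerequisite above can be turned into a quick pre-flight estimate. The following is a minimal sketch, not a platform API: the helper names, the 4 GiB checkpoint size, and the path argument are hypothetical, and the `all_weights` / `lora` keys are borrowed from the `training.finetuning_type` values listed under Training Output. This page gives no multiplier for `lora_merged`, so the sketch leaves it out.

```python
import shutil

# Multipliers from the storage prerequisite: free space needed relative to
# the size of the downloaded base checkpoint.
STORAGE_MULTIPLIER = {"all_weights": 3.0, "lora": 1.5}


def required_free_bytes(checkpoint_bytes: int, finetuning_type: str) -> int:
    """Return the approximate free disk space a job needs, in bytes."""
    return int(checkpoint_bytes * STORAGE_MULTIPLIER[finetuning_type])


def has_enough_space(path: str, checkpoint_bytes: int, finetuning_type: str) -> bool:
    """Compare the estimate with the free space on the volume that holds `path`."""
    free = shutil.disk_usage(path).free
    return free >= required_free_bytes(checkpoint_bytes, finetuning_type)


GIB = 1024**3
checkpoint = 4 * GIB  # hypothetical checkpoint size; measure your own download

print(required_free_bytes(checkpoint, "lora") / GIB)         # 6.0
print(required_free_bytes(checkpoint, "all_weights") / GIB)  # 12.0
```

`has_enough_space` only inspects a locally visible volume, so it is useful only where the platform's storage is mounted; otherwise use the estimate as a number to check with your administrator.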

Submit an Automodel Job

Build an AutomodelJobInput spec and submit it to the automodel backend. The job runs on the platform’s GPU cluster, and create() returns a handle you can use to poll its status.

import os
from nemo_platform import NeMoPlatform
from nemo_automodel_plugin.schema import AutomodelJobInput

# Initialize the client
client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

# Build the job spec (SFT + LoRA)
spec = AutomodelJobInput(
    model="default/qwen3-1.7b",  # Base Model Entity (workspace/name)
    dataset={"training": "default/my-training-dataset"},
    training={
        "training_type": "sft",
        "finetuning_type": "lora",
        "lora": {"rank": 16, "alpha": 32},
        "max_seq_length": 2048,
    },
    schedule={"epochs": 3},
    batch={"global_batch_size": 32, "micro_batch_size": 1},
    optimizer={"learning_rate": 1e-4},
    parallelism={"num_gpus_per_node": 1},
    output={"name": "my-custom-model"},  # Optional: auto-generated if omitted
)

# Submit the job
job = client.customization.automodel.jobs.create(spec=spec, workspace="default", name="my-lora-job")

print(f"Submitted job: {job.job.name}")
print(f"Job status: {job.job.status}")

The response preserves the explicit name. If you omit name, the platform generates a backend-prefixed job name.

Example Response

{
  "name": "my-lora-job",
  "workspace": "default",
  "id": "platform-job-2k8i3i1HqJHHPVB5M6Bk9Z",
  "status": "queued",
  "spec": {
    "model": "default/qwen3-1.7b",
    "dataset": { "training": "default/my-training-dataset" },
    "training": {
      "training_type": "sft",
      "finetuning_type": "lora",
      "lora": { "rank": 16, "alpha": 32 },
      "max_seq_length": 2048
    },
    "schedule": { "epochs": 3 },
    "batch": { "global_batch_size": 32, "micro_batch_size": 1 },
    "optimizer": { "learning_rate": 0.0001 },
    "output": {
      "name": "my-custom-model",
      "type": "adapter",
      "fileset": "my-custom-model-a1b2c3d4e5f6"
    }
  }
}
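The batch settings in the Automodel spec interact with the GPU count. This page does not state how Automodel derives gradient accumulation, so the sketch below assumes the common data-parallel convention that the global batch equals micro batch × number of GPUs × accumulation steps. The function name is hypothetical; treat the result as a sanity check and consult Training Configuration for the authoritative behavior.

```python
def accumulation_steps(global_batch_size: int, micro_batch_size: int, num_gpus: int) -> int:
    """Micro-batches each GPU accumulates before one optimizer step.

    Assumes global = micro * num_gpus * accumulation (pure data parallelism).
    """
    samples_per_pass = micro_batch_size * num_gpus
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            f"global_batch_size={global_batch_size} is not a multiple of "
            f"micro_batch_size * num_gpus = {samples_per_pass}"
        )
    return global_batch_size // samples_per_pass


# Values from the example spec: global 32, micro 1, one GPU.
print(accumulation_steps(32, 1, 1))  # 32
# A hypothetical 4-GPU node with micro batch 2.
print(accumulation_steps(32, 2, 4))  # 4
```

Under this assumption, raising micro_batch_size reduces the number of accumulation passes at the cost of more memory per GPU.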

Submit an Unsloth Job

The Unsloth backend runs on a single GPU and supports 4-bit / 8-bit quantized loading for LoRA. Full-weight training requires model.load_in_4bit=false and model.load_in_8bit=false. Build an UnslothJobInput spec and submit it to the unsloth backend. Unsloth uses its own field names (model.name, dataset.path, batch.per_device_train_batch_size), which differ from the Automodel spec.

import os
from nemo_platform import NeMoPlatform
from nemo_unsloth_plugin.schema import UnslothJobInput

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

spec = UnslothJobInput(
    model={"name": "default/qwen3-1.7b", "load_in_4bit": True},
    dataset={"path": "default/my-training-dataset", "apply_chat_template": True},
    training={"finetuning_type": "lora", "lora": {"rank": 16, "alpha": 16}},
    schedule={"epochs": 3},
    batch={"per_device_train_batch_size": 2, "gradient_accumulation_steps": 4},
    optimizer={"learning_rate": 2e-4},
    output={"save_method": "lora"},
)

job = client.customization.unsloth.jobs.create(spec=spec, workspace="default", name="my-unsloth-lora-job")

print(f"Submitted job: {job.job.name}")
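The full-weight constraint described for Unsloth (both quantized-loading flags set to false) can be checked locally before you submit. This is a minimal sketch that works on plain dictionaries, not a platform validator. The helper name is hypothetical, and it assumes that Unsloth full-weight training uses the `all_weights` value shown in the Training Output table. Because this page does not state the default for either flag, the sketch treats a missing flag as a problem.

```python
FULL_WEIGHT = "all_weights"  # assumed value, borrowed from the Training Output table


def quantization_problems(model: dict, training: dict) -> list:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    if training.get("finetuning_type") == FULL_WEIGHT:
        for flag in ("load_in_4bit", "load_in_8bit"):
            # Require an explicit False: the defaults are not documented here.
            if model.get(flag) is not False:
                problems.append(f"model.{flag} must be set to False for full-weight training")
    return problems


# The LoRA example from this section passes: quantized loading is allowed.
print(quantization_problems(
    {"name": "default/qwen3-1.7b", "load_in_4bit": True},
    {"finetuning_type": "lora"},
))  # []

# Reusing the same model settings for full-weight training is flagged.
for problem in quantization_problems(
    {"name": "default/qwen3-1.7b", "load_in_4bit": True},
    {"finetuning_type": FULL_WEIGHT},
):
    print(problem)
```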

Knowledge Distillation (Automodel)

Knowledge distillation is an Automodel feature. Set training.training_type to "distillation" and provide a teacher_model that references a second Model Entity. The model field is the student model being trained.

# Reuses `client` and the AutomodelJobInput import from the Automodel example above
spec = AutomodelJobInput(
    model="default/qwen3-1.7b",  # Student model
    dataset={"training": "default/my-training-dataset"},
    training={
        "training_type": "distillation",
        "finetuning_type": "lora",
        "teacher_model": "default/qwen3-4b",  # Teacher model
        "teacher_precision": "bf16",
        "distillation_ratio": 0.5,
        "distillation_temperature": 2.0,
    },
    schedule={"epochs": 2},
    batch={"global_batch_size": 64, "micro_batch_size": 1},
    optimizer={"learning_rate": 5e-5},
    parallelism={"num_gpus_per_node": 1},
)

job = client.customization.automodel.jobs.create(spec=spec, workspace="default", name="my-kd-job")

See Knowledge Distillation constraints for requirements on model compatibility, tokenizer, and GPU memory.
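To build intuition for distillation_ratio and distillation_temperature, the sketch below implements the classic distillation objective for a single token position in plain Python. This page does not define how Automodel combines the two terms, so the weighting (ratio on the teacher-matching term, 1 − ratio on the hard-label term) and the T² scaling are assumptions taken from the standard formulation, not a description of the backend. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, ratio, temperature):
    """Blend a soft teacher-matching term with hard-label cross-entropy."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # T^2 keeps the soft term's gradient scale comparable across temperatures.
    return ratio * temperature**2 * kl + (1 - ratio) * ce


student = [1.0, 2.0, 0.5]
teacher = [0.5, 3.0, 0.2]
# Values from the example spec: ratio 0.5, temperature 2.0.
print(distillation_loss(student, teacher, label=1, ratio=0.5, temperature=2.0))
```

Under these assumptions, a ratio of 0 reduces to ordinary supervised fine-tuning, and a ratio of 1 trains the student purely to match the teacher's softened distribution.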

After Submission

A submitted job runs on the platform’s Jobs service. Regardless of which backend you submitted to, you manage the job's lifecycle (polling status, listing, and cancelling) through that service. See Get Job Status, List Active Jobs, and Cancel a Job.
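Polling usually comes down to a loop with an interval and a timeout. The sketch below shows that pattern without calling the SDK: `fetch_status` is a hypothetical callable that you would wire to the status call documented in Get Job Status, and the terminal status names are assumptions (only "queued" appears on this page).

```python
import time
from typing import Callable

# Assumed terminal status names; confirm them against Get Job Status.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Call fetch_status until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Stand-in for a real status call: replays a fixed sequence of statuses.
fake_statuses = iter(["queued", "running", "completed"])
print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0))  # completed
```

Passing the status lookup as a callable keeps the loop independent of the backend, which matches the point that lifecycle management is the same for Automodel and Unsloth jobs.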

For field-level details of the job spec and W&B or MLflow integration options, see Customization Job Reference.

Training Output

When training completes, the system automatically uploads the trained artifacts to a new FileSet (output.fileset) and creates an output based on the fine-tuning regime:

`training.finetuning_type`	Output Created
`lora`	Adapter attached to the parent Model Entity
`lora_merged`	New Model Entity with the adapter merged into the base weights
`all_weights`	New Model Entity with all model weights (complete fine-tuned model)
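If downstream automation needs to know where to look for the result, the table above can be encoded as a small lookup. This is an illustrative sketch only: the dictionary and helper names are hypothetical, and the descriptions are copied from the table, not returned by the platform.

```python
# Output created for each training.finetuning_type, as listed in the table.
OUTPUT_BY_FINETUNING_TYPE = {
    "lora": "adapter attached to the parent Model Entity",
    "lora_merged": "new Model Entity with the adapter merged into the base weights",
    "all_weights": "new Model Entity with all model weights",
}


def creates_new_model_entity(finetuning_type: str) -> bool:
    """True when the job produces a new Model Entity instead of an adapter."""
    if finetuning_type not in OUTPUT_BY_FINETUNING_TYPE:
        raise ValueError(f"unknown finetuning_type: {finetuning_type!r}")
    return finetuning_type != "lora"


for name in OUTPUT_BY_FINETUNING_TYPE:
    target = "new Model Entity" if creates_new_model_entity(name) else "parent Model Entity"
    print(f"{name}: look at the {target}")
```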

LoRA Adapters

For LoRA jobs, the adapter is added to the parent Model Entity’s adapters list:

# After training completes, retrieve the model to see the adapter
model = client.models.retrieve(workspace="default", name="qwen3-1.7b")

for adapter in model.adapters or []:
    print(f"Adapter: {adapter.name}")
    print(f" Fileset: {adapter.fileset}")
    print(f" Enabled: {adapter.enabled}")

Adapters are enabled by default and are automatically loaded by NIMs serving this model with LoRA support.