Synthetic Data How-To Guides#
This section provides task-focused guides for common SDG workflows. For your first run, start with Generate Your First Synthetic Dataset.
If you are new to model training or want a calmer on-ramp before tasks, read Use the SDG Skill With Confidence for how to run a productive session with a coding agent.
Run the Pipeline
Preview, generate, and customize output path and projection.
Create a Domain Dataset
Adapt the pipeline to a custom domain with a seed file and multiple category dimensions.
Generate Tool-Call Data
Generate multi-turn conversations with OpenAI-style tool calls for tool-use SFT.
Generate Preference Data
Generate DPO preference pairs (prompt / chosen / rejected) from rl_pref.yaml.
Dispatch to a Cluster
Configure an env.toml profile and run SDG on Lepton or Slurm.