
# Generating Training Data

Generate synthetic task data (user queries) for the [Workplace Assistant](/environment-tutorials/real-world-environment) environment using [NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner).

This pipeline focuses on generating tasks for the environment. It also simulates agent trajectories, but these are used for quality filtering and validation; the environment itself produces the actual model responses during rollout collection. The Workplace Assistant uses 27 tools across 6 databases, and NeMo Data Designer can produce realistic multi-step user queries for it at scale.

***

## Pipeline Overview

The data generation pipeline runs in five steps:

1. Load tool schemas for the Workplace Assistant environment
2. Use NeMo Data Designer to generate realistic multi-step user queries
3. Simulate agent trajectories (step-by-step tool-call solutions)
4. Apply dual-level LLM judge filtering to ensure data quality
5. Export task data in NeMo Gym JSONL format
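Steps 4 and 5 can be pictured with a small, library-free sketch. It is not the notebook's code and does not use the NeMo Data Designer API. It assumes that the two judge levels are a score for the user query and a score for the simulated trajectory, and every name in it (`Candidate`, `query_score`, `trajectory_score`, the `user_query` key, the 0.7 thresholds) is hypothetical. The actual NeMo Gym JSONL schema is defined in the notebook, so treat the output record here as a placeholder.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Candidate:
    """One generated task with judge scores. All field names are assumptions."""
    user_query: str
    query_score: float        # assumed judge score for the query itself
    trajectory_score: float   # assumed judge score for the simulated trajectory


def passes_both_levels(
    c: Candidate, query_min: float = 0.7, trajectory_min: float = 0.7
) -> bool:
    """Keep a candidate only if it clears both (hypothetical) thresholds."""
    return c.query_score >= query_min and c.trajectory_score >= trajectory_min


def to_jsonl(candidates: Iterable[Candidate]) -> str:
    """Serialize surviving candidates, one JSON object per line.

    The record layout is a placeholder, not the real NeMo Gym schema.
    The simulated trajectory is used for filtering only and is not exported.
    """
    lines = [
        json.dumps({"user_query": c.user_query})
        for c in candidates
        if passes_both_levels(c)
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Candidate("Reschedule my 3pm meeting and email the attendees.", 0.9, 0.8),
        Candidate("Do something.", 0.2, 0.9),  # fails the query-level check
    ]
    print(to_jsonl(demo))
```

Requiring both scores to pass mirrors the idea that a well-formed query with an unsolvable trajectory, or a solvable trajectory for a vague query, should not reach training.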

***

## Notebook

The tutorial is provided as a Jupyter notebook. See the [notebook README](https://github.com/NVIDIA-NeMo/Gym/blob/main/resources_servers/workplace_assistant/notebooks/synthetic-data-generation/) for prerequisites and setup instructions.

[View Notebook on GitHub](https://github.com/NVIDIA-NeMo/Gym/blob/main/resources_servers/workplace_assistant/notebooks/synthetic-data-generation/multistep-toolcalling-sdg.ipynb)

***

## What's Next?

After generating your tasks, run [GRPO training](/training-tutorials/nemo-rl-grpo) with NeMo RL, in which an agent attempts the tasks in the Workplace Assistant environment.