nemo_curator.stages.synthetic.nemotron_cc.base
This module contains a simple stage for generating synthetic data. It takes in an empty task and a prompt and produces its output in the form of a DocumentBatch.
Module Contents
Classes
API
Dataclass
Bases: ProcessingStage[DocumentBatch, DocumentBatch]
A simple stage for generating synthetic data. It takes in an empty task and a prompt and produces its output in the form of a DocumentBatch.
client
generation_config
input_field
model_name
name
output_field
prompt
system_prompt
async
Generate responses asynchronously using concurrent requests.
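As a rough illustration of what "concurrent requests" can look like, the sketch below fans out a list of prompts with asyncio.gather and caps the number in flight with a semaphore. It is not this stage's actual implementation: fake_generate, generate_all, and max_concurrent are hypothetical names, and fake_generate stands in for whatever async client call the stage really makes.

```python
import asyncio


async def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for an async LLM client call (assumption, not the real client API).
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"


async def generate_all(prompts: list[str], max_concurrent: int = 4) -> list[str]:
    # Limit how many requests are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await fake_generate(prompt)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in prompts))


responses = asyncio.run(generate_all(["a", "b", "c"]))
```

Because gather preserves input order, each response can be written back to the row that produced its prompt without extra bookkeeping.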
Process samples using async client (concurrent).
This method handles both cases:
- Normal case: No event loop exists, creates one with asyncio.run()
- Edge case: Called from async context, runs in separate thread
Process the input sample to create the LLM prompt.
Process a single response from the LLM.
Process DataFrame using synchronous sequential processing.