nemo_curator.stages.synthetic.qa_multilingual_synthetic


This module contains a simple stage for generating synthetic data. It takes an empty task (_EmptyTask) and a prompt and produces its output as a DocumentBatch.

Module Contents

Classes

Name | Description
QAMultilingualSyntheticStage | A simple stage for generating synthetic data. It takes an empty task (_EmptyTask) and a prompt and produces its output as a DocumentBatch.

API

class nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage(
prompt: str,
languages: list[str],
client: nemo_curator.models.client.llm_client.AsyncLLMClient | nemo_curator.models.client.llm_client.LLMClient,
model_name: str,
num_samples: int,
generation_config: nemo_curator.models.client.llm_client.GenerationConfig | None = None,
name: str = 'QAMultilingualSyntheticStage'
)
Dataclass

Bases: ProcessingStage[_EmptyTask, DocumentBatch]

A simple stage for generating synthetic data. It takes an empty task (_EmptyTask) and a prompt and produces its output as a DocumentBatch.

client: AsyncLLMClient | LLMClient
generation_config: GenerationConfig | None = None
languages: list[str]
model_name: str
name: str = 'QAMultilingualSyntheticStage'
num_samples: int
prompt: str
nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage.__post_init__() -> None
nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage._generate_responses_async() -> list[str]
async

Generate responses asynchronously using concurrent requests.
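
The concurrent-request pattern described here can be sketched with asyncio.gather. This is a minimal illustration, not the stage's actual implementation: FakeAsyncClient, its query method, and generate_responses are hypothetical stand-ins and are not part of the NeMo Curator API.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in for an async LLM client (not the NeMo Curator API)."""

    async def query(self, prompt: str) -> str:
        await asyncio.sleep(0.01)  # simulate network latency
        return f"answer to: {prompt}"


async def generate_responses(client: FakeAsyncClient, prompt: str, num_samples: int) -> list[str]:
    # Build one coroutine per requested sample, then await them all together
    # so the requests are in flight concurrently rather than one after another.
    pending = [client.query(prompt) for _ in range(num_samples)]
    return list(await asyncio.gather(*pending))


responses = asyncio.run(generate_responses(FakeAsyncClient(), "Write a Q&A pair", 3))
```

asyncio.gather returns results in the order the awaitables were passed in, so each response stays aligned with its request regardless of which one finishes first.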

nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage._process_async() -> list[str]

Process samples using async client (concurrent).

This method handles both cases:

  • Normal case: No event loop exists, creates one with asyncio.run()
  • Edge case: Called from async context, runs in separate thread
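
The two cases above can be sketched as follows. This is a minimal illustration of the general pattern under stated assumptions, not the stage's actual code: fake_generate and run_from_sync_or_async are hypothetical names introduced here.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fake_generate() -> list[str]:
    """Hypothetical stand-in for the async generation coroutine."""
    await asyncio.sleep(0.01)
    return ["sample-1", "sample-2"]


def run_from_sync_or_async() -> list[str]:
    try:
        # Raises RuntimeError when no event loop is running in this thread.
        asyncio.get_running_loop()
    except RuntimeError:
        # Normal case: no loop is running, so asyncio.run() can create one.
        return asyncio.run(fake_generate())
    # Edge case: a loop is already running here, and asyncio.run() would fail,
    # so run the coroutine on a fresh loop in a separate thread instead.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(fake_generate())).result()


# Called from plain synchronous code.
from_sync = run_from_sync_or_async()


async def caller() -> list[str]:
    # Called while an event loop is already running.
    return run_from_sync_or_async()


from_async = asyncio.run(caller())
```

In this sketch the edge-case branch blocks the calling thread until the worker thread finishes, which keeps the function synchronous for its caller at the cost of stalling the outer event loop for the duration.
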
nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage._process_llm_response(
response: list[str]
) -> str

Process a single response from the LLM.

nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage._process_sync() -> list[str]

Process samples using synchronous client (sequential).

nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage.process(
_: nemo_curator.tasks._EmptyTask
) -> nemo_curator.tasks.DocumentBatch
nemo_curator.stages.synthetic.qa_multilingual_synthetic.QAMultilingualSyntheticStage.setup(
_: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None