synthetic.async_nemotron_cc
Module Contents
Classes
| AsyncNemotronCCGenerator | Provides a collection of methods for generating synthetic data described in the Nemotron-CC paper (https://arxiv.org/abs/2412.02595). |
API
- class synthetic.async_nemotron_cc.AsyncNemotronCCGenerator(
- llm_client: nemo_curator.services.AsyncLLMClient,
- )
- Provides a collection of methods for generating synthetic data described in the Nemotron-CC paper (https://arxiv.org/abs/2412.02595).
- Initialization
- Initialize the AsyncNemotronCCGenerator instance.
- Args:
  - llm_client (AsyncLLMClient): The asynchronous language model client used for querying the model.
- async distill(
- document: str,
- model: str,
- prompt_template: str = DISTILL_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_DISTILL_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- )
- Distills the essential content from a document.
- Args:
  - document (str): The input document text to distill.
  - model (str): The model identifier to use.
  - prompt_template (str, optional): The prompt template for distillation. Defaults to DISTILL_PROMPT_TEMPLATE.
  - system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_DISTILL_SYSTEM_PROMPT.
  - prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to None, meaning no additional arguments.
  - model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to None, meaning no additional arguments.
- Returns:
  - List[str]: A list of responses from the LLM. The list has more than one element only if n > 1 is set in model_kwargs.
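The call pattern for distill can be sketched as follows. Because the real class needs a configured AsyncLLMClient and a reachable model endpoint, this sketch uses a hypothetical `StubGenerator` that only mirrors the method signature; the model name `my-model` is likewise a placeholder. It is meant to illustrate awaiting the coroutine and how `n` in model_kwargs affects the length of the returned list, not the library's actual behavior.

```python
import asyncio


class StubGenerator:
    """Hypothetical offline stand-in that mirrors the distill() signature."""

    async def distill(self, document, model, prompt_kwargs=None, model_kwargs=None):
        n = (model_kwargs or {}).get("n", 1)
        # A real generator would query the LLM; this stub just truncates the text.
        return [f"[{model}] {document[:20]}" for _ in range(n)]


async def main():
    generator = StubGenerator()  # real code: AsyncNemotronCCGenerator(llm_client)
    text = "Water boils at 100 C at sea level."
    # distill() is a coroutine, so it must be awaited.
    single = await generator.distill(text, model="my-model")
    # Requesting n=3 completions yields a three-element list.
    several = await generator.distill(text, model="my-model", model_kwargs={"n": 3})
    return single, several


single, several = asyncio.run(main())
print(len(single), len(several))
```

With the real generator, the same `await` pattern should apply once an AsyncLLMClient has been passed to the constructor.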
 - async extract_knowledge(
- document: str,
- model: str,
- prompt_template: str = EXTRACT_KNOWLEDGE_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- )
- Extracts knowledge from the provided document.
- Args:
  - document (str): The input document text from which to extract knowledge.
  - model (str): The model identifier to use.
  - prompt_template (str, optional): The prompt template for knowledge extraction. Defaults to EXTRACT_KNOWLEDGE_PROMPT_TEMPLATE.
  - system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_SYSTEM_PROMPT.
  - prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to None, meaning no additional arguments.
  - model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to None, meaning no additional arguments.
- Returns:
  - List[str]: A list of responses from the LLM. The list has more than one element only if n > 1 is set in model_kwargs.
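Since extract_knowledge is a coroutine, several documents can be processed concurrently. The sketch below shows one way to do this with `asyncio.gather` and a semaphore that bounds the number of in-flight requests. `StubGenerator`, `extract_all`, and `max_concurrent` are hypothetical names introduced for illustration; the stub stands in for the real generator so the example runs offline.

```python
import asyncio


class StubGenerator:
    """Hypothetical offline stand-in for AsyncNemotronCCGenerator."""

    async def extract_knowledge(self, document, model, model_kwargs=None):
        await asyncio.sleep(0)  # yield control, as a network call would
        return [f"knowledge from: {document}"]


async def extract_all(generator, documents, model, max_concurrent=2):
    # The semaphore caps how many requests are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(document):
        async with semaphore:
            return await generator.extract_knowledge(document, model=model)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(one(d) for d in documents))


results = asyncio.run(
    extract_all(StubGenerator(), ["doc a", "doc b", "doc c"], "my-model")
)
print(results)
```

A suitable value for the concurrency limit depends on the rate limits of the model endpoint, which this reference does not specify.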
 - async generate_diverse_qa(
- document: str,
- model: str,
- prompt_template: str = DIVERSE_QA_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- )
- Generates diverse QA pairs from the provided document.
- Args:
  - document (str): The input document text used to generate QA pairs.
  - model (str): The model identifier to use.
  - prompt_template (str, optional): The prompt template for generating QA pairs. Defaults to DIVERSE_QA_PROMPT_TEMPLATE.
  - system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_SYSTEM_PROMPT.
  - prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to None, meaning no additional arguments.
  - model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to None, meaning no additional arguments.
- Returns:
  - List[str]: A list of responses from the LLM. The list has more than one element only if n > 1 is set in model_kwargs.
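Each response from generate_diverse_qa is a plain string, so downstream code usually needs to split it into individual pairs. The sketch below assumes, purely for illustration, that a response uses a "Question: ... Answer: ..." layout; the actual layout depends on DIVERSE_QA_PROMPT_TEMPLATE and on the model, so the pattern may need adjusting. `parse_qa_pairs` is a hypothetical helper, not part of the library.

```python
import re


def parse_qa_pairs(response: str) -> list[tuple[str, str]]:
    """Split a response into (question, answer) tuples.

    Assumes the hypothetical layout 'Question: ...' followed by 'Answer: ...'.
    """
    pattern = re.compile(
        r"Question:\s*(.+?)\s*Answer:\s*(.+?)(?=\s*Question:|\Z)", re.DOTALL
    )
    return [(q.strip(), a.strip()) for q, a in pattern.findall(response)]


# Sample text written by hand to match the assumed layout.
sample = (
    "Question: What is the boiling point of water at sea level?\n"
    "Answer: 100 degrees Celsius.\n"
    "Question: Does altitude change it?\n"
    "Answer: Yes, it drops as altitude increases."
)
pairs = parse_qa_pairs(sample)
print(pairs)
```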
 - async generate_knowledge_list(
- document: str,
- model: str,
- prompt_template: str = KNOWLEDGE_LIST_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- )
- Generates a list of knowledge items from the provided document.
- Args:
  - document (str): The input document text to process.
  - model (str): The model identifier to use.
  - prompt_template (str, optional): The prompt template for generating a knowledge list. Defaults to KNOWLEDGE_LIST_PROMPT_TEMPLATE.
  - system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_SYSTEM_PROMPT.
  - prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to None, meaning no additional arguments.
  - model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to None, meaning no additional arguments.
- Returns:
  - List[str]: A list of responses from the LLM. The list has more than one element only if n > 1 is set in model_kwargs.
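The relationship between prompt_template and prompt_kwargs can be illustrated with a small sketch. It assumes the template is filled with `str.format`, with the document supplied as a `document` field and the prompt_kwargs entries supplying any remaining fields; this reference does not confirm that mechanism, so treat it as an assumption. `CUSTOM_TEMPLATE`, `build_prompt`, and `max_items` are hypothetical names.

```python
from typing import Optional

# Hypothetical custom template; {document} and {max_items} are placeholders.
CUSTOM_TEMPLATE = (
    "List at most {max_items} facts from the text below, one per line.\n\n"
    "Text:\n{document}"
)


def build_prompt(
    document: str, prompt_template: str, prompt_kwargs: Optional[dict] = None
) -> str:
    # Treat None as "no extra fields", matching the None defaults in the signature.
    prompt_kwargs = prompt_kwargs or {}
    return prompt_template.format(document=document, **prompt_kwargs)


prompt = build_prompt("The Nile flows north.", CUSTOM_TEMPLATE, {"max_items": 3})
print(prompt)
```

Under this assumption, a template that names a field missing from prompt_kwargs would raise a `KeyError` at format time, so the two arguments should be kept in sync.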
 - async rewrite_to_wikipedia_style(
- document: str,
- model: str,
- prompt_template: str = WIKIPEDIA_REPHRASING_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- )
- Rewrites a document into a Wikipedia-style narrative.
- Args:
  - document (str): The input document text to rewrite.
  - model (str): The model identifier to use.
  - prompt_template (str, optional): The prompt template for rewriting. Defaults to WIKIPEDIA_REPHRASING_PROMPT_TEMPLATE.
  - system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_SYSTEM_PROMPT.
  - prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to None, meaning no additional arguments.
  - model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to None, meaning no additional arguments.
- Returns:
  - List[str]: A list of responses from the LLM. The list has more than one element only if n > 1 is set in model_kwargs.
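Calls to a remote model can fail or stall, so callers may want to wrap rewrite_to_wikipedia_style in a timeout and a small retry loop. The sketch below is one possible approach, not something the library provides. `FlakyStubGenerator` and `rewrite_with_retry` are hypothetical, and the exception types actually raised by a real AsyncLLMClient are not documented here, so `ConnectionError` is only an assumed example.

```python
import asyncio


class FlakyStubGenerator:
    """Hypothetical stand-in whose first call fails, to exercise the retry path."""

    def __init__(self):
        self.calls = 0

    async def rewrite_to_wikipedia_style(self, document, model, model_kwargs=None):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated transient failure")
        return [f"Rewritten: {document}"]


async def rewrite_with_retry(generator, document, model, attempts=3, timeout=5.0):
    for attempt in range(1, attempts + 1):
        try:
            # wait_for() cancels the call if it exceeds the timeout.
            responses = await asyncio.wait_for(
                generator.rewrite_to_wikipedia_style(document, model=model), timeout
            )
            return responses[0]  # one response unless n > 1 was requested
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(0.01 * attempt)  # brief linear backoff


stub = FlakyStubGenerator()
text = asyncio.run(rewrite_with_retry(stub, "cats are small felines", "my-model"))
print(text, stub.calls)
```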