synthetic.async_nemotron_cc
#
Module Contents#
Classes#
Provides a collection of methods for generating synthetic data described in the Nemotron-CC paper (https://arxiv.org/abs/2412.02595). |
API#
- class synthetic.async_nemotron_cc.AsyncNemotronCCGenerator(
- llm_client: nemo_curator.services.AsyncLLMClient,
Provides a collection of methods for generating synthetic data described in the Nemotron-CC paper (https://arxiv.org/abs/2412.02595).
Initialization
Initialize the AsyncNemotronCCGenerator instance.
Args: llm_client (LLMClient): The language model client used for querying the model.
- async distill(
- document: str,
- model: str,
- prompt_template: str = DISTILL_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_DISTILL_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
Distills the essential content from a document.
Args: document (str): The input document text to distill. model (str): The model identifier to use. prompt_template (str, optional): The prompt template for distillation. Defaults to DISTILL_PROMPT_TEMPLATE. system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_DISTILL_SYSTEM_PROMPT. prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to {}. model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to {}.
Returns: List[str]: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs.
- async extract_knowledge(
- document: str,
- model: str,
- prompt_template: str = EXTRACT_KNOWLEDGE_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
Extracts knowledge from the provided document.
Args: document (str): The input document text from which to extract knowledge. model (str): The model identifier to use. prompt_template (str, optional): The prompt template for knowledge extraction. Defaults to EXTRACT_KNOWLEDGE_PROMPT_TEMPLATE. system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_SYSTEM_PROMPT. prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to {}. model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to {}.
Returns: List[str]: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs.
- async generate_diverse_qa(
- document: str,
- model: str,
- prompt_template: str = DIVERSE_QA_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
Generates diverse QA pairs from the provided document.
Args: document (str): The input document text used to generate QA pairs. model (str): The model identifier to use. prompt_template (str, optional): The prompt template for generating QA pairs. Defaults to DIVERSE_QA_PROMPT_TEMPLATE. system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_SYSTEM_PROMPT. prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to {}. model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to {}.
Returns: List[str]: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs.
- async generate_knowledge_list(
- document: str,
- model: str,
- prompt_template: str = KNOWLEDGE_LIST_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
Generates a list of knowledge items from the provided document.
Args: document (str): The input document text to process. model (str): The model identifier to use. prompt_template (str, optional): The prompt template for generating a knowledge list. Defaults to KNOWLEDGE_LIST_PROMPT_TEMPLATE. system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_SYSTEM_PROMPT. prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to {}. model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to {}.
Returns: List[str]: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs.
- async rewrite_to_wikipedia_style(
- document: str,
- model: str,
- prompt_template: str = WIKIPEDIA_REPHRASING_PROMPT_TEMPLATE,
- system_prompt: str = NEMOTRON_CC_SYSTEM_PROMPT,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
Rewrites a document into a Wikipedia-style narrative.
Args: document (str): The input document text to rewrite. model (str): The model identifier to use. prompt_template (str, optional): The prompt template for rewriting. Defaults to WIKIPEDIA_REPHRASING_PROMPT_TEMPLATE. system_prompt (str, optional): The system prompt to use. Defaults to NEMOTRON_CC_SYSTEM_PROMPT. prompt_kwargs (dict, optional): Additional keyword arguments for the prompt. Defaults to {}. model_kwargs (dict, optional): Additional keyword arguments for the model invocation. Defaults to {}.
Returns: List[str]: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs.