synthetic.nemotron#
Module Contents#
Classes#
| Represents a way of formatting a conversation with an LLM such that it can response appropriately | |
| Provides a collection of methods for generating synthetic data described in the Nemotron-4 340B Technical Report (https://arxiv.org/abs/2406.11704v1) and inspired by the UltraChat paper (https://arxiv.org/abs/2305.14233) | 
API#
- class synthetic.nemotron.NemotronFormatter#
- Bases: - nemo_curator.services.conversation_formatter.ConversationFormatter- Represents a way of formatting a conversation with an LLM such that it can response appropriately - PROMPT_PREFIX = <Multiline-String>#
 - static format_conversation(conv: list[dict]) str#
- Formats a converstation between a user and assistant in the Nemotron 340B format described here: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/nemotron-4-340b-instruct Args: conv: A conversation between a user and assistant Returns: A conversation formatted as text 
 
- class synthetic.nemotron.NemotronGenerator(
- llm_client: nemo_curator.services.model_client.LLMClient,
- Provides a collection of methods for generating synthetic data described in the Nemotron-4 340B Technical Report (https://arxiv.org/abs/2406.11704v1) and inspired by the UltraChat paper (https://arxiv.org/abs/2305.14233) - Initialization - classify_math_entity(
- entity: str,
- model: str,
- prompt_template: str = DEFAULT_MATH_CLASSIFICATION_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to classify if an entity is related to math Args: entity: The entity to classify model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - entity: Will be populated with the entity passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - classify_python_entity(
- entity: str,
- model: str,
- prompt_template: str = DEFAULT_PYTHON_CLASSIFICATION_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to classify if an entity is related to Python Args: entity: The entity to classify model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - entity: Will be populated with the entity passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - convert_response_to_yaml_list(
- llm_response: str,
- model: str,
- prompt_template: str = DEFAULT_YAML_CONVERSION_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Converts a response of an LLM to a list of strings by querying an LLM Args: llm_response: The original unformatted response of the LLM model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have a {llm_response} parameter that will be populated with the llm_response value passed in this function. prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A parsed list of elements from the original LLM response 
 - generate_closed_qa_instructions(
- document: str,
- n_openlines: str | int,
- model: str,
- prompt_template: str = DEFAULT_CLOSED_QA_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of closed Q&A questions based on a reference document Args: document: The document to use when generating questions n_openlines: The number of questions to generate per document. model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - document: Will be populated with the document passed in this function - n_openlines: Will be populated with the n_openlines passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_dialogue(
- openline: str,
- user_model: str,
- assistant_model: str,
- n_user_turns: int = 3,
- prompt_template: str = DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- user_model_kwargs: dict | None = None,
- assistant_model_kwargs: dict | None = None,
- Prompts an LLM to generate a dialogue based on a given openline. The LLM will alternate impersonating the user and the assistant. Args: openline: The openline that will comprise the first user turn. user_model: The model that will be impersonating the user. Must be available in the LLMClient passed in the constructor. assistant_model: The model that will be impersonating the assistant Must be available in the LLMClient passed in the constructor. n_user_turns: The number of user turns to go through. The openline counts as 1 user turn. Therefore, if there are 3 user turns, 2 will be generated by the LLM impersonating the user. prompt_template: A format string of the prompt to use when impersonating the user. It must have the following parameters: - converstation_history: Will be populated with a formatted history of the dialogue up to that point. Some example templates found in nemo_curator.synthetic include: - DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE - DIALOGUE_COMPLEX_USER_TURN_PROMPT_TEMPLATE - DIALOGUE_CONCISE_USER_TURN_PROMPT_TEMPLATE prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. user_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the user. assistant_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the assistant. Returns: A conversation between a User and Assistant 
 - generate_macro_topics(
- n_macro_topics: int | str,
- model: str,
- prompt_template: str = DEFAULT_MACRO_TOPICS_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of macro topics about the world Args: n_macro_topics: The number of macro topics to generate. model: The name of the model that should be used to generate the macro topics. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_macro_topics: Will be populated with the n_macro_topics passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_math_macro_topics(
- n_macro_topics: int | str,
- school_level: str,
- model: str,
- prompt_template: str = DEFAULT_MATH_MACRO_TOPICS_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of macro topics about math Args: n_macro_topics: The number of macro topics to generate. Can be an integer like 5 or a string like “five”. school_level: The school level the math questions should be targeted at. model: The name of the model that should be used to generate the macro topics. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_macro_topics: Will be populated with the n_macro_topics passed in this function - school_level: Will be populated with the school_level passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_math_problem(
- topic: str,
- n_openlines: str | int,
- model: str,
- prompt_template: str = MATH_PROBLEM_GENERAL_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of math problems based on a topic Args: topic: The topic to generate problems for. n_openlines: The number of problems to generate per topic. model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_openlines: Will be populated with the n_subtopics passed in this function - topic: Will be populated with the topic passed in this function Some example templates found in nemo_curator.synthetic include: - MATH_PROBLEM_GENERAL_PROMPT_TEMPLATE - MATH_PROBLEM_BEGINNER_PROMPT_TEMPLATE prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_math_subtopics(
- macro_topic: str,
- n_subtopics: int | str,
- model: str,
- prompt_template: str = DEFAULT_MATH_SUBTOPICS_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of subtopics relating to a math macro topic Args: macro_topic: The macro topic to generate subtopics for. n_subtopics: The number of subtopics to generate per macro topic model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_subtopics: Will be populated with the n_subtopics passed in this function - macro_topic: Will be populated with the macro_topic passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_open_qa_from_topic(
- topic: str,
- n_openlines: str | int,
- model: str,
- prompt_template: str = DEFAULT_OPEN_QA_FROM_TOPICS_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of open Q&A questions based on a topic Args: topic: The topic to generate questions for. n_openlines: The number of questions to generate per topic. model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_openlines: Will be populated with the n_subtopics passed in this function - topic: Will be populated with the topic passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_python_macro_topics(
- n_macro_topics: int | str,
- model: str,
- prompt_template: str = DEFAULT_PYTHON_MACRO_TOPICS_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of macro topics about the Python programming language Args: n_macro_topics: The number of macro topics to generate. Can be an integer like 5 or a string like “five”. model: The name of the model that should be used to generate the macro topics. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_macro_topics: Will be populated with the n_macro_topics passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_python_problem(
- topic: str,
- n_openlines: str | int,
- model: str,
- language: str = 'Python',
- prompt_template: str = PYTHON_PROBLEM_BEGINNER_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of coding problems based on a topic Args: topic: The topic to generate problems for. n_openlines: The number of problems to generate per topic. model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. language: The programming language to target when generating these questions. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_openlines: Will be populated with the n_subtopics passed in this function - topic: Will be populated with the topic passed in this function - language: Will be populated with the language passed in this function Some example templates found in nemo_curator.synthetic include: - PYTHON_PROBLEM_BEGINNER_PROMPT_TEMPLATE - PYTHON_PROBLEM_INTERMEDIATE_PROMPT_TEMPLATE - PYTHON_PROBLEM_ADVANCED_PROMPT_TEMPLATE prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_python_subtopics(
- macro_topic: str,
- n_subtopics: int | str,
- model: str,
- prompt_template: str = DEFAULT_PYTHON_SUBTOPICS_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of subtopics relating to a Python macro topic Args: macro_topic: The macro topic to generate subtopics for. n_subtopics: The number of subtopics to generate per macro topic model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_subtopics: Will be populated with the n_subtopics passed in this function - macro_topic: Will be populated with the macro_topic passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_subtopics(
- macro_topic: str,
- n_subtopics: int | str,
- model: str,
- prompt_template: str = DEFAULT_SUBTOPICS_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of subtopics relating to a macro topic Args: macro_topic: The macro topic to generate subtopics for. n_subtopics: The number of subtopics to generate per macro topic model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - n_subtopics: Will be populated with the n_subtopics passed in this function - macro_topic: Will be populated with the macro_topic passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - generate_two_turn_prompt(
- openline: str,
- user_model: str,
- assistant_model: str,
- prompt_template: str = DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- user_model_kwargs: dict | None = None,
- assistant_model_kwargs: dict | None = None,
- Prompts an LLM to generate a response as an assistant, then as the user based on a given openline. The conversation will look like “User -> Assistant -> User” Args: openline: The openline that will comprise the first user turn. user_model: The model that will be impersonating the user. Must be available in the LLMClient passed in the constructor. assistant_model: The model that will be impersonating the assistant Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use when impersonating the user. It must have the following parameters: - converstation_history: Will be populated with a formatted history of the dialogue up to that point. Some example templates found in nemo_curator.synthetic include: - DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE - DIALOGUE_COMPLEX_USER_TURN_PROMPT_TEMPLATE - DIALOGUE_CONCISE_USER_TURN_PROMPT_TEMPLATE prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. user_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the user. assistant_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the assistant. Returns: A conversation between a User and Assistant 
 - generate_writing_tasks(
- topic: str,
- text_material_type: str,
- n_openlines: str | int,
- model: str,
- prompt_template: str = DEFAULT_WRITING_TASK_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to generate a list of writing tasks based on a topic and document type Args: topic: The topic to generate writing tasks for. text_material_type: The type of the document the question should ask to generate (e.g., “Email”, “Poem”) n_openlines: The number of tasks to generate per topic and text material pair. model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - topic: Will be populated with the topic passed in this function - text_material_type: Will be populated with the text_material_type passed in this function - n_openlines: Will be populated with the n_openlines passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - revise_open_qa(
- openline: str,
- n_revisions: str | int,
- model: str,
- prompt_template: str = DEFAULT_REVISE_OPEN_QA_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to revise an open Q&A question a given number of times Args: openline: An openline to revise n_revisions: The number of revisions to generate for the question. model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - openline: Will be populated with the openline passed in this function - n_revisions: Will be populated with the n_revisions passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - revise_writing_tasks(
- openline: str,
- n_revisions: str | int,
- model: str,
- prompt_template: str = DEFAULT_REVISE_WRITING_TASK_PROMPT_TEMPLATE,
- prompt_kwargs: dict | None = None,
- model_kwargs: dict | None = None,
- Prompts an LLM to revise a writing task a given number of times Args: openline: An openline to revise n_revisions: The number of revisions to generate for the task. model: The name of the model that should be used to generate the response. Must be available in the LLMClient passed in the constructor. prompt_template: A format string of the prompt to use. It must have the following parameters: - openline: Will be populated with the openline passed in this function - n_revisions: Will be populated with the n_revisions passed in this function prompt_kwargs: Any additional keyword arguments that should be passed to the prompt template. None are needed for the default template. model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call. Returns: A list of responses from the LLM. The list is only greater than length 1 if n > 1 is set in model_kwargs. 
 - run_closed_qa_pipeline(
- documents: list[str],
- n_openlines: str | int,
- model: str,
- closed_qa_prompt_template: str = DEFAULT_CLOSED_QA_PROMPT_TEMPLATE,
- yaml_conversion_prompt_template: str = DEFAULT_YAML_CONVERSION_PROMPT_TEMPLATE,
- base_model_kwargs: dict | None = None,
- conversion_model_kwargs: dict | None = None,
- ignore_conversion_failure: bool = False,
- Runs a pipeline for automatically generating closed Q&A openlines for a dialogue Args: documents: A list of documents to generate closed Q&A questions for n_openlines: The number of questions to generate per document. model: The name of the model that should be used to generate all the responses. Must be available in the LLMClient passed in the constructor. closed_qa_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_openlines: Will be populated with the n_openlines passed in this function - document: Will be populated with one element of the documents list passed in this function No additional parameters may be passed to this prompt template. yaml_conversion_prompt_template: A format string of the prompt to use. It must have the following parameters: - llm_response: Will be populated with the raw LLM response from each stage of the pipeline No additional parameters may be passed to this prompt template. base_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the normal stages of the pipeline. conversion_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the yaml conversion stages of the pipeline. ignore_conversion_failure: Ignores yaml conversion failures when able and discards the data that conversion was attempted on Returns: A list of pairs where the first element represents the index of the document used to generate the question in the documents list and the second element represents a synthetically generated closed Q&A prompt. Example: [(0, “Summarize this document”), …] 
 - run_math_pipeline(
- n_macro_topics: str | int,
- school_level: str,
- n_subtopics: str | int,
- n_openlines: str | int,
- model: str,
- macro_topic_prompt_template: str = DEFAULT_MATH_MACRO_TOPICS_PROMPT_TEMPLATE,
- subtopic_prompt_template: str = DEFAULT_MATH_SUBTOPICS_PROMPT_TEMPLATE,
- math_problem_prompt_template: str = MATH_PROBLEM_GENERAL_PROMPT_TEMPLATE,
- yaml_conversion_prompt_template: str = DEFAULT_YAML_CONVERSION_PROMPT_TEMPLATE,
- base_model_kwargs: dict | None = None,
- conversion_model_kwargs: dict | None = None,
- additional_macro_topics: list[str] | None = None,
- additional_subtopics: list[str] | None = None,
- ignore_conversion_failure: bool = False,
- combine_topics: bool = True,
- Runs a pipeline for automatically generating math questions for a dialogue Args: n_macro_topics: The number of macro topics to generate. school_level: The school level to target when generating macro topics. n_subtopics: The number of subtopics to generate per macro topic. n_openlines: The number of questions to generate per topic. model: The name of the model that should be used to generate all the responses. Must be available in the LLMClient passed in the constructor. macro_topic_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_macro_topics: Will be populated with the n_macro_topics passed in this function - school_level: Will be populated with the school_level passed in this function No additional parameters may be passed to this prompt template. subtopic_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_subtopics: Will be populated with the n_subtopics passed in this function - macro_topic: Will be populated with a generated macro topic No additional parameters may be passed to this prompt template. math_problem_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_openlines: Will be populated with the n_openlines passed in this function - topic: Will be populated with a generated topic No additional parameters may be passed to this prompt template. Some example templates found in nemo_curator.synthetic include: - MATH_PROBLEM_GENERAL_PROMPT_TEMPLATE - MATH_PROBLEM_BEGINNER_PROMPT_TEMPLATE yaml_conversion_prompt_template: A format string of the prompt to use. It must have the following parameters: - llm_response: Will be populated with the raw LLM response from each stage of the pipeline No additional parameters may be passed to this prompt template. base_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the normal stages of the pipeline. conversion_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the yaml conversion stages of the pipeline. ignore_conversion_failure: Ignores yaml conversion failures when able and discards the data that conversion was attempted on combine_topics: If True, mixes the macro topics with the subtopics when generating openlines. If False, only the subtopics are used. Returns: A list of synthetically generated math prompts 
 - run_open_qa_pipeline(
- n_macro_topics: str | int,
- n_subtopics: str | int,
- n_openlines: str | int,
- n_revisions: str | int,
- model: str,
- macro_topic_prompt_template: str = DEFAULT_MACRO_TOPICS_PROMPT_TEMPLATE,
- subtopic_prompt_template: str = DEFAULT_SUBTOPICS_PROMPT_TEMPLATE,
- open_qa_from_topics_prompt_template: str = DEFAULT_OPEN_QA_FROM_TOPICS_PROMPT_TEMPLATE,
- revise_open_qa_prompt_template: str = DEFAULT_REVISE_OPEN_QA_PROMPT_TEMPLATE,
- yaml_conversion_prompt_template: str = DEFAULT_YAML_CONVERSION_PROMPT_TEMPLATE,
- base_model_kwargs: dict | None = None,
- conversion_model_kwargs: dict | None = None,
- additional_macro_topics: list[str] | None = None,
- additional_subtopics: list[str] | None = None,
- ignore_conversion_failure: bool = False,
- combine_topics: bool = True,
- Runs a pipeline for automatically generating Open Q&A openlines for a dialogue Args: n_macro_topics: The number of macro topics to generate n_subtopics: The number of subtopics to generate per macro topic n_openlines: The number of questions to generate per topic. n_revisions: The number of revisions to generate per original question. model: The name of the model that should be used to generate all the responses. Must be available in the LLMClient passed in the constructor. macro_topic_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_macro_topics: Will be populated with the n_macro_topics passed in this function No additional parameters may be passed to this prompt template. subtopic_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_subtopics: Will be populated with the n_subtopics passed in this function - macro_topic: Will be populated with a generated macro topic No additional parameters may be passed to this prompt template. open_qa_from_topics_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_openlines: Will be populated with the n_openlines passed in this function - topic: Will be populated with a generated topic No additional parameters may be passed to this prompt template. revise_open_qa_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_revisions: Will be populated with the n_revisions passed in this function - openline: Will be populated with a generated open Q&A openline No additional parameters may be passed to this prompt template. yaml_conversion_prompt_template: A format string of the prompt to use. It must have the following parameters: - llm_response: Will be populated with the raw LLM response from each stage of the pipeline No additional parameters may be passed to this prompt template. base_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the normal stages of the pipeline. conversion_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the yaml conversion stages of the pipeline. ignore_conversion_failure: Ignores yaml conversion failures when able and discards the data that conversion was attempted on combine_topics: If True, mixes the macro topics with the subtopics when generating openlines. If False, only the subtopics are used. Returns: A list of synthetically generated open Q&A prompts 
 - run_python_pipeline(
- n_macro_topics: str | int,
- n_subtopics: str | int,
- n_openlines: str | int,
- model: str,
- macro_topic_prompt_template: str = DEFAULT_PYTHON_MACRO_TOPICS_PROMPT_TEMPLATE,
- subtopic_prompt_template: str = DEFAULT_PYTHON_SUBTOPICS_PROMPT_TEMPLATE,
- python_problem_prompt_template: str = PYTHON_PROBLEM_BEGINNER_PROMPT_TEMPLATE,
- yaml_conversion_prompt_template: str = DEFAULT_YAML_CONVERSION_PROMPT_TEMPLATE,
- base_model_kwargs: dict | None = None,
- conversion_model_kwargs: dict | None = None,
- additional_macro_topics: list[str] | None = None,
- additional_subtopics: list[str] | None = None,
- ignore_conversion_failure: bool = False,
- combine_topics: bool = True,
- Runs a pipeline for automatically generating Python questions for a dialogue Args: n_macro_topics: The number of macro topics to generate. n_subtopics: The number of subtopics to generate per macro topic. n_openlines: The number of questions to generate per topic. model: The name of the model that should be used to generate all the responses. Must be available in the LLMClient passed in the constructor. macro_topic_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_macro_topics: Will be populated with the n_macro_topics passed in this function No additional parameters may be passed to this prompt template. subtopic_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_subtopics: Will be populated with the n_subtopics passed in this function - macro_topic: Will be populated with a generated macro topic No additional parameters may be passed to this prompt template. python_problem_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_openlines: Will be populated with the n_openlines passed in this function - language: Will be populated with “Python” - topic: Will be populated with a generated topic No additional parameters may be passed to this prompt template. Some example templates found in nemo_curator.synthetic include: - PYTHON_PROBLEM_BEGINNER_PROMPT_TEMPLATE - PYTHON_PROBLEM_INTERMEDIATE_PROMPT_TEMPLATE - PYTHON_PROBLEM_ADVANCED_PROMPT_TEMPLATE yaml_conversion_prompt_template: A format string of the prompt to use. It must have the following parameters: - llm_response: Will be populated with the raw LLM response from each stage of the pipeline No additional parameters may be passed to this prompt template. base_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the normal stages of the pipeline. conversion_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the yaml conversion stages of the pipeline. ignore_conversion_failure: Ignores yaml conversion failures when able and discards the data that conversion was attempted on combine_topics: If True, mixes the macro topics with the subtopics when generating openlines. If False, only the subtopics are used. Returns: A list of synthetically generated Python prompts 
 - run_writing_pipeline(
- topics: list[str],
- text_material_types: list[str],
- n_openlines: str | int,
- n_revisions: str | int,
- model: str,
- writing_task_prompt_template: str = DEFAULT_WRITING_TASK_PROMPT_TEMPLATE,
- revise_writing_task_prompt_template: str = DEFAULT_REVISE_WRITING_TASK_PROMPT_TEMPLATE,
- yaml_conversion_prompt_template: str = DEFAULT_YAML_CONVERSION_PROMPT_TEMPLATE,
- base_model_kwargs: dict | None = None,
- conversion_model_kwargs: dict | None = None,
- ignore_conversion_failure: bool = False,
- Runs a pipeline for automatically generating writing task openlines for a dialogue Args: topics: A list of topics to generate tasks for text_material_types: A list of writing material types, like “Essay” or “Blog post” n_openlines: The number of tasks to generate per (topic, text_material_type) pair. n_revisions: The number of revisions to generate per original task. model: The name of the model that should be used to generate all the responses. Must be available in the LLMClient passed in the constructor. writing_task_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_openlines: Will be populated with the n_openlines passed in this function - topic: Will be populated with one element of the topics list passed in this function - text_material_type: Will be populated with one element of the text_material_types list passed in this function No additional parameters may be passed to this prompt template. revise_writing_task_prompt_template: A format string of the prompt to use. It must have the following parameters: - n_revisions: Will be populated with the n_revisions passed in this function - openline: Will be populated with one of the writing tasks generated in the pipeline. No additional parameters may be passed to this prompt template. yaml_conversion_prompt_template: A format string of the prompt to use. It must have the following parameters: - llm_response: Will be populated with the raw LLM response from each stage of the pipeline No additional parameters may be passed to this prompt template. base_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the normal stages of the pipeline. conversion_model_kwargs: Any additional keyword arguments that should be passed to the LLMClient.query_model call for the yaml conversion stages of the pipeline. ignore_conversion_failure: Ignores yaml conversion failures when able and discards the data that conversion was attempted on Returns: A list of synthetically generated writing task prompts