nemo_curator.models.client.llm_client


Module Contents

Classes

Name                   Description
AsyncLLMClient         Interface representing a client connecting to an LLM inference server
ConversationFormatter  Represents a way of formatting a conversation with an LLM
GenerationConfig       Configuration class for LLM generation parameters.
LLMClient              Interface representing a client connecting to an LLM inference server

API

class nemo_curator.models.client.llm_client.AsyncLLMClient(
max_concurrent_requests: int = 5,
max_retries: int = 3,
base_delay: float = 1.0
)
Abstract

Interface representing a client connecting to an LLM inference server and making requests asynchronously

nemo_curator.models.client.llm_client.AsyncLLMClient._query_model_impl(
messages: collections.abc.Iterable,
model: str,
conversation_formatter: nemo_curator.models.client.llm_client.ConversationFormatter | None = None,
generation_config: nemo_curator.models.client.llm_client.GenerationConfig | dict | None = None
) -> list[str]
async abstract

Internal implementation of query_model without retry/concurrency logic. Subclasses should implement this method instead of query_model.

nemo_curator.models.client.llm_client.AsyncLLMClient.query_model(
messages: collections.abc.Iterable,
model: str,
conversation_formatter: nemo_curator.models.client.llm_client.ConversationFormatter | None = None,
generation_config: nemo_curator.models.client.llm_client.GenerationConfig | dict | None = None
) -> list[str]
async

Query the model with automatic retry and concurrency control.

nemo_curator.models.client.llm_client.AsyncLLMClient.setup() -> None
abstract

Set up the client.
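The division of labor above (subclasses override `_query_model_impl`, callers use `query_model`) can be sketched as follows. This is a minimal, self-contained illustration, not the real NeMo Curator source: the base class here is a local stand-in with the documented constructor parameters, and the exponential-backoff retry and semaphore-based concurrency control are assumptions about what "automatic retry and concurrency control" might look like.

```python
import asyncio
from abc import ABC, abstractmethod


# Local stand-in for the documented AsyncLLMClient interface; the real class
# is nemo_curator.models.client.llm_client.AsyncLLMClient.
class AsyncLLMClient(ABC):
    def __init__(
        self,
        max_concurrent_requests: int = 5,
        max_retries: int = 3,
        base_delay: float = 1.0,
    ):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self._semaphore = asyncio.Semaphore(max_concurrent_requests)

    @abstractmethod
    async def _query_model_impl(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        """Subclasses implement the raw request here, with no retry logic."""

    async def query_model(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        # Limit in-flight requests, and retry with exponential backoff.
        async with self._semaphore:
            for attempt in range(self.max_retries):
                try:
                    return await self._query_model_impl(
                        messages, model, conversation_formatter, generation_config
                    )
                except Exception:
                    if attempt == self.max_retries - 1:
                        raise
                    await asyncio.sleep(self.base_delay * 2**attempt)


class EchoClient(AsyncLLMClient):
    """Toy subclass: echoes the last user message instead of calling a server."""

    async def _query_model_impl(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        return [messages[-1]["content"]]

    def setup(self) -> None:
        pass


async def main() -> list[str]:
    client = EchoClient()
    return await client.query_model(
        [{"role": "user", "content": "hello"}], model="toy"
    )


print(asyncio.run(main()))  # ['hello']
```

Because the retry and concurrency logic lives in `query_model`, each subclass only has to express a single raw request in `_query_model_impl`.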

class nemo_curator.models.client.llm_client.ConversationFormatter()
Abstract

Represents a way of formatting a conversation with an LLM such that the model can respond appropriately

nemo_curator.models.client.llm_client.ConversationFormatter.format_conversation(
conv: list[dict]
) -> str
abstract
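A concrete formatter only needs to implement `format_conversation`, which turns a list of role/content message dicts into a single prompt string. The sketch below uses a local stand-in base class and an illustrative one-line-per-turn format; real formatters would emit whatever template a given model family expects.

```python
from abc import ABC, abstractmethod


# Local stand-in for the documented ConversationFormatter interface.
class ConversationFormatter(ABC):
    @abstractmethod
    def format_conversation(self, conv: list[dict]) -> str:
        ...


class SimpleChatFormatter(ConversationFormatter):
    """Illustrative formatter: one 'role: content' line per conversation turn."""

    def format_conversation(self, conv: list[dict]) -> str:
        return "\n".join(f"{turn['role']}: {turn['content']}" for turn in conv)


formatter = SimpleChatFormatter()
prompt = formatter.format_conversation(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
```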
class nemo_curator.models.client.llm_client.GenerationConfig(
max_tokens: int | None = 2048,
n: int | None = 1,
seed: int | None = 0,
stop: str | None | list[str] = None,
stream: bool = False,
temperature: float | None = 0.0,
top_k: int | None = None,
top_p: float | None = 0.95,
extra_kwargs: dict | None = None
)
Dataclass

Configuration class for LLM generation parameters.

extra_kwargs: dict | None = None
max_tokens: int | None = 2048
n: int | None = 1
seed: int | None = 0
stop: str | None | list[str] = None
stream: bool = False
temperature: float | None = 0.0
top_k: int | None = None
top_p: float | None = 0.95
class nemo_curator.models.client.llm_client.LLMClient()
Abstract

Interface representing a client connecting to an LLM inference server and making requests synchronously

nemo_curator.models.client.llm_client.LLMClient.query_model(
messages: collections.abc.Iterable,
model: str,
conversation_formatter: nemo_curator.models.client.llm_client.ConversationFormatter | None = None,
generation_config: nemo_curator.models.client.llm_client.GenerationConfig | dict | None = None
) -> list[str]
abstract
nemo_curator.models.client.llm_client.LLMClient.setup() -> None
abstract

Set up the client.
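Unlike the async variant, the synchronous interface leaves `query_model` itself abstract, so a subclass implements both `setup` and `query_model` directly. A toy sketch against a local stand-in base class (the canned reply and class name are invented for illustration; they are not part of NeMo Curator):

```python
from abc import ABC, abstractmethod


# Local stand-in for the documented synchronous LLMClient interface.
class LLMClient(ABC):
    @abstractmethod
    def setup(self) -> None:
        ...

    @abstractmethod
    def query_model(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        ...


class CannedClient(LLMClient):
    """Toy subclass returning a fixed reply; handy for testing pipelines offline."""

    def setup(self) -> None:
        self._reply = "ok"

    def query_model(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        return [self._reply]


client = CannedClient()
client.setup()
print(client.query_model([{"role": "user", "content": "ping"}], model="toy"))  # ['ok']
```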