nemo_curator.models.client.llm_client


Module Contents

Classes

Name                   Description
AsyncLLMClient         Interface representing a client connecting to an LLM inference server
ConversationFormatter  Represents a way of formatting a conversation with an LLM
GenerationConfig       Configuration class for LLM generation parameters.
LLMClient              Interface representing a client connecting to an LLM inference server

API

class nemo_curator.models.client.llm_client.AsyncLLMClient(
max_concurrent_requests: int = 5,
max_retries: int = 3,
base_delay: float = 1.0
)
Abstract

Interface representing a client connecting to an LLM inference server and making requests asynchronously

nemo_curator.models.client.llm_client.AsyncLLMClient._query_model_impl(
messages: collections.abc.Iterable,
model: str,
conversation_formatter: nemo_curator.models.client.llm_client.ConversationFormatter | None = None,
generation_config: nemo_curator.models.client.llm_client.GenerationConfig | dict | None = None
) -> list[str]
async abstract

Internal implementation of query_model without retry/concurrency logic. Subclasses should implement this method instead of query_model.

nemo_curator.models.client.llm_client.AsyncLLMClient.query_model(
messages: collections.abc.Iterable,
model: str,
conversation_formatter: nemo_curator.models.client.llm_client.ConversationFormatter | None = None,
generation_config: nemo_curator.models.client.llm_client.GenerationConfig | dict | None = None
) -> list[str]
async

Query the model with automatic retry and concurrency control.

nemo_curator.models.client.llm_client.AsyncLLMClient.setup() -> None
abstract

Set up the client.
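The division of labor above (subclasses override `_query_model_impl`, callers use `query_model`) can be sketched as follows. This is a minimal, self-contained illustration, not the real NeMo Curator source: the base class here is a local stand-in with the documented constructor parameters, and the exponential-backoff retry and semaphore-based concurrency control are assumptions about what "automatic retry and concurrency control" might look like.

```python
import asyncio
from abc import ABC, abstractmethod


# Local stand-in for the documented AsyncLLMClient interface; the real class
# is nemo_curator.models.client.llm_client.AsyncLLMClient.
class AsyncLLMClient(ABC):
    def __init__(
        self,
        max_concurrent_requests: int = 5,
        max_retries: int = 3,
        base_delay: float = 1.0,
    ):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self._semaphore = asyncio.Semaphore(max_concurrent_requests)

    @abstractmethod
    async def _query_model_impl(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        """Subclasses implement the raw request here, with no retry logic."""

    async def query_model(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        # Limit in-flight requests, and retry with exponential backoff.
        async with self._semaphore:
            for attempt in range(self.max_retries):
                try:
                    return await self._query_model_impl(
                        messages, model, conversation_formatter, generation_config
                    )
                except Exception:
                    if attempt == self.max_retries - 1:
                        raise
                    await asyncio.sleep(self.base_delay * 2**attempt)


class EchoClient(AsyncLLMClient):
    """Toy subclass: echoes the last user message instead of calling a server."""

    async def _query_model_impl(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        return [messages[-1]["content"]]

    def setup(self) -> None:
        pass


async def main() -> list[str]:
    client = EchoClient()
    return await client.query_model(
        [{"role": "user", "content": "hello"}], model="toy"
    )


print(asyncio.run(main()))  # ['hello']
```

Because the retry and concurrency logic lives in `query_model`, each subclass only has to express a single raw request in `_query_model_impl`.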

class nemo_curator.models.client.llm_client.ConversationFormatter()
Abstract

Represents a way of formatting a conversation with an LLM such that the model can respond appropriately

nemo_curator.models.client.llm_client.ConversationFormatter.format_conversation(
conv: list[dict]
) -> str
abstract
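A concrete formatter only needs to implement `format_conversation`, which turns a list of role/content message dicts into a single prompt string. The sketch below uses a local stand-in base class and an illustrative one-line-per-turn format; real formatters would emit whatever template a given model family expects.

```python
from abc import ABC, abstractmethod


# Local stand-in for the documented ConversationFormatter interface.
class ConversationFormatter(ABC):
    @abstractmethod
    def format_conversation(self, conv: list[dict]) -> str:
        ...


class SimpleChatFormatter(ConversationFormatter):
    """Illustrative formatter: one 'role: content' line per conversation turn."""

    def format_conversation(self, conv: list[dict]) -> str:
        return "\n".join(f"{turn['role']}: {turn['content']}" for turn in conv)


formatter = SimpleChatFormatter()
prompt = formatter.format_conversation(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
```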
class nemo_curator.models.client.llm_client.GenerationConfig(
max_tokens: int | None = 2048,
n: int | None = 1,
seed: int | None = 0,
stop: str | None | list[str] = None,
stream: bool = False,
temperature: float | None = 0.0,
top_k: int | None = None,
top_p: float | None = 0.95,
extra_kwargs: dict | None = None
)
Dataclass

Configuration class for LLM generation parameters.

extra_kwargs: dict | None = None
max_tokens: int | None = 2048
n: int | None = 1
seed: int | None = 0
stop: str | None | list[str] = None
stream: bool = False
temperature: float | None = 0.0
top_k: int | None = None
top_p: float | None = 0.95
class nemo_curator.models.client.llm_client.LLMClient()
Abstract

Interface representing a client connecting to an LLM inference server and making requests synchronously

nemo_curator.models.client.llm_client.LLMClient.query_model(
messages: collections.abc.Iterable,
model: str,
conversation_formatter: nemo_curator.models.client.llm_client.ConversationFormatter | None = None,
generation_config: nemo_curator.models.client.llm_client.GenerationConfig | dict | None = None
) -> list[str]
abstract
nemo_curator.models.client.llm_client.LLMClient.setup() -> None
abstract

Set up the client.
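Unlike the async variant, the synchronous interface leaves `query_model` itself abstract, so a subclass implements both `setup` and `query_model` directly. A toy sketch against a local stand-in base class (the canned reply and class name are invented for illustration; they are not part of NeMo Curator):

```python
from abc import ABC, abstractmethod


# Local stand-in for the documented synchronous LLMClient interface.
class LLMClient(ABC):
    @abstractmethod
    def setup(self) -> None:
        ...

    @abstractmethod
    def query_model(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        ...


class CannedClient(LLMClient):
    """Toy subclass returning a fixed reply; handy for testing pipelines offline."""

    def setup(self) -> None:
        self._reply = "ok"

    def query_model(
        self, messages, model, conversation_formatter=None, generation_config=None
    ) -> list[str]:
        return [self._reply]


client = CannedClient()
client.setup()
print(client.query_model([{"role": "user", "content": "ping"}], model="toy"))  # ['ok']
```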