nemo_curator.core.serve.server

Module Contents

Classes

InferenceServer: Serve one or more models behind a typed backend config.

Functions

is_inference_server_active: Check whether any inference server is currently running in this process.

Data

_active_servers

API

class nemo_curator.core.serve.server.InferenceServer(
models: list[nemo_curator.core.serve.base.BaseModelConfig],
backend: nemo_curator.core.serve.base.BaseServerConfig = RayServeServerConfig(),
name: str = 'default',
port: int = DEFAULT_SERVE_PORT,
health_check_timeout_s: int = DEFAULT_SERVE_HEALTH_TIMEOUT_S,
verbose: bool = False
)
Dataclass

Serve one or more models behind a typed backend config.

_backend_impl: InferenceBackend | None = field(init=False, default=None, repr=False)
_host: str = field(init=False, default='localhost', repr=False)
_started: bool = field(init=False, default=False, repr=False)
backend: BaseServerConfig = field(default_factory=RayServeServerConfig)
endpoint: str

OpenAI-compatible base URL for the served models.

health_check_timeout_s: int = DEFAULT_SERVE_HEALTH_TIMEOUT_S
models: list[BaseModelConfig]
name: str = 'default'
port: int = DEFAULT_SERVE_PORT
verbose: bool = False
nemo_curator.core.serve.server.InferenceServer.__enter__()
nemo_curator.core.serve.server.InferenceServer.__exit__(
exc = ()
)
nemo_curator.core.serve.server.InferenceServer.__post_init__() -> None
nemo_curator.core.serve.server.InferenceServer._create_backend() -> nemo_curator.core.serve.base.InferenceBackend
nemo_curator.core.serve.server.InferenceServer._validate_model_configs() -> None

Check that every model is accepted by the backend and that all models share one concrete type.
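A validation of this shape can be sketched as follows. This is a hypothetical standalone helper, not the real method; the `VllmModelConfig` class is an invented stand-in for a concrete `BaseModelConfig` subclass:

```python
def validate_model_configs(models: list, accepted: tuple[type, ...]) -> None:
    """Ensure every config is an accepted type and all share one concrete type."""
    if not models:
        raise ValueError("at least one model config is required")
    bad = [m for m in models if not isinstance(m, accepted)]
    if bad:
        raise TypeError(f"unsupported model config(s): {bad}")
    concrete = {type(m) for m in models}
    if len(concrete) > 1:
        raise ValueError(f"all models must share one concrete type, got {concrete}")


class VllmModelConfig:  # invented stand-in for a concrete model config class
    def __init__(self, name: str):
        self.name = name


# A homogeneous list of accepted configs passes without raising.
validate_model_configs(
    [VllmModelConfig("llama"), VllmModelConfig("qwen")],
    accepted=(VllmModelConfig,),
)
print("ok")  # → ok
```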

nemo_curator.core.serve.server.InferenceServer._wait_for_healthy() -> None

Poll /v1/models until all expected models appear in the response.
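A health check of this shape can be sketched as polling until every expected model id appears in the response. This is a hypothetical helper under assumed semantics, not the real method's internals; the fetcher is injected so the sketch stays self-contained:

```python
import time


def wait_for_healthy(fetch_model_ids, expected: set[str],
                     timeout_s: int = 600, interval_s: float = 1.0) -> None:
    """Poll until fetch_model_ids() (e.g. a GET on /v1/models) reports
    every expected model id, or raise TimeoutError at the deadline."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if expected <= set(fetch_model_ids()):
                return  # every expected model is being served
        except OSError:
            pass  # server not accepting connections yet; keep polling
        time.sleep(interval_s)
    raise TimeoutError(f"models never became healthy: {expected}")


# Simulated /v1/models responses: empty, partial, then complete.
responses = iter([[], ["llama"], ["llama", "qwen"]])
wait_for_healthy(lambda: next(responses), {"llama", "qwen"},
                 timeout_s=5, interval_s=0.0)
print("healthy")  # → healthy
```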

nemo_curator.core.serve.server.InferenceServer.start() -> None

Deploy all models and wait for them to become healthy.

nemo_curator.core.serve.server.InferenceServer.stop() -> None

Shut down the active inference backend and release resources.

nemo_curator.core.serve.server.is_inference_server_active() -> bool

Check whether any inference server is currently running in this process.

nemo_curator.core.serve.server._active_servers: set[str] = set()
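Taken together, the lifecycle (`start`, `stop`, context-manager entry and exit) and the module-level `_active_servers` registry can be illustrated with a minimal standalone stand-in. This is a hypothetical simplification mirroring the documented names, not the actual NeMo Curator implementation; the default port value is invented for the sketch:

```python
from dataclasses import dataclass, field

_active_servers: set[str] = set()  # module-level registry of running server names


@dataclass
class MiniInferenceServer:
    """Illustrative stand-in for InferenceServer (not the real implementation)."""
    name: str = "default"
    port: int = 8000  # invented stand-in for DEFAULT_SERVE_PORT
    _started: bool = field(init=False, default=False, repr=False)

    @property
    def endpoint(self) -> str:
        # OpenAI-compatible base URL for the served models
        return f"http://localhost:{self.port}/v1"

    def start(self) -> None:
        # Real code would deploy models and poll until healthy; elided here.
        self._started = True
        _active_servers.add(self.name)

    def stop(self) -> None:
        # Shut down the backend and deregister this server.
        if self._started:
            _active_servers.discard(self.name)
            self._started = False

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc):
        self.stop()


def is_inference_server_active() -> bool:
    return bool(_active_servers)


with MiniInferenceServer(name="demo", port=9000) as server:
    print(server.endpoint)               # → http://localhost:9000/v1
    print(is_inference_server_active())  # → True
print(is_inference_server_active())      # → False
```

Registering servers in a module-level set is what lets `is_inference_server_active` answer for the whole process rather than for a single instance.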