> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/curator/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/curator/llms-full.txt.

# nemo_curator.core.serve.server

## Module Contents

### Classes

| Name                                                                 | Description                                             |
| -------------------------------------------------------------------- | ------------------------------------------------------- |
| [`InferenceServer`](#nemo_curator-core-serve-server-InferenceServer) | Serve one or more models behind a typed backend config. |

### Functions

| Name                                                                                       | Description                                                              |
| ------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------ |
| [`is_inference_server_active`](#nemo_curator-core-serve-server-is_inference_server_active) | Check whether any inference server is currently running in this process. |

### Data

[`_active_servers`](#nemo_curator-core-serve-server-_active_servers)

### API

<Anchor id="nemo_curator-core-serve-server-InferenceServer">
  <CodeBlock links={{"nemo_curator.core.serve.base.BaseModelConfig":"/nemo-curator/nemo_curator/core/serve/base#nemo_curator-core-serve-base-BaseModelConfig","nemo_curator.core.serve.base.BaseServerConfig":"/nemo-curator/nemo_curator/core/serve/base#nemo_curator-core-serve-base-BaseServerConfig"}} showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.core.serve.server.InferenceServer(
        models: list[nemo_curator.core.serve.base.BaseModelConfig],
        backend: nemo_curator.core.serve.base.BaseServerConfig = RayServeServerConfig(),
        name: str = 'default',
        port: int = DEFAULT_SERVE_PORT,
        health_check_timeout_s: int = DEFAULT_SERVE_HEALTH_TIMEOUT_S,
        verbose: bool = False
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Serve one or more models behind a typed backend config.

  <ParamField path="_backend_impl" type="InferenceBackend | None = field(init=False, default=None, repr=False)" />

  <ParamField path="_host" type="str = field(init=False, default='localhost', repr=False)" />

  <ParamField path="_started" type="bool = field(init=False, default=False, repr=False)" />

  <ParamField path="backend" type="BaseServerConfig = field(default_factory=RayServeServerConfig)" />

  <ParamField path="endpoint" type="str">
    OpenAI-compatible base URL for the served models.
  </ParamField>

  <ParamField path="health_check_timeout_s" type="int = DEFAULT_SERVE_HEALTH_TIMEOUT_S" />

  <ParamField path="models" type="list[BaseModelConfig]" />

  <ParamField path="name" type="str = 'default'" />

  <ParamField path="port" type="int = DEFAULT_SERVE_PORT" />

  <ParamField path="verbose" type="bool = False" />

  <Anchor id="nemo_curator-core-serve-server-InferenceServer-__enter__">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.core.serve.server.InferenceServer.__enter__()
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-core-serve-server-InferenceServer-__exit__">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.core.serve.server.InferenceServer.__exit__(
          exc = ()
      )
      ```
    </CodeBlock>
  </Anchor>

  <Indent />
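  The dunder pair above indicates that `InferenceServer` is usable as a context manager, presumably calling `start()` on entry and `stop()` on exit. A minimal sketch of that pattern, using a toy stand-in class rather than the real implementation:

  ```python
  class ToyServer:
      """Hypothetical stand-in illustrating the start/stop context-manager pattern."""

      def __init__(self) -> None:
          self.started = False
          self.calls: list[str] = []

      def start(self) -> None:
          self.calls.append("start")
          self.started = True

      def stop(self) -> None:
          self.calls.append("stop")
          self.started = False

      def __enter__(self) -> "ToyServer":
          self.start()
          return self

      def __exit__(self, *exc) -> bool:
          self.stop()
          return False  # do not suppress exceptions raised in the body

  with ToyServer() as server:
      assert server.started
  # after the block, stop() has run, even if the body had raised
  ```

  Assuming `__exit__` delegates to `stop()`, using the real class the same way (`with InferenceServer(models=[...]) as server: ...`) would guarantee the backend is shut down even when the body raises.
  
  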

  <Anchor id="nemo_curator-core-serve-server-InferenceServer-__post_init__">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.core.serve.server.InferenceServer.__post_init__() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-core-serve-server-InferenceServer-_create_backend">
    <CodeBlock links={{"nemo_curator.core.serve.base.InferenceBackend":"/nemo-curator/nemo_curator/core/serve/base#nemo_curator-core-serve-base-InferenceBackend"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.core.serve.server.InferenceServer._create_backend() -> nemo_curator.core.serve.base.InferenceBackend
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-core-serve-server-InferenceServer-_validate_model_configs">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.core.serve.server.InferenceServer._validate_model_configs() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Check that the backend accepts every model and that all models share a single concrete config type.
  </Indent>
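  The "one concrete type" half of that check can be sketched as follows; `VllmConfig` and `TrtConfig` are hypothetical stand-ins for concrete `BaseModelConfig` subclasses, and `validate_shared_type` is an illustrative name, not the actual method:

  ```python
  from dataclasses import dataclass


  @dataclass
  class VllmConfig:  # hypothetical concrete model config type
      model: str


  @dataclass
  class TrtConfig:  # hypothetical second config type
      model: str


  def validate_shared_type(models) -> None:
      """Sketch: reject empty lists and lists mixing concrete config classes."""
      if not models:
          raise ValueError("at least one model config is required")
      first = type(models[0])
      for m in models:
          if type(m) is not first:
              raise TypeError(
                  f"mixed model config types: {first.__name__} and {type(m).__name__}"
              )


  validate_shared_type([VllmConfig("a"), VllmConfig("b")])  # ok: one concrete type
  ```

  Comparing `type(m)` rather than using `isinstance` is what makes the check *concrete*: a subclass instance mixed in with base-class instances would still be rejected.
  
  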

  <Anchor id="nemo_curator-core-serve-server-InferenceServer-_wait_for_healthy">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.core.serve.server.InferenceServer._wait_for_healthy() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Poll `/v1/models` until all expected models appear in the response.
  </Indent>
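  A hedged sketch of this polling loop, written against a generic OpenAI-compatible `/v1/models` endpoint. The actual retry cadence and error handling in `_wait_for_healthy` may differ; `wait_for_models` and its injectable `fetch` parameter are illustrative names, not part of the API:

  ```python
  import json
  import time
  import urllib.request


  def wait_for_models(base_url, expected, timeout_s=120.0, poll_s=2.0, fetch=None):
      """Sketch: poll /v1/models until every expected model id is listed,
      or raise TimeoutError. `fetch` is injectable for testing; by default
      it performs a real HTTP GET against the server."""

      def _http_fetch():
          with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
              return json.load(resp)

      fetch = fetch or _http_fetch
      deadline = time.monotonic() + timeout_s
      served: set = set()
      while time.monotonic() < deadline:
          try:
              # OpenAI-style response: {"data": [{"id": "..."}, ...]}
              served = {m["id"] for m in fetch().get("data", [])}
          except OSError:  # server not accepting connections yet
              served = set()
          if set(expected) <= served:
              return
          time.sleep(poll_s)
      missing = sorted(set(expected) - served)
      raise TimeoutError(f"models not healthy after {timeout_s}s: {missing}")
  ```

  Treating connection errors as "not ready yet" rather than failing fast is the key design choice: during startup the port is often not bound at all, which surfaces as `OSError` long before the model list is populated.
  
  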

  <Anchor id="nemo_curator-core-serve-server-InferenceServer-start">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.core.serve.server.InferenceServer.start() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Deploy all models and wait for them to become healthy.
  </Indent>

  <Anchor id="nemo_curator-core-serve-server-InferenceServer-stop">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.core.serve.server.InferenceServer.stop() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Shut down the active inference backend and release resources.
  </Indent>
</Indent>

<Anchor id="nemo_curator-core-serve-server-is_inference_server_active">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.core.serve.server.is_inference_server_active() -> bool
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Check whether any inference server is currently running in this process.
</Indent>

<Anchor id="nemo_curator-core-serve-server-_active_servers">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.core.serve.server._active_servers: set[str] = set()
    ```
  </CodeBlock>
</Anchor>
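  A sketch of how a module-level `_active_servers` set plausibly backs `is_inference_server_active`: servers would add their name on start and discard it on stop, so the check reduces to "is the set non-empty". The registration points shown in the comments are assumptions, not confirmed behavior:

  ```python
  # Process-local registry of running server names, mirroring the documented
  # module attribute `_active_servers: set[str] = set()`.
  _active_servers: set = set()


  def is_inference_server_active() -> bool:
      """Return True if any inference server is currently running in this process."""
      return bool(_active_servers)


  _active_servers.add("default")      # what start() would presumably do
  assert is_inference_server_active()
  _active_servers.discard("default")  # what stop() would presumably do
  assert not is_inference_server_active()
  ```

  Keying the set by server `name` would also explain why `InferenceServer` carries a `name: str = 'default'` field: it gives each server a stable identity to register and deregister.
  
  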