> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/gym/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/gym/llms-full.txt.

# Overview

Model servers are stateless LLM inference endpoints. They receive a conversation and return the model's next output (text, tool calls, or code) with no memory or orchestration logic. During training, you will typically have at least one active Model server: the "policy" model being trained.

Any OpenAI-compatible inference backend can serve as a Model server. NeMo Gym provides middleware to bridge format differences (e.g., converting between Chat Completions and Responses API schemas).
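As a rough illustration of the kind of schema bridging this middleware performs, the sketch below converts a Chat Completions `messages` list into Responses API `input` items. This is a hypothetical helper, not NeMo Gym's actual implementation; field names follow the public OpenAI schemas.

```python
def chat_messages_to_responses_input(messages):
    """Convert Chat Completions `messages` into Responses API `input` items.

    Illustrative sketch of schema bridging; not NeMo Gym's real middleware.
    """
    items = []
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            # Chat Completions nests tool calls inside the assistant message;
            # the Responses API represents each call as a top-level item.
            for call in msg["tool_calls"]:
                items.append({
                    "type": "function_call",
                    "call_id": call["id"],
                    "name": call["function"]["name"],
                    "arguments": call["function"]["arguments"],
                })
        elif msg["role"] == "tool":
            # Tool results become function_call_output items keyed by call_id.
            items.append({
                "type": "function_call_output",
                "call_id": msg["tool_call_id"],
                "output": msg["content"],
            })
        else:
            # Plain user/assistant/system turns map to message items.
            items.append({
                "type": "message",
                "role": msg["role"],
                "content": msg["content"],
            })
    return items
```

The real middleware must also handle details this sketch skips, such as multi-part content and streaming.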

Model servers implement `ResponsesAPIModel` and expose two endpoints:

* **`/v1/responses`** — [OpenAI Responses API](https://developers.openai.com/api/reference/resources/responses/methods/create)
  * This is the default input/output schema for all NeMo Gym rollouts.
* **`/v1/chat/completions`** — [OpenAI Chat Completions API](https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create)
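To make the two schemas concrete, here are minimal request bodies for each endpoint. The base URL and model name are placeholders; any OpenAI-compatible server works.

```python
import json

# Placeholder values for illustration only.
BASE_URL = "http://localhost:8000"
MODEL = "my-policy-model"

# Minimal Responses API body — the default schema for NeMo Gym rollouts.
# POSTed to f"{BASE_URL}/v1/responses".
responses_request = {
    "model": MODEL,
    "input": [{"type": "message", "role": "user", "content": "Hello!"}],
}

# Equivalent Chat Completions body, POSTed to f"{BASE_URL}/v1/chat/completions".
chat_request = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello!"}],
}

body = json.dumps(responses_request)
```

Note the structural difference: Chat Completions takes a flat `messages` list of role/content pairs, while the Responses API takes typed `input` items.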

## Backend Guides

Guides for OpenAI and Azure OpenAI Responses API models and more are coming soon!

<Cards>
  <Card title="vLLM" href="/v0.2/model-server/vllm">
    Self-hosted inference with vLLM for maximum control.

    <Badge minimal outlined>
      self-hosted
    </Badge>

    <Badge minimal outlined>
      open-source
    </Badge>
  </Card>
</Cards>

## Server Configuration

<Note>
  See [Model Server Fields](/v0.2/reference/configuration#model-server-fields) for server configuration syntax and fields.
</Note>