Overview | NeMo Gym

Model servers are stateless LLM inference endpoints. They receive a conversation and return the model’s next output (text, tool calls, or code) with no memory or orchestration logic. During training, you will typically have at least one active Model server: the “policy” model being trained.

Any OpenAI-compatible inference backend can serve as a Model server. NeMo Gym provides middleware to bridge format differences (e.g., converting between Chat Completions and Responses API schemas).

Model servers implement ResponsesAPIModel and expose two endpoints:

/v1/responses — OpenAI Responses API
- This is the default input/output schema for all NeMo Gym rollouts.
/v1/chat/completions — OpenAI Chat Completions API

Backend Guides

Guides for OpenAI and Azure OpenAI Responses API models and more are coming soon!

vLLM

Self-hosted inference with vLLM for maximum control.

self-hostedopen-source

Server Configuration

Model Server Fields for server configuration syntax and fields.

Any OpenAI-compatible inference backend can serve as a Model server. NeMo Gym provides middleware to bridge format differences (e.g., converting between Chat Completions and Responses API schemas).

Model servers implement ResponsesAPIModel and expose two endpoints:

/v1/responses — OpenAI Responses API
- This is the default input/output schema for all NeMo Gym rollouts.
/v1/chat/completions — OpenAI Chat Completions API

Backend Guides

Guides for OpenAI and Azure OpenAI Responses API models and more are coming soon!

vLLM

Self-hosted inference with vLLM for maximum control.

self-hostedopen-source

Server Configuration

Model Server Fields for server configuration syntax and fields.