Model Server#

Model servers are stateless LLM inference endpoints. They receive a conversation and return the model’s next output (text, tool calls, or code) with no memory or orchestration logic. During training, you will typically have at least one active Model server: the “policy” model being trained.
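To make the statelessness concrete, here is a minimal sketch in plain Python (with a stubbed model function, not NeMo Gym code): the caller, not the server, carries the conversation, so every request resends the full message history.

```python
# Stand-in for a stateless model server: it sees only the messages
# passed in this one call and keeps no state between calls.
def model_server(messages):
    # Echo-style stub; a real server would run LLM inference here.
    last_user = [m for m in messages if m["role"] == "user"][-1]
    return {"role": "assistant", "content": f"You said: {last_user['content']}"}

# The orchestrator owns the conversation and resends it in full each turn.
conversation = [{"role": "user", "content": "hello"}]
conversation.append(model_server(conversation))

conversation.append({"role": "user", "content": "bye"})
conversation.append(model_server(conversation))

# Two user turns and two assistant turns, all held by the caller.
assert len(conversation) == 4
```

Any memory the model appears to have comes entirely from the history the orchestrator replays on each call.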

Any OpenAI-compatible inference backend can serve as a Model server. NeMo Gym provides middleware to bridge format differences (e.g., converting between Chat Completions and Responses API schemas).
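As an illustration of the kind of bridging such middleware performs, the sketch below converts a Chat Completions-style message list into Responses-style input items and maps Responses-style output items back to a chat message. The field names follow the public OpenAI APIs; the exact schemas NeMo Gym's middleware handles may differ.

```python
def chat_messages_to_responses_input(messages):
    """Chat Completions `messages` -> Responses API `input` items (sketch)."""
    return [
        {
            "type": "message",
            "role": m["role"],
            "content": [{"type": "input_text", "text": m["content"]}],
        }
        for m in messages
    ]

def responses_output_to_chat_message(output_items):
    """Responses API `output` items -> one Chat Completions assistant message (sketch)."""
    texts = []
    for item in output_items:
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
    return {"role": "assistant", "content": "".join(texts)}
```

For example, `chat_messages_to_responses_input([{"role": "user", "content": "hi"}])` yields one `message` item whose content is a single `input_text` part, and the reverse helper collapses all `output_text` parts of a response into one assistant message.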

Model servers implement the `ResponsesAPIModel` interface and expose two endpoints.

Backend Guides#

Guides for OpenAI and Azure OpenAI Responses API models and more are coming soon!

vLLM Model Server: self-hosted inference with vLLM for maximum control.

Server Configuration#

See also

Model Server Fields for server configuration syntax and field descriptions.