Model Server
Model servers are stateless LLM inference endpoints. They receive a conversation and return the model’s next output (text, tool calls, or code) with no memory or orchestration logic. During training, you will typically have at least one active Model server: the “policy” model being trained.
Any OpenAI-compatible inference backend can serve as a Model server. NeMo Gym provides middleware to bridge format differences (e.g., converting between Chat Completions and Responses API schemas).
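As an illustration of the kind of conversion such middleware performs, here is a minimal sketch (not NeMo Gym's actual middleware) that maps a text-only Chat Completions message list to Responses API input items; a real bridge also handles tool calls, images, and other structured content parts:

```python
def chat_messages_to_responses_input(messages):
    """Map text-only Chat Completions messages to Responses API input items.

    Illustrative only: real conversion code must also cover tool calls
    and non-text content parts.
    """
    items = []
    for msg in messages:
        # Assistant turns carry "output_text" parts; other roles carry "input_text".
        part_type = "output_text" if msg["role"] == "assistant" else "input_text"
        items.append({
            "type": "message",
            "role": msg["role"],
            "content": [{"type": part_type, "text": msg["content"]}],
        })
    return items


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(chat_messages_to_responses_input(messages))
```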
Model servers implement `ResponsesAPIModel` and expose two endpoints:

- `/v1/responses` — OpenAI Responses API. This is the default input/output schema for all NeMo Gym rollouts.
- `/v1/chat/completions` — OpenAI Chat Completions API.
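A Model server can be exercised like any OpenAI-compatible HTTP endpoint. The sketch below builds (but does not send) a request to the `/v1/responses` endpoint using only the Python standard library; the base URL and model name are placeholders for your deployment:

```python
import json
import urllib.request


def make_responses_request(base_url, model, text):
    """Build a POST request to a Model server's /v1/responses endpoint.

    base_url and model are placeholders for whatever host and policy
    model your deployment uses.
    """
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/responses",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = make_responses_request("http://localhost:8000", "my-policy-model", "Hello!")
# Sending it with urllib.request.urlopen(req) would return the model's
# Response object as JSON.
print(req.full_url)  # http://localhost:8000/v1/responses
```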
Backend Guides
Guides for OpenAI and Azure OpenAI Responses API models and more are coming soon!
Server Configuration
See Model Server Fields for server configuration syntax and fields.