Model Server#
Model servers provide stateless LLM inference via OpenAI-compatible endpoints. They implement ResponsesAPIModel and expose two endpoints:
/v1/responses— OpenAI Responses APIThis is the default input/output schema for all NeMo Gym rollouts.
/v1/chat/completions— OpenAI Chat Completions API
Backend Guides#
Guides for OpenAI and Azure OpenAI Responses API models and more are coming soon!
vLLM
Self-hosted inference with vLLM for maximum control.