Local vLLM Proxy
LocalVLLMModelProxy (in responses_api_models/local_vllm_model_proxy) is a lightweight model server that forwards requests to an existing LocalVLLMModel instead of launching its own vLLM engine.
It is a subclass of VLLMModel, so it accepts the same configuration fields, but it owns no GPUs.
When to use it
Use a proxy when you need several model servers that share one vLLM deployment but differ in their request-time configuration. For example, one server with reasoning enabled and one with reasoning disabled through the request params, or servers with different sampling parameters. Without the proxy you would have to launch a separate vLLM engine (and duplicate GPUs) for each variation.
At startup the proxy waits for its referenced LocalVLLMModel to come up, reads that server’s inner vLLM endpoint (base_url, api_key, model), and routes all of its own requests there.
If you are working with an existing vLLM endpoint that you manage outside of Gym, use VLLMModel instead.
Configuration
A proxy is a normal model server config that adds a model_server reference pointing at the LocalVLLMModel it should forward to:
Run it alongside the backing LocalVLLMModel by chaining both configs in config_paths:
base_url, api_key, and model are populated automatically from the backing server and should not be set in your config.
All other VLLMModel fields (chat_template_kwargs, extra_body, return_token_id_information, and so on) behave as documented in the VLLMModel configuration reference.