
Copyright © 2026, NVIDIA Corporation.


Route Local Inference Requests to LM Studio


This tutorial describes how to configure OpenShell to route inference requests to a local LM Studio server.

LM Studio is straightforward to set up and exposes both OpenAI-compatible and Anthropic-compatible endpoints.

In this tutorial, you will:

  • Expose a local inference server to OpenShell sandboxes.
  • Verify end-to-end inference from inside a sandbox.

Prerequisites

First, complete OpenShell installation and follow the Quickstart.

Install the LM Studio app. Make sure that LM Studio runs in the same environment as your gateway.

If you prefer not to keep the LM Studio app open, install llmster (headless LM Studio) with the following command:

Linux/Mac:
$ curl -fsSL https://lmstudio.ai/install.sh | bash

Then start llmster:

$ lms daemon up
1. Start LM Studio Local Server

Start the LM Studio local server from the Developer tab, and verify the OpenAI-compatible endpoint is enabled.

LM Studio listens on 127.0.0.1:1234 by default. For use with OpenShell, configure LM Studio to listen on all interfaces (0.0.0.0).

If you use the GUI, go to the Developer tab, select Server Settings, then enable Serve on Local Network.

If you use llmster in headless mode, run lms server start --bind 0.0.0.0.
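
Before moving on, it can help to confirm that the server answers on the expected address. The sketch below assumes that LM Studio's OpenAI-compatible server answers GET /v1/models on the default port 1234. The function names lmstudio_models_url and check_lmstudio are illustrative and are not part of LM Studio or OpenShell.

```shell
# Build the URL of the OpenAI-compatible model listing endpoint.
# Usage: lmstudio_models_url [host] [port]
# Defaults are the LM Studio defaults named above (127.0.0.1:1234).
lmstudio_models_url() {
  printf 'http://%s:%s/v1/models\n' "${1:-127.0.0.1}" "${2:-1234}"
}

# Request the model list from the server.
# -f turns HTTP errors into a non-zero exit status, and --max-time keeps
# the check from hanging when nothing is listening on the port.
check_lmstudio() {
  curl -fsS --max-time 5 "$(lmstudio_models_url "$@")"
}
```

Running check_lmstudio with no arguments checks the loopback address. Passing the machine's LAN address instead (for example check_lmstudio 192.168.1.20, a placeholder) checks that the server also answers on a non-loopback interface.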

2. Test with a Small Model

In the LM Studio app, open the Model Search tab and download a small model such as Qwen3.5 2B.

From the terminal, use the following commands to download and load the model:

$ lms get qwen/qwen3.5-2b
$ lms load qwen/qwen3.5-2b
3. Add LM Studio as a Provider

Choose the provider type that matches the client protocol you want to route through inference.local: OpenAI-compatible or Anthropic-compatible.

Add LM Studio as an OpenAI-compatible provider through host.openshell.internal:

$ openshell provider create \
>   --name lmstudio \
>   --type openai \
>   --credential OPENAI_API_KEY=lmstudio \
>   --config OPENAI_BASE_URL=http://host.openshell.internal:1234/v1

Use this provider for clients that send OpenAI-compatible requests such as POST /v1/chat/completions or POST /v1/responses.
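
For Anthropic-compatible clients, the provider would presumably be created the same way. The dry-run helper below prints the command for each protocol without running it. The provider name lmstudio-anthropic and the base URL without /v1 come from the Troubleshooting section of this tutorial. The credential name ANTHROPIC_API_KEY is an assumption made by analogy with the OpenAI form, and provider_create_cmd is a hypothetical helper name.

```shell
# Print (do not run) the provider-create command for a protocol.
# Usage: provider_create_cmd openai|anthropic
provider_create_cmd() {
  case "$1" in
    openai)
      # Same flags as the command shown above; the base URL ends in /v1.
      printf '%s\n' "openshell provider create --name lmstudio --type openai --credential OPENAI_API_KEY=lmstudio --config OPENAI_BASE_URL=http://host.openshell.internal:1234/v1"
      ;;
    anthropic)
      # ASSUMPTION: the ANTHROPIC_API_KEY credential name mirrors the OpenAI
      # form. Check it against the Providers reference before running.
      printf '%s\n' "openshell provider create --name lmstudio-anthropic --type anthropic --credential ANTHROPIC_API_KEY=lmstudio --config ANTHROPIC_BASE_URL=http://host.openshell.internal:1234"
      ;;
    *)
      echo "usage: provider_create_cmd openai|anthropic" >&2
      return 2
      ;;
  esac
}
```

Printing the command first makes it easy to review the flags, and in particular the assumed credential name, before pasting the line into a terminal.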

4. Configure LM Studio as the Local Inference Provider

Set the managed inference route for the active gateway:

OpenAI-compatible:
$ openshell inference set --provider lmstudio --model qwen/qwen3.5-2b

If the command succeeds, OpenShell has verified that the upstream is reachable and accepts the expected OpenAI-compatible request shape.

The active inference.local route is gateway-scoped, so only one provider and model pair is active at a time. Re-run openshell inference set whenever you want to switch between OpenAI-compatible and Anthropic-compatible clients.

Confirm the saved config:

$ openshell inference get

You should see either Provider: lmstudio or Provider: lmstudio-anthropic, along with Model: qwen/qwen3.5-2b.
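
In a script, the same confirmation could be automated. The sketch below assumes that the output of openshell inference get contains lines of the form "Provider: <name>" and "Model: <name>", as quoted above. The exact output layout is not specified in this tutorial, and route_matches is a hypothetical helper name.

```shell
# Read `openshell inference get` output on stdin and succeed only if it
# names the expected provider and model.
# Usage: openshell inference get | route_matches PROVIDER MODEL
# ASSUMPTION: the output has "Provider: <name>" and "Model: <name>" lines.
route_matches() {
  awk -v provider="$1" -v model="$2" '
    $1 == "Provider:" && $2 == provider { got_provider = 1 }
    $1 == "Model:"    && $2 == model    { got_model = 1 }
    END { exit !(got_provider && got_model) }
  '
}
```

For example, `openshell inference get | route_matches lmstudio qwen/qwen3.5-2b && echo "route OK"` prints the message only when both values match. The comparison is exact, so lmstudio does not match lmstudio-anthropic.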

5. Verify from Inside a Sandbox

Run a simple request through https://inference.local:

OpenAI-compatible, using /v1/chat/completions:

$ openshell sandbox create -- \
>   curl https://inference.local/v1/chat/completions \
>   --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'

OpenAI-compatible, using /v1/responses:

$ openshell sandbox create -- \
>   curl https://inference.local/v1/responses \
>   --json '{
>     "instructions": "You are a helpful assistant.",
>     "input": "hello",
>     "max_output_tokens": 10
>   }'

Troubleshooting

If setup fails, check these first:

  • LM Studio local server is running and reachable from the gateway host
  • OPENAI_BASE_URL uses http://host.openshell.internal:1234/v1 when you use an openai provider
  • ANTHROPIC_BASE_URL uses http://host.openshell.internal:1234 when you use an anthropic provider
  • The gateway and LM Studio run on the same machine or are connected by a reachable network path
  • The configured model name matches the model exposed by LM Studio
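
The two base URL rules in the list above are easy to mix up: the openai form ends in /v1 and the anthropic form does not. A minimal sketch of that check follows, assuming that only the trailing path segment matters. The helper name check_base_url is hypothetical.

```shell
# Return 0 if the base URL has the shape expected for the provider type,
# 1 if it does not, and 2 for an unknown provider type.
# Usage: check_base_url openai|anthropic URL
check_base_url() {
  case "$1" in
    openai)
      # The openai provider expects a URL that ends in /v1.
      case "$2" in */v1) return 0 ;; *) return 1 ;; esac
      ;;
    anthropic)
      # The anthropic provider expects the bare host and port, without /v1.
      case "$2" in */v1|*/v1/) return 1 ;; *) return 0 ;; esac
      ;;
    *)
      return 2
      ;;
  esac
}
```

For example, `check_base_url openai http://host.openshell.internal:1234/v1` succeeds, while the same URL passed with anthropic fails.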

Useful commands:

$ openshell status
$ openshell inference get
$ openshell provider get lmstudio
$ openshell provider get lmstudio-anthropic

Next Steps

  • To learn more about using the LM Studio CLI, refer to the LM Studio docs.
  • To learn more about managed inference, refer to Inference Routing.
  • To configure a different self-hosted backend, refer to Inference Routing.