Copyright © 2026, NVIDIA Corporation.

Run Local Inference with Ollama

This tutorial covers two ways of running Ollama with OpenShell:

  1. Ollama community sandbox (recommended). A self-contained sandbox with Ollama, Claude Code, OpenCode, and Codex pre-installed. One command starts it.
  2. Host-level Ollama (alternative). Run Ollama on the gateway host and route sandbox inference to it. Use this option when you want a single Ollama instance shared across multiple sandboxes.

After completing this tutorial, you will know how to:

  • Launch the Ollama community sandbox for a batteries-included experience.
  • Use ollama launch to start coding agents inside a sandbox.
  • Expose a host-level Ollama server to sandboxes through inference.local.

Prerequisites

  • A working OpenShell installation. Complete the Quickstart before proceeding.

Option A: Ollama Community Sandbox (Recommended)

The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.

1. Create the Sandbox

$openshell sandbox create --from ollama

This pulls the community sandbox image, applies the bundled policy, and drops you into a shell with Ollama running.

2. Chat with a Model

Chat with a local model:

$ollama run qwen3.5

Or chat with a cloud model:

$ollama run kimi-k2.5:cloud

Or use ollama launch to start a coding agent with Ollama as the model backend:

$ollama launch claude
$ollama launch codex
$ollama launch opencode

For CI/CD and automated workflows, ollama launch supports a headless mode:

$ollama launch claude --yes --model qwen3.5
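In a CI script it can help to build the headless command in one place and reject unknown agent names before anything runs. The following is a minimal sketch: the helper name headless_launch_cmd is hypothetical, and it assumes the --yes and --model flags shown above for claude apply equally to codex and opencode, which this tutorial does not confirm.

```shell
# Hypothetical helper: print the headless launch command for an agent and model.
# Assumption: the "--yes --model" form shown for claude also applies to the
# other agents listed in this tutorial.
headless_launch_cmd() {
    agent="$1"
    model="$2"
    case "$agent" in
        claude|codex|opencode) ;;
        *) echo "unsupported agent: $agent" >&2; return 1 ;;
    esac
    if [ -z "$model" ]; then
        echo "a model name is required" >&2
        return 1
    fi
    printf 'ollama launch %s --yes --model %s\n' "$agent" "$model"
}

# Print the command rather than running it, so the sketch is safe to try anywhere.
headless_launch_cmd claude qwen3.5
```

Printing the command instead of executing it keeps the sketch free of side effects; a pipeline step could run the printed line once the agent and model have been validated.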

Model Recommendations

| Use case | Model | Notes |
|---|---|---|
| Smoke test | qwen3.5:0.8b | Fast, lightweight, good for verifying setup |
| Coding and reasoning | qwen3.5 | Strong tool calling support for agentic workflows |
| Complex tasks | nemotron-3-super | 122B-parameter model, needs 48 GB+ VRAM |
| No local GPU | qwen3.5:cloud | Runs on Ollama’s cloud infrastructure, no ollama pull required |

Cloud models use the :cloud tag suffix and do not require local hardware.
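As a small illustration of the tag convention, a script can tell cloud models from local ones by checking for the :cloud suffix. This is a sketch only: the helper name is_cloud_model is hypothetical, and it assumes the suffix rule stated above holds for every cloud model.

```shell
# Hypothetical helper: succeeds when the model name ends in the ":cloud" tag.
# Assumes the ":cloud" suffix convention described in this tutorial.
is_cloud_model() {
    case "$1" in
        *:cloud) return 0 ;;
        *)       return 1 ;;
    esac
}

for model in qwen3.5 qwen3.5:cloud kimi-k2.5:cloud; do
    if is_cloud_model "$model"; then
        echo "$model: cloud model, no local hardware required"
    else
        echo "$model: local model, runs on local hardware"
    fi
done
```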

Tool Calling

Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the Ollama model library for the latest models.

Updating Ollama

To update Ollama inside a running sandbox:

$update-ollama

Or auto-update on every sandbox start:

$openshell sandbox create --from ollama -e OLLAMA_UPDATE=1

Option B: Host-Level Ollama

Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through inference.local.

This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name.

1. Install and Start Ollama

Install Ollama on the gateway host:

$curl -fsSL https://ollama.com/install.sh | sh

Start Ollama on all interfaces so it is reachable from sandboxes:

$OLLAMA_HOST=0.0.0.0:11434 ollama serve
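The bind address matters because a server that listens only on the loopback interface cannot be reached from a sandbox. A minimal sketch of a pre-start check on the OLLAMA_HOST value follows. The helper name is_loopback_bind is hypothetical; it only inspects the string, does not probe the network, and does not handle IPv6 addresses.

```shell
# Hypothetical helper: succeeds when an OLLAMA_HOST value points at loopback,
# which would make Ollama unreachable from sandboxes.
is_loopback_bind() {
    host="${1#*://}"     # drop an optional scheme such as "http://"
    host="${host%:*}"    # drop a trailing ":port" if present
    case "$host" in
        # An empty value counts as loopback: Ollama binds to 127.0.0.1 by default.
        ""|127.*|localhost) return 0 ;;
        *)                  return 1 ;;
    esac
}

for value in "" 127.0.0.1:11434 0.0.0.0:11434; do
    if is_loopback_bind "$value"; then
        echo "OLLAMA_HOST='$value': loopback only, not reachable from sandboxes"
    else
        echo "OLLAMA_HOST='$value': not loopback"
    fi
done
```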

If you see Error: listen tcp 0.0.0.0:11434: bind: address already in use, Ollama is already running as a system service. Stop it first:

$systemctl stop ollama
$OLLAMA_HOST=0.0.0.0:11434 ollama serve

2. Pull a Model

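In a second terminal, pull and load a model. ollama run downloads the model if it is not already present, then opens an interactive session: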

$ollama run qwen3.5:0.8b

Type /bye to exit the interactive session. The model stays loaded.

3. Create a Provider

Create an OpenAI-compatible provider pointing at the host Ollama:

$openshell provider create \
> --name ollama \
> --type openai \
> --credential OPENAI_API_KEY=empty \
> --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1

OpenShell injects host.openshell.internal so sandboxes and the gateway can reach the host machine. You can also use the host’s LAN IP.
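If you switch between the injected hostname and a LAN IP, building the base URL in one place avoids typos. The sketch below is an illustration, not part of OpenShell: the helper name ollama_base_url is hypothetical, the defaults are the hostname and port used in this tutorial, and 192.168.1.50 is a made-up example address.

```shell
# Hypothetical helper: print an OpenAI-compatible base URL for a host-level Ollama.
# Defaults to the host.openshell.internal name and port 11434 used in this tutorial.
ollama_base_url() {
    host="${1:-host.openshell.internal}"
    port="${2:-11434}"
    case "$host" in
        localhost|127.*)
            echo "refusing loopback host '$host': use host.openshell.internal or a LAN IP" >&2
            return 1 ;;
    esac
    printf 'http://%s:%s/v1\n' "$host" "$port"
}

ollama_base_url                 # prints http://host.openshell.internal:11434/v1
ollama_base_url 192.168.1.50    # example LAN IP, illustrative only
```

The printed value can be supplied as the OPENAI_BASE_URL setting in the provider create command, for example --config OPENAI_BASE_URL="$(ollama_base_url)".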

4. Set Inference Routing

$openshell inference set --provider ollama --model qwen3.5:0.8b

Confirm:

$openshell inference get

5. Verify from a Sandbox

$openshell sandbox create -- \
> curl https://inference.local/v1/chat/completions \
> --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'

The response should be JSON from the model.
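To make this check scriptable, the response body can be tested for the field an OpenAI-compatible chat completion normally carries. This is a sketch under assumptions: it presumes the response follows the OpenAI chat completions shape with a top-level choices array, the helper name is hypothetical, and the sample bodies are hand-written illustrations rather than captured gateway output.

```shell
# Hypothetical helper: read a response body on stdin and succeed when it
# looks like an OpenAI-style chat completion (contains a "choices" key).
# This is a plain text match, not a full JSON parse.
looks_like_chat_completion() {
    grep -q '"choices"[[:space:]]*:'
}

# Hand-written sample bodies for illustration only, not real gateway output.
sample_ok='{"choices":[{"message":{"role":"assistant","content":"Hi"}}]}'
sample_err='{"error":{"message":"model not found"}}'

for body in "$sample_ok" "$sample_err"; do
    if printf '%s\n' "$body" | looks_like_chat_completion; then
        echo "response looks like a chat completion"
    else
        echo "unexpected response: $body"
    fi
done
```

A text match keeps the sketch dependency-free; a stricter check would parse the JSON with a dedicated tool.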

Troubleshooting

Common issues and fixes:

  • Ollama not reachable from sandbox: Ollama must be bound to 0.0.0.0, not 127.0.0.1. This applies to host-level Ollama only; the community sandbox handles this automatically.
  • Wrong OPENAI_BASE_URL: Use http://host.openshell.internal:11434/v1, not localhost or 127.0.0.1.
  • Model not found: Run ollama list to confirm the model is downloaded and ollama ps to confirm it is loaded. Run ollama pull <model> if needed.
  • HTTP used instead of HTTPS: Code inside sandboxes must call https://inference.local, not http://inference.local.
  • AMD GPU driver issues: Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures.

Useful commands:

$openshell status
$openshell inference get
$openshell provider get ollama

Next Steps

  • To learn more about managed inference, refer to Inference Routing.
  • To configure a different self-hosted backend, refer to Inference Routing.
  • To learn how sandbox containers are selected, refer to Sandboxes.