Run Local Inference with Ollama
This tutorial covers two ways to use Ollama with OpenShell:
- Ollama sandbox (recommended): a self-contained sandbox with Ollama, Claude Code, and Codex pre-installed. One command to start.
- Host-level Ollama: run Ollama on the gateway host and route sandbox inference to it. Useful when you want a single Ollama instance shared across multiple sandboxes.
After completing this tutorial, you will know how to:
- Launch the Ollama community sandbox for a batteries-included experience.
- Use `ollama launch` to start coding agents inside a sandbox.
- Expose a host-level Ollama server to sandboxes through `inference.local`.
Prerequisites
- A working OpenShell installation. Complete the Quickstart before proceeding.
Option A: Ollama Community Sandbox (Recommended)
The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.
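As a sketch, launching the sandbox could look like the following; the subcommand and sandbox name here are assumptions, so check your OpenShell CLI help for the exact syntax:

```shell
# Hypothetical invocation: launch the Ollama community sandbox
# (subcommand and sandbox name are assumptions, not confirmed syntax)
openshell sandbox launch ollama
```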
Model Recommendations
Cloud models use the `:cloud` tag suffix and do not require local hardware.
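For example, assuming a cloud variant of one of the recommended models is published (the model name below is illustrative):

```shell
# Pull and run a cloud-hosted model; inference happens remotely,
# so no local GPU is needed (model name is illustrative)
ollama pull glm-5:cloud
ollama run glm-5:cloud
```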
Tool Calling
Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the Ollama model library for the latest models.
Updating Ollama
To update Ollama inside a running sandbox:
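One common approach, assuming the sandbox has network access and shell privileges, is re-running the official install script, which upgrades an existing installation in place:

```shell
# Re-run the official installer to upgrade Ollama in place
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the installed version
ollama --version
```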
Or auto-update on every sandbox start:
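The exact hook mechanism depends on your OpenShell configuration; as a sketch, assuming the sandbox runs a startup script, you could re-run the installer there:

```shell
#!/bin/sh
# Hypothetical startup hook (the hook path and mechanism are assumptions):
# re-running the installer on each sandbox start keeps Ollama current.
curl -fsSL https://ollama.com/install.sh | sh
```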
Option B: Host-Level Ollama
Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through inference.local.
This approach uses Ollama because it is easy to install and run locally. You can substitute other inference engines such as vLLM, SGLang, TRT-LLM, or NVIDIA NIM by changing the startup command, base URL, and model name.
Install and Start Ollama
Install Ollama on the gateway host:
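On Linux, the official install script is the usual route (on most distros it also registers a systemd service):

```shell
# Official Linux install script
curl -fsSL https://ollama.com/install.sh | sh
```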
Start Ollama on all interfaces so it is reachable from sandboxes:
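By default Ollama binds to `127.0.0.1`; the `OLLAMA_HOST` environment variable changes the listen address:

```shell
# Bind to all interfaces so sandboxes can reach the API
OLLAMA_HOST=0.0.0.0 ollama serve
```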
If you see `Error: listen tcp 0.0.0.0:11434: bind: address already in use`, Ollama is already running as a system service. Stop it first:
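On Linux installs that registered a systemd unit, for example:

```shell
# Stop the background service so you can run Ollama in the foreground
sudo systemctl stop ollama
```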
Pull a Model
In a second terminal, pull a model:
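For example (the model name is illustrative; pick one with tool-calling support from the recommendations above):

```shell
# Pulls the model if missing, then opens an interactive chat session
ollama run qwen3.5
```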
Type /bye to exit the interactive session. The model stays loaded.
Troubleshooting
Common issues and fixes:
- Ollama not reachable from sandbox: Ollama must be bound to `0.0.0.0`, not `127.0.0.1`. This applies to host-level Ollama only; the community sandbox handles this automatically.
- `OPENAI_BASE_URL` wrong: use `http://host.openshell.internal:11434/v1`, not `localhost` or `127.0.0.1`.
- Model not found: run `ollama ps` to confirm the model is loaded. Run `ollama pull <model>` if needed.
- HTTPS vs HTTP: code inside sandboxes must call `https://inference.local`, not `http://`.
- AMD GPU driver issues: Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures.
Useful commands:
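A few commands that help when diagnosing the issues above (the `curl` check assumes the default port):

```shell
ollama list           # models downloaded locally
ollama ps             # models currently loaded in memory
ollama stop <model>   # unload a model from memory

# Verify the OpenAI-compatible endpoint is answering
curl http://localhost:11434/v1/models
```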
Next Steps
- To learn more about managed inference, refer to Index.
- To configure a different self-hosted backend, refer to Configure.
- To explore more community sandboxes, refer to Community Sandboxes.