This tutorial covers two ways of running Ollama with OpenShell:
After completing this tutorial, you know how to:
ollama launch to start coding agents inside a sandbox.inference.local.The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.
Cloud models use the :cloud tag suffix and do not require local hardware.
Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the Ollama model library for the latest models.
To update Ollama inside a running sandbox:
Or auto-update on every sandbox start:
Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through inference.local.
This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name.
Install Ollama on the gateway host:
Start Ollama on all interfaces so it is reachable from sandboxes:
If you see Error: listen tcp 0.0.0.0:11434: bind: address already in use, Ollama is already running as a system service. Stop it first:
In a second terminal, pull a model:
Type /bye to exit the interactive session. The model stays loaded.
Common issues and fixes:
0.0.0.0, not 127.0.0.1. This applies to host-level Ollama only; the community sandbox handles this automatically.OPENAI_BASE_URL wrong: Use http://host.openshell.internal:11434/v1, not localhost or 127.0.0.1.ollama ps to confirm the model is loaded. Run ollama pull <model> if needed.https://inference.local, not http://.Useful commands: