Complete configuration and documentation for running Open Code CLI with local Ollama models.
- Quick Start
- What's Included
⚠️ Important: Tool Usage Discovery- Available Models
- Common Commands
- Performance Tips
- When to Use Local vs Cloud Models
- Documentation
- Examples
- Troubleshooting
- Resources
- Contributing
- License
- Install Ollama: ollama.ai
- Install Open Code CLI: opencode.ai
-
Clone this repository:
git clone https://github.com/YOUR-USERNAME/ollama-opencode-setup.git ~/code/ollama-opencode-setup -
Start Ollama:
ollama serve
-
Pull your first model (and build the recommended 16k variant):
ollama pull ministral-3:8b # fast, reliable tool calling # Open Code can't set Ollama's num_ctx, so bake a 16k context variant: ollama create ministral-3:8b-16k -f modelfiles/ministral-3-8b-16k.Modelfile
-
Use the configuration in your project:
# Option 1: Symlink into your project ln -s ~/code/ollama-opencode-setup/opencode.json ~/code/your-project/opencode.json # Option 2: Copy into your project cp ~/code/ollama-opencode-setup/opencode.json ~/code/your-project/opencode.json
-
Run Open Code:
cd ~/code/your-project opencode
- opencode.json - Open Code configuration for Ollama models
- docs/LOCALLLMS.md - Complete documentation on local LLM setup
- docs/AGENTS.md - Guide to using Open Code CLI agent modes
- examples/ - Example workflows and prompts
- test-opencode.md - Test suite for validating Open Code CLI setup
Tool calling requires a model trained for it — and that capability is not tied to size or recency.
All models in this config have been tested. Results (M1 16GB, 2026-05-31):
- ✅ Ministral 3 8B — full tool usage, fastest tool-caller (~4s warm), no think-mode tax — recommended daily driver
- ✅ Qwen3 models (qwen3:8b-16k, qwen3:8b, qwen3:4b) — full tool usage confirmed, but verbose think mode (~26s)
- ❌ DeepSeek-Coder-V2-Lite 16B — Ollama reports
does not support tools; fits RAM and is fast, but it's a code-completion/FIM model with no tool calling - ❌ Qwen3.5 9B / 4B — outputs bash commands instead of invoking write tool
- ❌ Phi-4 — Open Code CLI explicitly reports "does not support tools"
- ❌ Gemma 4 E4B — attempts tool call but sends malformed/incompatible call format
- ❌ Mistral Nemo & Granite — analysis only, cannot create files
Tool-call support is verified with scripts/tool-call-test.sh. See docs/LOCALLLMS.md for full test details.
| Model | Size | Context | Tool Usage | Best For |
|---|---|---|---|---|
ministral-3:8b-16k ⭐ |
6.0 GB | 16k | ✅ YES | Recommended for Open Code — 16k variant (num_ctx baked in) |
ministral-3:8b |
6.0 GB | ~4k default | ✅ YES | Base model — fast tool use (~4s); runs at Ollama's small default context in Open Code |
qwen3:8b-16k |
5.2 GB | 16k | ✅ YES | Multi-file analysis (larger context) |
qwen3:8b |
5.2 GB | 8k | ✅ YES | General file operations (~26s) |
qwen3:4b |
2.5 GB | 8k | ✅ YES | Quick file edits |
deepseek-coder-v2:16b |
8.9 GB | 128k | ❌ NO | No tool support (does not support tools) — FIM/completion only |
qwen3.5:9b |
6.6 GB | 32k | ❌ NO | Read-only analysis (too slow, 13+ min) |
qwen3.5:4b |
~2.5 GB | 32k | ❌ NO | Read-only analysis only |
phi4:latest |
~5 GB | 16k | ❌ NO | Read-only analysis only |
gemma4:e4b |
~5.5 GB | 32k | ❌ NO | Read-only analysis only |
mistral-nemo:12b-instruct-2407-q4_K_M |
7.5 GB | 8k | ❌ NO | Code review (read-only) |
granite3.1-moe |
2.0 GB | 8k | ❌ NO | Fast analysis (read-only) |
# List installed models
ollama list
# Run a model interactively
ollama run qwen3:8b
# Pull a new model
ollama pull mistral-nemo:12b-instruct-2407-q4_K_M
# Remove a model
ollama rm qwen3:4bOpen Code talks to Ollama via the OpenAI-compatible endpoint, which does not pass Ollama's num_ctx. To get a usable context window, bake it into a custom variant.
Recommended — from a committed Modelfile (reproducible):
ollama create ministral-3:8b-16k -f modelfiles/ministral-3-8b-16k.Modelfile
# Verify the context is baked in
ollama show ministral-3:8b-16k --modelfile | grep num_ctx
# PARAMETER num_ctx 16384The Modelfile is just FROM ministral-3:8b + PARAMETER num_ctx 16384.
Alternative — interactive /save (used for qwen3:8b-16k):
# Start interactive session
ollama run qwen3:8b
# Set extended context
>>> /set parameter num_ctx 16384
Set parameter 'num_ctx' to '16384'
# Save as new model
>>> /save qwen3:8b-16k
Created new model 'qwen3:8b-16k'
# Exit
>>> /bye# Run with default model
opencode run "create a todo.md file"
# Specify model
opencode run "analyze this codebase" --model ollama/qwen3:8b-16k
# Interactive session
opencodeUse the right model for the task:
File Creation/Modification (use a tool-capable model):
- Default / fastest tool use →
ministral-3:8b-16k(~4s, no think-mode tax, 16k context for Open Code) ⭐ - Multi-file changes (larger context) →
qwen3:8b-16k(extended context + tool usage) - Standard file operations →
qwen3:8b(balanced, ~26s) - Quick file edits →
qwen3:4b(fastest Qwen3 model)
Code Review/Analysis (read-only — any model works):
- Best quality review →
mistral-nemo:12b-instruct-2407-q4_K_M(excellent analysis) - Fast analysis →
granite3.1-moe(quickest) - Large context analysis →
qwen3.5:4b(32k context, read-only) — avoidqwen3.5:9b(too slow)
Performance expectations (write tool call):
| Task | ministral-3:8b-16k ⭐ | qwen3:8b | qwen3:8b-16k | Claude Sonnet 4 |
|---|---|---|---|---|
| Simple file write | ~4s | 15-30s | 45-90s | 2-5s |
| Multi-file analysis | fast | 40-90s | 90-180s | 10-30s |
Notes:
ministral-3:8bis the fastest tool-caller tested — no<think>overhead- Qwen3 models enter verbose "thinking mode" before execution (slower but successful)
- A model must be trained/templated for tools — fitting in RAM is not enough (e.g. DeepSeek-Coder-V2-Lite fits but has no tool calling)
- ✅ Working offline
- ✅ Processing sensitive/proprietary code
- ✅ Running batch operations overnight
- ✅ Learning/experimenting without API costs
- ✅ Privacy requirements mandate local processing
- ✅ Code review that doesn't require changes (any model)
⚠️ File operations (use a tool-capable model — Ministral 3 8B or Qwen3)
- ⏱️ Real-time interactive development
- ⚡ Complex multi-file operations requiring fast iteration
- 🚀 Time-sensitive tasks
- 📚 Working with very large codebases (200k+ context)
- 💰 Speed is more important than cost
- 🎯 Best code quality is critical
Complete Open Code CLI commands reference:
- All built-in slash commands (15 commands documented)
- Bash command integration (
!command) - Agent switching (Tab key for build/plan agents)
- Custom command creation (file-based and config-based)
- Navigation and workflows
- Troubleshooting command issues
Comprehensive guide to local LLM setup:
- Custom model creation
- Context window comparison (4k vs 8k vs 16k vs 200k)
- Ollama commands reference
- Model selection guidelines
- Troubleshooting guide
- Performance optimization
Guide to using Open Code CLI agent modes:
- Build and plan agents (Tab key switching)
- Model capabilities for agent workflows
- Agent workflow patterns
- Controlling agent behavior
- Performance benchmarks by model
- Best practices and troubleshooting
Critical testing results:
- ✅ Qwen3 models have full tool usage (file creation works)
- ❌ Mistral Nemo & Granite lack tool usage (analysis only)
- Model-by-model test results and recommendations
Check the examples/ directory for:
- Code review workflows
- Refactoring prompts
- Multi-file analysis examples
- Batch processing scripts
# Check if Ollama is running
curl http://localhost:11434/v1/models
# Start Ollama
ollama serve# Verify model exists
ollama list
# Pull model if missing
ollama pull qwen3:8b- Use smaller models for simple tasks (
qwen3:4b) - Use standard context when extended context isn't needed (
qwen3:8binstead ofqwen3:8b-16k) - Consider cloud models for time-sensitive work
See docs/LOCALLLMS.md#troubleshooting for more details.
Contributions welcome! Please feel free to submit issues or pull requests with:
- New model configurations
- Performance optimizations
- Example workflows
- Documentation improvements
MIT