Open Code CLI Test File

Purpose

This file is used to test Open Code CLI functionality with local Ollama models.

Test Instructions

Test 1: Simple File Creation

Prompt: "Create a todo.md file with 3 sample tasks"

Expected behavior:

Model creates a new file called todo.md
File contains 3 tasks in markdown format
Task completes in reasonable time (30-90s for qwen3:8b-16k)

Test 2: Code Review

Prompt: "Review the opencode.json file and suggest improvements"

Expected behavior:

Model reads the opencode.json file
Provides constructive feedback on configuration
Suggests potential improvements or best practices

Test 3: Multi-File Analysis

Prompt: "Analyze the documentation structure in this repository and suggest improvements"

Expected behavior:

Model reads multiple files (README.md, CLAUDE.md, docs/)
Provides comprehensive analysis
Suggests organizational improvements

Test 4: Think Mode Observation

Prompt: "Create a simple hello.py file that prints 'Hello World'"

Expected behavior:

Qwen3 models will likely enter thinking mode (verbose analysis)
Task completes successfully despite verbosity
Build mode is already default (no need to set /mode build)

Note: There is no /no_think flag in Open Code CLI. Build agent is the default. Think mode verbosity is model behavior.

Test 5: Bash Command Integration

Prompt:

!git status
Based on the output, tell me if there are any uncommitted changes

Expected behavior:

Git status output is included in conversation
Model analyzes the git status output
Provides clear answer about repository state

Test 6: Agent Switching

Instructions:

Start with build agent (default)
Press Tab to switch to plan agent
Ask: "What is the structure of this repository?"
Press Tab to switch back to build agent
Ask: "Create a CONTRIBUTING.md file based on the repository structure"

Expected behavior:

Plan agent provides detailed analysis
Build agent creates the file
Tab key successfully switches between agents

Available Commands

See docs/OPENCODE-COMMANDS.md for complete command reference.

Key commands:

Tab - Switch between build and plan agents
!command - Run bash command and include output
/help - Show help dialog
/models - List available models
/sessions - List and resume sessions
/export - Export conversation to Markdown
/undo / /redo - Undo/redo messages

Test Results

Date: 2025-11-18

Tester: jasperfrumau

Model: granite3.1-moe:latest

Test 1: Create todo.md file

Status: [X] Fail
Time taken: ~15-20s (estimated)
Notes:
- Model generated JSON task structure but did not create the file
- Created task plan with 3 items in JSON format
- Shows "planning" capability but lacks "execution" capability
- Critical issue: Model doesn't understand it needs to actually create files in Open Code CLI

Model: mistral-nemo:12b-instruct-2407-q4_K_M

Test 1: Create todo.md file

Status: [X] Fail
Time taken: ~30-60s (estimated)
Notes:
- Model generated comprehensive task descriptions
- Created detailed 3-task structure with descriptions, deadlines, dependencies, assignees
- Critical issue: Did not create the actual todo.md file
- Shows excellent "analysis" and "planning" but lacks "execution" capability
- Tasks were well-structured and professional quality

Model: qwen3:8b-16k (16k context)

Test 1: Create todo.md file

Status: [✓] PASS
Time taken: ~60-90s (estimated)
Notes:
- ✅ FILE CREATED SUCCESSFULLY!
- Model entered verbose thinking mode (as expected)
- Model eventually invoked the write tool correctly
- Tool call format: {"name": "write", "arguments": {"content": "...", "filePath": "..."}}
- Key insight: qwen3 models HAVE tool usage capabilities
- Think mode shows detailed reasoning before execution
- Default mode is "build" (can switch to "plan" with tab)

Critical findings:

Qwen3 8B 16K CAN create files - it has proper tool integration
Granite and Mistral models may lack tool usage training
Think mode is verbose but doesn't prevent execution
Open Code CLI shows modified files in sidebar after successful tool use

Model: qwen3.5:9b

Test 1: Create todo-v2.md file (tested 2026-05-31)

Status: [X] Fail
Notes:
- Model generated a bash heredoc (cat > file << 'EOF') instead of invoking the write tool
- File was NOT created on disk despite model saying it was
- No <think> verbose mode observed (cleaner output than qwen3:8b-16k)
- 32k context window is useful for large codebase analysis
- Conclusion: Read-only model — cannot create/modify files in Open Code CLI
Test 2: Code review (tested 2026-05-31)
- Status: [X] Fail (killed)
- Time: 13+ minutes, did not complete
- Notes: Heavy swap usage on M1 16GB with full desktop environment; not viable for daily use

Model: ministral-3:8b

Tool-call test (tested 2026-05-31, via scripts/tool-call-test.sh against Ollama's OpenAI-compatible endpoint)

Status: [✓] PASS
Time: ~7s cold, ~4s warm (num_ctx=16384)
Notes:
- ✅ Emitted a valid write tool_call on the first try
- Fastest tool-caller tested — no <think> overhead (vs ~26s for qwen3:8b on the same task)
- Newer model than the Qwen3 family (Mistral, Dec-2025 build), 6.0 GB, fits M1 16GB with room for 16k context
- Recommended as the new daily driver for tool-use tasks

Model: deepseek-coder-v2:16b

Tool-call test (tested 2026-05-31, via scripts/tool-call-test.sh)

Status: [X] FAIL — no tool support
Notes:
- Ollama returns an API error: registry.ollama.ai/library/deepseek-coder-v2:16b does not support tools
- The model's template does not declare tool calling — it is a code-completion/FIM model, not an agentic tool-caller
- Fits M1 16GB (8.9 GB, MoE with 2.4B active) and is fast, but cannot be used for Open Code's tool loop
- Conclusion: size/speed are fine; the capability simply isn't there

qwen3:8b baseline (tool-call test, tested 2026-05-31)

Status: [✓] PASS — emitted write tool_call, ~26s warm (think-mode overhead)

Additional Test Results Needed:

qwen3:4b
gemma4:e4b
phi4
qwen3.5:4b

Comparison Matrix

Model	Test 1	Test 2	Test 3	Test 4	Execution Capability	Notes
ministral-3:8b	✅ PASS	❓	❓	❓	✅ Full tool usage	Fastest tool-caller (~4s warm), no think mode — recommended
deepseek-coder-v2:16b	❌ FAIL	❓	❓	❓	❌ No tool usage	Ollama: "does not support tools" — code-completion/FIM only
qwen3:4b	❓	❓	❓	❓	❓	Not tested yet
qwen3:8b	✅ PASS	❓	❓	❓	✅ Full tool usage	Tool call works but ~26s (think-mode overhead)
qwen3:8b-16k	✅ PASS	❓	❓	❓	✅ Full tool usage	Verbose think mode but executes successfully
qwen3.5:9b	❌ FAIL	❓	❓	❓	❌ No tool usage	Outputs bash heredoc instead of write tool; read-only
mistral-nemo:12b-instruct-2407-q4_K_M	❌ FAIL	❓	❓	❓	❌ No tool usage	Excellent analysis, no file creation
granite3.1-moe	❌ FAIL	❓	❓	❓	❌ No tool usage	JSON output, no file creation
gemma4:e4b	❓	❓	❓	❓	❓	Not tested yet
phi4	❓	❓	❓	❓	❓	Not tested yet
qwen3.5:4b	❓	❓	❓	❓	❓	Not tested yet

Critical Findings

✅ RESOLVED: Tool Usage Requires Qwen3 Models

Discovery: Only Qwen3 models have proper tool usage capabilities with Open Code CLI!

Working models:

✅ qwen3:8b-16k - Full tool usage, creates files successfully

Non-working models (no tool usage):

❌ granite3.1-moe - Plans but doesn't execute
❌ mistral-nemo:12b-instruct-2407-q4_K_M - Excellent analysis but no file creation

Root cause:

Tool usage requires specific model training for function calling
Qwen3 family has built-in tool/function calling capabilities
Mistral Nemo and Granite models lack this training
This is NOT an Open Code CLI issue, it's a model capability gap

Tool call format used by Qwen3:

{
  "name": "write",
  "arguments": {
    "content": "# Todo List\n\n- Task 1\n- Task 2",
    "filePath": "/absolute/path/to/file.md"
  }
}

Think mode behavior:

Qwen3 8B 16K enters verbose thinking mode before execution
Thinking doesn't prevent execution - file is still created
Think mode shows detailed reasoning (can be useful for debugging)
Default mode in Open Code CLI is "build", can switch to "plan" with tab

Recommendations

UPDATED Based on Test Results:

✅ For Open Code CLI with Ollama: USE QWEN3 MODELS ONLY

Confirmed working models:

qwen3:8b-16k (16k context) - ✅ Full tool usage, file creation works
- Use for: Multi-file tasks, complex operations
- Caveat: Verbose think mode, slower response

Likely working (need testing): 2. qwen3:8b (8k context) - Probably has tool usage (same family) 3. qwen3:4b (8k context) - Probably has tool usage (same family)

❌ NON-WORKING models (analysis only, no execution):

mistral-nemo:12b-instruct-2407-q4_K_M - Excellent for planning/analysis, but cannot create files
granite3.1-moe - Fast for analysis, but cannot create files

Updated use cases:

File creation & modification:

✅ Use: qwen3:8b-16k, qwen3:8b, or qwen3:4b
❌ Avoid: mistral-nemo:12b-instruct-2407-q4_K_M, granite3.1-moe

Code review & analysis (read-only):

✅ All models work (mistral-nemo:12b-instruct-2407-q4_K_M is best quality)
Consider: Use faster models (granite, qwen3:4b) for quick reviews

Multi-file analysis with changes:

✅ Use: qwen3:8b-16k (extended context + tool usage)

Think mode:

Accept it as "free documentation" of reasoning
Doesn't prevent execution
Can be helpful for understanding model decisions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Open Code CLI Test File

Purpose

Test Instructions

Test 1: Simple File Creation

Test 2: Code Review

Test 3: Multi-File Analysis

Test 4: Think Mode Observation

Test 5: Bash Command Integration

Test 6: Agent Switching

Available Commands

Test Results

Date: 2025-11-18

Tester: jasperfrumau

Model: granite3.1-moe:latest

Model: mistral-nemo:12b-instruct-2407-q4_K_M

Model: qwen3:8b-16k (16k context)

Model: qwen3.5:9b

Model: ministral-3:8b

Model: deepseek-coder-v2:16b

qwen3:8b baseline (tool-call test, tested 2026-05-31)

Additional Test Results Needed:

Comparison Matrix

Critical Findings

✅ RESOLVED: Tool Usage Requires Qwen3 Models

Recommendations

UPDATED Based on Test Results:

FilesExpand file tree

test-opencode.md

Latest commit

History

test-opencode.md

File metadata and controls

Open Code CLI Test File

Purpose

Test Instructions

Test 1: Simple File Creation

Test 2: Code Review

Test 3: Multi-File Analysis

Test 4: Think Mode Observation

Test 5: Bash Command Integration

Test 6: Agent Switching

Available Commands

Test Results

Date: 2025-11-18

Tester: jasperfrumau

Model: granite3.1-moe:latest

Model: mistral-nemo:12b-instruct-2407-q4_K_M

Model: qwen3:8b-16k (16k context)

Model: qwen3.5:9b

Model: ministral-3:8b

Model: deepseek-coder-v2:16b

qwen3:8b baseline (tool-call test, tested 2026-05-31)

Additional Test Results Needed:

Comparison Matrix

Critical Findings

✅ RESOLVED: Tool Usage Requires Qwen3 Models

Recommendations

UPDATED Based on Test Results: