Two test suites exercise TensorSharp.Server's current public compatibility surface:
- Web UI SSE: `/api/chat`
- Ollama chat compatibility: `/api/chat/ollama`
- OpenAI Chat Completions compatibility: `/v1/chat/completions`
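The Web UI endpoint streams Server-Sent Events. As a rough illustration (not the suites' actual helper), a minimal parser that collects the `data:` payloads from a raw SSE body could look like this:

```python
def parse_sse_data(raw: str) -> list:
    """Collect the data payloads from a raw SSE response body.

    Events are separated by blank lines; only `data:` fields are kept.
    Simplified sketch -- the real suites also check for the stream's
    done event rather than just accumulating payloads.
    """
    payloads = []
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                payloads.append(line[len("data:"):].strip())
    return payloads
```
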
The scripts auto-detect the loaded model architecture and skip thinking or tool-calling checks when the active model does not support those capabilities.
- Start TensorSharp.Server:

      ./TensorSharp.Server --model ~/models/model.gguf --backend ggml_metal

- Run either suite:

      # Bash suite (requires curl + jq)
      bash test_multiturn.sh

      # Python suite (standard library only)
      python3 test_multiturn.py

The suites cover:

- Web UI multi-turn SSE streaming and done events
- Ollama chat multi-turn behavior in streaming and non-streaming modes
- OpenAI Chat Completions streaming and non-streaming behavior
- OpenAI structured outputs with both `response_format: {"type":"json_object"}` and `response_format.json_schema`
- Queue status endpoint shape
- Error handling for missing required fields
- Structured-output validation errors and documented request conflicts
- Thinking-mode tests run only on architectures that currently support thinking in TensorSharp: Gemma 4, Qwen 3, Qwen 3.5, GPT OSS, and Nemotron-H
- Tool-calling tests run only on architectures that currently support tool calling in TensorSharp: Gemma 4, Qwen 3, Qwen 3.5, and Nemotron-H
Unsupported architectures are reported as SKIP, not FAIL.
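For probing the server by hand before running either suite, a standard-library-only request against the OpenAI-compatible endpoint might look like this (helper names are ours, not the suites'):

```python
import json
import urllib.request

def build_chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Minimal Chat Completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat_once(base_url: str, model: str, prompt: str) -> dict:
    """POST one non-streaming request to /v1/chat/completions."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```
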
- System-prompt persistence in the Web UI flow
- Concurrent requests and FIFO queue behavior
- Long-conversation stress test
- Mixed Ollama/OpenAI handoff
- Abort mid-generation and queue release
- Ollama tool-call request plumbing
- Architecture-aware OpenAI tool-call validation
- Separate pass/fail/skip accounting with per-test payload dumps
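The accounting in the last bullet can be pictured roughly like this (a sketch with assumed names, not the scripts' internal structure):

```python
import json

class Ledger:
    """Sketch of pass/fail/skip accounting with per-failure payload dumps."""

    def __init__(self):
        self.results = {"PASS": [], "FAIL": [], "SKIP": []}

    def record(self, name, status, payload=None):
        self.results[status].append(name)
        if status == "FAIL" and payload is not None:
            # Dump the request payload so a failure is reproducible.
            print(f"--- {name} payload ---")
            print(json.dumps(payload, indent=2))

    def summary(self):
        return ", ".join(f"{k}: {len(v)}" for k, v in self.results.items())
```
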
- The OpenAI coverage in this folder targets Chat Completions compatibility. OpenAI's newer Responses API is not the compatibility surface TensorSharp.Server currently emulates here.
- Structured outputs follow the Chat Completions `response_format` contract. `json_schema` requests combined with `tools` or `think` are expected to return HTTP 400.
- The Ollama and OpenAI compatibility projects continue to evolve. These scripts track the server's current contract and its documented behavior around thinking, tool calling, and structured outputs.
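To make the two `response_format` shapes concrete, here is a hedged sketch of the request bodies (field layout follows the Chat Completions contract; the helper name is ours):

```python
def structured_request(model, prompt, schema=None):
    """Chat Completions body with response_format set.

    schema=None selects {"type": "json_object"}; otherwise json_schema mode.
    Per the note above, a json_schema request combined with tools or think
    is expected to be rejected with HTTP 400.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if schema is None:
        body["response_format"] = {"type": "json_object"}
    else:
        body["response_format"] = {"type": "json_schema", "json_schema": schema}
    return body
```
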
Bash suite usage:

    bash test_multiturn.sh [model_name] [base_url]

Examples:

    bash test_multiturn.sh
    bash test_multiturn.sh gemma-4-E4B-it-Q8_0.gguf
    bash test_multiturn.sh gemma-4-E4B-it-Q8_0.gguf http://host:5000

Python suite usage:

    python3 test_multiturn.py [--model MODEL] [--url URL] [--max-tokens N]

Examples:

    python3 test_multiturn.py
    python3 test_multiturn.py --model gemma-4-E4B-it-Q8_0.gguf
    python3 test_multiturn.py --max-tokens 120
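The Python suite's flag surface can be reproduced with a small argparse sketch; the defaults shown here are assumptions, not the script's actual values:

```python
import argparse

def parse_args(argv=None):
    """CLI mirroring test_multiturn.py's documented flags (defaults assumed)."""
    parser = argparse.ArgumentParser(description="Multi-turn test suite options")
    parser.add_argument("--model", default=None,
                        help="GGUF model name to target (default: server's model)")
    parser.add_argument("--url", default="http://localhost:5000",
                        help="server base URL (assumed default)")
    parser.add_argument("--max-tokens", type=int, default=256,
                        help="generation cap per request (assumed default)")
    return parser.parse_args(argv)
```
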