
[GSoC 2026] docs(chatbot): latency numbers#68

Merged
berardifra merged 1 commit into main from gsoc-2026/chatbot-latency-docs on Jun 25, 2026


@berardifra

Contributor

Description

Follow-up to #65 (chatbot docs): adds the citable end-to-end latency numbers from the W10 latency
benchmark to the Fine-tuning & Prompting guide ("Choosing a model").

  • chatbot_tuning.md ("Choosing a model"): replaces the qualitative latency text with measured
    warm numbers (no-tool ~5–6 s, tool-backed ~30–50 s, first token ~0.3–13 s) and the one-time ~70 s
    cold-load. States the empirical finding that a 7B model (mistral) did not emit tool calls on
    this stack (it answered with invented data), so qwen2.5:3b is the default for tool-calling
    reliability, not only speed.
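
For readers who want to reproduce the "did not emit tool calls" check, the sketch below shows one way to tell a real tool call from a plain-text answer. It assumes the raw response shape of Ollama's `/api/chat` endpoint, where requested calls appear under `message.tool_calls`. The chatbot's own stack may surface tool calls differently, and `extract_tool_calls` is a hypothetical helper, not part of this PR or of IntelOwl.

```python
from typing import Any, Dict, List


def extract_tool_calls(chat_response: Dict[str, Any]) -> List[str]:
    """Return the function names requested in an Ollama-style chat response.

    An empty list means the model answered in plain text only, which is the
    failure mode described above when the answer contains invented data.
    """
    message = chat_response.get("message") or {}
    calls = message.get("tool_calls") or []
    return [call.get("function", {}).get("name", "") for call in calls]
```

An empty result on a prompt that requires a tool is the signal to look for; whether the text answer is fabricated still has to be judged separately.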

The _Available from version >= 6.7.0_ availability note is already on main (added in #65), so this
PR only adds the latency numbers.

Numbers come from a live end-to-end benchmark (real Ollama, no mocks, CPU-only): qwen2.5:3b
(3.1B, Q4_K_M) vs mistral:latest (7.2B, Q4_K_M).
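
As a rough illustration of how first-token and total latency can be measured against a live Ollama instance, here is a minimal sketch. It is not the benchmark script used for this PR: it times a bare streaming call to Ollama's `/api/generate` endpoint rather than the full chatbot path, and the default URL, `measure_stream`, and `ollama_stream` are assumptions made for the example.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float]:
    """Return (seconds to first non-empty chunk, total seconds) for a stream."""
    start = clock()
    first_token = None
    for chunk in chunks:
        # Record the elapsed time only once, at the first non-empty fragment.
        if first_token is None and chunk:
            first_token = clock() - start
    return first_token, clock() - start


def ollama_stream(
    model: str,
    prompt: str,
    url: str = "http://localhost:11434/api/generate",  # assumed local default
) -> Iterator[str]:
    """Yield response fragments from Ollama's streaming generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    # The request is sent lazily, on the first iteration, so the timer in
    # measure_stream starts before any network activity.
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # the endpoint streams one JSON object per line
            if not line.strip():
                continue
            event = json.loads(line)
            yield event.get("response", "")
            if event.get("done"):
                break


# Example (requires a running Ollama with the model already pulled):
#   ttft, total = measure_stream(ollama_stream("qwen2.5:3b", "Hello"))
```

Running the same prompt twice separates the one-time cold-load from warm latency, since the first call includes loading the model into memory.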

Refs intelowlproject/IntelOwl#3810

Checklist

@berardifra berardifra requested a review from mlodic June 24, 2026 16:45
@berardifra berardifra changed the title [GSoC 2026] docs(chatbot): latency numbers [GSoC 2026] docs(chatbot): latency numbers Jun 24, 2026

2 participants