Skip to content

docs(openai): document empty response from thinking-mode models#1062

Merged
planetf1 merged 1 commit into
generative-computing:mainfrom
planetf1:fix/1060-openai-empty-content
May 19, 2026
Merged

docs(openai): document empty response from thinking-mode models#1062
planetf1 merged 1 commit into
generative-computing:mainfrom
planetf1:fix/1060-openai-empty-content

Conversation

@planetf1
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 commented May 12, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Document the empty-value symptom that thinking-mode models exhibit on the OpenAI backend, rather than enforcing it with a RuntimeError in post_processing().

A model returning content=None with finish_reason=stop and non-zero completion_tokens is the literal response from the API — the reasoning content is preserved on ModelOutputThunk._thinking. Raising would break legitimate thinking-mode flows and bypass Mellea's sampling and validator machinery, so the right fix is discoverability, not enforcement.

Adds an Empty value from a thinking-mode model subsection to docs/docs/integrations/openai.md covering:

  • How to diagnose: result.value, result.generation.usage, result._thinking
  • The vLLM/Qwen3 case as the most common concrete trigger
  • The chat_template_kwargs.enable_thinking=False workaround for callers who did not intend thinking mode
  • A pointer to result._thinking for callers who did

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code as added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Docs-only change — pre-commit (including markdownlint) passes locally.

Attribution

  • AI coding assistants used

Assisted-by: Claude Code

@github-actions github-actions Bot added the bug Something isn't working label May 12, 2026
@planetf1 planetf1 force-pushed the fix/1060-openai-empty-content branch from b37b587 to 601a224 Compare May 12, 2026 13:28
@github-actions
Copy link
Copy Markdown
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@planetf1 planetf1 marked this pull request as ready for review May 13, 2026 11:25
@planetf1 planetf1 requested a review from a team as a code owner May 13, 2026 11:25
@planetf1 planetf1 requested review from ajbozarth and jakelorocco May 13, 2026 11:25
Copy link
Copy Markdown
Contributor

@ajbozarth ajbozarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes inline — nothing on the error-path span cleanup before the raise since that's getting rewritten anyway.

Comment thread mellea/backends/openai.py Outdated
Comment thread mellea/backends/openai.py Outdated
Comment thread test/backends/test_openai_unit.py Outdated
@planetf1 planetf1 force-pushed the fix/1060-openai-empty-content branch from 42e83d9 to 5e01a8d Compare May 18, 2026 12:32
Copy link
Copy Markdown
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I disagree with this change. Users should be able to enable thinking and we should handle that gracefully. If a model only produces thinking tokens, that is the reality of the response we received.

We can maybe log a warning or have better documentation that it's a possibility; but I think this is expected behavior.

Comment thread mellea/backends/openai.py Outdated
Comment on lines +1147 to +1148
'model_options={"extra_body": {"chat_template_kwargs": '
'{"enable_thinking": False}}}.'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go this route, we should advertise the Mellea specific ModelOption.THINKING model option instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I was doing a re-review Claude made the following observation on this that's worth noting:


Jake's right that ModelOption.THINKING is the canonical Mellea-side knob, but there's a wrinkle worth flagging before we change the message.

The OpenAI backend currently maps ModelOption.THINKING to reasoning_effort (openai.py:672-686). For OpenAI proper / o-series that's correct, but for Qwen3 via vLLM the lever to disable thinking is chat_template_kwargs.enable_thinking, which is a different parameter — reasoning_effort=False would just get ignored (or rejected) by vLLM. So if we swap the error message to advertise model_options={ModelOption.THINKING: False} today, it'd send users at the exact failure mode this PR is trying to surface to a workaround that doesn't actually work for them.

Two options:

  1. Land this PR as-is with the raw extra_body form (which is what works on vLLM/Qwen3 today), and open a follow-up to broaden ModelOption.THINKING in the OpenAI backend so THINKING=False also emits chat_template_kwargs.enable_thinking=False for compatible providers. Then update the message.
  2. Do that broadening in this PR before advertising the abstraction.

I'd lean toward (1) to keep the fix scoped — the error message is already a big improvement over the silent empty string, and we shouldn't block it on a semantics change to THINKING.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now docs only — the runtime change is being scrapped (see my comment above). The advertised workaround moves into the integration docs rather than an error message, so the ModelOption.THINKING vs extra_body question does not apply here any more.

Closes generative-computing#1060.

When a thinking-mode model (e.g. Qwen3 via vLLM with --reasoning-parser)
emits only reasoning tokens, the OpenAI backend faithfully returns
content=None — surfacing as result.value == "" with non-zero
completion_tokens. The reasoning trace is preserved on
ModelOutputThunk._thinking.

This is expected behaviour, not a backend bug, so document the symptom
in the OpenAI integration troubleshooting section: how to diagnose,
where the reasoning content lives, and how to disable thinking via
chat_template_kwargs for vLLM/Qwen3.

Assisted-by: Claude Code
@planetf1
Copy link
Copy Markdown
Contributor Author

I'm thinking now we should not make this change. I hit it when trying qwen as I didn't appreciate the different behaviour. (We did make another change in the backend relating to the response field). I agree with Jake in that we can't change behaviour - if no real response is returned, well that's the response - and why we have validators etc. Even a warning is potentially noise so probably not right either.

I therefore am scrapping the code change, and instead just offering a small docs tweak

@planetf1 planetf1 changed the title fix(backends): raise error when OpenAI backend receives content=None docs(openai): document empty response from thinking-mode models May 19, 2026
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 19, 2026
@planetf1 planetf1 force-pushed the fix/1060-openai-empty-content branch from 5e01a8d to ae70f9a Compare May 19, 2026 10:43
@planetf1 planetf1 requested a review from jakelorocco May 19, 2026 10:44
Copy link
Copy Markdown
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; opened #1093 to handle getting ModelOption.Thinking to work in these edge cases as well

@planetf1 planetf1 added this pull request to the merge queue May 19, 2026
Merged via the queue into generative-computing:main with commit 801bbfd May 19, 2026
8 checks passed
@planetf1 planetf1 deleted the fix/1060-openai-empty-content branch May 19, 2026 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenAI backend silently returns empty string when model produces no text content (content=None)

3 participants