fix: drop message items orphaned by handoff function calls consuming their reasoning item #3574
…their reasoning item

When a model turn during a handoff emits [reasoning, function_call, message], providers such as Azure OpenAI treat the reasoning item as consumed by the function_call. The trailing message item then has no paired reasoning and is rejected with HTTP 400:

Item 'msg_...' of type 'message' was provided without its required 'reasoning' item

Add drop_orphaned_messages_after_consumed_reasoning() and call it from prepare_model_input_items() alongside the existing drop_orphan_function_calls() pass. The new function tracks whether the most-recent reasoning item has been consumed by a function_call and drops any subsequent message item that would be left without a partner.

This is the inverse of drop_orphan_function_calls(), which removes function calls without outputs and their preceding reasoning items.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 737335ccc5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
elif item_type == "message":
    if had_any_reasoning and not fresh_reasoning:
        pass # orphaned — no paired reasoning available; drop to prevent API rejection
    else:
        result.append(item)
Limit orphan pruning to the consumed reasoning turn
Because generated_items is cumulative across turns before it reaches prepare_model_input_items, this state machine can span multiple model responses. After any reasoning-backed function_call/computer_call, fresh_reasoning stays false until another reasoning item, so a later valid assistant message from a provider/model turn that does not emit reasoning (for example a message plus another tool/handoff that requires a following turn) is silently removed even though it is not the trailing message orphaned by the original handoff call. That loses assistant history and can change subsequent tool or handoff behavior; the pruning should be constrained to messages tied to the same consumed reasoning item/response, not every later message in the accumulated history.
Good catch. The state machine was too broad; fresh_reasoning stayed false across all accumulated turns, meaning any later agent turn emitting a message without its own reasoning item would also be silently dropped.
Fixed by replacing the two-flag approach with a single consumed_by_call flag that resets to False as soon as the first orphaned message is dropped. Pruning is now scoped to exactly one trailing message per handoff turn. Updated the test to make the cross-turn invariant explicit: the delegate's response (no reasoning) must survive and appear in final_output.
The previous state machine carried fresh_reasoning=False across all subsequent turns, incorrectly dropping valid assistant messages from later agents that legitimately emit responses without a reasoning item. Replace had_any_reasoning + fresh_reasoning with a single consumed_by_call flag that is reset to False as soon as the first orphaned message is dropped. This limits pruning to the one trailing message inside the same handoff turn and leaves all subsequent turns unaffected. Add clarifying comments to the test showing that the delegate agent response (no reasoning) must survive and reach final_output.
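The cross-turn bleed described in this commit can be reproduced in isolation. The following is a minimal sketch, reconstructed from the two-flag snippet quoted in the first review. Plain dicts stand in for the SDK's item types, and the function name and item ids are hypothetical, so this is an illustration rather than the code that was merged.

```python
from typing import Any

Item = dict[str, Any]


def drop_orphans_two_flag(items: list[Item]) -> list[Item]:
    """Reconstruction of the first (too broad) two-flag state machine."""
    result: list[Item] = []
    had_any_reasoning = False
    fresh_reasoning = False
    for item in items:
        item_type = item.get("type")
        if item_type == "reasoning":
            had_any_reasoning = True
            fresh_reasoning = True
            result.append(item)
        elif item_type in ("function_call", "computer_call"):
            # The call consumes the reasoning item; nothing ever restores the
            # flag except another reasoning item.
            fresh_reasoning = False
            result.append(item)
        elif item_type == "message":
            if had_any_reasoning and not fresh_reasoning:
                pass  # dropped, even when the message belongs to a later turn
            else:
                result.append(item)
        else:
            result.append(item)
    return result


# Accumulated history: a handoff turn followed by the delegate's reply,
# which carries no reasoning item of its own.
history: list[Item] = [
    {"type": "reasoning", "id": "rs_1"},
    {"type": "function_call", "id": "fc_1", "call_id": "call_1"},
    {"type": "message", "id": "msg_orphan"},
    {"type": "function_call_output", "call_id": "call_1"},
    {"type": "message", "id": "msg_delegate"},
]

kept = [item.get("id") or item.get("call_id") for item in drop_orphans_two_flag(history)]
```

Here `kept` no longer contains `msg_delegate`, which is the loss of valid assistant history that the review flagged.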
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4be2bbeb25
…turn bleed

When the handoff turn emits [reasoning, function_call] with no trailing message, consumed_by_call stayed True and leaked into the next accumulated turn, silently dropping the delegate agent response.

The SDK appends HandoffOutputItem (function_call_output) after all model output items, so any orphaned trailing message is dropped before we reach fc_out. Resetting consumed_by_call at function_call_output therefore scopes the drop to the current handoff sequence only and keeps all subsequent turns clean.

Add test_handoff_without_trailing_message_keeps_delegate_response to cover this path explicitly.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3dcb805cd
…on_call_output

computer_call_output, shell_call_output, and other output types were not resetting consumed_by_call, so a reasoning-backed computer_call with no trailing message would leak the flag into the next turn and silently drop the following assistant message.

Extract _CALL_OUTPUT_TYPES = frozenset(_TOOL_CALL_TO_OUTPUT_TYPE.values()) and use it as the reset condition so every call output type is covered.
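Deriving the reset set from the call-to-output mapping keeps the two in sync. The sketch below illustrates the idea with an abbreviated, hypothetical stand-in for `_TOOL_CALL_TO_OUTPUT_TYPE`; the SDK's real mapping contains more entries, and the helper names here are assumptions made for the example.

```python
from typing import Any

# Hypothetical, abbreviated stand-in for the SDK's _TOOL_CALL_TO_OUTPUT_TYPE.
TOOL_CALL_TO_OUTPUT_TYPE: dict[str, str] = {
    "function_call": "function_call_output",
    "computer_call": "computer_call_output",
    "shell_call": "shell_call_output",
}

# Every output type is derived from the mapping, so adding a new call type
# to the dict automatically extends the reset condition as well.
CALL_OUTPUT_TYPES: frozenset[str] = frozenset(TOOL_CALL_TO_OUTPUT_TYPE.values())


def is_tool_call(item: dict[str, Any]) -> bool:
    """True when the item is a call type that can consume a reasoning item."""
    return item.get("type") in TOOL_CALL_TO_OUTPUT_TYPE


def is_call_output(item: dict[str, Any]) -> bool:
    """True when the item is a call output, i.e. a point where the flag resets."""
    return item.get("type") in CALL_OUTPUT_TYPES
```

A hardcoded tuple such as `("function_call_output",)` would have to be updated by hand for each new call type, which is how the computer_call_output case was missed.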
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ddb96218e
normalized_generated_items = normalize_input_items_for_api(list(generated_items))
filtered_generated_items = drop_orphan_function_calls(normalized_generated_items)
filtered_generated_items = drop_orphaned_messages_after_consumed_reasoning(filtered_generated_items)
Apply orphan-message pruning to session history
This only prunes orphaned messages when they are in the current run's generated_items; with a Session, save_result_to_session() still persists the original new run items (including the handoff agent's trailing message), and the next run's prepare_input_with_session() rebuilds history using only drop_orphan_function_calls() before sending it. In a reasoning handoff followed by another Runner.run(..., session=same_session), the saved history can therefore re-send [reasoning, function_call, message] and hit the same provider 400 this PR is fixing; the pruned message should either not be saved or the new pruning pass should also be applied to session history.
save_result_to_session() persists raw run items including any orphaned trailing message from a reasoning handoff turn. On the next Runner.run() with the same session, prepare_input_with_session() rebuilt history using only drop_orphan_function_calls(), so the orphaned message was re-sent to the provider and triggered the same HTTP 400. Import drop_orphaned_messages_after_consumed_reasoning into session_persistence.py and call it immediately after drop_orphan_function_calls() in prepare_input_with_session(), mirroring the existing pattern for function-call orphan pruning. Add test_session_history_drops_orphaned_message_on_next_run to verify the session replay path explicitly.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 702595022f
    fresh_reasoning = True
    consumed_by_call = False
    result.append(item)
elif item_type in ("function_call", "computer_call"):
Treat every tool call as consuming reasoning
For reasoning-capable Responses runs where the model emits a supported non-function_call tool call such as custom_tool_call, shell_call, apply_patch_call, local_shell_call, or tool_search_call followed by a message, this branch leaves fresh_reasoning set and never marks the call as consuming the reasoning item. Those call types are already replayed and matched to outputs via _TOOL_CALL_TO_OUTPUT_TYPE, so the same invalid history this helper is meant to prevent can still be sent back as [reasoning, <tool_call>, message] and trigger the provider error for an assistant message without its own reasoning item. Consider basing the consumption check on _TOOL_CALL_TO_OUTPUT_TYPE rather than only function_call/computer_call.
…ion_call/computer_call

custom_tool_call, shell_call, apply_patch_call, local_shell_call, and tool_search_call were not setting consumed_by_call, so a reasoning item followed by any of those call types and then a message would still be sent as [reasoning, <tool_call>, message] and trigger the same provider 400 this PR is fixing.

Replace the hardcoded (function_call, computer_call) tuple with _TOOL_CALL_TO_OUTPUT_TYPE, which already enumerates every call type that the runtime tracks and matches to outputs.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04a4072db9
if consumed_by_call:
    # Orphaned: reasoning was consumed by the preceding function_call and no
    # function_call_output has reset the flag yet. Drop and reset.
    consumed_by_call = False
Keep dropping all orphaned messages before the call output
If a model response contains more than one message item after a reasoning-backed tool call, resetting consumed_by_call after dropping the first one leaves the remaining message in input[] until the later *_call_output item is encountered. Because the Responses output is a list and the runner preserves each ResponseOutputMessage, a turn like [reasoning, function_call, message, message] still resends the second orphaned message and can hit the same provider 400 this helper is meant to avoid; the consumed state should stay active until a call output or a new reasoning item resets it.
…ion tracker

Map of every path that assembles input[] for the model, cross-checked against which ones already call drop_orphan_function_calls (all of them should also call drop_orphaned_messages_after_consumed_reasoning):

Path                                        Before  After
prepare_model_input_items                   done    done
prepare_input_with_session                  done    done
normalize_resumed_input (RunState resume)   miss    fixed
OpenAIServerConversationTracker.prepare     miss    fixed

Changes:
- items.py: normalize_resumed_input chains drop_orphaned_messages_after_consumed_reasoning after drop_orphan_function_calls (same pattern as the other call sites)
- oai_conversation.py: import + one-line call after drop_orphan_function_calls in OpenAIServerConversationTracker.prepare_input; id() tracking is safe because the function returns items from the input list without copying

Tests added:
- test_normalize_resumed_input_drops_orphaned_message_after_consumed_reasoning
- test_server_conversation_tracker_drops_orphaned_message_after_consumed_reasoning

All 155 tests pass.
Resetting consumed_by_call after the first dropped message meant a second orphaned message in the same turn — e.g. [reasoning, function_call, msg1, msg2, function_call_output] — would slip through unchecked. Remove the reset from the message branch entirely. The only correct reset point is _CALL_OUTPUT_TYPES (the call output item), which marks the actual turn boundary. Messages that arrive before any call output while consumed_by_call is True are all orphaned and are all dropped; messages that arrive after the call output (delegate agent, next turn) are unaffected because the flag has already been cleared. Add test_normalize_resumed_input_drops_multiple_orphaned_messages_in_same_turn to cover this case explicitly.
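The behaviour this commit settles on can be summarised as a small state machine. The following is a minimal sketch under stated assumptions: plain dicts stand in for SDK items, the mapping is an abbreviated hypothetical stand-in for `_TOOL_CALL_TO_OUTPUT_TYPE`, the function name is invented for the example, and any role check the real helper may perform on messages is omitted.

```python
from typing import Any

Item = dict[str, Any]

# Hypothetical, abbreviated stand-in for the SDK's _TOOL_CALL_TO_OUTPUT_TYPE.
TOOL_CALL_TO_OUTPUT_TYPE = {
    "function_call": "function_call_output",
    "computer_call": "computer_call_output",
}
CALL_OUTPUT_TYPES = frozenset(TOOL_CALL_TO_OUTPUT_TYPE.values())


def drop_orphaned_messages_sketch(items: list[Item]) -> list[Item]:
    """Drop every message that sits between a reasoning-backed call and its output."""
    result: list[Item] = []
    fresh_reasoning = False   # a reasoning item was seen and not yet consumed
    consumed_by_call = False  # a tool call consumed the latest reasoning item
    for item in items:
        item_type = item.get("type")
        if item_type == "reasoning":
            fresh_reasoning = True
            consumed_by_call = False
        elif item_type in TOOL_CALL_TO_OUTPUT_TYPE:
            if fresh_reasoning:
                consumed_by_call = True
                fresh_reasoning = False
        elif item_type in CALL_OUTPUT_TYPES:
            # The call output marks the turn boundary: the only reset point
            # other than a new reasoning item.
            consumed_by_call = False
        elif item_type == "message" and consumed_by_call:
            continue  # orphaned; the flag is intentionally NOT reset here
        result.append(item)
    return result


history: list[Item] = [
    {"type": "reasoning", "id": "rs_1"},
    {"type": "function_call", "id": "fc_1", "call_id": "call_1"},
    {"type": "message", "id": "msg_1"},
    {"type": "message", "id": "msg_2"},
    {"type": "function_call_output", "id": "out_1", "call_id": "call_1"},
    {"type": "message", "id": "msg_delegate"},
]
kept_ids = [item["id"] for item in drop_orphaned_messages_sketch(history)]
```

In this example both `msg_1` and `msg_2` are dropped, while `msg_delegate` survives because the call output cleared the flag before it was reached.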
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 20da2dce0a
filtered_generated_items = drop_orphan_function_calls(normalized_generated_items)
filtered_generated_items = drop_orphaned_messages_after_consumed_reasoning(filtered_generated_items)
Drop messages before removing their orphaned call context
When a saved/generated turn is [reasoning, function_call, message] with no function_call_output (for example, a resumed or failed tool call before any output was persisted), drop_orphan_function_calls() removes the call and its preceding reasoning before the new message-pruning pass runs. The remaining assistant message no longer has the consumed-call context, so it is preserved and can still be sent without its required reasoning item, which is the provider 400 this PR is trying to prevent; either prune these trailing messages before removing orphan calls or have the orphan-call pass remove messages tied to the dropped call as well.
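The ordering problem raised in this review can be shown with two simplified passes. Both functions below are hypothetical stand-ins written for this illustration (they handle only function_call items and use plain dicts), not the SDK's implementations. The second ordering shown corresponds to the first of the two remedies the review suggests.

```python
from typing import Any

Item = dict[str, Any]


def drop_orphan_calls_sketch(items: list[Item]) -> list[Item]:
    """Stand-in: drop calls with no output, plus the reasoning item right before them."""
    answered = {i.get("call_id") for i in items if i.get("type") == "function_call_output"}
    result: list[Item] = []
    for item in items:
        if item.get("type") == "function_call" and item.get("call_id") not in answered:
            if result and result[-1].get("type") == "reasoning":
                result.pop()
            continue
        result.append(item)
    return result


def drop_orphaned_messages_sketch(items: list[Item]) -> list[Item]:
    """Stand-in: drop messages between a reasoning-backed call and its output."""
    result: list[Item] = []
    fresh, consumed = False, False
    for item in items:
        item_type = item.get("type")
        if item_type == "reasoning":
            fresh, consumed = True, False
        elif item_type == "function_call":
            consumed, fresh = consumed or fresh, False
        elif item_type == "function_call_output":
            consumed = False
        elif item_type == "message" and consumed:
            continue
        result.append(item)
    return result


# A turn whose call never received an output (e.g. a failed or resumed run).
history: list[Item] = [
    {"type": "reasoning", "id": "rs_1"},
    {"type": "function_call", "id": "fc_1", "call_id": "call_1"},
    {"type": "message", "id": "msg_1"},
]

# Order used in the quoted diff: the call context is gone before messages are checked.
calls_first = drop_orphaned_messages_sketch(drop_orphan_calls_sketch(history))
# Reversed order: the message is pruned while its call context still exists.
messages_first = drop_orphan_calls_sketch(drop_orphaned_messages_sketch(history))

calls_first_ids = [item["id"] for item in calls_first]
messages_first_ids = [item["id"] for item in messages_first]
```

With the calls-first order the message `msg_1` survives without any reasoning item, whereas the messages-first order leaves nothing behind from the incomplete turn.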
Problem
When a reasoning-enabled agent hands off to another agent, the model turn can emit [reasoning, function_call, message] in a single response. Providers such as Azure OpenAI treat the reasoning item as consumed by the function_call. The trailing message item then has no paired reasoning and is rejected with HTTP 400:
Item 'msg_...' of type 'message' was provided without its required 'reasoning' item 'rs_...'
The SDK faithfully forwards all three items as input[] to the next API call, causing every handoff in a reasoning-enabled multi-agent pipeline to fail.
What the invalid input looks like
[user question]
[rs_111] ← reasoning item
[fc_222] ← transfer_to_nextagent (consumes rs_111 per provider rules)
[fc_output]
[msg_333] ← orphaned — no paired reasoning → HTTP 400
Fix
Adds drop_orphaned_messages_after_consumed_reasoning() in src/agents/run_internal/items.py, called from prepare_model_input_items() alongside the existing drop_orphan_function_calls().

The function uses a simple state machine: a reasoning item is marked fresh when emitted and consumed when a function_call follows it. Any subsequent message item without a fresh reasoning partner is dropped before the payload reaches the API.

This is the inverse of drop_orphan_function_calls(), which removes function calls without outputs and their preceding reasoning items.
Test
test_handoff_drops_orphaned_message_after_consumed_reasoning in tests/test_agent_runner.py:
[reasoning, handoff_call, message]
input[] for the delegate's turn