Skip to content

fix: drop message items orphaned by handoff function calls consuming their reasoning item#3574

Open
utkarshkr100 wants to merge 8 commits into
openai:mainfrom
utkarshkr100:fix/drop-orphaned-message-after-handoff-reasoning
Open

fix: drop message items orphaned by handoff function calls consuming their reasoning item#3574
utkarshkr100 wants to merge 8 commits into
openai:mainfrom
utkarshkr100:fix/drop-orphaned-message-after-handoff-reasoning

Conversation

@utkarshkr100
Copy link
Copy Markdown

Problem

When a reasoning-enabled agent hands off to another agent, the model turn can emit
[reasoning, function_call, message] in a single response. Providers such as Azure OpenAI
treat the reasoning item as consumed by the function_call. The trailing message item
then has no paired reasoning and is rejected with HTTP 400:

Item 'msg_...' of type 'message' was provided without its required 'reasoning' item 'rs_...'

The SDK faithfully forwards all three items as input[] to the next API call, causing every
handoff in a reasoning-enabled multi-agent pipeline to fail.

What the invalid input looks like

[user question]
[rs_111] ← reasoning item
[fc_222] ← transfer_to_nextagent (consumes rs_111 per provider rules)
[fc_output]
[msg_333] ← orphaned — no paired reasoning → HTTP 400

Fix

Adds drop_orphaned_messages_after_consumed_reasoning() in
src/agents/run_internal/items.py, called from prepare_model_input_items() alongside
the existing drop_orphan_function_calls().

The function uses a simple state machine: a reasoning item is marked fresh when emitted
and consumed when a function_call follows it. Any subsequent message item without a
fresh reasoning partner is dropped before the payload reaches the API.

This is the inverse of drop_orphan_function_calls(), which removes function calls
without outputs and their preceding reasoning items.

Test

test_handoff_drops_orphaned_message_after_consumed_reasoning in tests/test_agent_runner.py:

  • Sets up a triage → delegate handoff where the first turn emits [reasoning, handoff_call, message]
  • Asserts the orphaned message is absent from input[] for the delegate's turn
  • Fails before this fix, passes after
  • All 151 existing tests continue to pass

…their reasoning item

When a model turn during a handoff emits [reasoning, function_call, message], providers
such as Azure OpenAI treat the reasoning item as consumed by the function_call. The
trailing message item then has no paired reasoning and is rejected with HTTP 400:

  Item 'msg_...' of type 'message' was provided without its required 'reasoning' item

Add drop_orphaned_messages_after_consumed_reasoning() and call it from
prepare_model_input_items() alongside the existing drop_orphan_function_calls() pass.
The new function tracks whether the most-recent reasoning item has been consumed by a
function_call and drops any subsequent message item that would be left without a partner.

This is the inverse of drop_orphan_function_calls(), which removes function calls
without outputs and their preceding reasoning items.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 737335ccc5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +216 to +220
elif item_type == "message":
if had_any_reasoning and not fresh_reasoning:
pass # orphaned — no paired reasoning available; drop to prevent API rejection
else:
result.append(item)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit orphan pruning to the consumed reasoning turn

Because generated_items is cumulative across turns before it reaches prepare_model_input_items, this state machine can span multiple model responses. After any reasoning-backed function_call/computer_call, fresh_reasoning stays false until another reasoning item, so a later valid assistant message from a provider/model turn that does not emit reasoning (for example a message plus another tool/handoff that requires a following turn) is silently removed even though it is not the trailing message orphaned by the original handoff call. That loses assistant history and can change subsequent tool or handoff behavior; the pruning should be constrained to messages tied to the same consumed reasoning item/response, not every later message in the accumulated history.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The state machine was too broad; fresh_reasoning stayed false across all accumulated turns, meaning any later agent turn emitting a message without its own reasoning item would also be silently dropped.

Fixed by replacing the two-flag approach with a single consumed_by_call flag that resets to False as soon as the first orphaned message is dropped. Pruning is now scoped to exactly one trailing message per handoff turn. Updated the test to make the cross-turn invariant explicit: the delegate's response (no reasoning) must survive and appear in final_output.

The previous state machine carried fresh_reasoning=False across all subsequent
turns, incorrectly dropping valid assistant messages from later agents that
legitimately emit responses without a reasoning item.

Replace had_any_reasoning + fresh_reasoning with a single consumed_by_call flag
that is reset to False as soon as the first orphaned message is dropped. This
limits pruning to the one trailing message inside the same handoff turn and
leaves all subsequent turns unaffected.

Add clarifying comments to the test showing that the delegate agent response
(no reasoning) must survive and reach final_output.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4be2bbeb25

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
…turn bleed

When the handoff turn emits [reasoning, function_call] with no trailing message,
consumed_by_call stayed True and leaked into the next accumulated turn, silently
dropping the delegate agent response.

The SDK appends HandoffOutputItem (function_call_output) after all model output
items, so any orphaned trailing message is dropped before we reach fc_out.
Resetting consumed_by_call at function_call_output therefore scopes the drop to
the current handoff sequence only and keeps all subsequent turns clean.

Add test_handoff_without_trailing_message_keeps_delegate_response to cover
this path explicitly.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3dcb805cd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
…on_call_output

computer_call_output, shell_call_output, and other output types were not
resetting consumed_by_call, so a reasoning-backed computer_call with no
trailing message would leak the flag into the next turn and silently drop
the following assistant message.

Extract _CALL_OUTPUT_TYPES = frozenset(_TOOL_CALL_TO_OUTPUT_TYPE.values())
and use it as the reset condition so every call output type is covered.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ddb96218e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


normalized_generated_items = normalize_input_items_for_api(list(generated_items))
filtered_generated_items = drop_orphan_function_calls(normalized_generated_items)
filtered_generated_items = drop_orphaned_messages_after_consumed_reasoning(filtered_generated_items)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply orphan-message pruning to session history

This only prunes orphaned messages when they are in the current run's generated_items; with a Session, save_result_to_session() still persists the original new run items (including the handoff agent's trailing message), and the next run's prepare_input_with_session() rebuilds history using only drop_orphan_function_calls() before sending it. In a reasoning handoff followed by another Runner.run(..., session=same_session), the saved history can therefore re-send [reasoning, function_call, message] and hit the same provider 400 this PR is fixing; the pruned message should either not be saved or the new pruning pass should also be applied to session history.

Useful? React with 👍 / 👎.

save_result_to_session() persists raw run items including any orphaned
trailing message from a reasoning handoff turn. On the next Runner.run()
with the same session, prepare_input_with_session() rebuilt history using
only drop_orphan_function_calls(), so the orphaned message was re-sent
to the provider and triggered the same HTTP 400.

Import drop_orphaned_messages_after_consumed_reasoning into
session_persistence.py and call it immediately after drop_orphan_function_calls()
in prepare_input_with_session(), mirroring the existing pattern for
function-call orphan pruning.

Add test_session_history_drops_orphaned_message_on_next_run to verify the
session replay path explicitly.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 702595022f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
fresh_reasoning = True
consumed_by_call = False
result.append(item)
elif item_type in ("function_call", "computer_call"):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat every tool call as consuming reasoning

For reasoning-capable Responses runs where the model emits a supported non-function_call tool call such as custom_tool_call, shell_call, apply_patch_call, local_shell_call, or tool_search_call followed by a message, this branch leaves fresh_reasoning set and never marks the call as consuming the reasoning item. Those call types are already replayed and matched to outputs via _TOOL_CALL_TO_OUTPUT_TYPE, so the same invalid history this helper is meant to prevent can still be sent back as [reasoning, <tool_call>, message] and trigger the provider error for an assistant message without its own reasoning item. Consider basing the consumption check on _TOOL_CALL_TO_OUTPUT_TYPE rather than only function_call/computer_call.

Useful? React with 👍 / 👎.

…ion_call/computer_call

custom_tool_call, shell_call, apply_patch_call, local_shell_call, and
tool_search_call were not setting consumed_by_call, so a reasoning item
followed by any of those call types and then a message would still be
sent as [reasoning, <tool_call>, message] and trigger the same provider
400 this PR is fixing.

Replace the hardcoded (function_call, computer_call) tuple with
_TOOL_CALL_TO_OUTPUT_TYPE, which already enumerates every call type
that the runtime tracks and matches to outputs.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 04a4072db9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
if consumed_by_call:
# Orphaned: reasoning was consumed by the preceding function_call and no
# function_call_output has reset the flag yet. Drop and reset.
consumed_by_call = False
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep dropping all orphaned messages before the call output

If a model response contains more than one message item after a reasoning-backed tool call, resetting consumed_by_call after dropping the first one leaves the remaining message in input[] until the later *_call_output item is encountered. Because the Responses output is a list and the runner preserves each ResponseOutputMessage, a turn like [reasoning, function_call, message, message] still resends the second orphaned message and can hit the same provider 400 this helper is meant to avoid; the consumed state should stay active until a call output or a new reasoning item resets it.

Useful? React with 👍 / 👎.

…ion tracker

Map of every path that assembles input[] for the model, cross-checked against
which ones already call drop_orphan_function_calls (all of them should also
call drop_orphaned_messages_after_consumed_reasoning):

  Path                                      Before  After
  prepare_model_input_items                  done    done
  prepare_input_with_session                 done    done
  normalize_resumed_input (RunState resume)  miss    fixed
  OpenAIServerConversationTracker.prepare    miss    fixed

Changes:
- items.py: normalize_resumed_input chains drop_orphaned_messages_after_consumed_reasoning
  after drop_orphan_function_calls (same pattern as the other call sites)
- oai_conversation.py: import + one-line call after drop_orphan_function_calls in
  OpenAIServerConversationTracker.prepare_input; id() tracking is safe because the
  function returns items from the input list without copying

Tests added:
- test_normalize_resumed_input_drops_orphaned_message_after_consumed_reasoning
- test_server_conversation_tracker_drops_orphaned_message_after_consumed_reasoning

All 155 tests pass.
Resetting consumed_by_call after the first dropped message meant a second
orphaned message in the same turn — e.g. [reasoning, function_call, msg1,
msg2, function_call_output] — would slip through unchecked.

Remove the reset from the message branch entirely. The only correct reset
point is _CALL_OUTPUT_TYPES (the call output item), which marks the actual
turn boundary. Messages that arrive before any call output while
consumed_by_call is True are all orphaned and are all dropped; messages
that arrive after the call output (delegate agent, next turn) are
unaffected because the flag has already been cleared.

Add test_normalize_resumed_input_drops_multiple_orphaned_messages_in_same_turn
to cover this case explicitly.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 20da2dce0a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 275 to +276
filtered_generated_items = drop_orphan_function_calls(normalized_generated_items)
filtered_generated_items = drop_orphaned_messages_after_consumed_reasoning(filtered_generated_items)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Drop messages before removing their orphaned call context

When a saved/generated turn is [reasoning, function_call, message] with no function_call_output (for example, a resumed or failed tool call before any output was persisted), drop_orphan_function_calls() removes the call and its preceding reasoning before the new message-pruning pass runs. The remaining assistant message no longer has the consumed-call context, so it is preserved and can still be sent without its required reasoning item, which is the provider 400 this PR is trying to prevent; either prune these trailing messages before removing orphan calls or have the orphan-call pass remove messages tied to the dropped call as well.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant