Python: Fix _deduplicate_messages catch-all branch dropping valid repeated messages#4716
Python: Fix _deduplicate_messages catch-all branch dropping valid repeated messages#4716moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
Conversation
…ssages (microsoft#4682) Remove the catch-all dedup branch that used (role, hash(content_str)) as a dedup key. This incorrectly treated any two messages with the same role and identical content as duplicates, dropping valid repeated messages (e.g., a user saying 'yes' to confirm two separate things). The tool-specific dedup branches (tool results by call_id, assistant tool calls by call_id tuple) remain unchanged as they correctly identify true protocol-level duplicates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 87%
✓ Correctness
This diff removes content-based deduplication for general (non-tool-call, non-tool-result) messages, so identical user/system/assistant messages at different points in a conversation are preserved rather than collapsed. The change is logically correct: the
seen_keysdictionary remains properly used by the two tool-related branches, and the else branch now simply appends. The updated and new tests correctly validate the new behavior. No correctness issues found.
✓ Security Reliability
The diff removes hash-based deduplication of general (non-tool) messages, which previously could silently drop legitimate repeated user messages (e.g., repeated confirmations). The change improves reliability by preserving all general messages and removes use of Python's non-stable
hash()on stringified content. No new security or reliability issues are introduced. Tests are updated to reflect the corrected behavior.
✓ Test Coverage
The diff removes general-message deduplication logic and updates tests accordingly. The three tests covering the new behaviour (duplicate user messages preserved, interleaved confirmations preserved, repeated system messages preserved) are reasonable in scope, but all of them assert only on list length without verifying message content or ordering. For
test_deduplicate_preserves_repeated_confirmationsin particular, which mixes roles, a length-only check does not prove the messages are in the correct order or that the right messages survived. Additionally, there is no test confirming that messages withcontents=None(empty content) still pass through the now-simplified else branch without error, which is a reachable edge case.
✗ Design Approach
The diff removes deduplication logic for the 'else' branch of
_deduplicate_messages, which previously deduplicated general messages (user, system, etc.) by content hash. The fix is justified if the original deduplication was causing legitimate messages to be dropped (e.g., repeated user confirmations). However, the approach is a complete removal rather than a targeted fix — it swings from 'always deduplicate same-content messages' to 'never deduplicate same-content messages'. The deeper question is: what is the correct deduplication scope? The tool-call branches deduplicate by structural identity (call_id, tool_call_ids), which is meaningful because duplicate tool results/calls are usually protocol artifacts. For regular messages, the same-content check was catching a different class of duplicates (e.g., the same message appended twice due to a bug upstream). Silently dropping that check means any upstream bug that sends duplicate messages will now silently propagate. A better approach would be to narrow the scope — e.g., only deduplicate consecutive identical messages, or only deduplicate when role is 'system', rather than removing the check entirely. Additionally,seen_keysis now populated only for tool and assistant-tool-call branches, but it is still declared and passed around unused for the else branch, which is a latent confusion.
Flagged Issues
- The fix completely removes content-based deduplication for non-tool messages rather than narrowing it. Any upstream source that accidentally sends duplicate user/system messages (a real scenario during retries or reconnects) will silently propagate duplicates to the LLM, potentially causing unexpected behavior. A scoped fix—e.g., deduplicating only consecutive identical messages—would address the reported regression without losing all protection.
Suggestions
- Consider replacing the blanket removal with consecutive-duplicate detection: only skip a message if the immediately preceding message has the same role and content hash. This preserves protection against the common upstream-resend artifact while allowing the same message to appear legitimately at different points in the conversation (e.g., repeated user confirmations after different assistant turns).
- The
seen_keysdict is now unused for the else branch but is still declared and maintained for tool branches. Add a comment or restructure to make clear why the else branch intentionally does no deduplication, so future readers don't treat it as an oversight. - In the new tests, assert on message identity and order (e.g.,
assert result == [confirm1, assistant, confirm2]) rather than justlen(result). A length-only assertion would still pass if the function returned the wrong messages or reordered them. - Consider adding a test for messages with
contents=Nonegoing through the simplified else branch, to verify noAttributeErroror unexpected filtering occurs. - Consider whether unbounded preservation of duplicate general messages could be a resource-exhaustion vector if message lists originate from untrusted input. A configurable upper bound on total message count may be warranted as a defense-in-depth measure.
Automated review by moonbox3's agents
There was a problem hiding this comment.
Pull request overview
Fixes AG-UI message history handling by ensuring _deduplicate_messages no longer drops legitimate repeated non-tool messages (e.g., repeated user confirmations or identical system messages), while keeping the tool-result and assistant-tool-call deduplication behavior.
Changes:
- Removed catch-all
(role, hash(content))deduplication so general messages are always preserved. - Updated/added regression tests verifying repeated user/system messages are retained.
- Kept existing dedup rules for tool results (by
call_id) and assistant tool calls (by tool call IDs).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py | Removes content-hash-based deduplication in the catch-all branch so repeated general messages are preserved. |
| python/packages/ag-ui/tests/ag_ui/test_message_adapters.py | Updates existing test expectation and adds new regression tests for repeated confirmations and system messages. |
You can also share your feedback on Copilot code review. Take the survey.
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
…microsoft#4682) - Replace blanket dedup removal with consecutive-duplicate detection: only skip a message if the immediately preceding message has the same role and content, preserving protection against upstream replays while allowing identical messages at different conversation points. - Strengthen test assertions to verify message identity and order, not just list length. - Add tests for consecutive duplicate skipping, non-consecutive preservation, and messages with contents=None. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 96% | Result: All clear
Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach
Automated review by moonbox3's agents
Deduplicate general messages by message_id when available, replacing the consecutive-duplicate content check. Two messages with the same id are definitively the same message (upstream replay), while identical content with distinct ids (e.g. repeated "yes" confirmations) is preserved. Messages without a message_id are always kept.
There was a problem hiding this comment.
Pull request overview
Updates AG-UI message deduplication to avoid dropping legitimate repeated non-tool messages by removing content-hash-based deduplication and switching to message-id-based deduplication when available.
Changes:
- Change
_deduplicate_messagesto deduplicate general messages bymessage_id(when present) rather than(role, hash(content)). - Replace/expand regression tests to ensure repeated user/system messages with distinct IDs are preserved and same-ID replays are deduped.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py |
Adjusts catch-all deduplication logic to rely on message_id (when present) instead of content hashing. |
python/packages/ag-ui/tests/ag_ui/test_message_adapters.py |
Adds regression tests covering repeated messages with distinct IDs and same-ID replays. |
You can also share your feedback on Copilot code review. Take the survey.
python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py
Outdated
Show resolved
Hide resolved
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 85%
✓ Correctness
The change replaces content-hash-based deduplication with message_id-based deduplication in the else branch. The logic is correct: messages with a message_id are deduplicated by that id, and messages without a message_id are always preserved (no false-positive dedup of repeated identical content like 'yes' confirmations). The key tuple ('id', msg.message_id) avoids collisions with keys from the tool-result and assistant-tool-call branches (which use role_value as the first element). Tests comprehensively cover the new behavior. No correctness bugs found.
✓ Security Reliability
The change replaces content-hash deduplication with message_id-based deduplication. This is a sound design improvement that avoids false-positive dedup of intentionally repeated messages. However, it introduces a reliability gap: messages without a message_id are now never deduplicated, meaning a misbehaving or replaying client that omits message_ids can cause unbounded message accumulation. The logging of raw message_id values also presents a minor log-injection surface. No blocking security issues found.
✓ Test Coverage
Test coverage for the new message_id-based deduplication logic is thorough. The tests cover the main branches well: dedup by same message_id, preservation with distinct ids, no-id passthrough, None contents, and mixed id/no-id scenarios. One minor gap: there is no test verifying behavior when
message_idis an empty string (which is not None and would be used as a dedup key, potentially causing unintended collisions).
✗ Design Approach
The change replaces content-hash-based deduplication with ID-based deduplication for general messages. The motivation is correct — content hashing caused false positives (deduplicating legitimate repeated messages like "yes" confirmations). However, the fix goes too far in the other direction: when
message_idis absent (None), the new code performs no deduplication at all, silently abandoning the protection the function was designed to provide. The testtest_deduplicate_without_message_id_always_preservesexplicitly enshrines this as intended behavior, but it means the function's deduplication guarantee is now conditional on callers settingmessage_id. A better approach is to use ID-based deduplication when IDs are present and fall back to content-based deduplication when they are not, preserving best-effort protection without regressing on the no-ID case.
Flagged Issues
- When message_id is None, all deduplication is silently disabled. Any call path that does not populate message_id (e.g., older integrations, third-party adapters) will accumulate genuine duplicates the old code correctly suppressed. Prefer a two-tier strategy: use ID-based deduplication when message_id is present, and fall back to content-based deduplication when it is absent.
Suggestions
- Guard against empty-string message_id values:
msg.message_id is not Nonetreatsmessage_id=""as a valid dedup key, collapsing all such messages into one. A truthy check (if msg.message_id:) would be safer. Add a corresponding test for the empty-string edge case to document expected behavior. - Sanitise message_id before logging (line 255). If message_id originates from an untrusted source and contains newlines or control characters, it can split or corrupt log output. Use
repr(msg.message_id)or strip control characters. - Consider logging a debug-level warning when a message lacks a message_id, so operators can detect integrations that silently lose deduplication and risk unbounded context growth.
Automated review by moonbox3's agents
- Use truthy check (`if msg.message_id`) instead of `is not None` so empty-string IDs fall through to content-hash dedup rather than collapsing unrelated messages. - Add content-hash fallback for messages without message_id, preventing false negatives from integrations that don't set IDs. - Remove raw message_id from log format string (addresses log-injection surface with control characters). - Add tests for empty-string message_id edge cases. - Update existing tests to reflect content-hash dedup behavior. Fixes microsoft#4682 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks for adding the fallback! One concern: using a global seen_keys for content hashes might accidentally remove intentional repeats (e.g., a user saying "Yes" or "Confirm" multiple times in one session). What if we limit the fallback check to consecutive messages only? This catches UI glitches while preserving legitimate repeats later in the chat: last_key = None
for msg in messages:
# ... (existing logic) ...
elif msg.message_id:
# ... (message_id logic)
# Fallback: check only if it matches the immediately preceding message
else:
key = (role_value, hash(content_str))
if key == last_key:
continue
last_key = key
unique_messages.append(msg)This seems safer for natural conversation flow. What do you think? |
Motivation and Context
The catch-all
elsebranch in_deduplicate_messageshashed message content and skipped any message whose role+content matched a previously seen message. This incorrectly dropped legitimate repeated user messages (e.g., repeated "yes" confirmations) and repeated system messages, silently altering conversation history.Fixes #4682
Description
The root cause was that the
elsebranch applied content-based deduplication to all message types not handled by the earliertool_call_id/ assistant-tool-call branches, using a(role, hash(content))key. This is wrong because identical content at different positions in a conversation carries distinct semantic meaning. The fix removes the hash-based deduplication from the catch-all branch so that non-tool, non-assistant-tool-call messages are always preserved. Regression tests verify that repeated user confirmations, identical user messages, and duplicate system messages are all kept.Contribution Checklist