Skip to content

Python: Fix _deduplicate_messages catch-all branch dropping valid repeated messages#4716

Open
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4682-1
Open

Python: Fix _deduplicate_messages catch-all branch dropping valid repeated messages#4716
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4682-1

Conversation

@moonbox3
Copy link
Contributor

Motivation and Context

The catch-all else branch in _deduplicate_messages hashed message content and skipped any message whose role+content matched a previously seen message. This incorrectly dropped legitimate repeated user messages (e.g., repeated "yes" confirmations) and repeated system messages, silently altering conversation history.

Fixes #4682

Description

The root cause was that the else branch applied content-based deduplication to all message types not handled by the earlier tool_call_id / assistant-tool-call branches, using a (role, hash(content)) key. This is wrong because identical content at different positions in a conversation carries distinct semantic meaning. The fix removes the hash-based deduplication from the catch-all branch so that non-tool, non-assistant-tool-call messages are always preserved. Regression tests verify that repeated user confirmations, identical user messages, and duplicate system messages are all kept.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

…ssages (microsoft#4682)

Remove the catch-all dedup branch that used (role, hash(content_str)) as a
dedup key. This incorrectly treated any two messages with the same role and
identical content as duplicates, dropping valid repeated messages (e.g., a
user saying 'yes' to confirm two separate things).

The tool-specific dedup branches (tool results by call_id, assistant tool
calls by call_id tuple) remain unchanged as they correctly identify true
protocol-level duplicates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 16, 2026 03:05
@moonbox3 moonbox3 self-assigned this Mar 16, 2026
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 87%

✓ Correctness

This diff removes content-based deduplication for general (non-tool-call, non-tool-result) messages, so identical user/system/assistant messages at different points in a conversation are preserved rather than collapsed. The change is logically correct: the seen_keys dictionary remains properly used by the two tool-related branches, and the else branch now simply appends. The updated and new tests correctly validate the new behavior. No correctness issues found.

✓ Security Reliability

The diff removes hash-based deduplication of general (non-tool) messages, which previously could silently drop legitimate repeated user messages (e.g., repeated confirmations). The change improves reliability by preserving all general messages and removes use of Python's non-stable hash() on stringified content. No new security or reliability issues are introduced. Tests are updated to reflect the corrected behavior.

✓ Test Coverage

The diff removes general-message deduplication logic and updates tests accordingly. The three tests covering the new behaviour (duplicate user messages preserved, interleaved confirmations preserved, repeated system messages preserved) are reasonable in scope, but all of them assert only on list length without verifying message content or ordering. For test_deduplicate_preserves_repeated_confirmations in particular, which mixes roles, a length-only check does not prove the messages are in the correct order or that the right messages survived. Additionally, there is no test confirming that messages with contents=None (empty content) still pass through the now-simplified else branch without error, which is a reachable edge case.

✗ Design Approach

The diff removes deduplication logic for the 'else' branch of _deduplicate_messages, which previously deduplicated general messages (user, system, etc.) by content hash. The fix is justified if the original deduplication was causing legitimate messages to be dropped (e.g., repeated user confirmations). However, the approach is a complete removal rather than a targeted fix — it swings from 'always deduplicate same-content messages' to 'never deduplicate same-content messages'. The deeper question is: what is the correct deduplication scope? The tool-call branches deduplicate by structural identity (call_id, tool_call_ids), which is meaningful because duplicate tool results/calls are usually protocol artifacts. For regular messages, the same-content check was catching a different class of duplicates (e.g., the same message appended twice due to a bug upstream). Silently dropping that check means any upstream bug that sends duplicate messages will now silently propagate. A better approach would be to narrow the scope — e.g., only deduplicate consecutive identical messages, or only deduplicate when role is 'system', rather than removing the check entirely. Additionally, seen_keys is now populated only for tool and assistant-tool-call branches, but it is still declared and passed around unused for the else branch, which is a latent confusion.

Flagged Issues

  • The fix completely removes content-based deduplication for non-tool messages rather than narrowing it. Any upstream source that accidentally sends duplicate user/system messages (a real scenario during retries or reconnects) will silently propagate duplicates to the LLM, potentially causing unexpected behavior. A scoped fix—e.g., deduplicating only consecutive identical messages—would address the reported regression without losing all protection.

Suggestions

  • Consider replacing the blanket removal with consecutive-duplicate detection: only skip a message if the immediately preceding message has the same role and content hash. This preserves protection against the common upstream-resend artifact while allowing the same message to appear legitimately at different points in the conversation (e.g., repeated user confirmations after different assistant turns).
  • The seen_keys dict is now unused for the else branch but is still declared and maintained for tool branches. Add a comment or restructure to make clear why the else branch intentionally does no deduplication, so future readers don't treat it as an oversight.
  • In the new tests, assert on message identity and order (e.g., assert result == [confirm1, assistant, confirm2]) rather than just len(result). A length-only assertion would still pass if the function returned the wrong messages or reordered them.
  • Consider adding a test for messages with contents=None going through the simplified else branch, to verify no AttributeError or unexpected filtering occurs.
  • Consider whether unbounded preservation of duplicate general messages could be a resource-exhaustion vector if message lists originate from untrusted input. A configurable upper bound on total message count may be warranted as a defense-in-depth measure.

Automated review by moonbox3's agents

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes AG-UI message history handling by ensuring _deduplicate_messages no longer drops legitimate repeated non-tool messages (e.g., repeated user confirmations or identical system messages), while keeping the tool-result and assistant-tool-call deduplication behavior.

Changes:

  • Removed catch-all (role, hash(content)) deduplication so general messages are always preserved.
  • Updated/added regression tests verifying repeated user/system messages are retained.
  • Kept existing dedup rules for tool results (by call_id) and assistant tool calls (by tool call IDs).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py Removes content-hash-based deduplication in the catch-all branch so repeated general messages are preserved.
python/packages/ag-ui/tests/ag_ui/test_message_adapters.py Updates existing test expectation and adds new regression tests for repeated confirmations and system messages.

You can also share your feedback on Copilot code review. Take the survey.

@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Mar 16, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/ag-ui/agent_framework_ag_ui
   _message_adapters.py5925590%102–103, 112–115, 118–122, 124–129, 132, 141–147, 187, 318, 408–411, 413–415, 462–464, 518, 521, 523, 526, 529, 545, 562, 584, 680, 696–697, 768, 790, 860, 895–896, 964, 1007
TOTAL24000264288% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
5230 20 💤 0 ❌ 0 🔥 1m 24s ⏱️

Copilot and others added 2 commits March 16, 2026 03:09
…microsoft#4682)

- Replace blanket dedup removal with consecutive-duplicate detection:
  only skip a message if the immediately preceding message has the same
  role and content, preserving protection against upstream replays while
  allowing identical messages at different conversation points.
- Strengthen test assertions to verify message identity and order, not
  just list length.
- Add tests for consecutive duplicate skipping, non-consecutive
  preservation, and messages with contents=None.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 96% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by moonbox3's agents

@moonbox3 moonbox3 added the ag-ui label Mar 16, 2026
@moonbox3 moonbox3 enabled auto-merge March 16, 2026 03:18
Deduplicate general messages by message_id when available, replacing
the consecutive-duplicate content check. Two messages with the same id
are definitively the same message (upstream replay), while identical
content with distinct ids (e.g. repeated "yes" confirmations) is
preserved. Messages without a message_id are always kept.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates AG-UI message deduplication to avoid dropping legitimate repeated non-tool messages by removing content-hash-based deduplication and switching to message-id-based deduplication when available.

Changes:

  • Change _deduplicate_messages to deduplicate general messages by message_id (when present) rather than (role, hash(content)).
  • Replace/expand regression tests to ensure repeated user/system messages with distinct IDs are preserved and same-ID replays are deduped.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py Adjusts catch-all deduplication logic to rely on message_id (when present) instead of content hashing.
python/packages/ag-ui/tests/ag_ui/test_message_adapters.py Adds regression tests covering repeated messages with distinct IDs and same-ID replays.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 85%

✓ Correctness

The change replaces content-hash-based deduplication with message_id-based deduplication in the else branch. The logic is correct: messages with a message_id are deduplicated by that id, and messages without a message_id are always preserved (no false-positive dedup of repeated identical content like 'yes' confirmations). The key tuple ('id', msg.message_id) avoids collisions with keys from the tool-result and assistant-tool-call branches (which use role_value as the first element). Tests comprehensively cover the new behavior. No correctness bugs found.

✓ Security Reliability

The change replaces content-hash deduplication with message_id-based deduplication. This is a sound design improvement that avoids false-positive dedup of intentionally repeated messages. However, it introduces a reliability gap: messages without a message_id are now never deduplicated, meaning a misbehaving or replaying client that omits message_ids can cause unbounded message accumulation. The logging of raw message_id values also presents a minor log-injection surface. No blocking security issues found.

✓ Test Coverage

Test coverage for the new message_id-based deduplication logic is thorough. The tests cover the main branches well: dedup by same message_id, preservation with distinct ids, no-id passthrough, None contents, and mixed id/no-id scenarios. One minor gap: there is no test verifying behavior when message_id is an empty string (which is not None and would be used as a dedup key, potentially causing unintended collisions).

✗ Design Approach

The change replaces content-hash-based deduplication with ID-based deduplication for general messages. The motivation is correct — content hashing caused false positives (deduplicating legitimate repeated messages like "yes" confirmations). However, the fix goes too far in the other direction: when message_id is absent (None), the new code performs no deduplication at all, silently abandoning the protection the function was designed to provide. The test test_deduplicate_without_message_id_always_preserves explicitly enshrines this as intended behavior, but it means the function's deduplication guarantee is now conditional on callers setting message_id. A better approach is to use ID-based deduplication when IDs are present and fall back to content-based deduplication when they are not, preserving best-effort protection without regressing on the no-ID case.

Flagged Issues

  • When message_id is None, all deduplication is silently disabled. Any call path that does not populate message_id (e.g., older integrations, third-party adapters) will accumulate genuine duplicates the old code correctly suppressed. Prefer a two-tier strategy: use ID-based deduplication when message_id is present, and fall back to content-based deduplication when it is absent.

Suggestions

  • Guard against empty-string message_id values: msg.message_id is not None treats message_id="" as a valid dedup key, collapsing all such messages into one. A truthy check (if msg.message_id:) would be safer. Add a corresponding test for the empty-string edge case to document expected behavior.
  • Sanitise message_id before logging (line 255). If message_id originates from an untrusted source and contains newlines or control characters, it can split or corrupt log output. Use repr(msg.message_id) or strip control characters.
  • Consider logging a debug-level warning when a message lacks a message_id, so operators can detect integrations that silently lose deduplication and risk unbounded context growth.

Automated review by moonbox3's agents

Copilot and others added 2 commits March 16, 2026 06:12
- Use truthy check (`if msg.message_id`) instead of `is not None` so
  empty-string IDs fall through to content-hash dedup rather than
  collapsing unrelated messages.
- Add content-hash fallback for messages without message_id, preventing
  false negatives from integrations that don't set IDs.
- Remove raw message_id from log format string (addresses log-injection
  surface with control characters).
- Add tests for empty-string message_id edge cases.
- Update existing tests to reflect content-hash dedup behavior.

Fixes microsoft#4682

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SogoKato
Copy link

Thanks for adding the fallback! One concern: using a global seen_keys for content hashes might accidentally remove intentional repeats (e.g., a user saying "Yes" or "Confirm" multiple times in one session).

What if we limit the fallback check to consecutive messages only? This catches UI glitches while preserving legitimate repeats later in the chat:

last_key = None

for msg in messages:
    # ... (existing logic) ...

    elif msg.message_id:
        # ... (message_id logic)

    # Fallback: check only if it matches the immediately preceding message
    else:
        key = (role_value, hash(content_str))
        if key == last_key:
            continue

        last_key = key
        unique_messages.append(msg)

This seems safer for natural conversation flow. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: ag-ui _deduplicate_messages catch-all branch can drop valid repeated messages

5 participants