Python: Fix _deduplicate_messages catch-all branch dropping valid repeated messages by moonbox3 · Pull Request #4716 · microsoft/agent-framework

moonbox3 · 2026-03-16T03:05:56Z

Motivation and Context

The catch-all else branch in _deduplicate_messages hashed message content and skipped any message whose role+content matched a previously seen message. This incorrectly dropped legitimate repeated user messages (e.g., repeated "yes" confirmations) and repeated system messages, silently altering conversation history.

Fixes #4682

Description

The root cause was that the else branch applied content-based deduplication to all message types not handled by the earlier tool_call_id / assistant-tool-call branches, using a (role, hash(content)) key. This is wrong because identical content at different positions in a conversation carries distinct semantic meaning. The fix removes the hash-based deduplication from the catch-all branch so that non-tool, non-assistant-tool-call messages are always preserved. Regression tests verify that repeated user confirmations, identical user messages, and duplicate system messages are all kept.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

…ssages (microsoft#4682) Remove the catch-all dedup branch that used (role, hash(content_str)) as a dedup key. This incorrectly treated any two messages with the same role and identical content as duplicates, dropping valid repeated messages (e.g., a user saying 'yes' to confirm two separate things). The tool-specific dedup branches (tool results by call_id, assistant tool calls by call_id tuple) remain unchanged as they correctly identify true protocol-level duplicates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 87%

✓ Correctness

This diff removes content-based deduplication for general (non-tool-call, non-tool-result) messages, so identical user/system/assistant messages at different points in a conversation are preserved rather than collapsed. The change is logically correct: the seen_keys dictionary remains properly used by the two tool-related branches, and the else branch now simply appends. The updated and new tests correctly validate the new behavior. No correctness issues found.

✓ Security Reliability

The diff removes hash-based deduplication of general (non-tool) messages, which previously could silently drop legitimate repeated user messages (e.g., repeated confirmations). The change improves reliability by preserving all general messages and removes use of Python's non-stable hash() on stringified content. No new security or reliability issues are introduced. Tests are updated to reflect the corrected behavior.

✓ Test Coverage

The diff removes general-message deduplication logic and updates tests accordingly. The three tests covering the new behaviour (duplicate user messages preserved, interleaved confirmations preserved, repeated system messages preserved) are reasonable in scope, but all of them assert only on list length without verifying message content or ordering. For test_deduplicate_preserves_repeated_confirmations in particular, which mixes roles, a length-only check does not prove the messages are in the correct order or that the right messages survived. Additionally, there is no test confirming that messages with contents=None (empty content) still pass through the now-simplified else branch without error, which is a reachable edge case.

✗ Design Approach

The diff removes deduplication logic for the 'else' branch of _deduplicate_messages, which previously deduplicated general messages (user, system, etc.) by content hash. The fix is justified if the original deduplication was causing legitimate messages to be dropped (e.g., repeated user confirmations). However, the approach is a complete removal rather than a targeted fix — it swings from 'always deduplicate same-content messages' to 'never deduplicate same-content messages'. The deeper question is: what is the correct deduplication scope? The tool-call branches deduplicate by structural identity (call_id, tool_call_ids), which is meaningful because duplicate tool results/calls are usually protocol artifacts. For regular messages, the same-content check was catching a different class of duplicates (e.g., the same message appended twice due to a bug upstream). Silently dropping that check means any upstream bug that sends duplicate messages will now silently propagate. A better approach would be to narrow the scope — e.g., only deduplicate consecutive identical messages, or only deduplicate when role is 'system', rather than removing the check entirely. Additionally, seen_keys is now populated only for tool and assistant-tool-call branches, but it is still declared and passed around unused for the else branch, which is a latent confusion.

Flagged Issues

The fix completely removes content-based deduplication for non-tool messages rather than narrowing it. Any upstream source that accidentally sends duplicate user/system messages (a real scenario during retries or reconnects) will silently propagate duplicates to the LLM, potentially causing unexpected behavior. A scoped fix—e.g., deduplicating only consecutive identical messages—would address the reported regression without losing all protection.

Suggestions

Consider replacing the blanket removal with consecutive-duplicate detection: only skip a message if the immediately preceding message has the same role and content hash. This preserves protection against the common upstream-resend artifact while allowing the same message to appear legitimately at different points in the conversation (e.g., repeated user confirmations after different assistant turns).
The seen_keys dict is now unused for the else branch but is still declared and maintained for tool branches. Add a comment or restructure to make clear why the else branch intentionally does no deduplication, so future readers don't treat it as an oversight.
In the new tests, assert on message identity and order (e.g., assert result == [confirm1, assistant, confirm2]) rather than just len(result). A length-only assertion would still pass if the function returned the wrong messages or reordered them.
Consider adding a test for messages with contents=None going through the simplified else branch, to verify no AttributeError or unexpected filtering occurs.
Consider whether unbounded preservation of duplicate general messages could be a resource-exhaustion vector if message lists originate from untrusted input. A configurable upper bound on total message count may be warranted as a defense-in-depth measure.

Automated review by moonbox3's agents

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py

python/packages/ag-ui/tests/ag_ui/test_message_adapters.py

Copilot

Pull request overview

Fixes AG-UI message history handling by ensuring _deduplicate_messages no longer drops legitimate repeated non-tool messages (e.g., repeated user confirmations or identical system messages), while keeping the tool-result and assistant-tool-call deduplication behavior.

Changes:

Removed catch-all (role, hash(content)) deduplication so general messages are always preserved.
Updated/added regression tests verifying repeated user/system messages are retained.
Kept existing dedup rules for tool results (by call_id) and assistant tool calls (by tool call IDs).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py	Removes content-hash-based deduplication in the catch-all branch so repeated general messages are preserved.
python/packages/ag-ui/tests/ag_ui/test_message_adapters.py	Updates existing test expectation and adds new regression tests for repeated confirmations and system messages.

You can also share your feedback on Copilot code review. Take the survey.

markwallace-microsoft · 2026-03-16T03:08:17Z

Python Test Coverage Report •

File	Stmts	Miss	Cover	Missing
packages/ag-ui/agent_framework_ag_ui
_message_adapters.py	592	55	90%	102–103, 112–115, 118–122, 124–129, 132, 141–147, 187, 318, 408–411, 413–415, 462–464, 518, 521, 523, 526, 529, 545, 562, 584, 680, 696–697, 768, 790, 860, 895–896, 964, 1007
TOTAL	24000	2642	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5230	20 💤	0 ❌	0 🔥	1m 24s ⏱️

…microsoft#4682) - Replace blanket dedup removal with consecutive-duplicate detection: only skip a message if the immediately preceding message has the same role and content, preserving protection against upstream replays while allowing identical messages at different conversation points. - Strengthen test assertions to verify message identity and order, not just list length. - Add tests for consecutive duplicate skipping, non-consecutive preservation, and messages with contents=None. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 96% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach

Automated review by moonbox3's agents

Deduplicate general messages by message_id when available, replacing the consecutive-duplicate content check. Two messages with the same id are definitively the same message (upstream replay), while identical content with distinct ids (e.g. repeated "yes" confirmations) is preserved. Messages without a message_id are always kept.

Copilot

Pull request overview

Updates AG-UI message deduplication to avoid dropping legitimate repeated non-tool messages by removing content-hash-based deduplication and switching to message-id-based deduplication when available.

Changes:

Change _deduplicate_messages to deduplicate general messages by message_id (when present) rather than (role, hash(content)).
Replace/expand regression tests to ensure repeated user/system messages with distinct IDs are preserved and same-ID replays are deduped.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py`	Adjusts catch-all deduplication logic to rely on `message_id` (when present) instead of content hashing.
`python/packages/ag-ui/tests/ag_ui/test_message_adapters.py`	Adds regression tests covering repeated messages with distinct IDs and same-ID replays.

You can also share your feedback on Copilot code review. Take the survey.

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 85%

✓ Correctness

The change replaces content-hash-based deduplication with message_id-based deduplication in the else branch. The logic is correct: messages with a message_id are deduplicated by that id, and messages without a message_id are always preserved (no false-positive dedup of repeated identical content like 'yes' confirmations). The key tuple ('id', msg.message_id) avoids collisions with keys from the tool-result and assistant-tool-call branches (which use role_value as the first element). Tests comprehensively cover the new behavior. No correctness bugs found.

✓ Security Reliability

The change replaces content-hash deduplication with message_id-based deduplication. This is a sound design improvement that avoids false-positive dedup of intentionally repeated messages. However, it introduces a reliability gap: messages without a message_id are now never deduplicated, meaning a misbehaving or replaying client that omits message_ids can cause unbounded message accumulation. The logging of raw message_id values also presents a minor log-injection surface. No blocking security issues found.

✓ Test Coverage

Test coverage for the new message_id-based deduplication logic is thorough. The tests cover the main branches well: dedup by same message_id, preservation with distinct ids, no-id passthrough, None contents, and mixed id/no-id scenarios. One minor gap: there is no test verifying behavior when message_id is an empty string (which is not None and would be used as a dedup key, potentially causing unintended collisions).

✗ Design Approach

The change replaces content-hash-based deduplication with ID-based deduplication for general messages. The motivation is correct — content hashing caused false positives (deduplicating legitimate repeated messages like "yes" confirmations). However, the fix goes too far in the other direction: when message_id is absent (None), the new code performs no deduplication at all, silently abandoning the protection the function was designed to provide. The test test_deduplicate_without_message_id_always_preserves explicitly enshrines this as intended behavior, but it means the function's deduplication guarantee is now conditional on callers setting message_id. A better approach is to use ID-based deduplication when IDs are present and fall back to content-based deduplication when they are not, preserving best-effort protection without regressing on the no-ID case.

Flagged Issues

When message_id is None, all deduplication is silently disabled. Any call path that does not populate message_id (e.g., older integrations, third-party adapters) will accumulate genuine duplicates the old code correctly suppressed. Prefer a two-tier strategy: use ID-based deduplication when message_id is present, and fall back to content-based deduplication when it is absent.

Suggestions

Guard against empty-string message_id values: msg.message_id is not None treats message_id="" as a valid dedup key, collapsing all such messages into one. A truthy check (if msg.message_id:) would be safer. Add a corresponding test for the empty-string edge case to document expected behavior.
Sanitise message_id before logging (line 255). If message_id originates from an untrusted source and contains newlines or control characters, it can split or corrupt log output. Use repr(msg.message_id) or strip control characters.
Consider logging a debug-level warning when a message lacks a message_id, so operators can detect integrations that silently lose deduplication and risk unbounded context growth.

Automated review by moonbox3's agents

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py

- Use truthy check (`if msg.message_id`) instead of `is not None` so empty-string IDs fall through to content-hash dedup rather than collapsing unrelated messages. - Add content-hash fallback for messages without message_id, preventing false negatives from integrations that don't set IDs. - Remove raw message_id from log format string (addresses log-injection surface with control characters). - Add tests for empty-string message_id edge cases. - Update existing tests to reflect content-hash dedup behavior. Fixes microsoft#4682 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

SogoKato · 2026-03-16T07:35:51Z

Thanks for adding the fallback! One concern: using a global seen_keys for content hashes might accidentally remove intentional repeats (e.g., a user saying "Yes" or "Confirm" multiple times in one session).

What if we limit the fallback check to consecutive messages only? This catches UI glitches while preserving legitimate repeats later in the chat:

last_key = None

for msg in messages:
    # ... (existing logic) ...

    elif msg.message_id:
        # ... (message_id logic)

    # Fallback: check only if it matches the immediately preceding message
    else:
        key = (role_value, hash(content_str))
        if key == last_key:
            continue

        last_key = key
        unique_messages.append(msg)

This seems safer for natural conversation flow. What do you think?

Copilot AI review requested due to automatic review settings March 16, 2026 03:05

moonbox3 self-assigned this Mar 16, 2026

markwallace-microsoft added the python label Mar 16, 2026

Copilot started reviewing on behalf of moonbox3 March 16, 2026 03:06 View session

moonbox3 commented Mar 16, 2026

View reviewed changes

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py Show resolved Hide resolved

python/packages/ag-ui/tests/ag_ui/test_message_adapters.py Show resolved Hide resolved

python/packages/ag-ui/tests/ag_ui/test_message_adapters.py Show resolved Hide resolved

Copilot AI reviewed Mar 16, 2026

View reviewed changes

Copilot and others added 2 commits March 16, 2026 03:09

Apply pre-commit auto-fixes

83cfd01

moonbox3 commented Mar 16, 2026

View reviewed changes

moonbox3 added the ag-ui label Mar 16, 2026

moonbox3 enabled auto-merge March 16, 2026 03:18

moonbox3 requested a review from Copilot March 16, 2026 05:43

Copilot started reviewing on behalf of moonbox3 March 16, 2026 05:43 View session

Copilot AI reviewed Mar 16, 2026

View reviewed changes

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py Outdated Show resolved Hide resolved

moonbox3 commented Mar 16, 2026

View reviewed changes

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py Show resolved Hide resolved

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py Show resolved Hide resolved

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py Show resolved Hide resolved

Copilot and others added 2 commits March 16, 2026 06:12

Merge remote-tracking branch 'origin/main' into agent/fix-4682-1

02861e9

eavanvalkenburg approved these changes Mar 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Fix _deduplicate_messages catch-all branch dropping valid repeated messages#4716

Python: Fix _deduplicate_messages catch-all branch dropping valid repeated messages#4716
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4682-1

moonbox3 commented Mar 16, 2026

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

markwallace-microsoft commented Mar 16, 2026 •

edited

Loading

Uh oh!

moonbox3 left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SogoKato commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

moonbox3 commented Mar 16, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

✗ Design Approach

Flagged Issues

Suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

markwallace-microsoft commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python Unit Test Overview

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

✗ Design Approach

Flagged Issues

Suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SogoKato commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

markwallace-microsoft commented Mar 16, 2026 •

edited

Loading