Python: Fix HTML-like tags being dropped in ChatHistory.from_rendered_prompt by jgarrison929 · Pull Request #13659 · microsoft/semantic-kernel

jgarrison929 · 2026-03-13T14:21:31Z

Motivation and Context

When a prompt contains valid XML elements like <p>, <div>, <b>, etc., the XML parser in ChatHistory.from_rendered_prompt treats them as elements but silently discards their content.

Description

This PR fixes the issue by serializing unrecognized XML elements back to their string representation using tostring().

After the fix:

Prompts with HTML-like tags are preserved exactly as written
No change in behavior for prompts using recognized chat message tags

Contribution Checklist

The code builds clean without any errors or warnings
Tests have been added that prove the fix works
All existing tests pass

…_prompt When a prompt contained valid XML elements like <p>, <div>, etc., the XML parser in from_rendered_prompt treated them as elements but silently discarded their content because they weren't recognized as chat message tags (message, history). This fix serializes unrecognized elements back to their XML string representation using tostring(), preserving both the tags and their content in the final prompt text. Added regression tests for: - Single HTML-like tag (<p>) - Nested HTML-like tags (<div><p>...</p></div>) Fixes microsoft#13632

Copilot

Pull request overview

Fixes ChatHistory.from_rendered_prompt so valid XML elements that are not recognized as chat message tags (for example <p>, <div>, <b>) are preserved in the resulting user prompt, instead of being silently dropped.

Changes:

Update ChatHistory.from_rendered_prompt to accumulate free text and serialize unrecognized XML elements back into text using tostring().
Add unit tests to cover prompts containing HTML like tags and nested HTML like tags.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
python/semantic_kernel/contents/chat_history.py	Preserve unrecognized XML elements by serializing them back into the prompt text while parsing rendered prompts.
python/tests/unit/contents/test_chat_history.py	Add regression coverage for prompts containing HTML like tags.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-20T02:39:39Z

python/semantic_kernel/contents/chat_history.py

+                flush_pending_text()
                messages.append(ChatMessageContent.from_element(item))
+                # Tail text after a recognized message element is treated as USER content
+                if item.tail and item.tail.strip():
+                    messages.append(ChatMessageContent(role=AuthorRole.USER, content=unescape(item.tail.strip())))
            elif item.tag == CHAT_HISTORY_TAG:
+                flush_pending_text()


flush_pending_text() defaults to SYSTEM, so when an unrecognized element (for example
) appears between already-parsed chat messages and a subsequent element, the preserved HTML gets emitted as a SYSTEM message. That is inconsistent with how inter-message text (item.tail) is treated as USER, and can unintentionally escalate user-provided content into the system role. Consider flushing pending text with role=USER when messages already exist (and only SYSTEM when no messages have been emitted yet).

Suggested change

flush_pending_text()

messages.append(ChatMessageContent.from_element(item))

# Tail text after a recognized message element is treated as USER content

if item.tail and item.tail.strip():

messages.append(ChatMessageContent(role=AuthorRole.USER, content=unescape(item.tail.strip())))

elif item.tag == CHAT_HISTORY_TAG:

flush_pending_text()

flush_pending_text(role=AuthorRole.USER if messages else AuthorRole.SYSTEM)

messages.append(ChatMessageContent.from_element(item))

# Tail text after a recognized message element is treated as USER content

if item.tail and item.tail.strip():

messages.append(ChatMessageContent(role=AuthorRole.USER, content=unescape(item.tail.strip())))

elif item.tag == CHAT_HISTORY_TAG:

# Flush any pending text before a structured history block

flush_pending_text(role=AuthorRole.USER if messages else AuthorRole.SYSTEM)

Copilot · 2026-03-20T02:39:40Z

python/semantic_kernel/contents/chat_history.py

+                if item.tail and item.tail.strip():
+                    messages.append(ChatMessageContent(role=AuthorRole.USER, content=unescape(item.tail.strip())))
            elif item.tag == CHAT_HISTORY_TAG:
+                flush_pending_text()


Same role issue as the branch: flushing pending_text_parts before a <chat_history> element currently uses the default SYSTEM role, even if earlier messages have already been parsed. If pending text contains preserved HTML that occurs between message blocks, it will become a SYSTEM message; it should likely be USER whenever messages already exist (to match tail handling).

Suggested change

flush_pending_text()

flush_pending_text(role=AuthorRole.USER if messages else AuthorRole.SYSTEM)

Copilot · 2026-03-20T02:39:40Z

python/tests/unit/contents/test_chat_history.py

+    # Prompt with HTML-like tags
+    prompt_with_html = 'Translate this: "<p>What is your name?</p>"'
+    # Same prompt without HTML tags
+    prompt_without_html = 'Translate this: "What is your name?"'
+
+    history_with_html = ChatHistory.from_rendered_prompt(prompt_with_html)
+    history_without_html = ChatHistory.from_rendered_prompt(prompt_without_html)
+
+    # Both should produce a single user message
+    assert len(history_with_html.messages) == 1
+    assert len(history_without_html.messages) == 1
+
+    # Both should contain the question text
+    assert "What is your name?" in history_with_html.messages[0].content
+    assert "What is your name?" in history_without_html.messages[0].content
+
+    # The HTML version should preserve the <p> tags
+    assert "<p>" in history_with_html.messages[0].content
+    assert "</p>" in history_with_html.messages[0].content


These tests only assert that the HTML tags and inner text appear somewhere in the parsed content, but they do not verify that content surrounding the tags (for example, trailing text after
such as the closing quote in this prompt) is preserved. Strengthen the regression by asserting the full expected message content (or at least asserting the suffix/prefix around the tag), so a future regression that drops element tail text or punctuation will be caught.

jgarrison929 requested a review from a team as a code owner March 13, 2026 14:21

moonbox3 requested a review from Copilot March 20, 2026 02:34

Copilot started reviewing on behalf of moonbox3 March 20, 2026 02:35 View session

Copilot AI reviewed Mar 20, 2026

View reviewed changes

jgarrison929 closed this Mar 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Fix HTML-like tags being dropped in ChatHistory.from_rendered_prompt#13659

Python: Fix HTML-like tags being dropped in ChatHistory.from_rendered_prompt#13659
jgarrison929 wants to merge 1 commit intomicrosoft:mainfrom
jgarrison929:fix/python-html-tag-template-rendering

jgarrison929 commented Mar 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 20, 2026

Uh oh!

Copilot AI Mar 20, 2026

Uh oh!

Copilot AI Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	flush_pending_text()
	flush_pending_text(role=AuthorRole.USER if messages else AuthorRole.SYSTEM)

Conversation

jgarrison929 commented Mar 13, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants