
Python: Fix HTML-like tags being dropped in ChatHistory.from_rendered_prompt #13659

Closed
jgarrison929 wants to merge 1 commit into microsoft:main from jgarrison929:fix/python-html-tag-template-rendering

Conversation

@jgarrison929

Motivation and Context

Fixes #13632

When a prompt contains valid XML elements like <p>, <div>, <b>, etc., the XML parser in ChatHistory.from_rendered_prompt treats them as elements but silently discards their content.

Description

This PR fixes the issue by serializing unrecognized XML elements back to their string representation using tostring().

After the fix:

  • Prompts with HTML-like tags are preserved exactly as written
  • No change in behavior for prompts using recognized chat message tags
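
The behavior described above can be sketched with the standard library alone (this is an illustration of the approach, not the actual semantic-kernel code): the parser wraps the rendered prompt in a root element, and any child element it does not recognize must be serialized back with `tostring()` or its content is lost.

```python
from xml.etree import ElementTree

prompt = 'Translate this: "<p>What is your name?</p>"'
root = ElementTree.fromstring(f"<root>{prompt}</root>")
p = root[0]  # the unrecognized <p> element

# Before the fix: only the free text survives; <p> and its content vanish.
broken = (root.text or "") + (p.tail or "")
print(broken)  # Translate this: ""

# After the fix: the element is serialized back to its string form.
# Note that tostring() also emits the element's tail text.
fixed = (root.text or "") + ElementTree.tostring(p, encoding="unicode")
print(fixed)  # Translate this: "<p>What is your name?</p>"
```

Because `tostring()` includes the element's tail text, both the tags and the surrounding punctuation round-trip intact.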

Contribution Checklist

  • The code builds clean without any errors or warnings
  • Tests have been added that prove the fix works
  • All existing tests pass

…_prompt

When a prompt contained valid XML elements like <p>, <div>, etc., the XML
parser in from_rendered_prompt treated them as elements but silently
discarded their content because they weren't recognized as chat message
tags (message, history).

This fix serializes unrecognized elements back to their XML string
representation using tostring(), preserving both the tags and their
content in the final prompt text.

Added regression tests for:
- Single HTML-like tag (<p>)
- Nested HTML-like tags (<div><p>...</p></div>)

Fixes microsoft#13632
Contributor

Copilot AI left a comment


Pull request overview

Fixes ChatHistory.from_rendered_prompt so valid XML elements that are not recognized as chat message tags (for example <p>, <div>, <b>) are preserved in the resulting user prompt, instead of being silently dropped.

Changes:

  • Update ChatHistory.from_rendered_prompt to accumulate free text and serialize unrecognized XML elements back into text using tostring().
  • Add unit tests covering prompts that contain HTML-like tags and nested HTML-like tags.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
python/semantic_kernel/contents/chat_history.py | Preserve unrecognized XML elements by serializing them back into the prompt text while parsing rendered prompts.
python/tests/unit/contents/test_chat_history.py | Add regression coverage for prompts containing HTML-like tags.


Comment on lines +366 to +372
    flush_pending_text()
    messages.append(ChatMessageContent.from_element(item))
    # Tail text after a recognized message element is treated as USER content
    if item.tail and item.tail.strip():
        messages.append(ChatMessageContent(role=AuthorRole.USER, content=unescape(item.tail.strip())))
elif item.tag == CHAT_HISTORY_TAG:
    flush_pending_text()

Copilot AI Mar 20, 2026


flush_pending_text() defaults to SYSTEM, so when an unrecognized element (for example <p>) appears between already-parsed chat messages and a subsequent <message> element, the preserved HTML gets emitted as a SYSTEM message. That is inconsistent with how inter-message text (item.tail) is treated as USER, and can unintentionally escalate user-provided content into the system role. Consider flushing pending text with role=USER when messages already exist (and only SYSTEM when no messages have been emitted yet).

Suggested change
-     flush_pending_text()
+     flush_pending_text(role=AuthorRole.USER if messages else AuthorRole.SYSTEM)
      messages.append(ChatMessageContent.from_element(item))
      # Tail text after a recognized message element is treated as USER content
      if item.tail and item.tail.strip():
          messages.append(ChatMessageContent(role=AuthorRole.USER, content=unescape(item.tail.strip())))
  elif item.tag == CHAT_HISTORY_TAG:
-     flush_pending_text()
+     # Flush any pending text before a structured history block
+     flush_pending_text(role=AuthorRole.USER if messages else AuthorRole.SYSTEM)
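
The role-aware flush the review describes can be sketched in isolation (all names here are illustrative stand-ins, not the actual semantic-kernel implementation): pending text defaults to SYSTEM only while no messages exist yet; once any message has been emitted, pending text is flushed as USER.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    role: str
    content: str

messages: list[Message] = []
pending_text_parts: list[str] = []

def flush_pending_text(role: Optional[str] = None) -> None:
    """Emit accumulated free text as a message, then clear the buffer."""
    text = "".join(pending_text_parts).strip()
    pending_text_parts.clear()
    if not text:
        return
    if role is None:
        # SYSTEM only before any message exists; USER afterwards, to
        # match how inter-message tail text is handled.
        role = "user" if messages else "system"
    messages.append(Message(role=role, content=text))

pending_text_parts.append("You are a helpful assistant. ")
flush_pending_text()  # no messages yet -> system
pending_text_parts.append("<p>preserved HTML between messages</p>")
flush_pending_text()  # messages already exist -> user, not system
```

This keeps user-provided HTML that lands between message blocks from being escalated into the system role.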

    if item.tail and item.tail.strip():
        messages.append(ChatMessageContent(role=AuthorRole.USER, content=unescape(item.tail.strip())))
elif item.tag == CHAT_HISTORY_TAG:
    flush_pending_text()

Copilot AI Mar 20, 2026


Same role issue as the <message> branch: flushing pending_text_parts before a <chat_history> element currently uses the default SYSTEM role, even if earlier messages have already been parsed. If pending text contains preserved HTML that occurs between message blocks, it will become a SYSTEM message; it should likely be USER whenever messages already exist (to match tail handling).

Suggested change
- flush_pending_text()
+ flush_pending_text(role=AuthorRole.USER if messages else AuthorRole.SYSTEM)

Comment on lines +603 to +621
# Prompt with HTML-like tags
prompt_with_html = 'Translate this: "<p>What is your name?</p>"'
# Same prompt without HTML tags
prompt_without_html = 'Translate this: "What is your name?"'

history_with_html = ChatHistory.from_rendered_prompt(prompt_with_html)
history_without_html = ChatHistory.from_rendered_prompt(prompt_without_html)

# Both should produce a single user message
assert len(history_with_html.messages) == 1
assert len(history_without_html.messages) == 1

# Both should contain the question text
assert "What is your name?" in history_with_html.messages[0].content
assert "What is your name?" in history_without_html.messages[0].content

# The HTML version should preserve the <p> tags
assert "<p>" in history_with_html.messages[0].content
assert "</p>" in history_with_html.messages[0].content

Copilot AI Mar 20, 2026


These tests only assert that the HTML tags and inner text appear somewhere in the parsed content, but they do not verify that content surrounding the tags (for example, trailing text after </p>, such as the closing quote in this prompt) is preserved. Strengthen the regression by asserting the full expected message content (or at least asserting the suffix/prefix around the tag), so a future regression that drops element tail text or punctuation will be caught.
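
The exact-equality form the reviewer suggests could look like this (using a simplified stand-in for the fixed parse path, not the real ChatHistory.from_rendered_prompt):

```python
from xml.etree import ElementTree

def parse_prompt(prompt: str) -> str:
    # Stand-in for the fixed parsing (illustrative only): unrecognized
    # elements are serialized back with tostring(), tails included.
    root = ElementTree.fromstring(f"<root>{prompt}</root>")
    return (root.text or "") + "".join(
        ElementTree.tostring(child, encoding="unicode") for child in root
    )

prompt = 'Translate this: "<p>What is your name?</p>"'
content = parse_prompt(prompt)

# Exact match: a regression that drops element tail text (here, the
# closing quote after </p>) would now fail this assertion.
assert content == prompt
```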



Development

Successfully merging this pull request may close these issues.

Python: Bug: HTML tag <p> getting blocked in Semantic Kernel version 1.39.4
