HTML API: Preserve raw text contents in serialize #54

Open
sirreal wants to merge 3 commits into trunk from fix/iframe-noembed-noframes-serialize

sirreal commented Jun 11, 2026

Owner

Summary

  • Preserve parser-derived raw-text contents when serializing IFRAME, NOEMBED, and NOFRAMES.
  • Keep markup-like text, character references, and NULL-byte replacement behavior intact.
  • Add fragment normalization and full-document serialization coverage.

Testing

  • Focused HTML API serialization PHPUnit coverage passes.
  • PHPCS passes for the changed HTML API files.
  • codex review --base trunk.

Trac ticket: https://core.trac.wordpress.org/ticket/65372

Use of AI Tools

AI assistance: Yes
Tool(s): Codex
Model(s): GPT-5.5
Used for: PR description cleanup and code review.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

The serializer was discarding the raw-text contents of IFRAME, NOEMBED, and NOFRAMES even though get_modifiable_text() already returns the browser-equivalent raw text for those elements.

Let those elements follow the same raw emission path as SCRIPT and STYLE, preserving contents while retaining existing NUL and newline normalization.

See #65372.
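
The change described above can be sketched with a toy model. This is not the WordPress PHP implementation: the function `serialize_element`, the `fixed` flag, and the two element sets are hypothetical names introduced only for illustration, and NUL and newline normalization are deliberately left out. The sketch assumes that other text-bearing elements (for example TITLE) are escaped on output.

```python
from html import escape

# Elements assumed to already be emitted verbatim.
ALWAYS_RAW = {"SCRIPT", "STYLE"}
# Elements whose contents were dropped before the fix, per the description above.
NEWLY_RAW = {"IFRAME", "NOEMBED", "NOFRAMES"}


def serialize_element(tag_name: str, text: str, fixed: bool = True) -> str:
    """Serialize one element and its text contents (toy model, hypothetical API)."""
    name = tag_name.upper()
    if name in ALWAYS_RAW:
        body = text  # raw emission path
    elif name in NEWLY_RAW:
        # Before the fix the contents were discarded; after it they take
        # the same raw emission path as SCRIPT and STYLE.
        body = text if fixed else ""
    else:
        body = escape(text, quote=True)  # assumed escaping for other elements
    lower = name.lower()
    return f"<{lower}>{body}</{lower}>"


if __name__ == "__main__":
    for tag in ("noembed", "iframe", "noframes"):
        before = serialize_element(tag, "content", fixed=False)
        after = serialize_element(tag, "content", fixed=True)
        print(f"{before}  ->  {after}")
```

With `fixed=False` the model drops the contents, which matches the tree difference reported in the ticket; with `fixed=True` markup-like text and character references pass through unchanged.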

github-actions Bot commented Jun 11, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

sirreal added this to the HTML API confirmed fuzz PRs milestone Jun 17, 2026

sirreal commented Jun 17, 2026

Owner Author

This seems legitimate:

Input                          Normalized
<noembed>content</noembed>     <noembed></noembed>
<iframe>content</iframe>       <iframe></iframe>
<noframes>content</noframes>   <noframes></noframes>

The trees are different after normalization, for example:

└─IFRAME
  └─#text content

becomes

└─IFRAME
