Skip to content

[Repo Assist] Encode emoji/high-Unicode as HTML entities in all HTML output paths#986

Closed
github-actions[bot] wants to merge 1 commit intomainfrom
repo-assist/fix-965-encode-unicode-fa4cb6620bc76382
Closed

[Repo Assist] Encode emoji/high-Unicode as HTML entities in all HTML output paths#986
github-actions[bot] wants to merge 1 commit intomainfrom
repo-assist/fix-965-encode-unicode-fa4cb6620bc76382

Conversation

@github-actions
Copy link
Contributor

🤖 This is an automated PR from Repo Assist, implementing the code review suggestions from PR #965.

Addresses the three unresolved review threads in #965.

What this PR does

Implements the code review suggestions from PR #965:

1. Consistent emoji encoding across all HTML output paths (Thread 3)

PR #965 added encodeHighUnicode but only applied it to Literal spans. This PR applies it inside htmlEncode itself so that InlineCode, CodeBlock, attribute values (alt, title), and all other paths automatically encode emoji/high-Unicode as numeric HTML entities — not just Literal spans.

Literal spans (which bypass htmlEncode) also get encodeHighUnicode applied directly.

2. Fix misleading comment (Thread 4)

The original comment said "Encode all characters outside BMP (>= 0x10000) as they're typically emojis". Rewrote to clarify the actual rationale: encoding non-BMP code points and selected BMP ranges to avoid output encoding issues, since many non-emoji scripts also live outside the BMP.

3. Variation selector encoding + test assertion (Thread 5)

The ⚠️ sequence consists of (U+26A0) followed by variation selector VS16 (U+FE0F). The review noted that the test checked for (⚠) but not for (VS16). This PR:

  • Extends the encoded BMP ranges to include variation selectors U+FE00–U+FE0F
  • Adds the assertion html |> should contain "️" to the emoji test

Note: The PR also narrows the broad U+2000–U+2BFF range from #965 to targeted emoji/symbol blocks only (U+2600–U+26FF, U+2700–U+27BF, U+2B00–U+2BFF). This avoids breaking CommonMark spec tests that expect box-drawing characters (e.g., U+255C ) to pass through unchanged in <pre><code> blocks.

Tests

All 182 markdown tests pass, including the CommonMark spec suite.

New tests added:

Test Status

dotnet test tests/FSharp.Markdown.Tests/FSharp.Markdown.Tests.fsproj182 passed, 0 failed

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

- Add encodeHighUnicode function encoding non-BMP code points,
  emoji BMP blocks (U+2600-U+26FF, U+2700-U+27BF, U+2B00-U+2BFF),
  and variation selectors (U+FE00-U+FE0F) as numeric HTML entities
- Apply encodeHighUnicode inside htmlEncode so InlineCode, CodeBlock,
  and attribute values all encode emoji consistently (not just Literal spans)
- Apply encodeHighUnicode to Literal spans which bypass htmlEncode
- Fix misleading comment: encode is for output encoding safety, not
  just emojis
- Add test asserting variation selector U+FE0F is also encoded
- Add tests for emoji encoding, regular text passthrough, and list
  after heading

Addresses review suggestions from PR #965.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Pull request created: #986

@nojaf
Copy link
Collaborator

nojaf commented Feb 23, 2026

@dsyme as this is in a draft state, what should I make out of this?
Does this need a review? Is this unfinished?
It is a little unclear from afar who is supposed to make the next move here.

@dsyme
Copy link
Contributor

dsyme commented Feb 23, 2026

@nojaf This PR was the result of me asking Repo Assist to fix up #965. This comment

#965 (comment)

It's not indicative of what Repo Assist would normally do autonomously (it would usually start with an issue if at all, and never work on something so obscure)

It also incorrectly made a new PR targeting main, effectively a replacement PR, instead of fixing up the PR itself. I'll fix that for future.

Next step is I'll manually merge this into OG PR #965 and we can decide what to do there.

@dsyme dsyme closed this Feb 23, 2026
@dsyme dsyme reopened this Feb 23, 2026
@dsyme
Copy link
Contributor

dsyme commented Feb 23, 2026

Oh I see, the OG PR came from a fork so Repo Assist couldn't fix it up.

@dsyme dsyme closed this Feb 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants