[Repo Assist] Encode emoji/high-Unicode as HTML entities in all HTML output paths by github-actions[bot] · Pull Request #986 · fsprojects/FSharp.Formatting

github-actions · 2026-02-23T02:24:33Z

🤖 This is an automated PR from Repo Assist, implementing the code review suggestions from PR #965.

Addresses the three unresolved review threads in #965.

What this PR does

Implements the code review suggestions from PR #965:

1. Consistent emoji encoding across all HTML output paths (Thread 3)

PR #965 added encodeHighUnicode but only applied it to Literal spans. This PR applies it inside htmlEncode itself so that InlineCode, CodeBlock, attribute values (alt, title), and all other paths automatically encode emoji/high-Unicode as numeric HTML entities — not just Literal spans.

Literal spans (which bypass htmlEncode) also get encodeHighUnicode applied directly.

2. Fix misleading comment (Thread 4)

The original comment said "Encode all characters outside BMP (>= 0x10000) as they're typically emojis". Rewrote to clarify the actual rationale: encoding non-BMP code points and selected BMP ranges to avoid output encoding issues, since many non-emoji scripts also live outside the BMP.

3. Variation selector encoding + test assertion (Thread 5)

The ⚠️ sequence consists of ⚠ (U+26A0) followed by variation selector VS16 (U+FE0F). The review noted that the test checked for ⚠ (⚠) but not for ️ (VS16). This PR:

Extends the encoded BMP ranges to include variation selectors U+FE00–U+FE0F
Adds the assertion html |> should contain "️" to the emoji test

Note: The PR also narrows the broad U+2000–U+2BFF range from #965 to targeted emoji/symbol blocks only (U+2600–U+26FF, U+2700–U+27BF, U+2B00–U+2BFF). This avoids breaking CommonMark spec tests that expect box-drawing characters (e.g., U+255C ╜) to pass through unchanged in <pre><code> blocks.

Tests

All 182 markdown tests pass, including the CommonMark spec suite.

New tests added:

Emojis are encoded as HTML numeric entities — verifies 🎉🚧⭐⚠️✅ all encode correctly, including the variation selector
Regular text without emojis is not modified — verifies Cyrillic and Chinese characters are not touched
List without blank line after heading — covers the Conversions from fsx comments markdown to html containing emoji (on AI-generated documents) #964 scenario

Test Status

✅ dotnet test tests/FSharp.Markdown.Tests/FSharp.Markdown.Tests.fsproj — 182 passed, 0 failed

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

- Add encodeHighUnicode function encoding non-BMP code points, emoji BMP blocks (U+2600-U+26FF, U+2700-U+27BF, U+2B00-U+2BFF), and variation selectors (U+FE00-U+FE0F) as numeric HTML entities - Apply encodeHighUnicode inside htmlEncode so InlineCode, CodeBlock, and attribute values all encode emoji consistently (not just Literal spans) - Apply encodeHighUnicode to Literal spans which bypass htmlEncode - Fix misleading comment: encode is for output encoding safety, not just emojis - Add test asserting variation selector U+FE0F is also encoded - Add tests for emoji encoding, regular text passthrough, and list after heading Addresses review suggestions from PR #965. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-02-23T02:24:35Z

✅ Pull request created: #986

nojaf · 2026-02-23T09:25:28Z

@dsyme as this is in a draft state, what should I make out of this?
Does this need a review? Is this unfinished?
It is a little unclear from afar who is supposed to make the next move here.

dsyme · 2026-02-23T11:18:09Z

@nojaf This PR was the result of me asking Repo Assist to fix up #965. This comment

#965 (comment)

It's not indicative of what Repo Assist would normally do autonomously (it would usually start with an issue if at all, and never work on something so obscure)

It also incorrectly made a new PR targeting main, effectively a replacement PR, instead of fixing up the PR itself. I'll fix that for future.

Next step is I'll manually merge this into OG PR #965 and we can decide what to do there.

dsyme · 2026-02-23T11:20:52Z

Oh I see, the OG PR came from a fork so Repo Assist couldn't fix it up.

github-actions bot added automation repo-assist labels Feb 23, 2026

This was referenced Feb 23, 2026

Unicode symbols in Markdown #90

Closed

[Repo Assist] Monthly Activity 2026-02 #973

Open

dsyme mentioned this pull request Feb 23, 2026

Markdown to html improvement #965

Closed

dsyme closed this Feb 23, 2026

dsyme reopened this Feb 23, 2026

dsyme closed this Feb 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Repo Assist] Encode emoji/high-Unicode as HTML entities in all HTML output paths#986

[Repo Assist] Encode emoji/high-Unicode as HTML entities in all HTML output paths#986
github-actions[bot] wants to merge 1 commit intomainfrom
repo-assist/fix-965-encode-unicode-fa4cb6620bc76382

github-actions bot commented Feb 23, 2026

Uh oh!

github-actions bot commented Feb 23, 2026

Uh oh!

nojaf commented Feb 23, 2026

Uh oh!

dsyme commented Feb 23, 2026 •

edited

Loading

Uh oh!

dsyme commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

github-actions bot commented Feb 23, 2026

What this PR does

1. Consistent emoji encoding across all HTML output paths (Thread 3)

2. Fix misleading comment (Thread 4)

3. Variation selector encoding + test assertion (Thread 5)

Tests

Test Status

Uh oh!

github-actions bot commented Feb 23, 2026

Uh oh!

nojaf commented Feb 23, 2026

Uh oh!

dsyme commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dsyme commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dsyme commented Feb 23, 2026 •

edited

Loading