Add Thai shaper, ported from HarfBuzz #376
Open
lehni wants to merge 1 commit into
- Decompose SARA AM (U+0E33) into NIKHAHIT + SARA AA and reorder NIKHAHIT past above-base marks, matching `preprocess_text_thai`
- Apply PUA tone/vowel shift fallback for legacy fonts without Thai GSUB, mirroring `do_thai_pua_shaping` (above/below state machines + Windows/Mac PUA mapping tables)
- Fall back to the buffer's Unicode script in OTLayoutEngine when neither GSUB nor GPOS picks an OT script. This lets script-specific shapers run on fonts without matching GSUB/GPOS, so the Thai PUA fallback is actually reachable
- Fix typo in `UnicodeLayoutEngine` Thai mark classification: `0x0E3D` (unassigned) → `0x0E4D` (NIKHAHIT)
- Register `thai` and `'lao '` (the 4-char OT tag with trailing space), add Noto Sans Thai + Noto Sans Lao (OFL) as test fixtures with 8 shaping tests
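
For reviewers unfamiliar with the reorder, here is a minimal sketch of the SARA AM decomposition described in the first item. It is not the code in this PR: it works on a plain array of codepoints instead of fontkit glyph objects, the names `decomposeSaraAm` and `isAboveBaseMark` are hypothetical, and the above-base mark ranges are an assumption covering Thai only.

```javascript
// Illustrative sketch only; helper names are hypothetical, not fontkit API.
const SARA_AM = 0x0e33;
const SARA_AA = 0x0e32;
const NIKHAHIT = 0x0e4d;

// Assumed set of Thai above-base marks that NIKHAHIT has to be placed
// in front of: MAI HAN-AKAT, SARA I..SARA UEE, MAITAIKHU..YAMAKKAN.
function isAboveBaseMark(cp) {
  return (
    cp === 0x0e31 ||
    (cp >= 0x0e34 && cp <= 0x0e37) ||
    (cp >= 0x0e47 && cp <= 0x0e4e)
  );
}

// Replace every SARA AM with NIKHAHIT + SARA AA, inserting NIKHAHIT
// before the run of above-base marks that immediately precedes it.
function decomposeSaraAm(codePoints) {
  const out = [];
  for (const cp of codePoints) {
    if (cp !== SARA_AM) {
      out.push(cp);
      continue;
    }
    // Walk back over the above-base marks already emitted.
    let insertAt = out.length;
    while (insertAt > 0 && isAboveBaseMark(out[insertAt - 1])) {
      insertAt--;
    }
    out.splice(insertAt, 0, NIKHAHIT);
    out.push(SARA_AA);
  }
  return out;
}
```

With this sketch, `decomposeSaraAm([0x0e19, 0x0e49, 0x0e33])` (the word น้ำ) yields `[0x0e19, 0x0e4d, 0x0e49, 0x0e32]`, i.e. `[base, NIKHAHIT, tone, SARA_AA]`.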
fontkit currently has no dedicated Thai shaper: Thai (`thai` script) falls through to `DefaultShaper`, which leaves SARA AM (U+0E33) intact. This breaks the GSUB chain rules every modern Thai font ships: with the buffer in `[base, tone, SARA_AM]` order the tone-mark-shifting `ccmp` lookups don't fire, so e.g. น้ำ keeps the regular `uni0E49` instead of the `uni0E49.small` HarfBuzz produces.

This PR ports `hb-ot-shaper-thai.cc`:

- SARA AM decomposition and reorder (`preprocess_text_thai`): `[base, tone, SARA_AM]` becomes `[base, NIKHAHIT, tone, SARA_AA]`, which is the shape the font's `ccmp` was designed to match.
- `do_thai_pua_shaping`: an above/below state machine assigns one of `NOP`/`SD`/`SL`/`SDL`/`RD` actions to each tone or vowel mark, then we remap the codepoint to its Windows or Mac PUA variant if the font ships one. Gated on the absence of a Thai script in GSUB.

Also folds in a one-character typo fix in `UnicodeLayoutEngine.js`'s Thai mark classification: `0x0E3D` (an unassigned codepoint in the Thai block) was sitting in the `Above_Right` switch where `0x0E4D` (NIKHAHIT) belongs. Both NIKHAHIT and the surrounding cases (MAI HAN-AKAT, SARA I/II/UE/UEE, MAITAIKHU, THANTHAKHAT, YAMAKKAN) are classified as `Top` in Unicode's IndicPositionalCategory.

Tests

Four tests added under `test/shaping.js`, ported from HarfBuzz's `sara-am.tests` and hand-picked Thai phrases. Test fixture is the hinted Noto Sans Thai from googlefonts (SIL OFL). Three of the four exercise the SARA AM path and fail without this shaper registered.

Lao (`lao` script) is mapped to the same shaper since the reorder applies identically (codepoints offset by 0x80).

Closes #134, closes #133.
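
To illustrate why one shaper can serve both scripts, the sketch below shows how the 0x80 offset lets a single set of predicates cover Thai and Lao. The function names are hypothetical and this is not the code in the PR; it only assumes that the Lao block mirrors the Thai block at +0x80 for the codepoints involved.

```javascript
// Illustrative sketch only; helper names are hypothetical.
// Clearing bit 0x80 folds a Lao codepoint onto its Thai counterpart.
function isSaraAm(cp) {
  return (cp & ~0x80) === 0x0e33; // U+0E33 (Thai) or U+0EB3 (Lao)
}

// Derive the decomposed pieces relative to the SARA AM that was found,
// so the result stays in the same script as the input.
function nikhahitFromSaraAm(cp) {
  return cp - 0x0e33 + 0x0e4d; // U+0E4D (Thai) or U+0ECD (Lao)
}

function saraAaFromSaraAm(cp) {
  return cp - 1; // U+0E32 (Thai) or U+0EB2 (Lao)
}
```

Deriving NIKHAHIT and SARA AA from the matched codepoint, instead of hard-coding the Thai values, is what keeps Lao input in the Lao block after decomposition.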