-
Notifications
You must be signed in to change notification settings - Fork 154
fix(super-editor): emit OOXML property children in canonical schema order #3702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
dbrogan-OSI
wants to merge
1
commit into
superdoc-dev:main
Choose a base branch
from
dbrogan-OSI:fix/exporter-ooxml-property-child-order
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
237 changes: 237 additions & 0 deletions
237
...ages/super-editor/src/editors/v1/core/super-converter/v3/handlers/ooxml-property-order.js
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,237 @@ | ||
| // @ts-check | ||
| /** | ||
| * Canonical child-element order for OOXML fixed-sequence property containers. | ||
| * | ||
| * ECMA-376 models these property containers (`w:rPr`, `w:pPr`, `w:numPr`, ...) | ||
| * as `xsd:sequence` complex types (`CT_RPr`, `CT_PPr`, `CT_NumPr`, ...), which | ||
| * means their child elements MUST appear in a fixed order. Desktop Word | ||
| * silently repairs out-of-order children on open, but Word for the web rejects | ||
| * the whole package and opens the document read-only as "corrupt". | ||
| * | ||
| * The export pipeline emits children in JavaScript object/array insertion order | ||
| * (`Object.keys(attrs)`, mark-array order), which is not guaranteed to match the | ||
| * schema sequence. This table lets the emit paths stable-sort children back into | ||
| * canonical order regardless of how the ProseMirror attributes were built. | ||
| * | ||
| * Keys are container local names (namespace prefix stripped); values are the | ||
| * ECMA-376 child sequences as local names. Sequences are sourced from ECMA-376 | ||
| * (CT_RPr §17.3.2.28, CT_PPr §17.3.1.26, CT_NumPr §17.3.1.19, CT_TrPr §17.4.82, | ||
| * CT_TcPr §17.4.70, CT_TblPr §17.4.60, CT_TblPrEx §17.4.61, CT_SectPr §17.6.17). | ||
| * | ||
| * @type {Readonly<Record<string, string[]>>} | ||
| */ | ||
| export const OOXML_PROPERTY_CHILD_ORDER = Object.freeze({ | ||
| rPr: [ | ||
| 'rStyle', | ||
| 'rFonts', | ||
| 'b', | ||
| 'bCs', | ||
| 'i', | ||
| 'iCs', | ||
| 'caps', | ||
| 'smallCaps', | ||
| 'strike', | ||
| 'dstrike', | ||
| 'outline', | ||
| 'shadow', | ||
| 'emboss', | ||
| 'imprint', | ||
| 'noProof', | ||
| 'snapToGrid', | ||
| 'vanish', | ||
| 'webHidden', | ||
| 'color', | ||
| 'spacing', | ||
| 'w', | ||
| 'kern', | ||
| 'position', | ||
| 'sz', | ||
| 'szCs', | ||
| 'highlight', | ||
| 'u', | ||
| 'effect', | ||
| 'bdr', | ||
| 'shd', | ||
| 'fitText', | ||
| 'vertAlign', | ||
| 'rtl', | ||
| 'cs', | ||
| 'em', | ||
| 'lang', | ||
| 'eastAsianLayout', | ||
| 'specVanish', | ||
| 'oMath', | ||
| 'rPrChange', | ||
| ], | ||
| pPr: [ | ||
| 'pStyle', | ||
| 'keepNext', | ||
| 'keepLines', | ||
| 'pageBreakBefore', | ||
| 'framePr', | ||
| 'widowControl', | ||
| 'numPr', | ||
| 'suppressLineNumbers', | ||
| 'pBdr', | ||
| 'shd', | ||
| 'tabs', | ||
| 'suppressAutoHyphens', | ||
| 'kinsoku', | ||
| 'wordWrap', | ||
| 'overflowPunct', | ||
| 'topLinePunct', | ||
| 'autoSpaceDE', | ||
| 'autoSpaceDN', | ||
| 'bidi', | ||
| 'adjustRightInd', | ||
| 'snapToGrid', | ||
| 'spacing', | ||
| 'ind', | ||
| 'contextualSpacing', | ||
| 'mirrorIndents', | ||
| 'suppressOverlap', | ||
| 'jc', | ||
| 'textDirection', | ||
| 'textAlignment', | ||
| 'textboxTightWrap', | ||
| 'outlineLvl', | ||
| 'divId', | ||
| 'cnfStyle', | ||
| 'rPr', | ||
| 'sectPr', | ||
| 'pPrChange', | ||
| ], | ||
| numPr: ['ilvl', 'numId', 'numberingChange', 'ins'], | ||
| trPr: [ | ||
| 'cnfStyle', | ||
| 'divId', | ||
| 'gridBefore', | ||
| 'gridAfter', | ||
| 'wBefore', | ||
| 'wAfter', | ||
| 'cantSplit', | ||
| 'trHeight', | ||
| 'tblHeader', | ||
| 'tblCellSpacing', | ||
| 'jc', | ||
| 'hidden', | ||
| 'ins', | ||
| 'del', | ||
| 'trPrChange', | ||
| ], | ||
| tcPr: [ | ||
| 'cnfStyle', | ||
| 'tcW', | ||
| 'gridSpan', | ||
| 'hMerge', | ||
| 'vMerge', | ||
| 'tcBorders', | ||
| 'shd', | ||
| 'noWrap', | ||
| 'tcMar', | ||
| 'textDirection', | ||
| 'tcFitText', | ||
| 'vAlign', | ||
| 'hideMark', | ||
| 'headers', | ||
| 'cellIns', | ||
| 'cellDel', | ||
| 'cellMerge', | ||
| 'tcPrChange', | ||
| ], | ||
| tblPr: [ | ||
| 'tblStyle', | ||
| 'tblpPr', | ||
| 'tblOverlap', | ||
| 'bidiVisual', | ||
| 'tblStyleRowBandSize', | ||
| 'tblStyleColBandSize', | ||
| 'tblW', | ||
| 'jc', | ||
| 'tblCellSpacing', | ||
| 'tblInd', | ||
| 'tblBorders', | ||
| 'shd', | ||
| 'tblLayout', | ||
| 'tblCellMar', | ||
| 'tblLook', | ||
| 'tblCaption', | ||
| 'tblDescription', | ||
| 'tblPrChange', | ||
| ], | ||
| tblPrEx: ['tblW', 'jc', 'tblCellSpacing', 'tblInd', 'tblBorders', 'shd', 'tblLayout', 'tblCellMar', 'tblLook'], | ||
| sectPr: [ | ||
| 'headerReference', | ||
| 'footerReference', | ||
| 'footnotePr', | ||
| 'endnotePr', | ||
| 'type', | ||
| 'pgSz', | ||
| 'pgMar', | ||
| 'paperSrc', | ||
| 'pgBorders', | ||
| 'lnNumType', | ||
| 'pgNumType', | ||
| 'cols', | ||
| 'formProt', | ||
| 'vAlign', | ||
| 'noEndnote', | ||
| 'titlePg', | ||
| 'textDirection', | ||
| 'bidi', | ||
| 'rtlGutter', | ||
| 'docGrid', | ||
| 'printerSettings', | ||
| 'sectPrChange', | ||
| ], | ||
| }); | ||
|
|
||
| /** | ||
| * Strips an OOXML namespace prefix from an element name (e.g. `w:rFonts` → `rFonts`). | ||
| * @param {string|undefined|null} name | ||
| * @returns {string|undefined|null} | ||
| */ | ||
| const stripPrefix = (name) => | ||
| typeof name === 'string' && name.includes(':') ? name.slice(name.indexOf(':') + 1) : name; | ||
|
|
||
| /** | ||
| * Returns the canonical child sequence for a property container, or undefined | ||
| * if the container is not a known fixed-sequence type. | ||
| * @param {string} containerName Container element name, with or without prefix (e.g. `w:rPr` or `rPr`). | ||
| * @returns {string[]|undefined} | ||
| */ | ||
| export function getCanonicalChildOrder(containerName) { | ||
| const key = stripPrefix(containerName); | ||
| return key ? OOXML_PROPERTY_CHILD_ORDER[key] : undefined; | ||
| } | ||
|
|
||
| /** | ||
| * Stable-sorts the child elements of a fixed-sequence OOXML property container | ||
| * into ECMA-376 canonical order. | ||
| * | ||
| * Known children are ordered by their position in the canonical sequence. | ||
| * Unknown/extension elements (e.g. `w14:*`, `mc:AlternateContent`) sort AFTER | ||
| * all known children, preserving their original relative order. If the | ||
| * container has no canonical sequence, the input is returned unchanged. | ||
| * | ||
| * @template {{ name?: string }} T | ||
| * @param {string} containerName The container element name (e.g. `w:rPr`). | ||
| * @param {T[]} elements The emitted child elements. | ||
| * @returns {T[]} The elements ordered into canonical schema sequence. | ||
| */ | ||
| export function sortPropertyChildElements(containerName, elements) { | ||
| if (!Array.isArray(elements) || elements.length < 2) return elements; | ||
| const order = getCanonicalChildOrder(containerName); | ||
| if (!order) return elements; | ||
|
|
||
| const rankByName = new Map(order.map((name, index) => [name, index])); | ||
| const UNKNOWN_RANK = Number.MAX_SAFE_INTEGER; | ||
|
|
||
| return elements | ||
| .map((element, index) => { | ||
| const rank = rankByName.get(stripPrefix(element?.name)); | ||
| return { element, index, rank: rank == null ? UNKNOWN_RANK : rank }; | ||
| }) | ||
| .sort((a, b) => a.rank - b.rank || a.index - b.index) | ||
| .map((entry) => entry.element); | ||
| } | ||
122 changes: 122 additions & 0 deletions
122
...super-editor/src/editors/v1/core/super-converter/v3/handlers/ooxml-property-order.test.js
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,122 @@ | ||
| import { describe, it, expect, vi } from 'vitest'; | ||
| import { sortPropertyChildElements, getCanonicalChildOrder } from './ooxml-property-order.js'; | ||
| import { translator as numPrTranslator } from './w/numPr/numPr-translator.js'; | ||
| import { translator as trPrTranslator } from './w/trPr/trPr-translator.js'; | ||
| import { translator as rPrTranslator } from './w/rpr/rpr-translator.js'; | ||
|
|
||
| // Some property translators transitively import the exporter; stub it to keep | ||
| // this unit suite isolated (mirrors numPr-translator.test.js). | ||
| vi.mock('@converter/exporter', () => ({ | ||
| exportSchemaToJson: vi.fn(), | ||
| createTrackStyleMark: vi.fn(), | ||
| })); | ||
|
|
||
| const names = (node) => node.elements.map((el) => el.name); | ||
|
|
||
| describe('sortPropertyChildElements', () => { | ||
| it('orders w:rPr children by canonical CT_RPr sequence (e.g. rFonts before color)', () => { | ||
| const elements = [{ name: 'w:color' }, { name: 'w:sz' }, { name: 'w:rFonts' }, { name: 'w:b' }]; | ||
| expect(sortPropertyChildElements('w:rPr', elements).map((e) => e.name)).toEqual([ | ||
| 'w:rFonts', | ||
| 'w:b', | ||
| 'w:color', | ||
| 'w:sz', | ||
| ]); | ||
| }); | ||
|
|
||
| it('orders w:pPr children by canonical CT_PPr sequence', () => { | ||
| const elements = [{ name: 'w:jc' }, { name: 'w:spacing' }, { name: 'w:numPr' }, { name: 'w:pStyle' }]; | ||
| expect(sortPropertyChildElements('w:pPr', elements).map((e) => e.name)).toEqual([ | ||
| 'w:pStyle', | ||
| 'w:numPr', | ||
| 'w:spacing', | ||
| 'w:jc', | ||
| ]); | ||
| }); | ||
|
|
||
| it('orders w:numPr children so w:ilvl precedes w:numId', () => { | ||
| const elements = [{ name: 'w:numId' }, { name: 'w:ilvl' }]; | ||
| expect(sortPropertyChildElements('w:numPr', elements).map((e) => e.name)).toEqual(['w:ilvl', 'w:numId']); | ||
| }); | ||
|
|
||
| it('orders w:trPr children so w:trHeight precedes w:hidden', () => { | ||
| const elements = [{ name: 'w:hidden' }, { name: 'w:trHeight' }, { name: 'w:cnfStyle' }]; | ||
| expect(sortPropertyChildElements('w:trPr', elements).map((e) => e.name)).toEqual([ | ||
| 'w:cnfStyle', | ||
| 'w:trHeight', | ||
| 'w:hidden', | ||
| ]); | ||
| }); | ||
|
|
||
| it('orders w:tcPr children by canonical CT_TcPr sequence', () => { | ||
| const elements = [{ name: 'w:vAlign' }, { name: 'w:tcW' }, { name: 'w:gridSpan' }]; | ||
| expect(sortPropertyChildElements('w:tcPr', elements).map((e) => e.name)).toEqual([ | ||
| 'w:tcW', | ||
| 'w:gridSpan', | ||
| 'w:vAlign', | ||
| ]); | ||
| }); | ||
|
|
||
| it('orders w:tblPr children so w:tblCellMar precedes w:tblLook', () => { | ||
| const elements = [{ name: 'w:tblLook' }, { name: 'w:tblCellMar' }, { name: 'w:tblStyle' }]; | ||
| expect(sortPropertyChildElements('w:tblPr', elements).map((e) => e.name)).toEqual([ | ||
| 'w:tblStyle', | ||
| 'w:tblCellMar', | ||
| 'w:tblLook', | ||
| ]); | ||
| }); | ||
|
|
||
| it('places unknown/extension elements after known children, preserving their relative order (stable)', () => { | ||
| const elements = [ | ||
| { name: 'w14:ligatures' }, | ||
| { name: 'w:color' }, | ||
| { name: 'mc:AlternateContent' }, | ||
| { name: 'w:rFonts' }, | ||
| ]; | ||
| expect(sortPropertyChildElements('w:rPr', elements).map((e) => e.name)).toEqual([ | ||
| 'w:rFonts', | ||
| 'w:color', | ||
| 'w14:ligatures', | ||
| 'mc:AlternateContent', | ||
| ]); | ||
| }); | ||
|
|
||
| it('returns the input unchanged for containers without a canonical sequence', () => { | ||
| const elements = [{ name: 'w:bottom' }, { name: 'w:top' }]; | ||
| expect(sortPropertyChildElements('w:tcMar', elements)).toBe(elements); | ||
| }); | ||
|
|
||
| it('returns the input unchanged when there are fewer than two elements', () => { | ||
| const single = [{ name: 'w:numId' }]; | ||
| expect(sortPropertyChildElements('w:numPr', single)).toBe(single); | ||
| }); | ||
|
|
||
| it('exposes canonical sequences via getCanonicalChildOrder (prefix-insensitive)', () => { | ||
| expect(getCanonicalChildOrder('w:numPr')).toEqual(['ilvl', 'numId', 'numberingChange', 'ins']); | ||
| expect(getCanonicalChildOrder('numPr')).toEqual(['ilvl', 'numId', 'numberingChange', 'ins']); | ||
| expect(getCanonicalChildOrder('w:notAContainer')).toBeUndefined(); | ||
| }); | ||
| }); | ||
|
|
||
| describe('property container translators emit children in canonical order', () => { | ||
| it('w:numPr emits w:ilvl before w:numId even when attrs are set numId-first', () => { | ||
| const result = numPrTranslator.decode({ | ||
| node: { attrs: { numberingProperties: { numId: 7, ilvl: 2 } } }, | ||
| }); | ||
| expect(names(result)).toEqual(['w:ilvl', 'w:numId']); | ||
| }); | ||
|
|
||
| it('w:trPr emits w:trHeight before w:hidden even when attrs are set hidden-first', () => { | ||
| const result = trPrTranslator.decode({ | ||
| node: { attrs: { tableRowProperties: { hidden: true, rowHeight: { value: 240, rule: 'atLeast' } } } }, | ||
| }); | ||
| expect(names(result)).toEqual(['w:trHeight', 'w:hidden']); | ||
| }); | ||
|
|
||
| it('w:rPr emits w:rFonts before w:color even when attrs are set color-first', () => { | ||
| const result = rPrTranslator.decode({ | ||
| node: { attrs: { runProperties: { color: { val: 'FF0000' }, fontFamily: { ascii: 'Arial' } } } }, | ||
| }); | ||
| expect(names(result)).toEqual(['w:rFonts', 'w:color']); | ||
| }); | ||
| }); |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When
w:rPris the paragraph-mark run properties underw:pPr, the same rPr translator can emittrackInsert/trackDeleteasw:ins/w:del; those children belong before the normal run-property sequence forCT_ParaRPr. Because this new order table omits them, the sorter treats them as unknown and moves them after known props, so a valid tracked paragraph mark such as<w:rPr><w:ins/><w:b/></w:rPr>round-trips as<w:b/><w:ins/>, which violates the fixed OOXML sequence and can still make Word reject the document.Useful? React with 👍 / 👎.