Add packrat-style memoization to grammar matcher #2295
Merged
curtisman merged 7 commits into microsoft:main from May 6, 2026
Implement success + failure memoization in the grammar matcher to eliminate exponential backtracking. The memo cache records deltas at rule-alternation boundaries and replays them on cache hits, avoiding redundant re-exploration of sub-rule alternatives.

Key changes:
- MemoCache stores per-rule-set success deltas and failure sentinels
- MemoMarkerBacktrack captures entry state; on completion, records a SuccessDelta with relative value/valueId chains for safe replay
- MemoReplayBacktrack restores snapshots and applies deltas for cached successes (N-1 frames pushed in reverse order)
- Cache key includes position, leading context, pendingWildcard, requireValue, and carrier mode to prevent collisions
- Repeat entries, pendingWildcardActive, and null-valueIds entries are excluded from success caching (noSuccessCache flag)
- GrammarMatchOptions.memoization flag (default true) controls the feature at the API level

Re-enable two previously disabled tests:
- grammarMatcherBacktrackPathology.spec.ts: pathological backtracking test (was xit, now it; threshold 3000ms for CI jitter)
- grammarFuzz.spec.ts: tail-promote fuzz block (was commented out)
Extract the ~75-line memo cache lookup/replay/marker-push block from
enterRulesAlternation into a standalone memoLookup function that
returns a discriminated result ("failed" | "replayed" | marker |
undefined).
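A discriminated result of this kind might look roughly like the sketch below. Everything except the four result alternatives is an assumption made for illustration: the MemoMarkerSketch and CacheEntrySketch shapes, the replay callback, and the reading of undefined as "memoization disabled" are hypothetical and are not taken from the matcher's source.

```typescript
// Hypothetical shapes for illustration; the matcher's real types are not shown here.
interface MemoMarkerSketch {
    kind: "marker";
    key: number;
}
type CacheEntrySketch = "failed" | { deltas: readonly string[] };
type MemoLookupResult = "failed" | "replayed" | MemoMarkerSketch | undefined;

function memoLookupSketch(
    cache: Map<number, CacheEntrySketch> | undefined,
    key: number,
    replay: (deltas: readonly string[]) => void,
): MemoLookupResult {
    // Assumption: an absent cache means memoization is switched off.
    if (cache === undefined) return undefined;
    const entry = cache.get(key);
    // Cached failure: the caller can skip the sub-rule entirely.
    if (entry === "failed") return "failed";
    // Cached success: apply the stored deltas instead of re-running the body.
    if (entry !== undefined) {
        replay(entry.deltas);
        return "replayed";
    }
    // Cache miss: hand back a marker so the outcome can be recorded later.
    return { kind: "marker", key };
}

// Callers branch on the result; TypeScript narrows the last case to the marker.
function describeLookup(result: MemoLookupResult): string {
    if (result === undefined) return "no memo: explore alternatives normally";
    if (result === "failed") return "cached failure: skip the sub-rule";
    if (result === "replayed") return "cached success: deltas already applied";
    return `miss: push marker for key ${result.key}`;
}
```

Returning one union from a single function keeps the three outcomes (short-circuit, replay, fresh marker) in one place, so the caller only needs a small branch.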
Extract the repeated 3-line predicate
state.valueIds !== null &&
(part.variable !== undefined || usesImplicitDefault(state))
into a requiresValue(state, part) helper, replacing all three
occurrences (matchStringPartWithWildcard, matchStringPartWithoutWildcard,
enterRulesAlternation).
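As a minimal sketch, the extracted helper could take the following form. The predicate itself is the one quoted above; the StateSketch and PartSketch types and the body of usesImplicitDefaultSketch are hypothetical stand-ins, since the real state and part types are not shown here.

```typescript
// Hypothetical, heavily reduced stand-ins for the matcher's state and part types.
interface StateSketch {
    valueIds: object | null; // null means values are not being tracked
    implicitDefault: boolean; // assumed field backing usesImplicitDefault
}
interface PartSketch {
    variable?: string;
}

// Assumed implementation; the real usesImplicitDefault is not shown in this PR text.
function usesImplicitDefaultSketch(state: StateSketch): boolean {
    return state.implicitDefault;
}

// The three-line predicate from the commit message, wrapped as one helper.
function requiresValueSketch(state: StateSketch, part: PartSketch): boolean {
    return (
        state.valueIds !== null &&
        (part.variable !== undefined || usesImplicitDefaultSketch(state))
    );
}
```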
- memoizationCoverage.spec.ts (228 tests): hand-written coverage for memo flag parity, failure/success cache, delta fidelity, valueId rebasing, cache key discrimination, carrier mode replay, noSuccessCache boundaries, replay ordering, pending wildcard deltas, lastMatchedPartInfo, and policy x memo cross-product.
- fuzzHarness.ts: new "memo-parity" validation kind that compares matchGrammar results with memoization ON vs OFF, with optional policy cross-product via memoPolicySets config field.
- grammarFuzz.spec.ts: 5 new fuzz dimensions (~1920 tests) covering broad features, ruleRef reuse (ruleRefReuseProb: 0.4), nested rules + carriers, policy cross-product, and spacing modes.
- fuzzRunner.ts: wire memo-parity into CLI repro-replay switch.
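The idea behind a memo-parity check can be sketched with a toy matcher. Nothing below is the real fuzz harness: checkMemoParity, toyMatch, and the tiny grammar (any sequence of "a" or "aa") are hypothetical, and only the on-versus-off comparison mirrors the description above.

```typescript
// Hypothetical matcher signature: the only option modelled is the memoization flag.
type MatcherSketch<R> = (input: string, options: { memoization: boolean }) => R;

// Run each input with memoization on and off; collect the inputs whose results differ.
function checkMemoParity<R>(match: MatcherSketch<R>, inputs: readonly string[]): string[] {
    const mismatches: string[] = [];
    for (const input of inputs) {
        const on = JSON.stringify(match(input, { memoization: true }));
        const off = JSON.stringify(match(input, { memoization: false }));
        if (on !== off) mismatches.push(input);
    }
    return mismatches;
}

// Toy backtracking matcher for ("a" | "aa")* with an optional failure memo per position.
function toyMatch(input: string, options: { memoization: boolean }): boolean {
    const failed = options.memoization ? new Set<number>() : undefined;
    const go = (pos: number): boolean => {
        if (pos === input.length) return true;
        if (failed?.has(pos)) return false; // cached failure: skip re-exploration
        for (const alt of ["a", "aa"]) {
            if (input.startsWith(alt, pos) && go(pos + alt.length)) return true;
        }
        failed?.add(pos); // every alternative failed from this position
        return false;
    };
    return go(0);
}
```

Calling checkMemoParity(toyMatch, inputs) returns an empty list when the memoized and unmemoized paths agree on every input; any entry in the list is a reproducible counterexample.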
…ppression tests

- Rewrite copyValueIdChainWithDelta to single-pass iterative (tail-pointer) instead of collect-then-reverse
- Document the always-replaced invariant on lastMatchedPartInfo at all 4 mutation sites (captureSuccessDelta relies on pointer inequality)
- Add enterDispatchPart comments explaining why tail-call returns are always true (no parent frame, no memoization)
- Add suppression + memo frame preservation tests: wildcardPolicy shortest, optionalPolicy preferTake, repeatPolicy greedy, and all three combined
- Update pathology test comment to reflect observed timing
memoCacheKey now returns (index << 5) | flagBits instead of allocating a template-literal string per lookup. The 5 low bits encode leadingSpacingMode (2 bits), pendingWildcard, requireValue, and carrier. index is bounded by request.length (V8 max ~2^28), so the packed value stays well within Number.MAX_SAFE_INTEGER.
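The packing described above can be sketched as follows. The bit order inside flagBits, the KeyContextSketch shape, and the function names are assumptions made for illustration; only the overall (index << 5) | flagBits layout comes from the commit message. The sketch multiplies by 32 instead of shifting because JavaScript's << operator works on 32-bit integers; for indices below 2^26 the two forms produce the same number.

```typescript
// Hypothetical context shape; the field names follow the commit message.
type LeadingSpacingModeSketch = 0 | 1 | 2 | 3; // fits in 2 bits

interface KeyContextSketch {
    index: number;
    leadingSpacingMode: LeadingSpacingModeSketch;
    pendingWildcard: boolean;
    requireValue: boolean;
    carrier: boolean;
}

// Assumed bit order: [spacing:2][pendingWildcard][requireValue][carrier].
function packMemoKey(ctx: KeyContextSketch): number {
    const flagBits =
        (ctx.leadingSpacingMode << 3) |
        (ctx.pendingWildcard ? 4 : 0) |
        (ctx.requireValue ? 2 : 0) |
        (ctx.carrier ? 1 : 0);
    // index * 32 equals index << 5 for small indices and stays exact for larger ones.
    return ctx.index * 32 + flagBits;
}

// Inverse of packMemoKey, handy for debugging a key.
function unpackMemoKey(key: number): KeyContextSketch {
    const flagBits = key % 32;
    return {
        index: Math.floor(key / 32),
        leadingSpacingMode: (flagBits >> 3) as LeadingSpacingModeSketch,
        pendingWildcard: (flagBits & 4) !== 0,
        requireValue: (flagBits & 2) !== 0,
        carrier: (flagBits & 1) !== 0,
    };
}
```

A numeric key can be used directly as a Map key, so a lookup needs no string construction.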
When a memoized sub-rule produces multiple success deltas, replay applies each and runs the outer continuation. If the continuation fails, the delta's suffix-state key (endOffset, spacingModes, pendingWildcardOffset) is recorded in a shared failedSuffixKeys set. Subsequent replay deltas with the same key are skipped: the suffix depends only on parse position and spacing/wildcard state, not captured values, so a failure repeats identically.

Changes:
- Add suffixStateKey() and failedSuffixKeys tracking on MemoReplayBacktrack
- Convert replay from N-1 per-delta frames to a single cursor frame referencing the cached SuccessDelta[] array (1 allocation vs N-1)
- Record suffix failure in matchGrammar loop before tryNextBacktrack so immediate sibling skips take effect
- Unify spacingModeIdx lookup (was duplicated as suffixSpacingIdx and memoLeadingBits)
- Remove the lossy dedupSuccessDeltas option (superseded by lossless suffix-failure pruning)
- Add suffixFailurePruning.spec.ts: lossless parity, activation, and key discrimination tests

Performance: pathology case 13.5s (no memo) -> 830ms (memo) -> 140ms (memo + suffix pruning + cursor).

All 70 test suites pass (16,103 tests).
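The pruning loop can be illustrated with a simplified sketch. The DeltaSketch shape, the string-valued key (the commit packs the fields into a number instead), and the runSuffix callback are all assumptions; the real replay works through a backtrack frame rather than a plain loop.

```typescript
// Hypothetical delta: only the fields that influence the suffix, plus a captured value.
interface DeltaSketch {
    endOffset: number;
    spacingMode: number;
    pendingWildcardOffset: number;
    value: string;
}

// Assumed key format; the value field is deliberately left out of the key.
function suffixStateKeySketch(d: DeltaSketch): string {
    return `${d.endOffset}:${d.spacingMode}:${d.pendingWildcardOffset}`;
}

// Try deltas in order, skipping any whose suffix state has already failed.
function replayWithPruning(
    deltas: readonly DeltaSketch[],
    runSuffix: (d: DeltaSketch) => boolean,
): { matched: DeltaSketch | undefined; suffixRuns: number } {
    const failedSuffixKeys = new Set<string>();
    let suffixRuns = 0;
    for (const delta of deltas) {
        const key = suffixStateKeySketch(delta);
        if (failedSuffixKeys.has(key)) continue; // same suffix state: would fail again
        suffixRuns++;
        if (runSuffix(delta)) return { matched: delta, suffixRuns };
        failedSuffixKeys.add(key);
    }
    return { matched: undefined, suffixRuns };
}
```

With three deltas that end at the same offset and one that ends elsewhere, the suffix runs at most twice instead of four times, which is the saving the commit describes.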
…base)

Replace flat-array delta capture with chain-head pointer saves. captureSuccessDelta now stores valuesHead/valuesStop and valueIdsHead/valueIdsStop pointers into the immutable cons-lists plus a captureBase for deferred rebasing. No chain walking, no array allocation, no deep-copy at capture time.

applyDelta walks the chain segments at replay time, creating new nodes with rebased IDs (replayBase - captureBase offset). rebaseMatchedValue is called only at replay, not at capture.

Also removes the skipMemo matcher wiring (the optimizer pass and type field were removed in the prior commit) since the chain-pointer optimization eliminates the cost skipMemo was trying to avoid.
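A reduced sketch of the pointer-capture idea, using a single chain of numeric IDs: the IdNodeSketch and DeltaPointersSketch types and both function names are hypothetical, and the real delta carries separate value and valueId chains. Capture stores two pointers and a base; all copying and rebasing happens at replay.

```typescript
// Hypothetical cons-list node; nodes are treated as immutable once linked into a chain.
interface IdNodeSketch {
    id: number;
    next: IdNodeSketch | null;
}

interface DeltaPointersSketch {
    head: IdNodeSketch | null; // newest node when the sub-rule completed
    stop: IdNodeSketch | null; // head at sub-rule entry (exclusive end of the segment)
    captureBase: number;
}

// O(1) capture: no walking, no copying.
function captureDeltaSketch(
    entryHead: IdNodeSketch | null,
    currentHead: IdNodeSketch | null,
    captureBase: number,
): DeltaPointersSketch {
    return { head: currentHead, stop: entryHead, captureBase };
}

// Replay: copy the segment head..stop onto liveHead in one pass, shifting each id.
function applyDeltaSketch(
    delta: DeltaPointersSketch,
    liveHead: IdNodeSketch | null,
    replayBase: number,
): IdNodeSketch | null {
    const offset = replayBase - delta.captureBase;
    let newHead: IdNodeSketch | null = null;
    let tail: IdNodeSketch | null = null;
    for (let n = delta.head; n !== null && n !== delta.stop; n = n.next) {
        const copy: IdNodeSketch = { id: n.id + offset, next: null };
        if (tail === null) newHead = copy;
        else tail.next = copy; // only freshly created copies are mutated
        tail = copy;
    }
    if (tail === null) return liveHead; // empty segment: nothing to add
    tail.next = liveHead;
    return newHead;
}
```

Because the captured chain is never modified, a delta that is never replayed costs only the three stored fields.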
Summary
Add packrat-style memoization to the grammar matcher's sub-rule entry path. When the matcher enters a RulesPart alternation, it now records whether the entry succeeded or failed and caches the result keyed by (rules, index, leadingSpacingMode, pendingWildcard, requireValue, carrier). Subsequent entries with the same context short-circuit: failures return false immediately, and successes replay a captured delta onto the live state without re-executing the sub-rule body.

Design

Failure caching: A MemoMarkerBacktrack sentinel frame is pushed onto the backtrack chain before each alternation cursor. When the marker surfaces in tryNextBacktrack (all alternatives exhausted) with anyMemberFinalized === false, the entry context is recorded as "failed" in the per-call MemoCache. Future entries with the same key skip the entire sub-rule.

Success caching: When a sub-rule member's body completes, captureSuccessDelta saves chain-head pointers into the live state's immutable cons-lists, plus a captureBase for deferred rebasing. On cache hit, applyDelta walks the chain segments at replay time, creating new nodes with rebased IDs (replayBase - captureBase). Additional deltas are pushed as MemoReplayBacktrack frames in reverse order so LIFO pop yields source order.

Suffix-failure pruning: After a memo-replay success, if the remainder of the grammar fails for a given delta, the suffixStateKey (packed endOffset + spacing modes + pending wildcard) is recorded on the replay frame. Subsequent deltas with the same suffix-state key are skipped without executing the suffix, eliminating redundant suffix failures. This takes the pathological case from ~830ms to ~140ms (6x on top of the initial memoization speedup).

Chain-pointer delta capture: captureSuccessDelta stores valuesHead/valuesStop and valueIdsHead/valueIdsStop pointers directly into the immutable cons-lists. No chain walking, no flat-array allocation, no deep-copy at capture time (O(1) pointer saves). Rebasing via rebaseMatchedValue is deferred entirely to applyDelta at replay time. For the pathological grammar with ~49K deltas (most never replayed), this eliminates ~192K intermediate allocations at capture time.

Soundness boundaries: Success caching is suppressed (noSuccessCache) for three kinds of entry:
- repeat entries, whose finalize pushes a CONTINUE backtrack frame not capturable in a single delta;
- entries with an active pending wildcard, because the captured substring depends on the wildcard's start position, which is not in the cache key;
- entries where the outer state is not tracking values, because the cache key does not distinguish null vs non-null tracking.
Failure caching is always sound.

Cache key: Encoded as a packed number (index << 5) | flagBits to avoid per-lookup string allocation. The 5 low bits encode leadingSpacingMode (2 bits), pendingWildcard, requireValue, and carrier.

Opt-in/out: Controlled by GrammarMatchOptions.memoization (defaults to true). When false, no marker is pushed and the matcher behaves identically to the pre-memoization code path.

Helpers

- memoLookup: consolidated cache-probe + marker-push logic extracted from enterRulesAlternation, handling failure short-circuit, success replay, and fresh-marker push in one place.
- requiresValue: predicate extracted from inline checks at multiple rule-entry sites.
- copyValueIdChainWithDelta: single-pass iterative deep-copy used by applyDelta to rebase ValueIdNode chains at replay time and for carrier chain reconstruction.
- suffixStateKey: packs delta fields influencing suffix matching into a numeric key for fast equality checks.

Performance

Pathology case: 13.5s (no memo) -> 830ms (memo) -> 140ms (memo + suffix pruning + cursor).

Testing

- "memo-parity" validation in fuzzHarness.ts/fuzzRunner.ts runs every generated grammar with memo on vs off and compares results.
- 5 fuzz suites in grammarFuzz.spec.ts crossing memo parity with policy overrides.
- Pathological backtracking test (grammarMatcherBacktrackPathology.spec.ts) re-enabled: xit to it with a 2000ms timeout.