Add packrat-style memoization to grammar matcher by curtisman · Pull Request #2295 · microsoft/TypeAgent

curtisman · 2026-05-05T19:34:15Z

Summary

Add packrat-style memoization to the grammar matcher's sub-rule entry path. When the matcher enters a RulesPart alternation, it now records whether the entry succeeded or failed and caches the result keyed by (rules, index, leadingSpacingMode, pendingWildcard, requireValue, carrier). Subsequent entries with the same context short-circuit: failures return false immediately, and successes replay a captured delta onto the live state without re-executing the sub-rule body.
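The entry-path behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the matcher's code: SketchMemoCache, EntryDelta, MemoEntry and enterSubRule are hypothetical names, the key is assumed to be already packed into a number, and a delta is reduced to a single endOffset field.

```typescript
// Hypothetical, simplified shapes for illustration only.
type EntryDelta = { endOffset: number };
type MemoEntry =
    | { kind: "failed" }
    | { kind: "succeeded"; deltas: EntryDelta[] };

class SketchMemoCache {
    // Outer map: one bucket per rules array; inner map: packed entry-context key.
    private readonly buckets = new Map<object, Map<number, MemoEntry>>();

    probe(rules: object, key: number): MemoEntry | undefined {
        const bucket = this.buckets.get(rules);
        return bucket === undefined ? undefined : bucket.get(key);
    }

    record(rules: object, key: number, entry: MemoEntry): void {
        let bucket = this.buckets.get(rules);
        if (bucket === undefined) {
            bucket = new Map<number, MemoEntry>();
            this.buckets.set(rules, bucket);
        }
        bucket.set(key, entry);
    }
}

// Runs the sub-rule body only on a cache miss. A cached failure returns false
// immediately; a cached success hands back the stored deltas for replay.
function enterSubRule(
    cache: SketchMemoCache,
    rules: object,
    key: number,
    runBody: () => EntryDelta[],
): EntryDelta[] | false {
    const hit = cache.probe(rules, key);
    if (hit !== undefined) {
        return hit.kind === "failed" ? false : hit.deltas;
    }
    const deltas = runBody();
    if (deltas.length === 0) {
        cache.record(rules, key, { kind: "failed" });
        return false;
    }
    cache.record(rules, key, { kind: "succeeded", deltas });
    return deltas;
}
```

In this sketch a second call to enterSubRule with the same rules object and key never invokes runBody again, which is the short-circuit the summary describes.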

Design

Failure caching: A MemoMarkerBacktrack sentinel frame is pushed onto the backtrack chain before each alternation cursor. When the marker surfaces in tryNextBacktrack (all alternatives exhausted) with anyMemberFinalized === false, the entry context is recorded as "failed" in the per-call MemoCache. Future entries with the same key skip the entire sub-rule.
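A rough sketch of the sentinel idea, under simplifying assumptions: the backtrack chain is modelled as a plain array stack, each alternative is a callback returning whether it finalized, and every alternative is explored before the marker surfaces. The names exploreAlternation, AlternativeFrame, MarkerFrame and failedKeys are hypothetical.

```typescript
type AlternativeFrame = { kind: "alternative"; run: () => boolean };
type MarkerFrame = { kind: "memoMarker"; key: number; anyMemberFinalized: boolean };
type SketchFrame = AlternativeFrame | MarkerFrame;

function exploreAlternation(
    key: number,
    alternatives: ReadonlyArray<() => boolean>,
    failedKeys: Set<number>,
): boolean {
    if (failedKeys.has(key)) {
        return false; // cached failure: skip the whole sub-rule
    }
    const marker: MarkerFrame = { kind: "memoMarker", key, anyMemberFinalized: false };
    // The marker is pushed first, so it sits below every alternative and
    // only surfaces once all of them have been popped.
    const stack: SketchFrame[] = [marker];
    for (const run of alternatives.slice().reverse()) {
        stack.push({ kind: "alternative", run });
    }
    let frame = stack.pop();
    while (frame !== undefined) {
        if (frame.kind === "memoMarker") {
            // All alternatives exhausted: record a failure only if none finalized.
            if (!frame.anyMemberFinalized) failedKeys.add(frame.key);
            return frame.anyMemberFinalized;
        }
        if (frame.run()) marker.anyMemberFinalized = true;
        frame = stack.pop();
    }
    return false;
}
```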

Success caching: When a sub-rule member's body completes, captureSuccessDelta saves chain-head pointers into the live state's immutable cons-lists, plus a captureBase for deferred rebasing. On cache hit, applyDelta walks the chain segments at replay time, creating new nodes with rebased IDs (replayBase - captureBase). Additional deltas are pushed as MemoReplayBacktrack frames in reverse order so LIFO pop yields source order.
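Two details of the replay step lend themselves to a small illustration: the ID offset (replayBase - captureBase) and the reverse-order push. The sketch below assumes IDs are plain numbers and uses hypothetical names (CapturedDelta, rebaseValueIds, replayOrder); it does not reproduce applyDelta itself.

```typescript
type CapturedDelta = { captureBase: number; valueIds: number[] };

// Shift every captured ID by the distance between the replay and capture bases.
function rebaseValueIds(delta: CapturedDelta, replayBase: number): number[] {
    const offset = replayBase - delta.captureBase;
    return delta.valueIds.map((id) => id + offset);
}

// The first delta is applied immediately; the rest are pushed in reverse so
// that popping the LIFO stack yields them in their original source order.
function replayOrder<T>(deltas: readonly T[]): T[] {
    const applied: T[] = deltas.slice(0, 1);
    const backtrack: T[] = [];
    for (const d of deltas.slice(1).reverse()) {
        backtrack.push(d);
    }
    while (backtrack.length > 0) {
        applied.push(backtrack.pop() as T);
    }
    return applied;
}
```

For example, a delta captured at base 10 with IDs 10, 11, 12 and replayed at base 3 yields IDs 3, 4, 5.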

Suffix-failure pruning: After a memo-replay success, if the remainder of the grammar fails for a given delta, the suffixStateKey (packed endOffset + spacing modes + pending wildcard) is recorded on the replay frame. Subsequent deltas with the same suffix-state key are skipped without executing the suffix, eliminating redundant suffix failures. This cuts the pathological case from ~830ms to ~140ms (a further 6x on top of the initial memoization speedup).
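A minimal sketch of the pruning loop, with assumptions called out: the key layout, the OFFSET_RANGE bound, the use of -1 for "no pending wildcard", and the names ReplayDelta, sketchSuffixKey and replayWithPruning are all hypothetical. The point it illustrates is that captured values are not part of the key, so two deltas that differ only in their values share one suffix outcome.

```typescript
type ReplayDelta = {
    endOffset: number;
    spacingModeIdx: number; // assumed to fit in 2 bits (0..3)
    pendingWildcardOffset: number; // -1 when no wildcard is pending (assumption)
    label: string; // stands in for captured values; deliberately not in the key
};

const OFFSET_RANGE = 0x200000; // assumed upper bound on offsets in this sketch (2^21)

function sketchSuffixKey(d: ReplayDelta): number {
    return (d.endOffset * 4 + d.spacingModeIdx) * OFFSET_RANGE + (d.pendingWildcardOffset + 1);
}

// Tries deltas in order and returns the first whose suffix succeeds.
// A delta whose suffix-state key has already failed is skipped outright.
function replayWithPruning(
    deltas: readonly ReplayDelta[],
    runSuffix: (d: ReplayDelta) => boolean,
): ReplayDelta | undefined {
    const failedSuffixKeys = new Set<number>();
    for (const d of deltas) {
        const key = sketchSuffixKey(d);
        if (failedSuffixKeys.has(key)) continue;
        if (runSuffix(d)) return d;
        failedSuffixKeys.add(key);
    }
    return undefined;
}
```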

Chain-pointer delta capture: captureSuccessDelta stores valuesHead/valuesStop and valueIdsHead/valueIdsStop pointers directly into the immutable cons-lists. No chain walking, no flat-array allocation, no deep-copy at capture time (O(1) pointer saves). Rebasing via rebaseMatchedValue is deferred entirely to applyDelta at replay time. For the pathological grammar with ~49K deltas (most never replayed), this eliminates ~192K intermediate allocations at capture time.
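The pointer-capture idea can be illustrated on a toy cons-list. The sketch assumes a newest-first singly linked list of numeric IDs; IdCell, PointerDelta, pushId, capturePointerDelta and rebasedSegmentIds are hypothetical names, and the real delta carries more fields than shown here.

```typescript
type IdCell = { readonly id: number; readonly prev: IdCell | undefined };
type PointerDelta = {
    head: IdCell | undefined; // newest cell when the sub-rule completed
    stop: IdCell | undefined; // newest cell when the sub-rule was entered
    captureBase: number;
};

function pushId(head: IdCell | undefined, id: number): IdCell {
    return { id, prev: head };
}

// O(1): only pointers are saved; the immutable cells are shared, not copied.
function capturePointerDelta(
    head: IdCell | undefined,
    stop: IdCell | undefined,
    captureBase: number,
): PointerDelta {
    return { head, stop, captureBase };
}

// The walk and the rebasing happen only if the delta is actually replayed.
function rebasedSegmentIds(delta: PointerDelta, replayBase: number): number[] {
    const offset = replayBase - delta.captureBase;
    const ids: number[] = [];
    let cur = delta.head;
    while (cur !== undefined && cur !== delta.stop) {
        ids.push(cur.id + offset);
        cur = cur.prev;
    }
    return ids; // newest first
}
```

Because the cells are immutable, a delta that is never replayed costs only the three saved fields.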

Soundness boundaries: Success caching is suppressed (noSuccessCache) for repeat entries (whose finalize pushes a CONTINUE backtrack frame not capturable in a single delta), entries with an active pending wildcard (the captured substring depends on the wildcard's start position, which is not in the cache key), and entries where the outer is not tracking values (the cache key does not distinguish null vs non-null tracking). Failure caching is always sound.

Cache key: Encoded as a packed number (index << 5) | flagBits to avoid per-lookup string allocation. The 5 low bits encode leadingSpacingMode (2 bits), pendingWildcard, requireValue, and carrier.
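A sketch of the packing, assuming one particular bit order for the five flag bits (the description does not state the order). packMemoKey is a hypothetical name. Since JavaScript's << operates on 32-bit integers, this sketch yields a non-negative key only for index values below 2^26.

```typescript
function packMemoKey(
    index: number,
    leadingSpacingMode: number, // 0..3, occupies 2 bits
    pendingWildcard: boolean,
    requireValue: boolean,
    carrier: boolean,
): number {
    // Assumed layout, high to low: spacing mode (2 bits), pendingWildcard,
    // requireValue, carrier.
    const flagBits =
        ((leadingSpacingMode & 3) << 3) |
        (pendingWildcard ? 4 : 0) |
        (requireValue ? 2 : 0) |
        (carrier ? 1 : 0);
    return (index << 5) | flagBits;
}
```

With this layout, packMemoKey(3, 2, true, false, true) is 117: 3 << 5 is 96 and the flag bits are 16 + 4 + 1 = 21.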

Opt-in/out: Controlled by GrammarMatchOptions.memoization (defaults to true). When false, no marker is pushed and the matcher behaves identically to the pre-memoization code path.

Helpers

memoLookup: consolidated cache-probe + marker-push logic extracted from enterRulesAlternation, handling failure short-circuit, success replay, and fresh-marker push in one place.
requiresValue: predicate extracted from inline checks at multiple rule-entry sites.
copyValueIdChainWithDelta: single-pass iterative deep-copy used by applyDelta to rebase ValueIdNode chains at replay time and for carrier chain reconstruction.
suffixStateKey: packs delta fields influencing suffix matching into a numeric key for fast equality checks.
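The single-pass, tail-pointer copy mentioned for copyValueIdChainWithDelta might look roughly like this on a toy chain. IdLink and copyChainWithDelta are hypothetical names, and the real ValueIdNode presumably carries more than a numeric id.

```typescript
type IdLink = { id: number; prev: IdLink | undefined };

// Copies the cells from `head` down to (but excluding) `stop`, adding `delta`
// to each id, and links the oldest copied cell to `tail`. One pass, no
// intermediate array: `last` tracks the most recently created copy so the
// next copy can be attached behind it.
function copyChainWithDelta(
    head: IdLink | undefined,
    stop: IdLink | undefined,
    delta: number,
    tail: IdLink | undefined,
): IdLink | undefined {
    let newHead: IdLink | undefined = undefined;
    let last: IdLink | undefined = undefined;
    let cur = head;
    while (cur !== undefined && cur !== stop) {
        const copy: IdLink = { id: cur.id + delta, prev: tail };
        if (last === undefined) {
            newHead = copy;
        } else {
            last.prev = copy;
        }
        last = copy;
        cur = cur.prev;
    }
    return newHead === undefined ? tail : newHead;
}
```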

Performance

Stage	Pathological grammar time
No memoization	~13.5s
+ Memoization	~830ms (16x)
+ Suffix-failure pruning	~140ms (6x)
+ Chain-pointer capture	~140ms (eliminates capture allocations; same wall time, less GC pressure)

Testing

memoizationCoverage.spec.ts (762 lines): 13 describe blocks covering flag parity, failure cache, success delta fidelity, valueId rebasing, cache key discrimination, carrier mode, noSuccessCache boundaries, replay LIFO ordering, wildcard deltas, lastMatchedPartInfo preservation, policy cross-product parity, and suppression with active memo frames.
suffixFailurePruning.spec.ts (new): Tests lossless parity (memo on vs off produce identical results), activation (verifies pruning fires and reduces work), and key discrimination (different endOffsets/spacing modes are not conflated).
Fuzz harness: "memo-parity" validation in fuzzHarness.ts / fuzzRunner.ts runs every generated grammar with memo on vs off and compares results. 5 fuzz suites in grammarFuzz.spec.ts crossing memo parity with policy overrides.
Pathology test: re-enabled from xit to it with a 2000ms timeout.
All 70 test suites (16,103 tests) pass.
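The memo-parity idea reduces to running the same input twice and comparing results. The sketch below is a generic stand-in, not the fuzz harness: SketchMatcher, SketchMatchOptions and checkMemoParity are hypothetical, and results are compared by JSON serialization as a simplifying assumption.

```typescript
type SketchMatchOptions = { memoization: boolean };
type SketchMatcher<R> = (request: string, options: SketchMatchOptions) => R;

// Returns the requests for which memo ON and memo OFF disagree.
function checkMemoParity<R>(
    match: SketchMatcher<R>,
    requests: readonly string[],
): string[] {
    const mismatches: string[] = [];
    for (const request of requests) {
        const withMemo = JSON.stringify(match(request, { memoization: true }));
        const withoutMemo = JSON.stringify(match(request, { memoization: false }));
        if (withMemo !== withoutMemo) mismatches.push(request);
    }
    return mismatches;
}
```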

Implement success + failure memoization in the grammar matcher to eliminate exponential backtracking. The memo cache records deltas at rule-alternation boundaries and replays them on cache hits, avoiding redundant re-exploration of sub-rule alternatives.

Key changes:
- MemoCache stores per-rule-set success deltas and failure sentinels
- MemoMarkerBacktrack captures entry state; on completion, records a SuccessDelta with relative value/valueId chains for safe replay
- MemoReplayBacktrack restores snapshots and applies deltas for cached successes (N-1 frames pushed in reverse order)
- Cache key includes position, leading context, pendingWildcard, requireValue, and carrier mode to prevent collisions
- Repeat entries, pendingWildcardActive, and null-valueIds entries are excluded from success caching (noSuccessCache flag)
- GrammarMatchOptions.memoization flag (default true) controls the feature at the API level

Re-enable two previously disabled tests:
- grammarMatcherBacktrackPathology.spec.ts: pathological backtracking test (was xit, now it; threshold 3000ms for CI jitter)
- grammarFuzz.spec.ts: tail-promote fuzz block (was commented out)

Extract the ~75-line memo cache lookup/replay/marker-push block from enterRulesAlternation into a standalone memoLookup function that returns a discriminated result ("failed" | "replayed" | marker | undefined). Extract the repeated 3-line predicate state.valueIds !== null && (part.variable !== undefined || usesImplicitDefault(state)) into a requiresValue(state, part) helper, replacing all three occurrences (matchStringPartWithWildcard, matchStringPartWithoutWildcard, enterRulesAlternation).

- memoizationCoverage.spec.ts (228 tests): hand-written coverage for memo flag parity, failure/success cache, delta fidelity, valueId rebasing, cache key discrimination, carrier mode replay, noSuccessCache boundaries, replay ordering, pending wildcard deltas, lastMatchedPartInfo, and policy x memo cross-product.
- fuzzHarness.ts: new "memo-parity" validation kind that compares matchGrammar results with memoization ON vs OFF, with optional policy cross-product via memoPolicySets config field.
- grammarFuzz.spec.ts: 5 new fuzz dimensions (~1920 tests) covering broad features, ruleRef reuse (ruleRefReuseProb: 0.4), nested rules + carriers, policy cross-product, and spacing modes.
- fuzzRunner.ts: wire memo-parity into CLI repro-replay switch.

…ppression tests

- Rewrite copyValueIdChainWithDelta to single-pass iterative (tail-pointer) instead of collect-then-reverse
- Document the always-replaced invariant on lastMatchedPartInfo at all 4 mutation sites (captureSuccessDelta relies on pointer inequality)
- Add enterDispatchPart comments explaining why tail-call returns are always true (no parent frame, no memoization)
- Add suppression + memo frame preservation tests: wildcardPolicy shortest, optionalPolicy preferTake, repeatPolicy greedy, and all three combined
- Update pathology test comment to reflect observed timing

memoCacheKey now returns (index << 5) | flagBits instead of allocating a template-literal string per lookup. The 5 low bits encode leadingSpacingMode (2 bits), pendingWildcard, requireValue, and carrier. index is bounded by request.length (V8 max ~2^28), so the packed value stays well within Number.MAX_SAFE_INTEGER.

When a memoized sub-rule produces multiple success deltas, replay applies each and runs the outer continuation. If the continuation fails, the delta's suffix-state key (endOffset, spacingModes, pendingWildcardOffset) is recorded in a shared failedSuffixKeys set. Subsequent replay deltas with the same key are skipped: the suffix depends only on parse position and spacing/wildcard state, not captured values, so a failure repeats identically.

Changes:
- Add suffixStateKey() and failedSuffixKeys tracking on MemoReplayBacktrack
- Convert replay from N-1 per-delta frames to a single cursor frame referencing the cached SuccessDelta[] array (1 allocation vs N-1)
- Record suffix failure in matchGrammar loop before tryNextBacktrack so immediate sibling skips take effect
- Unify spacingModeIdx lookup (was duplicated as suffixSpacingIdx and memoLeadingBits)
- Remove the lossy dedupSuccessDeltas option (superseded by lossless suffix-failure pruning)
- Add suffixFailurePruning.spec.ts: lossless parity, activation, and key discrimination tests

Performance: pathology case 13.5s (no memo) -> 830ms (memo) -> 140ms (memo + suffix pruning + cursor). All 70 test suites pass (16,103 tests).

…base)

Replace flat-array delta capture with chain-head pointer saves. captureSuccessDelta now stores valuesHead/valuesStop and valueIdsHead/valueIdsStop pointers into the immutable cons-lists plus a captureBase for deferred rebasing. No chain walking, no array allocation, no deep-copy at capture time. applyDelta walks the chain segments at replay time, creating new nodes with rebased IDs (replayBase - captureBase offset). rebaseMatchedValue is called only at replay, not at capture.

Also removes the skipMemo matcher wiring (the optimizer pass and type field were removed in the prior commit) since the chain-pointer optimization eliminates the cost skipMemo was trying to avoid.

curtisman added 5 commits May 5, 2026 09:58

curtisman had a problem deploying to development-fork May 5, 2026 19:34 — with GitHub Actions Error

curtisman enabled auto-merge May 5, 2026 19:34

curtisman added 2 commits May 5, 2026 16:48

curtisman had a problem deploying to development-fork May 6, 2026 01:38 — with GitHub Actions Error

curtisman had a problem deploying to development-fork May 6, 2026 01:41 — with GitHub Actions Error

curtisman force-pushed the opt5 branch from 8df9578 to dddcd58 Compare May 6, 2026 01:41

curtisman requested a deployment to development-fork May 6, 2026 01:41 — with GitHub Actions Waiting

curtisman added this pull request to the merge queue May 6, 2026

Merged via the queue into microsoft:main with commit c680f03 May 6, 2026
16 of 29 checks passed

curtisman merged 7 commits into microsoft:main from curtisman:opt5
