Open
27 commits
ce7c8bb
emit: emit type-checked TypeScript (tsTarget, issue #6 first step)
johnsoncodehk Jun 21, 2026
2c87267
emit: extend the tsc gate to the fallback-lexer / non-soa path (yaml,…
johnsoncodehk Jun 21, 2026
7d47ca3
emit: target-agnostic emitter — derived Go + Rust parsers (issue #6)
johnsoncodehk Jun 21, 2026
3059804
Remove gen-ast-types: the typed-CST generator had no load-bearing con…
johnsoncodehk Jun 21, 2026
070b965
emit-portable: grow to a real JS subset; derived Rust matches oxc thr…
johnsoncodehk Jun 21, 2026
d1308d3
emit-portable: arena-allocate the Go target (3.5x faster, vs tsgo)
johnsoncodehk Jun 21, 2026
7314dde
emit-portable: general token-pattern matcher (real-grammar lexer, sta…
johnsoncodehk Jun 21, 2026
747b039
emit-portable: port the general token matcher to Go + Rust (lexer con…
johnsoncodehk Jun 21, 2026
b10cfdd
emit-portable: stateful regex-vs-division lexer in all three targets …
johnsoncodehk Jun 21, 2026
c0d84d0
emit-portable: template-literal interpolation in all three targets (s…
johnsoncodehk Jun 21, 2026
c99a67d
emit-portable: postfix-token Pratt LED — tagged templates (stage 6 be…
johnsoncodehk Jun 21, 2026
65498ed
emit-portable: general (non-literal) inline alt in all three targets
johnsoncodehk Jun 21, 2026
8fc593a
emit-portable: `not` step + general Pratt NUD sequences
johnsoncodehk Jun 21, 2026
22cfc5e
emit-portable: postfix-operator Pratt LED + access-tail closure
johnsoncodehk Jun 21, 2026
ab022a7
emit-portable: grouped sub-sequence `seq` step
johnsoncodehk Jun 21, 2026
9624d4f
emit-portable: `sameLine` zero-width assertion + lexer newline tracking
johnsoncodehk Jun 21, 2026
f807c6b
emit-portable: capBelow arrow functions + fix sep trailing-delimiter
johnsoncodehk Jun 21, 2026
395ba51
emit-portable: a transparent group degrades to a `seq` step
johnsoncodehk Jun 21, 2026
544e277
emit-portable: precedence-gated mixfix LEDs (ternary + chain-rhs in/i…
johnsoncodehk Jun 21, 2026
ba158c0
emit-portable: no-in suppress, +-quantifier, sep/bracket fixes — java…
johnsoncodehk Jun 21, 2026
0b6d7fd
emit-portable: the real javascript.ts grammar emits to ts/go/rust (is…
johnsoncodehk Jun 21, 2026
cd4ebc8
emit-portable: typescript.ts emits too — both real grammars in the ga…
johnsoncodehk Jun 21, 2026
ca2a56b
docs: README — the emitted parser need not be JS (issue #6)
johnsoncodehk Jun 21, 2026
aeb4736
emit: converge to 2 target-parameterized APIs (emitParser reuses emit…
johnsoncodehk Jun 21, 2026
6e0df6c
test: move the portable-targets fixtures examples/ -> test/fixtures/
johnsoncodehk Jun 21, 2026
84895d3
Address Copilot review: portable lexer newline parity + stale-API doc…
johnsoncodehk Jun 21, 2026
8cca2bc
Fix JS line-terminator conformance across all four lexers (CR / LS / PS)
johnsoncodehk Jun 21, 2026
4 changes: 2 additions & 2 deletions .gitattributes
@@ -1,8 +1,8 @@
  # Generated artifacts (npm run gen) — committed for consumers, CI-gated for
  # staleness, collapsed in GitHub diffs. The grammar sources (*.ts at the repo
  # root) are the hand-written truth; everything below is derived from them.
- # (*.cst-types.ts / *.cst-match.ts are generated too but NOT committed — see
- # .gitignore; they regenerate locally and in CI before typecheck/gates.)
+ # (*.cst-match.ts is generated too but NOT committed — see .gitignore;
+ # it regenerates locally and in CI before typecheck/gates.)
  *.tmLanguage.json linguist-generated=true
  *.language-configuration.json linguist-generated=true
  *.monarch.json linguist-generated=true
6 changes: 3 additions & 3 deletions .github/workflows/ci.yml
@@ -29,9 +29,9 @@ jobs:

  - run: npm ci

- # Regenerate every grammar's artifacts FIRST: the uncommitted ones
- # (*.cst-types.ts / *.cst-match.ts, gitignored) must exist before Typecheck
- # and the gates, which import them. Then fail if any COMMITTED artifact
+ # Regenerate every grammar's artifacts FIRST: the uncommitted one
+ # (*.cst-match.ts, gitignored) must exist before Typecheck
+ # and the gates, which import it. Then fail if any COMMITTED artifact
  # drifts from the regenerated output (someone edited a grammar but forgot
  # to regenerate). Covers all grammars (sources at the repo root) + the
  # tree-sitter packages.
3 changes: 1 addition & 2 deletions .gitignore
@@ -18,7 +18,6 @@ tree-sitter/*/src/node-types.json
  tree-sitter/*/src/tree_sitter/
  tree-sitter/*/*.wasm

- # Generated CST consumer artifacts (npm run gen) — derived from the grammar, not
+ # Generated CST consumer artifact (npm run gen) — derived from the grammar, not
  # committed: generate locally / in CI before typecheck and gates.
- *.cst-types.ts
  *.cst-match.ts
16 changes: 14 additions & 2 deletions README.md
@@ -338,6 +338,19 @@ const Regex = token(seq(

  [`test/agnostic.ts`](test/agnostic.ts) proves it directly — the same engine parses a toy grammar whose identifier token is `Word`, with no templates or regex. The deeper proof is [`html.ts`](html.ts): markup shares *nothing* with TypeScript's token stream, yet the same engine handles it.

+ ### The emitted parser need not be JS — Go, Rust, native
+
+ The grammar also derives a **standalone parser in another language**. [`emitParser(grammar, target)`](src/emit.ts) runs one analysis into one language-agnostic IR, and each `Target` renders it — including its own regex-free lexer (`emitParser` reuses `emitLexer(grammar, target)`), so the output has no dependency on the JS runtime and compiles offline:
+
+ ```ts
+ import { emitParser, goTarget, rustTarget } from './src/emit.ts';
+
+ writeFileSync('parser.go', emitParser(grammar, goTarget)); // `go build`, no deps
+ writeFileSync('parser.rs', emitParser(grammar, rustTarget)); // `rustc`, no crates
+ ```
+
+ The proof is the full languages: the real [`javascript.ts`](javascript.ts) and [`typescript.ts`](typescript.ts) grammars — including the `[Await]/[Yield]` fork, left recursion, the regex/division and template state machines, arrow functions, and the TS type grammar — emit to **TypeScript, Go, and Rust**, and every CST is byte-identical to the reference interpreter. [`test/portable-targets.ts`](test/portable-targets.ts) compiles and runs all three for sixteen grammars (the two real languages plus focused fixtures) on every CI run. The Rust output reaches [oxc](https://github.com/oxc-project/oxc) throughput and the Go output beats [tsgo](https://github.com/microsoft/typescript-go) on the same corpus (an arena keeps both near zero-allocation). Byte-based Go/Rust use UTF-8 offsets — identical to the JS interpreter's for ASCII; non-ASCII offset units differ inherently.
+
  ## Adding a language

  A new language is **one grammar file** on the unchanged engine:
@@ -375,8 +388,7 @@ typescript.ts one grammar (TypeScript combinator API)
  ├─ src/gen-tm.ts ───────────▶ typescript.tmLanguage.json (TextMate highlighter)
  ├─ src/gen-vscode-config.ts ▶ typescript.language-configuration.json (editor behavior)
  ├─ src/gen-treesitter.ts ───▶ tree-sitter/ (grammar.js + highlights.scm + scanner.c)
- ├─ src/gen-monarch.ts ──────▶ typescript.monarch.json
- └─ src/gen-ast-types.ts ────▶ typescript.cst-types.ts
+ └─ src/gen-monarch.ts ──────▶ typescript.monarch.json

  shared src/grammar-utils.ts structural helpers used across stages
  src/api.ts, types.ts the grammar's combinator + type surface
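
The README paragraph added in this diff says the byte-based Go and Rust targets report UTF-8 offsets, which match the JS interpreter's only for ASCII input. The sketch below is not code from this PR. It illustrates the difference under two assumptions: the JS interpreter's offsets are ordinary string indices (UTF-16 code units), and the offset falls on a code-point boundary. `utf8ByteOffset` is a hypothetical helper name.

```ts
// Hypothetical helper (not part of the repository): map a UTF-16 code-unit
// offset, as used by JS string indexing, to the UTF-8 byte offset that a
// byte-based parser would report for the same position.
function utf8ByteOffset(source: string, utf16Offset: number): number {
  let bytes = 0;
  for (let i = 0; i < utf16Offset; ) {
    const cp = source.codePointAt(i)!;
    // UTF-8 encodes a code point in 1 to 4 bytes depending on its value.
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    // Code points above U+FFFF occupy two UTF-16 code units (a surrogate pair).
    i += cp > 0xffff ? 2 : 1;
  }
  return bytes;
}

// ASCII: both offset units agree.
const asciiOffset = utf8ByteOffset('let x = 1;', 4);
// 'é' is one UTF-16 code unit but two UTF-8 bytes.
const latinOffset = utf8ByteOffset('é = 1', 1);
// '😀' is two UTF-16 code units but four UTF-8 bytes.
const emojiOffset = utf8ByteOffset('😀x', 2);

console.log(asciiOffset, latinOffset, emojiOffset);
```

A consumer comparing CSTs across targets on non-ASCII input would need a mapping of this kind, or would compare node text rather than raw offsets.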
8 changes: 2 additions & 6 deletions src/cli.ts
@@ -4,7 +4,6 @@ import { generateTmLanguage, generateMarkupInjection, generateAliasGrammar, gene
  import { generateLanguageConfig } from './gen-vscode-config.ts';
  import { generateTreeSitter } from './gen-treesitter.ts';
  import { generateMonarch } from './gen-monarch.ts';
- import { generateAstTypes } from './gen-ast-types.ts';
  import { generateCstMatch } from './gen-cst-match.ts';
  import type { CstGrammar, RuleExpr } from './types.ts';
  import { tokenPatternSource } from './token-pattern.ts';
@@ -115,11 +114,8 @@ emit(`tree-sitter/${langName}/package.json`,
  // Monaco Monarch tokenizer (markup-aware: emits a tag/text/raw-text state machine).
  emit(`${langName}.monarch.json`, JSON.stringify(generateMonarch(grammar), null, 2));

- // CST node types (TypeScript) — generic over rules, fine for markup too.
- emit(`${langName}.cst-types.ts`, generateAstTypes(grammar));
-
- // Per-arm CST destructurers (value-level sibling of the types above).
- emit(`${langName}.cst-match.ts`, generateCstMatch(grammar, `./${langName}.cst-types.ts`));
+ // Per-arm CST destructurers.
+ emit(`${langName}.cst-match.ts`, generateCstMatch(grammar));

  function formatExpr(expr: RuleExpr): string {
  switch (expr.type) {