tree-sitter-qmd - allow multi-line attribute lists on inline images/spans #209

Open
rundel wants to merge 3 commits into quarto-dev:main from rundel:bugfix/multiline-inline-attrs

@rundel
Contributor

@rundel rundel commented May 15, 2026

Not sure if this is something you want to support; if not, feel free to close this. Otherwise, this is an attempt at a semi-minimally invasive fix for multi-line attributes.

Multi-line {...} attribute lists on inline images, spans, and other inline constructs that share _pandoc_attr_specifier were rejected at the first attribute. The same form rendered fine in TS Quarto and was common in quarto-web sources, leaving q2 as the lone holdout.

Root cause was twofold in tree-sitter-markdown/grammar.js:

  • attribute_specifier / _pandoc_attr_specifier had no tolerance for whitespace between { and the first specifier (or between _commonmark_specifier_start_with_class and } — also broke the single-line { .cls } form).
  • _inline_whitespace is choice($._whitespace, $._soft_line_break), a single token, so a \n followed by indent on the next line couldn't match as one inter-attribute separator.
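The second root cause can be made concrete with a toy Rust model of the two separator rules. This is not tree-sitter and not the project's code: `Tok`, `lex_separator`, and the two matcher functions are hypothetical, and the model assumes (as the description implies) that a line break followed by indentation reaches the parser as two tokens.

```rust
// Toy stand-ins for the grammar's whitespace tokens (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Whitespace,    // plays the role of $._whitespace
    SoftLineBreak, // plays the role of $._soft_line_break
}

// Models `_inline_whitespace = choice($._whitespace, $._soft_line_break)`:
// the separator must be exactly one token of either kind.
fn matches_inline_whitespace(sep: &[Tok]) -> bool {
    sep.len() == 1
}

// Models `_attr_ws = repeat1(choice($._whitespace, $._soft_line_break))`:
// one or more tokens of either kind (precedence is not modelled).
fn matches_attr_ws(sep: &[Tok]) -> bool {
    !sep.is_empty()
}

// Hypothetical lexer for the text between two attributes: a run of spaces
// or tabs becomes one Whitespace token, each '\n' one SoftLineBreak token.
fn lex_separator(raw: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    for ch in raw.chars() {
        match ch {
            '\n' => toks.push(Tok::SoftLineBreak),
            ' ' | '\t' => {
                if toks.last() != Some(&Tok::Whitespace) {
                    toks.push(Tok::Whitespace);
                }
            }
            _ => panic!("separator may only contain whitespace"),
        }
    }
    toks
}

fn main() {
    let same_line = lex_separator(" ");
    let next_line_indented = lex_separator("\n  ");
    // A single space is one token, so both rules accept it.
    println!("same line: {} / {}", matches_inline_whitespace(&same_line), matches_attr_ws(&same_line));
    // "\n" plus indent is two tokens, so only the repeat1 form accepts it.
    println!(
        "next line: {} / {}",
        matches_inline_whitespace(&next_line_indented),
        matches_attr_ws(&next_line_indented)
    );
}
```

In the real grammar the `prec(-1, ...)` wrapper also lowers `_attr_ws`'s precedence when the generator resolves conflicts; the toy model ignores that and only shows the one-token versus one-or-more distinction.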

Introduce _attr_ws: prec(-1, repeat1(choice($._whitespace, $._soft_line_break))) and use it where attribute lists need to span lines:

  • optional($._attr_ws) immediately after { in both attribute_specifier and _pandoc_attr_specifier.
  • Add trailing optional($._attr_ws) inside _commonmark_specifier_start_with_class (already present on _commonmark_specifier_start_with_kv). Keeping the trailer inside the specifier (rather than around the choice at the wrapper) avoids an LR conflict with language_specifier.
  • Swap inner _inline_whitespace → _attr_ws at the four sites that join successive classes, classes-to-kv, and successive kvs.

commonmark_specifier's leading optional($._inline_whitespace) is left untouched — changing it triggered the language_specifier conflict, and the wrapper-level _attr_ws already absorbs anything that would have leaked through.
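As a reference for the inputs the amended rules are meant to accept, here is a hand-written Rust sketch of a `{...}` attribute-list parser that treats any mix of spaces, tabs, and newlines as a separator, including right after `{` and right before `}`. It is not the tree-sitter grammar and does not cover every Pandoc attribute form (escapes inside quoted values, for instance); `Attr`, `parse_attrs`, and `split_specifier` are hypothetical names. The sample input is shaped to reproduce the `Image` attributes in the end-to-end check; the actual `repro.qmd` is not included in the PR.

```rust
// Pandoc-style attribute triple: identifier, classes, key/value pairs.
#[derive(Debug, Default, PartialEq)]
struct Attr {
    id: String,
    classes: Vec<String>,
    kvs: Vec<(String, String)>,
}

// Parses a `{...}` attribute list, accepting any whitespace (newlines
// included) between specifiers and next to the braces.
fn parse_attrs(src: &str) -> Result<Attr, String> {
    let inner = src
        .trim()
        .strip_prefix('{')
        .and_then(|s| s.strip_suffix('}'))
        .ok_or("expected `{...}`")?;
    let mut attr = Attr::default();
    let mut rest = inner.trim_start(); // whitespace right after `{`
    while !rest.is_empty() {
        let (spec, tail) = split_specifier(rest)?;
        if let Some(id) = spec.strip_prefix('#') {
            attr.id = id.to_string();
        } else if let Some(cls) = spec.strip_prefix('.') {
            attr.classes.push(cls.to_string());
        } else if let Some((k, v)) = spec.split_once('=') {
            attr.kvs.push((k.to_string(), v.trim_matches('"').to_string()));
        } else {
            return Err(format!("unrecognized specifier `{spec}`"));
        }
        rest = tail.trim_start(); // separator, or whitespace before `}`
    }
    Ok(attr)
}

// Splits off the first specifier, stopping at whitespace outside quotes.
fn split_specifier(s: &str) -> Result<(&str, &str), String> {
    let mut in_quotes = false;
    for (i, ch) in s.char_indices() {
        match ch {
            '"' => in_quotes = !in_quotes,
            c if c.is_whitespace() && !in_quotes => return Ok((&s[..i], &s[i..])),
            _ => {}
        }
    }
    if in_quotes {
        Err("unterminated quote".into())
    } else {
        Ok((s, ""))
    }
}

fn main() {
    let src = "{\n  .hero-banner .img-fluid\n  fig-align=\"center\"\n  width=\"600px\"\n}";
    println!("{:?}", parse_attrs(src));
    println!("{:?}", parse_attrs("{ .cls }"));
}
```

Splitting on "any whitespace run" in one place is the hand-written analogue of reusing a single `_attr_ws` rule at every separator site.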

Regenerated artifacts (parser.c, grammar.json, node-types.json) are produced by tree-sitter generate; do not hand-edit. Also regenerated crates/pampa/resources/error-corpus/_autogen-table.json via deno run -A scripts/build_error_table.ts because parser-state IDs shifted — without this regen, 7 apostrophe-quotes tests in qmd-syntax-helper fail (the rule keys off Q-2-10 diagnostics which map through (lr_state, sym) pairs).
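To illustrate why the state-ID shift breaks things: a table keyed on `(lr_state, sym)` silently stops matching once `tree-sitter generate` renumbers states. The sketch below models that with a `HashMap`. The key shape comes from the description, but `ErrorTable` and `classify` are hypothetical and the real `_autogen-table.json` layout may differ; the `(2587, "_close_block")` to Q-2-2 pair is the one reported later in this PR, and the shifted state 2601 is invented.

```rust
use std::collections::HashMap;

// Hypothetical model of the generated lookup table:
// (LR state, lookahead symbol) at the point of failure -> diagnostic code.
type ErrorTable = HashMap<(u32, String), String>;

// Looks up the diagnostic code for a parse failure, if the table has one.
fn classify<'t>(table: &'t ErrorTable, lr_state: u32, sym: &str) -> Option<&'t str> {
    table.get(&(lr_state, sym.to_string())).map(String::as_str)
}

fn main() {
    let mut table: ErrorTable = HashMap::new();
    table.insert((2587, "_close_block".to_string()), "Q-2-2".to_string());

    // Same parser build as the table: the lookup hits.
    println!("{:?}", classify(&table, 2587, "_close_block"));

    // After regeneration the same failure may surface under a different
    // state ID; a stale table then finds nothing until it is rebuilt.
    println!("{:?}", classify(&table, 2601, "_close_block"));
}
```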

Tests:

  • 6 new tree-sitter corpus cases in inline-multiline-attrs.txt cover multi-line class-only, multi-line class + key=value, { .cls } symmetry, span multi-line class, span multi-line kv, and # Heading { .cls } for the block-level path.
  • 1 new pampa integration test in test_attr_source_parsing.rs verifies that the AST and attr_source byte offsets are correct for the quarto-web-style multi-line form.

Full tree-sitter corpus: 492/492 pass (was 486). Full workspace: 8943/8943 pass. cargo xtask verify Rust legs all green.

End-to-end check:

  $ pampa repro.qmd
  [ Header 1 ( "test" , [] , [] ) [Str "Test"]
  , Para [Image ( "" , ["hero-banner", "img-fluid"]
                , [("fig-align", "center"), ("width", "600px")] )
           [] ("featured.png" , "")]
  , Para [Str "Done."] ]

…es/spans

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cscheid
Member

cscheid commented May 15, 2026

I'm not against this fix, but anything involving line breaks in Markdown gives me a case of the heebie-jeebies. I think it's prudent to add tests that exercise those spans inside bulleted lists and block quotes before declaring success.

(Operational annoyance: every time a grammar fix makes it in, we need to regenerate the error corpus table and the parser.c autogen file. There are instructions for Claude to do it in the repo, but you'll need to install Deno.)

rundel and others added 2 commits May 15, 2026 18:11
…e error

Adds edge-case coverage for the multi-line inline-attribute fix:

* tree-sitter corpus (`inline-multiline-attrs.txt`): six new cases for
  multi-line `{...}` attribute lists inside list items — bulleted,
  ordered, nested, image-in-list, plus two sibling-bullet variants
  (top-level and nested) that exercise list continuation across
  preceding and following items.
* pampa integration tests (`test_attr_source_parsing.rs`): five new
  AST-level tests mirroring the corpus cases — verifying that classes,
  key/value pairs, and `attr_source` byte offsets survive the
  block→inline boundary for each list shape.

Adds a new error code Q-2-37 for the one shape the grammar fix
cannot reach — multi-line attribute lists inside a blockquote. The
tree-sitter external scanner short-circuits SOFT_LINE_ENDING when the
next line begins with `>` (`scanner.c:2380-2407`), so the inline pass
only sees the first physical line of the attribute list and Q-2-2
fires at the same `(state=2587, sym="_close_block")` pair as a
plain top-level unclosed `{`. The two cases are indistinguishable at
the error-table lookup level but distinguishable in the source text.

* `resources/error-corpus/Q-2-37.json` — documents the new code with
  `cases: []` (no state mapping; this entry is emitted manually).
* `readers/qmd_error_messages.rs::upgrade_q22_to_q237_if_in_blockquote`
  — post-processes each Q-2-2 diagnostic: if the line of the failing
  `{` (after stripping leading whitespace) begins with `>`, rewrites
  `code`, `title`, `problem`, `hints`, and clears inherited
  `details` so the message reads cleanly without the Q-2-2 anchor
  note.
* `tests/test_q_2_37_blockquote_multiline_attrs.rs` — four tests:
  image-in-blockquote upgrades to Q-2-37; span-in-blockquote upgrades;
  blockquote with leading indent still upgrades; top-level `[attr]{[`
  stays Q-2-2 (negative control).

Full tree-sitter corpus: 495/495 (was 489 pre-fix, 493 after the
initial commit on this branch). Full workspace: 8952/8952.

End-to-end check on a real blockquote case:

  Error: [Q-2-37] Multi-line inline attribute list inside blockquote
   1 │ > ![](img.png){
     │                ╰── Inside a blockquote, an inline `{...}`
     │                    attribute list cannot span multiple lines.
  ℹ Put the attribute list on a single line, or move this construct
    out of the blockquote.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ne-attrs

# Conflicts:
#	crates/pampa/resources/error-corpus/Q-2-37.json
#	crates/pampa/resources/error-corpus/_autogen-table.json
#	crates/tree-sitter-qmd/tree-sitter-markdown/src/parser.c
@rundel
Contributor Author

rundel commented May 15, 2026

Good call: bulleted lists look like they work fine, but block quotes are a nightmare. For now I punted by having it throw an error on multi-line attributes within a block quote.

I'm not sure whether upgrading Q-2-2 to Q-2-38 based on context is acceptable.

Claude's summary is below.


Builds on the earlier multi-line attribute-list fix by adding edge-case coverage and a new diagnostic for the one shape the grammar fix can't reach.

List-context tests

Multi-line {...} attribute lists already work inside list items thanks to list-continuation handling — these tests lock that in against regression.

Tree-sitter corpus (inline-multiline-attrs.txt, 6 new cases):

  • Bulleted list item with multi-line span attrs.
  • Ordered list item with multi-line span attrs.
  • Nested bulleted list with multi-line span attrs in the inner item.
  • Bulleted list item with multi-line image attrs (class + key=value).
  • Top-level bullets with sibling items before and after the multi-line attrs at the same indent.
  • Nested bullets with sibling inner items before and after at the same indent.

Pampa integration tests (test_attr_source_parsing.rs, 5 new cases): AST-level mirrors of the above, verifying classes, key/value pairs, and attr_source byte offsets survive the block→inline boundary for each shape.

Q-2-38 — blockquote-specific error

Multi-line attribute lists inside a > blockquote don't work: the tree-sitter external scanner short-circuits SOFT_LINE_ENDING when the next line begins with > (scanner.c:2380-2407), so the inline parser sees only the first physical line, and the error fires at the same (state=2587, sym="_close_block") pair as a top-level unclosed {. The two cases are indistinguishable at the lookup level but distinguishable in the source text.

qmd_error_messages.rs::upgrade_q22_to_q238_if_in_blockquote post-processes each Q-2-2 diagnostic: if the error line begins with > (after optional leading whitespace), the diagnostic is rewritten to Q-2-38:

  Error: [Q-2-38] Multi-line inline attribute list inside blockquote
   1 │ > ![](img.png){
     │                ╰── Inside a blockquote, an inline `{...}` attribute list cannot span multiple lines.
  ℹ Put the attribute list on a single line, or move this construct out of the blockquote.

  • resources/error-corpus/Q-2-38.json documents the new code (cases: [] — manually emitted, not state-mapped).
  • tests/test_q_2_38_blockquote_multiline_attrs.rs: 4 tests — image-in-bq, span-in-bq, blockquote with leading indent, and a top-level negative control confirming [attr]{[ still maps to Q-2-2.
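A minimal Rust sketch of the post-processing step described above, assuming a simplified `Diagnostic` struct that carries the byte offset of the failing `{`. The real `upgrade_q22_to_q238_if_in_blockquote` and its diagnostic type are not shown in the PR, so the struct, its field types, and the helpers `line_at`, `upgrade_if_in_blockquote`, and `q22_at` are hypothetical; the code and message strings are copied from the error output above. (The earlier commit message calls this code Q-2-37; the sketch follows this comment's Q-2-38.)

```rust
// Simplified, hypothetical stand-in for the pampa diagnostic type.
#[derive(Debug, Clone)]
struct Diagnostic {
    code: String,
    title: String,
    problem: String,
    hints: Vec<String>,
    details: Vec<String>,
    // Byte offset of the failing `{`; must lie on a char boundary.
    offset: usize,
}

// Returns the whole source line containing byte `offset`.
fn line_at(source: &str, offset: usize) -> &str {
    let start = source[..offset].rfind('\n').map_or(0, |i| i + 1);
    let end = source[offset..].find('\n').map_or(source.len(), |i| offset + i);
    &source[start..end]
}

// Rewrites a Q-2-2 diagnostic as Q-2-38 when the line holding the failing
// `{` starts with `>` after leading whitespace; anything else is untouched.
fn upgrade_if_in_blockquote(diag: &mut Diagnostic, source: &str) {
    let in_blockquote = line_at(source, diag.offset).trim_start().starts_with('>');
    if diag.code != "Q-2-2" || !in_blockquote {
        return;
    }
    diag.code = "Q-2-38".into();
    diag.title = "Multi-line inline attribute list inside blockquote".into();
    diag.problem =
        "Inside a blockquote, an inline `{...}` attribute list cannot span multiple lines.".into();
    diag.hints = vec![
        "Put the attribute list on a single line, or move this construct out of the blockquote."
            .into(),
    ];
    // Drop inherited Q-2-2 details so the anchor note does not leak through.
    diag.details.clear();
}

// Builds a placeholder Q-2-2 diagnostic pointing at `offset`.
fn q22_at(offset: usize) -> Diagnostic {
    Diagnostic {
        code: "Q-2-2".into(),
        title: "(original Q-2-2 title)".into(),
        problem: "(original Q-2-2 problem)".into(),
        hints: vec![],
        details: vec!["(Q-2-2 anchor note)".into()],
        offset,
    }
}

fn main() {
    for src in ["> ![](img.png){\n> .cls}\n", "  > [span]{\n  > .cls}\n", "[attr]{[\n"] {
        let mut d = q22_at(src.find('{').unwrap());
        upgrade_if_in_blockquote(&mut d, src);
        println!("{:?} -> {}", src.lines().next().unwrap(), d.code);
    }
}
```

Checking the raw source line, rather than the parser state, is what lets two failures with the same `(state, sym)` pair be told apart.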

@cscheid
Member

cscheid commented May 16, 2026

I think I want to leave this as an open issue until we can handle it uniformly. Creating these syntax exceptions is sort of opening the door for future trouble. I'd rather we reject those attributes uniformly, even if it's annoying.

