Skip to content

yeast: Add support for a user-managed context during rule rewrites#22054

Open
tausbn wants to merge 13 commits into
mainfrom
tausbn/yeast-context-reification
Open

yeast: Add support for a user-managed context during rule rewrites#22054
tausbn wants to merge 13 commits into
mainfrom
tausbn/yeast-context-reification

Conversation

@tausbn

@tausbn tausbn commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

This provides a (hopefully) clean way to allow information to flow from nodes to their descendants.

Note that although the user context is mutable, it is also snapshotted before each recursive call (and restored afterwards). Thus, modifications that happen inside one child node cannot affect the context seen by another. This seemed like the healthiest choice and the most consistent semantics.

This PR builds on tope of #22016, which should be merged first.
It can be reviewed commit-by-commit.

(The first commit is a bit heavy due to all of the generic parameters being introduced. The interesting changes take place in lib.rs and swift.rs.)

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Comment thread shared/yeast/src/lib.rs
fn apply_one_shot_rules_inner<C: Clone>(
index: &RuleIndex<C>,
ast: &mut Ast,
user_ctx: &mut C,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand this correctly: the function takes &mut C but promises to always restore it to its original state before returning? Could it just take C instead? Is this an optimisation of some sort?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running a manual_rule, we may end up mutating the context and so it needs to be mutable at least there. For C to be immutable, we would have to pass it along separately.

I did consider this (specifically having it as a separate argument to the ctx.translate call), but it seemed a bit verbose and the context approach felt like a better fit.

/// translation reaches inner rules that read that state.
///
/// ```text
/// manual_rule!(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the name 👍

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think I just had a good idea (dangerous, I know). What if instead of having a separate macro that auto-translates nothing (and one that auto-translates everything), we add a way to annotate a capture as "raw" or "untranslated"?

Then we could -- at the cost of relatively little syntax -- get the best of both worlds. Non-raw captures get auto-translated before the body of the rule target is executed, everything else is up to the user.

As for the syntax, I'm thinking either @!foo or perhaps @@foo -- something that stands out sufficiently to remind the user that there's a raw capture that needs to be dealt with.

Also, in that case I would auto-wrap the entire body of (the result of expanding) manual_rule in Ok to avoid having to write Ok(...) at the end. This is what rule does currently anyway, and it wouldn't change the expressivity (since we need to return Result anyway).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about the syntax becoming a soup of symbols tbh. Can we keep manual_rule as it is for now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I'll postpone those changes for a different PR.

tausbn and others added 13 commits June 25, 2026 17:28
Renames what was previously called `__yeast_ctx` into just `ctx`, and
adds a new field `user_ctx` to this context. Said field can contain a
struct of any user type (necessitating making various parts of the
implementation generic in said type).

Through some Deref magic, field accesses are delegated to the inner
struct (assuming they are not already defined on `ctx`), which should
hopefully make the interface a bit more ergonomic.
This will enable us to actually capture and log errors in complicated
rules (e.g. ones written in Rust) rather than just panicking.
This enables users to specify how and when these captures get
translated. In conjunction with the context mechanism, this can be used
to e.g. translate some piece of information (e.g. the type of
something), record it in the context, and then recursively translate
some other capture that relies on this information. This allows
information to be cleanly passed into descendants (which can be written
using context accesses in the `rule!` macro form).

As a consequence of this change, we now need to pass around a
TranslatorHandle to perform the manual translation. For Repeating rules,
it doesn't really make sense to translate things, so in this case we
simply signal an error.

Also, the implementation of the `rule!` macro changes slightly (without
changing semantics): it now essentially delegates to `Rule::new`,
receiving raw captures, but then immediately applies the translation to
those captures (which, for the majority of cases, is likely the desired
behaviour).
Adds `manual_rule!` which provides a more low-level interface for
defining rewrites. (I'm not entirely sold on the name, so any
suggestions would be welcome.)

Notably, the captures bound in the body of such rules have _not_ been
translated yet -- they still come from the _input_ tree. It is the
user's duty to call ctx.translate on these (which has the effect of
recursively invoking the translation) before substituting them into the
output.

For _truly_ low-level access, the user can still construct a Rule
directly, but this is now somewhat cumbersome as the closure contained
therein takes quite a few parameters. Still, the possibility remains.
This was necessary since otherwise the generic type of the
user-specified context (which should only be a concern for yeast) starts
to bleed out into the shared extractor. Instead, we type-erase it by
putting it inside the aforementioned trait.
Propagates in name and type information for various property
declarations, using the context mechanism. This avoids mutating
already-translated nodes in-place, and is generally much easier to read.
Extends the context with a field for keeping track of the default value.

In the process, we also rename the context to SwiftContext as it now
doesn't only concern itself with properties.
Avoids more "mutation after creation" via prepend_field.

Also adds a test to the corpus for exercising this syntax. Although it's
not evident, the test output was unchanged by this refactoring.
Same as in the preceding commit, we added a test beforehand for testing
this syntax, and verified that it was unchanged by the cleanup in this
commit.
Gets rid of the final uses of mutation (via prepend_field). The approach
is the same as in the preceding commits: we set the appropriate fields
on the context when processing the outer node, and then access these
fields on the inner nodes.

The repeated use of `modifier` fields is a _bit_ clunky, but since we're
likely moving to an out-of-band modifier mechanism at some point, I
think it's good enough for now.
(Both reduce_left and map are still supported, but we could remove them
at this point.)

I think this way of writing things makes the intent a lot clearer -- it
avoids extending the yeast rule language with complicated constructs,
pushing the complexity (such as it is) into Rust instead.
Cleans up a few places where we were constructing trees piece by piece
rather than using the `tree!` macro.

In the process, Copilot noticed an issue that should probably be
addressed: the labeled_statement rule can never fire, since there are no
such nodes in the input. This is possibly a simple as making
_labeled_statement (which _does_ exist) named, but I haven't attempted
this.

Finally, a small change to yeast makes it so that the contents of a {}
interpolation can be a Rust block (previously it could only be a single
expression). This avoids the need to double-wrap instances where you
want to interpolate a single node produced as the final value of some
block.
Format the touched Rust crates (shared/tree-sitter-extractor,
shared/yeast, shared/yeast-macros, unified/extractor) so the
tree-sitter-extractor CI fmt check passes. No functional changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tausbn tausbn force-pushed the tausbn/yeast-context-reification branch from f75cf95 to af7ae8c Compare June 25, 2026 15:28
@tausbn tausbn marked this pull request as ready for review June 25, 2026 15:28
@tausbn tausbn requested review from a team as code owners June 25, 2026 15:28
Copilot AI review requested due to automatic review settings June 25, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants