yeast: Add support for a user-managed context during rule rewrites#22054
yeast: Add support for a user-managed context during rule rewrites#22054tausbn wants to merge 13 commits into
Conversation
There was a problem hiding this comment.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
| fn apply_one_shot_rules_inner<C: Clone>( | ||
| index: &RuleIndex<C>, | ||
| ast: &mut Ast, | ||
| user_ctx: &mut C, |
There was a problem hiding this comment.
Do I understand this correctly: the function takes &mut C but promises to always restore it to its original state before returning? Could it just take C instead? Is this an optimisation of some sort?
There was a problem hiding this comment.
When running a manual_rule, we may end up mutating the context and so it needs to be mutable at least there. For C to be immutable, we would have to pass it along separately.
I did consider this (specifically having it as a separate argument to the ctx.translate call), but it seemed a bit verbose and the context approach felt like a better fit.
| /// translation reaches inner rules that read that state. | ||
| /// | ||
| /// ```text | ||
| /// manual_rule!( |
There was a problem hiding this comment.
Actually, I think I just had a good idea (dangerous, I know). What if instead of having a separate macro that auto-translates nothing (and one that auto-translates everything), we add a way to annotate a capture as "raw" or "untranslated"?
Then we could -- at the cost of relatively little syntax -- get the best of both worlds. Non-raw captures get auto-translated before the body of the rule target is executed, everything else is up to the user.
As for the syntax, I'm thinking either @!foo or perhaps @@foo -- something that stands out sufficiently to remind the user that there's a raw capture that needs to be dealt with.
Also, in that case I would auto-wrap the entire body of (the result of expanding) manual_rule in Ok to avoid having to write Ok(...) at the end. This is what rule does currently anyway, and it wouldn't change the expressivity (since we need to return Result anyway).
There was a problem hiding this comment.
I worry about the syntax becoming a soup of symbols tbh. Can we keep manual_rule as it is for now?
There was a problem hiding this comment.
Fair enough. I'll postpone those changes for a different PR.
Renames what was previously called `__yeast_ctx` into just `ctx`, and adds a new field `user_ctx` to this context. Said field can contain a struct of any user type (necessitating making various parts of the implementation generic in said type). Through some Deref magic, field accesses are delegated to the inner struct (assuming they are not already defined on `ctx`), which should hopefully make the interface a bit more ergonomic.
This will enable us to actually capture and log errors in complicated rules (e.g. ones written in Rust) rather than just panicking.
This enables users to specify how and when these captures get translated. In conjunction with the context mechanism, this can be used to e.g. translate some piece of information (e.g. the type of something), record it in the context, and then recursively translate some other capture that relies on this information. This allows information to be cleanly passed into descendants (which can be written using context accesses in the `rule!` macro form). As a consequence of this change, we now need to pass around a TranslatorHandle to perform the manual translation. For Repeating rules, it doesn't really make sense to translate things, so in this case we simply signal an error. Also, the implementation of the `rule!` macro changes slightly (without changing semantics): it now essentially delegates to `Rule::new`, receiving raw captures, but then immediately applies the translation to those captures (which, for the majority of cases, is likely the desired behaviour).
Adds `manual_rule!` which provides a more low-level interface for defining rewrites. (I'm not entirely sold on the name, so any suggestions would be welcome.) Notably, the captures bound in the body of such rules have _not_ been translated yet -- they still come from the _input_ tree. It is the user's duty to call ctx.translate on these (which has the effect of recursively invoking the translation) before substituting them into the output. For _truly_ low-level access, the user can still construct a Rule directly, but this is now somewhat cumbersome as the closure contained therein takes quite a few parameters. Still, the possibility remains.
This was necessary since otherwise the generic type of the user-specified context (which should only be a concern for yeast) starts to bleed out into the shared extractor. Instead, we type-erase it by putting it inside the aforementioned trait.
Propagates in name and type information for various property declarations, using the context mechanism. This avoids mutating already-translated nodes in-place, and is generally much easier to read.
Extends the context with a field for keeping track of the default value. In the process, we also rename the context to SwiftContext as it now doesn't only concern itself with properties.
Avoids more "mutation after creation" via prepend_field. Also adds a test to the corpus for exercising this syntax. Although it's not evident, the test output was unchanged by this refactoring.
Same as in the preceding commit, we added a test beforehand for testing this syntax, and verified that it was unchanged by the cleanup in this commit.
Gets rid of the final uses of mutation (via prepend_field). The approach is the same as in the preceding commits: we set the appropriate fields on the context when processing the outer node, and then access these fields on the inner nodes. The repeated use of `modifier` fields is a _bit_ clunky, but since we're likely moving to an out-of-band modifier mechanism at some point, I think it's good enough for now.
(Both reduce_left and map are still supported, but we could remove them at this point.) I think this way of writing things makes the intent a lot clearer -- it avoids extending the yeast rule language with complicated constructs, pushing the complexity (such as it is) into Rust instead.
Cleans up a few places where we were constructing trees piece by piece
rather than using the `tree!` macro.
In the process, Copilot noticed an issue that should probably be
addressed: the labeled_statement rule can never fire, since there are no
such nodes in the input. This is possibly a simple as making
_labeled_statement (which _does_ exist) named, but I haven't attempted
this.
Finally, a small change to yeast makes it so that the contents of a {}
interpolation can be a Rust block (previously it could only be a single
expression). This avoids the need to double-wrap instances where you
want to interpolate a single node produced as the final value of some
block.
Format the touched Rust crates (shared/tree-sitter-extractor, shared/yeast, shared/yeast-macros, unified/extractor) so the tree-sitter-extractor CI fmt check passes. No functional changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f75cf95 to
af7ae8c
Compare
This provides a (hopefully) clean way to allow information to flow from nodes to their descendants.
Note that although the user context is mutable, it is also snapshotted before each recursive call (and restored afterwards). Thus, modifications that happen inside one child node cannot affect the context seen by another. This seemed like the healthiest choice and the most consistent semantics.
This PR builds on tope of #22016, which should be merged first.
It can be reviewed commit-by-commit.
(The first commit is a bit heavy due to all of the generic parameters being introduced. The interesting changes take place in
lib.rsandswift.rs.)