Pull request overview
This PR adds support for regex literal syntax (/pattern/flags) to the ReScript tree-sitter grammar, fixing issue #255 where ReScript 12's new regex literal format was not being parsed correctly. The implementation is based on the tree-sitter-javascript approach and replaces the need for the older %re() syntax.
Changes:
- Added regex grammar rules with pattern and flags support to grammar.js
- Updated syntax highlighting and language injection queries for regex literals
- Added comprehensive test cases covering regex literals in various contexts
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| grammar.js | Defines regex grammar rules (regex, regex_pattern, regex_flags) and integrates them into expression and literal pattern rules |
| src/grammar.json | Generated grammar definition reflecting the new regex rules |
| src/node-types.json | Generated node type definitions including regex, regex_pattern, and regex_flags |
| queries/highlights.scm | Adds syntax highlighting for regex literals as string.special |
| queries/injections.scm | Adds regex language injection for regex patterns and removes trailing whitespace |
| test/corpus/literals.txt | Adds comprehensive test cases for regex literals including basic patterns, flags, let bindings, and function arguments |
(regex_pattern)
(regex_flags))
(string
(escape_sequence)))))))
Consider adding a test case that combines regex literals with division operators to explicitly verify disambiguation. For example:
let result = (10 / 2) + /test/g
This would help ensure the parser correctly distinguishes between division operators and regex delimiters in complex expressions, especially since both use the forward slash character. While the implementation should handle this correctly due to the use of token.immediate and precedence, an explicit test would provide confidence that this edge case is covered.
(escape_sequence)))))))

================================================================================
Regex literal combined with division operator
================================================================================

let result = (10 / 2) + /test/g

--------------------------------------------------------------------------------

(source_file
  (let_declaration
    (let_binding
      (value_identifier)
      (binary_expression
        (binary_expression
          (integer)
          (operator)
          (integer))
        (operator)
        (regex
          (regex_pattern)
          (regex_flags))))))
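To make the ambiguity in this test case concrete, here is a minimal sketch of one common way to tell a division slash from a regex delimiter: track whether an operand is expected at the current position. This is an illustration only. The `tokenize` function and its token names are hypothetical, and the heuristic is not how the grammar in this PR resolves the ambiguity (the review comment above attributes that to `token.immediate` and precedence).

```javascript
// Toy tokenizer (hypothetical; not part of grammar.js and not tree-sitter's
// mechanism). A "/" starts a regex only where an operand is expected: at the
// start of input, after an operator, or after "(".
function tokenize(src) {
  const tokens = [];
  let i = 0;
  let expectOperand = true;
  while (i < src.length) {
    const rest = src.slice(i);
    let m;
    if ((m = /^\s+/.exec(rest))) {
      i += m[0].length; // skip whitespace
      continue;
    }
    if ((m = /^\d+/.exec(rest))) {
      tokens.push({ type: "integer", text: m[0] });
      expectOperand = false;
    } else if (
      expectOperand &&
      (m = /^\/((?:[^\/\\\n]|\\.)+)\/([a-z]*)/.exec(rest))
    ) {
      // Operand position: treat "/.../flags" as a regex literal.
      tokens.push({ type: "regex", text: m[0] });
      expectOperand = false;
    } else if ((m = /^[+\-*\/]/.exec(rest))) {
      // Operator position: a lone "/" is division.
      tokens.push({ type: "operator", text: m[0] });
      expectOperand = true;
    } else if ((m = /^\(/.exec(rest))) {
      tokens.push({ type: "(", text: m[0] });
      expectOperand = true;
    } else if ((m = /^\)/.exec(rest))) {
      tokens.push({ type: ")", text: m[0] });
      expectOperand = false;
    } else {
      throw new Error(`Unexpected character at ${i}: ${src[i]}`);
    }
    i += m[0].length;
  }
  return tokens;
}

const tokens = tokenize("(10 / 2) + /test/g");
console.log(tokens.map((t) => `${t.type}:${t.text}`).join(" "));
```

In this sketch the first slash follows an integer, so it is read as division, while the second follows `+` and is read as the start of a regex. The proposed corpus test checks the same outcome against the real parser.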
),
),

regex_flags: (_) => token.immediate(/[a-z]+/),
The regex_flags pattern allows any sequence of lowercase letters (pattern: /[a-z]+/). While JavaScript/ReScript officially supports only specific flags (g, i, m, s, u, y, d), this lenient validation allows invalid flags like "xyz" to parse without error. Consider whether stricter validation would be beneficial, such as:
regex_flags: (_) => token.immediate(/[gimsuyd]+/)
However, the current implementation may be intentionally permissive for forward compatibility with potential new flags. If this is the intended behavior, it's acceptable, but it's worth documenting this design decision.
- regex_flags: (_) => token.immediate(/[a-z]+/),
+ // Only allow officially supported JS/ReScript regex flags: g, i, m, s, u, y, d
+ regex_flags: (_) => token.immediate(/[gimsuyd]+/),
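To see what the two candidate patterns would accept, here is a small sketch that runs sample flag strings through both, plus a check against the JavaScript engine's own `RegExp` constructor. The helper `classifyFlags` and the constant names are hypothetical and exist only for this comparison. The patterns are anchored with `^` and `$` to emulate matching the whole flags string, which is an assumption about how the token would be used.

```javascript
// Hypothetical comparison helper; not part of grammar.js.
const PERMISSIVE_FLAGS = /^[a-z]+$/; // current rule: any lowercase letters
const STRICT_FLAGS = /^[gimsuyd]+$/; // suggested rule: listed flags only

function classifyFlags(flags) {
  // Ask the JS engine whether it accepts this flag string at all.
  let engineAccepts = true;
  try {
    new RegExp("", flags);
  } catch (err) {
    engineAccepts = false;
  }
  return {
    flags,
    permissive: PERMISSIVE_FLAGS.test(flags),
    strict: STRICT_FLAGS.test(flags),
    engineAccepts,
  };
}

const results = ["g", "gi", "xyz", "gx", "gg"].map(classifyFlags);
console.table(results);
```

Two things are worth noting when weighing the suggestion. Neither character class rejects a repeated flag such as `gg`, which the `RegExp` constructor refuses, so even the stricter rule is not full validation. ECMAScript 2024 also defines a `v` flag, which the stricter pattern as written would not match.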
Fixes #255
I took inspiration (copied) from https://github.com/tree-sitter/tree-sitter-javascript. ReScript seems to have the same regex syntax as JS, but I couldn't find anything official to confirm it.
I tested it using the Zed extension and everything works fine. Testing on a random file with contents:
Tree before:
Tree after: