
Add regex support #265

Open
lungarella-raffaele wants to merge 1 commit into rescript-lang:main from lungarella-raffaele:feat-regex-grammar


@lungarella-raffaele

Fixes #255

I took inspiration from (i.e. copied) https://github.com/tree-sitter/tree-sitter-javascript. It seems ReScript has the same regex syntax as JS, but I couldn't find anything official to confirm it.

I tested it using the Zed extension and everything works fine. Testing on a random file with these contents:

let r = /\\n/g

Tree before:

(source_file [0, 0] - [1, 0]
  (ERROR [0, 0] - [0, 14]
    (value_identifier [0, 4] - [0, 5])
    (ERROR [0, 11] - [0, 12])
    (ERROR [0, 13] - [0, 14])))

Tree after:

(source_file [0, 0] - [1, 0]
  (let_declaration [0, 0] - [0, 14]
    (let_binding [0, 4] - [0, 14]
      pattern: (value_identifier [0, 4] - [0, 5])
      body: (regex [0, 8] - [0, 14]
        pattern: (regex_pattern [0, 9] - [0, 12])
        flags: (regex_flags [0, 13] - [0, 14])))))
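To make the column ranges in the tree above concrete, here is a small JavaScript sketch that splits a regex literal into the spans the regex_pattern and regex_flags nodes cover. The helper scanRegexLiteral is hypothetical (it is not part of the PR), and its pattern is only modeled loosely on the tree-sitter-javascript rule the PR copied: a bracketed character class (which may contain an unescaped /), an escape sequence, or any other character except /, \, [ or a newline. Treating the flags as optional is an assumption of the sketch. Ranges are 0-based and end-exclusive, like tree-sitter's.

```javascript
// Hypothetical helper, not part of the PR: approximates the spans that
// tree-sitter-javascript-style rules would assign to a regex literal.
// Pattern body alternatives: a [...] class (escapes allowed inside),
// an escape like `\x`, or any char other than `/`, `\`, `[` or newline.
// Flags: lowercase letters, treated as optional here (an assumption).
const REGEX_LITERAL = /\/((?:\[(?:\\.|[^\]\n\\])*\]|\\.|[^/\\[\n])+)\/([a-z]*)/y;

// `start` must point at the opening `/`. Returns 0-based, end-exclusive
// column ranges, or null when no regex literal starts there.
function scanRegexLiteral(line, start) {
  REGEX_LITERAL.lastIndex = start; // sticky flag: match exactly at `start`
  const m = REGEX_LITERAL.exec(line);
  if (m === null) return null;
  const patternStart = start + 1;
  const patternEnd = patternStart + m[1].length;
  const flagsStart = patternEnd + 1;
  return {
    regex: [start, start + m[0].length],
    pattern: [patternStart, patternEnd],
    flags: m[2].length > 0 ? [flagsStart, flagsStart + m[2].length] : null,
  };
}

// The example from this PR; the pattern is the three characters `\\n`.
const line = String.raw`let r = /\\n/g`;
console.log(scanRegexLiteral(line, 8));
// → regex [8, 14], pattern [9, 12], flags [13, 14]
```

These ranges match the regex, regex_pattern and regex_flags nodes in the "Tree after" output.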

lungarella-raffaele changed the title from "Add regex parsing" to "Add regex support" on Jan 29, 2026
shulhi requested a review from aspeddro on February 12, 2026 00:55
nojaf requested a review from Copilot on February 18, 2026 06:50

Copilot AI left a comment


Pull request overview

This PR adds support for regex literal syntax (/pattern/flags) to the ReScript tree-sitter grammar, fixing issue #255 where ReScript 12's new regex literal format was not being parsed correctly. The implementation is based on the tree-sitter-javascript approach and replaces the need for the older %re() syntax.

Changes:

  • Added regex grammar rules with pattern and flags support to grammar.js
  • Updated syntax highlighting and language injection queries for regex literals
  • Added comprehensive test cases covering regex literals in various contexts

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

  • grammar.js: Defines regex grammar rules (regex, regex_pattern, regex_flags) and integrates them into the expression and literal pattern rules
  • src/grammar.json: Generated grammar definition reflecting the new regex rules
  • src/node-types.json: Generated node type definitions including regex, regex_pattern, and regex_flags
  • queries/highlights.scm: Adds syntax highlighting for regex literals as string.special
  • queries/injections.scm: Adds regex language injection for regex patterns and removes trailing whitespace
  • test/corpus/literals.txt: Adds test cases for regex literals, including basic patterns, flags, let bindings, and function arguments


(regex_pattern)
(regex_flags))
(string
(escape_sequence)))))))

Copilot AI Feb 18, 2026


Consider adding a test case that combines regex literals with division operators to explicitly verify disambiguation. For example:

let result = (10 / 2) + /test/g

This would help ensure the parser correctly distinguishes between division operators and regex delimiters in complex expressions, especially since both use the forward slash character. While the implementation should handle this correctly due to the use of token.immediate and precedence, an explicit test would provide confidence that this edge case is covered.
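For context on why this case is tricky: hand-written JavaScript lexers commonly decide whether a / starts a regex by looking at the previous significant token. Tree-sitter does not work this way (its lexer is context-aware and only tries tokens that are valid in the current parse state), so the sketch below illustrates the ambiguity the suggested test probes, not how this grammar resolves it. The function name slashStartsRegex and the { type, text } token shape are hypothetical.

```javascript
// Hypothetical illustration of the classic "previous token" heuristic.
// After an operand (identifier, number, string, regex, `)` or `]`), a `/`
// is division; otherwise it can only begin a regex literal.
// Known weak spots: `}` (block vs. object literal) and `)` in `if (c) /re/`.
const OPERAND_TYPES = new Set(["identifier", "number", "string", "regex"]);
const OPERAND_CLOSERS = new Set([")", "]"]);

function slashStartsRegex(prevToken) {
  if (prevToken === null) return true; // start of input: an expression is expected
  if (OPERAND_TYPES.has(prevToken.type)) return false;
  if (prevToken.type === "punct" && OPERAND_CLOSERS.has(prevToken.text)) return false;
  return true; // after an operator, `(`, `=`, `,`, ...
}

// Tokens preceding each `/` in: let result = (10 / 2) + /test/g
console.log(slashStartsRegex({ type: "number", text: "10" })); // false: division
console.log(slashStartsRegex({ type: "operator", text: "+" })); // true: regex literal
```

The two slashes in the suggested expression need opposite answers, which is why a single corpus test covering both is a useful regression check.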

Suggested change
(escape_sequence)))))))
(escape_sequence)))))))
================================================================================
Regex literal combined with division operator
================================================================================
let result = (10 / 2) + /test/g
--------------------------------------------------------------------------------
(source_file
  (let_declaration
    (let_binding
      (value_identifier)
      (binary_expression
        (binary_expression
          (integer)
          (operator)
          (integer))
        (operator)
        (regex
          (regex_pattern)
          (regex_flags))))))

Copilot uses AI. Check for mistakes.
),
),

regex_flags: (_) => token.immediate(/[a-z]+/),

Copilot AI Feb 18, 2026


The regex_flags pattern allows any sequence of lowercase letters (pattern: /[a-z]+/). JavaScript supports only a specific set of flags (d, g, i, m, s, u, v, y), and ReScript presumably follows it, so this lenient rule lets invalid flags like "xyz" parse without error. Consider whether stricter validation would be beneficial, such as:

regex_flags: (_) => token.immediate(/[dgimsuvy]+/)

However, the current implementation may be intentionally permissive for forward compatibility with potential new flags. If this is the intended behavior, it's acceptable, but it's worth documenting this design decision.
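One way to keep the grammar permissive while still catching typos is to validate flags outside the parser, for example in an editor check or linter. The sketch below (isValidRegexFlags is a hypothetical name) delegates to the JavaScript engine: the RegExp constructor throws a SyntaxError for unknown or repeated flags, so the accepted set follows whatever engine runs the check (the v flag, for instance, needs a recent one). Whether ReScript accepts exactly the same flags is an assumption, as the PR author also notes.

```javascript
// Hypothetical check outside the tree-sitter grammar: let the JS engine
// decide which flag strings are valid. Unknown flags and duplicates both
// make the RegExp constructor throw a SyntaxError.
function isValidRegexFlags(flags) {
  try {
    new RegExp("", flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected; don't swallow it
  }
}

console.log(isValidRegexFlags("gi"));  // true
console.log(isValidRegexFlags("xyz")); // false: unknown flags
console.log(isValidRegexFlags("gg"));  // false: duplicate flag
```

A character-class token such as /[dgimsuvy]+/ would still accept duplicates like "gg", so a grammar-level check can only approximate this kind of validation.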

Suggested change
regex_flags: (_) => token.immediate(/[a-z]+/),
// Only allow the JS regex flags (presumably also ReScript's): d, g, i, m, s, u, v, y
regex_flags: (_) => token.immediate(/[dgimsuvy]+/),


Development

Successfully merging this pull request may close these issues.

Support regex literal from ReScript 12

1 participant
