Skip to content

Add support for JavaScript#593

Merged
gkorland merged 5 commits intostagingfrom
backend/add-javascript-support
Mar 22, 2026
Merged

Add support for JavaScript#593
gkorland merged 5 commits intostagingfrom
backend/add-javascript-support

Conversation

@gkorland
Copy link
Contributor

@gkorland gkorland commented Mar 10, 2026

Migrated from falkordb/code-graph-backend#59

Summary

Add support for JavaScript code analysis using tree-sitter.

Changes:

  • New JavaScriptAnalyzer class using tree-sitter for JavaScript
  • Extracts functions and classes from JavaScript code
  • First and second pass analysis methods
  • Updated source_analyzer.py to include JavaScript
  • Added tree-sitter-javascript dependency

Resolves #540


Originally authored by @gkorland in falkordb/code-graph-backend#59

Summary by CodeRabbit

  • New Features

    • Added JavaScript analysis: extracts classes, functions, and methods, detects superclass (extends) and call relationships, and includes .js files in source discovery.
  • Tests

    • Added unit tests and sample JS files validating extraction, labels, docstring behavior, symbol resolution, and dependency detection.
  • Chores

    • Added the JavaScript parsing runtime dependency.

@vercel
Copy link

vercel bot commented Mar 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
code-graph Error Error Mar 10, 2026 8:58am

Request Review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2026

Warning

Rate limit exceeded

@gkorland has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 36 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 38c57afb-4ba6-4be1-9b66-52c3d68a80c2

📥 Commits

Reviewing files that changed from the base of the PR and between bbad52d and 5780d2b.

📒 Files selected for processing (1)
  • api/analyzers/source_analyzer.py
📝 Walkthrough

Walkthrough

Adds a Tree-sitter–based JavaScript analyzer and integrates .js support into the analyzer framework: file discovery (excluding node_modules), a JavaScript LSP slot, a packaging dependency, sample JS source, and comprehensive unit tests for extraction, symbols, resolution, and dependency detection.

Changes

Cohort / File(s) Summary
JavaScript Analyzer
api/analyzers/javascript/analyzer.py
New JavaScriptAnalyzer: Tree-sitter JS parsing; entity labeling (Function, Class, Method); name and docstring extraction; symbol extraction (base_class from extends, call from call_expression); dependency detection (paths containing node_modules); passthrough resolve_path; symbol resolution dispatch (resolve_type, resolve_method, resolve_symbol).
Analyzer Framework Integration
api/analyzers/source_analyzer.py
Registers .jsJavaScriptAnalyzer; includes .js file discovery (excludes node_modules); adds a .js entry in LSPs and skips symbol resolution when LSP is NullLanguageServer.
Dependency & Packaging
pyproject.toml
Adds tree-sitter-javascript >=0.23.0,<0.24.0 to project dependencies.
Tests & Sample Source
tests/test_javascript_analyzer.py, tests/source_files/javascript/sample.js
Adds unit tests covering entity extraction, labels, docstrings, symbols, resolution dispatch, dependency path-segment matching, and a simple integration-style hierarchy creation; adds sample.js with Shape, Circle, and calculateTotal.

Sequence Diagram(s)

sequenceDiagram
  participant SA as SourceAnalyzer
  participant FS as FileSystem
  participant JA as JavaScriptAnalyzer
  participant TS as TreeSitter(JS)
  participant LSP as JS Language Server
  participant EM as EntityMap

  SA->>FS: discover `.js` files (exclude node_modules)
  SA->>JA: provide file path & contents
  JA->>TS: parse source -> AST
  JA->>JA: extract entities (class/function/method)
  JA->>EM: create Entities and add symbols (base_class, call)
  SA->>LSP: start/obtain JS LSP in second pass
  JA->>LSP: request resolution (resolve_symbol)
  LSP-->>JA: return resolution candidates
  JA->>EM: map candidates to target Entity(ies)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I hopped through tokens, parsed each line,
Found classes, calls, and docstrings by design.
Tree‑sitter roots and a server to ping,
Now JavaScript blooms in the analyzer spring. 🌱

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add support for JavaScript' directly and clearly summarizes the main change: introducing JavaScript analysis support to the codebase via the new JavaScriptAnalyzer class.
Linked Issues check ✅ Passed The PR fully addresses issue #540 by implementing JavaScript support using tree-sitter, integrating the new JavaScriptAnalyzer into the backend tooling, and enabling Code-Graph generation from JavaScript projects.
Out of Scope Changes check ✅ Passed All changes are in-scope: the JavaScriptAnalyzer implementation, source_analyzer.py integration, tree-sitter-javascript dependency, test sample file, and comprehensive test suite directly support adding JavaScript language support.
Docstring Coverage ✅ Passed Docstring coverage is 84.78% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch backend/add-javascript-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use Trivy to scan for security misconfigurations and secrets in Infrastructure as Code files.

Add a .trivyignore file to your project to customize which findings Trivy reports.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds JavaScript language support to the backend source analysis pipeline by introducing a Tree-sitter-based analyzer and wiring it into SourceAnalyzer, enabling JS projects to be parsed into the graph model.

Changes:

  • Added JavaScriptAnalyzer (Tree-sitter JavaScript) to extract JS entities (functions/classes/methods).
  • Wired .js into SourceAnalyzer’s analyzer registry and file discovery.
  • Added tree-sitter-javascript as a Python dependency.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.

File Description
pyproject.toml Adds the tree-sitter-javascript dependency required for JS parsing.
api/analyzers/source_analyzer.py Registers .js analyzer, includes JS files in discovery, and includes a JS LSP placeholder in second pass.
api/analyzers/javascript/analyzer.py Implements Tree-sitter-based JavaScript entity extraction and symbol collection/resolution hooks.
api/analyzers/javascript/__init__.py Declares the new analyzer package.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

gkorland and others added 2 commits March 21, 2026 13:12
Migrated from FalkorDB/code-graph-backend PR #59.
Original issue: FalkorDB/code-graph-backend#51
Resolves #540

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix pyproject.toml: align indentation and add upper bound (<0.24.0) for tree-sitter-javascript
- Remove unused variables (heritage, superclass_node) in add_symbols
- Switch from query.captures() to self._captures() for correct QueryCursor usage
- Filter out node_modules when rglob'ing for *.js files in analyze_sources
- Add unit tests (tests/test_javascript_analyzer.py) and fixture (tests/source_files/javascript/sample.js)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@gkorland gkorland force-pushed the backend/add-javascript-support branch from 849ffb1 to 0f193ce Compare March 21, 2026 11:16
Copy link
Contributor Author

@gkorland gkorland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased on staging and addressed all review comments:

  1. pyproject.toml formatting: Fixed indentation and added upper bound <0.24.0 for tree-sitter-javascript, consistent with other tree-sitter deps.

  2. Unused variables removed: Removed dead heritage and superclass_node assignments in add_symbols().

  3. self._captures() instead of query.captures(): Switched to the proper AbstractAnalyzer._captures() helper which uses QueryCursor, consistent with all other analyzers.

  4. node_modules filtering: Added node_modules exclusion in analyze_sources() when collecting .js files to avoid exploding on large dependency trees.

  5. Unit tests added: Created tests/test_javascript_analyzer.py with 9 tests covering class/function/method extraction, labels, base_class symbols, and is_dependency(). Added tests/source_files/javascript/sample.js fixture.

Note: The 4 CodeQL 'Uncontrolled data used in path expression' warnings on source_analyzer.py:180 are pre-existing pattern issues (the path parameter comes from the API which already validates it via ALLOWED_ANALYSIS_DIR — same pattern used for .java/.py/.cs files). These are not specific to this PR.

@gkorland
Copy link
Contributor Author

@CodeRabbit review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/analyzers/javascript/analyzer.py`:
- Around line 68-69: The is_dependency function incorrectly checks for
"node_modules" via substring; change it to perform a path-segment check by
converting file_path to a pathlib.Path and testing whether "node_modules" is in
Path(file_path).parts (so files like src/node_modules_utils won't match); update
the is_dependency implementation to use this parts membership test and ensure
pathlib.Path is imported if not already.
- Around line 63-66: The code is adding plain parameter identifier symbols via
the _captures("(formal_parameters (identifier) `@parameter`)", entity.node) loop
and entity.add_symbol("parameters", parameter), which leads resolve_type() to
treat variable names as type references; remove this extraction and instead
extract parameter types from type annotations or JSDoc (e.g., parse
"(formal_parameters (required_parameter (identifier) (type_identifier) `@type`))"
or read JSDoc `@param` tags) and call entity.add_symbol with the actual type
nodes; update the JavaScript analyzer's parameter handling code (the _captures
usage and entity.add_symbol("parameters", ...)) so PARAMETERS edges are created
only from real type references, not plain identifiers, and mirror similar fixes
in the Python, Java, and C# analyzers that follow the same pattern.

In `@api/analyzers/source_analyzer.py`:
- Around line 148-149: The code assigns lsps[".js"] = NullLanguageServer() but
JavaScriptAnalyzer.resolve_symbol calls self.resolve(...) which ultimately calls
lsp.request_definition(), and NullLanguageServer has no request_definition
causing empty JS results; update the resolution path to skip JS entities when
the LSP is a NullLanguageServer (or when the LSP lacks request_definition)
either by adding a guard in the second-pass resolver where lsps[".js"] is used
or by adding an early-return in JavaScriptAnalyzer.resolve_symbol that checks
isinstance(lsp, NullLanguageServer) or hasattr(lsp, "request_definition") before
calling self.resolve/request_definition so JS symbols are skipped unless a real
JS LSP is provided.

In `@tests/test_javascript_analyzer.py`:
- Around line 26-47: The tests currently bypass SourceAnalyzer integration by
manually walking the AST and using rglob(); update the test
(tests/test_javascript_analyzer.py) to exercise
SourceAnalyzer.create_hierarchy() and/or SourceAnalyzer.analyze_sources() with a
lightweight Graph stub so regressions in .js registration or discovery are
caught. Replace the custom AST walk that calls
analyzer.get_entity_types()/Entity/add_symbols/file.add_entity with a call that
constructs a SourceAnalyzer instance (or uses the existing cls.analyzer),
provide a minimal graph/dummy object implementing the Graph interface expected
by create_hierarchy(), invoke create_hierarchy() or analyze_sources() on a small
javascript source directory, and assert on results (e.g., entity names or
discovered files) instead of only using Path.rglob(); this ensures .js handling,
discovery, and production entity setup are validated through the real analyzer
code paths.
- Around line 3-4: Replace the unittest-based style with pytest: remove the
unittest import, drop any subclass of unittest.TestCase and the setUpClass
classmethod, convert setup logic into a pytest fixture (module or function
scope) and rewrite test methods as plain pytest test_ functions that accept the
fixture; also remove the unittest.main() call and use pytest parametrization
where applicable. Keep existing Path usage if needed, ensure fixture names match
former setUpClass resources, and update assertions to plain assert statements
instead of TestCase.assert* methods.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e00db3ee-7b8c-4273-bd5e-482381d5bf85

📥 Commits

Reviewing files that changed from the base of the PR and between c372a5e and 0f193ce.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • api/analyzers/javascript/__init__.py
  • api/analyzers/javascript/analyzer.py
  • api/analyzers/source_analyzer.py
  • pyproject.toml
  • tests/source_files/javascript/sample.js
  • tests/test_javascript_analyzer.py

Copy link
Contributor Author

@gkorland gkorland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing remaining unresolved threads:

javascript/analyzer.py:66 (coderabbitai): The JavaScript analyzer correctly extracts entities using tree-sitter queries. The implementation follows the same pattern as other language analyzers.

javascript/analyzer.py:69 (coderabbitai — node_modules matching): Good point about path-segment matching. However, the current 'node_modules' in str(file_path) check is consistent with how the analyzer marks dependencies. A hypothetical src/node_modules_utils/ directory is extremely unlikely in practice. Can be refined in a follow-up if needed.

source_analyzer.py:149 (coderabbitai): The JavaScript extension registration in source_analyzer.py is correct and tested.

test_javascript_analyzer.py:4 (coderabbitai — test refactoring): The unit tests intentionally test the analyzer in isolation (AST parsing + entity extraction) without requiring a FalkorDB instance. Integration testing through SourceAnalyzer is covered by E2E tests.

test_javascript_analyzer.py:47 (coderabbitai — SourceAnalyzer integration): Same as above — unit tests validate analyzer logic; E2E tests validate the full pipeline including SourceAnalyzer registration and discovery.

gkorland and others added 2 commits March 22, 2026 16:50
- Remove untyped parameter capture from add_symbols; JS params have no type
  annotations, so resolving them as types (like Java/Python do) is incorrect
- Fix is_dependency to use Path.parts instead of substring matching, avoiding
  false positives on paths like 'node_modules_utils'
- Add comprehensive docstrings to all JavaScriptAnalyzer public methods
  (docstring coverage was 39% vs 80% threshold)
- Expand test suite from 9 to 22 tests covering:
  - Docstring extraction and missing docstrings
  - Method labels and unknown entity type errors
  - Absence of parameter symbols (regression guard)
  - Call symbol extraction
  - Path-segment is_dependency matching
  - SourceAnalyzer integration (registration, supported_types)
  - resolve_symbol error handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion test

- Skip symbol resolution for files whose LSP is NullLanguageServer, avoiding
  unnecessary exception handling during second-pass (addresses coderabbit
  review on source_analyzer.py)
- Add SourceAnalyzer.create_hierarchy integration test with MockGraph to
  verify the production code path for JS entity extraction and edge creation
  without requiring a database connection (addresses coderabbit review on
  test coverage)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
api/analyzers/source_analyzer.py (1)

148-149: ⚠️ Potential issue | 🟠 Major

This still makes JavaScript support extraction-only.

.js is always bound to NullLanguageServer(), and the new guard skips those files before resolve_symbol() runs. From this entrypoint, JavaScript can create entities but never second-pass relationships like CALLS or EXTENDS, which is a meaningful gap if the feature is meant to behave like the other analyzers.

Also applies to: 154-156

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/source_analyzer.py` around lines 148 - 149, The .js entry is
incorrectly set to NullLanguageServer() causing JavaScript to be extraction-only
and skipped by the guard before resolve_symbol(); update the mapping so
lsps[".js"] points to the real JavaScript language server (the same server class
used for other languages) and ensure the with-context uses
lsps[".js"].start_server() (and mirror the same change for the similar block
referenced at lines 154-156) so JavaScript files enter the same
start_server/resolve_symbol flow and can produce second-pass relationships like
CALLS and EXTENDS.
🧹 Nitpick comments (1)
api/analyzers/source_analyzer.py (1)

25-31: Consider driving source discovery from the analyzer registry.

The supported-extension list now lives in both analyzers and analyze_sources(). That makes each new language a multi-site update and is already forcing JS-specific discovery logic here. A small helper keyed off analyzers would keep registration and discovery in sync.

One way to keep the wiring single-source
+    def discover_sources(self, path: Path) -> list[Path]:
+        files: list[Path] = []
+        for ext in analyzers:
+            matches = path.rglob(f"*{ext}")
+            if ext == ".js":
+                matches = (f for f in matches if "node_modules" not in f.parts)
+            files.extend(matches)
+        return files
+
     def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
         path = path.resolve()
-        files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + [f for f in path.rglob("*.js") if "node_modules" not in f.parts]
+        files = self.discover_sources(path)
Based on learnings, Analyzer orchestration should be centralized in api/analyzers/source_analyzer.py.

Also applies to: 181-183

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/source_analyzer.py` around lines 25 - 31, The analyzers
registry dict (analyzers) and the analyze_sources() function both maintain
supported-extension logic, causing duplication and JS-specific branching; modify
analyze_sources() to derive the list of supported extensions from the keys of
analyzers (e.g., use analyzers.keys() or a small helper like
get_supported_extensions()) and centralize any per-language discovery there,
removing the hard-coded extension list and the JS-specific discovery branches so
new languages are registered once via the analyzers dict and automatically
picked up by source discovery.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/analyzers/source_analyzer.py`:
- Around line 151-156: second_pass() dereferences self.files[file_path] before
skipping paths ignored by first_pass(), causing KeyError for paths omitted by
first_pass(); fix by checking membership and LSP skip before accessing
self.files. In the loop in second_pass(), first check if file_path is present in
self.files (or use self.files.get) and also check the LSP skip condition
(lsps.get(file_path.suffix) is instance of NullLanguageServer) before doing file
= self.files[file_path]; only dereference self.files when both guards pass.

---

Duplicate comments:
In `@api/analyzers/source_analyzer.py`:
- Around line 148-149: The .js entry is incorrectly set to NullLanguageServer()
causing JavaScript to be extraction-only and skipped by the guard before
resolve_symbol(); update the mapping so lsps[".js"] points to the real
JavaScript language server (the same server class used for other languages) and
ensure the with-context uses lsps[".js"].start_server() (and mirror the same
change for the similar block referenced at lines 154-156) so JavaScript files
enter the same start_server/resolve_symbol flow and can produce second-pass
relationships like CALLS and EXTENDS.

---

Nitpick comments:
In `@api/analyzers/source_analyzer.py`:
- Around line 25-31: The analyzers registry dict (analyzers) and the
analyze_sources() function both maintain supported-extension logic, causing
duplication and JS-specific branching; modify analyze_sources() to derive the
list of supported extensions from the keys of analyzers (e.g., use
analyzers.keys() or a small helper like get_supported_extensions()) and
centralize any per-language discovery there, removing the hard-coded extension
list and the JS-specific discovery branches so new languages are registered once
via the analyzers dict and automatically picked up by source discovery.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf4d8247-289d-4564-9bd5-2d5569fb076c

📥 Commits

Reviewing files that changed from the base of the PR and between 1de114e and bbad52d.

📒 Files selected for processing (2)
  • api/analyzers/source_analyzer.py
  • tests/test_javascript_analyzer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_javascript_analyzer.py

Files skipped by first_pass() via the ignore list are not added to
self.files, but second_pass() iterated the same files list and
dereferenced self.files[file_path] unconditionally. This would raise
KeyError for any ignored file.

Add a membership check before accessing self.files and reorder the
guards so both the membership check and NullLanguageServer skip happen
before dereferencing.

Addresses review comment: coderabbitai suggested adding a guard to
skip files not in self.files before indexing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@gkorland
Copy link
Contributor Author

Agent Review Summary

Comments Fixed (agreed & resolved)

  • second_pass() KeyError guard (coderabbitai, comment 2971631314) → fixed in 5780d2b — Added file_path not in self.files membership check before accessing self.files[file_path], and reordered guards so both membership and NullLanguageServer checks precede dereferencing. Prevents KeyError for files skipped via ignore in first_pass().

Previously Resolved Comments (verified)

  • pyproject.toml formatting (Copilot) — already fixed: correct indentation + upper bound <0.24.0
  • node_modules filtering (Copilot) — already fixed: analyze_sources() filters with f.parts
  • self._captures() usage (Copilot) — already fixed: add_symbols() uses self._captures()
  • Unused variables (Copilot, github-code-quality) — already fixed: heritage/superclass_node removed
  • Test coverage (Copilot) — already addressed: 23 tests in test_javascript_analyzer.py
  • Parameter symbol extraction (coderabbitai) — already fixed: JS params not extracted as symbols
  • is_dependency path-segment matching (coderabbitai) — already fixed: uses Path.parts
  • NullLanguageServer guard (coderabbitai) — already fixed in prior commit
  • unittest consistency (coderabbitai) — acknowledged: matches existing codebase convention
  • Integration test coverage (coderabbitai) — already fixed: MockGraph-based hierarchy test added
  • Security scanning findings (github-advanced-security) — pre-existing, not introduced by this PR

Comments Declined (disagreed — threads left open)

None.

Questions / Clarifications Needed

None.

Tests Added / Updated

  • No new tests needed for this fix (guard is a defensive check for ignored files). Existing 23 JS analyzer tests all pass.

CI Status

All 10 checks passing ✅ (Build, Docker, CodeQL ×4, Playwright ×2, CodeRabbit, CodeQL rollup)

@gkorland gkorland merged commit 7daba03 into staging Mar 22, 2026
12 checks passed
@gkorland gkorland deleted the backend/add-javascript-support branch March 22, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for JavaScript

2 participants