feat: add Neo4j lineage exporter for dbt manifest metadata by SaranyaVaitheeswaran · Pull Request #990 · elementary-data/dbt-data-reliability

SaranyaVaitheeswaran · 2026-04-22T18:34:37Z

Summary

Adds a Neo4j lineage exporter that reads dbt manifest.json and loads
the full lineage graph into Neo4j as a property graph.

Motivation

Elementary already captures rich lineage metadata via dbt artifacts.
This exporter makes that lineage available in Neo4j, enabling:

- Impact analysis: which models are affected by a source schema change?
- Root cause detection: trace data quality issues upstream
- Graph traversal queries across the full dbt DAG

Changes

- `elementary_neo4j/neo4j_config.py`: connection config with env var support
- `elementary_neo4j/neo4j_exporter.py`: main exporter class
- `tests/test_neo4j_exporter.py`: 8 unit tests, all passing
- `README_neo4j.md`: usage docs with Cypher query examples

Example Usage

from elementary_neo4j.neo4j_config import Neo4jConfig
from elementary_neo4j.neo4j_exporter import Neo4jLineageExporter

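config = Neo4jConfig.from_env()  # reads the NEO4J_* environment variables
exporter = Neo4jLineageExporter(config)
try:
    result = exporter.export("target/manifest.json")
finally:
    exporter.close()  # release the Neo4j driver even if the export fails

# Example return value:
# {"nodes_exported": 42, "dependencies_exported": 67}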

Tests

8 unit tests covering node extraction, dependency extraction,
manifest loading, and full export pipeline. All passing.

Summary by CodeRabbit

New Features
- Neo4j Lineage Exporter: export dbt lineage into Neo4j as a property graph with nodes and FEEDS_INTO relationships.
- Environment-configurable Neo4j connection (URI/username/password/database) with sensible defaults.
Documentation
- Added README with installation, usage examples, environment variables, and sample Cypher for impact analysis.
Chores
- Declared Neo4j client dependency (neo4j >=5.0,<7.0).
Tests
- Added unit tests covering manifest handling, extraction, export flow, and error cases.

- Add Neo4jLineageExporter class to export dbt nodes and dependencies to Neo4j
- Supports models, sources, seeds and snapshots as DbtNode graph nodes
- Creates FEEDS_INTO relationships for downstream impact analysis
- Add Neo4jConfig dataclass with environment variable support
- Add 8 unit tests, all passing
- Add README with usage examples and Cypher query examples

github-actions · 2026-04-22T18:34:48Z

👋 @SaranyaVaitheeswaran
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai · 2026-04-22T18:39:37Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cc7e22dc-f72f-4dcd-bb2c-63e4076cccda

📥 Commits

Reviewing files that changed from the base of the PR and between 2186788 and 8ade214.

📒 Files selected for processing (3)

README_neo4j.md
elementary_neo4j/neo4j_exporter.py
elementary_neo4j/requirements.txt

✅ Files skipped from review due to trivial changes (2)

elementary_neo4j/requirements.txt
README_neo4j.md

🚧 Files skipped from review as they are similar to previous changes (1)

elementary_neo4j/neo4j_exporter.py

📝 Walkthrough

Walkthrough

Adds a new Neo4j lineage exporter that reads a dbt manifest.json, maps models/sources/seeds/snapshots to DbtNode nodes, creates FEEDS_INTO relationships, provides a Neo4jConfig.from_env() helper, CLI/usage docs, and unit tests for extraction and export flows.
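
The extraction step described in this walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names mirror the walkthrough, but the bodies, the `downstream_id` key, and the sample manifest are assumptions. It relies only on the standard dbt manifest layout, in which `nodes` and `sources` are dicts keyed by `unique_id` and each node lists its parents under `depends_on.nodes`.

```python
from typing import Any, Dict, List

# Resource types treated as graph nodes, per the walkthrough.
EXPORTED_TYPES = {"model", "seed", "snapshot", "source"}


def extract_nodes(manifest: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect models, seeds, snapshots and sources as flat dicts."""
    entries = {**manifest.get("nodes", {}), **manifest.get("sources", {})}
    return [
        {
            "unique_id": unique_id,
            "name": entry.get("name"),
            "resource_type": entry.get("resource_type"),
        }
        for unique_id, entry in entries.items()
        if entry.get("resource_type") in EXPORTED_TYPES
    ]


def extract_dependencies(manifest: Dict[str, Any]) -> List[Dict[str, str]]:
    """Emit upstream/downstream pairs only when both ends are exported."""
    exported_ids = {node["unique_id"] for node in extract_nodes(manifest)}
    deps = []
    for unique_id, entry in manifest.get("nodes", {}).items():
        if unique_id not in exported_ids:
            continue  # e.g. tests are not exported, so their edges are skipped
        for upstream_id in entry.get("depends_on", {}).get("nodes", []):
            if upstream_id in exported_ids:
                deps.append({"upstream_id": upstream_id, "downstream_id": unique_id})
    return deps


# Hypothetical manifest fragment, for illustration only.
sample_manifest = {
    "nodes": {
        "model.my_project.orders": {
            "name": "orders",
            "resource_type": "model",
            "depends_on": {
                "nodes": ["source.my_project.raw.orders", "seed.my_project.countries"]
            },
        },
        "seed.my_project.countries": {
            "name": "countries",
            "resource_type": "seed",
            "depends_on": {"nodes": []},
        },
        "test.my_project.not_null_orders_id": {
            "name": "not_null_orders_id",
            "resource_type": "test",
            "depends_on": {"nodes": ["model.my_project.orders"]},
        },
    },
    "sources": {
        "source.my_project.raw.orders": {"name": "orders", "resource_type": "source"},
    },
}

nodes = extract_nodes(sample_manifest)
dependencies = extract_dependencies(sample_manifest)
```

Filtering dependencies against the exported ID set keeps every `FEEDS_INTO` edge anchored to nodes that actually exist in the graph, which is the behavior the walkthrough attributes to the exporter.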

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation `README_neo4j.md` | New README describing the Elementary Neo4j Lineage Exporter: graph model (`DbtNode`, `FEEDS_INTO`), install/usage examples, env vars (`NEO4J_*`), sample Cypher, and test command. |
| Configuration Module `elementary_neo4j/neo4j_config.py` | New `Neo4jConfig` dataclass with `uri`, `username`, `password`, optional `database` and `from_env()` class method reading `NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, `NEO4J_DATABASE` with defaults. |
| Export Implementation `elementary_neo4j/neo4j_exporter.py` | New `Neo4jLineageExporter` that: loads manifest from path (raises `FileNotFoundError` if missing), extracts nodes (models/seeds/snapshots + sources) and dependencies (filtering to exported set), upserts nodes via `MERGE`/`SET`, upserts `FEEDS_INTO` relationships, provides `export(manifest_path)` and `close()`. |
| Tests `tests/test_neo4j_exporter.py` | New unit tests covering `Neo4jConfig`, `Neo4jLineageExporter` extraction logic, dependency direction, manifest loading error path, and mocked persistence (`session.run` assertions); uses fixtures and mocks for Neo4j driver and file access. |
| Dependencies `elementary_neo4j/requirements.txt` | Adds `neo4j>=5.0,<7.0` dependency. |
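
For readers unfamiliar with the pattern, an environment-driven config dataclass of the kind summarized above might look like the sketch below. The field names and environment variable names come from the summary; the default values and the optional `env` parameter (added here so the example can run without touching the real environment) are assumptions, since the PR text does not list them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Neo4jConfig:
    uri: str
    username: str
    password: str
    database: Optional[str] = None

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Neo4jConfig":
        """Build a config from NEO4J_* variables, falling back to defaults."""
        env = os.environ if env is None else env
        return cls(
            # The fallback values below are placeholders, not the PR's defaults.
            uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
            username=env.get("NEO4J_USERNAME", "neo4j"),
            password=env.get("NEO4J_PASSWORD", ""),
            database=env.get("NEO4J_DATABASE"),
        )


# Example with an explicit mapping instead of the process environment.
config = Neo4jConfig.from_env({"NEO4J_URI": "bolt://graph-host:7687"})
```

Keeping the password out of source code and reading it at call time is the main reason to prefer this shape over hard-coded constructor arguments.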

Sequence Diagram

sequenceDiagram
    actor User
    participant App as Application
    participant Exporter as Neo4jLineageExporter
    participant FS as FileSystem
    participant Neo4j as Neo4jDB

    User->>App: invoke export(manifest_path)
    App->>Exporter: instantiate with Neo4jConfig
    Exporter->>Neo4j: open driver/session

    App->>Exporter: export(manifest_path)
    Exporter->>FS: read manifest.json
    FS-->>Exporter: return manifest data

    Exporter->>Exporter: extract_nodes()
    Exporter->>Exporter: extract_dependencies()

    Exporter->>Neo4j: export_nodes(nodes) -- MERGE/SET per node
    Neo4j-->>Exporter: nodes upserted

    Exporter->>Neo4j: export_dependencies(deps) -- MATCH + MERGE FEEDS_INTO
    Neo4j-->>Exporter: relationships upserted

    Exporter-->>App: return {nodes_exported, dependencies_exported}
    App->>Exporter: close()
    Exporter->>Neo4j: close driver

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through manifests, tidy and spry,
Mapping models and sources beneath the sky.
Into Neo4j I gently scribed each line,
FEEDS_INTO paths now brightly shine.
Hop, export, close — lineage in a vine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely describes the main change: adding a Neo4j lineage exporter for dbt manifest metadata. It is specific, directly related to the changeset, and accurately reflects the primary functionality introduced across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

tests/test_neo4j_exporter.py (1)
24-61: Cover seed/snapshot extraction and relationship writes directly.

The fixture only exercises model/source nodes, and the full export test patches out export_dependencies. Add coverage for seed/snapshot resources plus a direct export_dependencies assertion so relationship direction and Cypher persistence can’t regress silently.

Also applies to: 121-125
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_neo4j_exporter.py` around lines 24 - 61, Update the
sample_manifest fixture to include seed and snapshot resource entries (e.g., add
keys like "seed.my_project.seed_customers" and "snapshot.my_project.snap_orders"
with appropriate fields mirroring existing nodes/sources) so tests exercise
seed/snapshot extraction; then in the test that patches
exporter.export_dependencies (or the test function that uses sample_manifest)
add a direct assertion that export_dependencies was called with the expected
relationship tuples (verify direction and types) and/or assert the Cypher
relationship write was invoked (mock/check the method that persists Cypher in
Neo4j exporter, e.g., export_dependencies and the Cypher persistence helper) to
ensure relationships are actually written and cannot regress.
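
A direct relationship-write assertion of the kind requested in this nitpick could be sketched as below. The `export_dependencies` function and its Cypher text are stand-ins written for this example, as is the `downstream_id` key; the PR's real method and query may differ. Only `unittest.mock` from the standard library is used, so no Neo4j instance is needed.

```python
from unittest.mock import MagicMock

# Stand-in Cypher; the query text in the PR may differ.
FEEDS_INTO_QUERY = (
    "MATCH (up:DbtNode {unique_id: $upstream_id}) "
    "MATCH (down:DbtNode {unique_id: $downstream_id}) "
    "MERGE (up)-[:FEEDS_INTO]->(down)"
)


def export_dependencies(session, dependencies):
    """Stand-in for the exporter method: one MERGE per dependency."""
    for dep in dependencies:
        session.run(
            FEEDS_INTO_QUERY,
            upstream_id=dep["upstream_id"],
            downstream_id=dep["downstream_id"],
        )


def test_export_dependencies_writes_upstream_to_downstream():
    session = MagicMock()
    deps = [
        {
            "upstream_id": "seed.my_project.seed_customers",
            "downstream_id": "snapshot.my_project.snap_orders",
        }
    ]

    export_dependencies(session, deps)

    # Asserting on keyword arguments pins the edge direction, so swapping
    # upstream and downstream would make this test fail.
    session.run.assert_called_once_with(
        FEEDS_INTO_QUERY,
        upstream_id="seed.my_project.seed_customers",
        downstream_id="snapshot.my_project.snap_orders",
    )
```

Because the mock records the exact parameters passed to `session.run`, the test covers both the seed/snapshot IDs and the direction of the relationship in one assertion.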

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@elementary_neo4j/neo4j_exporter.py`:
- Line 4: The import list in neo4j_exporter.py currently includes an unused
symbol Optional; remove Optional from the typing import (change "from typing
import Any, Dict, List, Optional" to only import the actually used names such as
Any, Dict, List) so the module no longer triggers the F401 unused-import Flake8
error.
- Around line 14-17: The class docstring claims the exporter "Creates nodes for
models, sources, and columns" which is inaccurate because extract_nodes only
generates dbt model/seed/snapshot/source nodes; update the class docstring to
reflect the implemented graph model (models/seeds/snapshots/sources and their
relationships) or, if column nodes are intended, implement column extraction in
extract_nodes/related methods; reference extract_nodes and the class docstring
to ensure the text and implementation stay consistent.
- Around line 64-73: extract_dependencies is including dependencies for manifest
entries that were never exported (tests/other resource types), causing
export_dependencies to try to create relationships with missing DbtNode
endpoints; update extract_dependencies to only emit dependency dicts when both
the upstream_id and unique_id are part of the actual exported node set (e.g.,
compare against the set of exported DbtNode unique_ids or the manifest entries
filtered by the same export criteria). Locate the function extract_dependencies
and either accept an additional parameter (exported_node_ids) or compute the
exported set using the same filtering logic used by export_dependencies/DbtNode
creation, then only append dependencies when upstream_id in exported_node_ids
and unique_id in exported_node_ids. Ensure identifiers referenced are
extract_dependencies, export_dependencies, and DbtNode so the change aligns with
existing export behavior.
- Line 6: The file elementary_neo4j/neo4j_exporter.py imports GraphDatabase from
the external package neo4j, so add neo4j to the project's declared dependencies
(e.g., add "neo4j" with an appropriate version constraint such as
"neo4j>=5.0,<6.0") in your packaging metadata (pyproject.toml under
[project.dependencies] or setup.cfg install_requires, and/or requirements.txt)
so installations pull the package automatically; ensure the version chosen is
compatible with the code that uses GraphDatabase and update CI/install steps to
validate the new dependency.

In `@README_neo4j.md`:
- Around line 72-75: The example impact-analysis Cypher query uses name matching
which is not unique; update the MATCH to locate the source by unique_id (the
docs' primary key) instead of name and return impacted.unique_id to avoid
ambiguous results — e.g., match (source:DbtNode {unique_id:
"<unique_id_here>"})-[:FEEDS_INTO*]->(impacted) and return impacted.unique_id,
impacted.resource_type; ensure the docs reference the DbtNode label, the
unique_id property, and the FEEDS_INTO relationship so readers use the canonical
identifier.

---

Nitpick comments:
In `@tests/test_neo4j_exporter.py`:
- Around line 24-61: Update the sample_manifest fixture to include seed and
snapshot resource entries (e.g., add keys like "seed.my_project.seed_customers"
and "snapshot.my_project.snap_orders" with appropriate fields mirroring existing
nodes/sources) so tests exercise seed/snapshot extraction; then in the test that
patches exporter.export_dependencies (or the test function that uses
sample_manifest) add a direct assertion that export_dependencies was called with
the expected relationship tuples (verify direction and types) and/or assert the
Cypher relationship write was invoked (mock/check the method that persists
Cypher in Neo4j exporter, e.g., export_dependencies and the Cypher persistence
helper) to ensure relationships are actually written and cannot regress.
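
To illustrate the README finding above, an impact-analysis lookup keyed on `unique_id` might be wrapped in Python as follows. The helper name `find_impacted` and the example ID are hypothetical; the query uses only the `DbtNode` label, the `unique_id` property, and the `FEEDS_INTO` relationship named in the review. The function accepts any object with a driver-style `run(query, **params)` method, such as a `neo4j` session.

```python
from typing import Any, Dict, List

IMPACT_QUERY = (
    "MATCH (source:DbtNode {unique_id: $unique_id})-[:FEEDS_INTO*]->(impacted) "
    "RETURN DISTINCT impacted.unique_id AS unique_id, "
    "impacted.resource_type AS resource_type"
)


def find_impacted(session: Any, unique_id: str) -> List[Dict[str, Any]]:
    """Return every node reachable downstream of the given unique_id."""
    result = session.run(IMPACT_QUERY, unique_id=unique_id)
    # Each record is converted to a plain dict via Record.data().
    return [record.data() for record in result]
```

With a live database this would be called inside `with driver.session() as session:`, passing a full ID such as `"source.my_project.raw.orders"` rather than a bare name, which avoids the ambiguity the review points out.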

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3513345-dde9-4fe7-bed2-7de914660721

📥 Commits

Reviewing files that changed from the base of the PR and between 10e92c4 and 2186788.

📒 Files selected for processing (6)

README_neo4j.md
elementary_neo4j/__init__.py
elementary_neo4j/neo4j_config.py
elementary_neo4j/neo4j_exporter.py
tests/__init__.py
tests/test_neo4j_exporter.py

coderabbitai · 2026-04-22T18:44:28Z

+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from neo4j import GraphDatabase


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify that the Neo4j Python driver is declared in dependency metadata.
set -euo pipefail

mapfile -t metadata_files < <(
  fd -HI '^(pyproject\.toml|setup\.py|setup\.cfg|requirements.*|Pipfile|poetry\.lock)$'
)

printf 'Dependency metadata files found:\n'
printf ' - %s\n' "${metadata_files[@]:-}"

if ((${#metadata_files[@]})); then
  rg -n -i -C2 '\bneo4j\b' "${metadata_files[@]}" || true
fi

Repository: elementary-data/dbt-data-reliability

Length of output: 136

🏁 Script executed:

#!/bin/bash
# Find all dependency metadata files
fd -HI '^(pyproject\.toml|setup\.py|setup\.cfg|requirements.*|Pipfile|poetry\.lock)$' > /tmp/metadata_files.txt

echo "Dependency metadata files found:"
cat /tmp/metadata_files.txt || echo "(none)"
echo ""
echo "Searching for 'neo4j' in dependency files:"
if [ -s /tmp/metadata_files.txt ]; then
  cat /tmp/metadata_files.txt | xargs rg -n -i -C2 '\bneo4j\b' || echo "No neo4j found in dependency files"
fi

Repository: elementary-data/dbt-data-reliability

Length of output: 226

🏁 Script executed:

# Check root level for common Python project files
echo "=== Files in root directory ==="
ls -la | grep -E '(pyproject|setup|requirements|poetry|Pipfile)' || echo "No common files at root"
echo ""
echo "=== All occurrences of 'neo4j' in the repository ==="
rg -i -n 'neo4j' --max-count=50
echo ""
echo "=== Content of integration_tests/requirements.txt ==="
cat integration_tests/requirements.txt
echo ""
echo "=== Search for .txt files with dependencies ==="
fd -i 'requirements.*\.txt$' -x cat -n {}

Repository: elementary-data/dbt-data-reliability

Length of output: 3994

🏁 Script executed:

# Check for any Python project metadata files with different search
echo "=== All Python config files in repo ==="
find . -maxdepth 2 -type f \( -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "MANIFEST.in" \) -not -path "./.*"
echo ""
echo "=== Check if elementary_neo4j is a package with metadata ==="
ls -la elementary_neo4j/
echo ""
echo "=== Check for any __init__.py that might have version/dependencies ==="
cat elementary_neo4j/__init__.py 2>/dev/null || echo "No __init__.py found"

Repository: elementary-data/dbt-data-reliability

Length of output: 563

Add neo4j to project dependencies.

The module imports neo4j at import time (line 6), but the package is not declared in any dependency metadata. This means environments installing from project metadata will fail to install neo4j automatically, despite README-only instructions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@elementary_neo4j/neo4j_exporter.py` at line 6, The file elementary_neo4j/neo4j_exporter.py imports GraphDatabase from the external package neo4j, so add neo4j to the project's declared dependencies (e.g., add "neo4j" with an appropriate version constraint such as "neo4j>=5.0,<6.0") in your packaging metadata (pyproject.toml under [project.dependencies] or setup.cfg install_requires, and/or requirements.txt) so installations pull the package automatically; ensure the version chosen is compatible with the code that uses GraphDatabase and update CI/install steps to validate the new dependency.

coderabbitai · 2026-04-22T18:44:28Z

+    def export(self, manifest_path: str):
+        """Full export pipeline — nodes + dependencies."""
+        logger.info(f"Loading manifest from {manifest_path}")
+        manifest = self.load_manifest(manifest_path)
+        nodes = self.extract_nodes(manifest)
+        dependencies = self.extract_dependencies(manifest)
+        self.export_nodes(nodes)
+        self.export_dependencies(dependencies)
+        logger.info("Neo4j lineage export complete")
+        return {
+            "nodes_exported": len(nodes),
+            "dependencies_exported": len(dependencies)
+        }


⚠️ Potential issue | 🟠 Major

Handle stale graph state on repeated exports.

The export only MERGEs current nodes/relationships. If a model is removed or a dependency changes, old DbtNodes and FEEDS_INTO edges remain in Neo4j, causing false impact-analysis results. Add a scoped replace/cleanup strategy or document that this is append-only.
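
One possible scoped cleanup, sketched under stated assumptions: stamp every node and relationship written during an export with a run identifier, then delete anything that carries a different stamp or none at all. The `export_run_id` property and the `build_cleanup_statements` helper are hypothetical and not part of the PR. The sketch also assumes the database holds lineage for a single dbt project and that the exporter's `MERGE` statements would be extended to `SET` the same property.

```python
import uuid
from typing import Any, Dict, List, Tuple

Statement = Tuple[str, Dict[str, Any]]


def build_cleanup_statements(run_id: str) -> List[Statement]:
    """Return Cypher statements that remove elements not stamped with run_id."""
    params = {"run_id": run_id}
    return [
        (
            # Stale edges first: dependencies that no longer exist in the manifest.
            "MATCH ()-[r:FEEDS_INTO]->() "
            "WHERE r.export_run_id IS NULL OR r.export_run_id <> $run_id "
            "DELETE r",
            params,
        ),
        (
            # Then stale nodes; DETACH DELETE also drops any remaining edges.
            "MATCH (n:DbtNode) "
            "WHERE n.export_run_id IS NULL OR n.export_run_id <> $run_id "
            "DETACH DELETE n",
            params,
        ),
    ]


run_id = str(uuid.uuid4())  # one identifier per export run
statements = build_cleanup_statements(run_id)
for query, params in statements:
    # With a live session this would be: session.run(query, params)
    print(query)
```

The explicit `IS NULL` check matters because a comparison against a missing property evaluates to null in Cypher, so elements written before the stamp was introduced would otherwise survive the cleanup. If several projects share one database, the `MATCH` clauses would additionally need a project scope.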

- Remove unused Optional import
- Fix extract_dependencies to filter non-exported node types
- Fix class docstring to reflect actual graph model
- Fix README Cypher query to use unique_id instead of name
- Add requirements.txt with neo4j dependency

SaranyaVaitheeswaran requested a deployment to elementary_test_env April 22, 2026 18:34 — with GitHub Actions Waiting

coderabbitai Bot reviewed Apr 22, 2026


SaranyaVaitheeswaran requested a deployment to elementary_test_env April 22, 2026 19:39 — with GitHub Actions Waiting

feat: add Neo4j lineage exporter for dbt manifest metadata #990
SaranyaVaitheeswaran wants to merge 2 commits into elementary-data:master from SaranyaVaitheeswaran:feature/neo4j-lineage-exporter
