
feat: add Neo4j lineage exporter for dbt manifest metadata #990

Open
SaranyaVaitheeswaran wants to merge 2 commits into elementary-data:master from
SaranyaVaitheeswaran:feature/neo4j-lineage-exporter

SaranyaVaitheeswaran commented Apr 22, 2026

Summary

Adds a Neo4j lineage exporter that reads dbt manifest.json and loads
the full lineage graph into Neo4j as a property graph.

Motivation

Elementary already captures rich lineage metadata via dbt artifacts.
This exporter makes that lineage available in Neo4j, enabling:

  • Impact analysis: which models are affected by a source schema change?
  • Root cause detection: trace data quality issues upstream
  • Graph traversal queries across the full dbt DAG

Changes

  • elementary_neo4j/neo4j_config.py — connection config with env var support
  • elementary_neo4j/neo4j_exporter.py — main exporter class
  • tests/test_neo4j_exporter.py — 8 unit tests, all passing
  • README_neo4j.md — usage docs with Cypher query examples
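
For orientation, a connection config with environment-variable support like the one listed above can be sketched as a dataclass whose from_env() reads the four NEO4J_* variables named later in the walkthrough. This is a minimal sketch, not the PR's neo4j_config.py: the default values (bolt://localhost:7687, user neo4j, empty password) are assumptions for local development.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Neo4jConfig:
    """Connection settings for the lineage exporter (illustrative sketch)."""

    uri: str
    username: str
    password: str
    database: Optional[str] = None  # None lets the server pick its default database

    @classmethod
    def from_env(cls) -> "Neo4jConfig":
        # ASSUMPTION: these defaults are placeholders for a local instance,
        # not necessarily the defaults used in the PR.
        return cls(
            uri=os.environ.get("NEO4J_URI", "bolt://localhost:7687"),
            username=os.environ.get("NEO4J_USERNAME", "neo4j"),
            password=os.environ.get("NEO4J_PASSWORD", ""),
            # Treat an unset or empty NEO4J_DATABASE as "use the server default".
            database=os.environ.get("NEO4J_DATABASE") or None,
        )
```

Keeping database optional means the same config works against single-database Community installs and multi-database Enterprise servers.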

Example Usage

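from elementary_neo4j.neo4j_config import Neo4jConfig
from elementary_neo4j.neo4j_exporter import Neo4jLineageExporter

config = Neo4jConfig.from_env()  # reads NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE
exporter = Neo4jLineageExporter(config)
try:
    result = exporter.export("target/manifest.json")
    # e.g. {"nodes_exported": 42, "dependencies_exported": 67}
finally:
    exporter.close()  # release the Neo4j driver even if the export fails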

Tests

8 unit tests covering node extraction, dependency extraction,
manifest loading, and full export pipeline. All passing.

Summary by CodeRabbit

  • New Features

    • Neo4j Lineage Exporter: export dbt lineage into Neo4j as a property graph with nodes and FEEDS_INTO relationships.
    • Environment-configurable Neo4j connection (URI/username/password/database) with sensible defaults.
  • Documentation

    • Added README with installation, usage examples, environment variables, and sample Cypher for impact analysis.
  • Chores

    • Declared Neo4j client dependency (neo4j >=5.0,<7.0).
  • Tests

    • Added unit tests covering manifest handling, extraction, export flow, and error cases.

- Add Neo4jLineageExporter class to export dbt nodes and dependencies to Neo4j
- Supports models, sources, seeds and snapshots as DbtNode graph nodes
- Creates FEEDS_INTO relationships for downstream impact analysis
- Add Neo4jConfig dataclass with environment variable support
- Add 8 unit tests, all passing
- Add README with usage examples and Cypher query examples
github-actions (Contributor)

👋 @SaranyaVaitheeswaran
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai Bot commented Apr 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cc7e22dc-f72f-4dcd-bb2c-63e4076cccda

📥 Commits

Reviewing files that changed from the base of the PR and between 2186788 and 8ade214.

📒 Files selected for processing (3)
  • README_neo4j.md
  • elementary_neo4j/neo4j_exporter.py
  • elementary_neo4j/requirements.txt
✅ Files skipped from review due to trivial changes (2)
  • elementary_neo4j/requirements.txt
  • README_neo4j.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • elementary_neo4j/neo4j_exporter.py

📝 Walkthrough

Adds a new Neo4j lineage exporter that reads a dbt manifest.json, maps models/sources/seeds/snapshots to DbtNode nodes, creates FEEDS_INTO relationships, provides a Neo4jConfig.from_env() helper, CLI/usage docs, and unit tests for extraction and export flows.
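
The node-mapping step described in the walkthrough can be sketched as follows. It relies on the standard dbt manifest layout (a "nodes" dict and a "sources" dict, both keyed by unique_id, with each entry carrying resource_type and name). The property set written for each DbtNode is an assumption for illustration, not the PR's implementation.

```python
from typing import Any, Dict, List

# dbt resource types exported as DbtNode in addition to sources (per the PR description).
EXPORTED_NODE_TYPES = {"model", "seed", "snapshot"}


def extract_nodes(manifest: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a parsed manifest.json into DbtNode property dicts (sketch)."""
    nodes: List[Dict[str, Any]] = []
    # Models, seeds and snapshots live under "nodes"; tests, analyses, etc. are skipped.
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") in EXPORTED_NODE_TYPES:
            nodes.append({
                "unique_id": unique_id,
                "name": node.get("name"),
                "resource_type": node["resource_type"],
            })
    # Sources live in their own top-level "sources" dict.
    for unique_id, source in manifest.get("sources", {}).items():
        nodes.append({
            "unique_id": unique_id,
            "name": source.get("name"),
            "resource_type": "source",
        })
    return nodes
```

unique_id serves as the graph key because names are not unique across resource types. A model and a source can both be called orders.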

Changes

  • Documentation (README_neo4j.md): New README describing the Elementary Neo4j Lineage Exporter: graph model (DbtNode, FEEDS_INTO), install/usage examples, env vars (NEO4J_*), sample Cypher, and test command.
  • Configuration Module (elementary_neo4j/neo4j_config.py): New Neo4jConfig dataclass with uri, username, password, optional database and a from_env() class method reading NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE with defaults.
  • Export Implementation (elementary_neo4j/neo4j_exporter.py): New Neo4jLineageExporter that loads the manifest from a path (raises FileNotFoundError if missing), extracts nodes (models/seeds/snapshots + sources) and dependencies (filtering to the exported set), upserts nodes via MERGE/SET, upserts FEEDS_INTO relationships, and provides export(manifest_path) and close().
  • Tests (tests/test_neo4j_exporter.py): New unit tests covering Neo4jConfig, Neo4jLineageExporter extraction logic, dependency direction, the manifest-loading error path, and mocked persistence (session.run assertions); uses fixtures and mocks for the Neo4j driver and file access.
  • Dependencies (elementary_neo4j/requirements.txt): Adds the neo4j>=5.0,<7.0 dependency.
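
The dependency filtering mentioned above (emit an edge only when both endpoints were exported) can be sketched like this. It reads dbt's depends_on.nodes list. Passing the exported id set in as an exported_ids argument is one of the two options the review suggests, chosen here for illustration. The {upstream_id, downstream_id} dict shape is an assumption, not taken from the PR.

```python
from typing import Any, Dict, List, Set


def extract_dependencies(
    manifest: Dict[str, Any], exported_ids: Set[str]
) -> List[Dict[str, str]]:
    """Return upstream -> downstream edges whose endpoints were both exported."""
    deps: List[Dict[str, str]] = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if unique_id not in exported_ids:
            continue  # e.g. tests, analyses, operations: no DbtNode was created
        for upstream_id in node.get("depends_on", {}).get("nodes", []):
            if upstream_id in exported_ids:
                # Direction follows data flow: upstream FEEDS_INTO downstream.
                deps.append({"upstream_id": upstream_id, "downstream_id": unique_id})
    return deps
```

Without this filter, a relationship write that MATCHes both endpoints would simply find nothing for the missing side. The dependency count would then overstate what was actually written.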

Sequence Diagram

sequenceDiagram
    actor User
    participant App as Application
    participant Exporter as Neo4jLineageExporter
    participant FS as FileSystem
    participant Neo4j as Neo4jDB

    User->>App: invoke export(manifest_path)
    App->>Exporter: instantiate with Neo4jConfig
    Exporter->>Neo4j: open driver/session

    App->>Exporter: export(manifest_path)
    Exporter->>FS: read manifest.json
    FS-->>Exporter: return manifest data

    Exporter->>Exporter: extract_nodes()
    Exporter->>Exporter: extract_dependencies()

    Exporter->>Neo4j: export_nodes(nodes) -- MERGE/SET per node
    Neo4j-->>Exporter: nodes upserted

    Exporter->>Neo4j: export_dependencies(deps) -- MATCH + MERGE FEEDS_INTO
    Neo4j-->>Exporter: relationships upserted

    Exporter-->>App: return {nodes_exported, dependencies_exported}
    App->>Exporter: close()
    Exporter->>Neo4j: close driver
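
The two write phases in the diagram (MERGE/SET per node, then MATCH + MERGE FEEDS_INTO) could also be batched with Cypher's UNWIND, so each phase is a single statement rather than one statement per row. This is a swapped-in batching technique, not what the PR does. The sketch assumes a session object with the Neo4j Python driver's run(query, **parameters) call shape, and the node properties shown are illustrative.

```python
from typing import Any, Dict, List

# One statement upserts every node; MERGE keys on unique_id only.
NODE_QUERY = """
UNWIND $rows AS row
MERGE (n:DbtNode {unique_id: row.unique_id})
SET n.name = row.name, n.resource_type = row.resource_type
"""

# One statement upserts every edge between already-existing nodes.
EDGE_QUERY = """
UNWIND $rows AS row
MATCH (up:DbtNode {unique_id: row.upstream_id})
MATCH (down:DbtNode {unique_id: row.downstream_id})
MERGE (up)-[:FEEDS_INTO]->(down)
"""


def write_lineage(
    session: Any, nodes: List[Dict[str, Any]], deps: List[Dict[str, str]]
) -> None:
    """Upsert nodes, then relationships, as two batched statements (sketch)."""
    # Nodes go first so the MATCH clauses in EDGE_QUERY can find both endpoints.
    session.run(NODE_QUERY, rows=nodes)
    session.run(EDGE_QUERY, rows=deps)
```

With the real driver, the session would come from driver.session(database=config.database). For large manifests, a uniqueness constraint on DbtNode.unique_id also gives MERGE and MATCH an index to use instead of scanning every DbtNode.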

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through manifests, tidy and spry,
Mapping models and sources beneath the sky.
Into Neo4j I gently scribed each line,
FEEDS_INTO paths now brightly shine.
Hop, export, close — lineage in a vine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 30.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The PR title clearly and concisely describes the main change: adding a Neo4j lineage exporter for dbt manifest metadata. It is specific, directly related to the changeset, and accurately reflects the primary functionality introduced across all modified files.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (1)
tests/test_neo4j_exporter.py (1)

24-61: Cover seed/snapshot extraction and relationship writes directly.

The fixture only exercises model/source nodes, and the full export test patches out export_dependencies. Add coverage for seed/snapshot resources plus a direct export_dependencies assertion so relationship direction and Cypher persistence can’t regress silently.

Also applies to: 121-125

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_neo4j_exporter.py` around lines 24 - 61, Update the
sample_manifest fixture to include seed and snapshot resource entries (e.g., add
keys like "seed.my_project.seed_customers" and "snapshot.my_project.snap_orders"
with appropriate fields mirroring existing nodes/sources) so tests exercise
seed/snapshot extraction; then in the test that patches
exporter.export_dependencies (or the test function that uses sample_manifest)
add a direct assertion that export_dependencies was called with the expected
relationship tuples (verify direction and types) and/or assert the Cypher
relationship write was invoked (mock/check the method that persists Cypher in
Neo4j exporter, e.g., export_dependencies and the Cypher persistence helper) to
ensure relationships are actually written and cannot regress.
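
Following the nitpick, a direct assertion on the relationship write could look like the pytest-style sketch below, using unittest.mock.MagicMock as the session. Here export_dependencies is a hypothetical stand-in with the same role as the exporter's method, not the PR's code. The seed and snapshot ids reuse the examples suggested in the review.

```python
from unittest.mock import MagicMock


def export_dependencies(session, deps):
    """Hypothetical stand-in for the exporter's relationship write."""
    session.run(
        "UNWIND $rows AS row "
        "MATCH (up:DbtNode {unique_id: row.upstream_id}) "
        "MATCH (down:DbtNode {unique_id: row.downstream_id}) "
        "MERGE (up)-[:FEEDS_INTO]->(down)",
        rows=deps,
    )


def test_export_dependencies_writes_upstream_to_downstream():
    session = MagicMock()
    # Seed and snapshot ids exercise resource types the original fixture lacked.
    deps = [
        {"upstream_id": "seed.my_project.seed_customers",
         "downstream_id": "model.my_project.customers"},
        {"upstream_id": "model.my_project.orders",
         "downstream_id": "snapshot.my_project.snap_orders"},
    ]
    export_dependencies(session, deps)

    session.run.assert_called_once()
    (query,) = session.run.call_args.args
    # Direction check: the arrow must point from upstream to downstream.
    assert "(up)-[:FEEDS_INTO]->(down)" in query
    assert session.run.call_args.kwargs["rows"] == deps
```

Asserting on the Cypher text and the parameters together catches both a reversed arrow and a changed payload. Patching export_dependencies out entirely catches neither.
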
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@elementary_neo4j/neo4j_exporter.py`:
- Line 4: The import list in neo4j_exporter.py currently includes an unused
symbol Optional; remove Optional from the typing import (change "from typing
import Any, Dict, List, Optional" to only import the actually used names such as
Any, Dict, List) so the module no longer triggers the F401 unused-import Flake8
error.
- Around line 14-17: The class docstring claims the exporter "Creates nodes for
models, sources, and columns" which is inaccurate because extract_nodes only
generates dbt model/seed/snapshot/source nodes; update the class docstring to
reflect the implemented graph model (models/seeds/snapshots/sources and their
relationships) or, if column nodes are intended, implement column extraction in
extract_nodes/related methods; reference extract_nodes and the class docstring
to ensure the text and implementation stay consistent.
- Around line 64-73: extract_dependencies is including dependencies for manifest
entries that were never exported (tests/other resource types), causing
export_dependencies to try to create relationships with missing DbtNode
endpoints; update extract_dependencies to only emit dependency dicts when both
the upstream_id and unique_id are part of the actual exported node set (e.g.,
compare against the set of exported DbtNode unique_ids or the manifest entries
filtered by the same export criteria). Locate the function extract_dependencies
and either accept an additional parameter (exported_node_ids) or compute the
exported set using the same filtering logic used by export_dependencies/DbtNode
creation, then only append dependencies when upstream_id in exported_node_ids
and unique_id in exported_node_ids. Ensure identifiers referenced are
extract_dependencies, export_dependencies, and DbtNode so the change aligns with
existing export behavior.
- Line 6: The file elementary_neo4j/neo4j_exporter.py imports GraphDatabase from
the external package neo4j, so add neo4j to the project's declared dependencies
(e.g., add "neo4j" with an appropriate version constraint such as
"neo4j>=5.0,<6.0") in your packaging metadata (pyproject.toml under
[project.dependencies] or setup.cfg install_requires, and/or requirements.txt)
so installations pull the package automatically; ensure the version chosen is
compatible with the code that uses GraphDatabase and update CI/install steps to
validate the new dependency.

In `@README_neo4j.md`:
- Around line 72-75: The example impact-analysis Cypher query uses name matching
which is not unique; update the MATCH to locate the source by unique_id (the
docs' primary key) instead of name and return impacted.unique_id to avoid
ambiguous results — e.g., match (source:DbtNode {unique_id:
"<unique_id_here>"})-[:FEEDS_INTO*]->(impacted) and return impacted.unique_id,
impacted.resource_type; ensure the docs reference the DbtNode label, the
unique_id property, and the FEEDS_INTO relationship so readers use the canonical
identifier.
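
The corrected query keys on unique_id and follows one or more FEEDS_INTO hops. The sketch below pairs a parameterized form of that query with a pure-Python breadth-first equivalent over edge dicts. The edge-dict shape is assumed, and the DISTINCT keyword, which makes nodes reachable along several paths appear once, is an addition not in the original query. The Python version is useful for checking expected impact sets in unit tests without a database.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Parameterized impact query; DISTINCT is an addition so that nodes reached
# along more than one path are returned once.
IMPACT_QUERY = """
MATCH (source:DbtNode {unique_id: $unique_id})-[:FEEDS_INTO*]->(impacted)
RETURN DISTINCT impacted.unique_id, impacted.resource_type
"""


def downstream_of(start_id: str, deps: Iterable[Dict[str, str]]) -> Set[str]:
    """Every unique_id reachable from start_id by following FEEDS_INTO edges."""
    children: Dict[str, List[str]] = {}
    for dep in deps:
        children.setdefault(dep["upstream_id"], []).append(dep["downstream_id"])

    # Breadth-first walk; `seen` is both the result set and the revisit guard.
    seen: Set[str] = set()
    queue = deque([start_id])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

With the driver, the query would be run as session.run(IMPACT_QUERY, unique_id=...). Passing the id as a parameter rather than formatting it into the string avoids quoting issues and lets the server reuse the query plan.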

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3513345-dde9-4fe7-bed2-7de914660721

📥 Commits

Reviewing files that changed from the base of the PR and between 10e92c4 and 2186788.

📒 Files selected for processing (6)
  • README_neo4j.md
  • elementary_neo4j/__init__.py
  • elementary_neo4j/neo4j_config.py
  • elementary_neo4j/neo4j_exporter.py
  • tests/__init__.py
  • tests/test_neo4j_exporter.py

Comment thread elementary_neo4j/neo4j_exporter.py Outdated
from pathlib import Path
from typing import Any, Dict, List, Optional

from neo4j import GraphDatabase

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify that the Neo4j Python driver is declared in dependency metadata.

set -euo pipefail

mapfile -t metadata_files < <(
  fd -HI '^(pyproject\.toml|setup\.py|setup\.cfg|requirements.*|Pipfile|poetry\.lock)$'
)

printf 'Dependency metadata files found:\n'
printf ' - %s\n' "${metadata_files[@]:-}"

if ((${#metadata_files[@]})); then
  rg -n -i -C2 '\bneo4j\b' "${metadata_files[@]}" || true
fi

Repository: elementary-data/dbt-data-reliability

Length of output: 136


🏁 Script executed:

#!/bin/bash
# Find all dependency metadata files
fd -HI '^(pyproject\.toml|setup\.py|setup\.cfg|requirements.*|Pipfile|poetry\.lock)$' > /tmp/metadata_files.txt

echo "Dependency metadata files found:"
cat /tmp/metadata_files.txt || echo "(none)"

echo ""
echo "Searching for 'neo4j' in dependency files:"
if [ -s /tmp/metadata_files.txt ]; then
  cat /tmp/metadata_files.txt | xargs rg -n -i -C2 '\bneo4j\b' || echo "No neo4j found in dependency files"
fi

Repository: elementary-data/dbt-data-reliability

Length of output: 226


🏁 Script executed:

# Check root level for common Python project files
echo "=== Files in root directory ==="
ls -la | grep -E '(pyproject|setup|requirements|poetry|Pipfile)' || echo "No common files at root"

echo ""
echo "=== All occurrences of 'neo4j' in the repository ==="
rg -i -n 'neo4j' --max-count=50

echo ""
echo "=== Content of integration_tests/requirements.txt ==="
cat integration_tests/requirements.txt

echo ""
echo "=== Search for .txt files with dependencies ==="
fd -i 'requirements.*\.txt$' -x cat -n {}

Repository: elementary-data/dbt-data-reliability

Length of output: 3994


🏁 Script executed:

# Check for any Python project metadata files with different search
echo "=== All Python config files in repo ==="
find . -maxdepth 2 -type f \( -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "MANIFEST.in" \) -not -path "./.*"

echo ""
echo "=== Check if elementary_neo4j is a package with metadata ==="
ls -la elementary_neo4j/

echo ""
echo "=== Check for any __init__.py that might have version/dependencies ==="
cat elementary_neo4j/__init__.py 2>/dev/null || echo "No __init__.py found"

Repository: elementary-data/dbt-data-reliability

Length of output: 563


Add neo4j to project dependencies.

The module imports neo4j at import time (line 6), but the package is not declared in any dependency metadata. Environments installed from the project metadata will therefore not get neo4j automatically; the only pointer to it is the install instruction in the README.

Comment thread elementary_neo4j/neo4j_exporter.py
Comment thread elementary_neo4j/neo4j_exporter.py
Comment on lines +107 to +119
    def export(self, manifest_path: str):
        """Full export pipeline — nodes + dependencies."""
        logger.info(f"Loading manifest from {manifest_path}")
        manifest = self.load_manifest(manifest_path)
        nodes = self.extract_nodes(manifest)
        dependencies = self.extract_dependencies(manifest)
        self.export_nodes(nodes)
        self.export_dependencies(dependencies)
        logger.info("Neo4j lineage export complete")
        return {
            "nodes_exported": len(nodes),
            "dependencies_exported": len(dependencies)
        }
No newline at end of file

⚠️ Potential issue | 🟠 Major

Handle stale graph state on repeated exports.

The export only MERGEs current nodes/relationships. If a model is removed or a dependency changes, old DbtNodes and FEEDS_INTO edges remain in Neo4j, causing false impact-analysis results. Add a scoped replace/cleanup strategy or document that this is append-only.
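
One way to implement the scoped replace the reviewer asks for is to stamp every node and relationship touched in a run with a run id, then delete DbtNodes and FEEDS_INTO edges that did not receive the current stamp. The sketch below makes two assumptions, neither of which comes from the PR. First, the exporter owns every DbtNode in the target database; the sketch does not scope by dbt project. Second, it uses a hypothetical export_run_id property. The session is again assumed to follow the driver's run(query, **parameters) shape.

```python
import uuid
from typing import Any, Dict, List

UPSERT_NODES = """
UNWIND $rows AS row
MERGE (n:DbtNode {unique_id: row.unique_id})
SET n.name = row.name, n.resource_type = row.resource_type, n.export_run_id = $run_id
"""

UPSERT_EDGES = """
UNWIND $rows AS row
MATCH (up:DbtNode {unique_id: row.upstream_id})
MATCH (down:DbtNode {unique_id: row.downstream_id})
MERGE (up)-[r:FEEDS_INTO]->(down)
SET r.export_run_id = $run_id
"""

# Anything not stamped by this run is stale; coalesce() also catches
# elements written before run ids existed (property is null).
DELETE_STALE_EDGES = """
MATCH (:DbtNode)-[r:FEEDS_INTO]->(:DbtNode)
WHERE coalesce(r.export_run_id, '') <> $run_id
DELETE r
"""

DELETE_STALE_NODES = """
MATCH (n:DbtNode)
WHERE coalesce(n.export_run_id, '') <> $run_id
DETACH DELETE n
"""


def replace_lineage(
    session: Any, nodes: List[Dict[str, Any]], deps: List[Dict[str, str]]
) -> str:
    """Upsert the current lineage, then remove what this run did not touch."""
    if not nodes:
        # Guard: an empty extraction would otherwise wipe the whole graph.
        raise ValueError("refusing to replace lineage with an empty node set")
    run_id = uuid.uuid4().hex  # HYPOTHETICAL stamp stored as export_run_id
    session.run(UPSERT_NODES, rows=nodes, run_id=run_id)
    session.run(UPSERT_EDGES, rows=deps, run_id=run_id)
    session.run(DELETE_STALE_EDGES, run_id=run_id)
    session.run(DELETE_STALE_NODES, run_id=run_id)
    return run_id
```

These four statements run as separate auto-commit queries, so a concurrent reader could briefly see the graph mid-update. Running them inside one transaction, for example through the driver's session.execute_write, would make the swap atomic. Documenting the exporter as append-only remains the simpler alternative the reviewer mentions.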

Comment thread README_neo4j.md Outdated
- Remove unused Optional import
- Fix extract_dependencies to filter non-exported node types
- Fix class docstring to reflect actual graph model
- Fix README Cypher query to use unique_id instead of name
- Add requirements.txt with neo4j dependency