feat: add Neo4j lineage exporter for dbt manifest metadata #990
SaranyaVaitheeswaran wants to merge 2 commits into elementary-data:master
- Add Neo4jLineageExporter class to export dbt nodes and dependencies to Neo4j
- Supports models, sources, seeds and snapshots as DbtNode graph nodes
- Creates FEEDS_INTO relationships for downstream impact analysis
- Add Neo4jConfig dataclass with environment variable support
- Add 8 unit tests, all passing
- Add README with usage examples and Cypher query examples
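For reference, a config dataclass with environment-variable support might look like the sketch below. The field names and the `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD` and `NEO4J_DATABASE` variables are assumptions for illustration, not necessarily what `elementary_neo4j/neo4j_config.py` uses; only the `from_env()` entry point appears in the PR's own example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Neo4jConfig:
    """Hypothetical connection settings; field and env var names are assumptions."""

    uri: str = "bolt://localhost:7687"
    user: str = "neo4j"
    password: str = ""
    database: Optional[str] = None  # None lets the server pick its default database

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Neo4jConfig":
        # Read from os.environ by default; passing a dict keeps tests hermetic.
        env = os.environ if env is None else env
        defaults = cls()
        return cls(
            uri=env.get("NEO4J_URI", defaults.uri),
            user=env.get("NEO4J_USER", defaults.user),
            password=env.get("NEO4J_PASSWORD", defaults.password),
            database=env.get("NEO4J_DATABASE", defaults.database),
        )
```

The resulting values map directly onto the official driver call `GraphDatabase.driver(cfg.uri, auth=(cfg.user, cfg.password))`.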
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info: configuration used: defaults; review profile: CHILL; plan: Pro
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
Adds a new Neo4j lineage exporter that reads a dbt …
Sequence Diagram
sequenceDiagram
actor User
participant App as Application
participant Exporter as Neo4jLineageExporter
participant FS as FileSystem
participant Neo4j as Neo4jDB
User->>App: invoke export(manifest_path)
App->>Exporter: instantiate with Neo4jConfig
Exporter->>Neo4j: open driver/session
App->>Exporter: export(manifest_path)
Exporter->>FS: read manifest.json
FS-->>Exporter: return manifest data
Exporter->>Exporter: extract_nodes()
Exporter->>Exporter: extract_dependencies()
Exporter->>Neo4j: export_nodes(nodes) -- MERGE/SET per node
Neo4j-->>Exporter: nodes upserted
Exporter->>Neo4j: export_dependencies(deps) -- MATCH + MERGE FEEDS_INTO
Neo4j-->>Exporter: relationships upserted
Exporter-->>App: return {nodes_exported, dependencies_exported}
App->>Exporter: close()
Exporter->>Neo4j: close driver
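The two write steps in the diagram (MERGE/SET per node, then MATCH + MERGE for FEEDS_INTO) can be sketched as parameterised Cypher. This is a minimal sketch under stated assumptions, not the PR's code: the dict keys (`unique_id`, `name`, `resource_type`, `upstream_id`, `downstream_id`) are assumed, and `session` is anything with a `run(query, **params)` method, such as a session from the official `neo4j` driver.

```python
from typing import Any, Dict, List, Protocol

# MERGE on unique_id keeps repeated exports idempotent for the same node.
NODE_UPSERT = (
    "MERGE (n:DbtNode {unique_id: $unique_id}) "
    "SET n.name = $name, n.resource_type = $resource_type"
)
# Both endpoints must already exist: MATCH yields no rows otherwise.
EDGE_UPSERT = (
    "MATCH (up:DbtNode {unique_id: $upstream_id}) "
    "MATCH (down:DbtNode {unique_id: $downstream_id}) "
    "MERGE (up)-[:FEEDS_INTO]->(down)"
)


class CypherSession(Protocol):
    """Anything with run(query, **params), e.g. a neo4j driver Session."""

    def run(self, query: str, **params: Any) -> Any: ...


def export_nodes(session: CypherSession, nodes: List[Dict[str, Any]]) -> int:
    # One round trip per node: simple, but chatty for large manifests.
    for node in nodes:
        session.run(
            NODE_UPSERT,
            unique_id=node["unique_id"],
            name=node["name"],
            resource_type=node["resource_type"],
        )
    return len(nodes)


def export_dependencies(session: CypherSession, deps: List[Dict[str, str]]) -> int:
    # Direction is upstream -> downstream, so FEEDS_INTO follows data flow.
    for dep in deps:
        session.run(
            EDGE_UPSERT,
            upstream_id=dep["upstream_id"],
            downstream_id=dep["downstream_id"],
        )
    return len(deps)
```

Because the edge query starts with MATCH, a dependency whose endpoint was never written as a DbtNode matches zero rows and the MERGE silently does nothing. Batching rows with `UNWIND $rows AS row` is a common alternative to one `run()` call per row.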
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
🧹 Nitpick comments (1)
tests/test_neo4j_exporter.py (1)
24-61: Cover seed/snapshot extraction and relationship writes directly.
The fixture only exercises model/source nodes, and the full export test patches out `export_dependencies`. Add coverage for seed/snapshot resources plus a direct `export_dependencies` assertion so that relationship direction and Cypher persistence can't regress silently.
Also applies to: 121-125
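A sketch of the extra seed/snapshot coverage, assuming the exporter keeps dbt `nodes` entries whose `resource_type` is model, seed or snapshot and adds every `sources` entry. `extract_nodes` here is a hypothetical stand-in so the snippet runs on its own; in the real test you would call `Neo4jLineageExporter.extract_nodes` on the fixture instead. The unique ids follow the ones suggested in the review.

```python
from typing import Any, Dict, List

# Assumed set of exported dbt resource types (sources are handled separately).
EXPORTED_NODE_TYPES = {"model", "seed", "snapshot"}


def extract_nodes(manifest: Dict[str, Any]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for Neo4jLineageExporter.extract_nodes."""
    nodes = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") in EXPORTED_NODE_TYPES:
            nodes.append({"unique_id": unique_id, "name": node["name"],
                          "resource_type": node["resource_type"]})
    for unique_id, source in manifest.get("sources", {}).items():
        nodes.append({"unique_id": unique_id, "name": source["name"],
                      "resource_type": "source"})
    return nodes


# Fixture additions: one seed, one snapshot, and a test that must NOT be exported.
sample_manifest = {
    "nodes": {
        "model.my_project.orders": {"resource_type": "model", "name": "orders"},
        "seed.my_project.seed_customers": {"resource_type": "seed", "name": "seed_customers"},
        "snapshot.my_project.snap_orders": {"resource_type": "snapshot", "name": "snap_orders"},
        "test.my_project.not_null_orders_id": {"resource_type": "test", "name": "not_null_orders_id"},
    },
    "sources": {
        "source.my_project.raw.customers": {"resource_type": "source", "name": "customers"},
    },
}


def test_extracts_seed_and_snapshot():
    types = {n["resource_type"] for n in extract_nodes(sample_manifest)}
    assert types == {"model", "seed", "snapshot", "source"}
```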
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_neo4j_exporter.py` around lines 24 - 61, Update the sample_manifest fixture to include seed and snapshot resource entries (e.g., add keys like "seed.my_project.seed_customers" and "snapshot.my_project.snap_orders" with appropriate fields mirroring existing nodes/sources) so tests exercise seed/snapshot extraction; then in the test that patches exporter.export_dependencies (or the test function that uses sample_manifest) add a direct assertion that export_dependencies was called with the expected relationship tuples (verify direction and types) and/or assert the Cypher relationship write was invoked (mock/check the method that persists Cypher in Neo4j exporter, e.g., export_dependencies and the Cypher persistence helper) to ensure relationships are actually written and cannot regress.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@elementary_neo4j/neo4j_exporter.py`:
- Line 4: The import list in neo4j_exporter.py currently includes an unused
symbol Optional; remove Optional from the typing import (change "from typing
import Any, Dict, List, Optional" to only import the actually used names such as
Any, Dict, List) so the module no longer triggers the F401 unused-import Flake8
error.
- Around line 14-17: The class docstring claims the exporter "Creates nodes for
models, sources, and columns" which is inaccurate because extract_nodes only
generates dbt model/seed/snapshot/source nodes; update the class docstring to
reflect the implemented graph model (models/seeds/snapshots/sources and their
relationships) or, if column nodes are intended, implement column extraction in
extract_nodes/related methods; reference extract_nodes and the class docstring
to ensure the text and implementation stay consistent.
- Around line 64-73: extract_dependencies is including dependencies for manifest
entries that were never exported (tests/other resource types), causing
export_dependencies to try to create relationships with missing DbtNode
endpoints; update extract_dependencies to only emit dependency dicts when both
the upstream_id and unique_id are part of the actual exported node set (e.g.,
compare against the set of exported DbtNode unique_ids or the manifest entries
filtered by the same export criteria). Locate the function extract_dependencies
and either accept an additional parameter (exported_node_ids) or compute the
exported set using the same filtering logic used by export_dependencies/DbtNode
creation, then only append dependencies when upstream_id in exported_node_ids
and unique_id in exported_node_ids. Ensure identifiers referenced are
extract_dependencies, export_dependencies, and DbtNode so the change aligns with
existing export behavior.
- Line 6: The file elementary_neo4j/neo4j_exporter.py imports GraphDatabase from
the external package neo4j, so add neo4j to the project's declared dependencies
(e.g., add "neo4j" with an appropriate version constraint such as
"neo4j>=5.0,<6.0") in your packaging metadata (pyproject.toml under
[project.dependencies] or setup.cfg install_requires, and/or requirements.txt)
so installations pull the package automatically; ensure the version chosen is
compatible with the code that uses GraphDatabase and update CI/install steps to
validate the new dependency.
In `@README_neo4j.md`:
- Around line 72-75: The example impact-analysis Cypher query uses name matching
which is not unique; update the MATCH to locate the source by unique_id (the
docs' primary key) instead of name and return impacted.unique_id to avoid
ambiguous results — e.g., match (source:DbtNode {unique_id:
"<unique_id_here>"})-[:FEEDS_INTO*]->(impacted) and return impacted.unique_id,
impacted.resource_type; ensure the docs reference the DbtNode label, the
unique_id property, and the FEEDS_INTO relationship so readers use the canonical
identifier.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b3513345-dde9-4fe7-bed2-7de914660721
📒 Files selected for processing (6)
- README_neo4j.md
- elementary_neo4j/__init__.py
- elementary_neo4j/neo4j_config.py
- elementary_neo4j/neo4j_exporter.py
- tests/__init__.py
- tests/test_neo4j_exporter.py
from pathlib import Path
from typing import Any, Dict, List, Optional

from neo4j import GraphDatabase
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify that the Neo4j Python driver is declared in dependency metadata.
set -euo pipefail
mapfile -t metadata_files < <(
fd -HI '^(pyproject\.toml|setup\.py|setup\.cfg|requirements.*|Pipfile|poetry\.lock)$'
)
printf 'Dependency metadata files found:\n'
printf ' - %s\n' "${metadata_files[@]:-}"
if ((${#metadata_files[@]})); then
rg -n -i -C2 '\bneo4j\b' "${metadata_files[@]}" || true
fi
Repository: elementary-data/dbt-data-reliability
Length of output: 136
🏁 Script executed:
#!/bin/bash
# Find all dependency metadata files
fd -HI '^(pyproject\.toml|setup\.py|setup\.cfg|requirements.*|Pipfile|poetry\.lock)$' > /tmp/metadata_files.txt
echo "Dependency metadata files found:"
cat /tmp/metadata_files.txt || echo "(none)"
echo ""
echo "Searching for 'neo4j' in dependency files:"
if [ -s /tmp/metadata_files.txt ]; then
cat /tmp/metadata_files.txt | xargs rg -n -i -C2 '\bneo4j\b' || echo "No neo4j found in dependency files"
fi
Length of output: 226
🏁 Script executed:
# Check root level for common Python project files
echo "=== Files in root directory ==="
ls -la | grep -E '(pyproject|setup|requirements|poetry|Pipfile)' || echo "No common files at root"
echo ""
echo "=== All occurrences of 'neo4j' in the repository ==="
rg -i -n 'neo4j' --max-count=50
echo ""
echo "=== Content of integration_tests/requirements.txt ==="
cat integration_tests/requirements.txt
echo ""
echo "=== Search for .txt files with dependencies ==="
fd -i 'requirements.*\.txt$' -x cat -n {}
Length of output: 3994
🏁 Script executed:
# Check for any Python project metadata files with different search
echo "=== All Python config files in repo ==="
find . -maxdepth 2 -type f \( -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "MANIFEST.in" \) -not -path "./.*"
echo ""
echo "=== Check if elementary_neo4j is a package with metadata ==="
ls -la elementary_neo4j/
echo ""
echo "=== Check for any __init__.py that might have version/dependencies ==="
cat elementary_neo4j/__init__.py 2>/dev/null || echo "No __init__.py found"
Length of output: 563
Add neo4j to project dependencies.
The module imports neo4j at import time (line 6), but the package is not declared in any dependency metadata. Environments installed from project metadata will therefore not get neo4j automatically; only the README tells users to install it.
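To back the dependency declaration with a CI check, a small smoke test could confirm that `neo4j` is installed and inside the `>=5.0,<6.0` range the review suggests. The helper names are hypothetical; only `importlib.metadata.version` and `importlib.metadata.PackageNotFoundError` are standard-library APIs.

```python
import importlib.metadata


def neo4j_version_ok(version: str) -> bool:
    """True if the version string falls in the assumed >=5.0,<6.0 range."""
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) == 5


def check_installed_neo4j() -> str:
    """Hypothetical CI smoke check: fail loudly if neo4j is missing or out of range."""
    try:
        installed = importlib.metadata.version("neo4j")
    except importlib.metadata.PackageNotFoundError:
        raise SystemExit("neo4j is not installed; is it declared as a dependency?") from None
    if not neo4j_version_ok(installed):
        raise SystemExit(f"neo4j {installed} is outside the supported >=5.0,<6.0 range")
    return installed
```

Calling `check_installed_neo4j()` right after the install step in CI would catch a missing or mis-pinned declaration before the unit tests run.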
def export(self, manifest_path: str):
    """Full export pipeline — nodes + dependencies."""
    logger.info(f"Loading manifest from {manifest_path}")
    manifest = self.load_manifest(manifest_path)
    nodes = self.extract_nodes(manifest)
    dependencies = self.extract_dependencies(manifest)
    self.export_nodes(nodes)
    self.export_dependencies(dependencies)
    logger.info("Neo4j lineage export complete")
    return {
        "nodes_exported": len(nodes),
        "dependencies_exported": len(dependencies)
    }
\ No newline at end of file
Handle stale graph state on repeated exports.
The export only MERGEs current nodes/relationships. If a model is removed or a dependency changes, old DbtNodes and FEEDS_INTO edges remain in Neo4j, causing false impact-analysis results. Add a scoped replace/cleanup strategy or document that this is append-only.
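One scoped replace strategy is mark-and-sweep: stamp every node and edge written in a run with a run id, then delete whatever this run did not stamp. The sketch assumes a hypothetical `export_run_id` property and a `session` object with a `run(query, **params)` method; it is one option among several (wiping and reloading a per-project subgraph is another).

```python
from typing import Any

# Upserts that stamp the current run id (hypothetical export_run_id property).
STAMP_NODE = (
    "MERGE (n:DbtNode {unique_id: $unique_id}) "
    "SET n.resource_type = $resource_type, n.export_run_id = $run_id"
)
STAMP_EDGE = (
    "MATCH (up:DbtNode {unique_id: $upstream_id}) "
    "MATCH (down:DbtNode {unique_id: $downstream_id}) "
    "MERGE (up)-[r:FEEDS_INTO]->(down) "
    "SET r.export_run_id = $run_id"
)
# coalesce() also sweeps nodes/edges written before stamping was introduced.
PRUNE_EDGES = (
    "MATCH (:DbtNode)-[r:FEEDS_INTO]->(:DbtNode) "
    "WHERE coalesce(r.export_run_id, '') <> $run_id DELETE r"
)
PRUNE_NODES = (
    "MATCH (n:DbtNode) "
    "WHERE coalesce(n.export_run_id, '') <> $run_id DETACH DELETE n"
)


def prune_stale(session: Any, run_id: str) -> None:
    """Sweep step; call only after all STAMP_* writes for run_id have succeeded."""
    # Edges first: a changed dependency between two surviving nodes is only
    # removed here, because DETACH DELETE touches edges of deleted nodes only.
    session.run(PRUNE_EDGES, run_id=run_id)
    session.run(PRUNE_NODES, run_id=run_id)
```

dbt writes an `invocation_id` under `metadata` in `manifest.json`, which could serve as the run id. If several dbt projects share one database, the prune queries would also need a project filter; otherwise they would delete the other projects' nodes.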
- Remove unused Optional import
- Fix extract_dependencies to filter non-exported node types
- Fix class docstring to reflect actual graph model
- Fix README Cypher query to use unique_id instead of name
- Add requirements.txt with neo4j dependency
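The `extract_dependencies` fix can be sketched as follows, assuming the caller passes in the unique ids that were actually written as DbtNode nodes. The `exported_node_ids` parameter and the `upstream_id`/`downstream_id` keys are illustrative assumptions; the `nodes -> <unique_id> -> depends_on -> nodes` path is the standard dbt `manifest.json` layout.

```python
from typing import Any, Dict, Iterable, List


def extract_dependencies(manifest: Dict[str, Any],
                         exported_node_ids: Iterable[str]) -> List[Dict[str, str]]:
    """Emit upstream -> downstream pairs only when both ends were exported."""
    exported = set(exported_node_ids)
    deps = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if unique_id not in exported:
            continue  # e.g. tests: no DbtNode exists for them, so skip their edges
        for upstream_id in node.get("depends_on", {}).get("nodes", []):
            # Sources never appear as keys under "nodes", only as upstream ids.
            if upstream_id in exported:
                deps.append({"upstream_id": upstream_id, "downstream_id": unique_id})
    return deps
```

Passing the ids produced by node extraction keeps the two filters in one place, so the edge set can never reference a node the exporter skipped.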
Summary
Adds a Neo4j lineage exporter that reads dbt manifest.json and loads
the full lineage graph into Neo4j as a property graph.
Motivation
Elementary already captures rich lineage metadata via dbt artifacts.
This exporter makes that lineage available in Neo4j, enabling:
Changes
- elementary_neo4j/neo4j_config.py: connection config with env var support
- elementary_neo4j/neo4j_exporter.py: main exporter class
- tests/test_neo4j_exporter.py: 8 unit tests, all passing
- README_neo4j.md: usage docs with Cypher query examples

Example Usage
from elementary_neo4j.neo4j_config import Neo4jConfig
from elementary_neo4j.neo4j_exporter import Neo4jLineageExporter

config = Neo4jConfig.from_env()
exporter = Neo4jLineageExporter(config)
try:
    result = exporter.export("target/manifest.json")
    print(result)  # e.g. {'nodes_exported': 42, 'dependencies_exported': 67}
finally:
    exporter.close()  # release the Neo4j driver
Tests
8 unit tests covering node extraction, dependency extraction,
manifest loading, and full export pipeline. All passing.
Summary by CodeRabbit: New Features, Documentation, Chores, Tests