What problem does this solve?
SQL DDL is under-modeled. CREATE TABLE and CREATE VIEW are extracted but labeled as generic Variable nodes, CREATE MATERIALIZED VIEW is dropped entirely, and there is no table-to-table lineage. Every extracted CREATE collapses to an untyped Variable node with zero lineage edges, so the graph cannot answer basic questions:
- "What tables/views does this view read from?"
- "What breaks downstream if I change this table?"
Public test bed: lerocha/chinook-database (pure CREATE TABLE / CREATE VIEW DDL).
Proposed solution
- Label create_table as Table, and create_view / create_materialized_view as View.
- Emit FROM/JOIN relation references as USAGE lineage edges, resolved through the existing definition registry, so a view links to the tables/views it selects from.
- Zero schema change: freeform node labels + the existing USAGE edge type. Only relations that resolve against the registry emit an edge, so DML-only files fabricate nothing.
Scope: DDL labeling + FROM/JOIN lineage only.
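To make the proposal concrete, here is a minimal sketch of the labeling and lineage rules in Python. It is not the project's extractor: the regexes stand in for the real parser, the LABELS table, extract, and downstream are hypothetical names, a plain dict plays the role of the definition registry, and the sample SQL is made up for illustration rather than copied from chinook-database.

```python
import re

# Hypothetical mapping from statement kind to node label, following the proposal.
LABELS = {
    "create_table": "Table",
    "create_view": "View",
    "create_materialized_view": "View",
}

# Simplified stand-in for the real parser: matches the CREATE header and the name.
CREATE_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(MATERIALIZED\s+VIEW|VIEW|TABLE)\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?([A-Za-z_][\w.]*)",
    re.IGNORECASE,
)
# Relation names that directly follow FROM or JOIN.
REF_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract(sql):
    """Return (nodes, edges): name -> label, and (source, "USAGE", target) triples."""
    nodes = {}      # plays the role of the definition registry in this sketch
    pending = []    # (view name, referenced relation names), resolved afterwards
    for stmt in sql.split(";"):
        m = CREATE_RE.search(stmt)
        if not m:
            continue  # not a CREATE statement (e.g. plain DML): nothing to define
        kind = "create_" + "_".join(m.group(1).lower().split())
        name = m.group(2)
        nodes[name] = LABELS[kind]
        if kind != "create_table":
            pending.append((name, REF_RE.findall(stmt[m.end():])))
    edges = set()
    for src, refs in pending:
        for ref in refs:
            # Only relations that resolve against the registry emit an edge.
            if ref in nodes and ref != src:
                edges.add((src, "USAGE", ref))
    return nodes, edges


def downstream(edges, target):
    """Everything that transitively reads from `target`."""
    hit, frontier = set(), [target]
    while frontier:
        cur = frontier.pop()
        for src, _, dst in edges:
            if dst == cur and src not in hit:
                hit.add(src)
                frontier.append(src)
    return hit


SAMPLE_SQL = """
CREATE TABLE Artist (ArtistId INT, Name TEXT);
CREATE TABLE Album (AlbumId INT, Title TEXT, ArtistId INT);
CREATE VIEW AlbumWithArtist AS
  SELECT al.Title, ar.Name FROM Album al JOIN Artist ar ON ar.ArtistId = al.ArtistId;
CREATE MATERIALIZED VIEW AlbumCount AS
  SELECT Name, COUNT(*) FROM AlbumWithArtist GROUP BY Name;
CREATE VIEW Orphan AS SELECT * FROM NotDefinedHere;
SELECT * FROM Album;
"""

nodes, edges = extract(SAMPLE_SQL)
```

With this sample, the two questions from the problem statement become simple lookups: the USAGE edges whose source is AlbumWithArtist give the relations it reads from, and downstream(edges, "Album") gives what is affected by a change to Album. Orphan gets a View node but no edge, because NotDefinedHere never resolves against the registry.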
Alternatives considered
- A new REFERENCES edge type instead of reusing USAGE: rejected to avoid adding an edge type; USAGE already means "X refers to Y".
- Modeling columns as Field nodes: deferred to keep this change small.
Confirmations