
feat(sql): first-class Table/View nodes and FROM/JOIN lineage edges#582

Open
alexisperinger-ux wants to merge 4 commits into
DeusData:mainfrom
alexisperinger-ux:feat/sql-ddl-nodes

@alexisperinger-ux alexisperinger-ux commented Jun 23, 2026

Contributor

What does this PR do?

SQL DDL was routed through the config-variable path, so CREATE TABLE / CREATE VIEW became generic Variable nodes with no table-to-table lineage. Questions like "what does this view read from?" or "what breaks if I drop this table?" were unanswerable from the graph. This PR makes DDL first-class:

  • create_table becomes a Table node; create_view / create_materialized_view become View nodes.
  • FROM/JOIN relations become USAGE lineage edges, resolved against the Table/View registry, so view-to-table lineage is queryable.
  • Schema-qualified DDL is named by the table, not the schema (app.users resolves as users).

Only relations that resolve against the registry emit an edge, so unresolved names and DML-only files fabricate no nodes or edges.
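The behaviour described above can be modeled in a few lines. This is a minimal Python sketch for illustration only, not the PR's C implementation: the regexes, `extract`, and `bare_name` are hypothetical names, and the real extractor presumably walks a parse tree rather than matching text.

```python
import re

# Hypothetical, simplified patterns (assumption: the real extractor uses a
# parse tree, not regexes).
CREATE_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(MATERIALIZED\s+VIEW|VIEW|TABLE)\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)",
    re.IGNORECASE,
)
RELATION_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def bare_name(qualified):
    """app.users -> users: name the relation by its last identifier."""
    return qualified.split(".")[-1]


def extract(sql):
    """Return (nodes, edges) for a script of semicolon-separated statements."""
    nodes = {}    # relation name -> label ("Table" or "View")
    usages = []   # (enclosing CREATE definition, referenced relation)
    for stmt in sql.split(";"):
        m = CREATE_RE.search(stmt)
        if not m:
            continue  # DML-only statements define nothing and emit nothing
        kind, name = m.group(1).upper(), bare_name(m.group(2))
        nodes[name] = "Table" if kind == "TABLE" else "View"
        # FROM/JOIN relations are scoped to the enclosing CREATE definition.
        for ref in RELATION_RE.findall(stmt[m.end():]):
            usages.append((name, bare_name(ref)))
    # Only usages whose target is a registered Table/View become edges.
    edges = [(src, "USAGE", dst) for src, dst in usages if dst in nodes]
    return nodes, edges
```

In this model, a view that joins a known table to an undefined relation yields one USAGE edge to the known table and nothing for the undefined one, and a standalone `SELECT` contributes neither nodes nor edges.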

No schema change: node labels are freeform strings and USAGE is an existing edge type.
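Because lineage is stored as ordinary USAGE edges between string-labeled nodes, the "what breaks if I drop this table?" question reduces to a reverse traversal. The sketch below assumes edges are available as `(source, edge_type, target)` tuples; that shape and the `downstream` helper are illustrative assumptions, not the project's query API.

```python
from collections import deque


def downstream(edges, dropped):
    """Return every node that reads from `dropped`, directly or via views."""
    # Invert the USAGE edges: target -> list of nodes that read from it.
    readers = {}
    for src, edge_type, dst in edges:
        if edge_type == "USAGE":
            readers.setdefault(dst, []).append(src)

    # Breadth-first walk over the inverted edges.
    seen, queue = set(), deque([dropped])
    while queue:
        for reader in readers.get(queue.popleft(), []):
            if reader not in seen:
                seen.add(reader)
                queue.append(reader)
    return seen
```

The forward question ("what does this view read from?") is the same walk over the uninverted edges.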

Index mode: the new nodes require a full-mode index, because the fast and moderate modes prune the directories where DDL typically lives.

Tests: sql_ddl_node_labels, sql_view_lineage_usages, sql_schema_qualified_name in tests/test_extraction.c.

Fixes #574.

Part of the SQL/dbt indexing series

This is one of three PRs that split the SQL + dbt graph-indexing work to keep each under the one-issue-per-PR contributing rule. They share the same extraction and registry plumbing, so they are one logical change reviewed as a set:

  • #582 (this PR): SQL DDL, first-class Table / View nodes + FROM/JOIN lineage (#574).
  • #584: dbt Jinja from raw .sql, Model / Macro nodes + ref() / source() lineage (#575).
  • #583: dbt manifest ingestion, Model / Source nodes + DEPENDS_ON lineage (#576).

CREATE TABLE/VIEW/MATERIALIZED VIEW now extract as Table/View defs (were
generic Variable nodes) and CREATE PROCEDURE as Function. FROM/JOIN
relations are emitted as usages scoped to the enclosing CREATE def, so
pass_usages resolves them into view->table USAGE lineage edges. The
definition-registry allowlist gains Table/View so those defs resolve as
edge targets (kept in sync across pass_definitions and pass_parallel).
Adds extraction tests for the new labels and the lineage usages.
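
The allowlist change can be pictured as follows. This is a hedged Python model of the idea only: `RESOLVABLE_LABELS`, `build_registry`, and `resolve_usages` are hypothetical names, and the labels other than Table and View in the set are assumptions made for illustration.

```python
# Assumption: only "Table" and "View" are confirmed by this commit; the other
# labels stand in for whatever the allowlist already contained.
RESOLVABLE_LABELS = {"Function", "Class", "Table", "View"}


def build_registry(defs):
    """Keep only definitions whose label may serve as an edge target."""
    return {name: label for name, label in defs if label in RESOLVABLE_LABELS}


def resolve_usages(registry, usages):
    """Turn (enclosing_def, referenced_name) usages into USAGE edges."""
    return [
        (scope, "USAGE", target)
        for scope, target in usages
        if target in registry  # unresolved names produce no edge
    ]
```

Without Table and View in the allowlist, a view's reference to a table would find no registry entry and the lineage edge would be dropped silently.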

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 63054f04465ec33ffde0b9a23d6f29ce817d96df)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

CREATE TABLE/VIEW now produce Table/View nodes (63054f0) instead of the
old Variable mislabel; update the label golden and the probe accordingly.

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 7cca3f4faec7d2a12ac1cc6f4e432c8bc286cd94)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

resolve_sql_func_name took the first identifier of object_reference (the
schema) for schema.table names; take the last (the table) so CREATE
TABLE/VIEW nodes and FROM/JOIN lineage use the real relation name. Adds
a schema-qualified regression test.
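
The fix amounts to choosing the last identifier instead of the first. The Python sketch below models that choice on a plain list of identifiers; `relation_name_before` and `relation_name_after` are hypothetical illustrations, not the signature of `resolve_sql_func_name`.

```python
def relation_name_before(identifiers):
    """Old behaviour: first identifier, which is the schema for schema.table."""
    return identifiers[0]


def relation_name_after(identifiers):
    """Fixed behaviour: last identifier, which is always the relation itself."""
    if not identifiers:
        raise ValueError("empty object reference")
    return identifiers[-1]
```

For an unqualified name both versions agree, which is why the bug only surfaced on schema-qualified DDL such as `app.users`.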

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 877ad51e8c14daf901656d918ced2ef636f7a5b1)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 8ffee3834223ce58aa6c26b83c88f2902e07180e)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

Development

Successfully merging this pull request may close these issues.

SQL DDL is under-modeled: CREATE TABLE/VIEW become generic Variable nodes with no lineage
