feat(dbt): extract dbt Jinja lineage and macros from raw .sql models #584

Open
alexisperinger-ux wants to merge 6 commits into DeusData:main from alexisperinger-ux:feat/dbt-jinja-extraction

@alexisperinger-ux (Contributor)

Depends on #574 (PR #582) and must merge after it. This branch is stacked on
that work, so until #582 lands the diff below also shows its changes; once
#582 merges, this diff narrows to the dbt Jinja changes alone.

What does this PR do?

Indexes raw (uncompiled) dbt .sql models without a manifest:

  • {{ ref('m') }} / {{ source('s','t') }} become USAGE lineage edges (via the vendored tree-sitter-jinja2 grammar).
  • A model file becomes a name-addressable Model node, so cross-file ref() resolves into model-to-model lineage.
  • {% macro name(...) %} becomes a Macro node.

Model is emitted only on the .sql path; a macro-defining file is treated as a library, not a model.
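
For reviewers less familiar with dbt, the mapping described above can be sketched roughly as follows. This is an illustration only: the PR parses Jinja with the vendored tree-sitter-jinja2 grammar, while this sketch uses regular expressions as a stand-in. The function name `extract_dbt_jinja`, the tuple shapes, and the `source.table` naming of source targets are all hypothetical and are not taken from the diff.

```python
import re
from pathlib import Path

# Hypothetical stand-in patterns; the PR itself uses tree-sitter-jinja2.
REF_RE = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")
SOURCE_RE = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)
MACRO_RE = re.compile(r"\{%-?\s*macro\s+([A-Za-z_]\w*)\s*\(")


def extract_dbt_jinja(path: str, text: str):
    """Return (nodes, usages) for one raw dbt file.

    nodes  : list of (label, name) pairs, label is "Model" or "Macro"
    usages : list of (from_model_or_None, target_name) pairs
    """
    macros = MACRO_RE.findall(text)
    nodes = [("Macro", name) for name in macros]

    # A .sql file that defines no macro is treated as a model named
    # after the file stem; a macro-defining file is a library.
    model = None
    if path.endswith(".sql") and not macros:
        model = Path(path).stem
        nodes.append(("Model", model))

    # Only the one-argument ref('model') form is handled in this sketch.
    targets = REF_RE.findall(text)
    targets += [f"{src}.{tbl}" for src, tbl in SOURCE_RE.findall(text)]
    usages = [(model, target) for target in targets]
    return nodes, usages
```

Calling `extract_dbt_jinja("models/orders.sql", "select * from {{ ref('stg_orders') }}")` yields one Model node named `orders` and one usage pointing at `stg_orders`, which is the model-to-model lineage the description refers to.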

Reuses freeform labels and the existing USAGE edge type, so there is no schema change. Macro extraction is gated to full mode, so the new nodes require a full-mode index. Complements the manifest path (#576).

Fixes #575.

Checklist

  • Every commit is signed off (git commit -s)
  • Tests pass locally (make -f Makefile.cbm test)
  • Lint passes (make -f Makefile.cbm lint-ci)
  • New behavior is covered by a test

CREATE TABLE/VIEW/MATERIALIZED VIEW now extract as Table/View defs (were
generic Variable nodes) and CREATE PROCEDURE as Function. FROM/JOIN
relations are emitted as usages scoped to the enclosing CREATE def, so
pass_usages resolves them into view->table USAGE lineage edges. The
definition-registry allowlist gains Table/View so those defs resolve as
edge targets (kept in sync across pass_definitions and pass_parallel).
Adds extraction tests for the new labels and the lineage usages.
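
As a rough illustration of the behaviour this commit describes (not the extractor's actual code, which walks a tree-sitter SQL parse tree), the sketch below labels CREATE statements and scopes FROM/JOIN relations to the enclosing definition. The names `extract_sql`, `CREATE_RE`, and `RELATION_RE` are hypothetical, and mapping MATERIALIZED VIEW to the View label is an assumption.

```python
import re

# Hypothetical regex stand-ins for the tree-sitter based extraction.
CREATE_RE = re.compile(
    r"\bCREATE\s+(?:OR\s+REPLACE\s+)?(MATERIALIZED\s+VIEW|TABLE|VIEW|PROCEDURE)\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)",
    re.IGNORECASE,
)
RELATION_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

# Assumed label mapping: MATERIALIZED VIEW -> View is a guess.
LABELS = {
    "TABLE": "Table",
    "VIEW": "View",
    "MATERIALIZED VIEW": "View",
    "PROCEDURE": "Function",
}


def extract_sql(sql: str):
    """Return (defs, usages) where usages are (enclosing_def, relation)."""
    defs, usages = [], []
    # Naive statement split; it would break on procedure bodies that
    # contain semicolons, which a real parser handles properly.
    for stmt in sql.split(";"):
        match = CREATE_RE.search(stmt)
        if not match:
            continue
        kind = " ".join(match.group(1).upper().split())
        name = match.group(2)
        defs.append((LABELS[kind], name))
        # Relations after the CREATE header belong to this definition.
        for relation in RELATION_RE.findall(stmt[match.end():]):
            usages.append((name, relation))
    return defs, usages
```

Each `(enclosing_def, relation)` pair corresponds to one candidate view->table USAGE edge, provided the relation name resolves to a registered Table or View definition.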

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 63054f04465ec33ffde0b9a23d6f29ce817d96df)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

CREATE TABLE/VIEW now produce Table/View nodes (63054f0) instead of the
old Variable mislabel; update the label golden and the probe accordingly.

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 7cca3f4faec7d2a12ac1cc6f4e432c8bc286cd94)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

resolve_sql_func_name took the first identifier of object_reference (the
schema) for schema.table names; take the last (the table) so CREATE
TABLE/VIEW nodes and FROM/JOIN lineage use the real relation name. Adds
a schema-qualified regression test.
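
The fix amounts to picking the last dotted component instead of the first. A minimal string-based sketch of that idea follows; the real `resolve_sql_func_name` operates on parse-tree nodes, so the helper names `first_identifier` and `relation_name` here are hypothetical, and quoted identifiers that themselves contain dots are not handled.

```python
def first_identifier(object_reference: str) -> str:
    """Old behaviour: returns the schema for schema.table names."""
    return object_reference.split(".")[0].strip().strip('"`')


def relation_name(object_reference: str) -> str:
    """Fixed behaviour: returns the last component, the relation itself."""
    parts = [part.strip().strip('"`') for part in object_reference.split(".")]
    return parts[-1]
```

For an unqualified name both helpers agree, which is why the bug only showed up on schema-qualified relations.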

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 877ad51e8c14daf901656d918ced2ef636f7a5b1)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 8ffee3834223ce58aa6c26b83c88f2902e07180e)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 23f81ed76a05c813a68a67bc5b1dc47344e160f0)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

…ates

dbt models are .sql (the SQL host path); a plain .jinja/.j2 template is
not a dbt model. Emit Model only on the SQL path; the JINJA2 path keeps
macro and ref/source extraction.
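
The gating rule in this commit can be sketched as a small decision function. This is an assumed shape, not the extractor's code: `labels_for`, `SQL_EXTS`, and `JINJA_EXTS` are hypothetical names, and the extension sets simply mirror the ones mentioned in the message.

```python
from pathlib import Path

# Extensions as named in the commit message; the sets are illustrative.
SQL_EXTS = {".sql"}
JINJA_EXTS = {".jinja", ".j2"}


def labels_for(path: str, defines_macro: bool) -> list:
    """Return the node labels a file would get under the gating rule."""
    ext = Path(path).suffix.lower()
    if ext not in SQL_EXTS | JINJA_EXTS:
        return []
    if defines_macro:
        # Macro-defining files are libraries on either path.
        return ["Macro"]
    if ext in SQL_EXTS:
        # Only the SQL host path can produce a dbt Model.
        return ["Model"]
    # Plain Jinja templates get no Model node; ref/source usages
    # would still be extracted from them separately.
    return []
```

Under this sketch, `models/orders.sql` gets a Model node while `templates/email.j2` does not, matching the intent that a plain template is not a dbt model.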

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 582ad0c0293be148a97e1e52d0043ad6c4fe0e7d)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>

Development

Successfully merging this pull request may close these issues.

Extract dbt lineage and macros from raw .sql models (no compiled manifest)