Extract dbt lineage and macros from raw .sql models (no compiled manifest) #575

Description

@alexisperinger-ux

What problem does this solve?

dbt .sql models are Jinja-templated ({{ ref() }}, {{ source() }}, {% macro %}), which the SQL grammar cannot parse, and dbt's manifest only exists once dbt itself has run (for example, dbt compile). In a raw checkout, models and their dependencies are therefore invisible: a model file is only a generic Module, and a referenced model such as stg_users is not a node at all, so ref() lineage cannot form.

Public test bed: dbt-labs/jaffle_shop (index the raw models/ without compiling).

Proposed solution

Run an additive tree-sitter-jinja2 pass over dbt-templated .sql files (those containing {{ or {%):

  • {{ ref('m') }} / {{ source('s','t') }} become USAGE lineage edges.
  • A dbt model (a .sql file with no macro definitions) becomes a Model node keyed by file stem, so a cross-file {{ ref('that_model') }} resolves into model-to-model lineage.
  • {% macro name(...) %} becomes a Macro.
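
To make the intended edges concrete, here is a minimal Python sketch. It is not the proposed implementation: it swaps the tree-sitter AST walk for plain regexes, and the names usage_edges, REF_RE and SOURCE_RE, the tuple shape of an edge, and the "source.table" target key are all assumptions made for illustration. It also handles only the one-argument form of ref().

```python
import re

# Regex stand-in for the AST walk (assumption: the real pass reads these
# calls from tree-sitter-jinja2 expression nodes, not from regexes).
REF_RE = re.compile(r"""\{\{-?\s*ref\(\s*(['"])([^'"]+)\1\s*\)\s*-?\}\}""")
SOURCE_RE = re.compile(
    r"""\{\{-?\s*source\(\s*(['"])([^'"]+)\1\s*,\s*(['"])([^'"]+)\3\s*\)\s*-?\}\}"""
)


def usage_edges(model_name, sql_text):
    """Return (from, 'USAGE', to) tuples for each ref()/source() call."""
    edges = []
    for m in REF_RE.finditer(sql_text):
        edges.append((model_name, "USAGE", m.group(2)))
    for m in SOURCE_RE.finditer(sql_text):
        # Hypothetical target key: "<source>.<table>".
        edges.append((model_name, "USAGE", f"{m.group(2)}.{m.group(4)}"))
    return edges


sql = (
    "select * from {{ ref('stg_users') }} "
    "join {{ source('raw', 'orders') }} using (id)"
)
edges = usage_edges("customers", sql)
```

Here edges holds one USAGE edge to stg_users and one to raw.orders, both originating from the customers model.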

No schema change is needed: labels are freeform and the USAGE edge already exists. Model is emitted only on the .sql path, so a plain .jinja / .j2 template is not treated as a model. These source-level Model nodes coexist with the manifest path (#576) without conflict.
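
The gating rules above could look roughly like the following Python sketch. The function name model_node_for, the (label, key) return shape, and the macro-detection regex are hypothetical; the sketch only encodes the three conditions stated in this issue (file contains Jinja, extension is .sql, no macro definitions).

```python
import re
from pathlib import PurePosixPath

# Hypothetical detector for a macro definition, allowing "{%-" whitespace control.
MACRO_DEF_RE = re.compile(r"\{%-?\s*macro\b")


def model_node_for(path, text):
    """Return ("Model", stem) when the file qualifies as a dbt model, else None."""
    if "{{" not in text and "{%" not in text:
        return None  # no Jinja: the additive pass does not run on this file
    p = PurePosixPath(path)
    if p.suffix != ".sql":
        return None  # plain .jinja / .j2 templates are never models
    if MACRO_DEF_RE.search(text):
        return None  # macro files yield Macro nodes, not a Model
    return ("Model", p.stem)


stg = model_node_for(
    "models/staging/stg_users.sql",
    "select * from {{ source('raw', 'users') }}",
)
```

Keying by p.stem means models/staging/stg_users.sql is addressed as stg_users, which is the same string a {{ ref('stg_users') }} call carries.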

Caveat: the vendored tree-sitter-jinja2 grammar models only {{ }} expressions (so ref() / source() are parsed from the AST). It has no rule for {% %} statements, so {% macro %} names are recovered with a small text scan until a statement-aware grammar is vendored.
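
A text scan of the kind mentioned in the caveat might be as small as the Python sketch below. The regex and the name macro_names are assumptions, not the actual code; this version does not skip macros that appear inside {# #} comments.

```python
import re

# Matches "{% macro name(" and "{%- macro name(", capturing the macro name.
# "{% endmacro %}" does not match because "macro" must directly follow "{%".
MACRO_NAME_RE = re.compile(r"\{%-?\s*macro\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def macro_names(text):
    """Return macro names in the order their definitions appear."""
    return MACRO_NAME_RE.findall(text)


sample = """
{% macro cents_to_dollars(column_name, precision=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ precision }})
{% endmacro %}
{%- macro limit_in_dev() -%}
    limit 100
{%- endmacro -%}
"""
names = macro_names(sample)
```

For the sample above, names contains cents_to_dollars followed by limit_in_dev.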

Alternatives considered

  • Reusing the embedded-language (<script>-in-HTML) re-parse path: rejected because dbt Jinja is interleaved throughout the file, not a delimited sub-region; a full-file second parse is the right shape.
  • Extending the grammar to parse {% %} statements: larger and touches grammar vendoring; deferred.

Confirmations

  • I searched existing issues and this is not a duplicate.

Metadata

Labels

enhancement (New feature or request)
