
feat(validators): dbt-tests-pass validator + dbt-schema-verify hardening #849

Open
anandgupta42 wants to merge 18 commits into main from feat/validator-framework

anandgupta42 (Contributor) commented May 27, 2026

What does this PR do?

Completes the harness-side completion-gate validator framework. The registry and dbt-schema-verify were shipped in earlier PRs; this adds dbt-tests-pass and hardens the existing validator.

New: dbt-tests-pass validator

Fires after the agent declares done (finishReason === "stop"). Finds every dbt model .sql file the session modified (by mtime since sessionStartMs), runs altimate-dbt test --model <name> against each, and injects a synthetic user message if any tests fail — forcing one more agent turn to fix the SQL.

What this catches that schema-verify doesn't: row-data correctness errors. A model whose column names all match schema.yml can still fail relationships, unique, not_null, accepted_values, or AUTO_*_equality tests because the SELECT logic produces wrong values or wrong row counts.
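The touched-model detection described above can be sketched roughly as follows. This is an illustration only, not the PR's code: the `FileStat` shape and both function signatures are assumptions, and the file listing is passed in rather than read from disk so the logic stays easy to test.

```typescript
// Hypothetical shape: one entry per file found under the dbt project.
interface FileStat {
  path: string;
  mtimeMs: number;
}

// Derive a dbt model name from a path: the basename without the .sql extension.
// Splits on both separators so Windows-style paths also work.
function modelNameFromPathSketch(path: string): string {
  const base = path.split(/[\\/]/).pop() ?? path;
  return base.replace(/\.sql$/i, "");
}

// Keep only .sql files whose mtime is at or after the session start.
function modelsModifiedSinceSketch(files: FileStat[], sessionStartMs: number): string[] {
  return files
    .filter((f) => /\.sql$/i.test(f.path) && f.mtimeMs >= sessionStartMs)
    .map((f) => modelNameFromPathSketch(f.path));
}
```

Each returned name would then be handed to `altimate-dbt test --model <name>`; how the real validator lists files and spawns the CLI is not shown here.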

Hardened: dbt-schema-verify

  • parseSchemaVerifyOutput() — scans the stdout blob for the last balanced {...} block. altimate-dbt emits dbt log noise (ANSI codes, parser warnings) before the JSON verdict; JSON.parse(stdout) was failing on it. This fix was validated on airbnb006 where the validator correctly triggered a schema-mismatch retry.
  • Debug logging (spawn errors, close events, stdout/stderr heads) gated behind ALTIMATE_VALIDATORS_DEBUG=1.
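One way to recover a trailing JSON verdict from noisy stdout is sketched below. It is a simplified stand-in for `parseSchemaVerifyOutput()`, not the actual implementation: the function name is hypothetical, and it relies on `JSON.parse` to reject bad candidates instead of counting braces, which costs quadratic time in the worst case but tolerates braces inside string values.

```typescript
// Return the last JSON object embedded in noisy output, or undefined if none parses.
function extractLastJsonObjectSketch(stdout: string): Record<string, unknown> | undefined {
  // Try each closing brace, starting from the end of the output.
  for (let end = stdout.length - 1; end >= 0; end--) {
    if (stdout[end] !== "}") continue;
    // For this closing brace, try opening braces from nearest to farthest.
    for (let start = end - 1; start >= 0; start--) {
      if (stdout[start] !== "{") continue;
      try {
        const parsed: unknown = JSON.parse(stdout.slice(start, end + 1));
        if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
          return parsed as Record<string, unknown>;
        }
      } catch {
        // Not valid JSON for this pair; widen the candidate span.
      }
    }
  }
  return undefined;
}
```

Because an inner object's slice always carries the outer closing brace with it, the first span that parses is the object that actually closes at that position.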

prompt.ts: diagnostic cleanup

  • All console.error calls gated behind ALTIMATE_VALIDATORS_DEBUG=1 — previously they were always on, producing noise in normal sessions.
  • Added missing validatorsEnabled && guard on dispatch condition.
  • Added dispatch_enter, dispatch_result, dispatch_error debug events.
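A debug gate of this kind might look like the sketch below. The helper name, the log line format, and the injected `env` and `sink` parameters are all assumptions made for illustration; only the `ALTIMATE_VALIDATORS_DEBUG=1` switch and the event names come from the PR.

```typescript
type EnvSketch = Record<string, string | undefined>;

// Build a logger that emits only when ALTIMATE_VALIDATORS_DEBUG is exactly "1".
function makeValidatorDebugLog(env: EnvSketch, sink: (line: string) => void) {
  const enabled = env["ALTIMATE_VALIDATORS_DEBUG"] === "1";
  return (event: string, fields: Record<string, unknown> = {}): void => {
    if (!enabled) return; // normal sessions stay quiet
    sink(`[validators] ${event} ${JSON.stringify(fields)}`);
  };
}
```

In a Node or Bun process this would typically be wired as `makeValidatorDebugLog(process.env, console.error)` and called with events such as `dispatch_enter`.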

build-local-tarball.sh: altimate-dbt on PATH

Added "altimate-dbt": "./dbt-tools/bin/altimate-dbt" to bin entries in the benchmark npm package. Validators spawn altimate-dbt and it wasn't in PATH inside the Docker containers used by ade-bench.

system.ts: Marker Guard fix

Moved // altimate_change end from inside the skills() function body (after the unreachable return) to outside it. The function's closing } was appearing as unguarded new code in the diff, tripping Marker Guard CI.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)

Issue for this PR

Closes #848

How did you verify your code works?

End-to-end on DGX with the ade-bench Docker harness:

  • dbt-schema-verify: airbnb006 — agent triggered a schema-mismatch retry, fixed 5 column mismatches, all tests passed
  • dbt-tests-pass: confirmed validator detects test failures and injects the correct fix hint
  • altimate-dbt on PATH: which altimate-dbt resolves to /usr/bin/altimate-dbt inside the container
  • TypeScript: bun run typecheck passes
  • Marker Guard: bun run script/upstream/analyze.ts --markers --base main --strict passes

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I ran bun run script/upstream/analyze.ts --markers --base main --strict and it passes

🤖 Generated with Claude Code


Summary by cubic

Adds a harness-side dbt-tests-pass validator and strengthens dbt-schema-verify to block session completion when dbt tests fail or schema diffs are detected. Adds subprocess timeouts, nested dbt-project detection, always-on telemetry, and expanded adversarial tests to improve reliability.

  • New Features

    • dbt-tests-pass validator: after finish, runs altimate-dbt test --model <name> on modified models and injects a user message on failures to force another turn.
    • Validator framework wiring: explicit registration in the session loop; enable with ALTIMATE_VALIDATORS_ENABLED=1 and control retries via ALTIMATE_VALIDATORS_MAX_RETRIES.
    • altimate-dbt schema-verify: new CLI and auto-run in build responses to surface column-shape verdicts inline.
  • Bug Fixes

    • Timeouts: kill long-running altimate-dbt subprocesses with ALTIMATE_VALIDATORS_TIMEOUT_MS (default 60s) to prevent hangs.
    • Nested projects: use findDbtProjectRoot for cwd and file scanning, fixing “not a dbt project” errors in subdirs.
    • Telemetry: validator runs are tracked regardless of the feature flag; added retries-exhausted event; dispatch guard restored; debug logs remain behind ALTIMATE_VALIDATORS_DEBUG=1.
    • Robustness and perf: shared validator-utils with strict JSON extraction, platform-safe paths, and parallel model checks; improved spawn-failure reporting; expanded adversarial tests (117 cases) for JSON parsing, dbt test summary parsing, project-root detection, and modified-model scanning.
    • Tooling: ensure altimate-dbt is on PATH in ade-bench; tarball script reliability fix; Marker Guard false-positive resolved.
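The timeout behaviour summarized above could be approximated as in the sketch below. Both helpers are hypothetical: the summary only states that `ALTIMATE_VALIDATORS_TIMEOUT_MS` exists and defaults to 60s, so the parsing rules and the `onTimeout` callback (where a caller would kill the child process) are assumptions.

```typescript
// Read a positive millisecond value from an env string, falling back to 60s.
function parseTimeoutMsSketch(raw: string | undefined, fallbackMs = 60_000): number {
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : fallbackMs;
}

// Race the work against a timer; on expiry, run onTimeout (e.g. kill the subprocess) and reject.
async function withTimeoutSketch<T>(work: Promise<T>, ms: number, onTimeout: () => void): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`timed out after ${ms}ms`));
    }, ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // avoid a dangling timer on success
  }
}
```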

Written for commit 04abef0. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Added schema-verify CLI and auto-run after successful builds; integrated auto-loading skills (applyPaths/alwaysApply) and a validator framework that enforces schema checks and dbt unit-test verification. Added an Altimate Code ADE-bench agent and local tarball/setup helpers.
  • Documentation

    • Expanded dbt development guidance, schema-verify usage, skills configuration examples, and an ADE-bench reproduction guide and experiment reports.
  • Tests

    • Added comprehensive tests covering schema-verify, build auto-verification, validators, and related utilities.


anandgupta42 and others added 16 commits May 10, 2026 23:27
…ments

Adds research/kimi-k26-ade-bench-2026-05-10/ with a blog-ready writeup of how
the Moonshot Kimi-K2.6 model behaves as a coding agent inside altimate-code's
agent loop, derived from 78 trial traces against ADE-Bench. Findings cover
tool-usage distribution, wall-clock anatomy (~89% model generation, ~5%
tools), prompt-cache amplification (85.8%), per-failure-class taxonomy, and
extended appendices (per-trial manifest, pass-rate by family, skill
invocation log, cost/runtime distribution, reproducibility command, glossary,
open questions).

Also extends two shipped skills with generic dbt-best-practice patterns
surfaced during the analysis (all benchmark-agnostic, applicable to any dbt
project):

- dbt-develop/SKILL.md
  * stronger description with explicit invocation triggers
  * new section on transformation-logic pitfalls: incremental high-water
    marks (>= vs >), snapshot strategy selection, LEFT JOIN + COUNT(*)
    phantom rows, type harmonization in COALESCE/CASE/UNION, date-spine
    completeness, off-by-one window boundaries, uniqueness enforcement,
    window-LIMIT tiebreakers
  * deliverable-enumeration step in Validate phase + iron rule
  * unit-test verification step + iron rule
- dbt-unit-tests/SKILL.md
  * new iron rule requiring mock data to exercise every SQL construct's
    failure mode (LEFT JOIN unmatched parents, NULLIF zero, CASE branches,
    COALESCE all-null, window boundaries, date spines, etc.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the source-code + scripts + 4 small patches needed to plug
altimate-code into upstream ade-bench. Lets anyone reproduce the
81.3% pass rate described in research/kimi-k26-ade-bench-2026-05-10/
without trusting the pre-aggregated numbers.

What's included:
- benchmark/ade-bench/README.md — full reproduction guide (prereqs,
  Docker memory, env-var knobs, step-by-step commands, troubleshooting)
- benchmark/ade-bench/altimate_code_agent/ — drop-in agent module
  (AltimateCodeAgent class, JSON event parser, log formatter, install
  script that runs inside the trial container, tarball builder)
- benchmark/ade-bench/patches/ — 4 small patches against upstream
  dbt-labs/ade-bench (register AgentName.ALTIMATE_CODE, wire it into
  the AgentFactory, export from installed_agents/__init__.py, route
  the existing shared/config/AGENTS.md baseline file the same way
  Codex receives it — pure parity, no benchmark-specific content)

Explicitly NOT in this folder:
- Trace files / per-trial agent.log / results.json (regenerable)
- The 130 MB built tarball (build-local-tarball.sh recreates it)
- Seed DuckDB databases (downloaded from dbt-labs/ade-bench releases)
- Per-task ground-truth seeds + test SQL (those live in upstream
  ade-bench and are never sent to the agent at run time)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itfalls

Two related changes, both shipped to every altimate-code user.

(1) `feat(skill)`: add `alwaysApply: bool` and `applyPaths: string|string[]`
    frontmatter to skill metadata, mirroring Cursor's "Always Apply" and
    "Auto Attached" rule modes. When a skill is `alwaysApply: true` or has
    `applyPaths` matching at least one file under the worktree, its body
    is inlined into the system prompt at session start under an
    `<auto_loaded_skill>` block — the model no longer needs to invoke the
    Skill tool to access that guidance.

    Motivation: benchmark traces show the agent invokes the `Skill` tool
    in <1% of tool calls, even after the skill description is rewritten
    to be imperative. Many failures occur on patterns the relevant skill
    already documents but the agent never loads. Auto-loading puts the
    body deterministically in context for projects where the skill
    applies.

    Files:
      • packages/opencode/src/skill/skill.ts — Info schema + both load
        paths (filesystem + binary-embedded) pluck the new fields
      • packages/opencode/src/session/system.ts — auto-inline matched
        skill bodies after the existing available_skills XML block
      • .opencode/skills/dbt-develop/SKILL.md — frontmatter now declares
        `applyPaths: [dbt_project.yml, **/dbt_project.yml]`, so dbt
        projects auto-load this skill's body (~270 lines of dbt
        best-practice patterns) at session start

    The existing skill-tool-invocation path is unchanged; auto-load is
    additive. Skills without `alwaysApply` / `applyPaths` continue to
    require explicit invocation. Prompt caching amortizes the extra
    tokens across the long agent loop.

(2) `docs(skill)`: three new generic dbt pitfall sections in
    `dbt-develop/SKILL.md`, all benchmark-agnostic best practices
    surfaced during failure-trace analysis:

    • String concatenation with `NULL` operands — `||` / `CONCAT`
      propagate `NULL`; wrap with `COALESCE` or use `CONCAT_WS`.
      Catches an invisible row-dropper in surrogate-key generation and
      derived columns.
    • dbt model versioning (dbt 1.8+) — when introducing a v2 of an
      existing model, use dbt's `versions:` block in `_models.yml` with
      `defined_in:`, not a sibling `_v2.sql` file. Otherwise downstream
      lineage and `{{ ref(model, v=2) }}` resolution break.
    • Strengthened the existing window-rank + `LIMIT` section to call
      out determinism explicitly, including the `QUALIFY ROW_NUMBER()
      OVER (... ORDER BY metric, id)` form and the "if you can't think
      of a tiebreaker, you don't have a unique key yet" framing.

    All three patterns are documented in well-known dbt style guides
    and would benefit any real altimate-code user — they are not
    benchmark-targeted tweaks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
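The auto-load decision described in the commit above (`alwaysApply: true`, or `applyPaths` matching at least one worktree file) might be sketched as follows. The type, the function names, and the tiny glob translator are hypothetical; the real code in skill.ts and system.ts may use a proper glob library and different matching rules.

```typescript
interface SkillMetaSketch {
  name: string;
  alwaysApply?: boolean;
  applyPaths?: string | string[];
}

// Minimal glob support: "**/" matches zero or more directories, "*" stays within a segment.
function globToRegExpSketch(glob: string): RegExp {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        if (glob.charAt(i + 2) === "/") {
          out += "(?:.*/)?";
          i += 2;
        } else {
          out += ".*";
          i += 1;
        }
      } else {
        out += "[^/]*";
      }
    } else if ("\\^$+?.()|[]{}".includes(ch)) {
      out += "\\" + ch; // escape regex metacharacters such as "."
    } else {
      out += ch;
    }
  }
  return new RegExp("^" + out + "$");
}

// A skill auto-loads if it is always-on or any of its patterns matches a worktree file.
function shouldAutoLoadSketch(skill: SkillMetaSketch, worktreeFiles: string[]): boolean {
  if (skill.alwaysApply === true) return true;
  const raw = skill.applyPaths;
  const patterns = raw === undefined ? [] : Array.isArray(raw) ? raw : [raw];
  const regexes = patterns.map(globToRegExpSketch);
  return worktreeFiles.some((file) => regexes.some((re) => re.test(file)));
}
```

Paths are assumed to be worktree-relative with forward slashes.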
Adds reference for the new auto-load mechanism to docs/docs/configure/skills.md:
- Lists the two new frontmatter fields in the Frontmatter Fields table
- New "Auto-loading skills" section explaining the lazy-load default,
  how `alwaysApply` and `applyPaths` change it, a worked example,
  a "when to use" table, and an honest section on context-size
  implications + prompt-cache amortization

Pure documentation update — no code change in this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes informed by trace analysis of the benchmark run with the
initial auto-load mechanism. With the auto-loaded body present in the
system prompt, 6 of 8 sampled failing trials never referenced any of
its guidance keywords (date spine, tiebreaker, deliverable, etc.) —
the model was treating the auto-loaded section as background reference
rather than binding directive. These two changes address the framing.

(1) `feat(system-prompt)`: move auto-loaded skill bodies BEFORE the
    lazy-loaded `<available_skills>` XML block in the skills section.

    Previously the order was:
      1. "Use the skill tool to load a skill..." preamble
      2. <available_skills> XML (long, descriptions only)
      3. <auto_loaded_skill> body (binding guidance)

    Now:
      1. <auto_loaded_skill> body (binding guidance — read FIRST)
      2. "Skills provide specialized instructions..." preamble
      3. <available_skills> XML (lazy-loaded skills the agent can opt into)

    Framing the auto-loaded body as "rules of the road" at the start
    rather than supplementary documentation at the end. Pure ordering
    change in `SystemPrompt.skills()` parts array — no schema or API
    change. Applies to any skill using `applyPaths` or `alwaysApply`.

    File: packages/opencode/src/session/system.ts

(2) `docs(skill)`: add a "Pre-completion checklist" section (§5) to
    dbt-develop that the agent is told to emit with `[x]/[ ]` marks
    before declaring the task done.

    Each item is a yes/no question against patterns the skill already
    documents (LEFT JOIN cardinality, date-spine completeness,
    window-rank tiebreaker, type harmonization in COALESCE/CASE/UNION,
    string-concat NULL handling, uniqueness enforcement, incremental
    high-water mark, snapshot strategy, dbt model versioning v2,
    unit-test verification).

    The forcing function: the agent must produce the checklist text in
    its final message. Unchecked items without a stated "n/a" reason
    mean the task is not done. Forces the model to slow down at the
    end and verify the patterns against the SQL it just wrote, rather
    than silently skip the verification phase.

    All items are generic dbt patterns applicable to any project — no
    benchmark-specific test names, no solution-seed values, no
    grading-rubric hints.

    File: .opencode/skills/dbt-develop/SKILL.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…result

The "emit a [x]/[ ] checklist before declaring done" addition to
dbt-develop (§5, shipped two commits ago) was measured negative on
the post-A+B benchmark re-run:

  - Checklist appeared in 6 of 14 still-failing trial outputs.
  - Zero of those 6 flipped to PASS.
  - In multiple traces, the agent self-marked `[x] LEFT JOIN
    cardinality correct` while the underlying SQL still had the
    exact phantom-row bug the item warned against.

The framing trained the model to perform verification theater
rather than actually re-read its SQL. The two flips attributed
earlier to "A+B" (helixops_saas007, helixops_saas009) trace back
to the placement reorder (A) — the checklist (B) contributed
nothing measurable, and adds 50+ lines of system-prompt content
for no benefit.

This commit:

(1) Removes §5 from `.opencode/skills/dbt-develop/SKILL.md`.
    The other sections (Plan → Discover → Write → Validate,
    Common Pitfalls in Transformation Logic, Iron Rules) stay
    intact. The placement reorder in `system.ts` and the
    `applyPaths`/`alwaysApply` frontmatter mechanism stay.

(2) Adds a "What we tried that didn't work" section to
    research/kimi-k26-ade-bench-2026-05-10/findings.md so the
    negative result is preserved as institutional knowledge.
    The broader principle — "soft self-verification (model
    promises it checked X) is unreliable on this model class;
    hard verification (compile/test failures) still works" — is
    worth keeping around.

(3) Updates the findings TL;DR with both the original 81.3%
    headline and the post-second-wave 85.3% best-of-runs number,
    with the caveat that the body of the post analyzes the
    first-wave traces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `LEFT JOIN + COUNT(*)` pitfall example referenced `dim_listings LEFT JOIN
fct_reviews`. Those names are the canonical airbnb dbt-tutorial models (from
Maven Analytics / public dbt courses) and also happen to be ADE-Bench tasks,
so even though the rule itself is fully generic, the example wording was
needlessly close to benchmark content. Swap to abstract `dim_parent LEFT JOIN
fct_child` — the rule is identical, the wording is unambiguous.

No behavior change. Cosmetic only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iff validate step

Three new generalizable dbt patterns surfaced from systematic trace
analysis of ADE-Bench failures:

1. Iron Rule 8 — Schema Fidelity: agent must match the spec's column
   tuple exactly (names, types, ORDER, no extras). Adding "helpful"
   columns or substituting synonyms (supplier_id vs supplier_company)
   breaks AUTO_*_equality tests against the spec contract.

2. CTE-to-model refactor row preservation: when extracting a CTE into
   a standalone intermediate model, build it FROM the parent table the
   CTE started from, not the child table. The extracted model otherwise
   becomes effectively an INNER JOIN and drops parent rows with no
   children. Includes dbt_utils.equal_rowcount and audit_helper
   verification patterns.

3. Diff-against-spec step in the validate phase: agent produces three
   lists (columns_extra, columns_missing, columns_reordered) and treats
   any non-empty list as "not done". Verification > in-prompt negative
   rules (per the Self-Verification Dilemma literature).

All three pass the "When working on any dbt project, ..." self-test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… check

Adds a new `altimate-dbt schema-verify --model <name>` subcommand that
mechanically diffs a model's produced columns against the schema.yml
spec and returns a structured `{verdict, columns_extra, columns_missing,
columns_reordered, type_mismatches}` result.

Background: trace analysis of repeated benchmark failures showed that
in-prompt rules ("match the column spec exactly") were being read but
not applied — the agent agrees in principle, then adds extra columns
or reorders them anyway. The Self-Verification Dilemma literature
predicts this: negative rules without a mechanical check are weak.

Design follows the existing dbt-tools split: dbt parsing lives in
altimate-code (via dbt-integration's adapter), so the bridge belongs
here. Spec source: `adapter.parseManifest().nodeMetaMap.lookupByBaseName(model).columns`
(schema.yml entries compiled into manifest.json). Actual source:
`adapter.getColumnsOfModel(model)` (warehouse / catalog).

Case-insensitive name comparison (dbt convention). Type mismatches
are reported only when the spec actually declares `data_type` —
common to omit it, and treating omission as a mismatch would
produce noise.

Skill change: the validate phase's "diff column shape" section now
prescribes calling `altimate-dbt schema-verify` and treating any
`mismatch` verdict as "not done", instead of asking the agent to
self-diff column lists.

Iron Rule 8 also tightened to point at the mechanical check.

Tests: 13 covering the four diff categories, the no-spec skip,
case-insensitivity, type-mismatch precedence rules, error
propagation, and two regression-style cases mirroring real
ade-bench failure shapes (extra rank-breakdown columns, leading-
column reorder).

All four pass the "When working on any dbt project, …" self-test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
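The column diff described in the commit above can be illustrated with a small pure function. This is a sketch under stated assumptions, not the `schema-verify` source: the type names are hypothetical, "reordered" is taken to mean columns present on both sides whose relative position differs, names are reported lowercased in that list, and type comparison is assumed to be case-insensitive.

```typescript
interface ColumnSketch {
  name: string;
  data_type?: string;
}
interface TypeMismatchSketch {
  column: string;
  expected: string;
  actual: string;
}
interface SchemaDiffSketch {
  verdict: "match" | "mismatch" | "no-spec";
  columns_extra: string[];
  columns_missing: string[];
  columns_reordered: string[];
  type_mismatches: TypeMismatchSketch[];
}

function diffSchemaSketch(spec: ColumnSketch[], actual: ColumnSketch[]): SchemaDiffSketch {
  if (spec.length === 0) {
    return { verdict: "no-spec", columns_extra: [], columns_missing: [], columns_reordered: [], type_mismatches: [] };
  }
  const key = (c: ColumnSketch): string => c.name.toLowerCase(); // case-insensitive names
  const specKeys = spec.map(key);
  const actualKeys = actual.map(key);
  const actualByKey = new Map(actual.map((c) => [key(c), c] as const));

  const columns_extra = actual.filter((c) => !specKeys.includes(key(c))).map((c) => c.name);
  const columns_missing = spec.filter((c) => !actualKeys.includes(key(c))).map((c) => c.name);

  // Compare the relative order of columns present on both sides.
  const specCommon = specKeys.filter((k) => actualKeys.includes(k));
  const actualCommon = actualKeys.filter((k) => specKeys.includes(k));
  const columns_reordered = specCommon.filter((k, i) => actualCommon[i] !== k);

  // Only report a type mismatch when the spec actually declares data_type.
  const type_mismatches: TypeMismatchSketch[] = [];
  for (const s of spec) {
    const a = actualByKey.get(key(s));
    if (s.data_type === undefined || a === undefined || a.data_type === undefined) continue;
    if (s.data_type.toLowerCase() !== a.data_type.toLowerCase()) {
      type_mismatches.push({ column: s.name, expected: s.data_type, actual: a.data_type });
    }
  }

  const clean =
    columns_extra.length + columns_missing.length + columns_reordered.length + type_mismatches.length === 0;
  return { verdict: clean ? "match" : "mismatch", columns_extra, columns_missing, columns_reordered, type_mismatches };
}
```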
Move the schema-verify procedure out of dbt-develop's body (where it
was ~30 lines deep in a 450-line skill) into its own focused skill
that auto-loads on dbt projects.

Why: trace inspection of v5 runs showed the agent reads the
schema-verify instruction inside dbt-develop, agrees with it in
chain-of-thought, then doesn't actually run the command. Burying a
procedural step inside a discursive skill is part of the problem;
the agent gives the step less attention than skill-top imperative
content.

Design: short, procedural, imperative skill body. Auto-loads via
applyPaths alongside dbt-develop. Iron rules state the contract
explicitly. Includes a fallback when altimate-dbt isn't available
(reads schema.yml + dbt show by hand). Cross-references the
dbt-develop "CTE row-preservation" pattern for the related
row-count case (which schema-verify does NOT cover).

dbt-develop now points at dbt-schema-verify instead of embedding the
full procedure. Iron Rule 8 similarly points at the dedicated skill.

Honest caveat: this is still a prompt-level intervention. Trace
inspection of v4 and v5 runs both showed that even mechanically-
callable tools get ignored if the harness doesn't enforce the call.
The structural fix is harness-level: a before_terminate hook with
per-domain completion validators, of which dbt-schema-verify would
be one. That work is a follow-up — this skill is the cleanest
prompt-side fix in the meantime.

Passes the "When working on any dbt project, ..." self-test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent has been ignoring skill-level instructions to call
schema-verify before declaring done (v4 and v5 trace inspection
showed the agent reads the rule, writes the intention in
chain-of-thought, then doesn't run the command). Building a full
harness-level validator framework that intercepts session
termination is real engineering; this commit ships the smallest
forcing mechanism that doesn't require touching the session loop:
auto-trigger schema-verify inside the build command's own response.

The agent now cannot see a successful `altimate-dbt build --model X`
without also seeing the schema-verify verdict in the same tool
result. The diff is in the JSON response under `schema_verify`,
in-context where the agent's attention sits — much harder to
ignore than a system-prompt skill rule.

Behavior:
- `build` without `--model` is unchanged (project-wide build, no
  per-model verify makes sense).
- `build --model X` runs schema-verify on X after a successful
  build. The full structured result lives at `schema_verify`.
- A verify failure does NOT mask the build's stdout — both are
  reported. Build status remains the success/error signal.
- If verify itself errors (missing manifest, unbuilt table), the
  error is reported under `schema_verify.error` with a fix hint.

Tests:
- Updated existing build-test mocks to include parseManifest +
  getColumnsOfModel (no behavior change, just shape consistency).
- New assertion: build --model X result now contains schema_verify.

Skill: dbt-schema-verify body adds a note that the agent gets
schema-verify "for free" inside build's response, so it doesn't
need to call it twice for verification.

This is a stepping stone — the full validator framework that
intercepts session termination is the next iteration. This commit
tests whether putting the diff inline with the build response is
enough to break through the ignore-the-rule pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v7 trace inspection revealed the agent uses `altimate-dbt build`
(no --model) for project-wide builds and `dbt build --model X`
(plain dbt) for per-model — never `altimate-dbt build --model X`.
So the per-model auto-trigger added in 3924009 never fired in any
of the 30 trials, even though it was wired correctly. The hook
missed because of the agent's command choice, not because the agent
ignored the result.

This commit extends the auto-trigger to the project-wide path:
after a successful `altimate-dbt build` (no --model), iterate every
model in the parsed manifest that has columns declared in schema.yml,
run schema-verify on each, and roll up the results into a single
`schema_verify_summary` field on the response:

  {
    "stdout": "...",
    "schema_verify_summary": {
      "models_checked": N,
      "match": M,
      "mismatch": K,
      "no_spec": L,
      "errored": E,
      "mismatches": [ { model, verdict, columns_extra, ... } ]
    }
  }

Only the mismatches are reported in full. Match and no-spec models
are counted but not echoed (keeps the response compact for 49-model
projects). Errored models include the per-model error string so the
agent can investigate.

The summary is the closest a CLI command can get to harness-level
enforcement without intercepting session termination: every
project-wide build now returns the full diff against schema.yml in
the same tool result the agent receives for the build. The agent
literally cannot see a green project-build without also seeing every
schema mismatch in the project.

Tests: new "project-wide build collects mismatches" test exercises
the 3-model case (match + mismatch + no-spec) end to end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
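The roll-up into `schema_verify_summary` described in the commit above might be sketched like this. The input union type and the function name are hypothetical, and the separate `errors` list is an assumption: the commit says errored models carry their error string but does not show where it lives in the response.

```typescript
type PerModelResultSketch =
  | { model: string; verdict: "match" | "no-spec" }
  | { model: string; verdict: "mismatch"; columns_extra: string[]; columns_missing: string[]; columns_reordered: string[] }
  | { model: string; error: string };

function summarizeSchemaVerifySketch(results: PerModelResultSketch[]) {
  const summary = {
    models_checked: results.length,
    match: 0,
    mismatch: 0,
    no_spec: 0,
    errored: 0,
    mismatches: [] as PerModelResultSketch[], // only mismatches are echoed in full
    errors: [] as Array<{ model: string; error: string }>, // assumed field, see lead-in
  };
  for (const r of results) {
    if ("error" in r) {
      summary.errored += 1;
      summary.errors.push({ model: r.model, error: r.error });
    } else if (r.verdict === "match") {
      summary.match += 1; // counted, not echoed
    } else if (r.verdict === "no-spec") {
      summary.no_spec += 1; // counted, not echoed
    } else {
      summary.mismatch += 1;
      summary.mismatches.push(r);
    }
  }
  return summary;
}
```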
Six experiments (v3-v9) proved that every form of completion-discipline
enforcement living inside the agent's command surface — skill rule, tool
description, auto-trigger inside a wrapping CLI, even binary substitution
— gets read, agreed-with in chain-of-thought, then ignored. In v9 the
agent actively found a backup binary at `.orig` to bypass the wrapping
shim. The Self-Verification Dilemma literature predicts this. The only
remaining lever is enforcement the agent cannot see: the harness
inspecting the world after `finishReason === "stop"` and refusing to
terminate if a registered validator says the work isn't done.

This commit adds the framework but does not enable it. Behavior is opt-in
via ALTIMATE_VALIDATORS_ENABLED=1, with a separate retry budget knob
(ALTIMATE_VALIDATORS_MAX_RETRIES=3 default). Telemetry fires
unconditionally so we can measure baseline fire rate against historical
traffic even before the gate is enabled.

Files added (framework, domain-agnostic):
  - session/validators/types.ts — Validator, ValidatorResult,
    ValidatorContext interfaces with a load-bearing comment explaining
    why this lives in the harness and not in skills/tools.
  - session/validators/registry.ts — Map-keyed registry + runAll that
    catches per-validator exceptions and converts them to soft-passes
    (a buggy validator should never brick the agent loop).
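A minimal sketch of such a registry is shown below. The interface fields and class name are assumptions for illustration (the actual definitions live in types.ts and registry.ts and may differ); the point is the soft-pass behaviour, where a validator that throws is recorded as `ok: true` rather than aborting the loop.

```typescript
interface ValidatorContextSketch {
  worktree: string;
  sessionStartMs: number;
}
interface ValidatorResultSketch {
  ok: boolean;
  reason?: string;
  fixHint?: string;
}
interface ValidatorSketch {
  name: string;
  appliesTo(ctx: ValidatorContextSketch): Promise<boolean>;
  check(ctx: ValidatorContextSketch): Promise<ValidatorResultSketch>;
}

class ValidatorRegistrySketch {
  private readonly validators = new Map<string, ValidatorSketch>();

  register(v: ValidatorSketch): void {
    this.validators.set(v.name, v); // re-registering a name replaces the old entry
  }

  async runAll(ctx: ValidatorContextSketch): Promise<Array<{ name: string; result: ValidatorResultSketch }>> {
    const out: Array<{ name: string; result: ValidatorResultSketch }> = [];
    for (const v of Array.from(this.validators.values())) {
      try {
        if (!(await v.appliesTo(ctx))) continue; // skip validators outside their domain
        out.push({ name: v.name, result: await v.check(ctx) });
      } catch (err) {
        // Soft-pass: a buggy validator must not block the agent loop.
        out.push({ name: v.name, result: { ok: true, reason: `validator threw: ${String(err)}` } });
      }
    }
    return out;
  }
}
```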

Files added (altimate domain, first concrete validator):
  - altimate/validators/dbt-schema-verify.ts — wraps the existing
    `altimate-dbt schema-verify` CLI. appliesTo: dbt project detected
    in worktree. check: scans models/ for .sql files mtime'd in this
    session, runs schema-verify on each, returns mismatch with a
    structured fixHint listing columns_extra/missing/reordered.
  - altimate/validators/index.ts — side-effect registration on import.

Wiring in session/prompt.ts step loop:
  - After processor.process() returns and the model declared finish:"stop"
    with no error and no pending tool calls, runAll() is dispatched.
  - Telemetry fires for every validator regardless of opt-in.
  - If the gate is enabled AND any validator failed AND we're under the
    retry budget: a synthetic user message is appended to the session
    with the aggregated failure reasons + fix hints. The step loop's
    top-of-iteration break check then sees the newer user message and
    does NOT break — the model gets one more turn to address the gap.
  - Retry budget exhaustion falls through to the natural break.
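The gate decision in the wiring steps above reduces to a small pure function, sketched here. The names, the input shape, and the message format are hypothetical; in prompt.ts the real decision is interleaved with telemetry and session state.

```typescript
interface GateInputSketch {
  enabled: boolean; // ALTIMATE_VALIDATORS_ENABLED=1
  retriesUsed: number;
  maxRetries: number; // ALTIMATE_VALIDATORS_MAX_RETRIES, default 3
  failures: Array<{ name: string; reason: string; fixHint?: string }>;
}

type GateDecisionSketch = { kind: "break" } | { kind: "inject"; message: string };

function decideGateSketch(input: GateInputSketch): GateDecisionSketch {
  // Fall through to the natural break unless the gate is on, something failed,
  // and there is retry budget left.
  if (!input.enabled || input.failures.length === 0 || input.retriesUsed >= input.maxRetries) {
    return { kind: "break" };
  }
  const lines = input.failures.map(
    (f) => `- ${f.name}: ${f.reason}` + (f.fixHint !== undefined ? `\n  fix: ${f.fixHint}` : ""),
  );
  // The caller would append this as a synthetic user message so the loop continues.
  return { kind: "inject", message: ["Completion validators failed:", ...lines].join("\n") };
}
```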

Architectural choice: the dispatch hook is in prompt.ts not in
processor.ts. processor.process() returns per-step semantics (stop /
continue / compact); prompt.ts owns the multi-step harness loop. The
validator gate is a harness concept, not a stream concept.

Generalisable: the framework is domain-agnostic. New validators register
via `ValidatorRegistry.register(...)` from any module's side-effect
import. Phase 2 candidates (already scoped, not in this commit):
dbt-rowcount-preservation, dbt-tests-pass, sql-compile, sql-equivalence,
pii-scan, column-lineage. Each is ~30-50 LOC on top of the framework.

Not in this commit (deferred):
  - The skill diet (extracting process-discipline content out of
    dbt-develop / dbt-schema-verify into validator fixHints). Lands once
    we've measured validator fire rates with the gate enabled.
  - Unit tests for the framework (lives in a follow-up dedicated test
    PR since prompt.ts is already heavily integration-tested upstream).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bun --single may tree-shake side-effect imports. Switch to explicit
registerAltimateValidators() call so the registration is unambiguously
referenced. Also add an info log on every hook entry so we can confirm
the code path is reached even when validators don't fire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ixes

Harness-side completion-gate validator framework, completing the 3-part
series from PR #792 (registry) + PR #800 (dbt-schema-verify) + this PR.

### New: dbt-tests-pass validator
- Fires after the agent declares done (finishReason === "stop")
- Detects dbt model `.sql` files modified since session start via mtime
- Runs `altimate-dbt test --model <name>` against each touched model
- Parses `Done. PASS=N WARN=N ERROR=N ...` summary from dbt output
- Extracts individual failing test names from per-line output
- Injects synthetic user message with fix hints when tests fail
- `extractLastJsonObject()` handles altimate-dbt's JSON envelope + log noise
- Only activates in dbt projects (scans for `dbt_project.yml`)
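Parsing the `Done. PASS=N WARN=N ERROR=N ...` summary line might look like the sketch below. The function name and return shape are hypothetical, `total` is computed here as the sum of the three parsed counters (a simplification), and a run where nothing executed is treated as a non-failure.

```typescript
interface DbtTestSummarySketch {
  pass: number;
  warn: number;
  error: number;
  total: number;
  failed: boolean;
}

function parseDbtTestSummarySketch(output: string): DbtTestSummarySketch | undefined {
  // Strip ANSI colour codes so they cannot split the summary tokens.
  const clean = output.replace(/\u001b\[[0-9;]*m/g, "");
  const re = /Done\.\s+PASS=(\d+)\s+WARN=(\d+)\s+ERROR=(\d+)/g;
  let last: RegExpExecArray | null = null;
  let m: RegExpExecArray | null;
  // Keep the last summary line in case the output contains several runs.
  while ((m = re.exec(clean)) !== null) last = m;
  if (last === null) return undefined;
  const pass = Number(last[1]);
  const warn = Number(last[2]);
  const error = Number(last[3]);
  const total = pass + warn + error; // simplification: SKIP and other counters are ignored
  return { pass, warn, error, total, failed: error > 0 };
}
```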

### Enhanced: dbt-schema-verify hardening
- `parseSchemaVerifyOutput()` — scans backwards for last balanced `{...}`
  block to handle dbt log noise (ANSI codes, parser warnings) emitted
  before the JSON verdict
- Debug logging for spawn errors and close events, gated behind
  `ALTIMATE_VALIDATORS_DEBUG=1` so normal sessions stay quiet
- Better error fallback: reports non-JSON stdout when stderr is empty

### prompt.ts: debug-gated diagnostics
- `ALTIMATE_VALIDATORS_DEBUG=1` env var gates all stderr console.error
  calls — on by default in ade-bench harness, off everywhere else
- Added `validatorsEnabled &&` guard on dispatch condition (was missing)
- Debug logs for dispatch_enter, dispatch_result, dispatch_error events
- `hasError` field added to validator_hook_reached diagnostic

### build-local-tarball.sh: altimate-dbt on PATH
- Added `"altimate-dbt": "./dbt-tools/bin/altimate-dbt"` to bin entries
- Ensures `altimate-dbt` is available via PATH in benchmark Docker
  containers (was missing; validators depend on it)

### system.ts: fix stray altimate_change marker placement
- Moved `// altimate_change end` from inside `skills()` function body
  (before the closing `}`) to outside it — the function's closing brace
  was appearing outside any marker block, triggering Marker Guard CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@claude claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

coderabbitai Bot commented May 27, 2026

📝 Walkthrough


Adds a validator framework with dbt schema and test validators, a schema-verify CLI and build integration, skill auto-load frontmatter + prompt inlining, and an ADE‑Bench AltimateCode agent plus build/setup scripts, tests, and docs.

Changes

Core Validator Framework and dbt Correctness

| Layer | File(s) | Summary |
| --- | --- | --- |
| Validator framework types and registry | `packages/opencode/src/session/validators/types.ts`, `packages/opencode/src/session/validators/registry.ts` | Defines `ValidatorContext`, `ValidatorResult`, and `Validator`; implements `ValidatorRegistry` with safe async dispatch and management APIs. |
| Validator utilities and tests | `packages/opencode/src/altimate/validators/validator-utils.ts`, `packages/opencode/test/altimate/validators/validator-utils.test.ts` | Adds shared utils: `VALIDATOR_TIMEOUT_MS`, `findDbtProjectRoot`, `modelsModifiedSince`, `modelNameFromPath`, and `extractLastJsonObject` with unit tests. |
| dbt-tools schema-verify command & build integration | `packages/dbt-tools/src/commands/schema-verify.ts`, `packages/dbt-tools/src/commands/build.ts`, `packages/dbt-tools/src/index.ts`, tests | Adds `schemaVerify` CLI command (match/mismatch/no-spec, extra/missing/reordered/type mismatches), auto-runs after builds, aggregates per-model summaries, and includes `safeVerify()` with tests. |
| DbtSchemaVerifyValidator | `packages/opencode/src/altimate/validators/dbt-schema-verify.ts` | Runs `altimate-dbt schema-verify` for touched models with timeout, parses verdicts, aggregates mismatches, and returns `ok:false` with fix hints on mismatch. |
| DbtTestsPassValidator & parser | `packages/opencode/src/altimate/validators/dbt-tests-pass.ts`, `packages/opencode/test/altimate/validators/dbt-tests-pass.test.ts` | Runs `altimate-dbt test` for touched models, parses PASS/WARN/ERROR summaries and failing test names, treats `total===0` as non-failure, and formats multi-model fix hints; includes comprehensive parsing tests. |
| Validator registration & session integration | `packages/opencode/src/altimate/validators/index.ts`, `packages/opencode/src/session/prompt.ts` | Provides `registerAltimateValidators()` and wires `ValidatorRegistry.runAll` into the session completion path with telemetry, synthetic-user retry injection (feature-flagged), and bounded retry attempts. |

Skill Auto-Loading and System Prompt

| Layer | File(s) | Summary |
| --- | --- | --- |
| Skill auto-load frontmatter and metadata | `packages/opencode/src/skill/skill.ts` | Extends `Skill.Info` with `alwaysApply` and `applyPaths`, parses and persists them for filesystem and builtin skills. |
| System prompt auto-load logic | `packages/opencode/src/session/system.ts` | Collects auto-loaded skills via `alwaysApply` or glob `applyPaths` matches and inlines full skill content in an `<auto_loaded_skill>` block ahead of verbose skill listing; adds glob helpers and logging. |
| Skills documentation | `docs/docs/configure/skills.md`, `.opencode/skills/*` | Documents `alwaysApply` and `applyPaths`, auto-load timing, YAML examples, and strengthens dbt skill guidance (dbt-develop, dbt-schema-verify, dbt-unit-tests) with checklists and iron rules. |

ADE‑Bench Agent Integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| AltimateCodeAgent implementation | `benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py`, `benchmark/ade-bench/altimate_code_agent/__init__.py` | Adds `AltimateCodeAgent`, a JSON event parser for metrics/turns, log formatter for readable turns, env forwarding, local tarball handling, and tool extraction. |
| Agent setup and tarball build scripts | `benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh`, `benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh` | Installs/configures altimate-code (prefers local tarball), generates provider config, and builds `altimate-code-local.tgz` tarball for ADE‑Bench. |
| ADE‑Bench registration patches & README | `benchmark/ade-bench/patches/*`, `benchmark/ade-bench/README.md` | Adds `ALTIMATE_CODE` enum, registers agent in factory and exports, copies `AGENTS.md` during setup, and provides a full reproduction README and research findings. |

dbt Skill Enhancements

| Layer | File(s) | Summary |
| --- | --- | --- |
| dbt-develop skill guardrails | `.opencode/skills/dbt-develop/SKILL.md` | Adds `applyPaths`, mandates skill load before SQL, requires enumerating deliverables in planning, enforces unit-test validation and schema-verify gating, extends iron rules and common pitfalls. |
| New dbt-schema-verify skill | `.opencode/skills/dbt-schema-verify/SKILL.md` | Describes when/how to run `altimate-dbt schema-verify`, verdict interpretation, fix mappings, iron rules, manual fallback, and scope exclusions. |
| dbt-unit-tests guidance | `.opencode/skills/dbt-unit-tests/SKILL.md` | Adds Iron Rule requiring unit tests to include mock rows exercising failure modes for every SQL construct present. |

Sequence Diagram

sequenceDiagram
  participant Agent as LLM Agent
  participant Session as Session Loop
  participant Registry as ValidatorRegistry
  participant SchemaVal as DbtSchemaVerifyValidator
  participant TestVal as DbtTestsPassValidator
  participant DbtCLI as altimate-dbt

  Agent->>Session: signals clean stop
  Session->>Registry: runAll(ctx)
  Registry->>SchemaVal: appliesTo()/check(ctx)
  SchemaVal->>DbtCLI: schema-verify --model <name>
  DbtCLI-->>SchemaVal: verdict JSON
  Registry->>TestVal: appliesTo()/check(ctx)
  TestVal->>DbtCLI: test --model <name>
  DbtCLI-->>TestVal: test output
  Registry-->>Session: aggregated results
  alt Any validator fails and retries remain
    Session->>Agent: synthetic user message (reasons + fixHints)
    Agent->>Session: next turn (retry)
  else All pass or max retries reached
    Session->>Agent: allow completion
  end
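
The flow in the diagram above could look roughly like the sketch below. All type shapes and the `decide` helper are hypothetical illustrations; the real definitions live in `session/validators/types.ts`, `registry.ts`, and `prompt.ts` and may differ. The soft-pass on thrown errors is an assumption based on the review notes about "skipped/soft-pass results".

```typescript
// Hypothetical shapes for illustration only.
interface ValidatorContext {
  workingDirectory: string
  sessionStartMs: number
}
interface CheckOutcome {
  ok: boolean
  reason?: string
  fixHint?: string
}
interface ValidatorResult extends CheckOutcome {
  validator: string
}
interface Validator {
  name: string
  appliesTo(ctx: ValidatorContext): Promise<boolean>
  check(ctx: ValidatorContext): Promise<CheckOutcome>
}

// Run every applicable validator; a crashing validator is recorded as a soft pass.
async function runAll(validators: Validator[], ctx: ValidatorContext): Promise<ValidatorResult[]> {
  const results: ValidatorResult[] = []
  for (const v of validators) {
    try {
      if (!(await v.appliesTo(ctx))) continue
      const outcome = await v.check(ctx)
      results.push({ validator: v.name, ...outcome })
    } catch (err) {
      results.push({ validator: v.name, ok: true, reason: `skipped after error: ${String(err)}` })
    }
  }
  return results
}

type Decision = { kind: "complete" } | { kind: "retry"; message: string }

// Decide whether to allow completion or inject a synthetic user message.
function decide(results: ValidatorResult[], retriesUsed: number, maxRetries: number): Decision {
  const failures = results.filter((r) => !r.ok)
  if (failures.length === 0 || retriesUsed >= maxRetries) return { kind: "complete" }
  const lines = failures.map((f) => {
    const hint = f.fixHint ? ` (hint: ${f.fixHint})` : ""
    return `- ${f.validator}: ${f.reason ?? "failed"}${hint}`
  })
  return { kind: "retry", message: ["Validators reported problems:", ...lines].join("\n") }
}
```

Keeping the retry decision in a pure function separate from the async dispatch makes the bounded-retry rule easy to unit test without spawning `altimate-dbt`.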

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Suggested labels

needs-review:blocked

Poem

🐰 I hop through schemas, tests, and logs with care,

Counting columns, chasing mismatches in the air.
Skills auto-load when file paths align,
Validators guard the finish line.
ADE‑Bench hums — I nibble bugs, then cheerfully declare: it's fair!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description is comprehensive and well-structured, covering what changed and why, test verification, and includes proper checklist items. However, it is missing the required 'PINEAPPLE' marker at the top as specified in the template for AI-generated contributions. | Add 'PINEAPPLE' at the very top of the PR description before any other content, as required for AI-generated contributions per the repository template. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main changes: adding a dbt-tests-pass validator and hardening dbt-schema-verify. It directly reflects the primary features and improvements in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 20

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py`:
- Around line 227-229: The code appends self._model_name directly into the shell
command string (building variable command) which is unsafe; modify the command
construction in altimate_code_agent.py where command is built so that the model
name is safely escaped (e.g., wrap self._model_name with shlex.quote OR
preferably stop using a single shell-interpolated string and build command as a
list and pass to subprocess without shell=True), replacing the direct f" --model
{self._model_name}" interpolation to an escaped or argument-list approach to
prevent shell injection.

In `@benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh`:
- Around line 27-30: The script altimate-code-setup.sh currently falls back to
installing `@altimateai/altimate-code@latest` when a local tarball isn't staged;
change this to avoid non-deterministic installs by failing fast instead (or
pinning a known version). Modify the else branch that echoes "Local tarball not
staged; falling back to latest published" to either exit with a non-zero status
(e.g., echo an error and run exit 1) so the benchmark run stops when the tarball
is missing, or replace the npm install target with a fixed, documented version
string if you must auto-install; ensure the branch updates the message to
reflect the new behavior.

In `@benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh`:
- Around line 11-14: REPO_ROOT is computed by climbing too many directories
("$(cd "$SCRIPT_DIR/../../../../../.." && pwd)") which can resolve outside the
repo; change the traversal to the correct depth (e.g. "$(cd
"$SCRIPT_DIR/../../.." && pwd)") so REPO_ROOT points to the repository root and
update dependent vars PKG_DIR, DBT_TOOLS_DIR and SKILLS_DIR remain based on
REPO_ROOT; locate and replace the REPO_ROOT assignment in build-local-tarball.sh
to use the corrected relative path.

In `@benchmark/ade-bench/README.md`:
- Around line 9-22: The README.md's fenced directory-tree code block is missing
a language tag (MD040); update the opening fence for the tree block in README.md
to include a language identifier (e.g., change ``` to ```text) so the snippet is
properly tagged, leaving the content and closing fence unchanged.

In `@docs/docs/configure/skills.md`:
- Around line 68-72: Add a language identifier to the fenced code block that
contains the <auto_loaded_skill> snippet so markdownlint rule MD040 is
satisfied; update the triple-backtick fence that wraps the <auto_loaded_skill
name="<skill-name>"> ... </auto_loaded_skill> block to use a language such as
"xml" (e.g., ```xml) so the code block is properly annotated.
- Around line 65-67: The docs currently say auto-loaded skills are appended
after <available_skills>, but runtime prepends matched skill bodies into the
skills section; update the wording around session start / the skills section
(lines mentioning "<available_skills>") to state that every matched skill body
is prepended to the system prompt in the skills section (before the
<available_skills> listing) so the documentation matches the implementation.

In `@packages/opencode/src/altimate/validators/dbt-schema-verify.ts`:
- Around line 36-57: isDbtProject currently only returns a boolean so nested
dbt_project.yml detections don’t yield the actual project path; change
isDbtProject to return the resolved project root path (string) or null (or add a
new findDbtProjectRoot that does this) and update appliesTo and check to use
that resolved root instead of ctx.workingDirectory when scanning directories or
spawning altimate-dbt; locate references in the same module (isDbtProject /
appliesTo / check) and ensure all file system operations and child_process
spawns use the discovered root path so nested projects run in the correct cwd.
- Around line 17-227: Refactor isDbtProject, modelsModifiedSince and
runSchemaVerify to use Effect-based services: replace direct fs/promises calls
with the FileSystem.FileSystem service (use its methods inside Effect.gen or
Effect.fn tracers) and replace child_process.spawn usage in runSchemaVerify with
ChildProcessSpawner.ChildProcessSpawner to spawn processes and ChildProcess.make
to manage the child handle; keep parseSchemaVerifyOutput as-is but call it from
the new Effect-based runSchemaVerify. Specifically, convert top-level async
functions to Effect.gen(function* { ... }) (or Effect.fn("Domain.method", ...)
wrappers) so file ops (stat, readdir) use FileSystem API and process exec uses
ChildProcessSpawner.spawn/make, propagate debug logging through the traced
Effect.fn calls, and ensure return types remain Promise-like by returning
Effect.toPromise where the callsites expect a Promise.
- Around line 268-291: The code currently drops null returns from
runSchemaVerify and returns ok: true whenever mismatches.length === 0, which
lets failures slip through; change the loop that calls runSchemaVerify (using
touched, modelNameFromPath, runSchemaVerify, ctx.workingDirectory) so that null
or errored runs are recorded in results (e.g., push a SchemaVerifyOutput entry
marking error/verdict appropriately instead of skipping), then update the final
check so ok: true is returned only when mismatches.length === 0 AND errored ===
0 AND results.length === touched.length (or otherwise ensure every touched model
was verified); keep using the existing symbols (results, mismatches, noSpec,
matches, errored) when computing the response.
- Around line 84-85: The code currently splits paths using "/" which fails on
Windows; update the checks to be OS-independent by using Node's path utilities:
normalize paths and split on path.sep (replace occurrences like full.split("/")
and p.split("/") with full = path.normalize(full) then full.split(path.sep), and
inside modelNameFromPath normalize p and split on path.sep or better use
path.parse/path.dirname/path.basename to extract the model directory/name),
ensuring the logic that detects "models" (where found.push(full) happens) and
the modelNameFromPath function work correctly across platforms.

In `@packages/opencode/src/altimate/validators/dbt-tests-pass.ts`:
- Around line 155-166: runDbtTest currently resolves null on spawn() errors
which lets the validator treat missing/failed altimate-dbt startup as a pass;
change runDbtTest so the child "error" handler produces a failing TestRunOutput
(include error.message/text) instead of resolving null, and adjust the caller
that currently drops null results (the validation logic handling runDbtTest
results) to treat that TestRunOutput as a failure (ok:false, checked:0 or
appropriate error metadata) so startup failures fail the validation rather than
being ignored.
- Around line 94-95: The code uses full.split("/") which fails on Windows;
update the checks in packages/opencode/src/altimate/validators/dbt-tests-pass.ts
to use path helpers: import path from "path", normalize the path
(path.normalize(full)) and split by path.sep (or use path.parse/path.basename or
path.relative and then split by path.sep) when checking for the "models" segment
and when deriving the model name in modelNameFromPath; replace all occurrences
of full.split("/") (including the other block around lines 109-111) so the logic
works cross-platform.
- Around line 49-64: The validator currently uses isDbtProject(cwd) which only
returns a boolean and treats nested dbt_project.yml files as "supported" while
later scans and altimate-dbt test invocations still run from the parent cwd;
change the behavior so the actual project directory is propagated: refactor
isDbtProject into a finder that returns the path of the dbt project directory
(e.g., return the matched subdirectory path or cwd when dbt_project.yml is
directly present) or add a new findDbtProjectDir(cwd): string | null helper,
then update appliesTo(...) and any places that call isDbtProject (and the code
that invokes altimate-dbt test) to use that returned projectDir when running
scans/commands instead of always using the parent cwd (ensure calls to
altimate-dbt test use the resolved projectDir).

In `@packages/opencode/src/session/prompt.ts`:
- Around line 1118-1124: The validator context vCtx is using process.cwd() which
can point outside the project; update vCtx to pass the session's project path
(use the session instance's directory/worktree property used elsewhere in the
loop, e.g. Instance.directory or Instance.worktree) instead of process.cwd() so
validators scan the correct tree; locate the vCtx construction in
session/prompt.ts and replace workingDirectory: process.cwd() with the
appropriate session instance property referenced in the surrounding code so
validators receive the project directory.

In `@packages/opencode/src/session/system.ts`:
- Around line 147-149: The code currently logs a warning in the catch inside
applyPaths (autoLoadLog.warn("applyPaths glob scan failed", { skill: skill.name,
err })) but a later catch (the one that converts scan errors to []) swallows the
error so that warning never runs; update the catch that currently converts
errors to [] (inside applyPaths or its helper) to log the error using
autoLoadLog.warn with skill.name and err (or rethrow after logging) instead of
silently returning [] so the warning path runs and the error is not hidden.
Ensure you reference and update the catch handling in applyPaths (and any helper
that returns [] on error) to either log via autoLoadLog.warn(...) or rethrow the
error.
- Line 109: The code injects skill.name directly into an XML attribute in the
parts.push call that builds `<auto_loaded_skill name="...">`; escape XML
attribute characters instead to prevent breaking the prompt (convert &, <, >, "
and ' to entities). Add or reuse a helper like escapeXmlAttr(name: string) and
call it where the code currently uses skill.name so parts.push uses the escaped
value (e.g., parts.push(`<auto_loaded_skill
name="${escapeXmlAttr(skill.name)}">`)). Ensure the helper is applied
consistently for any other XML attribute insertions in the same module.

In `@packages/opencode/src/session/validators/registry.ts`:
- Around line 14-71: ValidatorRegistry.runAll uses raw async/Promise calls and
must be converted to return an Effect so tracing via Effect.gen/Effect.fn is
preserved; update the Validator interface so appliesTo and check return
Effect<...> instead of Promise, change runAll (and its signature) to build and
return an Effect that iterates validators.values() inside an Effect.gen,
invoking v.appliesTo and v.check via
Effect.fn("Validator.appliesTo")/Effect.fn("Validator.check") (or wrap calls
with Effect.attempt or Effect.try) and translate thrown errors into the same
skipped/soft-pass results inside the Effect; finally, update callers (e.g., the
caller in packages/opencode/src/session/prompt.ts) to execute/run the returned
Effect rather than awaiting Promises so existing behavior and logging of
validator errors is preserved.

In `@research/kimi-k26-ade-bench-2026-05-10/findings.md`:
- Around line 511-513: The summary line "Distinct trials that invoked any skill:
**18 / 81**" is using the wrong denominator; update that fraction to use the
consistent 78 graded trials used elsewhere (i.e., change "**18 / 81**" to "**18
/ 78**" in the findings.md summary) and scan nearby summary lines like "Total
skill invocations: **19**" to ensure all denominators match the canonical 78.
- Around line 87-92: The fenced code blocks in findings.md are missing language
identifiers (e.g., the triple-backtick lines are just "```"); update each
untyped fenced block (the block around the checklist shown and the other block
at lines referenced in the review) to include a language tag such as ```text or
```md (or ```bash if it's shell commands) so they comply with MD040 and render
consistently; locate the untyped triple-backtick fences in the file and replace
them with language-specified fences for both occurrences.
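
One of the findings above asks that `ok: true` be returned only when every touched model was actually verified. A minimal sketch of that aggregation rule follows, assuming hypothetical `ModelVerdict` and `Summary` shapes and an added `"error"` verdict for failed runs; the validator's real result types are not shown in this thread.

```typescript
// Hypothetical shapes; "error" stands in for a run that returned null or failed to spawn.
type Verdict = "match" | "mismatch" | "no-spec" | "error"

interface ModelVerdict {
  model: string
  verdict: Verdict
}

interface Summary {
  ok: boolean
  mismatches: string[]
  errored: string[]
  unverified: string[]
}

// ok is true only if no model mismatched, none errored, and none was skipped.
function summarize(touched: string[], results: ModelVerdict[]): Summary {
  const byModel = new Map<string, Verdict>()
  for (const r of results) byModel.set(r.model, r.verdict)

  const mismatches = touched.filter((m) => byModel.get(m) === "mismatch")
  const errored = touched.filter((m) => byModel.get(m) === "error")
  const unverified = touched.filter((m) => !byModel.has(m))

  return {
    ok: mismatches.length === 0 && errored.length === 0 && unverified.length === 0,
    mismatches,
    errored,
    unverified,
  }
}
```

Under this rule a model with a `no-spec` verdict does not block completion, while a model with no recorded verdict does.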

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 30137a2f-09d3-4752-8379-35d41acc6a61

📥 Commits

Reviewing files that changed from the base of the PR and between 56259a3 and 7ca5a36.

📒 Files selected for processing (28)
  • .opencode/skills/dbt-develop/SKILL.md
  • .opencode/skills/dbt-schema-verify/SKILL.md
  • .opencode/skills/dbt-unit-tests/SKILL.md
  • benchmark/ade-bench/README.md
  • benchmark/ade-bench/altimate_code_agent/__init__.py
  • benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh
  • benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py
  • benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh
  • benchmark/ade-bench/patches/01-agent_name.py.patch
  • benchmark/ade-bench/patches/02-agent_factory.py.patch
  • benchmark/ade-bench/patches/03-installed_agents_init.py.patch
  • benchmark/ade-bench/patches/04-agent_setup.py.patch
  • docs/docs/configure/skills.md
  • packages/dbt-tools/src/commands/build.ts
  • packages/dbt-tools/src/commands/schema-verify.ts
  • packages/dbt-tools/src/index.ts
  • packages/dbt-tools/test/build.test.ts
  • packages/dbt-tools/test/schema-verify.test.ts
  • packages/opencode/src/altimate/validators/dbt-schema-verify.ts
  • packages/opencode/src/altimate/validators/dbt-tests-pass.ts
  • packages/opencode/src/altimate/validators/index.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/session/system.ts
  • packages/opencode/src/session/validators/registry.ts
  • packages/opencode/src/session/validators/types.ts
  • packages/opencode/src/skill/skill.ts
  • research/kimi-k26-ade-bench-2026-05-10/README.md
  • research/kimi-k26-ade-bench-2026-05-10/findings.md

Comment on lines +227 to +229
if self._model_name:
    command += f" --model {self._model_name}"
command += f" --max-turns 80 {escaped_prompt}"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Escape self._model_name before shell interpolation.

On Line 228, self._model_name is appended directly into the shell command. A model string with shell metacharacters can alter command execution.

Suggested fix
         if self._model_name:
-            command += f" --model {self._model_name}"
+            command += f" --model {shlex.quote(self._model_name)}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 if self._model_name:
-    command += f" --model {self._model_name}"
+    command += f" --model {shlex.quote(self._model_name)}"
 command += f" --max-turns 80 {escaped_prompt}"

Comment on lines +27 to +30
else
  echo "Local tarball not staged; falling back to latest published"
  npm install -g --no-audit --no-fund @altimateai/altimate-code@latest
fi

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid @latest fallback in benchmark setup.

Line 29 installs @latest, which makes runs non-deterministic across time. For benchmark reproducibility, fail fast (or pin an explicit version) when the staged tarball is missing.

Suggested fix
 else
-  echo "Local tarball not staged; falling back to latest published"
-  npm install -g --no-audit --no-fund @altimateai/altimate-code@latest
+  echo "Local tarball not staged; refusing @latest for reproducibility" >&2
+  exit 1
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 else
-  echo "Local tarball not staged; falling back to latest published"
-  npm install -g --no-audit --no-fund @altimateai/altimate-code@latest
+  echo "Local tarball not staged; refusing @latest for reproducibility" >&2
+  exit 1
 fi

Comment on lines +11 to +14
REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)"
PKG_DIR="$REPO_ROOT/packages/opencode"
DBT_TOOLS_DIR="$REPO_ROOT/packages/dbt-tools"
SKILLS_DIR="$REPO_ROOT/.opencode/skills"

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix repository root traversal depth.

Line 11 climbs too many directories and can resolve REPO_ROOT outside the repository, which then breaks all downstream file checks and copies.

Suggested fix
-REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../../.." && pwd)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../../.." && pwd)"
 PKG_DIR="$REPO_ROOT/packages/opencode"
 DBT_TOOLS_DIR="$REPO_ROOT/packages/dbt-tools"
 SKILLS_DIR="$REPO_ROOT/.opencode/skills"

Comment on lines +9 to +22
```
benchmark/ade-bench/
├── README.md                              ← you are here
├── altimate_code_agent/                   ← drop-in agent module for ade-bench
│   ├── __init__.py
│   ├── altimate_code_agent.py             ← the AltimateCodeAgent class
│   ├── altimate-code-setup.sh             ← installs altimate-code inside the trial container
│   └── build-local-tarball.sh             ← builds the linux/x64+arm64 tarball from source
└── patches/                               ← 4 small patches to upstream ade-bench
    ├── 01-agent_name.py.patch
    ├── 02-agent_factory.py.patch
    ├── 03-installed_agents_init.py.patch
    └── 04-agent_setup.py.patch
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced tree block (MD040).

The directory tree code fence is missing a language tag, which triggers markdownlint (MD040).

Proposed fix
-```
+```text
 benchmark/ade-bench/
 ├── README.md                              ← you are here
 ├── altimate_code_agent/                   ← drop-in agent module for ade-bench
 │   ├── __init__.py
 │   ├── altimate_code_agent.py             ← the AltimateCodeAgent class
 │   ├── altimate-code-setup.sh             ← installs altimate-code inside the trial container
 │   └── build-local-tarball.sh             ← builds the linux/x64+arm64 tarball from source
 └── patches/                               ← 4 small patches to upstream ade-bench
     ├── 01-agent_name.py.patch
     ├── 02-agent_factory.py.patch
     ├── 03-installed_agents_init.py.patch
     └── 04-agent_setup.py.patch

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +65 to +67
At session start, after the standard `<available_skills>` listing, every matched
skill body is appended to the system prompt under:


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Docs order is reversed vs implementation.

Lines 65-67 say auto-loaded skills are appended after <available_skills>, but implementation prepends them first in the skills section. Please align wording with runtime behavior.


)
for (const skill of autoLoaded) {
parts.push("")
parts.push(`<auto_loaded_skill name="${skill.name}">`)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Escape skill names before embedding in the XML attribute.

On Line 109, skill.name is injected raw into name="...". A quote or < in frontmatter can break prompt structure and alter the injected block.

Suggested fix
+  function escapeXmlAttr(value: string): string {
+    return value
+      .replaceAll("&", "&amp;")
+      .replaceAll('"', "&quot;")
+      .replaceAll("<", "&lt;")
+      .replaceAll(">", "&gt;")
+  }
+
   export namespace SystemPrompt {
@@
-        parts.push(`<auto_loaded_skill name="${skill.name}">`)
+        parts.push(`<auto_loaded_skill name="${escapeXmlAttr(skill.name)}">`)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- parts.push(`<auto_loaded_skill name="${skill.name}">`)
+ function escapeXmlAttr(value: string): string {
+   return value
+     .replaceAll("&", "&amp;")
+     .replaceAll('"', "&quot;")
+     .replaceAll("<", "&lt;")
+     .replaceAll(">", "&gt;")
+ }
+ export namespace SystemPrompt {
+   // ... existing code ...
+   parts.push(`<auto_loaded_skill name="${escapeXmlAttr(skill.name)}">`)
+   // ... rest of existing code ...
+ }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/system.ts` at line 109, The code injects
skill.name directly into an XML attribute in the parts.push call that builds
`<auto_loaded_skill name="...">`; escape XML attribute characters instead to
prevent breaking the prompt (convert &, <, >, " and ' to entities). Add or reuse
a helper like escapeXmlAttr(name: string) and call it where the code currently
uses skill.name so parts.push uses the escaped value (e.g.,
parts.push(`<auto_loaded_skill name="${escapeXmlAttr(skill.name)}">`)). Ensure
the helper is applied consistently for any other XML attribute insertions in the
same module.
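
A minimal sketch of the helper described above, extended to cover the apostrophe that the prompt mentions but the suggested diff omits. The name `escapeXmlAttr` follows the suggested fix; `renderAutoLoadedTag` is a hypothetical wrapper used only for illustration and is not part of the PR.

```typescript
// Sketch only: escape the five characters that are unsafe inside an XML attribute value.
// `&` is replaced first so the entities produced by later steps are not double-escaped.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
}

// Hypothetical wrapper showing where the helper would be applied.
function renderAutoLoadedTag(skillName: string): string {
  return `<auto_loaded_skill name="${escapeXmlAttr(skillName)}">`
}

console.log(renderAutoLoadedTag('dbt "dev" <x> & co'))
```

Using global regex replaces instead of `replaceAll` is only a choice to keep the sketch independent of the compile target; either form should behave the same here.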

Comment on lines +147 to +149
} catch (err) {
  autoLoadLog.warn("applyPaths glob scan failed", { skill: skill.name, err })
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t swallow glob-scan failures if you intend to warn on them.

Line 171 catches and converts scan errors to [], so the warning path in Lines 147-149 never runs.

Suggested fix
-      }).catch(() => [] as string[])
+      })

Also applies to: 171-171

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/system.ts` around lines 147 - 149, The code
currently logs a warning in the catch inside applyPaths
(autoLoadLog.warn("applyPaths glob scan failed", { skill: skill.name, err }))
but a later catch (the one that converts scan errors to []) swallows the error
so that warning never runs; update the catch that currently converts errors to
[] (inside applyPaths or its helper) to log the error using autoLoadLog.warn
with skill.name and err (or rethrow after logging) instead of silently returning
[] so the warning path runs and the error is not hidden. Ensure you reference
and update the catch handling in applyPaths (and any helper that returns [] on
error) to either log via autoLoadLog.warn(...) or rethrow the error.
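
The interaction described above can be reproduced in isolation. This is a sketch under stated assumptions: `scan` is a hypothetical stand-in for the real glob call and always fails, and `warnings` stands in for `autoLoadLog.warn`.

```typescript
// Sketch only: `warnings` replaces the real logger so the effect is observable.
const warnings: string[] = []

// Hypothetical stand-in for the glob scan; it always rejects.
async function scan(pattern: string): Promise<string[]> {
  throw new Error(`cannot scan ${pattern}`)
}

// Current shape: the inner .catch() turns the rejection into [], so the outer
// catch block (and its warning) is never reached for scan failures.
async function scanSwallowing(pattern: string): Promise<string[]> {
  try {
    return await scan(pattern).catch(() => [] as string[])
  } catch (err) {
    warnings.push(`swallowing: ${String(err)}`)
    return []
  }
}

// Suggested shape: without the inner .catch(), the rejection reaches the outer
// catch, which records the warning and still returns an empty result.
async function scanWarning(pattern: string): Promise<string[]> {
  try {
    return await scan(pattern)
  } catch (err) {
    warnings.push(`warning: ${String(err)}`)
    return []
  }
}
```

Both variants return an empty list on failure; only the second leaves a trace of what went wrong.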

Comment on lines +14 to +71
export const ValidatorRegistry = {
  /** Register a validator. Overwrites any prior registration with the same name. */
  register(v: Validator): void {
    validators.set(v.name, v)
  },

  /** Remove a validator. Mostly useful for tests. */
  unregister(name: string): void {
    validators.delete(name)
  },

  /** Snapshot the current list of registered validators. */
  list(): readonly Validator[] {
    return Array.from(validators.values())
  },

  /** Reset all registrations. Tests only. */
  clear(): void {
    validators.clear()
  },

  /**
   * Run all validators that apply to the given context. Returns one entry
   * per validator that was relevant (skipped validators are NOT in the
   * result list). Validators that themselves throw are caught and logged;
   * the framework converts them to a {ok: true} skip so a buggy validator
   * cannot brick the agent loop.
   */
  async runAll(ctx: ValidatorContext): Promise<Array<{ validator: Validator; result: ValidatorResult }>> {
    const out: Array<{ validator: Validator; result: ValidatorResult }> = []
    for (const v of validators.values()) {
      let applies = false
      try {
        applies = await v.appliesTo(ctx)
      } catch {
        // appliesTo() throwing is a validator bug; skip rather than block agent.
        continue
      }
      if (!applies) continue
      try {
        const result = await v.check(ctx)
        out.push({ validator: v, result })
      } catch (e) {
        // check() throwing is also a validator bug; record as a soft pass so
        // the agent isn't stuck behind a broken validator. The thrown error
        // is logged by the caller.
        out.push({
          validator: v,
          result: {
            ok: true,
            details: { error: e instanceof Error ? e.message : String(e), skipped_due_to_validator_error: true },
          },
        })
      }
    }
    return out
  },
}

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and print the reviewed file
FILE="packages/opencode/src/session/validators/registry.ts"
echo "=== File: $FILE ==="
wc -l "$FILE"
sed -n '1,220p' "$FILE" | nl -ba

# Check for Effect usage and any Promise-based exports in the file
echo
echo "=== Search within file for Effect / Promise ==="
rg -n "Effect\.|Promise<|async " "$FILE" || true

# Find usages of ValidatorRegistry.runAll (and register/unregister/list) across the repo
echo
echo "=== Usages of ValidatorRegistry.runAll ==="
rg -n "ValidatorRegistry\.runAll" -S . || true

echo
echo "=== Usages of ValidatorRegistry.register/unregister/list/clear ==="
rg -n "ValidatorRegistry\.(register|unregister|list|clear)" -S . || true

# Find the Validator interface/type to see expected return types for appliesTo/check
echo
echo "=== Find Validator / ValidatorContext / ValidatorResult definitions ==="
rg -n "type Validator|interface Validator|ValidatorContext|ValidatorResult" packages/opencode/src/session/validators -S || true

Repository: AltimateAI/altimate-code

Length of output: 252


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="packages/opencode/src/session/validators/registry.ts"
echo "=== File: $FILE ==="
wc -l "$FILE"
echo "--- content (with line numbers) ---"
awk '{printf "%5d\t%s\n", NR, $0}' "$FILE" | sed -n '1,200p'

echo
echo "=== Search within file for Effect / Promise ==="
rg -n "Effect\.|Promise<|async " "$FILE" || true

echo
echo "=== Usages of ValidatorRegistry.runAll ==="
rg -n "ValidatorRegistry\.runAll" -S packages/opencode || rg -n "ValidatorRegistry\.runAll" -S . || true

echo
echo "=== Usages of ValidatorRegistry.register/unregister/list/clear ==="
rg -n "ValidatorRegistry\.(register|unregister|list|clear)" -S packages/opencode || rg -n "ValidatorRegistry\.(register|unregister|list|clear)" -S . || true

echo
echo "=== Find Validator / ValidatorContext / ValidatorResult definitions ==="
rg -n "interface Validator|type Validator|ValidatorContext|ValidatorResult" packages/opencode/src/session/validators -S || true

Repository: AltimateAI/altimate-code

Length of output: 4941


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="packages/opencode/src/session/prompt.ts"
echo "=== File: $FILE ==="
wc -l "$FILE"
echo "--- snippet around ValidatorRegistry.runAll usage ---"
# Use sed to avoid huge output
sed -n '1060,1160p' "$FILE" | awk '{printf "%5d\t%s\n", NR+1060, $0}'

echo
echo "=== Search for surrounding Effect usage in prompt.ts ---"
rg -n "Effect\.gen|Effect\.fn|yield\*|runAll\(vCtx\)|ValidatorRegistry\.runAll" "$FILE" || true

Repository: AltimateAI/altimate-code

Length of output: 5417


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="packages/opencode/src/session/prompt.ts"

# Show a bit more context above and below the call site
sed -n '980,1165p' "$FILE" | awk '{print NR+980-1 "\t" $0}' | sed -n '1,220p'

echo
echo "=== Quick scan for Effect usage in prompt.ts ==="
rg -n "Effect\.gen|Effect\.fnUntraced|Effect\.fn\(|yield\*|runPromise|runFork" "$FILE" || true

echo
echo "=== Scan for async/Promise-only patterns near the call site ==="
sed -n '1040,1160p' "$FILE" | rg -n "async |await |Promise<" || true

Repository: AltimateAI/altimate-code

Length of output: 9636


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the altimate validators index
IDX="packages/opencode/src/altimate/validators/index.ts"
echo "=== $IDX ==="
wc -l "$IDX"
sed -n '1,200p' "$IDX" | nl -ba 2>/dev/null || sed -n '1,200p' "$IDX"

echo
echo "=== Find validator implementation files referenced in that index ==="
rg -n "register\\(([^)]+)\\)" "$IDX" || true

echo
echo "=== Search for implementations of appliesTo/check ==="
rg -n "appliesTo\\(|check\\(" packages/opencode/src/altimate/validators -S || true

echo
echo "=== Show the registered validator(s) files (best-effort) ==="
# Extract identifiers used in register(...) and locate their definition
IDS=$(rg -n "ValidatorRegistry\\.register\\(([^)]+)\\)" "$IDX" -o --replace '$1' | tr -d ' ' | tr '\n' ' ' | sed 's/;*$//')
echo "Identifiers: $IDS"

# Use fd to find candidate TS files containing each identifier
for id in $(echo "$IDS" | tr ' ' '\n' | sed '/^$/d'); do
  echo
  echo "--- searching for $id ---"
  fd -e ts -e tsx -t f ".*" packages/opencode/src/altimate/validators >/dev/null 2>&1 || true
  rg -n "\\b$id\\b" packages/opencode/src/altimate/validators -S --hidden || true
done

# Finally, inspect the specific types
TYPES="packages/opencode/src/session/validators/types.ts"
echo
echo "=== $TYPES (relevant parts) ==="
wc -l "$TYPES"
sed -n '1,220p' "$TYPES" | awk 'NR>=35 && NR<=160 {print NR "\t" $0}'

Repository: AltimateAI/altimate-code

Length of output: 5592


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect prompt.ts imports / top-level for Effect usage
PROMPT="packages/opencode/src/session/prompt.ts"
echo "=== Top of $PROMPT ==="
sed -n '1,120p' "$PROMPT" | awk '{print NR "\t" $0}'

echo
echo "=== Effect usage scan in prompt.ts ==="
rg -n "from \"effect|Effect\\.|effect/" "$PROMPT" || true
rg -n "\\bEffect\\." "$PROMPT" || true

# 2) Inspect validator implementations around appliesTo/check
for V in \
  packages/opencode/src/altimate/validators/dbt-schema-verify.ts \
  packages/opencode/src/altimate/validators/dbt-tests-pass.ts
do
  echo
  echo "=== $V (appliesTo/check sections) ==="
  wc -l "$V"
  # show where appliesTo/check are defined
  rg -n "async appliesTo\\(|async check\\(" "$V"
  # show a window around the first occurrence
  START=$(rg -n "async appliesTo\\(|async check\\(" "$V" | head -n1 | cut -d: -f1)
  END=$((START+120))
  sed -n "${START},${END}p" "$V" | awk '{print NR+START-1 "\t" $0}'
  
  echo
  echo "=== Effect/FileSystem usage scan in $V ==="
  rg -n "Effect\\.|FileSystem\\.|HttpClient\\.|ChildProcess" "$V" || true
done

Repository: AltimateAI/altimate-code

Length of output: 10594


Effectify ValidatorRegistry.runAll (and validator dispatch) to preserve traced Effect composition

packages/opencode/src/session/validators/registry.ts implements ValidatorRegistry.runAll as raw async/Promise control flow (await v.appliesTo / await v.check), which bypasses the Effect.gen(...) + Effect.fn("Domain.method") traced Effect composition guidelines for packages/opencode/**/*.{ts,tsx}. Refactor runAll (and the Validator appliesTo/check contracts) to return Effect so the caller (packages/opencode/src/session/prompt.ts) can execute it within the Effect runtime.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/validators/registry.ts` around lines 14 - 71,
ValidatorRegistry.runAll uses raw async/Promise calls and must be converted to
return an Effect so tracing via Effect.gen/Effect.fn is preserved; update the
Validator interface so appliesTo and check return Effect<...> instead of
Promise, change runAll (and its signature) to build and return an Effect that
iterates validators.values() inside an Effect.gen, invoking v.appliesTo and
v.check via Effect.fn("Validator.appliesTo")/Effect.fn("Validator.check") (or
wrap calls with Effect.try or Effect.tryPromise) and translate thrown errors into
the same skipped/soft-pass results inside the Effect; finally, update callers
(e.g., the caller in packages/opencode/src/session/prompt.ts) to execute/run the
returned Effect rather than awaiting Promises so existing behavior and logging
of validator errors is preserved.

Comment on lines +87 to +92
```
[pending] Add position_descriptions to f1_dataset.yml sources
[pending] Create src_<model>.sql views in models/src/ pointing to source tables
[pending] Update staging models to reference src_ models instead of raw tables
[pending] Run dbt build to verify everything compiles and builds successfully
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add languages to fenced code blocks for markdownlint compliance.

Both fenced blocks are untyped; add a language (text, md, or bash) to satisfy MD040 and improve rendering consistency.

Also applies to: 209-216

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 87-87: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@research/kimi-k26-ade-bench-2026-05-10/findings.md` around lines 87 - 92, The
fenced code blocks in findings.md are missing language identifiers (e.g., the
triple-backtick lines are just "```"); update each untyped fenced block (the
block around the checklist shown and the other block at lines referenced in the
review) to include a language tag such as ```text or ```md (or ```bash if it's
shell commands) so they comply with MD040 and render consistently; locate the
untyped triple-backtick fences in the file and replace them with
language-specified fences for both occurrences.

Comment on lines +511 to +513
Total skill invocations: **19**
Distinct trials that invoked any skill: **18 / 81**

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix inconsistent trial denominator in skill-invocation summary.

Line 512 reports 18 / 81, but this report consistently uses 78 graded trials elsewhere, so this denominator likely needs correction.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@research/kimi-k26-ade-bench-2026-05-10/findings.md` around lines 511 - 513,
The summary line "Distinct trials that invoked any skill: **18 / 81**" is using
the wrong denominator; update that fraction to use the consistent 78 graded
trials used elsewhere (i.e., change "**18 / 81**" to "**18 / 78**" in the
findings.md summary) and scan nearby summary lines like "Total skill
invocations: **19**" to ensure all denominators match the canonical 78.

@cubic-dev-ai cubic-dev-ai Bot left a comment

18 issues found across 28 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".opencode/skills/dbt-unit-tests/SKILL.md">

<violation number="1" location=".opencode/skills/dbt-unit-tests/SKILL.md:43">
P2: The window-function guidance requires testing an "empty partition," but that case is not directly representable in SQL window output. This can mislead test authors into chasing an untestable condition.</violation>
</file>

<file name="packages/opencode/src/session/validators/registry.ts">

<violation number="1" location="packages/opencode/src/session/validators/registry.ts:48">
P2: `appliesTo()` exceptions are swallowed without any surfaced error details, so broken validators can be silently skipped and become hard to diagnose.</violation>
</file>

<file name=".opencode/skills/dbt-schema-verify/SKILL.md">

<violation number="1" location=".opencode/skills/dbt-schema-verify/SKILL.md:118">
P2: The fallback verification command is incomplete (and brittle): it only searches `schema.yml`, so `_models.yml` declarations can be missed during manual schema checks.</violation>
</file>

<file name="benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh">

<violation number="1" location="benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh:92">
P2: The config file contains API secrets but is written without restrictive permissions, which can leave credentials world-readable under default umask.</violation>
</file>

<file name="benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh">

<violation number="1" location="benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh:11">
P1: `REPO_ROOT` is derived with too many `..` segments, so the script can resolve paths outside the repository and fail to find build artifacts.</violation>
</file>

<file name="research/kimi-k26-ade-bench-2026-05-10/findings.md">

<violation number="1" location="research/kimi-k26-ade-bench-2026-05-10/findings.md:111">
P3: The “step-to-step gaps” row uses a start-to-start interval, which conflicts with the glossary’s gap definition (finish-to-next-start) and can mislead interpretation of timing breakdowns.</violation>

<violation number="2" location="research/kimi-k26-ade-bench-2026-05-10/findings.md:236">
P2: The f1011 failure description contradicts the appendix data (`check_option_b` is described as passed, but appendix rows indicate it failed).</violation>

<violation number="3" location="research/kimi-k26-ade-bench-2026-05-10/findings.md:250">
P3: The lead-in failure count is inconsistent with the per-domain breakdown in the same sentence (listed counts sum to 21, not 19).</violation>
</file>

<file name="packages/opencode/src/session/system.ts">

<violation number="1" location="packages/opencode/src/session/system.ts:109">
P2: Escape `skill.name` before interpolating it into the `name="..."` attribute to avoid malformed prompt XML and prompt-structure injection.</violation>
</file>

<file name="benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py">

<violation number="1" location="benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py:134">
P2: `num_turns` is undercounted because it only increments on `reason == "stop"` instead of every `step_finish` event.</violation>

<violation number="2" location="benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py:228">
P1: `--model` value is not shell-escaped, which allows malformed or injected shell arguments from model configuration.</violation>
</file>

<file name="packages/dbt-tools/src/commands/build.ts">

<violation number="1" location="packages/dbt-tools/src/commands/build.ts:76">
P1: Project-wide schema verification uses non-unique model selectors (`name`), so duplicate model names can be verified against the wrong manifest node.</violation>
</file>

<file name="docs/docs/configure/skills.md">

<violation number="1" location="docs/docs/configure/skills.md:65">
P2: The docs describe auto-loaded skills as being appended after `<available_skills>`, but the implementation places them before that block. This mismatch can mislead users debugging prompt composition.</violation>
</file>

<file name=".opencode/skills/dbt-develop/SKILL.md">

<violation number="1" location=".opencode/skills/dbt-develop/SKILL.md:278">
P2: This guidance incorrectly describes `CONCAT_WS` behavior/support for Snowflake and BigQuery. It can lead to invalid BigQuery SQL and wrong NULL-handling assumptions in Snowflake.</violation>
</file>

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)"

P1: REPO_ROOT is derived with too many .. segments, so the script can resolve paths outside the repository and fail to find build artifacts.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh, line 11:

<comment>`REPO_ROOT` is derived with too many `..` segments, so the script can resolve paths outside the repository and fail to find build artifacts.</comment>

<file context>
@@ -0,0 +1,91 @@
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)"
+PKG_DIR="$REPO_ROOT/packages/opencode"
+DBT_TOOLS_DIR="$REPO_ROOT/packages/dbt-tools"
</file context>

Comment thread packages/opencode/src/altimate/validators/dbt-schema-verify.ts
command = f"echo 'AGENT RESPONSE: ' && altimate-code run --format json --yolo"

if self._model_name:
    command += f" --model {self._model_name}"

P1: --model value is not shell-escaped, which allows malformed or injected shell arguments from model configuration.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py, line 228:

<comment>`--model` value is not shell-escaped, which allows malformed or injected shell arguments from model configuration.</comment>

<file context>
@@ -0,0 +1,264 @@
+        command = f"echo 'AGENT RESPONSE: ' && altimate-code run --format json --yolo"
+
+        if self._model_name:
+            command += f" --model {self._model_name}"
+        command += f" --max-turns 80 {escaped_prompt}"
+
</file context>
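
The quoting technique at issue can be sketched as follows. The file under review is Python, where the standard library's `shlex.quote` does this job; the TypeScript below only illustrates the same POSIX single-quote rule. `shellQuote` and `buildCommand` are hypothetical names, and the command string is copied from the file context above.

```typescript
// Sketch only: POSIX-style single-quote escaping for one shell argument.
function shellQuote(arg: string): string {
  if (arg === "") return "''"
  // Arguments made only of these characters need no quoting.
  if (/^[A-Za-z0-9_@%+=:,./-]+$/.test(arg)) return arg
  // Close the quote, emit an escaped apostrophe, reopen the quote.
  return "'" + arg.replace(/'/g, "'\\''") + "'"
}

// Hypothetical mirror of the command construction in the file context above.
// `escapedPrompt` is assumed to be quoted already, as in the original code.
function buildCommand(modelName: string | undefined, escapedPrompt: string): string {
  let command = "echo 'AGENT RESPONSE: ' && altimate-code run --format json --yolo"
  if (modelName) command += ` --model ${shellQuote(modelName)}`
  command += ` --max-turns 80 ${escapedPrompt}`
  return command
}
```

With this in place, a model value such as `x; rm -rf /` reaches the shell as one quoted argument rather than as a second command.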

Comment thread packages/opencode/src/altimate/validators/dbt-tests-pass.ts
continue
}
try {
const v = await schemaVerify(adapter, ["--model", name])

P1: Project-wide schema verification uses non-unique model selectors (name), so duplicate model names can be verified against the wrong manifest node.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/dbt-tools/src/commands/build.ts, line 76:

<comment>Project-wide schema verification uses non-unique model selectors (`name`), so duplicate model names can be verified against the wrong manifest node.</comment>

<file context>
@@ -36,7 +46,77 @@ export async function test(adapter: DBTProjectIntegrationAdapter, args: string[]
+        continue
+      }
+      try {
+        const v = await schemaVerify(adapter, ["--model", name])
+        if ("error" in v) {
+          errored.push({ model: name, error: String((v as { error: unknown }).error) })
</file context>
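
One way to detect the ambiguity before verifying is to group manifest nodes by bare name and flag any name that resolves to more than one node. This is a sketch, assuming a trimmed-down node shape with the `unique_id`, `resource_type`, and `name` fields from dbt's `manifest.json`; `findAmbiguousModelNames` is a hypothetical helper, not something in `build.ts`.

```typescript
// Sketch only: a trimmed-down view of manifest nodes; real manifests carry many more fields.
interface ManifestNode {
  unique_id: string // e.g. "model.my_project.orders"
  resource_type: string
  name: string
}

// Group model nodes by bare name and return the names that map to more than one unique_id.
function findAmbiguousModelNames(nodes: ManifestNode[]): Map<string, string[]> {
  const byName = new Map<string, string[]>()
  for (const node of nodes) {
    if (node.resource_type !== "model") continue
    const ids = byName.get(node.name) ?? []
    ids.push(node.unique_id)
    byName.set(node.name, ids)
  }
  const ambiguous = new Map<string, string[]>()
  byName.forEach((ids, name) => {
    if (ids.length > 1) ambiguous.set(name, ids)
  })
  return ambiguous
}
```

A caller could then either select by `unique_id` for the flagged names or report them as errored instead of verifying against a possibly wrong node.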

Comment thread packages/opencode/src/altimate/validators/dbt-tests-pass.ts Outdated
---
```

At session start, after the standard `<available_skills>` listing, every matched

P2: The docs describe auto-loaded skills as being appended after <available_skills>, but the implementation places them before that block. This mismatch can mislead users debugging prompt composition.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/docs/configure/skills.md, line 65:

<comment>The docs describe auto-loaded skills as being appended after `<available_skills>`, but the implementation places them before that block. This mismatch can mislead users debugging prompt composition.</comment>

<file context>
@@ -28,7 +28,75 @@ Focus on the query: $ARGUMENTS
+---
+```
+
+At session start, after the standard `<available_skills>` listing, every matched
+skill body is appended to the system prompt under:
+
</file context>
Suggested change
- At session start, after the standard `<available_skills>` listing, every matched
+ At session start, before the standard `<available_skills>` listing, every matched

Comment on lines +278 to +280
Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it
skips `NULL` operands instead of propagating them, which is usually safer
than a static placeholder.

P2: This guidance incorrectly describes CONCAT_WS behavior/support for Snowflake and BigQuery. It can lead to invalid BigQuery SQL and wrong NULL-handling assumptions in Snowflake.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .opencode/skills/dbt-develop/SKILL.md, line 278:

<comment>This guidance incorrectly describes `CONCAT_WS` behavior/support for Snowflake and BigQuery. It can lead to invalid BigQuery SQL and wrong NULL-handling assumptions in Snowflake.</comment>

<file context>
@@ -127,6 +202,203 @@ Use `altimate-dbt children` and `altimate-dbt parents` to verify the DAG is inta
+-- Right: explicit placeholder
+COALESCE(region, 'UNKNOWN') || '-' || COALESCE(segment, 'UNKNOWN') AS geo_segment
+```
+Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it
+skips `NULL` operands instead of propagating them, which is usually safer
+than a static placeholder.
</file context>
Suggested change
- Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it
- skips `NULL` operands instead of propagating them, which is usually safer
- than a static placeholder.
+ Use dialect-specific null-safe concatenation patterns instead. BigQuery does not provide `CONCAT_WS()`; use `ARRAY_TO_STRING([region, segment], '-')` when you want to skip `NULL` values.
+ In Snowflake, `CONCAT_WS()` returns `NULL` if any argument is `NULL`, so keep explicit `COALESCE(...)` placeholders there.


### Failure distribution

The 19 failures span every task domain: 3 of 13 airbnb, 2 of 7 analytics_engineering, 4 of 5 asana variants, 5 of 13 f1 variants, 4 of 18 helixops_saas variants, 1 of 3 intercom, 1 of 2 simple, 1 of 1 workday. No domain-specific weakness — failures are uniformly distributed by domain and concentrated by failure mode.

P3: The lead-in failure count is inconsistent with the per-domain breakdown in the same sentence (listed counts sum to 21, not 19).

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At research/kimi-k26-ade-bench-2026-05-10/findings.md, line 250:

<comment>The lead-in failure count is inconsistent with the per-domain breakdown in the same sentence (listed counts sum to 21, not 19).</comment>

<file context>
@@ -0,0 +1,601 @@
+
+### Failure distribution
+
+The 19 failures span every task domain: 3 of 13 airbnb, 2 of 7 analytics_engineering, 4 of 5 asana variants, 5 of 13 f1 variants, 4 of 18 helixops_saas variants, 1 of 3 intercom, 1 of 2 simple, 1 of 1 workday. No domain-specific weakness — failures are uniformly distributed by domain and concentrated by failure mode.
+
+---
</file context>
Suggested change
- The 19 failures span every task domain: 3 of 13 airbnb, 2 of 7 analytics_engineering, 4 of 5 asana variants, 5 of 13 f1 variants, 4 of 18 helixops_saas variants, 1 of 3 intercom, 1 of 2 simple, 1 of 1 workday. No domain-specific weakness — failures are uniformly distributed by domain and concentrated by failure mode.
+ The 21 failures span every task domain: 3 of 13 airbnb, 2 of 7 analytics_engineering, 4 of 5 asana variants, 5 of 13 f1 variants, 4 of 18 helixops_saas variants, 1 of 3 intercom, 1 of 2 simple, 1 of 1 workday. No domain-specific weakness — failures are uniformly distributed by domain and concentrated by failure mode.

| Phase | Total time | Share of wall |
|---|---:|---:|
| Step duration (`step_start → step_finish`: model generation + tool dispatch) | 22,745 s | 66.1% |
| Step-to-step gaps (`step_start → next step_start`) | 30,672 s | 89.2% |

P3: The “step-to-step gaps” row uses a start-to-start interval, which conflicts with the glossary’s gap definition (finish-to-next-start) and can mislead interpretation of timing breakdowns.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At research/kimi-k26-ade-bench-2026-05-10/findings.md, line 111:

<comment>The “step-to-step gaps” row uses a start-to-start interval, which conflicts with the glossary’s gap definition (finish-to-next-start) and can mislead interpretation of timing breakdowns.</comment>

<file context>
@@ -0,0 +1,601 @@
+| Phase | Total time | Share of wall |
+|---|---:|---:|
+| Step duration (`step_start → step_finish`: model generation + tool dispatch) | 22,745 s | 66.1% |
+| Step-to-step gaps (`step_start → next step_start`) | 30,672 s | 89.2% |
+| Tool execution (sum of all individual `tool_use` durations) | 1,690 s | 4.9% |
+| Total runtime | 34,402 s | 100% |
</file context>
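
The difference between the two interval definitions is easy to see on a small example. In the table above, the step-duration and step-to-step rows together exceed 100% of wall time (66.1% + 89.2%), which is what one would expect if the second row measures start-to-start intervals, since those include the step duration itself. The sketch below is illustrative only; the `Step` shape and `timingBreakdown` are assumptions, not code from the report.

```typescript
// Sketch only: each step has a start and finish timestamp in seconds.
interface Step {
  start: number
  finish: number
}

// Compute the three quantities side by side so the definitions can be compared.
function timingBreakdown(steps: Step[]): {
  stepDuration: number
  finishToNextStart: number
  startToNextStart: number
} {
  let stepDuration = 0
  let finishToNextStart = 0 // the glossary-style gap: idle time between steps
  let startToNextStart = 0 // start-to-start interval: step duration plus the gap
  for (let i = 0; i < steps.length; i++) {
    const current = steps[i]
    stepDuration += current.finish - current.start
    if (i + 1 < steps.length) {
      const next = steps[i + 1]
      finishToNextStart += next.start - current.finish
      startToNextStart += next.start - current.start
    }
  }
  return { stepDuration, finishToNextStart, startToNextStart }
}
```

Only the finish-to-next-start total can be added to step duration without double counting.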

@dev-punia-altimate dev-punia-altimate left a comment

Multi-Persona Review — Verdict: comment

Multi-persona review completed.

6/6 agents completed · 118s · 0 findings (0 critical, 0 high, 0 medium)


Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·

All issues identified in the consensus review (Claude + 8 models) are
addressed in this commit.

**MAJOR fixes (blocking merge)**

- Add subprocess timeout (`ALTIMATE_VALIDATORS_TIMEOUT_MS`, default 60 s)
  to `runDbtTest` and `runSchemaVerify` — prevents the agent loop from
  hanging indefinitely on stalled DuckDB connections or warehouse I/O.
  Kills the child process with SIGKILL on timeout.

- Fix shadow telemetry gate: move `ValidatorRegistry.runAll()` and the
  per-validator `Telemetry.track()` loop outside the `validatorsEnabled`
  check in `prompt.ts`.  Telemetry now fires regardless of the feature
  flag, fulfilling the "measure before enforce" promise stated in the
  inline comment.

- Fix nested dbt project cwd bug: replace `isDbtProject(cwd): boolean`
  with `findDbtProjectRoot(cwd): Promise<string | null>` which returns
  the directory that actually contains `dbt_project.yml`.  Both
  validators now pass that root as `cwd` to subprocess invocations and
  as the base for `modelsModifiedSince`, preventing the "not a dbt
  project" error when the project is one level below the working dir.

- Extract shared helpers into `validator-utils.ts`: `findDbtProjectRoot`,
  `modelsModifiedSince`, `modelNameFromPath`, `extractLastJsonObject`.
  Both validator files now import from the shared module.  The validated
  `extractLastJsonObject` rejects stray JSON fragments (checks for
  `verdict`/`error`/`model`/`stdout`/`columns_*` keys) — the laxer
  version that was only in `dbt-tests-pass.ts` is gone.

- Add tests: 39 unit tests covering `extractLastJsonObject` (8 cases),
  `modelNameFromPath`, `findDbtProjectRoot` (5 cases),
  `modelsModifiedSince` (7 cases), and `parseDbtTestOutput` (10 cases
  including dbt 1.x format, ANSI prefixes, NO-OP variant, duplicate
  names, `[FAIL`/`[ERROR` token exclusion).

**MINOR fixes**

- Track spawn failures separately in `dbt-tests-pass.check()` and
  `dbt-schema-verify.check()`.  `details.spawn_failures` now appears in
  the validator result so operators can distinguish "skipped model" from
  "passed model".

- Add retries-exhausted telemetry: when `validatorRetryCount >=
  maxValidatorRetries` with failures outstanding, emit
  `validator_retries_exhausted` event and a `log.warn` so the session
  doesn't silently appear as "completed" in the operator dashboard.

- Parallel model checking: both `check()` functions now use
  `Promise.all` instead of a sequential `for` loop.

**NITS**

- Named regex groups in `parseDbtTestOutput` — replaces positional
  `summaryMatch[1]` / `[3]` / `[5]` captures; resilient to dbt
  reordering summary fields.

- Path separator: `modelsModifiedSince` and `modelNameFromPath` now use
  `path.sep` / `path.basename` instead of hardcoded `"/"`.

- Fix `ls | head -1` in `build-local-tarball.sh`: derive exact tarball
  name from `VERSION` variable; error out explicitly if not present.

- Fix stale comment "Limited to two-level deep search" in
  `dbt-schema-verify.ts` (actual depth was 4; comment is removed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
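
The named-regex-groups nit in the commit message above can be illustrated with a small sketch. This is not the PR's `parseDbtTestOutput`; `parseSummaryCounts` is a hypothetical helper that assumes the dbt summary line looks like `Done. PASS=7 WARN=0 ERROR=3 SKIP=0 TOTAL=10` and reads each `KEY=value` pair by name, so the order of fields does not matter.

```typescript
// Hypothetical helper, not the PR's implementation.
// Assumes a summary line such as "Done. PASS=7 WARN=0 ERROR=3 SKIP=0 TOTAL=10".
function parseSummaryCounts(output: string): Record<string, number> | null {
  // Strip ANSI colour codes so they cannot split tokens.
  const clean = output.replace(/\x1b\[[0-9;]*m/g, "")
  // Keep only lines that mention "Done." (case-insensitive) and use the last one.
  const lines = clean.split(/\r?\n/).filter((l) => /done\./i.test(l))
  if (lines.length === 0) return null
  const line = lines[lines.length - 1]

  const counts: Record<string, number> = {}
  // Named groups: each pair is looked up by key, so reordering is harmless.
  const pair = /(?<key>[A-Z][A-Z-]*)=(?<value>\d+)/g
  let m: RegExpExecArray | null
  while ((m = pair.exec(line)) !== null) {
    const key = m.groups?.key
    const value = m.groups?.value
    if (key !== undefined && value !== undefined) counts[key] = Number(value)
  }
  return Object.keys(counts).length > 0 ? counts : null
}
```

Collecting every pair into a map, rather than matching one fixed pattern, also means an extra field such as `NO-OP` is picked up without changing the regex.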
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
packages/opencode/src/session/prompt.ts (1)

1117-1120: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass the session/project directory to validators, not process cwd.

Using process.cwd() can point outside the active session tree, causing validators to scan/run in the wrong location.

Suggested fix
           const vCtx = {
             sessionID,
-            workingDirectory: process.cwd(),
+            workingDirectory: Instance.directory,
             sessionStartMs: sessionStartTime,
             step,
             retryCount: validatorRetryCount,
           }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/prompt.ts` around lines 1117 - 1120, vCtx
currently sets workingDirectory to process.cwd(), which can point outside the
active session; locate the session's actual directory variable (e.g.,
session.root, sessionDir, projectDir, or the value used when the session was
created) and set vCtx.workingDirectory to that session/project directory instead
of process.cwd(); update any callers that rely on vCtx (the vCtx object created
near the sessionID/workingDirectory/sessionStartMs block) so validators receive
the correct session path.
packages/opencode/src/altimate/validators/dbt-schema-verify.ts (1)

170-172: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fail closed when schema checks cannot be executed.

Spawn failures are counted but still allow ok: true when there are no mismatches, which can bypass completion gating for unverified models.

Suggested fix
-    if (mismatches.length === 0) {
+    if (mismatches.length === 0 && spawnFailures === 0 && errored === 0) {
       return {
         ok: true,
         details: {
@@
       }
     }
+
+    if (mismatches.length === 0 && (spawnFailures > 0 || errored > 0)) {
+      return {
+        ok: false,
+        reason: "Some touched models could not be schema-verified.",
+        fixHint: "Resolve validator/runtime issues and rerun so every touched model is verified.",
+        details: {
+          models_touched: touched.length,
+          verified: results.length,
+          errored,
+          spawn_failures: spawnFailures,
+        },
+      }
+    }

Also applies to: 180-191

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/altimate/validators/dbt-schema-verify.ts` around lines
170 - 172, The code currently increments spawnFailures but still allows ok: true
when no mismatches are found; update the final decision logic so any
spawnFailures force a failure: when computing the overall result (the
variable/property named ok and/or the results object), set ok = false if
spawnFailures > 0 and add a descriptive failure entry (or push an error into the
mismatches/diagnostics array) so spawn failures are treated the same as schema
mismatches; adjust the block around spawnFailures++ and the later aggregation
(the code that produces ok) to ensure spawn failures between lines ~170 and ~191
flip the overall status and surface an error message.
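
The fail-closed rule described above can be sketched as a pure decision function. The `Tally` and `Verdict` shapes and the `decide` name are hypothetical; the PR's real validator result type may carry different fields.

```typescript
// Hypothetical shapes for illustration; the PR's real result type may differ.
interface Tally {
  mismatches: number
  errored: number
  spawnFailures: number
}

interface Verdict {
  ok: boolean
  reason?: string
}

// Fail closed: report ok only when every touched model was actually verified.
function decide(t: Tally): Verdict {
  if (t.mismatches > 0) {
    return { ok: false, reason: `${t.mismatches} model(s) have schema mismatches` }
  }
  const unverified = t.errored + t.spawnFailures
  if (unverified > 0) {
    return { ok: false, reason: `${unverified} model(s) could not be verified` }
  }
  return { ok: true }
}
```

Keeping the decision separate from the subprocess code makes the "no mismatches but something could not run" case straightforward to unit-test.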
packages/opencode/src/altimate/validators/dbt-tests-pass.ts (1)

113-116: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Do not treat subprocess startup failures as passing validation.

A spawn failure currently resolves to null and can still end in ok: true, which bypasses the completion gate for unverified models.

Suggested fix
-    child.on("error", () => {
+    child.on("error", (e) => {
       clearTimeout(timer)
-      resolve(null)
+      resolve({ model, error: `failed to start altimate-dbt: ${e.message}` })
     })
@@
-    const results: TestRunOutput[] = []
-    for (const out of outputs) {
-      if (out) results.push(out)
-      else spawnFailures++
-    }
+    const results: TestRunOutput[] = []
+    for (const out of outputs) {
+      if (out) results.push(out)
+      else spawnFailures++
+    }
@@
-    if (failures.length === 0 && errored.length === 0) {
+    if (failures.length === 0 && errored.length === 0 && spawnFailures === 0) {
       return {
         ok: true,
         details: {

Also applies to: 193-215

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/altimate/validators/dbt-tests-pass.ts` around lines 113
- 116, The child.on("error") handler currently clears the timeout and calls
resolve(null), which lets startup failures be treated as successful/unknown and
can lead to ok:true; change the handler to clearTimeout(timer) and instead
reject the Promise with the error (e.g., reject(err)) or resolve a failure
result object (e.g., { ok: false, error: err.message }) so startup errors are
propagated as failures; apply this change to both occurrences of
child.on("error") (the handler that uses clearTimeout(timer) and resolve(null)
around lines with child.on("error") and the similar block at lines 193–215), and
ensure the surrounding code that awaits this Promise handles the
rejection/failure shape.
🧹 Nitpick comments (1)
packages/opencode/test/altimate/validators/validator-utils.test.ts (1)

131-140: ⚡ Quick win

Use tmpdir() + await using fixture pattern for temp dirs in tests.

This test manages temp directories manually; switch to the shared tmpdir helper so setup/cleanup behavior is consistent and less error-prone.

As per coding guidelines, "Use the tmpdir function from fixture/fixture.ts to create temporary directories for tests with automatic cleanup in test files" and "Always use await using syntax with tmpdir() for automatic cleanup when the variable goes out of scope".

Also applies to: 185-191

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/altimate/validators/validator-utils.test.ts` around
lines 131 - 140, Replace the manual temp-dir setup/teardown in the
findDbtProjectRoot test (the tmpDir variable and beforeEach/afterEach that call
fs.mkdtemp/fs.rm) with the shared tmpdir fixture using the "await using"
pattern: declare `await using tmp = await tmpdir()` inside each async test body
so temporary directories are created and automatically cleaned up when the
variable goes out of scope; update both the findDbtProjectRoot block and the
other similar manual setup (the second occurrence in the same file, around
lines 185 - 191) to use tmpdir() instead of manual fs.mkdtemp/fs.rm and remove
the beforeEach/afterEach teardown functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/altimate/validators/validator-utils.ts`:
- Line 87: The scanner's file check uses entry.name.endsWith(".sql") which is
case-sensitive and misses files like "FOO.SQL"; change that check to a
case-insensitive comparison (e.g., use entry.name.toLowerCase().endsWith(".sql")
or a case-insensitive regex) so it matches any .sql variant and stays consistent
with modelNameFromPath's handling; update the branch that uses entry.isFile() &&
entry.name.endsWith(".sql") accordingly.
- Line 25: The exported VALIDATOR_TIMEOUT_MS currently uses
Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS ?? "60000") which can produce
NaN, 0, or negative values and lead to immediate/no-op timeouts; change the
initialization in validator-utils.ts to parse the env value safely (e.g.,
parseInt or Number), validate that it's a finite positive integer, and if not
set or invalid fall back to the default 60000 (optionally clamp to a minimum
like 1 ms). Update the VALIDATOR_TIMEOUT_MS assignment to perform this
validation and fallback so the constant is always a sane positive timeout value.

In `@packages/opencode/test/altimate/validators/validator-utils.test.ts`:
- Line 255: The test currently uses platform-dependent splitting: replace the
expression result.map((p) => p.split("/").pop()) with result.map((p) =>
path.basename(p)) and add an import for Node's path module (e.g., import path
from 'path') at the top of the test file so basename is used for OS-independent
extraction; update any expectations if necessary to match path.basename
behavior.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9c935fc7-0292-4705-9d8a-ebf9833196e8

📥 Commits

Reviewing files that changed from the base of the PR and between 7ca5a36 and 0724de3.

📒 Files selected for processing (7)
  • benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh
  • packages/opencode/src/altimate/validators/dbt-schema-verify.ts
  • packages/opencode/src/altimate/validators/dbt-tests-pass.ts
  • packages/opencode/src/altimate/validators/validator-utils.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/test/altimate/validators/dbt-tests-pass.test.ts
  • packages/opencode/test/altimate/validators/validator-utils.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh

* ALTIMATE_VALIDATORS_TIMEOUT_MS for benchmark environments where dbt startup
* time varies.
*/
export const VALIDATOR_TIMEOUT_MS = Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS ?? "60000")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden timeout env parsing to avoid unintended immediate/no-op timeouts.

Number(process.env...) can yield NaN/0/negative values, which breaks timeout semantics and can cause immediate kill behavior.

Suggested fix
-export const VALIDATOR_TIMEOUT_MS = Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS ?? "60000")
+const parsedTimeout = Number.parseInt(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS ?? "60000", 10)
+export const VALIDATOR_TIMEOUT_MS =
+  Number.isFinite(parsedTimeout) && parsedTimeout > 0 ? parsedTimeout : 60000
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/altimate/validators/validator-utils.ts` at line 25, The
exported VALIDATOR_TIMEOUT_MS currently uses
Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS ?? "60000") which can produce
NaN, 0, or negative values and lead to immediate/no-op timeouts; change the
initialization in validator-utils.ts to parse the env value safely (e.g.,
parseInt or Number), validate that it's a finite positive integer, and if not
set or invalid fall back to the default 60000 (optionally clamp to a minimum
like 1 ms). Update the VALIDATOR_TIMEOUT_MS assignment to perform this
validation and fallback so the constant is always a sane positive timeout value.
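
A minimal sketch of the guard described above, written as a pure function so it can be tested without touching `process.env`. The helper name `parseTimeoutMs` and the constant name are hypothetical; only the 60000 ms default comes from the PR.

```typescript
// Default taken from the PR description (60 s); the names are hypothetical.
const DEFAULT_TIMEOUT_MS = 60000

// Returns a finite, positive, whole-millisecond timeout, or the fallback.
function parseTimeoutMs(raw: string | undefined, fallback: number = DEFAULT_TIMEOUT_MS): number {
  // Unset or blank values use the default (Number("") would otherwise be 0).
  if (raw === undefined || raw.trim() === "") return fallback
  const n = Number(raw)
  // Reject NaN, Infinity, zero, and negatives; truncate fractional values.
  return Number.isFinite(n) && n >= 1 ? Math.floor(n) : fallback
}
```

This sketch uses `Number` rather than `parseInt` so that a value like `60s` is rejected and falls back to the default instead of being read as 60 ms. It would be called as `parseTimeoutMs(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS)`.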

const full = join(dir, entry.name)
if (entry.isDirectory()) {
await scan(full, depth + 1)
} else if (entry.isFile() && entry.name.endsWith(".sql")) {

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use case-insensitive .sql matching for model discovery.

The scanner currently misses *.SQL files, while modelNameFromPath already handles .sql case-insensitively.

Suggested fix
-      } else if (entry.isFile() && entry.name.endsWith(".sql")) {
+      } else if (entry.isFile() && /\.sql$/i.test(entry.name)) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/altimate/validators/validator-utils.ts` at line 87, The
scanner's file check uses entry.name.endsWith(".sql") which is case-sensitive
and misses files like "FOO.SQL"; change that check to a case-insensitive
comparison (e.g., use entry.name.toLowerCase().endsWith(".sql") or a
case-insensitive regex) so it matches any .sql variant and stays consistent with
modelNameFromPath's handling; update the branch that uses entry.isFile() &&
entry.name.endsWith(".sql") accordingly.

Comment thread packages/opencode/test/altimate/validators/validator-utils.test.ts Outdated
@cubic-dev-ai cubic-dev-ai Bot left a comment

4 issues found across 7 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/validators/dbt-tests-pass.ts">

<violation number="1" location="packages/opencode/src/altimate/validators/dbt-tests-pass.ts:189">
P1: Running all model tests via unbounded `Promise.all` can make the validator flaky under load; concurrent dbt processes in one project can contend for shared resources and produce false failures.</violation>
</file>

<file name="packages/opencode/src/altimate/validators/dbt-schema-verify.ts">

<violation number="1" location="packages/opencode/src/altimate/validators/dbt-schema-verify.ts:69">
P1: Timeout/error outcomes are fail-open: the validator still returns `ok: true` when schema verification cannot complete, which can let invalid model/schema drift pass the completion gate.</violation>
</file>

<file name="packages/opencode/src/altimate/validators/validator-utils.ts">

<violation number="1" location="packages/opencode/src/altimate/validators/validator-utils.ts:25">
P2: Harden env-var parsing: `Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS)` can produce `NaN` (non-numeric input), `0`, or a negative value. Node.js coerces all of these to a 1 ms `setTimeout` delay, which would immediately kill the subprocess. Add a finite/positive guard with a fallback to the default.</violation>

<violation number="2" location="packages/opencode/src/altimate/validators/validator-utils.ts:87">
P3: Inconsistent case handling: `modelNameFromPath` strips the extension case-insensitively (`/\.sql$/i`), but this filter uses a case-sensitive `.endsWith(".sql")`. Files named `*.SQL` would be missed by the scanner even though downstream code handles them fine. Use a case-insensitive check for consistency.</violation>
</file>


// Run all model tests in parallel; track spawn failures separately so the
// caller can see which models were not verifiable vs which passed/failed.
let spawnFailures = 0
const outputs = await Promise.all(

P1: Running all model tests via unbounded Promise.all can make the validator flaky under load; concurrent dbt processes in one project can contend for shared resources and produce false failures.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/validators/dbt-tests-pass.ts, line 189:

<comment>Running all model tests via unbounded `Promise.all` can make the validator flaky under load; concurrent dbt processes in one project can contend for shared resources and produce false failures.</comment>

<file context>
@@ -275,20 +171,28 @@ export const DbtTestsPassValidator: Validator = {
+    // Run all model tests in parallel; track spawn failures separately so the
+    // caller can see which models were not verifiable vs which passed/failed.
+    let spawnFailures = 0
+    const outputs = await Promise.all(
+      touched.map((path) => runDbtTest(modelNameFromPath(path), dbtRoot)),
+    )
</file context>
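
One way to address the unbounded `Promise.all` concern is a small worker-pool helper that caps how many dbt subprocesses run at once. The sketch below is an assumption about how that could look, not code from the PR; `mapWithLimit` is a hypothetical name.

```typescript
// Hypothetical helper: runs `fn` over `items` with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length)
  let next = 0

  // Each worker claims the next unprocessed index until the list is exhausted.
  // JavaScript runs this on one thread, so `next++` cannot race.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i])
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length))
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

In the validator this might replace the `Promise.all(touched.map(...))` call with something like `mapWithLimit(touched, 2, (p) => runDbtTest(modelNameFromPath(p), dbtRoot))`. The limit of 2 is arbitrary, and a suitable value would depend on how the warehouse adapter handles concurrent connections.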

})
const timer = setTimeout(() => {
child.kill("SIGKILL")
resolve({ error: `timed out after ${VALIDATOR_TIMEOUT_MS}ms` })

P1: Timeout/error outcomes are fail-open: the validator still returns ok: true when schema verification cannot complete, which can let invalid model/schema drift pass the completion gate.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/validators/dbt-schema-verify.ts, line 69:

<comment>Timeout/error outcomes are fail-open: the validator still returns `ok: true` when schema verification cannot complete, which can let invalid model/schema drift pass the completion gate.</comment>

<file context>
@@ -182,11 +64,16 @@ async function runSchemaVerify(model: string, cwd: string): Promise<SchemaVerify
     })
+    const timer = setTimeout(() => {
+      child.kill("SIGKILL")
+      resolve({ error: `timed out after ${VALIDATOR_TIMEOUT_MS}ms` })
+    }, VALIDATOR_TIMEOUT_MS)
     let stdout = ""
</file context>

* ALTIMATE_VALIDATORS_TIMEOUT_MS for benchmark environments where dbt startup
* time varies.
*/
export const VALIDATOR_TIMEOUT_MS = Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS ?? "60000")

P2: Harden env-var parsing: Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS) can produce NaN (non-numeric input), 0, or a negative value. Node.js coerces all of these to a 1 ms setTimeout delay, which would immediately kill the subprocess. Add a finite/positive guard with a fallback to the default.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/validators/validator-utils.ts, line 25:

<comment>Harden env-var parsing: `Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS)` can produce `NaN` (non-numeric input), `0`, or a negative value. Node.js coerces all of these to a 1 ms `setTimeout` delay, which would immediately kill the subprocess. Add a finite/positive guard with a fallback to the default.</comment>

<file context>
@@ -0,0 +1,207 @@
+ * ALTIMATE_VALIDATORS_TIMEOUT_MS for benchmark environments where dbt startup
+ * time varies.
+ */
+export const VALIDATOR_TIMEOUT_MS = Number(process.env.ALTIMATE_VALIDATORS_TIMEOUT_MS ?? "60000")
+
+// ---------------------------------------------------------------------------
</file context>

const full = join(dir, entry.name)
if (entry.isDirectory()) {
await scan(full, depth + 1)
} else if (entry.isFile() && entry.name.endsWith(".sql")) {

P3: Inconsistent case handling: modelNameFromPath strips the extension case-insensitively (/\.sql$/i), but this filter uses a case-sensitive .endsWith(".sql"). Files named *.SQL would be missed by the scanner even though downstream code handles them fine. Use a case-insensitive check for consistency.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/validators/validator-utils.ts, line 87:

<comment>Inconsistent case handling: `modelNameFromPath` strips the extension case-insensitively (`/\.sql$/i`), but this filter uses a case-sensitive `.endsWith(".sql")`. Files named `*.SQL` would be missed by the scanner even though downstream code handles them fine. Use a case-insensitive check for consistency.</comment>

<file context>
@@ -0,0 +1,207 @@
+      const full = join(dir, entry.name)
+      if (entry.isDirectory()) {
+        await scan(full, depth + 1)
+      } else if (entry.isFile() && entry.name.endsWith(".sql")) {
+        try {
+          const stat = await fs.stat(full)
</file context>
Suggested change
-      } else if (entry.isFile() && entry.name.endsWith(".sql")) {
+      } else if (entry.isFile() && /\.sql$/i.test(entry.name)) {

Expands the validator test suite from 39 to 117 tests, covering
boundary conditions, malformed inputs, and realistic dbt output patterns
that the original suite did not exercise.

**extractLastJsonObject adversarial cases**
- Stray JSON rejection: empty object, array with no envelope keys,
  objects with only unknown keys, numeric keys
- Envelope key guard: null/false/empty-string values still accepted when
  key is present; `error: null` does not invalidate the envelope
- Noisy stdout scanning: Python traceback + JSON, 50-line progress
  noise, BOM prefix, CRLF line endings, > 10 KB leading noise,
  whitespace-padded JSON
- Last-wins semantics: two valid envelopes (last wins), three valid
  envelopes (last wins), stray fragments between valid envelopes,
  same-line consecutive objects
- Brace/string parsing: nested braces in string values, escaped
  backslashes, escaped double-quotes, multiline string values, stdout
  field containing inner JSON, unicode characters, unicode escape
  sequences, multi-line formatted JSON, unbalanced `{` in log noise

**parseDbtTestOutput adversarial cases**
- Null/empty guard: null, undefined, whitespace-only, truncated output,
  dbt compile error (no Done. line)
- All-pass: clean run, SKIP-only, WARN-only
- NO-OP variant: zero tests, multiple NO-OP counts
- Failure extraction: deduplication of repeated test names, FAIL vs
  ERROR lines, `[FAIL`/`[ERROR` token exclusion, test names with dots,
  15+ failing tests captured
- Large counts: 99999 pass, 99999 error, zero counts, single test
- Format resilience: case-insensitive Done., named groups vs positional
  (PASS=7 ERROR=3 TOTAL=11), timestamps, ANSI colours, CRLF line
  endings, summary at very start/end of string, multiple summary lines
- Realistic full-output scenarios: dbt 1.8 all-pass, dbt 1.8 partial
  failures, ANSI-coloured Docker output, no-tests-defined NO-OP,
  SKIP from --exclude flag

**findDbtProjectRoot adversarial cases**
- Two-level deep search limit: project at depth 2 is NOT found
- dbt_project.yml is a directory (documents stat behavior)
- Many subdirs, only one has the file
- Direct takes precedence over nested

**modelsModifiedSince adversarial cases**
- Depth boundary: depth 4 included, depth 5 excluded (with path counts)
- Non-.sql files inside models/: yml, md, py, json all excluded
- File named `models.sql` outside a `models/` path component excluded
- Mtime boundary: file with mtime === sinceMs is included (>= semantics)
- Mixed modified/unmodified files
- Empty models/ directory (no SQL files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
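
For readers unfamiliar with the behaviour these `extractLastJsonObject` cases exercise, the sketch below shows one way to get "last valid envelope wins" semantics from noisy stdout. It is an illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, and the envelope keys are the ones listed in the earlier commit message, with `columns_*` treated as a prefix.

```typescript
// Envelope keys named in the earlier commit message; "columns_*" is treated as a prefix.
const ENVELOPE_KEYS = ["verdict", "error", "model", "stdout"]

function hasEnvelopeKey(obj: Record<string, unknown>): boolean {
  return Object.keys(obj).some((k) => ENVELOPE_KEYS.includes(k) || k.startsWith("columns_"))
}

// Returns the index of the "}" that balances the "{" at `start`, or -1 if none.
// Braces inside double-quoted strings are ignored, and escapes are skipped.
function findBalancedEnd(text: string, start: number): number {
  let depth = 0
  let inString = false
  for (let i = start; i < text.length; i++) {
    const ch = text[i]
    if (inString) {
      if (ch === "\\") i++ // skip the escaped character
      else if (ch === '"') inString = false
    } else if (ch === '"') inString = true
    else if (ch === "{") depth++
    else if (ch === "}" && --depth === 0) return i
  }
  return -1
}

// Sketch only: returns the last parseable object that carries an envelope key.
function extractLastJsonObjectSketch(text: string): Record<string, unknown> | null {
  let last: Record<string, unknown> | null = null
  let i = text.indexOf("{")
  while (i !== -1) {
    const end = findBalancedEnd(text, i)
    let parsed: Record<string, unknown> | null = null
    if (end !== -1) {
      try {
        parsed = JSON.parse(text.slice(i, end + 1)) as Record<string, unknown>
      } catch {
        parsed = null // balanced braces, but not valid JSON
      }
    }
    if (parsed !== null) {
      if (hasEnvelopeKey(parsed)) last = parsed
      i = text.indexOf("{", end + 1) // skip objects nested inside this one
    } else {
      i = text.indexOf("{", i + 1) // try the next opening brace
    }
  }
  return last
}
```

Retrying from the next `{` whenever a candidate fails to balance or parse is what lets an unbalanced brace in log noise be skipped instead of swallowing the real verdict. The cost is quadratic work in the worst case, which is acceptable for a sketch but may matter for very large outputs.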
@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opencode/test/altimate/validators/validator-utils.test.ts (1)

353-362: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace manual mkdtemp lifecycle with await using tmpdir() in tests.

Line 357 and Line 444 (plus paired afterEach cleanup blocks) should use the shared test fixture helper instead of manual temp-dir management, per repo test conventions.

Suggested refactor pattern
-import { promises as fs } from "fs"
+import { promises as fs } from "fs"
+import { tmpdir } from "../fixture/fixture"
@@
-describe("findDbtProjectRoot", () => {
-  let tmpDir: string
-
-  beforeEach(async () => {
-    tmpDir = await fs.mkdtemp(join(osTmpdir(), "dbt-root-test-"))
-  })
-
-  afterEach(async () => {
-    await fs.rm(tmpDir, { recursive: true, force: true })
-  })
+describe("findDbtProjectRoot", () => {
@@
-  test("returns cwd when dbt_project.yml is directly present", async () => {
-    await fs.writeFile(join(tmpDir, "dbt_project.yml"), "name: test_project\n")
-    expect(await findDbtProjectRoot(tmpDir)).toBe(tmpDir)
+  test("returns cwd when dbt_project.yml is directly present", async () => {
+    await using tmp = await tmpdir()
+    await fs.writeFile(join(tmp.path, "dbt_project.yml"), "name: test_project\n")
+    expect(await findDbtProjectRoot(tmp.path)).toBe(tmp.path)
   })

Apply the same pattern in modelsModifiedSince tests as well.

As per coding guidelines, “Use the tmpdir function from fixture/fixture.ts to create temporary directories for tests with automatic cleanup in test files” and “Always use await using syntax with tmpdir() for automatic cleanup when the variable goes out of scope.”

Also applies to: 443-449

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/altimate/validators/validator-utils.test.ts` around
lines 353 - 362, Replace the manual temp-dir lifecycle in the findDbtProjectRoot
(and modelsModifiedSince) tests: remove the beforeEach/afterEach using
fs.mkdtemp/fs.rm and instead acquire the temporary directory via the shared
fixture by using "await using tmpdir()" to create the tmpDir variable; update
any references to tmpDir in those tests to use the fixture-provided path and
rely on automatic cleanup when the using scope ends; ensure you import tmpdir
from fixture/fixture.ts and remove the manual fs.mkdtemp/fs.rm teardown code
around the describe blocks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/opencode/test/altimate/validators/validator-utils.test.ts`:
- Around line 353-362: Replace the manual temp-dir lifecycle in the
findDbtProjectRoot (and modelsModifiedSince) tests: remove the
beforeEach/afterEach using fs.mkdtemp/fs.rm and instead acquire the temporary
directory via the shared fixture by using "await using tmpdir()" to create the
tmpDir variable; update any references to tmpDir in those tests to use the
fixture-provided path and rely on automatic cleanup when the using scope ends;
ensure you import tmpdir from fixture/fixture.ts and remove the manual
fs.mkdtemp/fs.rm teardown code around the describe blocks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 20fd2692-6a4c-4bc9-a768-f65af7d11a14

📥 Commits

Reviewing files that changed from the base of the PR and between 0724de3 and 04abef0.

📒 Files selected for processing (2)
  • packages/opencode/test/altimate/validators/dbt-tests-pass.test.ts
  • packages/opencode/test/altimate/validators/validator-utils.test.ts

@dev-punia-altimate

❌ Tests — Failures Detected

TypeScript — 15 failure(s)

  • baseline [0.46ms]
  • baseline [0.31ms]
  • baseline [0.07ms]
  • baseline [0.22ms]
  • connection_refused [0.20ms]
  • timeout [0.04ms]
  • permission_denied [0.03ms]
  • parse_error [0.04ms]
  • oom [0.03ms]
  • network_error [0.03ms]
  • auth_failure [0.04ms]
  • rate_limit [0.03ms]
  • internal_error [0.04ms]
  • empty_error [0.03ms]
  • connection_refused [0.07ms]

Next Step

Please address the failing cases above and re-run verification.

cc @anandgupta42

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: harness-side dbt validator framework (dbt-tests-pass + schema-verify hardening)

2 participants