feat: branch test runner data layer + parallel orchestration (FEIP-7092) by kevin-hartman · Pull Request #48 · databricks-solutions/lakebase-app-dev-kit

kevin-hartman · 2026-05-28T01:03:26Z

Summary

Foundation work for FEIP-7092 (branch test runner) plus collateral substrate hardening surfaced during a real-Lakebase end-to-end exercise of the surface.

Original scope (4 slices)

Per-tag outcomes shape on ExperimentOutcomes (api / e2e / infra → {passed, failed}); backwards compatible — totals stay authoritative.
Per-cycle artifact persistence: writeArtifact / listArtifacts / readArtifact over .tdd/experiments/<F>/<exp>/artifacts/<cycle-id>/<name>. Path-traversal guarded.
Bounded-concurrency parallel runner: runExperimentsInParallel<T> with hard concurrency cap, fail-isolation per experiment, peak-in-flight tracking, per-experiment duration measurement.
Structured compareExperiments payload: per-row by_tag + cycle_count + artifact_count + duration_ms; new matrix (tag × experiment cells) ready for FEIP-7208's renderer.

Collateral substrate hardening (commit `bdfc40a`)

Three substrate gaps surfaced when this PR's surface was exercised end-to-end against a real Lakebase project. All are pre-existing concerns hermetic tests didn't catch; live list-branches / create-branch responses exposed them.

Branch identifier (uid vs name) confusion. A Lakebase branch has TWO identifiers and they are NOT interchangeable in the API. getDefaultBranchId violated the kit's own contract by preferring def.uid over the leaf of def.name, causing a confusing "branch id not found" error from the service. Fixes:

new scripts/lakebase/branch-id.ts: branded BranchName / BranchUid types + asBranchName / asBranchUid validators + branchNameFromResourcePath leaf extractor
getDefaultBranchId → getDefaultBranchName, returning BranchName | null derived from leaf-of-name (never uid); old name kept as @deprecated shim that also returns the name string
LakebaseBranchInfo.uid typed BranchUid; new nameLeaf: BranchName field
runtime guard in createBranch.parentBranch rejects uid-shaped input with a typed LakebaseBranchError
parseBranch funnels through the validators

Workspace TTL policy. Some workspaces enforce a tighter maximum-expiration policy than the PSA convention's 30-day feature TTL. Substrate now wraps the CLI's opaque "expiration time exceeds the maximum expiration time" error with a typed LakebaseBranchTtlTooLongError naming the attempted TTL + the override paths.

create-project.test.ts hardcoded placeholder host. The E2E test fell back to https://workspace.cloud.databricks.com (DNS fails) when LAKEBASE_TEST_HOST was unset. Added a resolveTestHost() helper that prefers the explicit env, then databricks auth env --profile $DATABRICKS_CONFIG_PROFILE. Documented skip block parallels the existing skip-when-e2e-disabled. Same test: .env contract updated to match the substrate's deliberate deployEnv() seeding (avoids the gated-hook chicken-and-egg).

Per-test timeouts on 2 live-pg-pair tests. queryBranchSchema(uid) === queryBranchSchema(branchId) and get-connection-equivalence each do two live pg connections back-to-back; the 5s vitest default is tight under parallel-suite load. Bumped to 30s with comments — legitimate per-test budget, not a workaround.

Slice breakdown

Slice	Surface	Commit
1	`ExperimentOutcomes.by_tag` + ExperimentTag / TagOutcome types	`5c72108`
2	`scripts/tdd/artifacts.ts` (new) — write/list/read primitives	`35a545c`
3	`scripts/tdd/parallel-runner.ts` (new) — bounded concurrency	`9321c5c`
4	`compareExperiments` structured payload (rows + matrix)	`0a57f40`
Collateral	branch-id branded types + workspace-TTL handling	`bdfc40a`

Live verification

Substrate exercised end-to-end against a real Lakebase project on a workspace that imposes a tighter TTL policy than the convention. After the fixes:

3 fresh Lakebase projects provisioned across the session. 2 child branches cut in parallel via runExperimentsInParallel, peak_in_flight=2, wall 7.5s.
compareExperiments matrix populated across all 3 tags: [api] both pass 3p/0f; [e2e] exp-pg wins 1p/0f vs exp-json 1p/1f; [infra] exp-pg has data, exp-json null. signal=winning correctly inferred for exp-pg.
Artifacts persisted (Playwright trace + vitest junit per experiment, listed via listArtifacts).
recommendation=continue correct given 1 winning + 1 running.

All 3 test projects deleted after verification — no leaked resources.

Test plan

npm run typecheck clean
Hermetic suite: 403 → 404 passed, 44 skipped, 0 failed
Live env-gated suite (LAKEBASE_TEST_INSTANCE=… LAKEBASE_TEST_E2E=1): 432 passed, 16 skipped, 0 failed (+29 over hermetic)
npm run build clean
Manual: live driver against real Lakebase exercises slices 1-4 end-to-end

Composition with sibling PRs / tickets

FEIP-7094 (Playwright E2E end-to-end): will populate by_tag.e2e from the Driver's tag-aware runner. Slice 1's shape is the contract that the Driver writes to.
FEIP-7208 (comparison-report renderer): consumes the matrix + per-row by_tag + artifact_count to render the promote-vs-synthesize HITL decision aid.
FEIP-7215 (lakebase-feature-status, PR feat: lakebase-feature-status bin + MCP tool (FEIP-7215) #47): can surface per-tag info once it merges + this lands; minor follow-up to enhance the renderer.

This pull request and its description were written by Isaac.

Adds ExperimentTag = "api" | "e2e" | "infra" and TagOutcome { passed, failed }. ExperimentOutcomes gains an optional by_tag field — Partial<Record<ExperimentTag, TagOutcome>>. Top-level tests_passed / tests_failed remain authoritative totals; by_tag is a breakdown for downstream renderers (comparison-report renderer in FEIP-7208, feature-status snapshot in FEIP-7215) and the per-tag smell detectors (FEIP-7094's e2e-row-perma-red). Backwards compatible: existing callers reading the prior shape see no change. The breakdown is partial-reportable (api but not e2e is valid) and the sum across tags is not enforced to match the totals (mid-cycle reporting and untagged tests both valid). Tests - tests/bdd/tdd-experiment-lifecycle.test.ts: 3 new round-trip tests covering full breakdown, partial breakdown, by_tag-omitted backwards compat. Co-authored-by: Isaac

New scripts/tdd/artifacts.ts module with writeArtifact / listArtifacts / readArtifact over .tdd/experiments/<F>/<exp>/artifacts/<cycle-id>/<name>. Names may include subdirs (e.g. "traces/network.har"); intermediate dirs created on demand. Path-traversal guarded: absolute paths and ".." rejected. The orchestrator writes here after every cycle (Playwright traces, vitest junit output, screenshots, repro scripts). The comparison-report renderer (FEIP-7208) reads listings to surface what is available when the PO is deciding promote vs synthesize. Gitignored by default in scaffolded projects: artifacts can be large and rebuilding them from logs is cheap. The scaffold step that writes the project's .gitignore is owned by lakebase-create-project, not this module. Tests - tests/bdd/tdd-artifacts.test.ts: 10 tests covering write, list (scoped + cross-cycle), read round-trip, security guards (path traversal, absolute paths), empty cases, and entry shape stability. Co-authored-by: Isaac

New scripts/tdd/parallel-runner.ts: runExperimentsInParallel<T> schedules N experiment runs against an injected runner callback, honoring a hard concurrency cap sourced from .tdd/features/<F>/plan.json budget.concurrent_branches by the orchestrator. Failure in one experiment does NOT abort the others. Each runner invocation is try/catch-isolated and produces an ExperimentRunResult (succeeded or failed) in the aggregate output. Tracks peak_in_flight observed and per-experiment duration_ms. The runner is injected so tests stay hermetic (no real Lakebase calls). The orchestrator's real runner integrates with experiment.ts + run-cycle.ts to drive an actual TDD cycle; that integration is the next slice past this ticket and will compose on top. Tests - tests/bdd/tdd-parallel-runner.test.ts: 8 tests covering input completeness, concurrency cap honored (parallelization observed via peak_in_flight), failure isolation, duration tracking, empty-list short-circuit, concurrency=1 forces sequential mode, rejection of concurrency < 1, context pass-through to the runner. Co-authored-by: Isaac

Extends ComparisonReport for the FEIP-7208 comparison-report renderer to consume: - Each ExperimentRow gains by_tag (the per-tag breakdown surfaced through ExperimentOutcomes in slice 1), cycle_count (count of entries in the experiment's timeline.json), artifact_count (from listArtifacts in slice 2), and duration_ms (from runtime.json when written by the orchestrator). - New matrix: TagMatrixRow[] — one row per tag any experiment reported, with per-experiment cells. Cell is TagOutcome { passed, failed } when the experiment reported the tag, null when it did not. Tags ordered api → e2e → infra. Empty when no experiment recorded per-tag outcomes (early-stage race or projects that do not use the tag-aware runner yet). - Existing fields (rows, recommendation, rationale) unchanged. Backwards compatible with the current ComparisonReport consumers. Tests - tests/bdd/tdd-compare-experiments.test.ts: 6 new tests covering cycle_count population (0 fallback when timeline missing), artifact_count population, duration_ms when runtime.json is written, by_tag pass-through from outcomes, full matrix shape across 3 tags with null cells for missing data, empty matrix when no experiment uses tags. Co-authored-by: Isaac

@deprecated

…urfaced during FEIP-7092 live E2E) The live exercise of FEIP-7092's substrate against a real Lakebase project surfaced three related substrate gaps. All three are pre- existing concerns that hermetic tests didn't catch; the live run with real list-branches + create-branch responses exposed them. This commit hardens the affected seams + adds hermetic coverage so they stay caught. ### 1. BranchName vs BranchUid distinction (the headline fix) A Lakebase branch has TWO identifiers and they are NOT interchangeable: - BranchName: the resource-path leaf (`production`, `feature-x`). The Lakebase API requires this in every path-shaped field — source_branch in create-branch specs, `{branch}` segments in subresource URLs. - BranchUid: the system-assigned `br-…` form. Returned in the `uid` field of list-branches. Used only for direct uid lookups. Passing a BranchUid where a BranchName is expected fails with a confusing "branch id not found" error from the service. `getDefaultBranchId` violated the kit's own contract (LakebaseBranchInfo.uid's docstring is explicit about uids NOT being accepted in path fields) by preferring `def.uid` over the leaf of `def.name`. Fixes: - new scripts/lakebase/branch-id.ts: branded types BranchName / BranchUid + asBranchName / asBranchUid runtime validators + branchNameFromResourcePath leaf extractor. Both validators throw with typed, actionable messages on a swap attempt. - getDefaultBranchId renamed to getDefaultBranchName. Returns BranchName | null derived from leaf-of-name, never from uid. The old name is kept as a thin @deprecated shim that now also returns the name (not the uid) so any transitional caller is unblocked. - LakebaseBranchInfo.uid typed as BranchUid; new nameLeaf: BranchName field for direct consumption. parseBranch funnels through the validators so the kit's view of a branch is always brand-correct. - createBranch's parentBranch arg gets a runtime guard via asBranchName that throws LakebaseBranchError immediately if a uid-shaped value is passed — surfaces the swap at the boundary instead of letting the CLI's "branch id not found" bubble up. - findDefaultBranchName helper extracted as pure for hermetic testing. ### 2. Workspace TTL policy CONVENTION_TIER_DEFAULTS.feature defaults to 30-day TTL ("2592000s"), matching the documented PSA branching convention. Some workspaces enforce a tighter maximum-expiration policy (the test workspace caps somewhere between 14 and 30 days). The substrate previously surfaced this as an opaque "expiration time exceeds the maximum expiration time" error from the CLI. Fix: new LakebaseBranchTtlTooLongError (extends LakebaseBranchError), wrapped by createBranch when it detects the specific stderr signal. Message names the attempted TTL, the override options (shorter ttl arg OR noExpiry: true), and the history_retention_duration probe path. CONVENTION_TIER_DEFAULTS docstring updated to call out the caveat. ### 3. create-project live test hardcoded placeholder host create-project.test.ts's E2E test fell back to `https://workspace.cloud.databricks.com` (placeholder) when LAKEBASE_TEST_HOST was unset. That host DNS-fails on every workspace — the test had been silently unrunnable for anyone using DATABRICKS_CONFIG_PROFILE only. Fix: added a resolveTestHost() helper that prefers LAKEBASE_TEST_HOST, falls back to running `databricks auth env --profile $DATABRICKS_CONFIG_PROFILE` to extract DATABRICKS_HOST, and returns null (so the describe block skips cleanly with a documented reason) when neither is set. Documented skip block parallels the existing skip-when-e2e-disabled. Same test: contract update on the `.env` assertion. scaffold.ts's deployEnv() deliberately seeds .env on create-project to avoid a gated-hook chicken-and-egg (post-checkout bails on empty LAKEBASE_PROJECT_ID). Test now asserts .env DOES exist with the expected seeded fields; comment updated to match the substrate's documented design. ### 4. Per-test timeout bumps on two live-pg-pair tests `branch-identifier > queryBranchSchema(uid) === queryBranchSchema(branchId)` and `get-connection-equivalence > returns identical current_database()/host across both output shapes` each do TWO live pg connections back-to-back. Each connection takes ~3-4s end-to-end (credential mint + TLS + query). The 5s vitest default per-test budget was tight enough that under parallel-suite load the pair routinely timed out, even though neither substrate path is actually slow. Bumped both to 30000ms with comments explaining the budget (not a workaround — same fix pattern as the MCP handshake test). Tests - new tests/bdd/branch-id.test.ts: 14 hermetic tests covering both validators (positive + negative), leaf extractor, and @ts-expect-error type-level swap assertions - new tests/bdd/branch-ttl-error.test.ts: 7 tests covering the CLI-stderr detector (case-insensitive, defensive against rewording) and the typed error contract (extends LakebaseBranchError, names the override paths, includes the underlying trace) - tests/bdd/lakebase-project.test.ts: +6 regression tests against the exact CLI list-branches response shape. The headline assertion guards the bug: "returns 'production' not 'br-crimson-fire-d28lb2ez'" - 396 → 403 passing in the hermetic suite (+22 net). Live env-gated suite goes 403 → 432 passing (+29 net) when LAKEBASE_TEST_INSTANCE is set against a real project. Co-authored-by: Isaac

… LIVE gate The live test body REQUIRES LAKEBASE_TEST_PROJECT_PATH (throws "required for live test" otherwise). The describe block's LIVE gate previously accepted just LAKEBASE_TEST_E2E=1 + DATABRICKS_HOST, so a run with E2E=1 but no PROJECT_PATH would enable the describe and then the test body would throw — surfacing as a hard FAIL instead of a clean SKIP. Surfaced during the full live-suite run via scripts/run-all-live-tests.sh: the kit's auto-provision flow sets E2E + HOST but does not set PROJECT_PATH (which refers to a Databricks workspace path the test would normally treat as a "this project is already published in this workspace" pointer). Tightening the gate keeps the test correctly skipped in that scenario while preserving its full assertion set when all three env vars are supplied together. Co-authored-by: Isaac

…d suite New script that orchestrates a clean live-test run end-to-end: scripts/run-all-live-tests.sh --profile <name> Differs from the existing scripts/run-live-tests.sh: - Auto-resolves DATABRICKS_HOST from a `databricks` CLI profile (no need for the user to export it separately). - Auto-provisions a fresh Lakebase project on demand (with a 5s creation-grace prompt; --no-prompt for CI) and resolves its default branch via list-branches. The default-branch leaf is used as both LAKEBASE_TEST_BRANCH and LAKEBASE_TEST_PARENT unless overridden. - Sets EVERY env var the kit's tests gate on (the LAKEBASE_TEST_* suite, LAKEBASE_TEST_INITIALIZR, PEER_DEP_INTEGRATION, plus LAKEBASE_TEST_HOST so the create-project test fixture's profile- resolver finds the host). - Defaults to LAKEBASE_TEST_NO_TEARDOWN=1 so the substrate's never-teardown-on-failure convention holds. The --teardown flag opts in to post-run cleanup, and even then only on a fully-green run (the script preserves the project on any non-zero exit). Usage: scripts/run-all-live-tests.sh --profile <name> # provision + run scripts/run-all-live-tests.sh --profile <name> --project <id> # reuse scripts/run-all-live-tests.sh --profile <name> --teardown # cleanup on green scripts/run-all-live-tests.sh --profile <name> --no-prompt # CI Not unlocked by this script (separate setup required): - detect-language-via-self-hosted-runner.test.ts (needs a self-hosted GitHub Actions runner registered). - LAKEBASE_TEST_PROJECT_PATH-gated TDD live tests (need a Databricks workspace path; gated test now skips cleanly thanks to the sibling tdd-experiment-lifecycle LIVE-gate commit). Co-authored-by: Isaac

kevin-hartman added 4 commits May 27, 2026 20:01

kevin-hartman mentioned this pull request May 28, 2026

feat(WIP): lakebase-apps-deploy lift (FEIP-7130) #49

Draft

9 tasks

kevin-hartman added 3 commits May 27, 2026 21:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: branch test runner data layer + parallel orchestration (FEIP-7092)#48

feat: branch test runner data layer + parallel orchestration (FEIP-7092)#48
kevin-hartman wants to merge 7 commits into
mainfrom
kevin.hartman/feip-7092-branch-test-runner

kevin-hartman commented May 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kevin-hartman commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Original scope (4 slices)

Collateral substrate hardening (commit bdfc40a)

Slice breakdown

Live verification

Test plan

Composition with sibling PRs / tickets

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kevin-hartman commented May 28, 2026 •

edited

Loading

Collateral substrate hardening (commit `bdfc40a`)