fix(db): tolerate missing workers column in benchmark read queries / 基准测试读取查询兼容缺失的 workers 列 by arygupt · Pull Request #407 · SemiAnalysisAI/InferenceX-app

arygupt · 2026-05-29T19:00:42Z

Problem

Production error rate on /api/v1/benchmarks spiked to ~63% immediately after #405 merged (Vercel error chart). Root cause:

feat(power): measured-power multinode support (workers[] + per-stage joules) #405 changed the read queries to SELECT ... br.workers / lb.workers — a column added by migration 006_benchmark_results_workers.sql.
Migrations in this repo are applied manually (pnpm admin:db:migrate), separately from the Vercel app deploy. The app deploy shipped the new query, but migration 006 was not applied to prod, so the column doesn't exist there.
Postgres fails to plan a query that names a non-existent column (column "workers" does not exist), so the statement throws and the route returns 500.

Why ~63% and not 100%: /api/v1/benchmarks is wrapped in cachedQuery({ blobOnly: true }). Cache hits return pre-#405 blobs (200); only cache misses run the SQL → 500. The Neon panel shows 0% connection errors because the connection succeeds — only the query fails.

Fix

Read workers via to_jsonb(row) -> 'workers' in all three read paths:

getLatestBenchmarks — base-table (date) branch
getLatestBenchmarks — materialized-view (no-date) branch
getAllBenchmarksForHistory

A JSON key lookup is a runtime operation that returns NULL for an absent key, instead of failing at parse time like a bare column reference. So the endpoint keeps serving every other field whether or not migration 006 has been applied. Behavior is identical once the column exists (jsonb in → jsonb out).

This makes the read path resilient to migration/deploy ordering — a forgotten or not-yet-run migration degrades to "workers is null" instead of taking down the endpoint.

Note on the operational fix

This PR stops the 500s on its own. Migration 006 should still be applied to prod (pnpm admin:db:migrate) so the measured-power workers data is actually populated — and the cache purged afterward. This change just removes the hard dependency on migration-before-deploy ordering.

Tests

New packages/db/src/queries/benchmarks.test.ts — recording-mock DbClient asserts none of the three read queries reference the workers column directly (regression guard).
pnpm typecheck, pnpm lint, pnpm fmt, and full db suite (226 tests) pass.

Note

Low Risk
Targeted SQL selection change on read-only benchmark queries plus unit tests; no auth or write-path changes, with identical results once migration 006 exists.

Overview
Fixes production 500s on /api/v1/benchmarks when the app ships before migration 006 adds the workers column: bare br.workers / lb.workers in SELECT fails at plan time, while to_jsonb(row) -> 'workers' still plans and yields null until the column exists.

All three benchmark read paths in packages/db/src/queries/benchmarks.ts are updated—getLatestBenchmarks (date-filtered base table and materialized-view branches) and getAllBenchmarksForHistory. Behavior is unchanged once migration 006 is applied.

Adds benchmarks.test.ts with a recording mock DbClient so generated SQL must use the to_jsonb form and must not match \b(br|lb)\.workers\b.

^{Reviewed by Cursor Bugbot for commit d9f2a93. Bugbot is set up for automated code reviews on this repo. Configure here.}

中文说明

修复 #405 合并后 /api/v1/benchmarks 约 63% 请求返回 500 的问题。根因：新查询引用了 Migration 006 添加的 workers 列，但该迁移尚未在生产环境执行，Postgres 在查询规划阶段即报错。修复方法：在三个读取路径中改用 to_jsonb(row) -> 'workers'，该操作在列不存在时返回 NULL 而非报错，从而解耦迁移与部署顺序。新增回归测试确保读取查询不直接引用 workers 列。

#405's read queries select the `workers` column added by migration 006. Migrations are applied manually (pnpm admin:db:migrate), separately from the Vercel app deploy, so the deploy shipped a query referencing a column the production DB didn't have yet. Postgres fails to plan a bare reference to a non-existent column ("column \"workers\" does not exist"), so every cache-miss request to /api/v1/benchmarks 500'd — surfacing as the ~63% error rate after #405 (cache hits returned stale blobs; only misses hit the DB). Read `workers` via `to_jsonb(row) -> 'workers'` in all three read paths (getLatestBenchmarks base-table + view branches, getAllBenchmarksForHistory). JSON key lookup is a runtime op that returns NULL for an absent key instead of failing at parse time, so the endpoint keeps serving every other field whether or not migration 006 has run. Identical behavior once the column exists. Adds a regression test asserting the read queries never reference the workers column directly.

vercel · 2026-05-29T19:00:56Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	May 29, 2026 7:01pm

claude · 2026-05-29T19:01:05Z

Claude finished @arygupt's task in 1m 6s —— View job

Claude Code Review

Read changed files (benchmarks.ts, benchmarks.test.ts)
Verify approach against migration 006 + materialized view
Check test coverage
Post review

🟢 LGTM — no blocking issues found

The fix correctly addresses the parse-time column lookup problem. Verified against migrations/006_benchmark_results_workers.sql:

Migration 006 adds workers to benchmark_results and drops/recreates latest_benchmarks with br.* — so without 006 applied, neither br.workers nor lb.workers exists. The PR correctly patches all three read paths.
to_jsonb(row) -> 'workers' is the right primitive: row-to-jsonb is a runtime op and -> returns NULL for missing keys (vs. bare column refs which fail at parse time). Behavior matches br.workers once the column exists (jsonb in → jsonb out).
Per-row to_jsonb(br) adds some overhead in the history path, but it's negligible compared to the existing per-row work and not worth optimizing for a resilience fix.
The regression test correctly reconstructs the SQL template by joining static segments with ? and asserts both the new form is present and the bare form is absent — covers all three branches (date / no-date / history).
Comments in benchmarks.ts:93-96, 136, 184 explain the why (migration ordering, parse-time vs. runtime), which is exactly the non-obvious context that belongs in code.

Operationally, as the PR body notes: migration 006 still needs to be applied to populate the column. This change just removes the hard ordering dependency between migration and deploy.
• Branch: fix/benchmarks-workers-column-tolerant

arygupt requested a review from adibarra as a code owner May 29, 2026 19:00

vercel Bot deployed to Preview May 29, 2026 19:01 View deployment

functionstackx changed the title ~~fix(db): tolerate missing workers column in benchmark read queries~~ fix(db): tolerate missing workers column in benchmark read queries / 基准测试读取查询兼容缺失的 workers 列 Jul 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(db): tolerate missing workers column in benchmark read queries / 基准测试读取查询兼容缺失的 workers 列#407

fix(db): tolerate missing workers column in benchmark read queries / 基准测试读取查询兼容缺失的 workers 列#407
arygupt wants to merge 1 commit into
masterfrom
fix/benchmarks-workers-column-tolerant

arygupt commented May 29, 2026 •

edited by functionstackx

Loading

Uh oh!

vercel Bot commented May 29, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

arygupt commented May 29, 2026 • edited by functionstackx Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Note on the operational fix

Tests

中文说明

Uh oh!

vercel Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

🟢 LGTM — no blocking issues found

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

arygupt commented May 29, 2026 •

edited by functionstackx

Loading

vercel Bot commented May 29, 2026 •

edited

Loading

claude Bot commented May 29, 2026 •

edited

Loading