Phase 18 PR-6: /health/build-info endpoint + migration-safety CI gate #117

Merged
mcheemaa merged 2 commits into main from feat/2026-05-01-phase18-pr6-build-info-endpoint on May 1, 2026

@mcheemaa mcheemaa commented May 1, 2026

Summary

  • Adds GET /health/build-info to the in-VM phantom HTTP server. Reads /etc/phantom-build-info (the JSON file embedded in every phantom-rootfs image at Docker build time) and returns it verbatim so operators can verify what phantom version is actually running inside a tenant VM.
  • Adds a migration-safety CI gate at src/db/check-migrations.ts that walks src/db/schema.ts and rejects any non-additive or non-idempotent migration. Wired into .github/workflows/ci.yml as a fail-closed step.
  • Adds a "Build identity" subsection to CLAUDE.md covering both surfaces.

Why this matters

This is part 6 of the Phase 18 phantom-updates flow. PR-1 (content-addressable snapshots) and PR-7 (operations runbook) are already merged in adjacent repos. The Phase 18 model is per-tenant snapshot-replace upgrades: each tenant pins a phantom version via tenants.image_tag = "phantom-rootfs:<sha7>", and an operator-driven upgrade swaps the tenant's ZFS clone to a new base while rsyncing /app/data/ (which contains phantom.sqlite) from the old clone to the new one.

This PR closes two gaps:

  1. Operators have no in-VM truth-source for "what code is actually running." phantomctl tenant get returns image_tag from the host's SQLite, but that is the operator's intended state, not the in-VM observed state. The new endpoint returns the JSON file phantom-rootfs's Dockerfile section 10b bakes in, with the resolved 40-char phantom_sha, the requested ref, the build wall-clock, the rootfs image name, and adjacent provenance. An operator reconciles the response against image_tag to detect drift (upgrade in flight, corrupted clone, or daemon not restarted after a swap).

  2. A destructive migration would corrupt every live tenant. Because /app/data/phantom.sqlite rsyncs forward across an upgrade, any DROP TABLE / DROP COLUMN / RENAME migration would break the previous version's reads and break rollback safety. The CI gate enforces the contract on every PR before it can land.
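
The drift check in item 1 can be sketched roughly as follows. This is a minimal illustration, not code from this PR: it assumes that the `<sha7>` in `image_tag` is the first seven characters of `phantom_sha`, and the `detectDrift` function and `DriftResult` type are hypothetical names introduced here.

```typescript
// Hypothetical sketch: only `phantom_sha` and the `phantom-rootfs:<sha7>` tag
// format come from the PR text. Everything else is an assumption.
interface BuildInfo {
  phantom_sha: string; // resolved 40-char commit SHA reported by the endpoint
}

type DriftResult =
  | { kind: "in_sync" }
  | { kind: "drift"; expectedSha7: string; observedSha7: string }
  | { kind: "unparseable_tag"; imageTag: string };

// Compare the operator's intended state (image_tag from the host) with the
// observed state (build-info read inside the VM).
function detectDrift(imageTag: string, info: BuildInfo): DriftResult {
  const match = /^phantom-rootfs:([0-9a-f]{7})$/.exec(imageTag);
  if (match === null) {
    return { kind: "unparseable_tag", imageTag };
  }
  const [, expectedSha7 = ""] = match;
  // Assumption: sha7 is the 7-character prefix of the full SHA.
  const observedSha7 = info.phantom_sha.slice(0, 7);
  return expectedSha7 === observedSha7
    ? { kind: "in_sync" }
    : { kind: "drift", expectedSha7, observedSha7 };
}
```

A `drift` result does not say which of the three causes applies (upgrade in flight, corrupted clone, or daemon not restarted); it only tells the operator that the intended and observed states disagree.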

What ships

  • src/core/build-info.ts: read-and-parse helper. Returns {kind: "ok" | "missing" | "malformed"}. Pure file IO, no caching.
  • src/core/server.ts: new route handler at GET /health/build-info. 200 with the file contents, 404 with error: "build_info_unavailable" when missing, 500 with error: "build_info_malformed" when present-but-bad JSON. Cache-Control: no-store so the request-time read is honored end to end.
  • src/db/migration-safety.ts: the gate library. Forbids DROP TABLE, DROP COLUMN, DROP CONSTRAINT, DROP INDEX, RENAME COLUMN, RENAME TO. Requires CREATE TABLE IF NOT EXISTS and CREATE INDEX IF NOT EXISTS. ALTER TABLE ADD COLUMN is allowed because the runner's _migrations index makes it idempotent at the runner level. Strips SQL line comments before pattern matching so commented-out forbidden tokens do not trigger false positives.
  • src/db/check-migrations.ts: thin CLI runner. Exits 0 on clean, exits 1 with a human-readable violation list otherwise.
  • .github/workflows/ci.yml: new "Migration safety gate" step between Typecheck and Test.
  • CLAUDE.md: new "Build identity (Phase 18 PR-6)" section documenting both surfaces and the operator workflow.
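
For orientation, the helper and route behaviour listed above might look roughly like the sketch below. It is an approximation under stated assumptions, not the code in src/core/build-info.ts or src/core/server.ts: the `HttpReply` shape, the function names, and the `schema_version` field name are hypothetical, any read error is treated as "missing", and the sketch returns the parsed JSON rather than the raw file text.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type BuildInfoResult =
  | { kind: "ok"; data: unknown }
  | { kind: "missing" }
  | { kind: "malformed" };

// Hypothetical plain reply shape, used instead of a real server framework.
interface HttpReply {
  status: number;
  headers: Record<string, string>;
  body: unknown;
}

// Reads the file on every call; nothing is cached in process memory.
function readBuildInfo(
  path: string = process.env.PHANTOM_BUILD_INFO_PATH ?? "/etc/phantom-build-info",
): BuildInfoResult {
  let raw: string;
  try {
    raw = readFileSync(path, "utf8");
  } catch {
    return { kind: "missing" }; // assumption: any read error counts as missing
  }
  try {
    return { kind: "ok", data: JSON.parse(raw) };
  } catch {
    return { kind: "malformed" };
  }
}

// Maps the read result onto the status codes described in the PR.
function toHttpReply(result: BuildInfoResult): HttpReply {
  const headers = { "Content-Type": "application/json", "Cache-Control": "no-store" };
  switch (result.kind) {
    case "ok":
      return { status: 200, headers, body: result.data };
    case "missing":
      return { status: 404, headers, body: { error: "build_info_unavailable" } };
    case "malformed":
      return { status: 500, headers, body: { error: "build_info_malformed" } };
  }
}

// Usage: overwrite the file between reads to show the request-time behaviour.
function demoRequestTimeRead(): [HttpReply, HttpReply, HttpReply] {
  const path = join(mkdtempSync(join(tmpdir(), "build-info-")), "phantom-build-info");
  const beforeWrite = toHttpReply(readBuildInfo(path));
  // `schema_version` is a hypothetical field name; `phantom_sha` is from the PR.
  writeFileSync(path, JSON.stringify({ schema_version: 1, phantom_sha: "a".repeat(40) }));
  const afterValidWrite = toHttpReply(readBuildInfo(path));
  writeFileSync(path, "{not json");
  const afterBadWrite = toHttpReply(readBuildInfo(path));
  return [beforeWrite, afterValidWrite, afterBadWrite];
}
```

Returning a tagged result from the reader and mapping it to a status code in one place keeps the file IO separate from the HTTP concerns, which is one plausible reason for the `kind` discriminator.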

Architectural invariants

  • Read at request-time, never cached. A future in-place upgrade that overwrites /etc/phantom-build-info is reflected on the next request without a phantom restart. Verified by an end-to-end test that overwrites the file mid-flight.
  • Read-only contract. Cache-Control: no-store, GET-only. POSTs return 404. No side-effects.
  • Unauthenticated by design. Matches the /health and /metrics precedent. Per-tenant isolation comes from the per-tenant URL behind Caddy. The build SHA is a public-repo value.
  • Schema version 1. Locked. Future schema changes bump the integer; the test pins it as a number, not a string.
  • Path overridable via PHANTOM_BUILD_INFO_PATH. Production reads the baked-in /etc/phantom-build-info; tests redirect it to a temp file.
  • Migration-safety gate is fail-closed. Any violation fails CI and blocks the merge.

Pre-existing migration audit

Walked all 51 entries in src/db/schema.ts against the new gate. Zero pre-existing violations. The gate passes cleanly on main today; this is the foundation for keeping it that way.
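
As a rough illustration of the rules the gate applies, the sketch below checks a list of SQL strings against the forbidden and required patterns named under "What ships". It is not the code in src/db/migration-safety.ts: the `Violation` shape, the function names, and the exact regular expressions are assumptions, and comment stripping is deliberately left out, so a commented-out forbidden token would also be flagged here.

```typescript
// Hypothetical violation record; the real gate's report format may differ.
interface Violation {
  index: number; // position of the migration in the input array
  rule: string;
}

// Non-additive statements that the contract forbids.
const FORBIDDEN: ReadonlyArray<[string, RegExp]> = [
  ["DROP TABLE", /\bDROP\s+TABLE\b/i],
  ["DROP COLUMN", /\bDROP\s+COLUMN\b/i],
  ["DROP CONSTRAINT", /\bDROP\s+CONSTRAINT\b/i],
  ["DROP INDEX", /\bDROP\s+INDEX\b/i],
  ["RENAME COLUMN", /\bRENAME\s+COLUMN\b/i],
  ["RENAME TO", /\bRENAME\s+TO\b/i],
];

// CREATE statements must carry IF NOT EXISTS to be idempotent.
const NOT_IDEMPOTENT: ReadonlyArray<[string, RegExp]> = [
  ["CREATE TABLE without IF NOT EXISTS", /\bCREATE\s+TABLE\b(?!\s+IF\s+NOT\s+EXISTS\b)/i],
  ["CREATE INDEX without IF NOT EXISTS", /\bCREATE\s+(?:UNIQUE\s+)?INDEX\b(?!\s+IF\s+NOT\s+EXISTS\b)/i],
];

function checkMigrations(migrations: readonly string[]): Violation[] {
  const violations: Violation[] = [];
  migrations.forEach((sql, index) => {
    for (const [rule, pattern] of [...FORBIDDEN, ...NOT_IDEMPOTENT]) {
      if (pattern.test(sql)) {
        violations.push({ index, rule });
      }
    }
  });
  return violations;
}

// Fail-closed: any violation maps to a non-zero exit code for the CLI runner.
function exitCodeFor(violations: readonly Violation[]): 0 | 1 {
  return violations.length === 0 ? 0 : 1;
}
```

ALTER TABLE ADD COLUMN matches none of these patterns, which is consistent with the PR allowing it and relying on the runner's _migrations index for idempotency.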

Test plan

  • bun typecheck clean
  • bun run lint clean
  • bun test clean: 2337 pass / 0 fail / 10 skip / 1 todo (was 2308 pass before this PR; +29 new tests)
  • bun run src/db/check-migrations.ts reports migration-safety: ok on the live MIGRATIONS array
  • Manual smoke against a real rootfs (deferred to Cheema's pre-merge check; the unit + integration tests cover the path that matters)

Authority

Ships to ghostwright/phantom (PUBLIC). Per repo rules, this PR is operator-merge-only. The orchestrator pushes the branch and opens the PR; Cheema reviews and merges.

Authoritative context

  • Phase 18 architect doc: phantom-cloud-deploy/local/2026-05-01-phase18-phantom-updates-flow-architect.md §5.2 (migration contract), §7.4 (build-info endpoint), §11.6 (this PR's spec).
  • phantom-rootfs/Dockerfile section 10b for the source of truth on the JSON shape.

…afety CI gate

Surfaces the build identity baked into every phantom-rootfs image so an
operator can verify what phantom version is actually running inside a
tenant VM. The endpoint at GET /health/build-info reads the JSON file
that phantom-rootfs's Dockerfile section 10b embeds at
/etc/phantom-build-info and returns it verbatim. Read at request-time,
never cached in process memory, so an in-place upgrade that overwrites
the file is reflected on the next request. 404 with a clean error when
the file is missing (a misconfigured dev container; production never
sees this). Tests override the path via PHANTOM_BUILD_INFO_PATH.

The migration-safety CI gate enforces phantom's startup migration
contract: every entry in src/db/schema.ts must be additive (no DROP
TABLE, DROP COLUMN, DROP CONSTRAINT, DROP INDEX, RENAME) and idempotent
(CREATE ... IF NOT EXISTS). The contract is load-bearing for the Phase
18 snapshot-replace upgrade flow: tenants survive an upgrade by having
their /app/data/ directory rsynced from the old ZFS clone to the new
clone, so a destructive migration would corrupt every live tenant on
the next upgrade. The gate ships as a Bun script invoked from CI; a
violation fails the workflow and blocks the merge. ALTER TABLE ADD
COLUMN is allowed because the runner's _migrations table makes it
idempotent at the runner level. The gate validates clean against all
51 existing migrations on main; no pre-existing violations.

Adds 29 tests across two new files: src/core/__tests__/build-info.test.ts
covers the read function, the route, the request-time read invariant,
and the 404/500 error paths; src/db/__tests__/migration-safety.test.ts
covers every forbidden pattern, every idempotency rule, the comment
stripper, and pins the live MIGRATIONS array as clean.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27dc9caf75

Comment on lines +90 to +91
const idx = line.indexOf("--");
return idx >= 0 ? line.slice(0, idx) : line;

P1: Handle '--' only outside SQL string literals

The migration gate can be bypassed because stripComments removes everything after the first -- on a line even when that token appears inside a quoted SQL string. For example, a migration like INSERT INTO t(msg) VALUES('ok -- note'); DROP TABLE sessions yields no violations after stripping, but runMigrations still executes the original statement text, so destructive SQL can pass CI undetected. This breaks the fail-closed safety guarantee for migration checks.
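
One possible way to address this, sketched under assumptions rather than taken from the repository, is to scan the statement text while tracking whether the cursor is inside a quoted literal and to treat `--` as a comment start only outside quotes. The function name `stripSqlLineComments` is hypothetical, and block comments (`/* ... */`) are not handled.

```typescript
// Removes `--` line comments from SQL text, ignoring `--` inside single- or
// double-quoted literals. Operates on the whole text so that literals spanning
// several lines keep their quote state.
function stripSqlLineComments(sql: string): string {
  let out = "";
  let quote: "'" | '"' | null = null;
  let i = 0;
  while (i < sql.length) {
    const ch = sql.charAt(i);
    if (quote !== null) {
      // Inside a literal: copy everything. A doubled quote ('') closes and
      // immediately reopens the literal, which leaves the state correct.
      out += ch;
      if (ch === quote) quote = null;
      i++;
    } else if (ch === "'" || ch === '"') {
      quote = ch;
      out += ch;
      i++;
    } else if (ch === "-" && sql.charAt(i + 1) === "-") {
      // Comment start outside any literal: skip to end of line, keep the newline.
      while (i < sql.length && sql.charAt(i) !== "\n") i++;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}
```

With this approach the example from the review keeps its trailing `DROP TABLE sessions` after stripping, so the forbidden-pattern check would still see it.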

@mcheemaa mcheemaa merged commit fc04b1d into main May 1, 2026
1 check passed