feat: Add QA evaluation structured outputs for Starlight (Brent Council) #5
Open
roshan-vapi wants to merge 1 commit into main from
Add 5 structured output YAML files for automated post-call QA evaluation of Brent Council Housing Benefits calls:

- `starlight-qa-engagement.yml`: 7 questions (3 auto-fail: 1.3, 1.4, 1.5)
- `starlight-qa-right-first-time.yml`: 8 questions (3 auto-fail: 2.3, 2.4, 2.5)
- `starlight-qa-signposting.yml`: 2 questions (no auto-fail)
- `starlight-qa-explaining.yml`: 2 questions (no auto-fail)
- `starlight-wrap-up-code.yml`: call classification into 19 wrap-up codes

Each QA structured output evaluates every question with a result (`yes`/`no`/`not_applicable`), reasoning, and transcript evidence.

Auto-fail logic: if ANY auto-fail question receives "no", the entire evaluation fails across all categories.

All outputs include multilingual transcript support, AI agent adaptation notes, and the full Brent Council Housing Benefits glossary.

Closes PRO-846

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
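Because a single auto-fail "no" fails the evaluation across all categories, the pass/fail decision has to be made by whatever consumes the four QA outputs, not by any one of them. A minimal TypeScript sketch of that check, assuming each category's output arrives as parsed JSON with questions keyed by their ID; the `questions` wrapper, the key layout, and the helper names `categoryAutoFails` / `callPasses` are hypothetical, while the field names `result`, `reasoning`, `evidence`, `auto_fail`, `overall_pass`, and `category_score` come from this PR.

```typescript
// Hypothetical consumer-side check; the JSON shape below is an assumption.
// Only result / reasoning / evidence and auto_fail / overall_pass /
// category_score come from the PR description.
type QAResult = "yes" | "no" | "not_applicable";

interface Evidence {
  message_text: string;
  timestamp: string; // format not specified in the PR; string assumed
}

interface QuestionEvaluation {
  result: QAResult;
  reasoning: string;
  evidence: Evidence[];
}

interface CategoryOutput {
  questions: Record<string, QuestionEvaluation>; // assumed: keyed by question ID
  auto_fail: boolean;
  overall_pass: boolean;
  category_score: string; // fraction string, e.g. "5/7"
}

// Auto-fail questions named in the commit message (engagement 1.3-1.5,
// right-first-time 2.3-2.5; signposting and explaining have none).
const AUTO_FAIL_IDS: ReadonlySet<string> = new Set([
  "1.3", "1.4", "1.5", "2.3", "2.4", "2.5",
]);

// Recompute the flag from per-question results instead of trusting it blindly.
function categoryAutoFails(category: CategoryOutput): boolean {
  return Object.entries(category.questions).some(
    ([id, q]) => AUTO_FAIL_IDS.has(id) && q.result === "no",
  );
}

// One auto-fail "no" in any category fails the whole call.
function callPasses(categories: CategoryOutput[]): boolean {
  return !categories.some((c) => c.auto_fail || categoryAutoFails(c));
}
```

Checking both the model's own `auto_fail` flag and the recomputed value is a conservative design choice: an internally inconsistent output (for example a `no` on question 2.4 alongside `auto_fail: false`) still fails the call.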
dhruva-reddy added a commit that referenced this pull request on May 2, 2026
## ELI5

**Problem.** `npm run push -- <env>` immediately starts hitting the live dashboard. There was no way to ask "what would this push do?" before firing it. So a fat-fingered command (wrong org, missing file path, wide-scope push when you meant scoped) hit production immediately, and recovery meant `pull` + manual revert. The only existing dry-run concept gated *deletions*, not creates or updates.

**What this fix does.** Adds a `--dry-run` flag to `push`. Instead of firing POST/PATCH/DELETE, the engine counts the intent and prints `[dry-run] would <METHOD> <endpoint> <body-preview>` per resource. The state file is never written (so synthetic IDs don't pollute it), and the end-of-run summary shows `Would create N, would update M, would delete K`. GETs still run because drift detection (Stack G) and operator preview both need to see current platform state.

**Outcome you'll notice.** Run `npm run push -- <env> --dry-run` to preview any push. Especially useful for "did I scope this right?" and "is the pre-push lint reporting drift I should address first?" before the real push. Cheapest individual operator-safety win in the stack: no schema changes, no engine architecture moves.

---

Operators today can't validate "is this push doing what I think it's doing" before it lands on prod. `push.ts` has a dry-run concept only for deletions; updates and creates fire immediately. Cheapest individual operator-safety win (improvements.md #5).

- `src/config.ts`: `parseFlags` now accepts `--dry-run` alongside `--force` / `--bootstrap`. Exports `DRY_RUN`.
- `src/api.ts`: `vapiRequest` gates POST/PATCH on `DRY_RUN`: counts the intent, prints `[dry-run] would <METHOD> <endpoint>` with a 120-char body preview, and returns a synthetic id so caller code threads through. `vapiDelete` gets the same treatment. GETs always run (drift preview needs them).
- `src/push.ts`: banner ("🧪 DRY-RUN") at start, summary at end ("Would create N, would update M, would delete K"), `saveState` entirely skipped in dry-run so synthetic ids never leak into the state file.
- `AGENTS.md`: document `--dry-run` in Available Commands.
- `tests/push-dry-run.test.ts`: `--dry-run` is parse-accepted, banner prints, state file is NEVER created (verified end-to-end via spawn).
- `improvements.md`: #5 → RESOLVED.

Closes improvements.md #5.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
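The gating described in this commit amounts to one interception point in front of every mutating request. A minimal TypeScript sketch under stated assumptions, not the actual `src/api.ts` / `src/push.ts` code: the `DryRunGate` class, the `finishPush` helper, and the `dry-run-<n>` synthetic id format are hypothetical, and routing DELETE through the same gate (instead of a separate `vapiDelete`) is a simplification.

```typescript
// Hypothetical sketch of the --dry-run gate; names and id format are
// illustrative assumptions, not the repo's real implementation.
type HttpMethod = "GET" | "POST" | "PATCH" | "DELETE";

// --dry-run is accepted alongside --force / --bootstrap.
function parseFlags(argv: string[]): { force: boolean; bootstrap: boolean; dryRun: boolean } {
  return {
    force: argv.includes("--force"),
    bootstrap: argv.includes("--bootstrap"),
    dryRun: argv.includes("--dry-run"),
  };
}

const BODY_PREVIEW_CHARS = 120;

class DryRunGate {
  private creates = 0;
  private updates = 0;
  private deletes = 0;
  private nextId = 1;
  private readonly enabled: boolean;
  private readonly log: (line: string) => void;

  constructor(enabled: boolean, log: (line: string) => void = console.log) {
    this.enabled = enabled;
    this.log = log;
  }

  // Returns a synthetic { id } for a mutating call in dry-run mode so caller
  // code can keep threading ids through. Returns null when the real request
  // should be sent: always for GET (drift detection needs live state), and
  // for every method when dry-run is off.
  intercept(method: HttpMethod, endpoint: string, body?: unknown): { id: string } | null {
    if (!this.enabled || method === "GET") return null;
    if (method === "POST") this.creates++;
    else if (method === "PATCH") this.updates++;
    else this.deletes++;
    const preview = body === undefined ? "" : JSON.stringify(body).slice(0, BODY_PREVIEW_CHARS);
    this.log(`[dry-run] would ${method} ${endpoint} ${preview}`.trimEnd());
    return { id: `dry-run-${this.nextId++}` };
  }

  summary(): string {
    return `Would create ${this.creates}, would update ${this.updates}, would delete ${this.deletes}`;
  }
}

// State is persisted only on a real push, so synthetic ids never reach it.
function finishPush(gate: DryRunGate, dryRun: boolean, saveState: () => void): void {
  if (dryRun) {
    console.log(gate.summary());
    return;
  }
  saveState();
}
```

Returning a synthetic id rather than throwing keeps create-then-reference code paths running end to end, which is exactly why the state write must be skipped entirely in dry-run mode.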
Summary
Adds 5 structured output YAML files for automated post-call QA evaluation of Brent Council Housing Benefits calls (Starlight project).
Linear Issue
PRO-846
Files Created
- `resources/structuredOutputs/starlight-qa-engagement.yml`
- `resources/structuredOutputs/starlight-qa-right-first-time.yml`
- `resources/structuredOutputs/starlight-qa-signposting.yml`
- `resources/structuredOutputs/starlight-qa-explaining.yml`
- `resources/structuredOutputs/starlight-wrap-up-code.yml`

Schema Design
Each QA structured output produces per-question evaluations with:
- `result`: `yes` / `no` / `not_applicable`
- `reasoning`: explanation referencing the conversation
- `evidence`: array of `{ message_text, timestamp }` excerpts

Top-level fields:
- `auto_fail`: `true` if ANY auto-fail question received `no`
- `overall_pass`: `true` only if `auto_fail` is `false`
- `category_score`: fraction string, e.g. `"5/7"`

Auto-fail logic: if any auto-fail question in ANY of the 4 categories receives `no`, the ENTIRE call evaluation fails. Each structured output sets its own `auto_fail` flag; the consuming application must check across all 4.

Key Design Decisions
- `gpt-4.1` at `temperature: 0` for consistent QA evaluation (temperature 0 minimizes run-to-run variation, though it does not guarantee identical outputs)
- `not_applicable` guidance
- `assistant_ids: []`: empty because the Starlight assistant configs are not yet in the gitops repo; it will be populated once they are added
- `secondary_classification_notes` field for pending tier definitions

Line Count Note
This PR is 778 lines, which exceeds the 500-line guideline. However, all additions are declarative YAML data files with a repetitive per-question schema structure. The 5 files are logically atomic units that cannot be meaningfully split: each represents a single structured output definition. No code was modified.
How to Test
- `yaml` npm package
- `schema.type` is always a simple string (not an array) per AGENTS.md warning
- `npm run push:dev`: verify structured outputs appear in the dashboard

Validation
- `name`, `type`, `target`, `description`, `model`, `schema`, `assistant_ids`, `workflow_ids`
- `schema.type` confirmed as simple string `"object"` in all files (avoids `.toLowerCase()` crash)
- `result`, `reasoning`, and `evidence` sub-properties
- `name` fields follow `snake_case` convention per AGENTS.md
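The Validation checks above could be automated as a small pre-push script. A minimal TypeScript sketch, assuming each YAML file has already been parsed into a plain object (for example with `parse()` from the `yaml` npm package) so the function stays dependency-free; the function name, the `isQaOutput` switch, and the assumption that every object-typed entry in `schema.properties` of a QA output is a question are all hypothetical.

```typescript
// Hypothetical pre-push check mirroring the PR's Validation list.
type Def = Record<string, unknown>;

const REQUIRED_FIELDS = [
  "name", "type", "target", "description",
  "model", "schema", "assistant_ids", "workflow_ids",
];

// Lowercase words joined by single underscores, e.g. "starlight_qa_engagement".
const SNAKE_CASE = /^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$/;

const QUESTION_FIELDS = ["result", "reasoning", "evidence"];

function isObject(v: unknown): v is Def {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function validateStructuredOutput(def: Def, isQaOutput: boolean): string[] {
  const errors: string[] = [];

  for (const field of REQUIRED_FIELDS) {
    if (!(field in def)) errors.push(`missing required field: ${field}`);
  }

  const name = def.name;
  if (typeof name === "string" && !SNAKE_CASE.test(name)) {
    errors.push(`name is not snake_case: ${name}`);
  }

  const schema = def.schema;
  if (isObject(schema)) {
    // A union such as ["object", "null"] is valid JSON Schema but would crash
    // code that calls schema.type.toLowerCase(), so require a plain string.
    const type = schema.type;
    if (typeof type !== "string") {
      errors.push(`schema.type must be a string, got ${JSON.stringify(type)}`);
    } else if (type !== "object") {
      errors.push(`schema.type should be "object", got "${type}"`);
    }

    // Assumed layout: each object-typed property of a QA output is a question.
    const props = schema.properties;
    if (isQaOutput && isObject(props)) {
      for (const [key, prop] of Object.entries(props)) {
        if (!isObject(prop) || prop.type !== "object") continue;
        const subRaw = prop.properties;
        const sub: Def = isObject(subRaw) ? subRaw : {};
        for (const f of QUESTION_FIELDS) {
          if (!(f in sub)) errors.push(`question ${key} is missing ${f}`);
        }
      }
    }
  }

  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets one run report every issue across all 5 files before a push.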