fix: cancel button gives immediate UI feedback#1376
Conversation
localStorage-backed Set<run_id> with 200-entry FIFO eviction and cross-tab sync via the storage event. Used by DRC-3411 cancel-UX fix to mark canceled runs sticky across reloads and tabs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
useRun.onCancel now patches the React Query cache to "Cancelled" synchronously, adds the run to the sticky cancel set, and fires the backend POST as silent fire-and-forget. Polling stops on the next render because enabled/refetchInterval derive from the sticky set. cancelRun now silently swallows network errors — the UI has already detached, so a failed POST cannot leave the user stuck. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
useRun no longer exposes aborting. Removing the stale lines avoids misleading future readers about the hook's contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
Adds a dedicated Cancelled render branch to RunView so a canceled run displays terminal state with no spinner or progress. Removes the isAborting prop from RunView/RunViewOss and the now-unused local state from CheckDetailOss — the cancel UX is now driven by the run's status, not a transient client flag. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
When the user cancels a run, the warehouse may still complete the query and the backend may still write run.result. RunResultPane's result-only actions (Add to Check, Export/Share) gated on run.result presence would silently render. Gate them on the sticky cancel set too so a late-arriving result cannot leak through. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
RunListItem now consults the sticky cancel set and overrides the displayed status to Cancelled when the run.id is present. This keeps the user's cancel decision sticky across page reloads and tabs in the runs list view, matching the behavior in RunView/RunResultPane. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
cancel_run now splits into: - _mark_run_cancelled(run_id) -> (run, task): synchronous; flips run.status to CANCELLED before any adapter call. Cannot hang. - _invoke_task_cancel(task): synchronous wrapper around task.cancel(); may hang on adapter cancel — callers should wrap with a timeout when running on an async event loop. Status is now flipped BEFORE the adapter cancel runs, so a hung warehouse cancel no longer leaves the run stuck in RUNNING. The UI's polling read sees CANCELLED immediately, matching the new client- side cancel detach behavior. cancel_run remains as a backwards-compatible shim. Task 9 will switch the FastAPI handler to use the split helpers with asyncio.wait_for. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
Adds type annotations to _mark_run_cancelled, _invoke_task_cancel, and the cancel_run shim. Documents that the status flip in _mark_run_cancelled is subject to the same sub-µs GIL-window race as the guard documented on update_run_result (run_func.py:203-209). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
The cancel handler now:
- Calls _mark_run_cancelled synchronously to flip run status to
CANCELLED before any adapter call (cheap, instant UI feedback).
- Wraps the adapter cancel (which may hang on warehouses like
Snowflake) in asyncio.wait_for + asyncio.to_thread with a 2s
timeout, so a hung warehouse cannot block the FastAPI event loop.
- Always returns 200 — the UI is already client-side detached; a
failed, timed-out, or unknown-run cancel is a telemetry concern,
not a user-facing error.
- Emits a cancel_run telemetry event for every attempt with
outcome in {acknowledged, timed_out, errored}, plus run_id,
task_kind, adapter_type, elapsed_ms_at_cancel. Adapter teams can
use this to measure cancel-honor rates across warehouses.
Resolves the customer-reported behavior in DRC-3411 where a stuck
Snowflake cancel left the UI hanging with no signal.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
…ncel timeout/error Adds the thread-leak caveat to cancel_run_handler's docstring (Python has no thread-kill primitive; on timeout the worker thread runs until the adapter call returns). Strengthens the timed_out and errored tests to verify the corresponding logger.warning is emitted with the run_id / exception detail — so operator visibility doesn't regress. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
Codecov Report❌ Patch coverage is
... and 6 files with indirect coverage changes 🚀 New features to boost your workflow:
|
Code Review: PR #1376SHA Issues
Notes
Verification
The backend split ( The single Issue above is the only place where the "prefer false-loud over false-quiet" invariant from the PR description does not hold: a canceled row_count_diff that the warehouse honored slowly will still re-enter the lineage aggregation on any browser that wasn't the one that issued the cancel. A two-line guard in 🤖 Generated with Claude Code |
Late-arriving results from canceled runs were leaking into /api/runs/aggregate and rendering on lineage in browsers other than the cancel-issuing one (frontend sticky set is browser-local). The existing `not run.result` skip in materialize_run_results doesn't cover the race where a canceled row_count_diff completes after the cancel was issued — closing the false-quiet hole the PR was meant to seal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
Code Review: PR #1376 (Re-review)SHA Prior Issue — Closed
if not run.result:
continue
if run.status == RunStatus.CANCELLED:
continueThis intercepts at the right level:
New Issues — NoneNo new edge cases introduced. The fix is additive (a continue-guard), runs before the Regression Test Coverage
Test is at the right layer (pure unit on the aggregator, no mocking gymnastics). Prior Notes — Status
Neither was promoted by the fix commit; the original NOTE characterization holds. Verification
VerdictGO. Commit The PR's "prefer false-loud over false-quiet" invariant now holds end-to-end: backend gates the aggregate at materialization, frontend gates the result-pane / runs-list rendering with the sticky cancel set, and any device that wasn't the cancel issuer will see the run absent from lineage rather than rendering its stale row counts. (Formal 🤖 Generated with Claude Code |
Adds the running-state screenshot referenced in the PR body Verification section. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
…ll override (DRC-3411)
After the user clicks Cancel, an in-flight waitRun(timeout=2) poll can
resolve later and write { status: 'Running' } back into the React Query
cache. Because waitRun does not accept an AbortSignal, React Query's
`enabled: false` cannot abort the in-flight request, so RunView would
briefly flip the UI back from Cancelled to Running.
Gate the Running render on `useCanceledRuns.has(run_id)`: once the user
has clicked Cancel for a run, RunView renders Cancelled regardless of
late `Running` writes to the cached run.status. This matches the user's
mental model (they clicked Cancel; the UI must stay Cancelled).
Frontend-only fix. Backend cancel path is unchanged.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
…lves to Running (DRC-3411) Verifies the invariant introduced by the userCanceled gating: when the user has clicked Cancel locally (run_id in useCanceledRuns) RunView must render Cancelled even if the cached run.status is still 'Running' and isRunning is true. Reproduces T1's MutationObserver finding in a vitest unit test: - Seeds localStorage with the canceled run id (matches how RunResultPane's CancelButton marks the run locally before the backend acknowledges). - Renders RunView with run.status='Running' + isRunning=true to mimic the in-flight waitRun poll writing Running back into React Query. - Asserts Cancelled UI, no spinner, no Cancel button. Confirmed FAILS against pre-fix HEAD (4db2ee0) and PASSES against the fix in the previous commit, so the test pins the regression. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
…me tab (DRC-3411) Each useCanceledRuns consumer keeps its own `ids` state. The browser's `storage` event only fires for CROSS-tab writes, so a same-tab `add()` by useRun.onCancel did not propagate to RunView's hook instance — RunView's `has(run_id)` kept returning false and the userCanceled gating from the previous commit could not engage. Fix: dispatch a custom 'recce:canceledRuns:changed' event from `add()` and have every hook instance refresh its ids on receipt. This makes the gating actually fire for live runs (not just tests that seed localStorage before mount). Found while capturing post-fix screenshots for PR #1376: even with the RunView gating in place, the in-flight waitRun poll still flipped the UI back to Running because RunView's hook instance never saw useRun's add(). Added regression coverage in both useCanceledRuns.test.tsx (direct sibling test) and RunView.test.tsx (full DRC-3411 race scenario with a sibling consumer calling add() AFTER RunView mounts). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
Counterpart to the RunView and useCanceledRuns broadcast fixes. The status header in the run-result tab read `run.status` directly, so when an in-flight waitRun poll resolved and wrote 'Running' back into the cache, the header reverted from 'Cancelled' to 'Running' even though the result area (RunView) correctly stayed on Cancelled. Apply the same userCanceled gating in RunStatusAndDateDisplay: if the sticky cancel set contains the run_id, render 'Cancelled' regardless of the cached status. Surfaced while capturing post-fix screenshots for PR #1376 — the result pane showed Cancelled correctly, but the header drifted back to Running ~3s later. Regression test added in RunResultPane.test.tsx. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
Captured via standalone Playwright pattern (T5's approach for PR #1129) against duckdb fixture pr-1376-qd. Renderings: - SS-2 / SS-2b: Cancelled chip persists through the in-flight waitRun poll window (6s hold). Pre-fix this reverted to Running. - SS-3: post-reload — Query page returns to default editor state; runId still in localStorage (canceledRuns) for future references. - SS-4: second tab opens to default Query page (no cross-tab carryover of editor state, but localStorage canceledRuns is shared). - SS-5: RunResultPane shows Cancelled in both header and body. - SS-6: POST /cancel_run round-trip 2ms (well under 2.5s budget). - SS-7: PostHog cancel_run event deferred — OSS local has no dev telemetry; production capture tracked separately. SS-1 (pre-existing) shows the Running state captured prior to the fix. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
wcchang1115
left a comment
There was a problem hiding this comment.
Code Review: PR #1376 — DRC-3411 Cancel UX
SHA dcf1210a · Verdict NO-GO
FE-focused review per request (FE best practices and anti-patterns). Backend changes not reviewed.
Issues
-
js/packages/ui/src/hooks/useCanceledRuns.ts:78-97— Side effects insidesetIdsupdater violate React's purity contract.
Evidence:setIds((prev) => { ...; writeToStorage(trimmed); window.dispatchEvent(new Event(SAME_TAB_EVENT)); return trimmed; }). React requires state updaters to be pure because StrictMode invokes them twice to surface impurity; under StrictMode dev builds this causes doublelocalStoragewrites and double event dispatches, and the dispatched event re-enterssetIds(readFromStorage())on every hook instance synchronously (including the calling instance, becausedispatchEventis sync). Behaviorally idempotent today, but a real correctness risk under React 19 concurrent rendering.
Fix: hoist the side effects out of the updater — read storage, compute next, write,setIds(next), then dispatch. Pass A / React rules. -
js/packages/ui/src/components/check/CheckDetailOss.tsx:214-220— Check Detail Cancel bypasses the new optimistic flow.
Evidence:handleCancelcalls rawcancelRun(trackedRunId, apiClient)and returns; it never writesstatus: "Cancelled"to the cache, never callsuseCanceledRuns().add(runId), never flipsisRunning. So the sticky-set gate that Round 2 was added for (RunView/RunStatusAndDateDisplaygating onuseCanceledRuns.has) cannot engage on this surface —useRun(trackedRunId)is in the same component (line 140) buthandleCancelis defined independently. The in-flightwaitRun(timeout=2)poll race the PR explicitly fixes for the Query page can still flip Check Detail back to Running after click. Same shape exists injs/packages/ui/src/hooks/useMultiNodesAction.ts:521. If the customer-reported "profile/histogram queries silently do nothing" surface is reachable via the Checklist Check Detail panel, this PR does not actually fix that path.
Fix:const { onCancel: handleCancel } = useRun(trackedRunId)(already destructuringuseRunon line 140), and similarly routeuseMultiNodesAction.cancelthrough the optimistic flow. Pass C / cross-surface consistency. -
js/packages/ui/src/hooks/useRun.tsx:76-84— Synthetic Run usesas unknown as Runwith invalidtype.
Evidence: whenprevis undefined, the cache is seeded with{ run_id, status: "Cancelled", type: "unknown" } as unknown as Run."unknown"is not a member ofRunType, and the double-cast disables every downstream type check (registry lookupfindByRunType,isQueryRun,RunResultViewselection). Currently shielded only by theisCancelledshort-circuit inRunView; if any future consumer readsrun.typeahead of that gate, the value is meaningless. Either drop the synthetic write entirely (the sticky set +isRunning=falsealready cover the UI in this race) or add a real "cancelled" sentinel type. Pass A.
Notes
-
Per-instance state duplication. Each
useCanceledRuns()call creates an independentuseState<string[]>+ listener pair.RunListIteminstantiates one per row, plus one each inuseRun,RunView,RunResultPane,RunStatusAndDateDisplay,DefaultAddToCheckButton. The idiomatic React 18+ pattern for a shared external store isuseSyncExternalStore— one subscription model, no manual same-tab broadcast event. Refactor opportunity, not a defect. -
js/packages/ui/src/hooks/useRun.tsx:93—useCallbackdeps include unstable object.canceledRunsis{ has, add }returned as a fresh object literal every render (the hook itself doesn't memoize the return), soonCancelis recreated every render — defeating theuseCallback. Depend oncanceledRuns.add(already stable viauseCallback([])) or memoize the hook's return value. -
js/packages/ui/src/hooks/useRun.test.tsx:72-85— "does not throw if cancelRun POST rejects" is a trivial assertion.useRun.onCancelreturns a hard-codedPromise.resolve()regardless of thevoid cancelRun(...)outcome, soexpect(onCancel()).resolves.toBeUndefined()passes no matter what cancelRun does. The mock also replaces the realcancelRun(which has its own try/catch inapi/runs.ts:113-120), so the test creates an unhandled rejection while asserting on an unrelated synthetic promise. To actually verify rejection safety, mockApiClient.postinstead so the realcancelRuntry/catch runs. -
SSR hydration mismatch potential.
useState(() => readFromStorage())returns[]server-side and persisted data client-side. BothRunViewandRunResultPanecarry"use client", so today this is theoretical; ifuseCanceledRunsis ever pulled into a server-rendered surface, hydrate viauseEffectafter mount. -
Stale doc comment.
js/packages/ui/src/components/run/RunViewOss.tsx:111describes the Cancelled state asrun.status === "Cancelled"only; the actual baseRunViewalso gates onuseCanceledRuns.has(run_id). -
Screenshot/description mismatch.
docs/screenshots/drc-3411/SS-5-result-pane-cancelled.pngshows an empty Query editor identical to SS-3, not the "Cancelled in both header and body" state described in the PR table.SS-2-cancelled-chip-persists.pngactually demonstrates that invariant. Re-capture or relabel.
Verification
cd js && pnpm lint— cleancd js && pnpm vitest run packages/ui/src/hooks/useCanceledRuns.test.tsx packages/ui/src/hooks/useRun.test.tsx packages/ui/src/components/run/RunView.test.tsx packages/ui/src/components/run/RunResultPane.test.tsx packages/ui/src/components/run/RunList.test.tsx— 5 files / 19 tests passing
PR checklist
What type of PR is this?
fix— customer-reported UX bug.What this PR does / why we need it:
Customers (205DataLab on Snowflake via Paradime, previously also reported by Paul) saw the Cancel button on long-running profile/histogram queries silently do nothing — query keeps running server-side, no toast, no spinner change, no error. Three compounding issues:
aborting:booleanonly swapped a button label tucked at the bottom of the spinner stack. Polling continued at 50 ms.cancel_run_handler(async) calledcancel_runsynchronously;cancel_runinvokedtask.cancel()synchronously, which callsdbt_adapter.cancel(connection)— a network round-trip that hangs on some warehouses. Status flip toCANCELLEDhappened after the call returned, so on a hang it never fired.Per the customer-aligned framing: "the real query in the background is not important anymore. Our goal is to make sure the user can continue in Recce UI."
Frontend — pure UI detach. New
useCanceledRunshook owns alocalStorage-backed stickySet<run_id>(200-entry FIFO + cross-tabstorageevent sync + same-tab broadcast event).useRun.onCancelpatches the React Query cache viasetQueryDatatoCancelledimmediately, adds the run to the sticky set, setsisRunning=false, and fires the cancel POST with a silent catch. BecauseenabledandrefetchIntervalderive from the sticky set, polling stops on the next render.RunViewandRunResultPane's status header both gate their Running render onuseCanceledRuns.has(run_id)so a late-arrivingwaitRunpoll cannot revert the UI to Running (Round 2, see "Changelog" below).RunViewshows a dedicated Cancelled terminal branch (no more transient "Aborting…").RunResultPanegatesDefaultAddToCheckButtonand export controls on the sticky set so a late-arrivingrun.resultfrom a warehouse that ignored the cancel cannot leak through.RunListrow override renders Cancelled when the sticky set contains the run id.Backend — handler cannot hang. Split
cancel_runinto_mark_run_cancelled(run_id) -> (run, task)(synchronous, flips status toCANCELLEDfirst, cannot hang) and_invoke_task_cancel(task)(synchronous wrapper, may hang on adapter). The handler now runs the latter viaasyncio.wait_for(asyncio.to_thread(_invoke_task_cancel, task), timeout=2.0). Always returns 200. Telemetry: every cancel attempt emits acancel_runevent withoutcome∈{acknowledged, timed_out, errored}plusrun_id,task_kind,adapter_type,elapsed_ms_at_cancel. NotImplementedError is treated as acknowledged (UX already detached).Which issue(s) this PR fixes:
Resolves DRC-3411
Special notes for your reviewer:
workspace/docs/plans/2026-05-14-drc-3411-cancel-ux.md. Plan was reviewed by Codex (gpt-5.4-codex) and Gemini before implementation. Three corrections folded in from that review:AbortControlleralone won't stop the poll loop — must setenabled: false(or equivalent derivation). The existingwaitRun/ApiClientdidn't even accept React Query'ssignal.setQueryDataover a parallel Zustand store — matches existing Recce server-state convention.run.resultwould silently render —RunResultPane.tsx:521-522and similar surfaces gated onrun.resultpresence, not status. Sticky-set override extended to those surfaces.Follow-ups (not this PR):
cancel_runevent outcomes (data eng ticket).<Typography>; could use the existingRunStatusBadgefor visual continuity.Does this PR introduce a user-facing change?:
Yes.
Cancel button now produces immediate visible feedback regardless of whether the warehouse honors the cancel. Canceled runs stay canceled across page reloads and across browser tabs. The backend cancel endpoint now returns within 2 seconds even if the warehouse-side cancel call hangs.Verification
Tests
uv run pytest tests/apis/test_run_func.py tests/apis/test_cancel_run_handler.py): 43 passed (2026-05-21).test_run_func.py: CANCELLED-sentinel preservation inupdate_run_result,materialize_run_resultscancelled-row-count regression (38 tests).test_cancel_run_handler.py: acknowledged / timed_out under 3s / errored / NotImplementedError-as-acknowledged paths (5 tests).cd js && pnpm vitest run): 3725 passed, 5 skipped (2026-05-21). +4 new tests for the in-flight poll race:RunView.test.tsx: Cancelled rendered when user cancelled locally even if cached status is Running; flips to Cancelled when a sibling consumer callsadd()after mount (broadcast invariant).RunResultPane.test.tsx: status header shows Cancelled when run is in sticky cancel set (gating invariant on the header path).useCanceledRuns.test.tsx:add()broadcasts to sibling instances in the same tab (regression for the missing-broadcast bug surfaced during screenshot verification).pnpm lint && pnpm type:check): clean.4db2ee0d(the pre-fix HEAD before Round 2) and PASS against the current branch. The tests pin the actual bug, not the symptom.Screenshot evidence (post-fix renderings)
Captured via standalone Playwright pattern against the
pr-1376-qdduckdb fixture. The cancel-race repro uses a 16M-row CROSS JOIN with md5 hashing — the duckdb query runs ~13s, andcancel_runreturns 200 immediately even though duckdb does not honor cancel mid-query, so the in-flightwaitRunpoll keeps resolving with status="Running" for several seconds after the cancel. This is the exact race that caused the original Round 1 regression.waitRunpolls resolved to Running in cache) — UI still shows CancelledlocalStoragecanceled setcanceledRunslocalStorage shared cross-tabPOST /api/runs/{id}/cancelround-trip = 2 mscancel_runevent — DEFERRED (no dev telemetry; expected event shape documented)Changelog
useCanceledRunsset,setQueryData-based optimistic Cancelled,_mark_run_cancelled+_invoke_task_cancelsplit withasyncio.wait_for2 s timeout,cancel_runtelemetry event with outcome enum.waitRunpoll race:RunView: gate the Running render branch onuseCanceledRuns.has(run.run_id)so lateRunningwrites to the cache cannot revert the UI from Cancelled to Running (commit200d45b6).useCanceledRuns: dispatch arecce:canceledRuns:changedevent onadd()so sibling hook instances in the same tab observe the update — the browserstorageevent only fires cross-tab, so without this,RunView's hook instance kept staleidsand the gating could not engage (commit906f8d12).RunResultPane.RunStatusAndDateDisplay: same gating on the status header — without this, the result body correctly showed Cancelled but the header drifted back to Running (commitd8266187).Rollback / risk note
Bounded thread leak: on
timed_out, the worker thread keeps running until the adapter call returns (no Python thread-kill). Acceptable — cancel is low-frequency, leak bounded by warehouse connection timeout (documented in handler docstring). Rollback is single-commit revert; stickylocalStoragekey is namespaced and self-evicting (200-entry FIFO), so leftover state is harmless after revert.