Skip to content

fix: cancel button gives immediate UI feedback#1376

Open
even-wei wants to merge 18 commits into
mainfrom
feature/drc-3411-bug-cancel-query-button-no-effect-no-ui-feedback-when
Open

fix: cancel button gives immediate UI feedback#1376
even-wei wants to merge 18 commits into
mainfrom
feature/drc-3411-bug-cancel-query-button-no-effect-no-ui-feedback-when

Conversation

@even-wei
Copy link
Copy Markdown
Contributor

@even-wei even-wei commented May 14, 2026

PR checklist

  • Ensure you have added or ran the appropriate tests for your PR.
  • DCO signed

What type of PR is this?

fix — customer-reported UX bug.

What this PR does / why we need it:

Customers (205DataLab on Snowflake via Paradime, previously also reported by Paul) saw the Cancel button on long-running profile/histogram queries silently do nothing — query keeps running server-side, no toast, no spinner change, no error. Three compounding issues:

  1. Frontend had no loud immediate response. The local aborting:boolean only swapped a button label tucked at the bottom of the spinner stack. Polling continued at 50 ms.
  2. No client-side timeout. If server status never flipped Cancelled (warehouse hung or adapter cancel blocked), the UI hung in "Aborting…" indefinitely.
  3. Backend cancel handler could hang. cancel_run_handler (async) called cancel_run synchronously; cancel_run invoked task.cancel() synchronously, which calls dbt_adapter.cancel(connection) — a network round-trip that hangs on some warehouses. Status flip to CANCELLED happened after the call returned, so on a hang it never fired.

Per the customer-aligned framing: "the real query in the background is not important anymore. Our goal is to make sure the user can continue in Recce UI."

Frontend — pure UI detach. New useCanceledRuns hook owns a localStorage-backed sticky Set<run_id> (200-entry FIFO + cross-tab storage event sync + same-tab broadcast event). useRun.onCancel patches the React Query cache via setQueryData to Cancelled immediately, adds the run to the sticky set, sets isRunning=false, and fires the cancel POST with a silent catch. Because enabled and refetchInterval derive from the sticky set, polling stops on the next render. RunView and RunResultPane's status header both gate their Running render on useCanceledRuns.has(run_id) so a late-arriving waitRun poll cannot revert the UI to Running (Round 2, see "Changelog" below). RunView shows a dedicated Cancelled terminal branch (no more transient "Aborting…"). RunResultPane gates DefaultAddToCheckButton and export controls on the sticky set so a late-arriving run.result from a warehouse that ignored the cancel cannot leak through. RunList row override renders Cancelled when the sticky set contains the run id.

Backend — handler cannot hang. Split cancel_run into _mark_run_cancelled(run_id) -> (run, task) (synchronous, flips status to CANCELLED first, cannot hang) and _invoke_task_cancel(task) (synchronous wrapper, may hang on adapter). The handler now runs the latter via asyncio.wait_for(asyncio.to_thread(_invoke_task_cancel, task), timeout=2.0). Always returns 200. Telemetry: every cancel attempt emits a cancel_run event with outcome{acknowledged, timed_out, errored} plus run_id, task_kind, adapter_type, elapsed_ms_at_cancel. NotImplementedError is treated as acknowledged (UX already detached).

Which issue(s) this PR fixes:

Resolves DRC-3411

Special notes for your reviewer:

  • Plan and adversarial review in workspace/docs/plans/2026-05-14-drc-3411-cancel-ux.md. Plan was reviewed by Codex (gpt-5.4-codex) and Gemini before implementation. Three corrections folded in from that review:
    1. AbortController alone won't stop the poll loop — must set enabled: false (or equivalent derivation). The existing waitRun/ApiClient didn't even accept React Query's signal.
    2. Prefer setQueryData over a parallel Zustand store — matches existing Recce server-state convention.
    3. Late-arriving run.result would silently render — RunResultPane.tsx:521-522 and similar surfaces gated on run.result presence, not status. Sticky-set override extended to those surfaces.
  • PR size: 22 files, +1247/-105 (~1142 net LOC). About 614 LOC are tests. Above the 250-LOC workspace ceiling; shipped as one PR because frontend and backend together prove the customer's symptom is impossible.
  • Thread-leak trade-off (documented in handler docstring): on timeout, the worker thread continues until the adapter call returns. Python has no thread-kill primitive. Acceptable because cancel is low-frequency and the leak is bounded by the warehouse's own connection timeout.
  • What's verified programmatically: all per-task unit/integration tests (3725 frontend tests + full backend suite), full builds, screenshot verification against the duckdb fixture.
  • What needs a 60-second smoke test before merging: see Verification section below — all six post-fix renderings now have screenshots attached.

Follow-ups (not this PR):

  • PostHog/Sentry dashboard wiring on the cancel_run event outcomes (data eng ticket).
  • Consistent colored Cancelled chip styling — current pass adds the label using a plain <Typography>; could use the existing RunStatusBadge for visual continuity.
  • Sticky set age-based eviction in addition to FIFO (low priority).

Does this PR introduce a user-facing change?:

Yes.

Cancel button now produces immediate visible feedback regardless of whether the warehouse honors the cancel. Canceled runs stay canceled across page reloads and across browser tabs. The backend cancel endpoint now returns within 2 seconds even if the warehouse-side cancel call hangs.

Verification

Tests

  • Backend (uv run pytest tests/apis/test_run_func.py tests/apis/test_cancel_run_handler.py): 43 passed (2026-05-21).
    • test_run_func.py: CANCELLED-sentinel preservation in update_run_result, materialize_run_results cancelled-row-count regression (38 tests).
    • test_cancel_run_handler.py: acknowledged / timed_out under 3s / errored / NotImplementedError-as-acknowledged paths (5 tests).
  • Frontend (cd js && pnpm vitest run): 3725 passed, 5 skipped (2026-05-21). +4 new tests for the in-flight poll race:
    • RunView.test.tsx: Cancelled rendered when user cancelled locally even if cached status is Running; flips to Cancelled when a sibling consumer calls add() after mount (broadcast invariant).
    • RunResultPane.test.tsx: status header shows Cancelled when run is in sticky cancel set (gating invariant on the header path).
    • useCanceledRuns.test.tsx: add() broadcasts to sibling instances in the same tab (regression for the missing-broadcast bug surfaced during screenshot verification).
  • Lint + typecheck (pnpm lint && pnpm type:check): clean.
  • Pre-fix regression invariant: confirmed all three new race tests FAIL against 4db2ee0d (the pre-fix HEAD before Round 2) and PASS against the current branch. The tests pin the actual bug, not the symptom.

Screenshot evidence (post-fix renderings)

Captured via standalone Playwright pattern against the pr-1376-qd duckdb fixture. The cancel-race repro uses a 16M-row CROSS JOIN with md5 hashing — the duckdb query runs ~13s, and cancel_run returns 200 immediately even though duckdb does not honor cancel mid-query, so the in-flight waitRun poll keeps resolving with status="Running" for several seconds after the cancel. This is the exact race that caused the original Round 1 regression.

# Capture Verifies
SS-1 Running state — slow query in progress, spinner + Cancel button enabled Pre-condition
SS-2 Click Cancel → Cancelled chip renders immediately AC#1 — immediate feedback
SS-2b After 6s hold (multiple late waitRun polls resolved to Running in cache) — UI still shows Cancelled AC#1 — persistence through the race
SS-3 Reload — Query page returns to default editor; runId still in localStorage canceled set AC#2 — sticky reload (localStorage namespace)
SS-4 Second tab — second tab opens cleanly; canceledRuns localStorage shared cross-tab AC#2 — cross-tab sync
SS-5 RunResultPane shows Cancelled in both header and body AC#3 — false-quiet suppression
SS-6 DevTools-style network log showing POST /api/runs/{id}/cancel round-trip = 2 ms AC#1 — handler latency ≤ 2.5s budget
SS-7 PostHog cancel_run event — DEFERRED (no dev telemetry; expected event shape documented) AC#4 — telemetry shape

SS-1 Running state

SS-2 Cancelled chip renders immediately

SS-2b Cancelled persists through late waitRun poll window

SS-3 Post-reload — runId persisted to localStorage

SS-4 Second tab — cross-tab localStorage sync

SS-5 RunResultPane shows Cancelled in header + body

SS-6 POST /cancel_run round-trip 2ms

SS-7 PostHog cancel_run event — deferred

Changelog

  • Round 1 (2026-05-14 → 2026-05-20): initial implementation — sticky useCanceledRuns set, setQueryData-based optimistic Cancelled, _mark_run_cancelled + _invoke_task_cancel split with asyncio.wait_for 2 s timeout, cancel_run telemetry event with outcome enum.
  • Round 2 (2026-05-21): three follow-up fixes triggered by screenshot verification surfacing the in-flight waitRun poll race:
    1. RunView: gate the Running render branch on useCanceledRuns.has(run.run_id) so late Running writes to the cache cannot revert the UI from Cancelled to Running (commit 200d45b6).
    2. useCanceledRuns: dispatch a recce:canceledRuns:changed event on add() so sibling hook instances in the same tab observe the update — the browser storage event only fires cross-tab, so without this, RunView's hook instance kept stale ids and the gating could not engage (commit 906f8d12).
    3. RunResultPane.RunStatusAndDateDisplay: same gating on the status header — without this, the result body correctly showed Cancelled but the header drifted back to Running (commit d8266187).

Rollback / risk note

Bounded thread leak: on timed_out, the worker thread keeps running until the adapter call returns (no Python thread-kill). Acceptable — cancel is low-frequency, leak bounded by warehouse connection timeout (documented in handler docstring). Rollback is single-commit revert; sticky localStorage key is namespaced and self-evicting (200-entry FIFO), so leftover state is harmless after revert.

even-wei and others added 11 commits May 14, 2026 12:00
localStorage-backed Set<run_id> with 200-entry FIFO eviction and
cross-tab sync via the storage event. Used by DRC-3411 cancel-UX fix
to mark canceled runs sticky across reloads and tabs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
useRun.onCancel now patches the React Query cache to "Cancelled"
synchronously, adds the run to the sticky cancel set, and fires the
backend POST as silent fire-and-forget. Polling stops on the next
render because enabled/refetchInterval derive from the sticky set.

cancelRun now silently swallows network errors — the UI has already
detached, so a failed POST cannot leave the user stuck.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
useRun no longer exposes aborting. Removing the stale lines avoids
misleading future readers about the hook's contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
Adds a dedicated Cancelled render branch to RunView so a canceled run
displays terminal state with no spinner or progress. Removes the
isAborting prop from RunView/RunViewOss and the now-unused local
state from CheckDetailOss — the cancel UX is now driven by the run's
status, not a transient client flag.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
When the user cancels a run, the warehouse may still complete the
query and the backend may still write run.result. RunResultPane's
result-only actions (Add to Check, Export/Share) gated on run.result
presence would silently render. Gate them on the sticky cancel set
too so a late-arriving result cannot leak through.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
RunListItem now consults the sticky cancel set and overrides the
displayed status to Cancelled when the run.id is present. This keeps
the user's cancel decision sticky across page reloads and tabs in
the runs list view, matching the behavior in RunView/RunResultPane.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
cancel_run now splits into:

- _mark_run_cancelled(run_id) -> (run, task): synchronous; flips
  run.status to CANCELLED before any adapter call. Cannot hang.
- _invoke_task_cancel(task): synchronous wrapper around task.cancel();
  may hang on adapter cancel — callers should wrap with a timeout
  when running on an async event loop.

Status is now flipped BEFORE the adapter cancel runs, so a hung
warehouse cancel no longer leaves the run stuck in RUNNING. The UI's
polling read sees CANCELLED immediately, matching the new client-
side cancel detach behavior.

cancel_run remains as a backwards-compatible shim. Task 9 will switch
the FastAPI handler to use the split helpers with asyncio.wait_for.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
Adds type annotations to _mark_run_cancelled, _invoke_task_cancel, and
the cancel_run shim. Documents that the status flip in
_mark_run_cancelled is subject to the same sub-µs GIL-window race as
the guard documented on update_run_result (run_func.py:203-209).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
The cancel handler now:

- Calls _mark_run_cancelled synchronously to flip run status to
  CANCELLED before any adapter call (cheap, instant UI feedback).
- Wraps the adapter cancel (which may hang on warehouses like
  Snowflake) in asyncio.wait_for + asyncio.to_thread with a 2s
  timeout, so a hung warehouse cannot block the FastAPI event loop.
- Always returns 200 — the UI is already client-side detached; a
  failed, timed-out, or unknown-run cancel is a telemetry concern,
  not a user-facing error.
- Emits a cancel_run telemetry event for every attempt with
  outcome in {acknowledged, timed_out, errored}, plus run_id,
  task_kind, adapter_type, elapsed_ms_at_cancel. Adapter teams can
  use this to measure cancel-honor rates across warehouses.

Resolves the customer-reported behavior in DRC-3411 where a stuck
Snowflake cancel left the UI hanging with no signal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
…ncel timeout/error

Adds the thread-leak caveat to cancel_run_handler's docstring (Python
has no thread-kill primitive; on timeout the worker thread runs until
the adapter call returns). Strengthens the timed_out and errored
tests to verify the corresponding logger.warning is emitted with the
run_id / exception detail — so operator visibility doesn't regress.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
@even-wei even-wei self-assigned this May 14, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 98.72340% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/apis/run_api.py 93.54% 2 Missing ⚠️
tests/apis/test_run_func.py 98.76% 1 Missing ⚠️
Files with missing lines Coverage Δ
recce/apis/run_func.py 70.24% <100.00%> (+5.26%) ⬆️
tests/apis/test_cancel_run_handler.py 100.00% <100.00%> (ø)
tests/apis/test_run_func.py 92.17% <98.76%> (+1.62%) ⬆️
recce/apis/run_api.py 38.72% <93.54%> (+11.29%) ⬆️

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@even-wei
Copy link
Copy Markdown
Contributor Author

Code Review: PR #1376

SHA cc5db70 · Verdict NO-GO

Issues

  1. recce/apis/run_func.py:340materialize_run_results skips on not run.result but ignores run.status == CANCELLED. A canceled row_count_diff whose late result arrives still flows into /api/runs/aggregate and renders on lineage.
    Evidence: update_run_result (run_func.py:202-206) writes run.result even when status is CANCELLED. The frontend sticky-set gate (useCanceledRuns) is browser-local and only hides the RunResultPane / RunList row, not lineage-derived row counts. On any other tab/device the canceled run's data leaks into the lineage view — exactly the "false-quiet" the PR is meant to prevent.
    Pass A.

Notes

  1. js/packages/ui/src/hooks/useRun.tsx:79-83setQueryData fallback synthesizes a Run with type: "unknown", which is not a registered RunType. Safe today because runTypeHasRef("unknown") returns false (no registry lookup), but the branch is effectively dead — by the time Cancel is clickable, the cache is populated. Either drop the fallback or guard downstream consumers explicitly.
    Pass A.
  2. js/packages/ui/src/hooks/useCanceledRuns.ts:64has uses Array.includes (O(n)). 200-entry FIFO makes this fine; if the cap ever rises, switch to a Set to avoid per-render scans (RunList re-renders this per row).
    Pass G.

Verification

  • Backend: uv run pytest tests/apis/test_cancel_run_handler.py tests/apis/test_run_func.py → 42 passed
  • Frontend: 161 files / 3721 tests passed, 5 skipped
  • pnpm type:check clean
  • flake8 recce/apis/run_api.py recce/apis/run_func.py clean

The backend split (_mark_run_cancelled synchronous + _invoke_task_cancel thread-bounded) is correct and well-tested; the regression tests around the CANCELLED sentinel preservation (test_run_func.py:398-421) and the hung-adapter timeout path (test_cancel_run_handler.py:92-138) are the right shape. The frontend optimistic-UI flow (sticky Set<run_id> + setQueryData + enabled: false derived gate) matches the plan and is consistent across RunView, RunResultPane (export + Add-to-Check), and RunList.

The single Issue above is the only place where the "prefer false-loud over false-quiet" invariant from the PR description does not hold: a canceled row_count_diff that the warehouse honored slowly will still re-enter the lineage aggregation on any browser that wasn't the one that issued the cancel. A two-line guard in materialize_run_results (if getattr(run, "status", None) == RunStatus.CANCELLED: continue) closes the gap.

🤖 Generated with Claude Code

Late-arriving results from canceled runs were leaking into
/api/runs/aggregate and rendering on lineage in browsers other than
the cancel-issuing one (frontend sticky set is browser-local). The
existing `not run.result` skip in materialize_run_results doesn't
cover the race where a canceled row_count_diff completes after the
cancel was issued — closing the false-quiet hole the PR was meant to
seal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
@even-wei
Copy link
Copy Markdown
Contributor Author

Code Review: PR #1376 (Re-review)

SHA 0475824b · Verdict GO · Incremental (commit 0475824b since prior review at cc5db70)

Prior Issue — Closed

recce/apis/run_func.py:340-343 — The two-line guard the prior review prescribed is exactly what landed:

if not run.result:
    continue
if run.status == RunStatus.CANCELLED:
    continue

This intercepts at the right level: aggregate_runs_handler (run_api.py:255) calls materialize_run_results on RunDAO().list(), so the gate now keeps a canceled row_count_diff out of /api/runs/aggregate regardless of which browser/tab/device requests it. The frontend sticky cancel set is no longer the only line of defense — the false-quiet hole is sealed at the source.

RunStatus is already imported at run_func.py:8. Comparison against Optional[RunStatus] is None-safe (None == RunStatus.CANCELLED → False), so historical runs loaded from state files without a status field still flow through.

New Issues — None

No new edge cases introduced. The fix is additive (a continue-guard), runs before the RunType.ROW_COUNT_DIFF / RunType.ROW_COUNT branches, and has no side effects on the other code paths. No interaction risk with the cancel-handler split (_mark_run_cancelled / _invoke_task_cancel) — that path writes RunStatus.CANCELLED to the same Run instance the aggregate reads from.

Regression Test Coverage

tests/apis/test_run_func.py:46-63test_materialize_run_results_skips_cancelled_runs exercises the exact scenario:

  • RunType.ROW_COUNT_DIFF (the leaking type cited in DRC-3411)
  • Both result populated AND status == CANCELLED (the race condition: cancel issued, late warehouse result arrived, update_run_result wrote run.result while preserving the CANCELLED sentinel via the guards at run_func.py:204 / :209)
  • Asserts result == {} — directly proves no leak into the aggregate

Test is at the right layer (pure unit on the aggregator, no mocking gymnastics).

Prior Notes — Status

  1. useRun.tsx:79-83 unknown-type fallback — unchanged on this commit; still effectively dead code. Accept as future polish. Cosmetic, not a correctness issue. NOTE-level remains.
  2. useCanceledRuns.has O(n) Array.includes — unchanged on this commit; still fine at the 200-entry cap. Accept as future polish. If the cap rises, revisit. NOTE-level remains.

Neither was promoted by the fix commit; the original NOTE characterization holds.

Verification

  • uv run pytest tests/apis/test_run_func.py tests/apis/test_cancel_run_handler.py43 passed in 5.59s (38 in test_run_func.py including the new regression, 5 in test_cancel_run_handler.py)
  • uv run flake8 recce/apis/run_func.py tests/apis/test_run_func.py → clean

Verdict

GO. Commit 0475824b closes the only blocking Issue cleanly with a two-line guard at the correct boundary, the regression test exercises the race condition the original Issue described, and the fix does not introduce new edge cases or regress the rest of the cancel lifecycle. The two prior NOTES remain NOTE-level and are appropriate as follow-up polish.

The PR's "prefer false-loud over false-quiet" invariant now holds end-to-end: backend gates the aggregate at materialization, frontend gates the result-pane / runs-list rendering with the sticky cancel set, and any device that wasn't the cancel issuer will see the run absent from lineage rather than rendering its stale row counts.

(Formal --request-changes on self-PR is blocked by GitHub — this comment is the review record.)

🤖 Generated with Claude Code

even-wei and others added 6 commits May 21, 2026 12:49
Adds the running-state screenshot referenced in the PR body
Verification section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
…ll override (DRC-3411)

After the user clicks Cancel, an in-flight waitRun(timeout=2) poll can
resolve later and write { status: 'Running' } back into the React Query
cache. Because waitRun does not accept an AbortSignal, React Query's
`enabled: false` cannot abort the in-flight request, so RunView would
briefly flip the UI back from Cancelled to Running.

Gate the Running render on `useCanceledRuns.has(run_id)`: once the user
has clicked Cancel for a run, RunView renders Cancelled regardless of
late `Running` writes to the cached run.status. This matches the user's
mental model (they clicked Cancel; the UI must stay Cancelled).

Frontend-only fix. Backend cancel path is unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
…lves to Running (DRC-3411)

Verifies the invariant introduced by the userCanceled gating: when the
user has clicked Cancel locally (run_id in useCanceledRuns) RunView must
render Cancelled even if the cached run.status is still 'Running' and
isRunning is true.

Reproduces T1's MutationObserver finding in a vitest unit test:
- Seeds localStorage with the canceled run id (matches how
  RunResultPane's CancelButton marks the run locally before the backend
  acknowledges).
- Renders RunView with run.status='Running' + isRunning=true to mimic
  the in-flight waitRun poll writing Running back into React Query.
- Asserts Cancelled UI, no spinner, no Cancel button.

Confirmed FAILS against pre-fix HEAD (4db2ee0) and PASSES against the
fix in the previous commit, so the test pins the regression.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
…me tab (DRC-3411)

Each useCanceledRuns consumer keeps its own `ids` state. The browser's
`storage` event only fires for CROSS-tab writes, so a same-tab `add()`
by useRun.onCancel did not propagate to RunView's hook instance —
RunView's `has(run_id)` kept returning false and the userCanceled
gating from the previous commit could not engage.

Fix: dispatch a custom 'recce:canceledRuns:changed' event from `add()`
and have every hook instance refresh its ids on receipt. This makes the
gating actually fire for live runs (not just tests that seed
localStorage before mount).

Found while capturing post-fix screenshots for PR #1376: even with the
RunView gating in place, the in-flight waitRun poll still flipped the
UI back to Running because RunView's hook instance never saw useRun's
add(). Added regression coverage in both useCanceledRuns.test.tsx
(direct sibling test) and RunView.test.tsx (full DRC-3411 race scenario
with a sibling consumer calling add() AFTER RunView mounts).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
Counterpart to the RunView and useCanceledRuns broadcast fixes. The
status header in the run-result tab read `run.status` directly, so
when an in-flight waitRun poll resolved and wrote 'Running' back into
the cache, the header reverted from 'Cancelled' to 'Running' even
though the result area (RunView) correctly stayed on Cancelled.

Apply the same userCanceled gating in RunStatusAndDateDisplay: if the
sticky cancel set contains the run_id, render 'Cancelled' regardless
of the cached status. Surfaced while capturing post-fix screenshots
for PR #1376 — the result pane showed Cancelled correctly, but the
header drifted back to Running ~3s later.

Regression test added in RunResultPane.test.tsx.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
Captured via standalone Playwright pattern (T5's approach for PR #1129)
against duckdb fixture pr-1376-qd. Renderings:

- SS-2 / SS-2b: Cancelled chip persists through the in-flight waitRun
  poll window (6s hold). Pre-fix this reverted to Running.
- SS-3: post-reload — Query page returns to default editor state;
  runId still in localStorage (canceledRuns) for future references.
- SS-4: second tab opens to default Query page (no cross-tab carryover
  of editor state, but localStorage canceledRuns is shared).
- SS-5: RunResultPane shows Cancelled in both header and body.
- SS-6: POST /cancel_run round-trip 2ms (well under 2.5s budget).
- SS-7: PostHog cancel_run event deferred — OSS local has no dev
  telemetry; production capture tracked separately.

SS-1 (pre-existing) shows the Running state captured prior to the fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
Copy link
Copy Markdown
Collaborator

@wcchang1115 wcchang1115 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #1376 — DRC-3411 Cancel UX

SHA dcf1210a · Verdict NO-GO

FE-focused review per request (FE best practices and anti-patterns). Backend changes not reviewed.

Issues

  1. js/packages/ui/src/hooks/useCanceledRuns.ts:78-97 — Side effects inside setIds updater violate React's purity contract.
    Evidence: setIds((prev) => { ...; writeToStorage(trimmed); window.dispatchEvent(new Event(SAME_TAB_EVENT)); return trimmed; }). React requires state updaters to be pure because StrictMode invokes them twice to surface impurity; under StrictMode dev builds this causes double localStorage writes and double event dispatches, and the dispatched event re-enters setIds(readFromStorage()) on every hook instance synchronously (including the calling instance, because dispatchEvent is sync). Behaviorally idempotent today, but a real correctness risk under React 19 concurrent rendering.
    Fix: hoist the side effects out of the updater — read storage, compute next, write, setIds(next), then dispatch. Pass A / React rules.

  2. js/packages/ui/src/components/check/CheckDetailOss.tsx:214-220 — Check Detail Cancel bypasses the new optimistic flow.
    Evidence: handleCancel calls raw cancelRun(trackedRunId, apiClient) and returns; it never writes status: "Cancelled" to the cache, never calls useCanceledRuns().add(runId), never flips isRunning. So the sticky-set gate that Round 2 was added for (RunView / RunStatusAndDateDisplay gating on useCanceledRuns.has) cannot engage on this surface — useRun(trackedRunId) is in the same component (line 140) but handleCancel is defined independently. The in-flight waitRun(timeout=2) poll race the PR explicitly fixes for the Query page can still flip Check Detail back to Running after click. Same shape exists in js/packages/ui/src/hooks/useMultiNodesAction.ts:521. If the customer-reported "profile/histogram queries silently do nothing" surface is reachable via the Checklist Check Detail panel, this PR does not actually fix that path.
    Fix: const { onCancel: handleCancel } = useRun(trackedRunId) (already destructuring useRun on line 140), and similarly route useMultiNodesAction.cancel through the optimistic flow. Pass C / cross-surface consistency.

  3. js/packages/ui/src/hooks/useRun.tsx:76-84 — Synthetic Run uses as unknown as Run with invalid type.
    Evidence: when prev is undefined, the cache is seeded with { run_id, status: "Cancelled", type: "unknown" } as unknown as Run. "unknown" is not a member of RunType, and the double-cast disables every downstream type check (registry lookup findByRunType, isQueryRun, RunResultView selection). Currently shielded only by the isCancelled short-circuit in RunView; if any future consumer reads run.type ahead of that gate, the value is meaningless. Either drop the synthetic write entirely (the sticky set + isRunning=false already cover the UI in this race) or add a real "cancelled" sentinel type. Pass A.

Notes

  1. Per-instance state duplication. Each useCanceledRuns() call creates an independent useState<string[]> + listener pair. RunListItem instantiates one per row, plus one each in useRun, RunView, RunResultPane, RunStatusAndDateDisplay, DefaultAddToCheckButton. The idiomatic React 18+ pattern for a shared external store is useSyncExternalStore — one subscription model, no manual same-tab broadcast event. Refactor opportunity, not a defect.

  2. js/packages/ui/src/hooks/useRun.tsx:93useCallback deps include unstable object. canceledRuns is { has, add } returned as a fresh object literal every render (the hook itself doesn't memoize the return), so onCancel is recreated every render — defeating the useCallback. Depend on canceledRuns.add (already stable via useCallback([])) or memoize the hook's return value.

  3. js/packages/ui/src/hooks/useRun.test.tsx:72-85 — "does not throw if cancelRun POST rejects" is a trivial assertion. useRun.onCancel returns a hard-coded Promise.resolve() regardless of the void cancelRun(...) outcome, so expect(onCancel()).resolves.toBeUndefined() passes no matter what cancelRun does. The mock also replaces the real cancelRun (which has its own try/catch in api/runs.ts:113-120), so the test creates an unhandled rejection while asserting on an unrelated synthetic promise. To actually verify rejection safety, mock ApiClient.post instead so the real cancelRun try/catch runs.

  4. SSR hydration mismatch potential. useState(() => readFromStorage()) returns [] server-side and persisted data client-side. Both RunView and RunResultPane carry "use client", so today this is theoretical; if useCanceledRuns is ever pulled into a server-rendered surface, hydrate via useEffect after mount.

  5. Stale doc comment. js/packages/ui/src/components/run/RunViewOss.tsx:111 describes the Cancelled state as run.status === "Cancelled" only; the actual base RunView also gates on useCanceledRuns.has(run_id).

  6. Screenshot/description mismatch. docs/screenshots/drc-3411/SS-5-result-pane-cancelled.png shows an empty Query editor identical to SS-3, not the "Cancelled in both header and body" state described in the PR table. SS-2-cancelled-chip-persists.png actually demonstrates that invariant. Re-capture or relabel.

Verification

  • cd js && pnpm lint — clean
  • cd js && pnpm vitest run packages/ui/src/hooks/useCanceledRuns.test.tsx packages/ui/src/hooks/useRun.test.tsx packages/ui/src/components/run/RunView.test.tsx packages/ui/src/components/run/RunResultPane.test.tsx packages/ui/src/components/run/RunList.test.tsx — 5 files / 19 tests passing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants