
v0.6.86: CORS updates, OAuth MCP, navigation pinning dynamic pages, google slides endpoints, DB access pattern improvements #4690

Merged
waleedlatif1 merged 16 commits into main from staging
May 21, 2026

Conversation

Collaborator

waleedlatif1 commented May 21, 2026

TheodoreSpeaks and others added 15 commits May 20, 2026 05:44
* feat(table): chunked dispatcher for workflow-column runs

Replaces the all-rows-at-once runWorkflowColumn with a row-window dispatcher
backed by a new table_run_dispatches row. Each user click inserts a dispatch
row and triggers a trigger.dev task that crawls the table 20 rows at a time,
re-enqueueing itself between windows. The HTTP/Mothership entrypoints return
{ dispatchId } immediately instead of holding the request open for minutes
on multi-thousand-row dispatches.

- Per-row cancel stamps cancelledAt; the dispatcher skips cells whose
  cancelledAt > dispatch.requestedAt so a mid-cascade cancel sticks even
  under isManualRun.
- Table-wide cancel marks active dispatches cancelled atomically so the
  dispatcher bails on its next iteration.
- New 'dispatch' SSE event variant plumbed; client ignores for v1.

* fix(table): eager bulk clear on column run so cells flip immediately

Run-column with run-mode 'all' wasn't visually flipping rows that already
had data — the cell renderer's "value wins" branch kept showing the prior
output behind the queued/running state. The dispatcher only cleared one
window of rows at a time, so most of the column stayed stale until the
cursor walked to it.

Now:
- Dispatcher's `pending → dispatching` transition runs a single SQL UPDATE
  that wipes targeted `data` output columns and `executions[gid]` across
  every targeted row (mode-aware: 'incomplete' skips fully-filled rows).
- Per-window clear in `dispatcherStep` is gone — rows are pre-cleared,
  the loop only filters cancel tombstones / unmet deps and enqueues.
- Optimistic patch in `useRunColumn` mirrors the bulk clear by nulling
  output values in the cached row, so the UI flips queued/running
  instantly without waiting for the SSE catch-up.

* fix(table): bulk clear honors in-flight execs under mode: 'incomplete'

The eager bulk clear for mode: 'incomplete' only skipped rows that were
already fully filled, so two overlapping dispatches could race — dispatch B
would nuke executions[gid] on a row dispatch A had just stamped 'queued',
flickering the cell and potentially confusing the worker.

Skip any row whose targeted group is currently queued/running/pending — an
'incomplete' run shouldn't touch what another dispatch is actively working
on. The per-walk 'in-flight' eligibility skip already handles rows that
flip in-flight between the clear and the cursor reaching them.

* refactor(table): dispatcher uses batchTriggerAndWait + tag-based cancel

Switch the per-window cell fan-out from fire-and-forget tasks.trigger to
tasks.batchTriggerAndWait. The dispatcher is now a single long-lived
trigger.dev task that loops dispatcherStep until the table is exhausted;
trigger.dev CRIU-checkpoints the parent during each wait so we don't pay
compute while cells execute. Queue depth is bounded at WINDOW_SIZE per
dispatch — no more flooding trigger.dev with a million queued runs.

- dispatcher.ts builds payloads via the new shared buildPendingRuns helper
  and calls tasks.batchTriggerAndWait directly. Pre-stamps each cell to
  `queued` (jobId=null) so the UI flips instantly.
- table-run-dispatcher.ts is now a plain while-true loop. No
  RUN_BUDGET_MS, no self-re-enqueue, no cold-start tax per window.

Cancel:
- New cancelCellRunsByTags(tags) paginates runs.list + runs.cancel(id).
- cancelWorkflowGroupRuns fires the tag-sweep alongside the per-jobId
  queue.cancelJob path (preserved for auto-fire cells that have real
  jobIds from single tasks.trigger calls).
- Trigger.dev acks the cancel → batchTriggerAndWait resumes → dispatcher
  observes the dispatch-row cancel flag → exits.

Side fixes:
- getAsyncBackendType returns 'trigger-dev' whenever taskContext.isInsideTask
  is true, regardless of TRIGGER_DEV_ENABLED env. The preview/dev-sim
  worker silently routing cell jobs to DatabaseJobQueue (no poller) is
  fixed without any env config change.
- runWorkflowColumn skips the dispatcher entirely when trigger.dev is
  disabled, running cells inline via DatabaseJobQueue.runInline. HTTP
  response returns dispatchId: null in that mode.
- runColumnContract response schema updated to dispatchId.nullable().

* fix(table): show Stop button on optimistic-pending row cells

isExecInFlight required a jobId for `pending` status, gating it as "real
backend pending" vs "optimistic flag only." The row-gutter Stop button
keyed on this — so a freshly clicked Play sat as `pending` (no jobId) and
the user couldn't cancel it until the server-side `queued` stamp arrived
via SSE. With the dispatcher pre-batch stamping cells as `queued` (not
`pending`) and no per-cell jobIds under batchTriggerAndWait, the gap was
worse.

Drop the jobId requirement. `pending` now counts as in-flight everywhere.
Cancel writes `cancelled` to the cell exec authoritatively whether or not
a real trigger.dev run exists yet — cancelling an optimistic cell means
"don't run this," which is correct.

Also collapse isOptimisticInFlight into isExecInFlight since the two
helpers are now identical.

* refactor(table): loop-in-cell cascade + dispatcher-everywhere routing

Two coupled changes:

1. Cell-task runs the row's full cascade in-process. executeWorkflowGroupCellJob
   acquires a Redis lock per (tableId, rowId) with heartbeat (10s/30s TTL),
   then loops through eligible workflow groups for the row. One cell-task =
   one row's full cascade, not N. Resume worker holds the same lock and
   continues the cascade after a HITL resume. Shared withCascadeLock helper
   in lib/table/cascade-lock.ts.

2. Every cell-enqueue goes through the dispatcher. The implicit
   scheduleRunsForRows reactor in service.ts is removed — 8 callsites
   (insertRow, batchInsertRows, upsertRow, updateRowsByFilter,
   batchUpdateRows, addWorkflowGroup, updateWorkflowGroup) now fire
   runWorkflowColumn with mode: 'incomplete', isManualRun: false. HTTP
   routes that call updateRow directly also fire runWorkflowColumn
   afterwards. scheduleRunsForTable / scheduleRunsForRowIds deleted;
   scheduleRunsForRows demoted to private (only the TRIGGER_DEV_ENABLED=false
   fallback uses it). skipScheduler flag dropped from UpdateRowData /
   BatchUpdateByIdData — no longer meaningful since there's nothing implicit
   to suppress.

Plumbed isManualRun through the dispatch row (new is_manual_run column,
default true) so auto-fire callers honor autoRun: false and don't re-run
completed cells.

Stamp 'pending' with executionId: null (rather than 'queued') before
batchTriggerAndWait; the cell-task writes its own 'queued' on lock acquire.

Small UI polish: row gutter Play button spacing, "Delete workflow" →
"Delete column" label, optimistic-pending cells now show Stop button
(isExecInFlight no longer requires jobId).

* fix(table): SQL cancellation guard allows worker to claim a null-execId cell

The dispatcher's pre-batch `pending` stamp leaves executionId unset so any
cell-task that wins the cascade lock can claim the cell. The cancellation-
guard SQL clause was rejecting these claims because it tested
`executions->gid IS NULL` (whole exec missing) but the pre-stamp leaves
the exec present with executionId=null.

Add a third carve-out: `executions->gid->>'executionId' IS NULL`. Now the
guard reads "write allowed if no exec exists, OR no executionId is set
yet, OR the executionId matches ours."
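
The three carve-outs can be read as a plain predicate. The following is a minimal in-memory sketch, not the actual SQL clause; the `CellExec` shape and the `canWriteCell` name are hypothetical and only mirror the wording above.

```typescript
// Hypothetical in-memory mirror of the SQL guard; names are illustrative only.
type CellExec = { status: string; executionId: string | null }

function canWriteCell(exec: CellExec | undefined, ourExecutionId: string): boolean {
  // Carve-out 1: no exec exists for this group (executions->gid IS NULL).
  if (exec === undefined) return true
  // Carve-out 2: exec exists but nobody has claimed it yet (dispatcher pre-stamp).
  if (exec.executionId === null) return true
  // Carve-out 3: we already own the cell.
  return exec.executionId === ourExecutionId
}
```

Under this reading, a cell owned by a different executionId is the only case the guard rejects.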

Symptom: every cell-task's first markWorkflowGroupPickedUp call would log
"SQL guard saw cancelled" and skip, leaving cells stuck at the dispatcher's
pending stamp.

* fix(table): dispatcher cursor starts at -1 so position 0 is included

The dispatcher's row-window SELECT is `position > cursor` for exclusive
lower-bound semantics. With cursor initialized to 0, position-0 rows were
never picked up — every dispatch silently skipped the table's first row.

Start cursor at -1 instead. First window's filter `position > -1` matches
position 0; subsequent iterations advance to `lastPosition` which then
correctly excludes already-processed rows.
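
To illustrate the off-by-one, here is a small sketch of an exclusive-lower-bound window walk over an in-memory array. The `Row` type and the `nextWindow` / `walkRows` helpers are hypothetical stand-ins for the dispatcher's SQL window, assuming positions start at 0.

```typescript
// Hypothetical stand-in for the windowed SELECT: rows with position > cursor.
type Row = { id: string; position: number }

function nextWindow(rows: Row[], cursor: number, windowSize: number): Row[] {
  return rows
    .filter((r) => r.position > cursor) // exclusive lower bound
    .sort((a, b) => a.position - b.position)
    .slice(0, windowSize)
}

function walkRows(rows: Row[], windowSize: number, initialCursor: number = -1): string[] {
  const visited: string[] = []
  let cursor = initialCursor
  for (;;) {
    const batch = nextWindow(rows, cursor, windowSize)
    if (batch.length === 0) break
    let last = cursor
    for (const r of batch) {
      visited.push(r.id)
      last = r.position
    }
    cursor = last // advance to lastPosition so processed rows are excluded
  }
  return visited
}
```

Calling `walkRows(rows, 2, 0)` on rows at positions 0 through 4 would skip the position-0 row, which is the symptom described above; the default of -1 visits every row.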

* refactor(table): align optimistic UI with new dispatcher; sticky cancel via 'new' mode

Fix 0: new `DispatchMode = 'new'` for auto-fire callsites. Eligibility skips
rows with any prior `executions[gid]` entry — cancelled / errored / completed
cells stay sticky until a manual run. Dispatcher's windowed SELECT pushes
`NOT jsonb_exists_any(...)` to SQL so CSV imports into mostly-attempted
tables don't pay a per-window load+JS-filter. `batchInsertRows` drops its
`rowIds` payload (keeps dispatch scope tiny on big imports).

Fix A/B/D: client optimistic patches now mirror the backend's actual
invariants. `useCreateTableRow.onSuccess` stamps eligible groups via
`optimisticallyScheduleNewlyEligibleGroups` so newly-inserted rows show
`Queued` instantly. `useCancelTableRuns.onMutate` distinguishes optimistic-
only pending (`executionId == null` — strip silently) from real worker
claims (stamp cancelled; SSE will reconcile). Drop `onSettled` invalidation
on `useUpdateTableRow` / `useBatchUpdateTableRows` to kill the
delete-cell flicker.

Fix C: active-dispatches overlay. New `listActiveDispatches` helper,
contract, and `GET /api/table/[tableId]/dispatches` route. `kind:'dispatch'`
SSE events carry scope+cursor+mode on every transition. New
`useActiveDispatches` hook + `resolveCellExec` synthesize a virtual
`pending` exec for cells in an active dispatch's scope ahead of cursor —
queued indicators now survive page refresh during long Run-all dispatches.
`cancelWorkflowGroupRuns` emits `kind:'dispatch',status:'cancelled'`
events so the overlay clears without a refetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(table): unify trigger.dev and inline dispatcher paths

`runWorkflowColumn` now always inserts a `table_run_dispatches` row and
drives the dispatcher state machine. The trigger.dev / in-process branch
narrows to a single line: trigger.dev fires `tableRunDispatcherTask` (which
calls the new `runDispatcherToCompletion`), the inline path calls the same
helper fire-and-forget. Deletes `scheduleRunsForRows` and
`stampQueuedOrCancel` — the inline-fallback no longer duplicates window
walking, SSE emission, or cancel.

The dispatcher's window-execute call goes through `JobQueueBackend`:
- New `batchEnqueueAndWait` interface method.
- Trigger.dev impl wraps `tasks.batchTriggerAndWait` behind a
  `taskContext.isInsideTask` guard (clear error if called from outside a
  task).
- Database impl skips `async_jobs` entirely — `Promise.all` over
  `options.runner(payload, signal)` per item, with per-cell AbortControllers
  tracked by `cancelKey` for cancel.

`cancelInlineRun` moves to the interface as `cancelByKey` so
`cancelWorkflowGroupRuns` no longer reaches into the database backend.

Fix `mode: 'new'` SQL filter:
- `${array}::text[]` was interpolated as a tuple cast, which Postgres rejected
  ("cannot cast type record to text[]"), so every inline dispatch silently
  failed. Switched to `ARRAY[${sql.join(...)}]::text[]`.
- Predicate was `jsonb_exists_any` ("any one targeted group present"),
  which excluded rows that needed at least one group re-run after a
  downstream output was deleted. Switched to `jsonb_exists_all` — per-group
  JS eligibility handles the rest.

Cascade-loop workflowId bug: `runRowCascadeLoop` was not threading the new
group's `workflowId` when advancing across groups. The cell-task ran the
previous group's workflow against the next group's cell, terminating
`completed` with empty `accumulatedData`. Fixed by tracking
`currentWorkflowId` alongside `currentGroupId` / `currentExecutionId`.

Client optimistic-patch tightening:
- `useRunColumn.onMutate` mirrors server eligibility — skip cells with
  unmet deps so unmet rows don't flash Queued and get stuck (no SSE will
  arrive for cells the server skipped).
- `resolveCellExec` overlay synthesizes a virtual `pending` only when
  `areGroupDepsSatisfied` is true. Rows with unmet deps render Waiting,
  matching the dispatcher's actual behavior.

Cleanup from /simplify pass:
- Use `generateShortId(20)` instead of
  `generateId().replace(/-/g, '').slice(0, 20)`.
- Inline `batchEnqueueAndWait` no longer allocates synthetic ids
  (returned `string[]` is unused).
- Flattened the per-cell `tracked` array — only push entries that
  registered controllers, drop the null placeholders.
- Extracted `runDispatcherToCompletion` to share the loop between the
  trigger.dev wrapper and the in-process path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(table): backend running counter, dep-aware retrigger, sidebar polish

Counter (Fix 1): top-right "X running" + per-row badge are now
backend-bootstrapped via a count on `user_table_rows.executions ->> 'status'
= 'running'` returned alongside active dispatches. SSE `kind: 'cell'` events
compute a delta from `prev → next` status to keep the cache live; cell
events for rows outside the loaded page slice trigger a run-state refetch.
On `pruned` we invalidate the cache. Counts only worker-claimed `running`
cells — optimistic queued/pending no longer inflate the badge, and rows
outside the loaded page slice are counted too.
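
The `prev → next` delta can be sketched as a pure function. This is an illustration only: the `ExecStatus` union is assembled from status names mentioned in these commits, and `runningDelta` is a hypothetical helper name.

```typescript
// Status names taken from the text above; the exact union is an assumption.
type ExecStatus = 'pending' | 'queued' | 'running' | 'completed' | 'error' | 'cancelled'

// Returns +1 when a cell enters `running`, -1 when it leaves, 0 otherwise.
function runningDelta(prev: ExecStatus | undefined, next: ExecStatus | undefined): number {
  const wasRunning = prev === 'running' ? 1 : 0
  const isRunning = next === 'running' ? 1 : 0
  return isRunning - wasRunning
}
```

Applying the delta per SSE cell event keeps a bootstrapped count current without a refetch, as long as the event's row is inside the loaded slice.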

Sidebar (Fix 2 + 3a): `Run after` no longer ticks every column by default
for new groups (empty list). Save is disabled with an inline error when
auto-run is on with zero deps. `edit-group` mode anchors the left-of-current
filter to the group's leftmost column, so a workflow can only depend on
columns to its left.

Reorder scrub (Fix 3b): `updateTableMetadata` walks the schema's workflow
groups when `columnOrder` is in the patch and drops any dep whose new
position lands at or after the group's leftmost column (uses the existing
`stripGroupDeps` helper). Metadata + schema updates land atomically.

Server returns ordered columns (Fix 3b cont'd): `getTableById` /
`listTables` now sort `schema.columns` by `metadata.columnOrder` before
returning, via a new `applyColumnOrderToSchema` helper. Every consumer
(grid, sidebar, copilot, mothership) gets one ordered list — the sidebar's
leftmost-group-column anchor now points at the right index.

Dep-aware retrigger (Fix 4): editing a value that a downstream workflow
depends on now re-runs that workflow.
- `deriveExecClearsForDataPatch` returns
  `{ executionsPatch, inFlightDownstreamGroups }`. Walks
  `schema.workflowGroups[].dependencies.columns` for every column in the
  patch, clears terminal-state downstream entries, and reports in-flight
  entries.
- `updateRow` calls `cancelWorkflowGroupRuns` + `runWorkflowColumn`
  (`mode: 'incomplete' + isManualRun: true`) for in-flight downstream
  groups, then always fires `runWorkflowColumn({ mode: 'new' })` for the
  cleared groups. Skips both when `executionsPatch` is provided by the
  caller — those are cell-task / cancel writes that would otherwise spawn
  a recursive flood of dispatches per partial-write.
- `cancelWorkflowGroupRuns(tableId, rowId, { groupIds? })` accepts a
  per-group filter so the cancel only touches the affected groups, not
  every in-flight cell on the row.
- `pickNextEligibleGroupForRow` now treats a dispatcher pre-stamp
  (`pending` + `executionId: null`) as claimable — the cascade-loop is the
  real owner. Without this, the dispatcher's pre-stamp of downstream
  groups made the cascade-loop see them as "in-flight" and skip them,
  stranding `pending` cells forever.
- `optimisticallyScheduleNewlyEligibleGroups` extends the cache patch to
  flip dep-touched groups to `pending` regardless of their current status,
  matching the server's cancel-then-rerun behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(table): paused workflow cells route through executeResumeJob; render Pending + viewable

Three connected issues with workflows that pause mid-cell (e.g. wait blocks):

1. `/api/resume/poll` (the time-pause auto-resumer) called
   `PauseResumeManager.startResumeExecution` directly, bypassing
   `executeResumeJob` from `background/resume-execution.ts`. The wrapper is
   where the cell-context restoration + cascade-loop continuation lives —
   without it, the resumed workflow ran to completion but never wrote the
   terminal state back to the table cell. Cell stays `pending` forever
   even though the underlying execution finished.

   Fix: dynamically import `executeResumeJob` and use it for the
   `'starting'` branch. Same primitive the trigger.dev `resumeExecutionTask`
   wraps — calling it directly handles both trigger.dev-disabled local dev
   and trigger.dev-enabled prod identically.

2. The cell renderer mapped `status: 'pending'` to `kind: 'queued'` (gray
   "Queued" badge) regardless of whether the run had started. A HITL-paused
   run has `status: 'pending'` + `jobId` prefixed `paused-` + a real
   `executionId` — semantically very different from "queued, hasn't run."
   Now renders as `pending-upstream` (the existing Pending pill) for
   paused-jobId rows.

3. Right-click "View execution" was disabled for `pending` cells (gated to
   `completed | error | running`), so users couldn't open the trace for a
   paused execution. Paused runs have a viewable trace (the executionId is
   real and the log row exists). Both the per-row context menu and the
   action-bar derivation now recognize `pending` + `paused-` jobId as a
   started run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(table): typewriter reveal for SSE-driven workflow cell values

Workflow-output cells now reveal their text character-by-character when an
SSE update lands, while page reloads and virtualization remounts still paint
the value instantly. A first-render guard inside the new useTypewriter hook
distinguishes hydration from live updates with no plumbing through the cell
tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(table): address bugbot/greptile review feedback

Two P1 issues + one cleanup from the bot reviewers:

1. **Double-dispatch + completed-output wipe.** Both PATCH row routes
   (`app/api/table/[tableId]/rows/[rowId]` and
   `app/api/v1/tables/[tableId]/rows/[rowId]`) were firing a second
   `runWorkflowColumn({ mode: 'incomplete' })` after `updateRow` returns.
   `updateRow` already fires `mode: 'new'` internally for user edits, so
   the second call created a concurrent dispatch. Worse, the
   `mode: 'incomplete'` path's `bulkClearWorkflowGroupCells` wipes ALL
   targeted output columns on any row where any one column is empty —
   meaning sibling-group completed outputs could be erased. Removed both
   route-level calls; auto-dispatch lives entirely in `updateRow`.

2. **`runWorkflowColumn` log-spamming on plain tables.**
   `if (targetGroups.length === 0) throw new Error(...)` fired on every
   row insert/update for tables without any workflow groups (the
   majority). Every caller wraps with `.catch(logger.error)`, so each
   PATCH produced an error-level log. Return `{ dispatchId: null }`
   silently — manual `runWorkflowColumn` callers pass `groupIds`
   explicitly so they can't reach this branch.

3. **`isManualRun` plumbed through dispatch SSE events.** Late-arriving
   `kind: 'dispatch'` events for dispatches not in the initial fetch
   were hardcoding `isManualRun: false`. Added the field to the event
   shape, emit it from `dispatcherStep` (pending → complete, dispatching
   transitions) and `markActiveDispatchesCancelled`, and consume it in
   the SSE handler with a sensible fallback for legacy emits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(table): row executions sidecar + left-to-right dep retrigger + cancel counter refresh

Split per-row workflow-group execution state out of the user_table_rows.executions
JSONB column into a new table_row_executions sidecar keyed by (row_id, group_id).
Dispatcher filters, "X running" counter, bulk clears, and the cancellation guard
all hit indexed columns instead of walking JSONB. Wire shape unchanged — server
merges sidecar rows back into row.executions on the way out.

Also:
- deriveExecClearsForDataPatch now walks workflowGroups left-to-right with a
  propagating dirtied-column set so transitive dep chains (edit col A → group 1
  re-runs → group 2 depends on group 1's output → group 2 re-runs) collapse to
  a single forward pass.
- useCancelTableRuns.onSettled invalidates the activeDispatches query so the
  top-right counter and row gutter Stop button refetch from the server after
  any Stop (per-cell, row, or table-wide). countRunningCells is the source of
  truth; client no longer needs duplicate state.
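
The single forward pass with a propagating dirtied-column set might look roughly like the sketch below. The `WorkflowGroup` shape and `groupsToRerun` are hypothetical simplifications; the real `deriveExecClearsForDataPatch` also deals with execution state, which is omitted here.

```typescript
// Simplified, hypothetical group shape: input columns and output columns.
type WorkflowGroup = { id: string; dependsOn: string[]; outputs: string[] }

// `groups` is assumed to be ordered left-to-right by column position.
function groupsToRerun(groups: WorkflowGroup[], editedColumns: string[]): string[] {
  const dirtied = new Set<string>(editedColumns)
  const rerun: string[] = []
  for (const group of groups) {
    if (group.dependsOn.some((col) => dirtied.has(col))) {
      rerun.push(group.id)
      // The group's outputs become dirty too, so groups to its right can chain.
      for (const out of group.outputs) dirtied.add(out)
    }
  }
  return rerun
}
```

Because dependencies can only point left, one pass in column order is enough to collect the whole transitive chain.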

Three migrations on this branch (0209 + 0210 + new sidecar) collapsed into one
since the feature is unreleased.

* fix(table): address remaining cursor/greptile review feedback

- Mothership update_row no longer double-dispatches. updateRow already fires
  the auto-cascade internally; the second `mode: 'incomplete'` call here
  raced with it and could bulk-clear sibling-group outputs.
- SSE dispatch events no longer dropped when the activeDispatches cache is
  cold. Seed an empty TableRunState if the initial fetch hasn't landed yet
  so the queued overlay doesn't lose the first dispatch event.
- batchUpdateRows now runs cancel+rerun for per-row in-flight downstream
  groups, mirroring updateRow. Without this, dep edits in a batch left
  running workflows reading stale upstream values.

* fix(table): cancel prior runs, scope batch insert dispatch, recover orphan pre-stamps

Addresses cursor + greptile review feedback on table dispatcher edge cases:

- Manual table-wide Run-all / Run-column now cancels prior active dispatches
  AND in-flight cell workers before bulk-clearing. Without this, mode:'all'
  deleted running sidecar rows out from under their workers (which kept
  writing into the wiped state) and a second Run-all could enqueue overlapping
  cells racing on the same rows. Row-scoped manual calls (dep-edit cascade)
  are excluded — those already cancel their own scope.
- batchInsertRowsWithTx now scopes its auto-dispatch to the newly-inserted
  row ids. Without this, after the sidecar migration the NOT EXISTS filter
  matches every existing row (zero sidecar entries), so a CSV import would
  walk the entire table dispatching workflow runs on every pre-existing row.
- classifyEligibility carve-out: pending + executionId=null is an orphan
  pre-stamp (cascade-lock contention, batchEnqueueAndWait failure, etc.),
  treated as claimable so future dispatchers can re-stamp instead of skipping
  it as 'in-flight' forever. Matches pickNextEligibleGroupForRow's logic.
- On batchEnqueueAndWait failure, dispatcherStep now sweeps the orphan
  pre-stamps it wrote for the failed batch so the cells don't render Queued
  forever; the next user action picks them up cleanly.

* fix(table): row-scoped Refresh cancels in-flight; counter includes queued/pending

- runWorkflowColumn now cancels prior in-flight cells for row-scoped manual
  runs too (context-menu Refresh on a row subset, action-bar Refresh on
  selected rows). Previously only the table-wide path cancelled, so a
  row-scoped Refresh would bulk-clear running sidecar rows without aborting
  workers. Per-row cancel skips markActiveDispatchesCancelled so unrelated
  dispatches keep running.
- countRunningCells now counts all in-flight statuses (queued / running /
  pending) instead of just running. The row gutter Run/Stop button reads
  this map — with the old behavior, clicking Play during the queued window
  would re-enqueue an already-queued cell. SSE applyCell handler updated
  to use isExecInFlight so client deltas track the same semantics.

* fix(table): per-row Stop tombstones ahead-of-cursor rows during Run-all

Per-row Stop only cancelled sidecar rows already in flight. A row the
dispatcher hadn't reached yet had no exec record, so Stop was a no-op there
— the dispatcher would later walk to it, classify the group eligible, and
re-fire workflows the user thought they stopped.

cancelWorkflowGroupRuns now, for a per-row cancel, checks active dispatches
whose scope covers the row and writes `cancelled` tombstones (cancelledAt =
now) for the at-risk groups that don't already have a sidecar entry. The
dispatcher's existing `cancelledAt > dispatch.requestedAt` filter then skips
them when the cursor arrives. onConflictDoNothing guards against clobbering
a concurrently-written entry; the active-dispatch check avoids stamping
spurious cancels on idle rows.
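
A rough sketch of how a tombstone interacts with the `cancelledAt > dispatch.requestedAt` filter follows. The `SidecarExec` shape and `isTombstonedFor` are hypothetical names used only for illustration.

```typescript
// Hypothetical sidecar entry shape; only the fields needed for the filter.
type SidecarExec = { status: string; cancelledAt?: Date }

// True when the dispatcher should skip the cell because it was cancelled
// after the dispatch was requested.
function isTombstonedFor(exec: SidecarExec | undefined, requestedAt: Date): boolean {
  if (exec === undefined || exec.status !== 'cancelled') return false
  if (exec.cancelledAt === undefined) return false
  return exec.cancelledAt.getTime() > requestedAt.getTime()
}
```

A row with no sidecar entry is never skipped by this filter, which is why the per-row Stop has to write the tombstone ahead of the cursor.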

* fix(table): seed dispatch overlay on Run; surface batch-enqueue failures as error

- useRunColumn.onSuccess invalidates the activeDispatches query so the
  resolveCellExec queued overlay populates immediately for ahead-of-cursor
  rows (scrolled-in / refetched), instead of waiting for the first dispatch
  SSE. Targeted at activeDispatches only — the rows cache stays owned by
  useTableEventStream.
- On batchEnqueueAndWait failure, dispatcherStep now flips the orphan
  pre-stamps to a terminal `error` state and emits a cell SSE event, rather
  than deleting them. The cursor still advances past the window, but the
  dropped cells are now visible (Error pill) instead of silently empty, stay
  out of the in-flight set, and re-run on the next manual run.

* fix(table): seed dispatch overlay on Run; surface batch-enqueue failures as error

- useRunColumn.onSuccess invalidates activeDispatches so the resolveCellExec
  queued overlay populates immediately for ahead-of-cursor rows instead of
  waiting for the first dispatch SSE. Rows cache stays owned by SSE.
- On batchEnqueueAndWait failure, dispatcherStep flips orphan pre-stamps to a
  terminal error state (+ cell SSE) instead of deleting them, so the dropped
  window is visible (Error pill) rather than silently empty and re-runs on the
  next manual run.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(cors): re-enable credentials on embed CORS policy

Chat and form embeds authenticate via the chat_auth_<id> / form auth
cookie set by setDeploymentAuthCookie. The previous PR set
Access-Control-Allow-Credentials: false on these routes, which made the
browser drop the auth cookie and produce 401s on subsequent embed calls
after login. Restore credentials: true (matching pre-consolidation
behavior) while keeping reflected origin and Vary: Origin.

The wildcard fallback when Origin is absent now also drops credentials
to stay CORS-spec-compliant.
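
As a minimal sketch of the policy described above (not the actual proxy code), the header selection could look like this. `embedCorsHeaders` is a hypothetical name; emitting `Vary: Origin` on the wildcard branch as well is an assumption.

```typescript
// Hypothetical helper: reflect the origin with credentials, or fall back to
// a credential-less wildcard when no Origin header was sent.
function embedCorsHeaders(origin: string | null): Record<string, string> {
  if (origin) {
    return {
      'Access-Control-Allow-Origin': origin,
      'Access-Control-Allow-Credentials': 'true',
      Vary: 'Origin',
    }
  }
  // Browsers reject `*` combined with credentials, so the header is omitted.
  return {
    'Access-Control-Allow-Origin': '*',
    Vary: 'Origin',
  }
}
```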

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(cors): trim verbose comments in proxy

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(cors): restore concise TSDoc on proxy CORS helpers

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(mcp): OAuth 2.1 support for outbound MCP servers

* fix(mcp): tighten OAuth refresh race and session-error detection

Re-load the OAuth row inside withMcpOauthRefreshLock so concurrent
callers observe predecessor-written tokens instead of a stale snapshot
loaded before lock acquisition. Without this, the second caller's
provider held a rotated-out refresh token and the SDK tripped
invalid_grant, forcing reauthorization.

Switch isSessionError to match the SDK's typed StreamableHTTPError
(code 404/400) instead of substring-checking arbitrary error messages,
removing false positives on URLs that happen to contain those digits.
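
The difference between the two checks can be shown with a stand-in error class. `HttpErrorStandIn` below is a hypothetical substitute for the SDK's `StreamableHTTPError`, defined locally so the sketch is self-contained; it is not the SDK class itself.

```typescript
// Hypothetical stand-in for the SDK's typed error (an error carrying an HTTP code).
class HttpErrorStandIn extends Error {
  readonly code: number | undefined
  constructor(code: number | undefined, message: string) {
    super(message)
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype)
    this.code = code
  }
}

// Typed check: only the error class and its numeric code are consulted.
function isSessionError(error: unknown): boolean {
  return error instanceof HttpErrorStandIn && (error.code === 404 || error.code === 400)
}

// The kind of substring check being replaced, kept here for contrast.
function isSessionErrorBySubstring(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error)
  return message.includes('404') || message.includes('400')
}
```

An unrelated failure whose message contains a URL such as `https://host/items/404` would trip the substring version but not the typed one.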

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(mcp): tighten OAuth callback contract and registration metadata

- Validate callback query params via mcpOauthCallbackContract instead of
  raw searchParams.get, matching the rest of the MCP route surface.
- Drop non-RFC-7591 application_type field from dynamic client registration
  to avoid rejection by strict authorization servers.
- Collapse the pre-lock OAuth row load in createClient — the row is now
  loaded exclusively inside withMcpOauthRefreshLock, removing a redundant
  query and a stale-snapshot path.

* fix(mcp): narrow workspaceId before async closure in OAuth createClient

* fix(mcp): return authType from create-server endpoint

The POST /api/mcp/servers handler omitted authType from the success
response, so useCreateMcpServer always saw data.data.authType as
undefined and never triggered the OAuth popup after creating an
OAuth-protected server. Thread authType through performCreateMcpServer
into the response so the client can decide whether to auto-start OAuth.

* fix(mcp): mirror server null normalization in optimistic oauthClientId update

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(mcp): revert optimistic oauthClientId to undefined to match McpServer type

The response contract preprocesses null → undefined, so McpServer.oauthClientId
is string | undefined. Using null broke type checking.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(mcp): tighten OAuth probe signal and clear stale popup interval

- probe: only classify as OAuth on resource_metadata or scope params.
  Bare `Bearer error="invalid_token"` is generic and used by API-key servers,
  so it must not auto-flip the auth type to OAuth.
- popup hook: clear any existing close-watcher interval before overwriting
  when startOauthForServer is invoked twice for the same serverId.
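
The tightened probe signal could be approximated as below. This is a naive sketch that pattern-matches the `WWW-Authenticate` header rather than fully parsing it; `looksLikeOauthChallenge` is a hypothetical name.

```typescript
// Hypothetical classifier: a Bearer challenge counts as OAuth only when it
// carries a resource_metadata or scope parameter.
function looksLikeOauthChallenge(wwwAuthenticate: string | null): boolean {
  if (!wwwAuthenticate) return false
  if (!/^\s*Bearer\b/i.test(wwwAuthenticate)) return false
  return /\b(resource_metadata|scope)\s*=/i.test(wwwAuthenticate)
}
```

A bare `Bearer error="invalid_token"` challenge returns false here, so an API-key server would keep its auth type.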

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(mcp): normalize empty-string oauthClientId at route boundary

Orchestration already converts falsy → null via `|| null` (server-lifecycle.ts),
so the DB was never receiving an empty string. Tightening the route layer to
match the same convention keeps the boundary contract consistent and avoids
relying on downstream normalization.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(canvas): expand MCP tool params into per-row labels on block tile

The MCP Tool block on the workflow canvas previously crammed every selected-
tool parameter into a stringified blob under the `Tool` row. Now, when a tool
is selected, the tile reads the cached `_toolSchema` and emits one labeled
SubBlockRow per parameter (matching the Exa block's per-param layout). Labels
reuse `formatParameterLabel` for parity with the editor panel; values pass
through the existing `getDisplayValue` so booleans/numbers/arrays render
identically to other blocks. Deterministic tile height counts expanded rows
so the tile sizes correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(logs): show MCP icon and strip prefix in trace tool spans

Tool spans for MCP calls were rendering the raw id (e.g.
`mcp-f908f259-planetscale_list_organizations`) with the default blank-
square icon. Now they read just the tool name and render the MCP block's
icon and bgColor, matching how workflow-execute tools render.
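
A small sketch of the prefix strip, assuming the raw id has the shape `mcp-<serverId>-<toolName>` with no hyphen inside the server id (inferred from the single example above, not confirmed by the source). `displayToolName` is a hypothetical helper.

```typescript
// Assumed id shape: "mcp-" + serverId (no hyphens) + "-" + toolName.
function displayToolName(toolId: string): string {
  const match = /^mcp-[^-]+-(.+)$/.exec(toolId)
  return match?.[1] ?? toolId // non-MCP ids pass through unchanged
}
```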

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(logs): lift near-black trace icon backgrounds for dark-mode contrast

Block bgColors below a small luminance threshold (e.g. the MCP block's
#181C1E) rendered nearly invisible against the dark-mode surface
(--bg: #1b1b1b). Adds a tiny adjustBgForContrast helper that floors each
RGB channel at 0x33 only when luminance is below 30,000, leaving every
branded color above that band untouched. Applied to both the trace tree
row and the detail pane.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(logs): fall back to neutral gray for near-black trace icon bgs

#333333 was still too close to the dark-mode surface to read. For bgs
below the luminance threshold (e.g. the MCP block's #181C1E) we now fall
back to DEFAULT_BLOCK_COLOR (#6b7280) — the same neutral the renderer
uses for blocks with no distinct identity. Clearly visible in both
themes; brighter brand colors still pass through.
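
The fallback could look roughly like the sketch below. Only `adjustBgForContrast`, `DEFAULT_BLOCK_COLOR`, and the two hex values come from these commit messages. The brightness formula (Rec. 601 luma weights on a 0-255 scale) and the cutoff of 40 are assumptions chosen for illustration; the codebase's own metric, which uses a threshold of 30,000, is not defined in this text.

```typescript
// DEFAULT_BLOCK_COLOR's value is taken from the commit message.
const DEFAULT_BLOCK_COLOR = '#6b7280'
// Assumed cutoff on a 0-255 brightness scale, NOT the codebase's threshold.
const ASSUMED_MIN_BRIGHTNESS = 40

// Weighted brightness of a 6-digit hex color, or null if it cannot be parsed.
function brightnessOf(hex: string): number | null {
  const digits = /^#?([0-9a-f]{6})$/i.exec(hex.trim())?.[1]
  if (!digits) return null
  const value = parseInt(digits, 16)
  const r = (value >> 16) & 0xff
  const g = (value >> 8) & 0xff
  const b = value & 0xff
  return 0.299 * r + 0.587 * g + 0.114 * b
}

function adjustBgForContrast(bgColor: string): string {
  const brightness = brightnessOf(bgColor)
  // Unparseable colors are left alone rather than guessed at.
  if (brightness === null) return bgColor
  return brightness < ASSUMED_MIN_BRIGHTNESS ? DEFAULT_BLOCK_COLOR : bgColor
}
```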

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(db): drop 0209_mcp_oauth migration ahead of staging merge

Staging shipped 0209_smiling_fixer; the MCP OAuth migration will be
regenerated on top of staging as 0210.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(db): regenerate MCP OAuth migration as 0210

Re-runs drizzle-kit generate on top of staging's 0209_smiling_fixer.
Same schema (mcp_server_oauth table + mcp_servers.auth_type / oauth_*
columns) as the dropped 0209_mcp_oauth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(audit): bump route baseline 748 → 749 after staging merge

The post-merge route count is 749 (this branch's OAuth start/callback
plus staging's new route). I had set the baseline to 748 in the merge
conflict resolution — bumping to match reality so the strict audit
passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: remove source-command skill files committed by accident

These were untracked-then-accidentally-staged in 05c4bc1 via a wide
`git add -A`. They aren't part of this PR's scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
#4675)

* fix(table): cut dispatcher cold-start by lazy-loading heavy import chains

The trigger.dev table-run-dispatcher spent ~6s in module init before its
first batchTriggerAndWait — it imports lib/table/service for getTableById,
which eagerly imported lib/table/trigger → @/lib/webhooks/processor →
webhook-execution + executor, dragging the entire workflow-execution stack
into the dispatcher container even though it never fires a trigger.

- trigger.ts lazy-imports the webhook processor + polling utils inside
  fireTableTrigger (the only consumer), so importing service no longer pulls
  the executor.
- buildEnqueueItems only imports the cell job (for the inline `runner`) on the
  database backend; the trigger.dev backend triggers by task id and ignores
  runner.

* fix(table): run counter + gutter Stop update instantly on Run

The "X running" badge, per-row gutter Stop button, and runningByRowId map
stayed at zero after clicking Run until a manual refetch. useRunColumn
optimistically stamped cells pending in the rows cache but never bumped the
activeDispatches counter — so when the dispatcher's real pending SSE arrived,
applyCell saw the cell was already in-flight (wasInFlight === isInFlight) and
skipped the counter delta. The optimistic stamp ate the transition.

- onMutate now bumps runningCellCount / runningByRowId by the cells it stamps,
  snapshotting prior run-state for rollback on error.
- onSuccess seeds the dispatch into the overlay list from the response instead
  of invalidating activeDispatches (a refetch would reset the optimistic
  counter to the server's still-zero count before the dispatcher stamps).

* fix(table): drive cell typewriter with rAF so concurrent reveals stay smooth

The character-by-character reveal used a per-cell setInterval. When many cells
reveal at once (a Run-all completing in waves), the independent interval
callbacks fire at uncoordinated times and each forces its own render +
layout/paint — O(cells) reflows over an un-virtualized grid, so it degrades as
more cells fill. Switch to requestAnimationFrame: all cells' callbacks run
before one paint, so React batches them into a single render + paint per frame
regardless of cell count. Reveal length is derived from elapsed time, so a
dropped frame catches up instead of slowing the animation.
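
The time-derived reveal can be sketched as follows. All names and the reveal rate are assumptions made for illustration, and the scheduler is injected so the example also runs outside a browser; in the UI one would pass `requestAnimationFrame`.

```typescript
// Sketch only: names and the default rate are hypothetical.
type FrameScheduler = (callback: (timestampMs: number) => void) => unknown

// Characters that should be visible once `elapsedMs` has passed.
function revealedLength(elapsedMs: number, totalLength: number, charsPerSecond: number): number {
  const count = Math.floor((Math.max(0, elapsedMs) / 1000) * charsPerSecond)
  return Math.min(totalLength, count)
}

function startReveal(
  text: string,
  onUpdate: (visible: string) => void,
  schedule: FrameScheduler, // e.g. requestAnimationFrame in a browser
  charsPerSecond = 60
): void {
  let startMs: number | null = null
  const tick = (nowMs: number): void => {
    const begin = startMs ?? nowMs
    startMs = begin
    // Length depends on elapsed time, so a late frame jumps ahead instead of lagging.
    const length = revealedLength(nowMs - begin, text.length, charsPerSecond)
    onUpdate(text.slice(0, length))
    if (length < text.length) schedule(tick)
  }
  schedule(tick)
}
```

Because each tick recomputes the length from the clock rather than incrementing a counter, skipped frames cost nothing beyond a larger jump in visible text.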

* fix(table): roll back optimistic run counter when no dispatch is created

useRunColumn.onSuccess returned early on a null dispatchId (no matching
groups / eligible rows) without undoing the onMutate counter bump — and no
SSE would arrive to correct it, leaving the counter permanently inflated.
Restore the pre-mutation run-state on that path, mirroring onError.

* chore(table): tighten inline comments on dispatcher cold-start fixes
* fix(landing-nav): scroll to top on route change in shared shells

* fix(landing-nav): hoist popstate flag to module scope and skip on hash anchors

* fix(landing-nav): use popstate timestamp window so flag self-expires

* fix(landing-nav): use -Infinity sentinel and consume popstate timestamp on use

* fix(landing-nav): skip scroll on initial mount so reload restoration wins

* fix(landing-nav): gate initial-mount skip on document.readyState

* refactor(landing-nav): simplify scroll-to-top to conventional pattern
…undtrips (#4680)

* improvement(knowledge): batch trigger dispatch, prune redundant DB roundtrips

Connector sync was dispatching Trigger.dev document-processing jobs one
HTTP roundtrip at a time. processDocumentsWithQueue now uses
tasks.batchTrigger when Trigger.dev is available, collapsing N roundtrips
to ceil(N/1000). Idempotency keys protect against duplicate runs on retry.

Also trims DB roundtrips inside the sync loop:
- Per-batch isConnectorDeleted + isKnowledgeBaseDeleted collapsed into a
  single checkSyncLiveness JOIN (one SELECT instead of two per batch).
- Dropped redundant pre-upload isKnowledgeBaseDeleted checks from
  addDocument/updateDocument: the batch-boundary liveness check already
  catches pre-batch deletions and the in-tx FOR UPDATE is authoritative
  for races during the batch.
- Removed dead processDocumentsWithTrigger helper (never called).
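
The roundtrip arithmetic above can be illustrated with a small chunking sketch. `BatchDispatch` is a hypothetical stand-in for the batch-trigger call, and the `doc-<id>` idempotency key format is an assumption. Only the 1000-item cap and the use of idempotency keys come from this commit message.

```typescript
// Cap taken from the commit message; everything else here is illustrative.
const TRIGGER_BATCH_SIZE = 1000

interface BatchItem<T> {
  payload: T
  idempotencyKey: string
}

// Hypothetical stand-in for one batch-trigger roundtrip.
type BatchDispatch<T> = (items: BatchItem<T>[]) => Promise<void>

function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError('size must be a positive integer')
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) chunks.push(items.slice(i, i + size))
  return chunks
}

// Sends ceil(N / TRIGGER_BATCH_SIZE) batches and returns how many were sent.
async function dispatchInBatches<T extends { documentId: string }>(
  docs: readonly T[],
  dispatch: BatchDispatch<T>
): Promise<number> {
  const batches = chunk(docs, TRIGGER_BATCH_SIZE)
  for (const batch of batches) {
    // Assumed key format: a stable per-document key lets a retry dedupe.
    await dispatch(batch.map((doc) => ({ payload: doc, idempotencyKey: `doc-${doc.documentId}` })))
  }
  return batches.length
}
```

With 2,500 documents this makes three roundtrips (1000, 1000, 500) instead of 2,500.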

* refactor(knowledge): split dispatch helpers, drop dead trigger branch

- Use the canonical DocumentProcessingPayload from the task module instead
  of the duplicate DocumentJobData interface in service.ts
- Pass typeof processDocumentTask as a generic to tasks.batchTrigger so the
  payload shape is type-checked against the task definition
- Inline TRIGGER_BATCH_SIZE provenance (Trigger.dev SDK 4.3.1+ doc'd cap,
  we're on 4.4.3)
- Split direct vs trigger dispatch into dispatchInProcess and
  dispatchViaBatchTrigger; collapse the all-failed throw into a single
  check on the combined dispatched counter
- Remove dispatchDocumentProcessingJob — its trigger branch is no longer
  reachable now that batchTrigger handles the trigger path, and the direct
  branch is inlined

* improvement(knowledge): log Trigger.dev batchIds for audit trail

tasks.batchTrigger returns a batchId per call. Collecting and logging
them after dispatch makes it possible to look up or cancel batches in
the Trigger.dev dashboard when investigating stuck or missing documents.

* improvement(knowledge): thread requestId through direct dispatch logs

Symmetry polish: dispatchInProcess now includes [requestId] in its error
log so direct-mode failures are correlatable the same way trigger-mode
failures already are.

* improvement(knowledge): trim verbose comments

Tightens TSDoc on processDocumentsWithQueue, TRIGGER_BATCH_SIZE,
checkSyncLiveness, and the idempotency-key inline comment.
…nd (#4679)

* improvement(elevenlabs): wire stability and similarity_boost end-to-end

* fix(elevenlabs): guard NaN in voice settings and always send both knobs together
#4678)

* feat(google-slides): complete API surface for branded slide generation

* fix(google-slides): address PR review — explicit videoId mapping, fast base64 export, remove dead utility

* fix(google-slides): declare z-order operation output in block outputs map
…ad (#4681)

* improvement(knowledge): eliminate N+1 on tag definitions in bulk upload

createDocumentRecords previously called processDocumentTags per-doc, each
running a SELECT against knowledge_base_tag_definitions — N queries that
all returned the same kbId-scoped rows. Worse, those reads used the
global db pool while the tx held a FOR UPDATE lock on the KB row, risking
pool contention on large bulk uploads.

Split the helper into loadTagDefinitions (single query, accepts the tx as
executor) and resolveDocumentTags (pure, takes the pre-loaded Map). The
bulk path loads once inside the transaction; createSingleDocument loads
once outside its tx. Same throw-on-validation-error semantics preserved.
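
A rough sketch of the load-once, resolve-many split. The two function names come from this commit message, while the `TagDefinition` shape, the `TagQueryExecutor` interface, and the name-to-slot mapping are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real schema is not shown in this text.
interface TagDefinition {
  displayName: string
  tagSlot: string
}

// Stand-in for either the global db handle or an open transaction.
interface TagQueryExecutor {
  selectTagDefinitions(knowledgeBaseId: string): Promise<TagDefinition[]>
}

// One query per knowledge base, run on whichever executor the caller holds.
async function loadTagDefinitions(
  executor: TagQueryExecutor,
  knowledgeBaseId: string
): Promise<Map<string, TagDefinition>> {
  const rows = await executor.selectTagDefinitions(knowledgeBaseId)
  return new Map(rows.map((row) => [row.displayName, row] as const))
}

// Pure: no I/O, so it can be called once per document at no query cost.
function resolveDocumentTags(
  tags: Readonly<Record<string, string>>,
  definitions: ReadonlyMap<string, TagDefinition>
): Record<string, string> {
  const resolved: Record<string, string> = {}
  for (const [name, value] of Object.entries(tags)) {
    const definition = definitions.get(name)
    if (!definition) throw new Error(`Unknown tag "${name}"`)
    resolved[definition.tagSlot] = value
  }
  return resolved
}
```

Passing the executor in explicitly is what lets the bulk path reuse the transaction's connection instead of checking out a second one from the pool.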

* improvement(knowledge): fold processDocumentAsync prefetches into one JOIN

processDocumentAsync was issuing three separate SELECTs per processed
document: knowledge_base (config), workspace (billing settings), and
document (tag values). For a typical Trigger.dev fleet processing
thousands of docs, that's thousands of redundant pool checkouts.

Collapsed into a single JOIN at the top of processDocumentAsync that
fetches kb config + billed account user + document tag values in one
roundtrip. The post-embedding tag SELECT (which previously held tags
through the full embedding-generation wait) is gone; tags from the
initial prefetch are reused.

Behavior:
- Missing/archived/deleted document or KB → same 'failed' status outcome
  as before, single consolidated error message.
- Missing billed account → preserves existing error.
- All 208 knowledge-base tests pass (test mock extended for innerJoin/leftJoin).

* improvement(knowledge): skip tag-definitions load when no doc carries tags

Trim verbose comments in the same pass.

* lint
)

Swap the default Open Graph and Twitter card image from the purple "sim"
lettermark on purple to the brand wordmark (green icon + dark "sim" text)
centered on a white background. Pulled directly from the sidebar's
wordmark-dark.svg so the asset stays in sync with the in-app brand.

- New asset at /logo/426-240/reverse/small.png (2130x1200, matches
  declared OG dimensions)
- Default branded metadata + landing-page-specific overrides both updated;
  all sub-pages that inherit the default pick up the new image
  automatically

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…d cells (#4682)

The "X running" badge + per-row gutter Stop only updated on manual Run
(useRunColumn bumped the run-state counter). Edit-triggered auto-runs
(useUpdateTableRow, useBatchUpdateTableRows, useCreateTableRow) stamped cells
pending in the rows cache but never bumped runningCellCount/runningByRowId, so
Stop stayed hidden even though cells were queued (the counter is already
queued-inclusive). Extracted countNewlyInFlight + bumpRunState helpers and
wired them into all the optimistic auto-fire paths with onError rollback;
reused them in useRunColumn.
…ma (#4686)

The values.schema.json constrained global.imageRegistry to JSON Schema
hostname format (RFC 1123), which forbids '/'. That rejected the
host+path form required by Artifactory virtual repos, Harbor projects,
GCR (gcr.io/project-id), and ECR-with-namespace — all of which the
chart's image-rendering helper already supports (it prints '%s/%s:%s').

Drop the format constraint and document the supported shapes. Matches
the bitnami common-chart convention of validating image registry as a
plain string and deferring to Docker for the actual reference parse.
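
To illustrate why the hostname format rejected valid registries, the sketch below pairs a simplified RFC 1123 hostname check with the `'%s/%s:%s'` rendering shape mentioned above. The regex is an approximation written for this example, not the validator JSON Schema implementations use, and the function name, repository, and tag are hypothetical.

```typescript
// Simplified hostname check (labels of letters, digits, hyphens, joined by dots).
const HOSTNAME_PATTERN =
  /^(?=.{1,253}$)[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?(\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*$/i

// Mirrors the '%s/%s:%s' shape described in the commit message.
function renderImage(registry: string, repository: string, tag: string): string {
  return `${registry}/${repository}:${tag}`
}

const hostOnly = 'gcr.io'
const hostWithPath = 'gcr.io/project-id'

// A bare host passes the hostname check; host+path fails because of the '/'.
const hostOnlyIsHostname = HOSTNAME_PATTERN.test(hostOnly)
const hostWithPathIsHostname = HOSTNAME_PATTERN.test(hostWithPath)

// Yet the rendered reference is well-formed for both shapes.
const rendered = renderImage(hostWithPath, 'my-app', '1.0.0')
```
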
…running count (#4687)

* fix(table): no typewriter flash; Run-row skips completed workflows

- Typewriter: reset the revealed text synchronously during render when the
  value changes (not in an effect), so a cell going from running→value no
  longer flashes the full text for one frame before animating.
- Run row / manual incomplete runs now treat a `completed` group as done even
  if an output column is blank — only "Run all" re-runs completed cells. The
  auto cascade keeps re-filling blank outputs (completedAndFilled). Client
  optimistic stamp mirrors: incomplete skips `completed` cells.

* fix(table): incomplete bulk-clear is per-group, not per-row

bulkClearWorkflowGroupCells in incomplete mode wiped EVERY targeted group's
output data + exec on any row that wasn't fully filled across all targeted
groups. So Run-row on a row with one completed group and one cancelled group
wiped the completed group's outputs + exec too, and the dispatcher re-ran it.

Now incomplete-mode clears per-group: only error/cancelled groups get their
output columns + exec cleared; completed and in-flight groups are left intact
(never-run groups have nothing to clear and run via eligibility). Combined
with the classifyEligibility guard, a completed workflow is never re-run by
Run-row — only Run-all re-runs it.
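
The per-group rule can be sketched as a small selector. The status names, the `RunMode` type, and `groupsToClear` are assumptions for illustration, and the real `bulkClearWorkflowGroupCells` operates in SQL rather than in memory.

```typescript
// Hypothetical status and mode names.
type GroupStatus = 'never-run' | 'pending' | 'running' | 'completed' | 'error' | 'cancelled'
type RunMode = 'all' | 'incomplete'

// In incomplete mode only failed or cancelled groups lose their outputs.
const CLEARED_WHEN_INCOMPLETE: ReadonlySet<GroupStatus> = new Set<GroupStatus>(['error', 'cancelled'])

// Returns the ids of the groups on one row whose outputs + exec should be cleared.
function groupsToClear(
  statusByGroup: Readonly<Record<string, GroupStatus>>,
  mode: RunMode
): string[] {
  return Object.entries(statusByGroup)
    .filter(([, status]) => mode === 'all' || CLEARED_WHEN_INCOMPLETE.has(status))
    .map(([groupId]) => groupId)
}
```

For a row with one completed group and one cancelled group, incomplete mode selects only the cancelled group, which is the case the earlier per-row logic got wrong.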

* fix(table): X-running count from dispatch scope so reload matches live

The "X running" badge read countRunningCells (sidecar in-flight), but the
dispatcher only stamps one ~20-cell window at a time. During a 1000-row
Run-all the client optimistically showed ~1000 while a reload showed ~20 —
the sidecar never holds more than a window.

Derive the count from the active dispatches instead: rows in scope ahead of
the cursor × |groupIds| (exact for Run-all, upper bound for incomplete/new).
Both scope and cursor are persisted, so a reload computes the same number.

- countActiveRunCells (dispatcher.ts): dispatch-scope total, sidecar fallback
  when no dispatch is active. byRowId stays sidecar-based (the client overlay
  renders queued rows ahead of the cursor).
- Live: applyDispatch re-syncs the badge from the server on every dispatch
  event (one per window, after its cells finish + cursor advances), so the
  badge steps down per window and matches reload. applyCell no longer touches
  runningCellCount (still keeps runningByRowId live for the gutter).
- Optimistic on click: useRunColumn seeds the full run scope (totalCount ×
  groups) so the badge is right before the first window lands.
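
The badge arithmetic described above, sketched in memory. The `ActiveDispatch` shape is hypothetical; per the later review fix, the real `countActiveRunCells` issues one COUNT query per dispatch, whereas this example filters an array to show the same formula.

```typescript
// Hypothetical in-memory shape of a persisted dispatch.
interface ActiveDispatch {
  scopeRowPositions: number[] // positions of every row in the dispatch's scope
  cursor: number | null // position of the last processed row, null before the first window
  groupIds: string[]
}

// Rows ahead of the cursor × number of groups, summed over active dispatches.
// Falls back to the sidecar's in-flight count when no dispatch is active.
function countActiveRunCells(dispatches: readonly ActiveDispatch[], sidecarInFlight: number): number {
  if (dispatches.length === 0) return sidecarInFlight
  let total = 0
  for (const dispatch of dispatches) {
    const cursor = dispatch.cursor ?? Number.NEGATIVE_INFINITY
    const remainingRows = dispatch.scopeRowPositions.filter((position) => position > cursor).length
    total += remainingRows * dispatch.groupIds.length
  }
  return total
}
```

Since both inputs are persisted, the same call yields the same number on a live client and after a reload.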

* fix(table): address review — parallel dispatch counts, unfiltered rowCount

- countActiveRunCells: run the per-dispatch COUNT queries + the sidecar count
  in parallel instead of serially (one round-trip per dispatch).
- Optimistic Run-all estimate now reads the table definition's maintained,
  unfiltered rowCount (detail cache) instead of the rows query's filter-scoped
  totalCount — the dispatcher runs every row regardless of the active filter.
* fix(mcp): probe-based OAuth detection in test-connection

* fix(mcp): guard undefined url in probe and use canonical McpAuthType
…S3, faster outlier drain (#4688)

* improvement(cleanup): batchTrigger fan-out, chunked queries, batched S3, faster outlier drain

- Fan cleanup-tasks/logs/soft-deletes out via tasks.batchTrigger (500 workspaces/chunk); bump to large-1x with concurrencyLimit: 5
- Chunk bulk DELETEs (1000 IDs/stmt) and collectChatFiles JSONB SELECT (500 chats/stmt) to bound worker memory and lock duration
- Replace per-key position() table scans with one LATERAL unnest scan per 200-key chunk
- Route storage deletes through StorageService.deleteFiles (S3 DeleteObjects: 1000 keys/HTTP)
- Raise per-run row cap to 100K so long-tail tenants (one prod workspace has 723K doomed rows) drain in days, not weeks

* improvement(cleanup): chunk-index labels, clarify upper-bound failure counter

Addresses Greptile review feedback:
- Disambiguate downstream logs when a plan splits into multiple workspace chunks (e.g. 'free/1', 'free/2')
- Document that deleteRowsById's failed counter is an upper bound (chunk rolls back to 0 deletes on error)

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | May 21, 2026 6:31am |

cursor Bot commented May 21, 2026

PR Summary

High Risk
Adds new MCP OAuth start/callback endpoints and changes MCP server CRUD/tool execution responses, which are security- and integration-critical. Also refactors table run dispatch/resume behavior and background cleanup tasks, increasing risk of regressions in workflow execution and operational jobs.

Overview
Adds OAuth 2.1/PKCE support for outbound MCP servers. Introduces /api/mcp/oauth/start and /api/mcp/oauth/callback to run the auth flow, validate state/user/workspace, burn state before token exchange, and notify the opener window; MCP server create/update/list now accept OAuth fields while redacting oauthClientSecret and returning hasOauthClientSecret.
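
For readers unfamiliar with PKCE, the sketch below shows the generic S256 verifier/challenge derivation from RFC 7636 using Node's `crypto` module. It is not taken from this PR, whose flow may delegate this step to an SDK, and the function names are hypothetical.

```typescript
import { createHash, randomBytes } from 'node:crypto'

// 32 random bytes encode to 43 base64url characters, the minimum
// verifier length RFC 7636 allows (43-128).
function createCodeVerifier(): string {
  return randomBytes(32).toString('base64url')
}

// S256 method: challenge = BASE64URL(SHA-256(ASCII(verifier))), unpadded.
function deriveCodeChallenge(verifier: string): string {
  return createHash('sha256').update(verifier, 'ascii').digest('base64url')
}

// The client sends `challenge` on the authorization request and keeps
// `verifier` private until the token exchange.
const verifier = createCodeVerifier()
const challenge = deriveCodeChallenge(verifier)
```

The authorization server recomputes the challenge from the verifier at token exchange, so an intercepted authorization code is useless without the verifier.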

Improves MCP UX and error handling. Connection testing now detects OAuth challenges and returns authRequired/authType, settings/tool UI can launch an OAuth popup, and tool discover/execute routes map OAuth/unauthorized errors to 401 (tool execute returns a dedicated reauth_required payload).

Table run/dispatch reliability upgrades. Adds GET /api/table/[tableId]/dispatches plus client-side run-state tracking to preserve queued indicators across refresh, updates SSE handling to process both cell + dispatch events and keep per-row running counts live, and adds a typewriter reveal for newly-updated cell values.

Workflow execution and maintenance changes. Resume polling now routes through executeResumeJob and resume-execution adds cascade-locking plus post-resume cascade continuation; table row querying now rebuilds legacy row.executions from a table_row_executions sidecar table; cleanup tasks are reworked to use batched/chunked deletes and storage operations, with trigger.dev task sizing/queue limits adjusted.

Reviewed by Cursor Bugbot for commit 11ad891.

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

* fix(mcp): surface authorization-server error instead of generic toast

* perf(mcp): cache tool discovery per-server with test coverage

* refactor(mcp): use SDK's typed OAuthError subclasses for error surfacing

* test(mcp): unit-test surfaceOauthError typed and fallback paths

* chore(mcp): remove unused __where mock export in service.test.ts

* chore(mcp): drop redundant clearCache TSDoc and stray trailing comment

* fix(mcp): don't leak non-OAuth errors; clearCache covers disabled servers

* chore(mcp): tighten clearCache comment to one block
@waleedlatif1 waleedlatif1 merged commit 97a609a into main May 21, 2026
30 checks passed

2 participants