
fix(web): write PROCESSING status to DB before triggering transcription workflow#1832

Open
oaris-dev wants to merge 1 commit into CapSoftware:main from oaris-dev:fix/transcription-race-1550

oaris-dev commented on May 17, 2026

Summary

Fixes #1550. Writes transcriptionStatus = 'PROCESSING' to the DB synchronously, after transcribeVideo()'s existing early-return check but before the start(transcribeVideoWorkflow, ...) call. This prevents the share page's 2-second polling loop from re-firing start() concurrently. Concurrent start() calls race @workflow/world-local's queue, which detaches the underlying ArrayBuffer and crashes the queue before any step executes.

Root cause

The /s/<videoId> page polls getVideoStatus() every 2 seconds. Upstream's flow:

  1. Poll 1: video.transcriptionStatus === null → calls transcribeVideo() → calls start(transcribeVideoWorkflow, ...) → workflow dispatched asynchronously
  2. Poll 2 (2s later): video.transcriptionStatus === null still (the workflow's validateVideo step hasn't written PROCESSING yet) → calls transcribeVideo() again → calls start() again
  3. Repeat...

After a few concurrent start() calls, @workflow/world-local's queue races on its internal buffer handling, the underlying ArrayBuffer gets detached, and dispatch crashes with TypeError: Cannot perform ArrayBuffer.prototype.slice on a detached ArrayBuffer before any workflow step runs. The crash happens upstream of the step boundary, which is why affected users see no /audio/extract calls in their media-server logs (per @julianwitzel's investigation in #1550).
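The poll sequence above can be modeled in a few lines. This is a minimal in-memory sketch, not code from the Cap repository: `VideoRow`, `startWorkflow`, and `transcribeVideoUnguarded` are hypothetical stand-ins, and the real function is asynchronous and reads from a database. It only illustrates why every poll dispatches again when nothing writes PROCESSING before the queue runs.

```typescript
// Hypothetical in-memory model; none of these names come from the Cap codebase.
type Status = null | "PROCESSING" | "COMPLETE";

interface VideoRow {
  transcriptionStatus: Status;
}

const video: VideoRow = { transcriptionStatus: null };

// Work that has been enqueued but has not run yet.
const pendingDispatches: Array<() => void> = [];

// Stand-in for start(): it only enqueues. PROCESSING is written later,
// when the queued step finally runs.
function startWorkflow(row: VideoRow): void {
  pendingDispatches.push(() => {
    row.transcriptionStatus = "PROCESSING";
  });
}

// Pre-fix shape: the guard reads the status, but nothing writes it before dispatch.
function transcribeVideoUnguarded(row: VideoRow): boolean {
  if (row.transcriptionStatus === "PROCESSING") return false; // early return
  startWorkflow(row);
  return true;
}

// Three polls arrive before the queue has executed any step.
const dispatched = [1, 2, 3].map(() => transcribeVideoUnguarded(video));
const queuedCount = pendingDispatches.length;
```

In this model all three polls pass the guard, so `queuedCount` ends at 3 while the status is still `null`.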

The fix

transcribeVideo() already has an early-return guard for transcriptionStatus === 'PROCESSING' (lines 89–99). The problem is that nothing writes PROCESSING synchronously — the workflow's own step writes it, but only after the queue dispatches. If the queue race fires before the step runs, PROCESSING never gets written and the next poll triggers another start().

By writing PROCESSING to the DB inside transcribeVideo() right before the start() call, the next poll's call hits the early-return at line 89 and never re-fires start(). The first dispatch completes normally.
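A minimal sketch of the guarded flow, under the same simplifying assumptions: an in-memory row instead of the DB, a synchronous function instead of the real async one, and only the PROCESSING guard modeled. `transcribeVideoGuarded` and `startWorkflow` are illustrative names, not the actual code in apps/web/lib/transcribe.ts.

```typescript
// Hypothetical in-memory model of the fix; names are illustrative only.
type Status = null | "PROCESSING" | "COMPLETE";

interface VideoRow {
  transcriptionStatus: Status;
}

let startCalls = 0;

// Stand-in for start(transcribeVideoWorkflow, ...): dispatch is asynchronous,
// so the workflow itself writes nothing at this point.
function startWorkflow(): void {
  startCalls += 1;
}

function transcribeVideoGuarded(row: VideoRow): "started" | "already-processing" {
  // Existing early-return guard.
  if (row.transcriptionStatus === "PROCESSING") return "already-processing";

  // The new write: persist PROCESSING before dispatching.
  row.transcriptionStatus = "PROCESSING";
  startWorkflow();
  return "started";
}

const row: VideoRow = { transcriptionStatus: null };

// Three polls in a row, none of which waits for the workflow to run.
const results = [1, 2, 3].map(() => transcribeVideoGuarded(row));
```

Only the first poll reaches `startWorkflow()`. The second and third see PROCESSING and return early, which is the behavior this change relies on.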

5-line change in apps/web/lib/transcribe.ts.

Verification

  • @julianwitzel verified end-to-end on a self-hosted Docker deployment that was previously stuck in the ArrayBuffer crash loop. Logs after the patch:
    • [transcribe] Probe result: audioCodec=aac, audioChannels=1, sampleRate=48000
    • [transcribe] Extracted audio: 179860 bytes ✓ (first time /audio/extract ever succeeded in their setup)
    • transcriptionStatus → COMPLETE
    • AI summary + chapters generated correctly via Groq ✓
    • No more [local world] Queue operation failed: ArrayBuffer detached loop

Test plan

  • Self-hosted Docker deployment with DEEPGRAM_API_KEY configured — verified by @julianwitzel
  • Upload a fresh video, watch the /s/<videoId> page, confirm transcriptionStatus transitions null → PROCESSING → COMPLETE exactly once (no re-trigger loop)
  • Confirm media-server logs show /video/probe and /audio/extract calls
  • Confirm no ArrayBuffer errors in cap-web logs
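The "exactly once" check in the test plan could be automated by recording the status on each poll and collapsing repeats. The helper below is a hypothetical sketch for that purpose. It is not part of this PR, and `collapseTransitions` and `isCleanRun` are made-up names.

```typescript
type Status = null | "PROCESSING" | "COMPLETE";

// Collapse consecutive duplicates so only the actual transitions remain.
function collapseTransitions(samples: Status[]): Status[] {
  const out: Status[] = [];
  for (const s of samples) {
    if (out.length === 0 || out[out.length - 1] !== s) out.push(s);
  }
  return out;
}

// A clean run visits null, PROCESSING, COMPLETE in that order, each exactly once.
function isCleanRun(samples: Status[]): boolean {
  const t = collapseTransitions(samples);
  return (
    t.length === 3 &&
    t[0] === null &&
    t[1] === "PROCESSING" &&
    t[2] === "COMPLETE"
  );
}

// Statuses as they might be observed on successive polls.
const cleanRun = isCleanRun([null, null, "PROCESSING", "PROCESSING", "COMPLETE"]);
const retriggered = isCleanRun([null, "PROCESSING", null, "PROCESSING", "COMPLETE"]);
```

Here `cleanRun` is `true` and `retriggered` is `false`, because the second sequence falls back to `null` and enters PROCESSING a second time.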

Notes

This supersedes #1630 (which proposed bypassing the workflow entirely for transcription/AI generation). The bypass approach was overkill — this 5-line guard is the actual minimal fix.

🤖 Generated with Claude Code

Greptile Summary

This PR fixes a concurrency bug where the share page's 2-second polling loop could fire transcribeVideo() multiple times before the workflow's own first step wrote PROCESSING, causing concurrent start() calls that raced @workflow/world-local's internal buffer and crashed with an ArrayBuffer detachment error. The fix writes transcriptionStatus = 'PROCESSING' to the DB synchronously inside transcribeVideo(), just before the start() call, so subsequent polls hit the existing early-return guard and never re-dispatch.

  • 5-line addition in apps/web/lib/transcribe.ts: persists PROCESSING to the DB before start(transcribeVideoWorkflow, …), and the existing catch block already resets the status back to null if start() throws, preserving error-recovery behavior.
  • A very small TOCTOU window remains (two requests could both pass the guard before either writes PROCESSING), but in practice it is negligible at a 2-second poll interval and far narrower than the pre-fix window that spanned the entire workflow execution.
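The error-recovery point in the first bullet can be sketched as follows. This is an assumption-laden model rather than the actual catch block: `dispatchWithRollback` is a hypothetical name, the row is in memory, and the real code is asynchronous.

```typescript
type Status = null | "PROCESSING" | "COMPLETE";

interface VideoRow {
  transcriptionStatus: Status;
}

// Write PROCESSING, try to dispatch, and undo the write if dispatch throws.
function dispatchWithRollback(row: VideoRow, start: () => void): boolean {
  row.transcriptionStatus = "PROCESSING";
  try {
    start();
    return true;
  } catch {
    // Reset so a later poll can retry instead of staying stuck in PROCESSING.
    row.transcriptionStatus = null;
    return false;
  }
}

const failingRow: VideoRow = { transcriptionStatus: null };
const recovered = dispatchWithRollback(failingRow, () => {
  throw new Error("dispatch failed");
});

const healthyRow: VideoRow = { transcriptionStatus: null };
const started = dispatchWithRollback(healthyRow, () => {});
```

Without the reset, a failed dispatch would leave the row in PROCESSING and every later poll would return early, so the video would never be retried.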

Confidence Score: 4/5

The change is minimal and well-reasoned; the existing catch block correctly resets status on failure, so the main risk is the narrow TOCTOU gap that is practically negligible at normal polling intervals.

The fix correctly targets the root cause and error recovery is preserved. The only remaining concern is a theoretical race that requires two requests to overlap within a single DB round-trip, which is unlikely at a 2-second polling interval but could surface if the interval is reduced or DB latency spikes.

apps/web/lib/transcribe.ts — the only changed file; review the interaction between the new PROCESSING write and the catch block's null reset if start() fails.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/web/lib/transcribe.ts | Adds a synchronous DB write of transcriptionStatus = 'PROCESSING' before calling start(transcribeVideoWorkflow, ...), preventing re-entrant workflow dispatches from the share page's 2-second poll. The existing catch block already resets the status to null on failure, so error recovery is handled. A very narrow TOCTOU window remains (two requests reading the video before either writes PROCESSING), but it is practically negligible at a 2-second polling interval. |

Comments Outside Diff (1)

  1. apps/web/lib/transcribe.ts, lines 118-129

    P2 Narrow TOCTOU window still exists between guard read and PROCESSING write

    Two concurrent poll requests can both pass the transcriptionStatus === 'PROCESSING' early-return guard (lines 84–93) before either one writes PROCESSING to the DB — the guard uses the value fetched at the top of the function, not a fresh read. The race window is now just the upload-phase query (~one round-trip), versus the entire workflow execution time before this fix, so it's highly unlikely to fire in practice at a 2-second polling interval. Worth noting in case the polling interval is ever reduced or the DB write is slow under load.
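One common way to close a window like this is a conditional update (compare-and-set), where the write itself checks that the status is still null and the caller proceeds only if a row actually changed. The sketch below models that idea in memory. It is not what this PR implements, and `claimForProcessing`, `transcribeOnce`, and the `table` object are hypothetical. In a real database the atomicity would come from a single UPDATE statement with the status condition in its WHERE clause.

```typescript
type Status = null | "PROCESSING" | "COMPLETE";

// In-memory stand-in for the videos table, keyed by video id.
const table: Record<string, Status> = { "video-1": null };

// Models: UPDATE videos SET transcriptionStatus = 'PROCESSING'
//         WHERE id = ? AND transcriptionStatus IS NULL
// Returns the number of rows changed (0 or 1).
function claimForProcessing(videoId: string): number {
  if (!(videoId in table) || table[videoId] !== null) return 0;
  table[videoId] = "PROCESSING";
  return 1;
}

let startCalls = 0;

function transcribeOnce(videoId: string): boolean {
  // Only the request whose update changed a row may dispatch.
  if (claimForProcessing(videoId) === 0) return false;
  startCalls += 1; // stand-in for start(transcribeVideoWorkflow, ...)
  return true;
}

const first = transcribeOnce("video-1");
const second = transcribeOnce("video-1");
```

The first call claims the row and dispatches, and the second changes no rows and backs off. The guard no longer depends on a value read earlier in the function.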


Reviews (1): Last reviewed commit: "fix(web): write PROCESSING status to DB ..."

…on workflow

Prevents the share page polling loop (`/s/<videoId>` every 2s) from
re-firing `start(transcribeVideoWorkflow, ...)` concurrently. Without
this write, every poll passes the `!video.transcriptionStatus` check
and queues another workflow dispatch. After a few concurrent enqueues,
`@workflow/world-local`'s queue races on its internal buffers and
detaches the underlying ArrayBuffer, crashing dispatch before any
workflow step executes.

By writing PROCESSING to the DB synchronously *after* the early-return
check in `transcribeVideo` but *before* the `start()` call, the next
poll's call to `transcribeVideo` hits the early-return and never re-fires
`start()`. The first dispatch completes normally.

Fixes CapSoftware#1550.
Verified end-to-end by @julianwitzel on a self-hosted Docker deployment
that was previously stuck in the ArrayBuffer crash loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 17, 2026 11:34
superagent-security (bot) added the labels contributor:verified (Contributor passed trust analysis) and pr:verified (PR passed security analysis) on May 17, 2026

Copilot AI left a comment


Pull request overview

This PR aims to prevent duplicate transcription workflow dispatches from the share page’s 2-second polling loop by persisting transcriptionStatus = "PROCESSING" to the DB immediately before calling start(transcribeVideoWorkflow, ...) in transcribeVideo().

Changes:

  • Writes transcriptionStatus: "PROCESSING" synchronously in transcribeVideo() right before triggering the transcription workflow.


Comment on lines +118 to +122
await db()
  .update(videos)
  .set({ transcriptionStatus: "PROCESSING" })
  .where(eq(videos.id, videoId));

@oaris-dev
Author

FYI — opened #1833 alongside this PR. When end-to-end validating this fix against a clean upstream build, I found a second blocker on self-hosted Docker: proxy.ts redirects /.well-known/workflow/v1/* (used by the workflow queue's self-dispatch) to /login, so the workflow engine can't run at all.
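The redirect problem described above amounts to an auth check that does not exempt the workflow engine's internal route. A hypothetical predicate for such an exemption might look like the sketch below. It is not taken from #1833 or from proxy.ts, and `shouldRedirectToLogin` and `WORKFLOW_PREFIX` are invented for illustration. Only the path prefix comes from the comment.

```typescript
// Path prefix used by the workflow queue's self-dispatch, per the comment above.
const WORKFLOW_PREFIX = "/.well-known/workflow/v1/";

// Hypothetical check: decide whether a request should be sent to /login.
function shouldRedirectToLogin(pathname: string, isAuthenticated: boolean): boolean {
  // Internal workflow calls carry no user session, so they must not be redirected.
  if (pathname.startsWith(WORKFLOW_PREFIX)) return false;
  return !isAuthenticated;
}

const workflowCall = shouldRedirectToLogin("/.well-known/workflow/v1/flow", false);
const anonymousPage = shouldRedirectToLogin("/dashboard", false);
const signedInPage = shouldRedirectToLogin("/dashboard", true);
```

A real exemption would also need to consider how those internal requests are authenticated, which this sketch does not attempt.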

Both fixes are needed for self-hosted transcription to actually work end-to-end. They're independent code paths in different files, so I split them into two PRs for clarity, but happy to fold into one if you prefer.


Development

Successfully merging this pull request may close these issues.

Transcription workflow fails on Docker image with Node 24

2 participants