Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .changeset/fix-batch-duplicate-idempotency.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
"@trigger.dev/webapp": patch
---

Fix batchTriggerAndWait running forever when duplicate idempotencyKey is provided in the same batch

When using batchTriggerAndWait with duplicate idempotencyKeys in the same batch, the batch would never complete because the completedCount and expectedCount would be mismatched. This fix ensures that cached runs (duplicate idempotencyKeys) are properly tracked in the batch, with their completedCount incremented immediately if the cached run is already in a final status.
92 changes: 78 additions & 14 deletions apps/webapp/app/v3/services/batchTriggerV3.server.ts
Original file line number Diff line number Diff line change
Expand Up @@ -124,11 +124,11 @@ export class BatchTriggerV3Service extends BaseService {

const existingBatch = options.idempotencyKey
? await this._prisma.batchTaskRun.findFirst({
where: {
runtimeEnvironmentId: environment.id,
idempotencyKey: options.idempotencyKey,
},
})
where: {
runtimeEnvironmentId: environment.id,
idempotencyKey: options.idempotencyKey,
},
})
: undefined;

if (existingBatch) {
Expand Down Expand Up @@ -167,16 +167,16 @@ export class BatchTriggerV3Service extends BaseService {

const dependentAttempt = body?.dependentAttempt
? await this._prisma.taskRunAttempt.findFirst({
where: { friendlyId: body.dependentAttempt },
include: {
taskRun: {
select: {
id: true,
status: true,
},
where: { friendlyId: body.dependentAttempt },
include: {
taskRun: {
select: {
id: true,
status: true,
},
},
})
},
})
: undefined;

if (
Expand Down Expand Up @@ -890,7 +890,71 @@ export class BatchTriggerV3Service extends BaseService {
}
}

return false;
// FIX for Issue #2965: When a run is cached (duplicate idempotencyKey),
// we need to ALWAYS create a BatchTaskRunItem to properly track it.
// This handles cases where cached run may originate from another batch.
// Use unique constraint (batchTaskRunId, taskRunId) to prevent duplicates.
const isAlreadyComplete = isFinalRunStatus(result.run.status);

logger.debug(
"[BatchTriggerV2][processBatchTaskRunItem] Cached run detected, creating batch item",
{
batchId: batch.friendlyId,
runId: task.runId,
cachedRunId: result.run.id,
cachedRunStatus: result.run.status,
isAlreadyComplete,
currentIndex,
}
);

// Always create BatchTaskRunItem for cached runs
// This ensures proper tracking even for cross-batch scenarios
try {
await this._prisma.batchTaskRunItem.create({
data: {
batchTaskRunId: batch.id,
taskRunId: result.run.id,
// Use appropriate status based on the cached run's current status
status: isAlreadyComplete ? "COMPLETED" : batchTaskRunItemStatusForRunStatus(result.run.status),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Cached failed runs incorrectly marked as COMPLETED instead of FAILED

When a cached run has a final failed status (e.g., CRASHED, SYSTEM_FAILURE, CANCELED, etc.), the BatchTaskRunItem is incorrectly set to "COMPLETED" instead of the appropriate "FAILED" status.

Click to expand

Root Cause

The code uses isAlreadyComplete which is isFinalRunStatus(result.run.status). However, isFinalRunStatus returns true for ALL final statuses including failed ones (taskStatus.ts:3-12):

export const FINAL_RUN_STATUSES = [
  "CANCELED",
  "INTERRUPTED",
  "COMPLETED_SUCCESSFULLY",
  "COMPLETED_WITH_ERRORS",
  "SYSTEM_FAILURE",
  "CRASHED",
  "EXPIRED",
  "TIMED_OUT",
];

The ternary at line 919 sets status to "COMPLETED" when isAlreadyComplete is true, regardless of whether the run actually succeeded or failed:

status: isAlreadyComplete ? "COMPLETED" : batchTaskRunItemStatusForRunStatus(result.run.status),

Expected Behavior

The status should use batchTaskRunItemStatusForRunStatus(result.run.status) for all cases, which correctly maps failed statuses to "FAILED" (taskRun.server.ts:113-139).

Impact

Batch items for cached failed runs will be marked as COMPLETED instead of FAILED, misrepresenting the actual outcome of the run in the batch tracking system.

Recommendation: Replace line 919 with: status: batchTaskRunItemStatusForRunStatus(result.run.status), to properly map both successful and failed statuses.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

},
Comment on lines +918 to +920
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: Failed cached runs are incorrectly marked as "COMPLETED".

When isAlreadyComplete is true, the code hardcodes status to "COMPLETED". However, isFinalRunStatus() returns true for failed statuses too (CANCELED, CRASHED, EXPIRED, TIMED_OUT, etc.). This causes failed cached runs to be marked as "COMPLETED" instead of "FAILED".

The batchTaskRunItemStatusForRunStatus() helper already handles all cases correctly (see apps/webapp/app/models/taskRun.server.ts lines 112-139). Simply use it unconditionally, matching the non-cached path on line 869.

🐛 Proposed fix
       await this._prisma.batchTaskRunItem.create({
         data: {
           batchTaskRunId: batch.id,
           taskRunId: result.run.id,
-          // Use appropriate status based on the cached run's current status
-          status: isAlreadyComplete ? "COMPLETED" : batchTaskRunItemStatusForRunStatus(result.run.status),
+          status: batchTaskRunItemStatusForRunStatus(result.run.status),
         },
       });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Use appropriate status based on the cached run's current status
status: isAlreadyComplete ? "COMPLETED" : batchTaskRunItemStatusForRunStatus(result.run.status),
},
await this._prisma.batchTaskRunItem.create({
data: {
batchTaskRunId: batch.id,
taskRunId: result.run.id,
status: batchTaskRunItemStatusForRunStatus(result.run.status),
},
});
🤖 Prompt for AI Agents
In `@apps/webapp/app/v3/services/batchTriggerV3.server.ts` around lines 918 - 920,
The cached-run branch currently forces status to "COMPLETED" when
isAlreadyComplete is true, which mislabels failed final statuses; change the
assignment to always use batchTaskRunItemStatusForRunStatus(result.run.status)
(remove the hardcoded "COMPLETED" branch) so cached runs use the same mapping
helper as the non-cached path and correctly reflect FAILED/CANCELED/etc.; update
the code around isAlreadyComplete to call batchTaskRunItemStatusForRunStatus
with result.run.status unconditionally.

});

// Only increment completedCount if the cached run is already finished
// For in-progress runs, completedCount will be incremented when the run completes
if (isAlreadyComplete) {
await this._prisma.batchTaskRun.update({
where: { id: batch.id },
data: {
completedCount: {
increment: 1,
},
},
});
}

// Return true so expectedCount is incremented
return true;
} catch (error) {
if (isUniqueConstraintError(error, ["batchTaskRunId", "taskRunId"])) {
// BatchTaskRunItem already exists for this batch and cached run
// This can happen if the same idempotencyKey is used multiple times in the same batch
logger.debug(
"[BatchTriggerV2][processBatchTaskRunItem] BatchTaskRunItem already exists for cached run",
{
batchId: batch.friendlyId,
runId: task.runId,
cachedRunId: result.run.id,
currentIndex,
}
);

// Don't increment expectedCount since this item is already tracked
return false;
}

throw error;
}
}

async #enqueueBatchTaskRun(options: BatchProcessingOptions) {
Expand Down