fix(cli): reject execute() immediately when child process is dead #2978

nicktrn · 2026-01-30T12:42:50Z

Summary

When a child process crashes and a retry (RETRY_IMMEDIATELY) is attempted on the same TaskRunProcess, execute() hangs forever: the IPC send is silently skipped, so the attempt promise can never resolve.
This caused runner pods to stay up indefinitely with no heartbeats or polls.
Fix: reject the attempt promise immediately when the child is not connected, so the controller can proceed to a warm start or exit.

Test plan

Added taskRunProcess.test.ts, which verifies that execute() rejects promptly instead of hanging when the child process is dead.
Deploy and verify that stuck runner pods no longer accumulate over time.

When a child process crashes and a retry is attempted on the same TaskRunProcess, execute() would hang forever because the IPC send was silently skipped and the attempt promise could never resolve. This caused runner pods to stay up indefinitely with no heartbeats.

changeset-bot · 2026-01-30T12:42:57Z

🦋 Changeset detected

Latest commit: f3049f6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 28 packages

Name	Type
trigger.dev	Patch
d3-chat	Patch
references-d3-openai-agents	Patch
references-nextjs-realtime	Patch
references-realtime-hooks-test	Patch
references-realtime-streams	Patch
references-telemetry	Patch
@trigger.dev/build	Patch
@trigger.dev/core	Patch
@trigger.dev/python	Patch
@trigger.dev/react-hooks	Patch
@trigger.dev/redis-worker	Patch
@trigger.dev/rsc	Patch
@trigger.dev/schema-to-json	Patch
@trigger.dev/sdk	Patch
@trigger.dev/database	Patch
@trigger.dev/otlp-importer	Patch
@internal/cache	Patch
@internal/clickhouse	Patch
@internal/redis	Patch
@internal/replication	Patch
@internal/run-engine	Patch
@internal/schedule-engine	Patch
@internal/testcontainers	Patch
@internal/tracing	Patch
@internal/tsql	Patch
@internal/zod-worker	Patch
@internal/sdk-compat-tests	Patch

coderabbitai · 2026-01-30T12:43:13Z

Walkthrough

This pull request fixes a hang issue when execute() is called on a dead child process in TaskRunProcess. A new check is added to the execute method that detects when the IPC channel to the child process is not connected and immediately rejects the pending attempt with an UnexpectedExitError, marking the attempt status as REJECTED. A corresponding test validates this behavior and ensures the code does not hang in this scenario.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The description provides a clear summary of the problem and solution, includes a test plan with one verified item, but is missing several required sections from the template: issue reference, checklist items, and changelog.	Complete the description by adding issue reference (Closes #), filling in the checklist items, adding a proper changelog section, and confirming all deployment testing is complete.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main fix: rejecting execute() immediately when the child process is dead, which directly addresses the core issue in the changeset.

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b221719 and f3049f6.

📒 Files selected for processing (3)

.changeset/fix-dead-process-execute-hang.md
packages/cli-v3/src/executions/taskRunProcess.test.ts
packages/cli-v3/src/executions/taskRunProcess.ts

🧰 Additional context used

📓 Path-based instructions (8)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

packages/cli-v3/src/executions/taskRunProcess.ts
packages/cli-v3/src/executions/taskRunProcess.test.ts

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

packages/cli-v3/src/executions/taskRunProcess.ts
packages/cli-v3/src/executions/taskRunProcess.test.ts

**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

packages/cli-v3/src/executions/taskRunProcess.ts
packages/cli-v3/src/executions/taskRunProcess.test.ts

**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

packages/cli-v3/src/executions/taskRunProcess.ts
packages/cli-v3/src/executions/taskRunProcess.test.ts

{packages,integrations}/**/*

📄 CodeRabbit inference engine (CLAUDE.md)

Add a changeset when modifying any public package in packages/* or integrations/* using pnpm run changeset:add

Files:

packages/cli-v3/src/executions/taskRunProcess.ts
packages/cli-v3/src/executions/taskRunProcess.test.ts

**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

packages/cli-v3/src/executions/taskRunProcess.test.ts

**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.{ts,tsx,js,jsx}: Test files should live beside the files under test and use descriptive describe and it blocks
Tests should avoid mocks or stubs and use the helpers from @internal/testcontainers when Redis or Postgres are needed
Use vitest for running unit tests

**/*.test.{ts,tsx,js,jsx}: Use vitest exclusively for testing and never mock anything - use testcontainers instead
Place test files next to source files with naming pattern: source file (e.g., MyService.ts) → MyService.test.ts

Files:

packages/cli-v3/src/executions/taskRunProcess.test.ts

**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use testcontainers helpers (redisTest, postgresTest, containerTest) from @internal/testcontainers for Redis/PostgreSQL testing instead of mocks

Files:

packages/cli-v3/src/executions/taskRunProcess.test.ts

🧠 Learnings (3)

📚 Learning: 2024-10-18T15:41:52.352Z

Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 1418
File: packages/core/src/v3/errors.ts:364-371
Timestamp: 2024-10-18T15:41:52.352Z
Learning: In `packages/core/src/v3/errors.ts`, within the `taskRunErrorEnhancer` function, `error.message` is always defined, so it's safe to directly call `error.message.includes("SIGTERM")` without additional checks.

Applied to files:

packages/cli-v3/src/executions/taskRunProcess.ts

📚 Learning: 2026-01-15T11:50:06.067Z

Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to **/*.{ts,tsx} : Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function

Applied to files:

packages/cli-v3/src/executions/taskRunProcess.test.ts

📚 Learning: 2025-10-08T11:48:12.327Z

Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.

Applied to files:

packages/cli-v3/src/executions/taskRunProcess.test.ts
.changeset/fix-dead-process-execute-hang.md

🧬 Code graph analysis (1)

packages/cli-v3/src/executions/taskRunProcess.ts (1)

packages/core/src/v3/errors.ts (1)

UnexpectedExitError (508-518)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)

GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: sdk-compat / Bun Runtime
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
GitHub Check: typecheck / typecheck
GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
GitHub Check: sdk-compat / Cloudflare Workers

🔇 Additional comments (4)

.changeset/fix-dead-process-execute-hang.md (1)

1-5: LGTM!

The changeset is correctly formatted and appropriately categorized as a patch for this bug fix.

packages/cli-v3/src/executions/taskRunProcess.ts (1)

300-312: LGTM! Solid fix for the hang.

The approach correctly addresses the race condition where execute() could hang indefinitely:

Sets attempt status to REJECTED before rejection (consistent with #handleExit behavior)

Uses UnexpectedExitError with a descriptive stderr message for debugging

The sentinel code -1 appropriately indicates no actual exit code was observed
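
The guard described in these points could look roughly like the sketch below. It is not the code from taskRunProcess.ts: `ChildLike`, `Attempt`, `SketchUnexpectedExitError`, `executeAttempt`, the message shape, and the stderr text are all hypothetical, and the real UnexpectedExitError in packages/core/src/v3/errors.ts has its own constructor. Only the shape of the check comes from the review: mark the attempt REJECTED, then reject with an exit-style error carrying the sentinel code -1 and a descriptive stderr message.

```typescript
// Hypothetical stand-in for UnexpectedExitError (packages/core/src/v3/errors.ts).
class SketchUnexpectedExitError extends Error {
  constructor(
    public readonly code: number,
    public readonly signal: string | null,
    public readonly stderr: string
  ) {
    super(`Unexpected exit with code ${code}`);
    this.name = "SketchUnexpectedExitError";
  }
}

// Minimal view of a Node.js ChildProcess. `connected` becomes false once
// the IPC channel to the child is closed.
type ChildLike = {
  connected: boolean;
  send: (message: unknown) => void;
};

type AttemptStatus = "PENDING" | "RESOLVED" | "REJECTED";

type Attempt = {
  status: AttemptStatus;
  promise: Promise<unknown>;
  // Kept on the attempt so a message or exit handler could settle it later.
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
};

// Hypothetical helper: starts an attempt and fails fast when the child
// cannot receive IPC messages, instead of leaving the promise pending.
function executeAttempt(child: ChildLike | undefined, payload: unknown): Attempt {
  let resolve!: (value: unknown) => void;
  let reject!: (reason: Error) => void;
  const promise = new Promise<unknown>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  const attempt: Attempt = { status: "PENDING", promise, resolve, reject };

  if (!child || !child.connected) {
    // Set the status before rejecting, mirroring the order noted in the review.
    attempt.status = "REJECTED";
    attempt.reject(
      new SketchUnexpectedExitError(
        -1, // sentinel: no real exit code was observed
        null,
        "Child process is not connected, cannot execute task run"
      )
    );
    return attempt;
  }

  // Hypothetical message shape; the real IPC protocol is not shown in this PR page.
  child.send({ type: "EXECUTE_TASK_RUN", payload });
  return attempt;
}
```

With a guard like this, a caller awaiting `attempt.promise` sees a rejection right away for a dead child and can move on, rather than waiting on a message that will never arrive.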

packages/cli-v3/src/executions/taskRunProcess.test.ts (2)

64-119: LGTM! Well-designed test for the edge case.

The Promise.race approach effectively detects the hang scenario. The internal state manipulation via (proc as any) is a pragmatic choice here—spawning a real child and forcing it into this specific "dead but not cleaned up" state would be fragile and flaky.
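
As an illustration of the Promise.race technique mentioned here, a hang detector can be sketched as follows. This is not the helper used in taskRunProcess.test.ts; `settleOrTimeout` and the `Outcome` type are hypothetical names. The idea is to race the promise under test against a timer, so that a promise that never settles produces an explicit "timeout" outcome instead of stalling the test run.

```typescript
type Outcome<T> =
  | { outcome: "resolved"; value: T }
  | { outcome: "rejected"; error: unknown }
  | { outcome: "timeout" };

// Hypothetical helper: reports how `promise` settled, or "timeout" if it
// has not settled within `ms` milliseconds.
async function settleOrTimeout<T>(promise: Promise<T>, ms: number): Promise<Outcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(() => resolve({ outcome: "timeout" }), ms);
  });

  // Convert both fulfilment and rejection into values so the race itself never throws.
  const settled: Promise<Outcome<T>> = promise.then(
    (value) => ({ outcome: "resolved" as const, value }),
    (error: unknown) => ({ outcome: "rejected" as const, error })
  );

  try {
    return await Promise.race([settled, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after the race.
    clearTimeout(timer);
  }
}
```

A test for the dead-child case could then assert that the outcome is "rejected" rather than "timeout", which separates a prompt rejection from a hang without relying on the test runner's global timeout.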

12-60: Test fixtures are appropriately minimal.

The use of type casts (as unknown as WorkerManifest, etc.) keeps fixtures focused on what the test actually needs without maintaining full type compliance for unrelated fields.

vibe-kanban-cloud · 2026-01-30T12:46:26Z

Review Complete

Your review story is ready!

View Story

Comment !reviewfast on this PR to re-generate the story.

Provide feedback
