29 changes: 15 additions & 14 deletions plugins/temporal-developer/skills/temporal-developer/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
---
name: temporal-developer
description: Develop, debug, and manage Temporal applications across Python, TypeScript, Go, and Java. Use when the user is building workflows, activities, or workers with a Temporal SDK, debugging issues like non-determinism errors, stuck workflows, or activity retries, using Temporal CLI, Temporal Server, or Temporal Cloud, or working with durable execution concepts like signals, queries, heartbeats, versioning, continue-as-new, child workflows, or saga patterns.
version: 0.2.0
description: Develop, debug, and manage Temporal applications across Python, TypeScript, Go, Java and .NET. Use when the user is building workflows, activities, or workers with a Temporal SDK, debugging issues like non-determinism errors, stuck workflows, or activity retries, using Temporal CLI, Temporal Server, or Temporal Cloud, or working with durable execution concepts like signals, queries, heartbeats, versioning, continue-as-new, child workflows, or saga patterns.
version: 0.3.2
---

# Skill: temporal-developer

## Overview

Temporal is a durable execution platform that makes workflows survive failures automatically. This skill provides guidance for building Temporal applications in Python, TypeScript, Go, and Java.
Temporal is a durable execution platform that makes workflows survive failures automatically. This skill provides guidance for building Temporal applications in Python, TypeScript, Go, Java and .NET.

## Core Architecture

@@ -77,34 +77,35 @@ Once you've downloaded the file, extract the downloaded archive and add the temp
### Read All Relevant References

1. First, read the getting started guide for the language you are working in:
- Python -> read `references/python/python.md`
- TypeScript -> read `references/typescript/typescript.md`
- Java -> read `references/java/java.md`
- Go -> read `references/go/go.md`
- Python -> read `references/python/python.md`
- TypeScript -> read `references/typescript/typescript.md`
- Go -> read `references/go/go.md`
- Java -> read `references/java/java.md`
- .NET (C#) -> read `references/dotnet/dotnet.md`
2. Second, read appropriate `core` and language-specific references for the task at hand.


## Primary References

- **`references/core/determinism.md`** - Why determinism matters, replay mechanics, basic concepts of activities
+ Language-specific info at `references/{your_language}/determinism.md`
- Language-specific info at `references/{your_language}/determinism.md`
- **`references/core/patterns.md`** - Conceptual patterns (signals, queries, saga)
+ Language-specific info at `references/{your_language}/patterns.md`
- Language-specific info at `references/{your_language}/patterns.md`
- **`references/core/gotchas.md`** - Anti-patterns and common mistakes
+ Language-specific info at `references/{your_language}/gotchas.md`
- Language-specific info at `references/{your_language}/gotchas.md`
- **`references/core/versioning.md`** - Versioning strategies and concepts - how to safely change workflow code while workflows are running
+ Language-specific info at `references/{your_language}/versioning.md`
- Language-specific info at `references/{your_language}/versioning.md`
- **`references/core/troubleshooting.md`** - Decision trees, recovery procedures
- **`references/core/error-reference.md`** - Common error types, workflow status reference
- **`references/core/interactive-workflows.md`** - Testing signals, updates, queries
- **`references/core/dev-management.md`** - Dev cycle & management of server and workers
- **`references/core/ai-patterns.md`** - AI/LLM pattern concepts
+ Language-specific info at `references/{your_language}/ai-patterns.md`, if available. Currently Python only.
- Language-specific info at `references/{your_language}/ai-patterns.md`, if available. Currently Python only.

## Additional Topics

- **`references/{your_language}/observability.md`** - Language-specific implementation guidance on observability in Temporal
- **`references/{your_language}/advanced-features.md`** - Language-specific guidance on advanced Temporal features and on language-specific features


## Feedback

### ALWAYS PROMPT USER TO PROVIDE FEEDBACK AT STARTUP
@@ -32,6 +32,7 @@ The remainder of this document describes general principles to follow when build
- returns model response, as a typed structured output

**Benefits**:

- Single activity handles multiple use cases
- Consistent retry handling
- Centralized configuration
@@ -48,6 +49,7 @@ Workflow:
```

**Benefits**:

- Independent retry for each step
- Clear audit trail in history
- Easier testing and mocking
@@ -69,17 +71,17 @@ Workflow:
Disable retries in LLM client libraries, let Temporal handle retries.

- LLM Client Config:
- max_retries = 0 ← Disable client retries at the LLM client level
- max_retries = 0 ← Disable client retries at the LLM client level

Use either the default activity retry policy, or customize it as needed for the situation.

**Why**:

- Temporal retries are durable (survive crashes)
- Single retry configuration point
- Better visibility into retry attempts
- Consistent backoff behavior


### Pattern 5: Multi-Agent Orchestration

Complex pipelines with multiple specialized agents:
@@ -114,6 +116,7 @@ Deep Research Example:
| Document processing | 60-120 seconds |

**Rationale**:

- Reasoning models need time for complex computation
- Web searches may hit rate limits requiring backoff
- Fast timeouts catch stuck operations
@@ -128,7 +131,6 @@ Parse rate limit info from API responses:
- Response Headers:
- Retry-After: 30
- X-RateLimit-Remaining: 0

- Activity:
- If rate limited:
- Raise retryable error with a next retry delay
@@ -137,12 +139,14 @@ Parse rate limit info from API responses:
## Error Handling

### Retryable Errors

- Rate limits (429)
- Timeouts
- Temporary server errors (500, 502, 503)
- Network errors
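
As a minimal sketch of how a rate-limited response (429) could be turned into a retry delay, the helper below reads the `Retry-After` header from a plain dict of response headers. The name `retry_delay_seconds` and the 1-second fallback are assumptions for illustration and are not part of any Temporal SDK. It handles only the delta-seconds form of the header, not the HTTP-date form.

```python
from typing import Mapping, Optional


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    fallback: float = 1.0,  # assumed default when the server gives no usable hint
) -> Optional[float]:
    """Return how long to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("retry-after", "").strip()
    if raw.isascii() and raw.isdigit():
        return float(raw)
    # Missing header or HTTP-date form: fall back to the default delay.
    return fallback
```

Inside an activity, the returned value would typically be passed along as the next retry delay when raising a retryable error.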

### Non-Retryable Errors

- Invalid API key (401)
- Invalid input/prompt
- Content policy violations
@@ -161,6 +165,6 @@ Parse rate limit info from API responses:
## Observability

See `references/{your_language}/observability.md` for the language you are working in for documentation on implementing observability in Temporal. It is generally recommended to add observability for:

- Token usage, via activity logging
- Anything else that helps track LLM usage and debug agentic flows, in moderation.

@@ -50,22 +50,27 @@ Result: Commands don't match history → NondeterminismError
## Sources of Non-Determinism

### Time-Based Operations

- `datetime.now()`, `time.time()`, `Date.now()`
- Different value on each execution

### Random Values

- `random.random()`, `Math.random()`, `uuid.uuid4()`
- Different value on each execution

### External State

- Reading files, environment variables, databases, networking / HTTP calls
- State may change between executions

### Non-Deterministic Iteration

- Map/dict iteration order (in some languages)
- Set iteration order

### Threading/Concurrency

- Race conditions produce different outcomes
- Non-deterministic ordering
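
The effect of these sources can be sketched with a toy replay check in plain Python. Nothing here uses a Temporal SDK: `run_workflow`, `replay_matches`, the command strings, and the clock arguments are hypothetical stand-ins, used only to show how a value that differs between the first execution and a replay produces a different command sequence.

```python
from typing import Callable, List


def run_workflow(clock: Callable[[], int]) -> List[str]:
    """Toy workflow: the commands it emits depend on a clock reading."""
    commands = ["StartTimer"]
    # Branching on wall-clock time is the non-deterministic part.
    if clock() % 2 == 0:
        commands.append("ScheduleActivity:even_path")
    else:
        commands.append("ScheduleActivity:odd_path")
    return commands


def replay_matches(history: List[str], clock: Callable[[], int]) -> bool:
    """Re-run the workflow and compare its commands with the recorded ones."""
    return run_workflow(clock) == history


# First execution: the clock happens to read an even value.
history = run_workflow(lambda: 1_700_000_000)

# Replay with a fresh clock reading: the commands no longer match history.
drifted = replay_matches(history, lambda: 1_700_000_001)

# Replay with the originally recorded value: the commands match again.
stable = replay_matches(history, lambda: 1_700_000_000)
```

The last case is the idea behind durable variants of time and random values: the value observed on first execution is recorded and reused on replay.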

@@ -76,18 +81,21 @@ In Temporal, activities are the primary mechanism for making non-deterministic c
For a few simple cases, such as timestamps, random values, and UUIDs, the Temporal SDK in your language may provide durable variants that are simple to use. See `references/{your_language}/determinism.md` for the language you are working in for more info.

## SDK Protection Mechanisms

Each Temporal SDK language provides a different level of protection against non-determinism:

- Python: The Python SDK runs workflows in a sandbox that intercepts and aborts non-deterministic calls early at runtime.
- Python: The Python SDK runs workflows in a sandbox that intercepts and aborts non-deterministic calls early at runtime.
- TypeScript: The TypeScript SDK runs workflows in an isolated V8 sandbox, intercepting many common sources of non-determinism and replacing them automatically with deterministic variants.
- Java: The Java SDK has no sandbox. Determinism is enforced by developer conventions — the SDK provides `Workflow.*` APIs as safe alternatives (e.g., `Workflow.sleep()` instead of `Thread.sleep()`), and non-determinism is only detected at replay time via `NonDeterministicException`. A static analysis tool (`temporal-workflowcheck`, beta) can catch violations at build time. Cooperative threading under a global lock eliminates the need for synchronization.
- Go: The Go SDK has no runtime sandbox. Therefore, non-determinism bugs will never be immediately apparent, and are usually only observable during replay. The optional `workflowcheck` static analysis tool can be used to check for many sources of non-determinism at compile time.
- .NET: The .NET SDK has no sandbox. It uses a custom `TaskScheduler` and a runtime `EventListener` to detect invalid task scheduling. Developers must use `Workflow.*` safe alternatives (e.g., `Workflow.DelayAsync` instead of `Task.Delay`) and avoid non-deterministic .NET Task APIs.

Regardless of which SDK you are using, it is your responsibility to ensure that workflow code does not contain sources of non-determinism. Use SDK-specific tools as well as replay tests for doing so.

## Detecting Non-Determinism

### During Execution

- `NondeterminismError` raised when Commands don't match Events
- Workflow becomes blocked until code is fixed

@@ -98,13 +106,17 @@ Replay tests verify that workflows follow identical code paths when re-run, by a
## Recovery from Non-Determinism

### Accidental Change

If you accidentally introduced non-determinism:

1. Revert code to match what's in history
2. Restart worker
3. Workflow auto-recovers

### Intentional Change

If you need to change workflow logic:

1. Use the **Patching API** to support both old and new code paths
2. Or terminate old workflows and start new ones with updated code

@@ -20,7 +20,6 @@ When you need a new worker, you should start it in the background (and preferrab

**Best practice**: As far as local development goes, run only ONE worker instance with the latest code. Don't keep stale workers (running old code) around.


### Cleanup

**Always kill workers when done.** Don't leave workers running.
@@ -6,14 +6,14 @@
| **Deadlock** | TMPRL1101 | `WorkflowTaskFailed` in history, worker logs | Workflow blocked too long (deadlock detected) | Remove blocking operations from workflow code (no I/O, no sleep, no threading locks). Use Temporal primitives instead. | https://github.com/temporalio/rules/blob/main/rules/TMPRL1101.md |
| **Unfinished handlers** | TMPRL1102 | `WorkflowTaskFailed` in history | Workflow completed while update/signal handlers still running | Ensure all handlers complete before workflow finishes. Use `workflow.wait_condition()` to wait for handler completion. | https://github.com/temporalio/rules/blob/main/rules/TMPRL1102.md |
| **Payload overflow** | TMPRL1103 | `WorkflowTaskFailed` or `ActivityTaskFailed` in history | Payload size limit exceeded (default 2MB) | Reduce payload size. Use external storage (S3, database) for large data and pass references instead. | https://github.com/temporalio/rules/blob/main/rules/TMPRL1103.md |
| **Workflow code bug** | | `WorkflowTaskFailed` in history | Bug in workflow logic | Fix code → Restart worker → Workflow auto-resumes | |
| **Missing workflow** | | Worker logs | Workflow not registered | Add to worker.py → Restart worker | |
| **Missing activity** | | Worker logs | Activity not registered | Add to worker.py → Restart worker | |
| **Activity bug** | | `ActivityTaskFailed` in history | Bug in activity code | Fix code → Restart worker → Auto-retries | |
| **Activity retries** | | `ActivityTaskFailed` (count >2) | Repeated failures | Fix code → Restart worker → Auto-retries | |
| **Sandbox violation** | | Worker logs | Bad imports in workflow | Fix workflow.py imports → Restart worker | |
| **Task queue mismatch** | | Workflow never starts | Different queues in starter/worker | Align task queue names | |
| **Timeout** | | Status = TIMED_OUT | Operation too slow | Increase timeout config | |
| **Workflow code bug** | | `WorkflowTaskFailed` in history | Bug in workflow logic | Fix code → Restart worker → Workflow auto-resumes | |
| **Missing workflow** | | Worker logs | Workflow not registered | Add to worker.py → Restart worker | |
| **Missing activity** | | Worker logs | Activity not registered | Add to worker.py → Restart worker | |
| **Activity bug** | | `ActivityTaskFailed` in history | Bug in activity code | Fix code → Restart worker → Auto-retries | |
| **Activity retries** | | `ActivityTaskFailed` (count >2) | Repeated failures | Fix code → Restart worker → Auto-retries | |
| **Sandbox violation** | | Worker logs | Bad imports in workflow | Fix workflow.py imports → Restart worker | |
| **Task queue mismatch** | | Workflow never starts | Different queues in starter/worker | Align task queue names | |
| **Timeout** | | Status = TIMED_OUT | Operation too slow | Increase timeout config | |

## Workflow Status Reference

@@ -9,6 +9,7 @@ This document provides a general overview of conceptual-level gotchas in Tempora
**The Problem**: Activities may execute more than once due to retries or Worker failures. If an activity calls an external service without an idempotency key, you may charge a customer twice, send duplicate emails, or create duplicate records.

**Symptoms**:

- Duplicate side effects (double charges, duplicate notifications)
- Data inconsistencies after retries
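
A minimal sketch of the idempotency-key idea, using an in-memory stand-in for an external payment service. `FakePaymentService`, `charge_customer`, and the key format are hypothetical; a real service would expose its own idempotency mechanism.

```python
from typing import Dict


class FakePaymentService:
    """In-memory stand-in for an external API that honors idempotency keys."""

    def __init__(self) -> None:
        self.charges: Dict[str, int] = {}  # idempotency key -> amount charged

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A repeated key returns the earlier result instead of charging again.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = amount_cents
        return f"charge-{idempotency_key}"


def charge_customer(service: FakePaymentService, order_id: str, amount_cents: int) -> str:
    """Activity-style function that derives the key from stable business data."""
    # The key must be identical on every retry, so it is built from the
    # order ID rather than from a random value or a timestamp.
    return service.charge(f"order-{order_id}", amount_cents)


service = FakePaymentService()
first = charge_customer(service, "1234", 5000)
retry = charge_customer(service, "1234", 5000)  # simulated activity retry
```

Both calls return the same charge identifier and only one charge is recorded, which is the behavior a retried activity needs.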

@@ -21,18 +22,20 @@ This document provides a general overview of conceptual-level gotchas in Tempora
**The Problem**: Code in workflow functions runs on first execution AND on every replay. Any side effect (logging, notifications, metrics, etc.) will happen multiple times and non-deterministic code (IO, current time, random numbers, threading, etc.) won't replay correctly.

**Symptoms**:

- Non-determinism errors
- Sandbox violations, depending on SDK language
- Duplicate log entries
- Multiple notifications for the same event
- Inflated metrics

**The Fix**:

- Use Temporal replay-aware managed side effects for common, non-business logic cases:
- Temporal workflow logging
- Temporal date time
- Temporal UUID generation
- Temporal random number generation
- Temporal workflow logging
- Temporal date time
- Temporal UUID generation
- Temporal random number generation
- Put all other side effects in Activities

See `references/core/determinism.md` for more info.
@@ -42,10 +45,12 @@ See `references/core/determinism.md` for more info.
**The Problem**: If Worker A runs part of a workflow with code v1, then Worker B (with code v2) picks it up, replay may produce different Commands.

**Symptoms**:

- Non-determinism errors after deploying new code
- Errors mentioning "command mismatch" or "unexpected command"

**The Fix**:

- Use Worker Versioning for production deployments
- Use patching APIs
- During development: kill old workers before starting new ones
@@ -60,6 +65,7 @@ See `references/core/versioning.md` for more info.
**The Problem**: Using overly restrictive activity retry policies that give up too easily.

**Symptoms**:

- Workflows failing on transient errors
- Unnecessary workflow failures during brief outages

@@ -72,6 +78,7 @@ See `references/core/versioning.md` for more info.
**The Problem**: Queries and update validators are read-only. Modifying state causes non-determinism on replay, and must strictly be avoided.

**Symptoms**:

- State inconsistencies after workflow replay
- Non-determinism errors

@@ -82,6 +89,7 @@ See `references/core/versioning.md` for more info.
**The Problem**: Queries and update validators must return immediately. They cannot await activities, child workflows, timers, or conditions.

**Symptoms**:

- Query / update validator timeouts
- Deadlocks

@@ -110,6 +118,7 @@ See language-specific gotchas for details.
**The Problem**: Not testing what happens when things go wrong.

**Questions to answer**:

- What happens when an Activity exhausts all retries?
- What happens when a workflow is cancelled mid-execution?
- What happens during a Worker restart?
@@ -121,6 +130,7 @@ See language-specific gotchas for details.
**The Problem**: Changing workflow code without verifying existing workflows can still replay.

**Symptoms**:

- Non-determinism errors after deployment
- Stuck workflows that can't make progress

@@ -133,6 +143,7 @@ See language-specific gotchas for details.
**The Problem**: Catching errors without proper handling hides failures.

**Symptoms**:

- Silent failures
- Workflows completing "successfully" despite errors
- Difficult debugging
@@ -144,10 +155,12 @@ See language-specific gotchas for details.
**The Problem**: Marking transient errors as non-retryable, or permanent errors as retryable.

**Symptoms**:

- Workflows failing on temporary network issues (if marked non-retryable)
- Infinite retries on invalid input (if marked retryable)

**The Fix**:

- **Retryable**: Network errors, timeouts, rate limits, temporary unavailability
- **Non-retryable**: Invalid input, authentication failures, business rule violations, resource not found
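
The split above could be encoded as a small classifier over HTTP status codes. This is a sketch under assumptions: the status-code sets are illustrative mappings of the categories listed, `is_retryable` is a hypothetical helper, and the handling of unlisted codes is a design choice rather than a rule.

```python
# Assumed mapping: timeouts, rate limits, temporary unavailability.
RETRYABLE_STATUSES = frozenset({408, 429, 500, 502, 503, 504})
# Assumed mapping: invalid input, authentication failures, resource not found.
NON_RETRYABLE_STATUSES = frozenset({400, 401, 403, 404, 422})


def is_retryable(status: int) -> bool:
    """Decide whether a failed HTTP call is worth retrying."""
    if status in RETRYABLE_STATUSES:
        return True
    if status in NON_RETRYABLE_STATUSES:
        return False
    # Unlisted codes: treat server-side (5xx) failures as transient,
    # and everything else as permanent.
    return 500 <= status < 600
```

An activity could use the result to choose between raising an ordinary error, which Temporal retries, and an error explicitly marked as non-retryable.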

@@ -158,6 +171,7 @@ See language-specific gotchas for details.
**The Problem**: When a workflow is cancelled, cleanup code after the cancellation point doesn't run unless explicitly protected.

**Symptoms**:

- Resources not released after cancellation
- Incomplete compensation/rollback
- Leaked state
@@ -169,10 +183,12 @@ See language-specific gotchas for details.
**The Problem**: Activities must opt in to receive cancellation. Without proper handling, a cancelled activity continues running to completion, wasting resources.

**Requirements for activity cancellation**:

1. **Heartbeating** - Cancellation is delivered via heartbeat. Activities that don't heartbeat won't know they've been cancelled.
2. **Checking for cancellation** - Activity must explicitly check for cancellation or await a cancellation signal.

**Symptoms**:

- Cancelled activities running to completion
- Wasted compute on work that will be discarded
- Delayed workflow cancellation
@@ -184,11 +200,13 @@ See language-specific gotchas for details.
**The Problem**: Temporal has built-in limits on payload sizes. Exceeding them causes workflows to fail.

**Limits**:

- Max 2MB per individual payload
- Max 4MB per gRPC message
- Max 50MB for complete workflow history (aim for <10MB in practice)
- Max 50MB for complete workflow history (aim for < 10MB in practice)
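
A pre-flight size check can catch oversized inputs before they reach Temporal. The sketch below measures the JSON-encoded size of a value against the 2MB per-payload limit listed above. `payload_size_bytes`, `fits_in_payload`, and the use of `json.dumps` as the encoder are assumptions; the actual serialized size depends on the data converter in use, and 2MB is read here as 2 * 1024 * 1024 bytes.

```python
import json
from typing import Any

MAX_PAYLOAD_BYTES = 2 * 1024 * 1024  # assumed reading of the 2MB limit


def payload_size_bytes(value: Any) -> int:
    """Approximate the serialized size using UTF-8 encoded JSON."""
    return len(json.dumps(value).encode("utf-8"))


def fits_in_payload(value: Any, limit: int = MAX_PAYLOAD_BYTES) -> bool:
    """Return True if the value's approximate size is within the limit."""
    return payload_size_bytes(value) <= limit


# Passing a reference to externally stored data stays small.
small = {"document_id": "doc-42"}
# Inlining roughly 3MB of content exceeds the per-payload limit.
large = {"document": "x" * (3 * 1024 * 1024)}
```

Here `fits_in_payload(small)` holds while `fits_in_payload(large)` does not, which is the case where storing the data externally and passing a reference is the better option.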

**Symptoms**:

- Payload too large errors
- gRPC message size exceeded errors
- Workflow history growing unboundedly