Since: 2.1.0
Source of truth: src/geny_executor/core/errors.py (ExecutorErrorCode enum)
Every exception raised by geny-executor carries a stable string identifier
in the form `exec.<component>.<reason>`. Hosts use this code for:
- Logging / Sentry grouping: drop the free-form `str(exception)` from your dashboards and group on the code instead.
- i18n: map each code to a localized message template in your UI layer (see Geny's example).
- Telemetry routing: alert differently for `exec.api.*` (vendor outages) vs `exec.cli.*` (host config bugs).
- Retry / fallback decisions: recoverability is also exposed via `ErrorCategory.is_recoverable`, but `code` lets you fine-tune.
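As a rough illustration of the telemetry-routing and i18n uses above, a host might branch on the code prefix and look up a message template. This is only a sketch: the channel names, the `MESSAGES` catalog, and both helper functions are hypothetical and not part of geny-executor. Only the code strings come from this reference.

```python
# Hypothetical host-side helpers; nothing here is provided by geny-executor.

# Assumed message catalog: locale -> code -> template.
MESSAGES = {
    "en": {
        "exec.api.rate_limited": "The model provider is rate limiting requests. Please retry shortly.",
        "exec.cli.auth_failed": "The CLI is not logged in. Run its login command and try again.",
    },
    "de": {
        "exec.cli.auth_failed": "Die CLI ist nicht angemeldet. Bitte erneut anmelden.",
    },
}


def alert_channel(code: str) -> str:
    """Route by prefix: vendor outages and host config bugs go to different channels."""
    if code.startswith("exec.api."):
        return "vendor-outage"
    if code.startswith("exec.cli."):
        return "host-config"
    return "general"


def localized_message(code: str, locale: str, default: str) -> str:
    """Look up a template for the code, falling back to English, then to `default`."""
    for lang in (locale, "en"):
        template = MESSAGES.get(lang, {}).get(code)
        if template is not None:
            return template
    return default
```

Keying both lookups on the code rather than on the exception message keeps dashboards and translations stable when the free-form text changes.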

Stability guarantees:

- Once published in a release, a code's string value never changes.
- Renaming or repurposing a code is a breaking change: deprecate the old code, add a new one.
- Adding new codes is non-breaking and ships in minor versions.
- The `tests/error_codes/test_code_stability.py` regression test locks the string values, so an accidental rename fails CI before release.
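The idea behind such a regression test can be sketched as a comparison against a frozen snapshot. The enum below is a three-member stand-in with assumed member names, and `check_code_stability` is a hypothetical helper; the real test in `tests/error_codes/test_code_stability.py` may be structured differently.

```python
from enum import Enum


class ExecutorErrorCode(str, Enum):
    """Stand-in for the real enum in core/errors.py (member names are assumed)."""
    EXEC_API_TIMEOUT = "exec.api.timeout"
    EXEC_CLI_AUTH_FAILED = "exec.cli.auth_failed"
    EXEC_UNKNOWN = "exec.unknown"


# Frozen snapshot of published values. Entries are never edited or removed.
PUBLISHED = {
    "EXEC_API_TIMEOUT": "exec.api.timeout",
    "EXEC_CLI_AUTH_FAILED": "exec.cli.auth_failed",
    "EXEC_UNKNOWN": "exec.unknown",
}


def check_code_stability(enum_cls, published):
    """Return a list of violations; an empty list means every published code is intact."""
    current = {member.name: member.value for member in enum_cls}
    problems = []
    for name, value in published.items():
        if name not in current:
            problems.append(f"{name} was removed")
        elif current[name] != value:
            problems.append(f"{name} changed: {value!r} -> {current[name]!r}")
    # Members missing from the snapshot are new codes, which are non-breaking.
    return problems
```

In this sketch a new enum member passes without any snapshot change, while a renamed or removed value produces a violation.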

Every `GenyExecutorError` subclass exposes the code as the `code`
attribute. The pipeline's structured events (`stage.error`,
`pipeline.error`, `api.retry`) also carry it:

    {
      "type": "pipeline.error",
      "data": {
        "error": "Claude Code CLI is not authenticated …",
        "code": "exec.cli.auth_failed",
        "exception_type": "geny_executor.core.errors.APIError"
      }
    }

These come from the SDK-driven providers (Anthropic, OpenAI, Google,
vLLM). The companion `ErrorCategory` on the `APIError` decides retry
behavior; the code is the stable identifier consumers branch on.

| Code | Recoverable? | Source | Description |
|---|---|---|---|
| `exec.api.auth.invalid_key` | ❌ no | `APIError(category=AUTH)` | API key missing / malformed / rejected by vendor. Action: paste a valid key in the host's LLM Backends settings. |
| `exec.api.auth.expired` | ❌ no | `APIError(category=AUTH)` (future use) | Vendor reports the credential is past its TTL. Action: re-issue / refresh. |
| `exec.api.rate_limited` | ✅ yes | `APIError(category=RATE_LIMITED)` | Vendor 429. The retry strategy backs off and retries automatically. Errors that persist past the retry budget surface as `EXEC_API_RETRY_EXHAUSTED`. |
| `exec.api.timeout` | ✅ yes | `APIError(category=TIMEOUT)` | Request exceeded the per-call timeout. Retry with backoff. |
| `exec.api.network` | ✅ yes | `APIError(category=NETWORK)` | Connection reset / DNS / TLS / transport. Retry with backoff. |
| `exec.api.token_limit` | ❌ no | `APIError(category=TOKEN_LIMIT)` | Prompt + `max_tokens` exceeded the model's context window. Action: shrink context or pick a larger-window model. |
| `exec.api.bad_request` | ❌ no | `APIError(category=BAD_REQUEST)` | Vendor 4xx other than auth/rate-limit. Usually a schema bug in the host's request shape. |
| `exec.api.server_error` | ✅ yes | `APIError(category=SERVER_ERROR)` | Vendor 5xx. Retried by the executor. |
| `exec.api.terminal` | ❌ no | `APIError(category=TERMINAL)` | Vendor declared the request fatally unprocessable (e.g. policy block). Don't retry. |
| `exec.api.unknown` | ❌ no | `APIError(category=UNKNOWN)` | Catch-all for vendor errors the executor couldn't classify. Investigate the underlying cause. |
| `exec.api.no_client` | ❌ no | Stage 6 build error | `state.llm_client` is `None`. Host forgot to call `Pipeline.from_manifest(credentials=…)` or `attach_runtime(llm_client=…)`. |
| `exec.api.stream_incomplete` | ❌ no | Stage 6 streaming | The stream ended without a `message_complete` event. Usually a vendor SDK bug or an interrupted upstream connection. |
| `exec.api.retry_exhausted` | ❌ no | Stage 6 retry loop | Hit `max_retries` after a recoverable error category. Look at the chained cause for the original failure. |

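To show how a host could fine-tune retry and fallback decisions from the table above, here is a minimal sketch that reads the `code` field of a structured event. The event shape follows the `pipeline.error` example in this document; the `next_action` function, its return labels, and the policy it encodes are hypothetical.

```python
# Codes marked "yes" in the Recoverable? column of the API table.
RECOVERABLE_API_CODES = frozenset({
    "exec.api.rate_limited",
    "exec.api.timeout",
    "exec.api.network",
    "exec.api.server_error",
})


def next_action(event: dict) -> str:
    """Map an error event to a host-level action (the labels are made up for this sketch)."""
    code = event.get("data", {}).get("code", "exec.unknown")
    if code in RECOVERABLE_API_CODES:
        return "retry"
    if code == "exec.api.retry_exhausted":
        # Assumed policy: the executor already retried, so try another backend.
        return "fallback_provider"
    if code.startswith("exec.api.auth."):
        return "fix_credentials"
    return "surface_to_user"


event = {
    "type": "pipeline.error",
    "data": {"error": "request timed out", "code": "exec.api.timeout"},
}
action = next_action(event)
```

Whether a host should retry at all depends on what the executor has already done; this sketch treats `exec.api.retry_exhausted` as the signal that automatic retries have run out.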

| Code | Recoverable? | Source | Description |
|---|---|---|---|
| `exec.cli.binary_not_found` | ❌ no | `APIError(category=CLI_NOT_FOUND)` | The CLI binary (e.g. `claude`) is not on `PATH` and `binary_path` was not set. Action: install the CLI or configure the binary path. |
| `exec.cli.auth_failed` | ❌ no | `APIError(category=CLI_AUTH_FAILED)` | The spawned CLI reported `authentication_failed`. Action: re-run the CLI's login command (e.g. `claude auth login`) or paste a valid `ANTHROPIC_API_KEY`. |
| `exec.cli.timeout` | ✅ yes | `APIError(category=CLI_TIMEOUT)` | The CLI did not return within the configured `timeout_s`. Retry. |
| `exec.cli.protocol_error` | ✅ yes | `APIError(category=CLI_PROTOCOL_ERROR)` | The CLI emitted malformed `stream-json` output or an unrecognised envelope. Retry; report if it persists. |
| `exec.cli.permission_denied` | ❌ no | `APIError(category=CLI_PERMISSION_DENIED)` | The CLI's permission system blocked the call (e.g. `--dangerously-skip-permissions` was attempted as root). Action: configure `permissions.allow` in the spawned settings. |
| `exec.cli.exited` | ✅ yes | CLI subprocess non-zero exit | The CLI process exited with a non-zero return code outside the categorised cases above. Inspect the chained cause. |


| Code | Source | Description |
|---|---|---|
| `exec.pipeline.not_initialized` | `PipelineError` | The pipeline was used before `build()` / `from_manifest()` was called. |
| `exec.pipeline.invalid_manifest` | `PipelineError` (future use) | The manifest's schema/strict load rejected the configuration. |
| `exec.stage.failed` | `StageError` (default) | A stage raised an exception that was wrapped by the pipeline's stage runner. Inspect the chained cause for the original failure. |
| `exec.stage.guard_rejected` | `GuardRejectError` | A Stage 4 guard refused execution (budget / cost / iteration / permission). The `guard_name` field on the exception identifies which guard. |

These mirror the existing `ToolErrorCode` enum at the routing layer.
Host pipelines see them surface via `ToolError.code` on the
`tool_result` payload too.

| Code | Source | Description |
|---|---|---|
| `exec.tool.unknown` | `RegistryRouter.unknown_tool()` | The LLM emitted a `tool_use` for a name that isn't registered. Usually a hallucination or a stale registry. |
| `exec.tool.invalid_input` | `RegistryRouter.invalid_input()` | The tool's input schema validation failed. `details.field_path` says where. |
| `exec.tool.access_denied` | `RegistryRouter.access_denied()` | The session's tool binding disallows this tool. |
| `exec.tool.crashed` | `RegistryRouter.tool_crashed()` | The tool's `execute()` raised an unexpected exception. `details.exception_type` carries the class. |
| `exec.tool.transport` | `RegistryRouter.transport()` | MCP adapter / RPC transport failure. `details.server` identifies the server. |


| Code | Source | Description |
|---|---|---|
| `exec.mutation.invalid` | `MutationError` | Bad stage / slot / impl in the mutation request. |
| `exec.mutation.locked` | `MutationLocked` | The target stage is currently executing; try again after stage exit. |


| Code | Source | Description |
|---|---|---|
| `exec.mcp.connect_failed` | `MCPConnectionError(phase="connect")` | Could not reach the MCP server (transport / process spawn / handshake). |
| `exec.mcp.initialize_failed` | `MCPConnectionError(phase="initialize")` | The MCP server connected but the `initialize` handshake failed. |
| `exec.mcp.list_tools_failed` | `MCPConnectionError(phase="list_tools")` | `tools/list` errored after a successful initialize. |
| `exec.mcp.sdk_missing` | `MCPConnectionError(phase="sdk_missing")` | The MCP SDK is not installed in the host's environment. |


| Code | When | Description |
|---|---|---|
| `exec.unknown` | last resort | The exception is a non-`GenyExecutorError` (e.g. raw `RuntimeError` / `ValueError`) and no code could be inferred. Indicates a raise site that hasn't been migrated to the typed exception hierarchy yet; file an issue. |

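On the host side, the same last-resort rule can be mirrored when logging arbitrary exceptions. The sketch below relies only on the documented `code` attribute; the `code_of` helper is hypothetical, and `FakeExecutorError` is a stand-in for a real `GenyExecutorError` subclass.

```python
def code_of(exc: BaseException) -> str:
    """Return the exception's stable code, or "exec.unknown" if it has none."""
    code = getattr(exc, "code", None)
    if code is None:
        return "exec.unknown"
    # Accept either a plain string or an enum member carrying the string as .value.
    return getattr(code, "value", code)


class FakeExecutorError(Exception):
    """Stand-in for a typed executor exception; only the `code` attribute matters here."""

    def __init__(self, message: str, code: str) -> None:
        super().__init__(message)
        self.code = code


typed = code_of(FakeExecutorError("CLI not logged in", "exec.cli.auth_failed"))
untyped = code_of(ValueError("raw error from an unmigrated raise site"))
```

A steady stream of `exec.unknown` in logs is a hint that some raise site still uses a generic exception.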
To add a new code:

- Add the enum value to `ExecutorErrorCode` in `core/errors.py`. Lowercase, dot-separated, ≤4 segments.
- Add a row to the table above under the right component.
- If the code corresponds to a legacy `ErrorCategory`, extend `_CATEGORY_TO_CODE_DEFAULT` so existing call sites pick it up.
- The stability regression test will auto-pick up the new code; no test change needed for additions.
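The naming rule for new values (lowercase, dot-separated, at most four segments) can be checked mechanically. The validator below is a hypothetical sketch, not part of geny-executor. It assumes the `exec` prefix counts as one of the four segments, as in `exec.api.auth.invalid_key`, and that segments use only lowercase letters, digits, and underscores.

```python
import re

# Assumed segment alphabet, inferred from the codes listed in this document.
_SEGMENT = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_code(value: str) -> bool:
    """Check the shape of a candidate code string before adding it to the enum."""
    parts = value.split(".")
    if parts[0] != "exec":
        return False
    if not 2 <= len(parts) <= 4:
        return False
    return all(_SEGMENT.fullmatch(part) is not None for part in parts)
```

For example, `is_valid_code("exec.api.auth.invalid_key")` and `is_valid_code("exec.unknown")` both hold, while a mixed-case or five-segment value is rejected.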
To retire a code, don't delete it. Instead:
- Mark the enum value with a deprecation comment.
- Add a new code and migrate raise sites incrementally.
- Keep the deprecated value in the table with a "deprecated → new code" note for at least one minor-version cycle.
- Only remove the enum value in a major version bump.

Phase 1 (this release, 2.1.0): covers the critical-path raise sites in
`claude_code.py` and `s06_api/stage.py`; all `APIError(category=…)`
sites automatically inherit the right code via `from_category`.

Phase 2+ (planned): retrofit stage/guard/mutation/MCP raise sites with
explicit codes, and convert generic `RuntimeError`/`ValueError` raises to
typed exceptions where appropriate.