feat(reasoning): add reasoning effort configuration to providers#152

Merged
jrob5756 merged 1 commit into main from feat/reasoning-effort
May 6, 2026

jrob5756 (Collaborator) commented May 5, 2026

Summary

Adds an optional, unified `reasoning.effort` field that lets users dial up
model "thinking" (reasoning) at both the workflow level
(`runtime.default_reasoning_effort`) and the agent level (`reasoning.effort`).
A per-agent value overrides the runtime default.

runtime:
  provider: copilot
  default_reasoning_effort: medium

agents:
  - name: explainer
    model: gpt-5
    prompt: "Explain this algorithm."
    # inherits runtime default 'medium'

  - name: architect
    model: gpt-5
    reasoning:
      effort: high      # override
    prompt: "Design a system architecture."

Allowed values: `low`, `medium`, `high`, `xhigh`.
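
The precedence rule above (per-agent value first, runtime default second) can be sketched as a small resolver. This is a minimal illustration, not the code from `reasoning.py`; the function name `resolve_effort` and its signature are assumptions made for the example.

```python
from typing import Literal, Optional, get_args

# The four allowed values listed in this PR.
ReasoningEffort = Literal["low", "medium", "high", "xhigh"]
ALLOWED_EFFORTS = get_args(ReasoningEffort)


def resolve_effort(
    agent_effort: Optional[str],
    runtime_default: Optional[str],
) -> Optional[str]:
    """Return the effective effort (hypothetical helper).

    The per-agent value wins; otherwise the runtime default applies.
    None means reasoning effort is not configured at all.
    """
    effort = agent_effort if agent_effort is not None else runtime_default
    if effort is not None and effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return effort
```

With the YAML above, `explainer` would resolve to `medium` (inherited) and `architect` to `high` (overridden).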

Per-provider translation

| Effort | Copilot | Claude (`thinking.budget_tokens`) |
| --- | --- | --- |
| low | `reasoning_effort="low"` | 2048 |
| medium | `reasoning_effort="medium"` | 8192 |
| high | `reasoning_effort="high"` | 16384 |
| xhigh | `reasoning_effort="xhigh"` | 32768 |

Copilot

  • Forwards `reasoning_effort` to `CopilotClient.create_session`.
  • Validates against the model's `supportedReasoningEfforts` (from
    `list_models()`) and raises `ValidationError` with a clear message
    when the model does not support the requested effort.
  • Skipped in mock-handler mode.
  • Retry loops fixed to not swallow `ValidationError`.
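
A rough sketch of the two Copilot-side behaviors described above: checking the requested effort against the model's supported list, and a retry loop that lets `ValidationError` propagate instead of retrying it. The `ValidationError` class here is a local stand-in, and `check_effort_supported` and `call_with_retry` are hypothetical names; the real provider reads the supported efforts from `list_models()`, which is replaced by a plain list in this example.

```python
class ValidationError(Exception):
    """Local stand-in for the project's ValidationError (assumption)."""


def check_effort_supported(model, effort, supported_efforts):
    """Raise ValidationError when the model does not list the requested effort."""
    if effort is None:
        return  # nothing requested, nothing to validate
    if effort not in supported_efforts:
        options = ", ".join(supported_efforts) or "none"
        raise ValidationError(
            f"Model {model!r} does not support reasoning effort {effort!r} "
            f"(supported: {options})"
        )


def call_with_retry(fn, attempts=3):
    """Retry transient failures, but never retry a ValidationError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ValidationError:
            raise  # configuration errors are not transient; surface immediately
        except Exception as exc:
            last_error = exc
    raise last_error
```

The explicit `except ValidationError: raise` ahead of the broad handler is what keeps a configuration mistake from being retried and then reported as a generic failure.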

Claude

  • Enables extended thinking via
    `messages.create(thinking={"type":"enabled","budget_tokens":N})`.
  • Auto-coerces `temperature=1.0` (Anthropic API requirement) and bumps
    `max_tokens` to fit `budget+4096` (capped at 64000 for thinking-
    enabled requests).
  • Validates against thinking-capable model prefixes (`claude-3-7-`,
    `claude-opus-4`, `claude-sonnet-4`, `claude-haiku-4`) and raises
    `ValidationError` for non-thinking models.
  • Surfaces thinking content blocks via the existing `agent_reasoning`
    event (provider parity with Copilot's `assistant.reasoning`).
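
The Claude translation above can be sketched as a pure function that builds the request kwargs. The budgets, prefixes, the 4096-token headroom, and the 64000 cap come from this description; the function name `build_thinking_kwargs`, the local `ValidationError`, and the reading that `max_tokens` is only raised when it is too small are assumptions for illustration.

```python
EFFORT_TO_BUDGET = {"low": 2048, "medium": 8192, "high": 16384, "xhigh": 32768}
THINKING_MODEL_PREFIXES = (
    "claude-3-7-",
    "claude-opus-4",
    "claude-sonnet-4",
    "claude-haiku-4",
)
THINKING_HEADROOM = 4096         # room for the visible answer after thinking
THINKING_MAX_TOKENS_CAP = 64000  # cap applied to the bumped value


class ValidationError(Exception):
    """Local stand-in for the project's ValidationError (assumption)."""


def build_thinking_kwargs(model, effort, max_tokens, temperature=None):
    """Build kwargs for a messages.create call (hypothetical helper)."""
    kwargs = {"model": model, "max_tokens": max_tokens}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if effort is None:
        return kwargs  # the thinking key stays absent when effort is unset
    if not model.startswith(THINKING_MODEL_PREFIXES):
        raise ValidationError(f"Model {model!r} does not support extended thinking")
    budget = EFFORT_TO_BUDGET[effort]
    kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget}
    kwargs["temperature"] = 1.0  # coerced, per the Anthropic API requirement noted above
    needed = budget + THINKING_HEADROOM
    if max_tokens < needed:
        kwargs["max_tokens"] = min(needed, THINKING_MAX_TOKENS_CAP)
    return kwargs
```

For example, `high` on a thinking-capable model with `max_tokens=1024` yields a budget of 16384 and `max_tokens` of 20480.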

Provider parity

Both providers:

  • Accept the same `reasoning.effort` field with identical semantics.
  • Validate model support and raise a consistent `ValidationError` when
    unsupported.
  • Emit `agent_reasoning` events from any reasoning content the model
    produces, so the dashboard, JSONL logger, and console subscriber
    render it consistently.
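
As an illustration of the last point, thinking content could be turned into `agent_reasoning` events roughly as follows. The content blocks are modeled as plain dicts so the example stays self-contained; the `emit` callback signature and the payload keys (`agent`, `content`) are assumptions, not the project's actual event shape.

```python
from typing import Any, Callable, Dict, Iterable


def emit_reasoning_events(
    agent_name: str,
    content_blocks: Iterable[Dict[str, Any]],
    emit: Callable[[str, Dict[str, Any]], None],
) -> int:
    """Emit one agent_reasoning event per non-empty thinking block.

    Returns the number of events emitted (hypothetical helper).
    """
    count = 0
    for block in content_blocks:
        # Only thinking blocks carry reasoning; text blocks are the answer.
        if block.get("type") == "thinking" and block.get("thinking"):
            emit("agent_reasoning", {"agent": agent_name, "content": block["thinking"]})
            count += 1
    return count
```

Because subscribers only see the event name and payload, the dashboard, JSONL logger, and console subscriber would not need to know which provider produced the reasoning text.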

Schema

  • `AgentDef.reasoning: ReasoningConfig | None` (new `ReasoningConfig`
    with single `effort` field).
  • `RuntimeConfig.default_reasoning_effort: Literal[...] | None`.
  • Forbidden on `script`, `human_gate`, and sub-workflow agent types
    (parity with how `model` and `retry` are handled).
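
The schema rules above can be approximated with standard-library dataclasses. This is only a sketch: the real `schema.py` may use a different validation mechanism, the `type` field name is assumed, and the literal type strings in `NO_REASONING_TYPES` (in particular `"workflow"` for sub-workflow agents) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_EFFORTS = ("low", "medium", "high", "xhigh")
# Agent types on which `reasoning` is forbidden; the exact strings are assumptions.
NO_REASONING_TYPES = ("script", "human_gate", "workflow")


@dataclass(frozen=True)
class ReasoningConfig:
    effort: str

    def __post_init__(self) -> None:
        if self.effort not in ALLOWED_EFFORTS:
            raise ValueError(
                f"effort must be one of {ALLOWED_EFFORTS}, got {self.effort!r}"
            )


@dataclass
class AgentDef:
    name: str
    type: str = "agent"
    reasoning: Optional[ReasoningConfig] = None

    def __post_init__(self) -> None:
        # Mirror the forbid rule: these agent types never call a model.
        if self.reasoning is not None and self.type in NO_REASONING_TYPES:
            raise ValueError(
                f"'reasoning' is not allowed on agents of type {self.type!r}"
            )
```

Rejecting the field at load time, rather than ignoring it, makes a misplaced `reasoning` block on a `script` agent fail validation instead of silently doing nothing.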

Files changed

  • `src/conductor/config/schema.py` — new `ReasoningConfig`,
    `AgentDef.reasoning`, `RuntimeConfig.default_reasoning_effort`,
    forbid rules.
  • `src/conductor/providers/reasoning.py` (new) — shared effort→budget
    mapping, Claude thinking-model prefix allow-list, agent/runtime
    resolver.
  • `src/conductor/providers/copilot.py` — forward `reasoning_effort`,
    validate against `supportedReasoningEfforts`, retry loop fix.
  • `src/conductor/providers/claude.py` — forward `thinking` kwarg at
    every `messages.create` site, auto-coerce temperature/max_tokens,
    emit `agent_reasoning` from thinking blocks.
  • `src/conductor/providers/factory.py` — typed `ReasoningEffort`
    forwarding to both providers.
  • `examples/reasoning-effort.yaml` (new) — demonstrates inheritance
    and per-agent override.
  • `AGENTS.md` — key pattern + provider parity bullet.

Tests

45 new tests:

  • 21 schema cases (parametrized): accept/reject effort values, runtime
    default, per-type forbid rules.
  • 7 Copilot cases: `reasoning_effort` forwarded; runtime default and
    per-agent precedence; key absent when unset; `ValidationError` on
    unsupported effort; mock-mode skip.
  • 13 Claude cases: `thinking` kwarg shape; effort→budget mapping;
    `temperature` coerced to 1.0; `max_tokens` bumped; `ValidationError`
    on non-thinking model; runtime/per-agent precedence; thinking blocks
    emit `agent_reasoning`; key absent when unset.
  • 4 e2e + factory wiring cases.

Verification

  • `make lint` ✅
  • `make typecheck` ✅ (only a pre-existing unrelated warning in
    `dialog_evaluator.py` remains)
  • `make validate-examples` ✅ (all 14 examples)
  • `make test` ✅ — 2330 passed, 9 skipped

Out of scope

  • Anthropic `display: summarized | omitted` flag.
  • Anthropic `adaptive` thinking type.
  • Mid-workflow dynamic effort adjustment.

🤖 Generated with Copilot CLI

Adds an optional unified reasoning.effort field at both the runtime
default level (runtime.default_reasoning_effort) and per-agent level
(reasoning.effort), with values low | medium | high | xhigh. Per-agent
overrides the runtime default.

Each provider translates the unified value to its native API:

- Copilot: passes reasoning_effort to CopilotClient.create_session.
  Validates against the model's supportedReasoningEfforts (from
  list_models()) and raises ValidationError with a clear message when
  the model does not support the requested effort. Skipped in mock-
  handler mode.

- Claude: enables extended thinking via messages.create(thinking={...}).
  Effort to budget mapping: low=2048, medium=8192, high=16384,
  xhigh=32768 tokens. Auto-coerces temperature to 1.0 and bumps
  max_tokens to fit budget+4096 (capped at 64000 for thinking-enabled
  requests). Validates against thinking-capable model prefixes
  (claude-3-7-, claude-opus-4, claude-sonnet-4, claude-haiku-4).
  Surfaces thinking content blocks via the existing agent_reasoning
  event callback (provider parity with Copilot's assistant.reasoning).

Provider parity is preserved: both providers accept the same field with
identical semantics, raise ValidationError consistently when the model
does not support reasoning, and emit agent_reasoning events.

New shared helper module src/conductor/providers/reasoning.py
centralizes the effort-to-budget mapping, the Claude model prefix
allow-list, and the agent/runtime resolution logic.

Schema validators forbid reasoning on script, human_gate, and
sub-workflow agent types (parity with how 'model' and 'retry' are
handled).

Includes:
- Schema tests (21 cases, parametrized)
- Copilot provider tests (7 cases)
- Claude provider tests (13 cases)
- E2E workflow + factory wiring tests (4 cases)
- examples/reasoning-effort.yaml demonstrating runtime default and
  per-agent override
- AGENTS.md updated with reasoning effort key pattern and provider
  parity bullet

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756 force-pushed the feat/reasoning-effort branch from e5ee905 to 867cb6e on May 6, 2026 00:17
jrob5756 merged commit 6de42e4 into main on May 6, 2026
7 checks passed
jrob5756 deleted the feat/reasoning-effort branch on May 6, 2026 00:54
jrob5756 mentioned this pull request on May 6, 2026
jrob5756 added a commit that referenced this pull request on May 6, 2026
* chore: release 0.1.12

Bumps version to 0.1.12 and updates CHANGELOG with the four PRs
merged since v0.1.11:

- #149: Windows install diagnostics
- #151: Tag-based registry versioning with # ref syntax
- #152: Unified reasoning.effort configuration
- #153: Dashboard layout fix for human_gate options + loop-backs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: add #155 (Windows update reliability) to 0.1.12 changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>