feat(reasoning): add reasoning effort configuration to providers by jrob5756 · Pull Request #152 · microsoft/conductor

jrob5756 · 2026-05-05T23:17:40Z

Summary

Adds an optional unified reasoning.effort field that lets users dial up
model "thinking" / reasoning at both the workflow level
(runtime.default_reasoning_effort) and per-agent (reasoning.effort).
Per-agent value overrides the runtime default.

runtime:
  provider: copilot
  default_reasoning_effort: medium

agents:
  - name: explainer
    model: gpt-5
    prompt: "Explain this algorithm."
    # inherits runtime default 'medium'

  - name: architect
    model: gpt-5
    reasoning:
      effort: high      # override
    prompt: "Design a system architecture."

Allowed values: `low`, `medium`, `high`, `xhigh`.
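The precedence rule above can be sketched in a few lines of Python. This is an illustration only, not the code in `src/conductor/providers/reasoning.py`: the function name `resolve_effort` and the use of `ValueError` are assumptions made for the example.

```python
from typing import Literal, Optional, get_args

# The four allowed values listed above. The alias name echoes the typed
# `ReasoningEffort` mentioned for factory.py; its real definition is assumed.
ReasoningEffort = Literal["low", "medium", "high", "xhigh"]


def resolve_effort(
    agent_effort: Optional[str],
    runtime_default: Optional[str],
) -> Optional[str]:
    """Hypothetical resolver: the per-agent value wins, else the runtime default."""
    effort = agent_effort if agent_effort is not None else runtime_default
    if effort is not None and effort not in get_args(ReasoningEffort):
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return effort


# Mirrors the YAML above: 'explainer' inherits, 'architect' overrides.
explainer = resolve_effort(None, "medium")
architect = resolve_effort("high", "medium")
```

When neither level sets a value the sketch returns `None`, which matches the "key absent when unset" behaviour described in the test list.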

Per-provider translation

| Effort | Copilot | Claude (`thinking.budget_tokens`) |
| --- | --- | --- |
| low | `reasoning_effort="low"` | 2048 |
| medium | `reasoning_effort="medium"` | 8192 |
| high | `reasoning_effort="high"` | 16384 |
| xhigh | `reasoning_effort="xhigh"` | 32768 |
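As a rough sketch, the table can be expressed as one lookup plus two small translators. The budget numbers come from the table; the names `EFFORT_TO_BUDGET`, `copilot_session_kwargs`, and `claude_thinking_param` are hypothetical and not taken from the PR.

```python
from typing import Any, Dict, Optional

# Budgets copied from the translation table above.
EFFORT_TO_BUDGET: Dict[str, int] = {
    "low": 2048,
    "medium": 8192,
    "high": 16384,
    "xhigh": 32768,
}


def copilot_session_kwargs(effort: Optional[str]) -> Dict[str, Any]:
    """Copilot takes the effort string as-is; the key is omitted when unset."""
    return {} if effort is None else {"reasoning_effort": effort}


def claude_thinking_param(effort: Optional[str]) -> Optional[Dict[str, Any]]:
    """Claude takes a token budget; None means thinking stays disabled."""
    if effort is None:
        return None
    return {"type": "enabled", "budget_tokens": EFFORT_TO_BUDGET[effort]}
```

Keeping the mapping in a single dictionary means a budget change touches one place, which is presumably why the PR puts it in a shared module.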

Copilot

- Forwards `reasoning_effort` to `CopilotClient.create_session`.
- Validates against the model's `supportedReasoningEfforts` (from `list_models()`) and raises `ValidationError` with a clear message when the model does not support the requested effort.
- Validation is skipped in mock-handler mode.
- Retry loops fixed so they no longer swallow `ValidationError`.
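The validation and retry behaviour described above might look roughly like the following. Everything here is a stand-in: `ValidationError` is a local class rather than conductor's own exception, the supported-effort list is passed in as a plain list instead of being read from `list_models()`, and `call_with_retry` is a hypothetical helper.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class ValidationError(Exception):
    """Local stand-in for conductor's ValidationError."""


def validate_copilot_effort(
    model: str, effort: Optional[str], supported: Sequence[str]
) -> None:
    """Raise if the model does not advertise the requested effort."""
    if effort is not None and effort not in supported:
        raise ValidationError(
            f"model {model!r} does not support reasoning effort {effort!r}; "
            f"supported: {', '.join(supported) or 'none'}"
        )


def call_with_retry(fn: Callable[[], T], attempts: int = 3) -> T:
    """Retry transient failures, but let ValidationError propagate immediately."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fn()
        except ValidationError:
            raise  # a config error will not fix itself on retry
        except Exception as exc:  # anything else is treated as transient here
            last_error = exc
    assert last_error is not None
    raise last_error
```

The explicit `except ValidationError: raise` ahead of the broad handler is the point of the sketch: without it, a configuration mistake would be retried and then reported as a generic failure.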

Claude

- Enables extended thinking via `messages.create(thinking={"type":"enabled","budget_tokens":N})`.
- Auto-coerces `temperature=1.0` (Anthropic API requirement) and bumps `max_tokens` to fit `budget+4096` (capped at 64000 for thinking-enabled requests).
- Validates against thinking-capable model prefixes (`claude-3-7-`, `claude-opus-4`, `claude-sonnet-4`, `claude-haiku-4`) and raises `ValidationError` for non-thinking models.
- Surfaces thinking content blocks via the existing `agent_reasoning` event (provider parity with Copilot's `assistant.reasoning`).
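A minimal sketch of how the request kwargs could be adjusted, under stated assumptions: the function name `build_claude_kwargs` is hypothetical, `ValidationError` is a local stand-in, and the order of operations (raise `max_tokens` to at least `budget + 4096`, then cap at 64000) is one reading of the description rather than the confirmed implementation.

```python
from typing import Any, Dict, Optional

# Values taken from the PR description.
THINKING_MODEL_PREFIXES = (
    "claude-3-7-", "claude-opus-4", "claude-sonnet-4", "claude-haiku-4",
)
EFFORT_TO_BUDGET = {"low": 2048, "medium": 8192, "high": 16384, "xhigh": 32768}
OUTPUT_HEADROOM = 4096       # room for the visible answer on top of the budget
THINKING_MAX_TOKENS = 64000  # cap for thinking-enabled requests


class ValidationError(Exception):
    """Local stand-in for conductor's ValidationError."""


def build_claude_kwargs(
    model: str, effort: Optional[str], temperature: float, max_tokens: int
) -> Dict[str, Any]:
    """Return messages.create kwargs, adjusted only when thinking is requested."""
    kwargs: Dict[str, Any] = {
        "model": model, "temperature": temperature, "max_tokens": max_tokens,
    }
    if effort is None:
        return kwargs  # no `thinking` key when unset
    if not model.startswith(THINKING_MODEL_PREFIXES):
        raise ValidationError(f"model {model!r} does not support extended thinking")
    budget = EFFORT_TO_BUDGET[effort]
    kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget}
    kwargs["temperature"] = 1.0  # coerced, per the Anthropic requirement noted above
    kwargs["max_tokens"] = min(
        max(max_tokens, budget + OUTPUT_HEADROOM), THINKING_MAX_TOKENS
    )
    return kwargs
```

For example, `high` with a configured `max_tokens` of 1024 yields `max_tokens = 16384 + 4096 = 20480`, while a configured value above 64000 is brought down to the cap.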

Provider parity

Both providers:

- Accept the same `reasoning.effort` field with identical semantics.
- Validate model support and raise a consistent `ValidationError` when unsupported.
- Emit `agent_reasoning` events from any reasoning content the model produces, so the dashboard, JSONL logger, and console subscriber render it consistently.
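To illustrate the event side of parity, the sketch below walks a Claude-style response and forwards thinking blocks to a callback. It assumes content blocks expose `type == "thinking"` with the text on a `thinking` attribute, as in Anthropic's extended thinking responses; the callback signature and the payload keys (`agent`, `content`) are hypothetical and not taken from conductor.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable

EventCallback = Callable[[str, Dict[str, Any]], None]


def emit_thinking_blocks(
    content_blocks: Iterable[Any], agent_name: str, on_event: EventCallback
) -> int:
    """Emit one `agent_reasoning` event per non-empty thinking block."""
    emitted = 0
    for block in content_blocks:
        if getattr(block, "type", None) != "thinking":
            continue  # text, tool_use, etc. are handled elsewhere
        text = getattr(block, "thinking", "")
        if text:
            on_event("agent_reasoning", {"agent": agent_name, "content": text})
            emitted += 1
    return emitted


# Usage with fake blocks standing in for an SDK response.
events = []
blocks = [
    SimpleNamespace(type="thinking", thinking="Compare two designs first."),
    SimpleNamespace(type="text", text="Here is the architecture."),
]
count = emit_thinking_blocks(
    blocks, "architect", lambda name, data: events.append((name, data))
)
```

Because subscribers only see the event name and payload, they do not need to know which provider produced the reasoning text.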

Schema

- `AgentDef.reasoning: ReasoningConfig | None` (new `ReasoningConfig` with single `effort` field).
- `RuntimeConfig.default_reasoning_effort: Literal[...] | None`.
- Forbidden on `script`, `human_gate`, and sub-workflow agent types (parity with how `model` and `retry` are handled).
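The forbid rule can be pictured with plain dataclasses. This is not the real schema: the actual models live in `src/conductor/config/schema.py` and may use a different validation library, the `type` field and its values are assumptions, and `"workflow"` is only a placeholder for the sub-workflow agent type.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_EFFORTS = ("low", "medium", "high", "xhigh")
# Assumed type names; "workflow" is a placeholder for the sub-workflow type.
TYPES_WITHOUT_REASONING = ("script", "human_gate", "workflow")


@dataclass
class ReasoningConfig:
    effort: str

    def __post_init__(self) -> None:
        if self.effort not in ALLOWED_EFFORTS:
            raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}")


@dataclass
class AgentDef:
    name: str
    type: str = "agent"
    reasoning: Optional[ReasoningConfig] = None

    def __post_init__(self) -> None:
        # Non-LLM agent types have no model to reason with, so reject the field.
        if self.reasoning is not None and self.type in TYPES_WITHOUT_REASONING:
            raise ValueError(
                f"'reasoning' is not allowed on agents of type {self.type!r}"
            )


ok = AgentDef(name="architect", reasoning=ReasoningConfig(effort="high"))
```

Rejecting the field at load time, rather than ignoring it, surfaces a misconfigured workflow before any provider call is made.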

Files changed

- `src/conductor/config/schema.py`: new `ReasoningConfig`, `AgentDef.reasoning`, `RuntimeConfig.default_reasoning_effort`, forbid rules.
- `src/conductor/providers/reasoning.py` (new): shared effort→budget mapping, Claude thinking-model prefix allow-list, agent/runtime resolver.
- `src/conductor/providers/copilot.py`: forward `reasoning_effort`, validate against `supportedReasoningEfforts`, retry loop fix.
- `src/conductor/providers/claude.py`: forward `thinking` kwarg at every `messages.create` site, auto-coerce temperature/max_tokens, emit `agent_reasoning` from thinking blocks.
- `src/conductor/providers/factory.py`: typed `ReasoningEffort` forwarding to both providers.
- `examples/reasoning-effort.yaml` (new): demonstrates inheritance and per-agent override.
- `AGENTS.md`: key pattern + provider parity bullet.

Tests

45 new tests:

- 21 schema cases (parametrized): accept/reject effort values, runtime default, per-type forbid rules.
- 7 Copilot cases: `reasoning_effort` forwarded; runtime default and per-agent precedence; key absent when unset; `ValidationError` on unsupported effort; mock-mode skip.
- 13 Claude cases: `thinking` kwarg shape; effort→budget mapping; `temperature` coerced to 1.0; `max_tokens` bumped; `ValidationError` on non-thinking model; runtime/per-agent precedence; thinking blocks emit `agent_reasoning`; key absent when unset.
- 4 e2e + factory wiring cases.

Verification

- `make lint` ✅
- `make typecheck` ✅ (only a pre-existing unrelated warning in `dialog_evaluator.py` remains)
- `make validate-examples` ✅ (all 14 examples)
- `make test` ✅ (2330 passed, 9 skipped)

Out of scope

- Anthropic `display: summarized | omitted` flag.
- Anthropic `adaptive` thinking type.
- Mid-workflow dynamic effort adjustment.

🤖 Generated with Copilot CLI

Adds an optional unified reasoning.effort field at both the runtime default level (runtime.default_reasoning_effort) and per-agent level (reasoning.effort), with values low | medium | high | xhigh. Per-agent overrides the runtime default.

Each provider translates the unified value to its native API:

- Copilot: passes reasoning_effort to CopilotClient.create_session. Validates against the model's supportedReasoningEfforts (from list_models()) and raises ValidationError with a clear message when the model does not support the requested effort. Skipped in mock-handler mode.
- Claude: enables extended thinking via messages.create(thinking={...}). Effort to budget mapping: low=2048, medium=8192, high=16384, xhigh=32768 tokens. Auto-coerces temperature to 1.0 and bumps max_tokens to fit budget+4096 (capped at 64000 for thinking-enabled requests). Validates against thinking-capable model prefixes (claude-3-7-, claude-opus-4, claude-sonnet-4, claude-haiku-4). Surfaces thinking content blocks via the existing agent_reasoning event callback (provider parity with Copilot's assistant.reasoning).

Provider parity is preserved: both providers accept the same field with identical semantics, raise ValidationError consistently when the model does not support reasoning, and emit agent_reasoning events.

New shared helper module src/conductor/providers/reasoning.py centralizes the effort-to-budget mapping, the Claude model prefix allow-list, and the agent/runtime resolution logic.

Schema validators forbid reasoning on script, human_gate, and sub-workflow agent types (parity with how 'model' and 'retry' are handled).

Includes:

- Schema tests (21 cases, parametrized)
- Copilot provider tests (7 cases)
- Claude provider tests (13 cases)
- E2E workflow + factory wiring tests (4 cases)
- examples/reasoning-effort.yaml demonstrating runtime default and per-agent override
- AGENTS.md updated with reasoning effort key pattern and provider parity bullet

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: release 0.1.12

  Bumps version to 0.1.12 and updates CHANGELOG with the four PRs merged since v0.1.11:

  - #149: Windows install diagnostics
  - #151: Tag-based registry versioning with # ref syntax
  - #152: Unified reasoning.effort configuration
  - #153: Dashboard layout fix for human_gate options + loop-backs

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: add #155 (Windows update reliability) to 0.1.12 changelog

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jrob5756 force-pushed the feat/reasoning-effort branch from e5ee905 to 867cb6e May 6, 2026 00:17

jrob5756 merged commit 6de42e4 into main May 6, 2026
7 checks passed

jrob5756 deleted the feat/reasoning-effort branch May 6, 2026 00:54

jrob5756 mentioned this pull request May 6, 2026

chore: release 0.1.12 #154

Merged

jrob5756 merged 1 commit into main from feat/reasoning-effort
