Distinguish CCA vs CLI baseline-build expectations by steveisok · Pull Request #127586 · dotnet/runtime

steveisok · 2026-04-29T21:56:27Z

Why

The Baseline Build section in copilot-instructions.md was written with a single, uniformly strict rule: 'You MUST complete a baseline build BEFORE making any code changes.' That language is correct for CCA, where the environment is a fresh sandbox with no pre-existing artifacts and skipping the baseline causes confusing 'missing testhost' / 'shared framework' failures 20+ minutes into the task.

But the same language can end up adding unnecessary churn in CLI (interactive) use:

A developer's workspace often already has a recent baseline for the component they're working on. Re-running a 40-minute build for every task is pure waste.
The strict 'MUST baseline first' wording pushes a CLI-driving agent to do exactly that — kicking off a 40-minute build before touching any code, even when the existing artifacts would have worked fine.
The baseline instructions assume you are targeting your host configuration. We have plenty of cross build scenarios where this assumption does not hold.
The original wording also gave no recovery path: if a CLI session did skip the baseline and later hit a missing-testhost error, the rule offered no guidance other than 'should have run it first.'

The two surfaces have genuinely different needs:

CCA: fresh sandbox, no artifacts, no human nearby, skipping is catastrophic — needs forceful, no-exceptions language.
CLI: existing workspace state, human or driving agent in the loop, probe-and-fail-cheap is feasible — needs flexibility and a deterministic fallback.

A single uniformly-strict or uniformly-soft rule mis-serves one of them.

What changes

Split the Baseline Build section into two mode-specific subsections with strictness reflected in the headings:

'When running under CCA — MANDATORY' keeps the original forceful language (MUST, BEFORE, no exceptions, STOP on failure, 'IS a task failure') so a CCA-mode model cannot rationalize skipping.
'When running under CLI (interactive) — flexible' introduces a probe-and-fall-back rule that works for both human users and local agents driving the CLI:
1. Check the component's sentinel artifact under artifacts/. If missing, baseline.
2. Otherwise attempt the incremental work; on a documented baseline-missing error, baseline once and retry. No looping.
3. Honor explicit user signals ('just built' / 'fresh checkout').

A default-to-strict tiebreaker ('If you're uncertain which mode you're in, follow the CCA rule') prevents a misclassified mode from skipping the baseline.

To make the CLI rule operational, every Component-Specific Workflow (Libraries, CoreCLR, Mono, WASM Libraries, Host, Tools, Build Tasks, Runtime Tests) now lists a concrete Baseline sentinel path under artifacts/ that the model can ls in a single command.

Step 2's 'clean working tree' guidance is also softened to acknowledge both the baseline-up-front case (clean HEAD required) and the baseline-after-probe case (stash work-in-progress or accept that the baseline incorporates it).

Net effect

CCA behavior is unchanged: same up-front mandatory baseline, same forceful language, same stop-on-failure.
CLI behavior gains permission to skip a 40-minute baseline when the workspace already has one, with a deterministic fallback if the skip turns out to be wrong.

## Why The Baseline Build section in copilot-instructions.md was written with a single, uniformly strict rule: 'You MUST complete a baseline build BEFORE making any code changes.' That language is correct for CCA, where the environment is a fresh sandbox with no pre-existing artifacts and skipping the baseline causes confusing 'missing testhost' / 'shared framework' failures 20+ minutes into the task. But the same language is actively harmful in CLI (interactive) use: - A developer's workspace almost always already has a recent baseline for the component they're working on. Re-running a 40-minute build for every task is pure waste. - The strict 'MUST baseline first' wording pushes a CLI-driving agent to do exactly that — kicking off a 40-minute build before touching any code, even when the existing artifacts would have worked fine. - The original wording also gave no recovery path: if a CLI session did skip the baseline and later hit a missing-testhost error, the rule offered no guidance other than 'should have run it first.' The two surfaces have genuinely different needs: - CCA: fresh sandbox, no artifacts, no human nearby, skipping is catastrophic — needs forceful, no-exceptions language. - CLI: existing workspace state, human or driving agent in the loop, probe-and-fail-cheap is feasible — needs flexibility and a deterministic fallback. A single uniformly-strict or uniformly-soft rule mis-serves one of them. ## What changes Split the Baseline Build section into two mode-specific subsections with strictness reflected in the headings: - 'When running under CCA — MANDATORY' keeps the original forceful language (MUST, BEFORE, no exceptions, STOP on failure, 'IS a task failure') so a CCA-mode model cannot rationalize skipping. - 'When running under CLI (interactive) — flexible' introduces a probe-and-fall-back rule that works for both human users and local agents driving the CLI: 1. Check the component's sentinel artifact under artifacts/. If missing, baseline. 2. Otherwise attempt the incremental work; on a documented baseline-missing error, baseline once and retry. No looping. 3. Honor explicit user signals ('just built' / 'fresh checkout'). A default-to-strict tiebreaker ('If you're uncertain which mode you're in, follow the CCA rule') prevents a misclassified mode from skipping the baseline. To make the CLI rule operational, every Component-Specific Workflow (Libraries, CoreCLR, Mono, WASM Libraries, Host, Tools, Build Tasks, Runtime Tests) now lists a concrete Baseline sentinel path under artifacts/ that the model can ls in a single command. Step 2's 'clean working tree' guidance is also softened to acknowledge both the baseline-up-front case (clean HEAD required) and the baseline-after-probe case (stash work-in-progress or accept that the baseline incorporates it). ## Net effect - CCA behavior is unchanged: same up-front mandatory baseline, same forceful language, same stop-on-failure. - CLI behavior gains permission to skip a 40-minute baseline when the workspace already has one, with a deterministic fallback if the skip turns out to be wrong. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dotnet-policy-service · 2026-04-29T21:57:51Z

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Copilot AI review requested due to automatic review settings April 29, 2026 21:56

This comment was marked as resolved.

Sign in to view

github-actions Bot added the area-Infrastructure label Apr 29, 2026

github-project-automation Bot added this to Runtime Infra Apr 29, 2026

dotnet-policy-service Bot assigned steveisok Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Distinguish CCA vs CLI baseline-build expectations#127586

Distinguish CCA vs CLI baseline-build expectations#127586
steveisok wants to merge 1 commit intodotnet:mainfrom
steveisok:more-baseline-balance

steveisok commented Apr 29, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

dotnet-policy-service Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

steveisok commented Apr 29, 2026

Why

What changes

Net effect

Uh oh!

This comment was marked as resolved.

Uh oh!

dotnet-policy-service Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants