Distinguish CCA vs CLI baseline-build expectations#127586

Open
steveisok wants to merge 1 commit into dotnet:main from steveisok:more-baseline-balance

@steveisok
Member

Why

The Baseline Build section in copilot-instructions.md was written with a single, uniformly strict rule: 'You MUST complete a baseline build BEFORE making any code changes.' That language is correct for CCA, where the environment is a fresh sandbox with no pre-existing artifacts and skipping the baseline causes confusing 'missing testhost' / 'shared framework' failures 20+ minutes into the task.

But the same language can end up adding unnecessary churn in CLI (interactive) use:

  • A developer's workspace often already has a recent baseline for the component they're working on. Re-running a 40-minute build for every task is pure waste.
  • The strict 'MUST baseline first' wording pushes a CLI-driving agent to do exactly that — kicking off a 40-minute build before touching any code, even when the existing artifacts would have worked fine.
  • The baseline instructions assume you are targeting your host configuration. That assumption does not hold in our many cross-build scenarios.
  • The original wording also gave no recovery path: if a CLI session did skip the baseline and later hit a missing-testhost error, the rule offered no guidance other than 'should have run it first.'

The two surfaces have genuinely different needs:

  • CCA: fresh sandbox, no artifacts, no human nearby, skipping is catastrophic — needs forceful, no-exceptions language.
  • CLI: existing workspace state, human or driving agent in the loop, probe-and-fail-cheap is feasible — needs flexibility and a deterministic fallback.

A single uniformly-strict or uniformly-soft rule mis-serves one of them.

What changes

Split the Baseline Build section into two mode-specific subsections with strictness reflected in the headings:

  • 'When running under CCA — MANDATORY' keeps the original forceful language (MUST, BEFORE, no exceptions, STOP on failure, 'IS a task failure') so a CCA-mode model cannot rationalize skipping.
  • 'When running under CLI (interactive) — flexible' introduces a probe-and-fall-back rule that works for both human users and local agents driving the CLI:
    1. Check the component's sentinel artifact under artifacts/. If missing, baseline.
    2. Otherwise attempt the incremental work; on a documented baseline-missing error, baseline once and retry. No looping.
    3. Honor explicit user signals ('just built' / 'fresh checkout').

A default-to-strict tiebreaker ('If you're uncertain which mode you're in, follow the CCA rule') prevents a misclassified mode from skipping the baseline.
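
The CLI probe-and-fall-back rule can be sketched as follows. This is a minimal illustration, not code from the PR. The function names, the marker strings, and the way user signals are encoded are all hypothetical; the actual sentinel paths and the list of documented baseline-missing errors live in copilot-instructions.md.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical markers; the documented error list is in copilot-instructions.md.
BASELINE_MISSING_MARKERS = ("missing testhost", "shared framework")


def looks_like_missing_baseline(output: str) -> bool:
    """Return True if build/test output matches a known baseline-missing error."""
    lowered = output.lower()
    return any(marker in lowered for marker in BASELINE_MISSING_MARKERS)


def probe_and_fall_back(
    sentinel: Path,
    run_baseline: Callable[[], None],
    run_incremental: Callable[[], Tuple[bool, str]],
    user_signal: Optional[str] = None,
) -> bool:
    """Apply the three CLI steps; returns True if the incremental work succeeded."""
    baselined = False

    # Step 3 (assumed encoding): an explicit user signal overrides the probe.
    if user_signal == "fresh checkout":
        need_baseline = True
    elif user_signal == "just built":
        need_baseline = False
    else:
        # Step 1: probe the component's sentinel artifact.
        need_baseline = not sentinel.exists()

    if need_baseline:
        run_baseline()
        baselined = True

    # Step 2: attempt the incremental work.
    ok, output = run_incremental()
    if ok:
        return True

    # Fall back at most once, and only on a baseline-missing error. No looping.
    if not baselined and looks_like_missing_baseline(output):
        run_baseline()
        ok, _ = run_incremental()
    return ok
```

Passing the baseline and incremental steps as callables keeps the sketch independent of any particular build command. The `baselined` flag is what enforces the "baseline once and retry" limit.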

To make the CLI rule operational, every Component-Specific Workflow (Libraries, CoreCLR, Mono, WASM Libraries, Host, Tools, Build Tasks, Runtime Tests) now lists a concrete Baseline sentinel path under artifacts/ that the model can check with a single ls command.

Step 2's 'clean working tree' guidance is also softened to acknowledge both the baseline-up-front case (clean HEAD required) and the baseline-after-probe case (stash work-in-progress or accept that the baseline incorporates it).

Net effect

  • CCA behavior is unchanged: same up-front mandatory baseline, same forceful language, same stop-on-failure.
  • CLI behavior gains permission to skip a 40-minute baseline when the workspace already has one, with a deterministic fallback if the skip turns out to be wrong.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 29, 2026 21:56

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.
