Skip to content

WIP 🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out#2566

Open
camilamacedo86 wants to merge 1 commit intooperator-framework:mainfrom
camilamacedo86:fix-bug-analyse
Open

WIP 🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out#2566
camilamacedo86 wants to merge 1 commit intooperator-framework:mainfrom
camilamacedo86:fix-bug-analyse

Conversation

@camilamacedo86
Copy link
Copy Markdown
Contributor

@camilamacedo86 camilamacedo86 commented Mar 16, 2026

What's wrong

When a revision is rolling out, the controller skips the resolver — even if the CE consumer changes the spec. The CE consumer's only option to recover is manually deleting revisions.

What this PR does

Fixes Boxcutter rollout behavior so ClusterExtension spec updates (during an in-progress/stuck rollout) are no longer ignored, by persisting and comparing a catalog-spec fingerprint per revision.

Changes:

  • Add a SHA-256 “source digest” annotation on each Boxcutter revision and plumb it through revision state reading.
  • Reuse a rolling-out revision only when its stored digest matches the current ClusterExtension catalog spec; otherwise re-resolve.
  • Add E2E + controller-unit coverage for “spec changes during rollout triggers re-resolution” and digest normalization behavior.

Backward compatibility

Revisions created before this PR have no fingerprint. The controller detects this and falls back to the old behavior until a new revision is created.

RFC: https://docs.google.com/document/d/1qNjdal3zaz7QTGHPAnPRO1aRnaum5iup0rRyiFar8CM/edit?tab=t.0#heading=h.x3tfh25grvnv
Motivated by: https://redhat.atlassian.net/browse/OCPBUGS-77829

PS: :jira-blocker: by thread: https://redhat-internal.slack.com/archives/C06KP34REFJ/p1774251840822449?thread_ts=1773998183.972329&cid=C06KP34REFJ

Copilot AI review requested due to automatic review settings March 16, 2026 10:29
@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 16, 2026

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit c79e8c2
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/69f46b047fa44700089c9077
😎 Deploy Preview https://deploy-preview-2566--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes operator-controller (Boxcutter runtime) behavior where a ClusterExtension spec change could be ignored while a prior ClusterExtensionRevision is still rolling out, by re-resolving from the catalog when the rolling-out revision no longer matches the current spec.

Changes:

  • Update ResolveBundle to reuse the rolling-out revision only when it still matches the current package/version constraints; otherwise re-resolve from the catalog.
  • Add rollingOutRevisionMatchesSpec helper to compare rolling-out revision metadata against spec version constraints.
  • Add unit and E2E coverage for “spec change while rolling out” behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
internal/operator-controller/controllers/clusterextension_reconcile_steps.go Implements spec-vs-rolling-out compatibility check and re-resolve behavior.
internal/operator-controller/controllers/clusterextension_controller_test.go Updates an existing Boxcutter test fixture and adds 2 new tests for re-resolve vs reuse behavior.
test/e2e/features/update.feature Adds E2E scenario verifying version change during a stuck rollout triggers re-resolution.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread internal/operator-controller/controllers/clusterextension_reconcile_steps.go Outdated
Comment thread internal/operator-controller/controllers/clusterextension_reconcile_steps.go Outdated
Comment thread internal/operator-controller/controllers/clusterextension_controller_test.go Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 70.64220% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.75%. Comparing base (eea6f95) to head (0f23c6c).
⚠️ Report is 97 commits behind head on main.

Files with missing lines Patch % Lines
...er/controllers/clusterextension_reconcile_steps.go 68.93% 25 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2566      +/-   ##
==========================================
- Coverage   67.81%   67.75%   -0.06%     
==========================================
  Files         137      137              
  Lines        9526     9639     +113     
==========================================
+ Hits         6460     6531      +71     
- Misses       2572     2606      +34     
- Partials      494      502       +8     
Flag Coverage Δ
e2e 38.37% <34.86%> (+0.22%) ⬆️
experimental-e2e 51.07% <58.71%> (-0.03%) ⬇️
unit 52.95% <66.05%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@pedjak pedjak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

IMHO, the added e2e test could be made faster.

Comment thread test/e2e/features/update.feature
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 16, 2026
Copilot AI review requested due to automatic review settings March 16, 2026 17:31
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 16, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Mar 16, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Mar 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from pedjak. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This comment was marked as outdated.

@camilamacedo86 camilamacedo86 changed the title 🐛 fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out WIP 🐛 fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out Mar 16, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 16, 2026
@camilamacedo86 camilamacedo86 force-pushed the fix-bug-analyse branch 2 times, most recently from 392ae57 to d952dcf Compare March 17, 2026 14:42
@camilamacedo86 camilamacedo86 changed the title WIP 🐛 fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out 🐛 fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out Mar 17, 2026
@camilamacedo86 camilamacedo86 requested a review from Copilot March 17, 2026 14:45
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2026

This comment was marked as outdated.

@camilamacedo86 camilamacedo86 force-pushed the fix-bug-analyse branch 2 times, most recently from 259bebc to 9c7214a Compare March 17, 2026 15:06
Copilot AI review requested due to automatic review settings March 17, 2026 15:06
Copilot AI review requested due to automatic review settings April 29, 2026 15:27

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Boxcutter’s rollout behavior so ClusterExtension catalog-spec changes made during an in-progress (or stuck) rollout are detected and trigger re-resolution, rather than continuing to reuse a stale rolling-out revision.

Changes:

  • Persist a SHA-256 “catalog source spec digest” on each Boxcutter revision and read it back into revision state.
  • Reuse an existing rolling-out revision only when its stored digest matches the current ClusterExtension catalog spec; otherwise re-run resolution.
  • Add controller unit tests + an E2E scenario covering “spec change during stuck rollout triggers re-resolution” and digest normalization behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/e2e/features/update.feature Adds an E2E scenario ensuring version changes during a stuck rollout cause a new resolution and successful rollout.
internal/operator-controller/labels/labels.go Introduces a new annotation key constant for storing the source digest on revisions.
internal/operator-controller/controllers/suite_test.go Adds a test helper to compute digests consistent with CRD defaults in envtest.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go Implements digest computation/normalization and rolling-out revision reuse vs re-resolve logic; improves version constraint matching when deciding fallback behavior.
internal/operator-controller/controllers/clusterextension_controller_test.go Adds extensive unit coverage for re-resolution behavior during rollout and digest normalization.
internal/operator-controller/controllers/clusterextension_controller.go Extends RevisionMetadata to include SourceDigest.
internal/operator-controller/controllers/boxcutter_reconcile_steps.go Plumbs digest from/to ClusterObjectSet annotations (write on apply, read in revision state getter).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/operator-controller/labels/labels.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Boxcutter rollout behavior so ClusterExtension spec changes made during an in-progress/stuck rollout are no longer ignored. It does this by persisting a catalog-spec fingerprint per revision and only reusing a rolling-out revision when its fingerprint matches the current spec.

Changes:

  • Add a SHA-256 “source digest” (olm.operatorframework.io/source-digest) annotation to Boxcutter revisions and plumb it through revision state reading.
  • Reuse a rolling-out revision only when its stored digest matches the current ClusterExtension.spec.source.catalog inputs; otherwise re-resolve.
  • Add unit + E2E coverage for “spec changes during rollout triggers re-resolution” and digest normalization.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
test/e2e/features/update.feature Adds an E2E scenario validating that changing version during a stuck rollout triggers a new resolution.
internal/operator-controller/labels/labels.go Introduces the SourceDigestKey annotation constant for revision fingerprinting.
internal/operator-controller/controllers/suite_test.go Adds a test helper to compute digests consistent with CRD defaulting (for envtest expectations).
internal/operator-controller/controllers/clusterextension_reconcile_steps.go Implements digest computation/normalization and rolling-out revision reuse vs re-resolve decision logic; improves version constraint handling in resolution error fallback.
internal/operator-controller/controllers/clusterextension_controller_test.go Adds unit coverage for re-resolve/reuse behavior during rollout and digest normalization behavior.
internal/operator-controller/controllers/clusterextension_controller.go Extends RevisionMetadata with SourceDigest to carry the fingerprint through controller logic.
internal/operator-controller/controllers/boxcutter_reconcile_steps.go Reads/writes the source digest annotation on ClusterObjectSet revisions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings April 30, 2026 14:41

This comment was marked as outdated.

@camilamacedo86 camilamacedo86 changed the title 🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out WIP 🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out Apr 30, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 30, 2026
Copilot AI review requested due to automatic review settings April 30, 2026 16:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Boxcutter rollout behavior so ClusterExtension catalog-spec changes made during an in-progress rollout are detected and trigger re-resolution instead of continuing to reuse a stale rolling-out revision.

Changes:

  • Add per-revision “source digest” annotation (olm.operatorframework.io/source-digest) and plumb it through revision state reading.
  • Reuse a rolling-out revision only when its stored digest matches the current ClusterExtension.spec.source.catalog fingerprint; otherwise re-resolve.
  • Add controller-unit tests plus an E2E scenario covering “spec changes during stuck rollout triggers re-resolution” and digest normalization cases.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/e2e/features/update.feature Adds an E2E scenario asserting spec version changes during a stuck rollout cause a new resolution and successful install.
internal/operator-controller/labels/labels.go Introduces the SourceDigestKey annotation constant for persisted revision fingerprints.
internal/operator-controller/controllers/suite_test.go Adds a test helper to compute digests in a way consistent with CRD defaulting.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go Implements digest computation + rolling-out revision reuse vs re-resolve decision logic; improves version constraint matching.
internal/operator-controller/controllers/clusterextension_controller_test.go Adds extensive unit coverage for digest-based re-resolution and selector/channel normalization.
internal/operator-controller/controllers/clusterextension_controller.go Extends RevisionMetadata with SourceDigest to carry the persisted fingerprint.
internal/operator-controller/controllers/boxcutter_reconcile_steps.go Reads/writes the source digest annotation when listing revisions and creating new revisions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +329 to +336
UpgradeConstraintPolicy ocv1.UpgradeConstraintPolicy `json:"u,omitempty"`
}{
PackageName: cat.PackageName,
Version: cat.Version,
Channels: channels,
Selector: normalizeLabelSelector(cat.Selector),
UpgradeConstraintPolicy: cat.UpgradeConstraintPolicy,
}
Implements catalog spec digest tracking to detect spec changes during stuck
rollouts, enabling recovery by updating the ClusterExtension spec.

**Problem**: When a ClusterExtension rollout gets stuck (e.g., bad image causing
probe failures), updating the spec to a different version doesn't trigger
re-resolution. The controller reuses the stuck revision indefinitely.

**Solution**: Track a digest of resolution-relevant spec fields
(packageName, version, channels, selector, upgradeConstraintPolicy) in
ClusterObjectSet annotations. When spec changes during rollout, detect
digest mismatch and trigger re-resolution to create a new revision.

**Changes**:
- Add CatalogSpecDigest() to compute SHA-256 hash of catalog spec fields
- Store digest in ClusterObjectSet annotation "olm.operatorframework.io/catalog-spec-digest"
- Check digest match in reuseRollingOutRevision() before reusing stuck revision
- Trigger re-resolution when digest mismatch detected
- Add e2e test validation steps to verify digest mechanism
- Add comprehensive unit tests for digest computation

**Naming**: All code uses consistent "CatalogSpecDigest" pattern (<type>SpecDigest)

**Testing**:
- Unit tests verify digest changes when spec fields change
- E2E test validates digest annotations exist and differ between revisions
- Mock tests replicate stuck rollout scenario

Fixes: Red Hat BZ OCP-88138
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants