WIP 🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out#2566
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull request overview
Fixes operator-controller (Boxcutter runtime) behavior where a ClusterExtension spec change could be ignored while a prior ClusterExtensionRevision is still rolling out, by re-resolving from the catalog when the rolling-out revision no longer matches the current spec.
Changes:
- Update
ResolveBundleto reuse the rolling-out revision only when it still matches the current package/version constraints; otherwise re-resolve from the catalog. - Add
rollingOutRevisionMatchesSpechelper to compare rolling-out revision metadata against spec version constraints. - Add unit and E2E coverage for “spec change while rolling out” behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
internal/operator-controller/controllers/clusterextension_reconcile_steps.go |
Implements spec-vs-rolling-out compatibility check and re-resolve behavior. |
internal/operator-controller/controllers/clusterextension_controller_test.go |
Updates an existing Boxcutter test fixture and adds 2 new tests for re-resolve vs reuse behavior. |
test/e2e/features/update.feature |
Adds E2E scenario verifying version change during a stuck rollout triggers re-resolution. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
b31a1f4 to
d31cbc3
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2566 +/- ##
==========================================
- Coverage 67.81% 67.75% -0.06%
==========================================
Files 137 137
Lines 9526 9639 +113
==========================================
+ Hits 6460 6531 +71
- Misses 2572 2606 +34
- Partials 494 502 +8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
pedjak
left a comment
There was a problem hiding this comment.
/lgtm
IMHO, the added e2e test could be made faster.
d31cbc3 to
ca77804
Compare
|
New changes are detected. LGTM label has been removed. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
392ae57 to
d952dcf
Compare
259bebc to
9c7214a
Compare
d2f9541 to
e75a515
Compare
e75a515 to
80cb1ed
Compare
80cb1ed to
109c791
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes Boxcutter’s rollout behavior so ClusterExtension catalog-spec changes made during an in-progress (or stuck) rollout are detected and trigger re-resolution, rather than continuing to reuse a stale rolling-out revision.
Changes:
- Persist a SHA-256 “catalog source spec digest” on each Boxcutter revision and read it back into revision state.
- Reuse an existing rolling-out revision only when its stored digest matches the current
ClusterExtensioncatalog spec; otherwise re-run resolution. - Add controller unit tests + an E2E scenario covering “spec change during stuck rollout triggers re-resolution” and digest normalization behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/features/update.feature | Adds an E2E scenario ensuring version changes during a stuck rollout cause a new resolution and successful rollout. |
| internal/operator-controller/labels/labels.go | Introduces a new annotation key constant for storing the source digest on revisions. |
| internal/operator-controller/controllers/suite_test.go | Adds a test helper to compute digests consistent with CRD defaults in envtest. |
| internal/operator-controller/controllers/clusterextension_reconcile_steps.go | Implements digest computation/normalization and rolling-out revision reuse vs re-resolve logic; improves version constraint matching when deciding fallback behavior. |
| internal/operator-controller/controllers/clusterextension_controller_test.go | Adds extensive unit coverage for re-resolution behavior during rollout and digest normalization. |
| internal/operator-controller/controllers/clusterextension_controller.go | Extends RevisionMetadata to include SourceDigest. |
| internal/operator-controller/controllers/boxcutter_reconcile_steps.go | Plumbs digest from/to ClusterObjectSet annotations (write on apply, read in revision state getter). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
This PR fixes Boxcutter rollout behavior so ClusterExtension spec changes made during an in-progress/stuck rollout are no longer ignored. It does this by persisting a catalog-spec fingerprint per revision and only reusing a rolling-out revision when its fingerprint matches the current spec.
Changes:
- Add a SHA-256 “source digest” (
olm.operatorframework.io/source-digest) annotation to Boxcutter revisions and plumb it through revision state reading. - Reuse a rolling-out revision only when its stored digest matches the current
ClusterExtension.spec.source.cataloginputs; otherwise re-resolve. - Add unit + E2E coverage for “spec changes during rollout triggers re-resolution” and digest normalization.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/features/update.feature | Adds an E2E scenario validating that changing version during a stuck rollout triggers a new resolution. |
| internal/operator-controller/labels/labels.go | Introduces the SourceDigestKey annotation constant for revision fingerprinting. |
| internal/operator-controller/controllers/suite_test.go | Adds a test helper to compute digests consistent with CRD defaulting (for envtest expectations). |
| internal/operator-controller/controllers/clusterextension_reconcile_steps.go | Implements digest computation/normalization and rolling-out revision reuse vs re-resolve decision logic; improves version constraint handling in resolution error fallback. |
| internal/operator-controller/controllers/clusterextension_controller_test.go | Adds unit coverage for re-resolve/reuse behavior during rollout and digest normalization behavior. |
| internal/operator-controller/controllers/clusterextension_controller.go | Extends RevisionMetadata with SourceDigest to carry the fingerprint through controller logic. |
| internal/operator-controller/controllers/boxcutter_reconcile_steps.go | Reads/writes the source digest annotation on ClusterObjectSet revisions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
109c791 to
00db227
Compare
00db227 to
429d19a
Compare
429d19a to
c41b17f
Compare
c41b17f to
6de1bc6
Compare
There was a problem hiding this comment.
Pull request overview
Fixes Boxcutter rollout behavior so ClusterExtension catalog-spec changes made during an in-progress rollout are detected and trigger re-resolution instead of continuing to reuse a stale rolling-out revision.
Changes:
- Add per-revision “source digest” annotation (
olm.operatorframework.io/source-digest) and plumb it through revision state reading. - Reuse a rolling-out revision only when its stored digest matches the current
ClusterExtension.spec.source.catalogfingerprint; otherwise re-resolve. - Add controller-unit tests plus an E2E scenario covering “spec changes during stuck rollout triggers re-resolution” and digest normalization cases.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/features/update.feature | Adds an E2E scenario asserting spec version changes during a stuck rollout cause a new resolution and successful install. |
| internal/operator-controller/labels/labels.go | Introduces the SourceDigestKey annotation constant for persisted revision fingerprints. |
| internal/operator-controller/controllers/suite_test.go | Adds a test helper to compute digests in a way consistent with CRD defaulting. |
| internal/operator-controller/controllers/clusterextension_reconcile_steps.go | Implements digest computation + rolling-out revision reuse vs re-resolve decision logic; improves version constraint matching. |
| internal/operator-controller/controllers/clusterextension_controller_test.go | Adds extensive unit coverage for digest-based re-resolution and selector/channel normalization. |
| internal/operator-controller/controllers/clusterextension_controller.go | Extends RevisionMetadata with SourceDigest to carry the persisted fingerprint. |
| internal/operator-controller/controllers/boxcutter_reconcile_steps.go | Reads/writes the source digest annotation when listing revisions and creating new revisions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| UpgradeConstraintPolicy ocv1.UpgradeConstraintPolicy `json:"u,omitempty"` | ||
| }{ | ||
| PackageName: cat.PackageName, | ||
| Version: cat.Version, | ||
| Channels: channels, | ||
| Selector: normalizeLabelSelector(cat.Selector), | ||
| UpgradeConstraintPolicy: cat.UpgradeConstraintPolicy, | ||
| } |
Implements catalog spec digest tracking to detect spec changes during stuck rollouts, enabling recovery by updating the ClusterExtension spec. **Problem**: When a ClusterExtension rollout gets stuck (e.g., bad image causing probe failures), updating the spec to a different version doesn't trigger re-resolution. The controller reuses the stuck revision indefinitely. **Solution**: Track a digest of resolution-relevant spec fields (packageName, version, channels, selector, upgradeConstraintPolicy) in ClusterObjectSet annotations. When spec changes during rollout, detect digest mismatch and trigger re-resolution to create a new revision. **Changes**: - Add CatalogSpecDigest() to compute SHA-256 hash of catalog spec fields - Store digest in ClusterObjectSet annotation "olm.operatorframework.io/catalog-spec-digest" - Check digest match in reuseRollingOutRevision() before reusing stuck revision - Trigger re-resolution when digest mismatch detected - Add e2e test validation steps to verify digest mechanism - Add comprehensive unit tests for digest computation **Naming**: All code uses consistent "CatalogSpecDigest" pattern (<type>SpecDigest) **Testing**: - Unit tests verify digest changes when spec fields change - E2E test validates digest annotations exist and differ between revisions - Mock tests replicate stuck rollout scenario Fixes: Red Hat BZ OCP-88138
What's wrong
When a revision is rolling out, the controller skips the resolver — even if the CE consumer changes the spec. The CE consumer's only option to recover is manually deleting revisions.
What this PR does
Fixes Boxcutter rollout behavior so ClusterExtension spec updates (during an in-progress/stuck rollout) are no longer ignored, by persisting and comparing a catalog-spec fingerprint per revision.
Changes:
Backward compatibility
Revisions created before this PR have no fingerprint. The controller detects this and falls back to the old behavior until a new revision is created.
RFC: https://docs.google.com/document/d/1qNjdal3zaz7QTGHPAnPRO1aRnaum5iup0rRyiFar8CM/edit?tab=t.0#heading=h.x3tfh25grvnv
Motivated by: https://redhat.atlassian.net/browse/OCPBUGS-77829
PS: :jira-blocker: by thread: https://redhat-internal.slack.com/archives/C06KP34REFJ/p1774251840822449?thread_ts=1773998183.972329&cid=C06KP34REFJ