-
Notifications
You must be signed in to change notification settings - Fork 71
✨ Shiftweek playground #2503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
✨ Shiftweek playground #2503
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
/hold not to be merged |
8d747c8 to
16d7dda
Compare
…el status reporting Consolidate the three separate revision conditions (Available, Progressing, Succeeded) into a single "Ready" condition with granular reason codes (Ready, Reconciling, ProbeFailure, ValidationFailed, ObjectCollision, ProgressDeadlineExceeded, Transitioning, RollingOut, Archived). This simplifies the revision status model while providing more specific failure information. Key changes: - Replace Available/Progressing/Succeeded conditions with a unified Ready condition on ClusterExtensionRevision - Add per-phase status reporting (phaseStatuses) with states: Applied, Progressing, Failed, Pending, Transitioning - including failing probe details per object - Move progress deadline checking from the revision controller to the ClusterExtension reconcile steps, tracking deadlines via sync.Map per ClusterExtension UID - Remove progressDeadlineMinutes from ClusterExtensionRevision spec (now driven by ClusterExtension spec only) - Refactor RevisionStates from a struct with Installed/RollingOut fields to a sorted slice with Installed()/RollingOut()/Latest() accessor methods, adding State and ClusterExtensionGeneration fields to RevisionMetadata - Track ClusterExtension generation in revision annotations to enable generation-aware resolution skipping during active rollouts - Add ServiceAccount watch to revision controller for parent ClusterExtension - Update CRD printer columns from Available/Progressing to Ready/Reason - Add test catalog entries (test-operator 1.0.2 and 1.0.3) - Update all tests and generated manifests/CRDs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR is an experimental playground for significant architectural changes to OLM's ClusterExtensionRevision status model. The changes aim to simplify the status conditions and move certain configuration from the revision level to the extension level.
Changes:
- Consolidates three ClusterExtensionRevision conditions (Available, Progressing, Succeeded) into a single
Readycondition with multiple reasons - Moves
ProgressDeadlineMinutesconfiguration from ClusterExtensionRevision to ClusterExtension level - Refactors
RevisionStatesfrom a struct with separate Installed/RollingOut pointers to a flat slice with helper methods - Adds ClusterExtension generation tracking on revisions via annotations to enable spec change detection
- Introduces detailed
phaseStatusesfield with per-phase state, probe failures, and transition tracking
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| api/v1/clusterextensionrevision_types.go | Defines new Ready condition with comprehensive reasons, adds phaseStatuses types, removes progressDeadlineMinutes field |
| api/v1/clusterextension_types.go | Minor formatting cleanup of constant definitions |
| api/v1/validation_test.go | Removes tests for progressDeadlineMinutes validation (moved to ClusterExtension level) |
| api/v1/zz_generated.deepcopy.go | Adds generated DeepCopy methods for new phaseStatuses types |
| internal/operator-controller/labels/labels.go | Adds ClusterExtensionGenerationKey constant for tracking extension generation |
| internal/operator-controller/controllers/clusterextension_controller.go | Refactors RevisionStates to slice type with helper methods, adds generation tracking to both Helm and Boxcutter getters |
| internal/operator-controller/controllers/clusterextension_reconcile_steps.go | Implements CheckProgressDeadline step at ClusterExtension level, adds generation-based resolution skip logic |
| internal/operator-controller/controllers/clusterextensionrevision_controller.go | Replaces multiple condition types with single Ready condition, removes revision-level progress deadline logic, adds comprehensive phase status tracking, adds ServiceAccount watch |
| internal/operator-controller/controllers/boxcutter_reconcile_steps.go | Updates revision state getter to use Ready condition, implements new Progressing condition logic based on revision states |
| internal/operator-controller/controllers/common_controller.go | Updates to use new RevisionStates slice API and Ready condition |
| internal/operator-controller/controllers/*_test.go | Updates all tests to use new condition names and RevisionStates API |
| internal/operator-controller/applier/boxcutter.go | Removes progressDeadlineMinutes propagation, updates migration status to use Ready condition |
| cmd/operator-controller/main.go | Adds CheckProgressDeadline step to both reconciler configurations |
| manifests/*.yaml | Updates CRDs with new condition documentation and phaseStatuses schema |
| helm/olmv1/base/operator-controller/crd/experimental/*.yaml | Mirrors CRD changes in Helm charts |
| testdata/images/catalogs/test-catalog/v1/configs/catalog.yaml | Adds intermediate test bundle versions for testing |
Comments suppressed due to low confidence (2)
internal/operator-controller/controllers/suite_test.go:66
- When m.Err is not nil, returning
nilas the first parameter creates a nil RevisionStates slice. However, the actual GetRevisionStates implementations (BoxcutterRevisionStatesGetter and HelmRevisionStatesGetter) returnnilas the error value when successful, not as the RevisionStates value. This inconsistency could cause issues where nil RevisionStates is returned on error but the error handling code might expect an empty slice instead. Consider returning an empty RevisionStates{} instead of nil for consistency with the type being a slice.
func (m *MockRevisionStatesGetter) GetRevisionStates(ctx context.Context, ext *ocv1.ClusterExtension) (controllers.RevisionStates, error) {
if m.Err != nil {
return nil, m.Err
}
return m.RevisionStates, nil
internal/operator-controller/controllers/clusterextensionrevision_controller_test.go:885
- The test case slice for Test_ClusterExtensionRevisionReconciler_Reconcile_ProgressDeadline is empty (line 853:
}{}). This test function was likely meant to contain test cases for progress deadline functionality that was moved from ClusterExtensionRevision to ClusterExtension level. Either remove this empty test function or add appropriate test cases for any remaining revision-level progress deadline behavior.
func Test_ClusterExtensionRevisionReconciler_Reconcile_ProgressDeadline(t *testing.T) {
const (
clusterExtensionRevisionName = "test-ext-1"
)
testScheme := newScheme(t)
require.NoError(t, corev1.AddToScheme(testScheme))
for _, tc := range []struct {
name string
existingObjs func() []client.Object
revisionResult machinery.RevisionResult
validate func(*testing.T, client.Client)
reconcileErr error
reconcileResult ctrl.Result
}{} {
t.Run(tc.name, func(t *testing.T) {
// create extension and cluster extension
testClient := fake.NewClientBuilder().
WithScheme(testScheme).
WithStatusSubresource(&ocv1.ClusterExtensionRevision{}).
WithObjects(tc.existingObjs()...).
Build()
// reconcile cluster extension revision
mockEngine := &mockRevisionEngine{
reconcile: func(ctx context.Context, rev machinerytypes.Revision, opts ...machinerytypes.RevisionReconcileOption) (machinery.RevisionResult, error) {
return tc.revisionResult, nil
},
}
result, err := (&controllers.ClusterExtensionRevisionReconciler{
Client: testClient,
RevisionEngineFactory: &mockRevisionEngineFactory{engine: mockEngine},
TrackingCache: &mockTrackingCache{
client: testClient,
},
}).Reconcile(t.Context(), ctrl.Request{
NamespacedName: types.NamespacedName{
Name: clusterExtensionRevisionName,
},
})
require.Equal(t, tc.reconcileResult, result)
require.Equal(t, tc.reconcileErr, err)
tc.validate(t, testClient)
})
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -108,16 +111,16 @@ func Test_ClusterExtensionRevisionReconciler_Reconcile_RevisionReconciliation(t | |||
| Name: clusterExtensionRevisionName, | |||
| }, rev) | |||
| require.NoError(t, err) | |||
| cond := meta.FindStatusCondition(rev.Status.Conditions, ocv1.ClusterExtensionRevisionTypeAvailable) | |||
| cond := meta.FindStatusCondition(rev.Status.Conditions, ocv1.ClusterExtensionRevisionTypeReady) | |||
| require.NotNil(t, cond) | |||
| require.Equal(t, metav1.ConditionUnknown, cond.Status) | |||
| require.Equal(t, metav1.ConditionFalse, cond.Status) | |||
| require.Equal(t, ocv1.ClusterExtensionRevisionReasonReconciling, cond.Reason) | |||
| require.Equal(t, "some error", cond.Message) | |||
| require.Equal(t, int64(1), cond.ObservedGeneration) | |||
| }, | |||
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test name says "Ready condition is updated to Unknown on error if its been already set" but the test validates that the condition status is ConditionFalse, not ConditionUnknown. This is inconsistent with the test name. Either update the test name to reflect that it's now False (not Unknown), or clarify the intended behavior in the test name.
| // Add the parent ClusterExtension's ServiceAccount to the watch list | ||
| ownerName, ok := rev.Labels[labels.OwnerNameKey] | ||
| if ok { | ||
| ext := &ocv1.ClusterExtension{} | ||
| if err := c.Client.Get(ctx, client.ObjectKey{Name: ownerName}, ext); err == nil { | ||
| // Watch the ServiceAccount referenced by the parent ClusterExtension | ||
| gvks.Insert(schema.GroupVersionKind{ | ||
| Group: corev1.GroupName, | ||
| Version: "v1", | ||
| Kind: "ServiceAccount", | ||
| }) | ||
| } | ||
| // If we can't get the ClusterExtension, continue without adding the ServiceAccount watch | ||
| // The revision will still watch its own objects | ||
| } |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ServiceAccount watch addition is overly broad and inefficient. This code unconditionally adds ServiceAccount to the watch list for ALL revisions that have a parent ClusterExtension, regardless of whether the actual bundle content references any ServiceAccounts. This will cause the revision controller to watch all ServiceAccount changes in the cluster, potentially leading to unnecessary reconciliations. Consider only adding the ServiceAccount watch when the bundle actually contains ServiceAccount resources, similar to how other GVKs are added from the phase objects.
| } else if isRollingOut { | ||
| // Rolling out - check if the latest active revision is stuck | ||
| // The latest revision is the one currently rolling out | ||
| if len(ext.Status.ActiveRevisions) > 0 { | ||
| latestRevStatus := ext.Status.ActiveRevisions[len(ext.Status.ActiveRevisions)-1] | ||
| readyCond := apimeta.FindStatusCondition(latestRevStatus.Conditions, ocv1.ClusterExtensionRevisionTypeReady) | ||
| if readyCond != nil && readyCond.Status == metav1.ConditionFalse { | ||
| // Latest revision is not ready - check how long it's been stuck | ||
| timeSinceLastTransition = time.Since(readyCond.LastTransitionTime.Time) | ||
| isStuck = true | ||
| } | ||
| } | ||
| } |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The progress deadline logic only checks the Ready condition's LastTransitionTime for the latest active revision, but doesn't account for revisions that might not have reached the active state yet. If a revision is stuck before becoming active (e.g., in validation or early rollout phases), the deadline check will not detect it because it only examines ext.Status.ActiveRevisions. Consider also checking revisions that are in the RollingOut state but not yet in ActiveRevisions to ensure comprehensive deadline enforcement.
| // Set Progressing condition based on revision states | ||
| // If there's an installed revision and no rolling revisions, we've reached the desired state | ||
| if state.revisionStates.Installed() != nil && len(state.revisionStates.RollingOut()) == 0 { | ||
| setStatusProgressing(ext, nil) | ||
| } else { | ||
| // There are rolling revisions, so we're still progressing | ||
| SetStatusCondition(&ext.Status.Conditions, metav1.Condition{ | ||
| Type: ocv1.TypeProgressing, | ||
| Status: metav1.ConditionTrue, | ||
| Reason: ocv1.ReasonRollingOut, | ||
| Message: "Rolling out new revision", | ||
| ObservedGeneration: ext.GetGeneration(), | ||
| }) | ||
| } |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Progressing condition logic may be incorrect when there are both installed and rolling out revisions. Lines 173-174 set Progressing to Succeeded when there's an installed revision and no rolling revisions. However, lines 176-184 set Progressing to RollingOut when there are rolling revisions, even if there's also an installed revision. This means the Progressing condition doesn't reflect the overall state accurately - it will show RollingOut even when there's a stable installed revision available. The logic should consider whether the rolling revision is newer than the installed one, and potentially maintain a Succeeded state if the installed revision is already at the desired state.
| // Mirror Ready condition from the installed revision | ||
| if i := state.revisionStates.Installed(); i != nil { | ||
| if cnd := apimeta.FindStatusCondition(i.Conditions, ocv1.ClusterExtensionRevisionTypeReady); cnd != nil { | ||
| cnd.ObservedGeneration = ext.GetGeneration() | ||
| apimeta.SetStatusCondition(&ext.Status.Conditions, *cnd) | ||
| } | ||
| ext.Status.Install = &ocv1.ClusterExtensionInstallStatus{ | ||
| Bundle: i.BundleMetadata, | ||
| } | ||
| ext.Status.ActiveRevisions = []ocv1.RevisionStatus{{Name: i.RevisionName}} | ||
| } | ||
| for idx, r := range state.revisionStates.RollingOut { | ||
| for idx, r := range state.revisionStates.RollingOut() { | ||
| rs := ocv1.RevisionStatus{Name: r.RevisionName} | ||
| for _, cndType := range []string{ocv1.ClusterExtensionRevisionTypeAvailable, ocv1.ClusterExtensionRevisionTypeProgressing} { | ||
| if cnd := apimeta.FindStatusCondition(r.Conditions, cndType); cnd != nil { | ||
| cnd.ObservedGeneration = ext.GetGeneration() | ||
| apimeta.SetStatusCondition(&rs.Conditions, *cnd) | ||
| } | ||
| if cnd := apimeta.FindStatusCondition(r.Conditions, ocv1.ClusterExtensionRevisionTypeReady); cnd != nil { | ||
| cnd.ObservedGeneration = ext.GetGeneration() | ||
| apimeta.SetStatusCondition(&rs.Conditions, *cnd) | ||
| } | ||
| // Mirror Progressing condition from the latest active revision | ||
| if idx == len(state.revisionStates.RollingOut)-1 { | ||
| if pcnd := apimeta.FindStatusCondition(r.Conditions, ocv1.ClusterExtensionRevisionTypeProgressing); pcnd != nil { | ||
| pcnd.ObservedGeneration = ext.GetGeneration() | ||
| apimeta.SetStatusCondition(&ext.Status.Conditions, *pcnd) | ||
| // Mirror Ready condition from the latest active revision | ||
| if idx == len(state.revisionStates.RollingOut())-1 { | ||
| if rcnd := apimeta.FindStatusCondition(r.Conditions, ocv1.ClusterExtensionRevisionTypeReady); rcnd != nil { | ||
| rcnd.ObservedGeneration = ext.GetGeneration() | ||
| apimeta.SetStatusCondition(&ext.Status.Conditions, *rcnd) | ||
| } |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Ready condition is being mirrored from both the installed revision (lines 143-147) and the latest rolling revision (lines 160-164), which could lead to conflicting condition updates. If there's an installed revision with Ready=True and a rolling revision with Ready=False, the final state will depend on the order of execution. Since the rolling revision logic executes after the installed revision logic, the ClusterExtension's Ready condition will reflect the rolling revision's state, which might be confusing to users who expect Ready=True to indicate the installed bundle is ready. Consider using a separate condition type or clarifying the semantics of which revision's condition should be mirrored to the ClusterExtension.
| func (rs RevisionStates) Installed() *RevisionMetadata { | ||
| for _, r := range rs { | ||
| if r.State == RevisionStateInstalled { | ||
| return r | ||
| } | ||
| } | ||
| return nil | ||
| } |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Installed() method returns the first revision with State == RevisionStateInstalled it finds in the slice (lines 527-530), but if multiple revisions could have this state (as identified in the previous issue with BoxcutterRevisionStatesGetter), this method will return an arbitrary one rather than the correct one. Since RevisionStates is documented as sorted ascending by revision number, consider either: 1) iterating in reverse to find the latest installed revision, or 2) ensuring only one revision can be in Installed state at a time through better state determination logic.
| // - When status is False and reason is ProbeFailure, one or more objects are failing their readiness probes. | ||
| // - When status is False and reason is ValidationFailed, the revision failed preflight validation checks. | ||
| // - When status is False and reason is ObjectCollision, the revision encountered object ownership collisions. | ||
| // - When status is False and reason is ProgressDeadlineExceeded, the revision has not completed within the specified progress deadline. |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation states "the revision has not completed within the specified progress deadline" for ProgressDeadlineExceeded reason, but the progressDeadlineMinutes field has been removed from ClusterExtensionRevision spec and moved to ClusterExtension level. This documentation should be updated to clarify that the progress deadline is now controlled at the ClusterExtension level, not the revision level.
| // - When status is False and reason is ProgressDeadlineExceeded, the revision has not completed within the specified progress deadline. | |
| // - When status is False and reason is ProgressDeadlineExceeded, the revision has not completed within the progress deadline configured on the parent ClusterExtension. |
| state := RevisionStateRollingOut | ||
| if apimeta.IsStatusConditionTrue(rev.Status.Conditions, ocv1.ClusterExtensionRevisionTypeReady) { | ||
| state = RevisionStateInstalled | ||
| } |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The revision state determination logic only checks if Ready=True to mark a revision as Installed (line 67-68). However, this doesn't account for scenarios where multiple revisions might have Ready=True simultaneously, which could happen during a rollout where an old revision is still ready while a new one becomes ready. This could result in multiple revisions being marked as "Installed" state, which violates the expectation that only one revision should be installed at a time. Consider adding logic to identify which revision should be the singular "Installed" one based on additional criteria like revision number or whether it's actively serving traffic.
16d7dda to
8adcfc3
Compare
Description
Just having some fun experimenting with Claude and sketching changes to OLM:
Ready+ add phase level statusNote: All of this was generated through prompting 🤣
Reviewer Checklist