OCPBUGS-86179: Update ImageModeStatusReporting MCP machine count tests to be resilient on SNO topology#6089
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR updates test validation logic for MachineConfigPool machine-count transitions: the helper now signals connection-refused errors separately, and the retry orchestrator increases timeouts/intervals for layered updates and retries connection-refused failures without consuming primary retry attempts. ChangesMachine-count validation resilience for MCP status checks
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5 | ❌ 10❌ Failed checks (10 inconclusive)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Tools execution failed with the following error: Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error) Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: isabella-janssen The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/548d2bc0-5936-11f1-9a2b-6dbd41087b03-0 |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/image_mode_status_reporting.go`:
- Around line 378-385: The code currently treats any non-ECONNREFUSED mcpErr as
a "count mismatch" by returning (false, false); instead change the non-retryable
branch to fail fast by surfacing the real API error: log a descriptive message
including mcpName and the mcpErr (use logger.Errorf with the error) and
propagate the error to the caller (i.e. change the function signature/return
path to return an error or wrap and return mcpErr rather than hiding it),
leaving the ECONNREFUSED branch to continue retrying; apply the same change to
the corresponding block referenced at lines 393-399 (the branches that examine
mcpErr and compare to syscall.ECONNREFUSED).
- Around line 330-346: The inner retry loop decreases the loop counter (i--) on
connection errors which can pin i and create an infinite loop that defeats the
outer Eventually timeout; update the loop around
mcnAndNodeAnnotationMachineCountsMatch so connection-error retries are bounded:
introduce a small maxConnRetries constant (e.g. maxConnRetries := 3) and a
connRetry counter local to the loop, increment connRetry on isConnErr and only
retry (sleep and continue) while connRetry < maxConnRetries; if connRetry >=
maxConnRetries, break or return the connection error so the outer Eventually can
observe the timeout. Ensure you reference and update the loop variables around
the existing i, loopCount, isConnErr and the
mcnAndNodeAnnotationMachineCountsMatch call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0805a147-5b9a-4f6f-ba78-916becb1a05f
📒 Files selected for processing (1)
test/extended/image_mode_status_reporting.go
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3 |
|
@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a1353c50-5937-11f1-9be3-d05ca575d0d2-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fcc62570-593c-11f1-8f96-b3924dacb20b-0 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3 |
|
@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/05818f10-593d-11f1-808d-7aea0549d0ca-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 5 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4e0bf0e0-59c9-11f1-850e-d81c4de91830-0 |
|
@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/54984580-59c9-11f1-9d81-62e2291da5e5-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 5 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/08fca9a0-59cd-11f1-9a35-17af151ff2db-0 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3 |
|
@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0ca195c0-59cd-11f1-9abe-6df7e55a3944-0 |
672f373 to
cbfb92a
Compare
|
@isabella-janssen: This pull request references Jira Issue OCPBUGS-86179, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by @isabella-janssen See that the ImageModeStatusReporting tests are passing on SNO in #6089 (comment) and the corresponding tests are still passing in the standard disruptive suite. |
|
@isabella-janssen: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
cbfb92a to
541a3c9
Compare
|
Actionable comments posted: 0 |
|
@isabella-janssen: This pull request references Jira Issue OCPBUGS-86179, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@isabella-janssen: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Closes: OCPBUGS-86179
- What I did
This updates the MCP machine count tests for the
ImageModeStatusReportingFeatureGate to pass in the MCO's SNO suite.- How to verify it
The
MachineConfigPool machine counts should transition when OCB is enabled in a default MCPandMachineConfigPool machine counts should transition correctly on an update in a default MCPtests should continue passing in the MCO's disruptive test suites and should pass in the new SNO suite.- Description for the changelog
OCPBUGS-86179: Update ImageModeStatusReporting MCP machine count tests to be resilient on SNO topology
Summary by CodeRabbit