
OCPBUGS-86179: Update ImageModeStatusReporting MCP machine count tests to be resilient on SNO topology #6089

Open: isabella-janssen wants to merge 1 commit into openshift:main from isabella-janssen:ocpbugs-86179

@isabella-janssen
Member

@isabella-janssen isabella-janssen commented May 26, 2026

Closes: OCPBUGS-86179

- What I did
This updates the MCP machine count tests for the ImageModeStatusReporting FeatureGate to pass in the MCO's SNO suite.

- How to verify it
The "MachineConfigPool machine counts should transition when OCB is enabled in a default MCP" and "MachineConfigPool machine counts should transition correctly on an update in a default MCP" tests should continue to pass in the MCO's disruptive test suites and should pass in the new SNO suite.

- Description for the changelog
OCPBUGS-86179: Update ImageModeStatusReporting MCP machine count tests to be resilient on SNO topology

Summary by CodeRabbit

  • Tests
    • Improved retry behavior and extended timeouts for machine-count validation during updates and cleanup, increasing resilience for single-node and layered image deployments.
    • Added detection and special handling of transient connection-refused errors so temporary connectivity issues trigger retries rather than validation failures.
    • Replaced hard fetch assertions with clearer success/failure signaling and adjusted logging wording for clarity.
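The extended timeouts mentioned above can be pictured as a small helper that chooses a longer wait for layered (on-cluster image build) updates, where an image must be built before nodes roll. This is a minimal sketch, not the code from the PR: `pollConfig`, `pollConfigFor`, and every duration below are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"time"
)

// pollConfig holds the timeout and polling interval used while waiting for
// MachineConfigPool machine counts to transition. Hypothetical type.
type pollConfig struct {
	Timeout  time.Duration
	Interval time.Duration
}

// pollConfigFor returns a longer timeout and interval for layered updates,
// which typically take longer because an OS image is built before nodes
// update. The durations are illustrative placeholders, not the PR's values.
func pollConfigFor(layered bool) pollConfig {
	if layered {
		return pollConfig{Timeout: 30 * time.Minute, Interval: 30 * time.Second}
	}
	return pollConfig{Timeout: 10 * time.Minute, Interval: 10 * time.Second}
}

func main() {
	for _, layered := range []bool{false, true} {
		cfg := pollConfigFor(layered)
		fmt.Printf("layered=%v timeout=%v interval=%v\n", layered, cfg.Timeout, cfg.Interval)
	}
}
```

Keeping the choice in one helper puts the extra slack for SNO and layered updates in a single place instead of scattering constants across call sites.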

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) May 26, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 26, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented May 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: df8ebad2-5114-4686-a93c-4d5de3e02ffe

📥 Commits

Reviewing files that changed from the base of the PR and between cbfb92a and 541a3c9.

📒 Files selected for processing (1)
  • test/extended/image_mode_status_reporting.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/extended/image_mode_status_reporting.go

Walkthrough

This PR updates test validation logic for MachineConfigPool machine-count transitions: the helper now signals connection-refused errors separately, and the retry orchestrator increases timeouts/intervals for layered updates and retries connection-refused failures without consuming primary retry attempts.

Changes

Machine-count validation resilience for MCP status checks

  • Helper connection-error detection contract (test/extended/image_mode_status_reporting.go): Imports added for errors and syscall. mcnAndNodeAnnotationMachineCountsMatch refactored to return (countsMatch, isConnErr), detects ECONNREFUSED when fetching MCPs or nodes, and signals connection errors instead of failing the test outright.
  • Configurable retry strategy with layered-update backoff (test/extended/image_mode_status_reporting.go): validateMCPMachineCountTransitions uses the two-value helper, introduces loopCount, lengthens timeouts/intervals for layered (on-cluster image) updates, and retries connection-refused errors with computed sleeps without advancing main retry progress.
  • Helper implementation updates and final-result mapping (test/extended/image_mode_status_reporting.go): Helper logic updated to return (false, false) for non-matching counts, (true, false) on success, and (false, true) when a connection-refused error is observed during MCP/node retrieval.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • dkhater-redhat
🚥 Pre-merge checks | ✅ 5 | ❌ 10

❌ Failed checks (10 inconclusive)

Each of the following ten checks was ❓ Inconclusive. Explanation: Repository clone failed, so this custom check could not run with code access. Resolution: Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.

  • Stable And Deterministic Test Names
  • Test Structure And Quality
  • Microshift Test Compatibility
  • Single Node Openshift (Sno) Test Compatibility
  • Topology-Aware Scheduling Compatibility
  • Ote Binary Stdout Contract
  • Ipv6 And Disconnected Network Test Compatibility
  • No-Weak-Crypto
  • Container-Privileges
  • No-Sensitive-Data-In-Logs
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The pull request title clearly summarizes the main change: updating MCP machine count tests for ImageModeStatusReporting to be resilient on SNO topology, which directly matches the core modifications to validateMCPMachineCountTransitions and mcnAndNodeAnnotationMachineCountsMatch.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Tools execution failed with the following error:

Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci Bot commented May 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: isabella-janssen

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) May 26, 2026
@isabella-janssen
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 10

@openshift-ci
Contributor

openshift-ci Bot commented May 26, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/548d2bc0-5936-11f1-9a2b-6dbd41087b03-0


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/image_mode_status_reporting.go`:
- Around line 378-385: The code currently treats any non-ECONNREFUSED mcpErr as
a "count mismatch" by returning (false, false); instead change the non-retryable
branch to fail fast by surfacing the real API error: log a descriptive message
including mcpName and the mcpErr (use logger.Errorf with the error) and
propagate the error to the caller (i.e. change the function signature/return
path to return an error or wrap and return mcpErr rather than hiding it),
leaving the ECONNREFUSED branch to continue retrying; apply the same change to
the corresponding block referenced at lines 393-399 (the branches that examine
mcpErr and compare to syscall.ECONNREFUSED).
- Around line 330-346: The inner retry loop decreases the loop counter (i--) on
connection errors which can pin i and create an infinite loop that defeats the
outer Eventually timeout; update the loop around
mcnAndNodeAnnotationMachineCountsMatch so connection-error retries are bounded:
introduce a small maxConnRetries constant (e.g. maxConnRetries := 3) and a
connRetry counter local to the loop, increment connRetry on isConnErr and only
retry (sleep and continue) while connRetry < maxConnRetries; if connRetry >=
maxConnRetries, break or return the connection error so the outer Eventually can
observe the timeout. Ensure you reference and update the loop variables around
the existing i, loopCount, isConnErr and the
mcnAndNodeAnnotationMachineCountsMatch call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0805a147-5b9a-4f6f-ba78-916becb1a05f

📥 Commits

Reviewing files that changed from the base of the PR and between 78e9e0e and e5d43fa.

📒 Files selected for processing (1)
  • test/extended/image_mode_status_reporting.go

Comment thread test/extended/image_mode_status_reporting.go Outdated
Comment thread test/extended/image_mode_status_reporting.go
@isabella-janssen
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a1353c50-5937-11f1-9be3-d05ca575d0d2-0

@isabella-janssen
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 10

@openshift-ci
Contributor

openshift-ci Bot commented May 26, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fcc62570-593c-11f1-8f96-b3924dacb20b-0

@isabella-janssen
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3

@openshift-ci
Contributor

openshift-ci Bot commented May 26, 2026

@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/05818f10-593d-11f1-808d-7aea0549d0ca-0

@isabella-janssen
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 5

@isabella-janssen
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3

@openshift-ci
Contributor

openshift-ci Bot commented May 27, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4e0bf0e0-59c9-11f1-850e-d81c4de91830-0

@openshift-ci
Contributor

openshift-ci Bot commented May 27, 2026

@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/54984580-59c9-11f1-9d81-62e2291da5e5-0

@isabella-janssen
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 5

@openshift-ci
Contributor

openshift-ci Bot commented May 27, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/08fca9a0-59cd-11f1-9a35-17af151ff2db-0

@isabella-janssen
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3

@openshift-ci
Contributor

openshift-ci Bot commented May 27, 2026

@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0ca195c0-59cd-11f1-9abe-6df7e55a3944-0

@isabella-janssen isabella-janssen changed the title from "(WIP) OCPBUGS-86179" to "OCPBUGS-86179: Update ImageModeStatusReporting MCP machine count tests to be resilient on SNO topology" May 27, 2026
@isabella-janssen isabella-janssen marked this pull request as ready for review May 27, 2026 20:40
@openshift-ci-robot openshift-ci-robot added the jira/severity-important (Referenced Jira bug's severity is important for the branch this PR is targeting.), jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.), and jira/valid-bug (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.) labels May 27, 2026
@openshift-ci-robot
Contributor

@isabella-janssen: This pull request references Jira Issue OCPBUGS-86179, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) May 27, 2026
@isabella-janssen
Member Author

/verified by @isabella-janssen

See that the ImageModeStatusReporting tests are passing on SNO in #6089 (comment) and the corresponding tests are still passing in the standard disruptive suite.

@openshift-ci-robot openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) May 27, 2026
@openshift-ci-robot
Contributor

@isabella-janssen: This PR has been marked as verified by @isabella-janssen.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@openshift-ci-robot
Contributor

@isabella-janssen: This pull request references Jira Issue OCPBUGS-86179, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci Bot commented May 28, 2026

@isabella-janssen: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • verified: Signifies that the PR passed pre-merge verification criteria.
