USHIFT-7223: Automate Polarion tests OCP-66820 and OCP-66882#6905

Open
agullon wants to merge 5 commits into openshift:main from agullon:USHIFT-7223

Conversation

agullon commented Jun 18, 2026

Contributor

Summary

  • Add Robot Framework tests for two prerun data management scenarios in test/suites/backup/prerun-data-management.robot
  • OCP-66820: Verifies that setting the version file to a version 3 minors behind the executable (exceeding MAX_VERSION_SKEW=2) causes MicroShift to fail to start and log the failure reason to prerun_failed.log, which greenboot healthcheck then reports
  • OCP-66882: Verifies that removing /var/lib/microshift while health.json shows healthy causes MicroShift to start fresh, ignoring stale health info
  • Both tests added to existing CI scenarios (el98-src@backup-and-restore-on-reboot.sh and el98-lrel@backups.sh)
  • Increase Reboot MicroShift Host timeout from 5m to 7m to accommodate ARM CI
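
The OCP-66882 expectation in the list above can be sketched as a small decision function. This is an illustrative Python model only, not MicroShift's prerun code: the function names, the `health` JSON key, and the paths passed in are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def read_health(health_file: Path) -> Optional[str]:
    """Return the recorded health string, or None if it cannot be read.

    The "health" key is an assumption about the file layout.
    """
    try:
        return json.loads(health_file.read_text()).get("health")
    except (OSError, ValueError, AttributeError):
        return None


def plan_startup(data_dir: Path, health_file: Path) -> Tuple[bool, Optional[str]]:
    """Return (start_fresh, health_to_use) for a hypothetical prerun step."""
    if not data_dir.is_dir():
        # The data directory is gone, so any recorded health (even
        # "healthy") describes data that no longer exists: ignore it.
        return True, None
    # Data exists: no fresh start, recorded health is still relevant.
    return False, read_health(health_file)
```

With a `health.json` that reports healthy and no data directory, `plan_startup` returns `(True, None)`, which is the "start fresh, ignore stale health info" outcome the test asserts through journal messages.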

ARM reboot timeout fix

The new prerun-data-management tests involve multiple reboots, including a destructive
cycle that removes /var/lib/microshift and triggers a full MicroShift bootstrap from
scratch (certificate generation, etcd initialization, etc.).

On ARM (Graviton) CI instances, reboot cycles are significantly slower than x86:

  • SSH becomes available in ~60-120s on ARM vs ~30-50s on x86
  • A fresh MicroShift bootstrap after data removal can push the total reboot cycle to ~400s

The previous 5m (300s) timeout in Reboot MicroShift Host was insufficient for ARM,
causing test failures exclusively on ARM jobs while identical x86 runs passed. Bumping
to 7m (420s) covers the ~400s worst case seen on ARM without masking genuine hangs.

| Metric | x86 | ARM |
| --- | --- | --- |
| Typical reboot (SSH up) | ~30-50s | ~60-120s |
| Fresh bootstrap reboot | ~70-80s | up to ~400s |
| Previous timeout | 300s | 300s |
| New timeout | 420s | 420s |
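
The headroom implied by these figures can be checked with a few lines of Python. This is a minimal sketch that takes the upper bounds of the approximate fresh-bootstrap numbers above as worst cases; the variable and function names are illustrative.

```python
PREVIOUS_TIMEOUT_S = 5 * 60  # 300s
NEW_TIMEOUT_S = 7 * 60       # 420s

# Upper bounds of the approximate fresh-bootstrap reboot times quoted above.
WORST_CASE_REBOOT_S = {"x86": 80, "ARM": 400}


def headroom(timeout_s: int, worst_case_s: int) -> int:
    """Seconds left before the timeout fires; negative means a timeout."""
    return timeout_s - worst_case_s


for arch, worst in WORST_CASE_REBOOT_S.items():
    print(arch, headroom(PREVIOUS_TIMEOUT_S, worst), headroom(NEW_TIMEOUT_S, worst))
# x86 220 340
# ARM -100 20
```

Under these assumptions the old timeout falls about 100s short on ARM, while the new one leaves roughly 20s of margin.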

Test plan

  • Run el98-src@backup-and-restore-on-reboot presubmit scenario — verify both existing and new tests pass
  • Run el98-lrel@backups release scenario — verify both existing and new tests pass
  • Verify OCP-66820 test correctly detects version mismatch failure in prerun_failed.log
  • Verify OCP-66882 test correctly detects fresh start journal messages after data removal
  • Verify ARM jobs pass with the increased reboot timeout

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Expanded automated coverage for backup/restore-on-reboot and release backup flows by running an additional prerun data management Robot suite alongside existing backup-related scenarios.
    • Added end-to-end validations for prerun behavior, including version compatibility handling with healthcheck completion and journal/log assertions, plus “fresh start” behavior when the MicroShift data directory is removed.
  • Chores
    • Increased the post-reboot wait time for the MicroShift host to become SSH-reachable (up to 7 minutes) to improve test reliability.

Add Robot Framework tests for prerun data management scenarios:

- OCP-66820: Verify that when the version file indicates a version
  3 minors behind the executable (exceeding MAX_VERSION_SKEW=2),
  MicroShift fails to start and logs the failure reason to
  prerun_failed.log, which is then reported by greenboot healthcheck.

- OCP-66882: Verify that when the MicroShift data directory is removed
  but health.json shows healthy status, MicroShift starts fresh as if
  it were the first run, ignoring the stale health info.
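
The version-skew condition behind OCP-66820 can be illustrated with a short Python sketch. It models only the rule stated above (data more than MAX_VERSION_SKEW minors behind the executable); the function names and the example version strings are hypothetical, and this is not MicroShift's own check.

```python
from typing import Tuple

MAX_VERSION_SKEW = 2  # value quoted in the commit message above


def major_minor(version: str) -> Tuple[int, int]:
    """Parse 'X.Y' or 'X.Y.Z' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def skew_exceeded(data_version: str, exec_version: str,
                  max_skew: int = MAX_VERSION_SKEW) -> bool:
    """True if the data is more than max_skew minors behind the executable."""
    data_major, data_minor = major_minor(data_version)
    exec_major, exec_minor = major_minor(exec_version)
    if data_major != exec_major:
        # Assumption: this sketch only handles versions with the same major.
        raise ValueError("sketch only compares versions with the same major")
    return exec_minor - data_minor > max_skew


print(skew_exceeded("4.17.0", "4.20.0"))  # True: 3 minors behind
print(skew_exceeded("4.18.0", "4.20.0"))  # False: exactly 2 minors behind
```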

Both tests are added to existing CI scenarios:
- Presubmit: el98-src@backup-and-restore-on-reboot.sh
- Release: el98-lrel@backups.sh

Ref: USHIFT-7223

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pre-commit.check-secrets: ENABLED
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 18, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 18, 2026


@agullon: This pull request references USHIFT-7223 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.


@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f60865b3-5cec-4f40-b95b-cb7161887e06

📥 Commits

Reviewing files that changed from the base of the PR and between 8be3c4c and 5cc97a2.

📒 Files selected for processing (1)
  • test/resources/microshift-host.resource

Walkthrough

A new Robot Framework suite adds prerun data management coverage for version-file rollback and missing-data-directory scenarios. Two scenario scripts include the suite in test runs, and reboot waiting is extended for host recovery.

Changes

Prerun Data Management Tests

| Layer | File(s) | Summary |
| --- | --- | --- |
| Scenario wiring | test/scenarios/presubmits/el98-src@backup-and-restore-on-reboot.sh, test/scenarios/releases/el98-lrel@backups.sh | Both scenario scripts add suites/backup/prerun-data-management.robot to the run_tests host1 invocation. |
| Suite foundation | test/suites/backup/prerun-data-management.robot | The new suite defines metadata, variables, and suite-level login and logout keywords. |
| Version-file prerun checks | test/suites/backup/prerun-data-management.robot | The suite adds the version rollback test and helper keywords for version backup and restore, greenboot state and journal checks, and prerun_failed.log validation. |
| Missing-data prerun checks | test/suites/backup/prerun-data-management.robot, test/resources/microshift-host.resource | The suite adds the missing-data-directory test and related helpers, and the shared reboot keyword waits longer for SSH recovery after reboot. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • pmtk
  • pacevedom
🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: automating the two Polarion tests referenced in the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Added Robot test case names are static and descriptive; no dynamic IDs, dates, IPs, or generated values appear in any test title. |
| Test Structure And Quality | ✅ Passed | PASS: New Robot tests are single-purpose, use suite/test teardown for cleanup, and all reboot/wait paths have explicit timeouts; patterns match existing suites. |
| Microshift Test Compatibility | ✅ Passed | PASS: The added Robot tests only use host/system commands and files; no unsupported OpenShift APIs, namespaces, or multi-node assumptions were found. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds Robot Framework tests and scenario wiring only; no Ginkgo e2e tests or SNO-specific multi-node assumptions are present. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Only test/scenario Robot files and a reboot timeout changed; no manifests, controllers, or scheduling constraints were introduced. |
| Ote Binary Stdout Contract | ✅ Passed | PR only changes Robot/scenario shell files; no Go main/TestMain/init/RunSpecs stdout writes were added. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The new coverage is Robot Framework, not Ginkgo, and it uses only local host/SSH and journal checks with no IPv4 literals or public internet access. |
| No-Weak-Crypto | ✅ Passed | Diff only adds Robot tests and scenario wiring; no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons found. |
| Container-Privileges | ✅ Passed | Touched files are shell/Robot tests only; no privileged, hostPID/Network/IPC, SYS_ADMIN, or allowPrivilegeEscalation settings appear. |
| No-Sensitive-Data-In-Logs | ✅ Passed | PASS: The new Robot tests only assert on journal/file contents; no passwords, tokens, PII, or session IDs are logged in the added changes. |


@openshift-ci openshift-ci Bot requested review from ggiguash and jogeo June 18, 2026 14:40
@openshift-ci

openshift-ci Bot commented Jun 18, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: agullon


@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2026
@agullon

agullon commented Jun 19, 2026

Contributor Author

/retest

agullon added 2 commits June 19, 2026 11:00
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pre-commit.check-secrets: ENABLED
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pre-commit.check-secrets: ENABLED
@agullon

agullon commented Jun 22, 2026

Contributor Author

/retest

- Add Make New SSH Connection to Greenboot Health Check Should Be
  Finished keyword to survive greenboot-triggered reboots during
  the retry loop (SSH connection dies on each reboot)
- Add initial reboot to OCP-66882 test to ensure a backup exists
  before testing fresh start behavior, matching Polarion step 1
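
The reconnect-inside-the-retry-loop idea from the first item can be sketched generically in Python. This is not the Robot Framework keyword itself: `connect`, `check`, and the function name are hypothetical stand-ins for opening a new SSH connection and querying greenboot status.

```python
import time
from typing import Callable


def wait_with_reconnect(connect: Callable[[], object],
                        check: Callable[[object], bool],
                        attempts: int,
                        delay_s: float,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry check() with a fresh connection on every attempt.

    Opening the connection inside the loop means a reboot that kills the
    previous connection only costs one attempt instead of the whole wait.
    """
    for attempt in range(attempts):
        try:
            conn = connect()  # new connection each time
            if check(conn):
                return True
        except ConnectionError:
            # Host is probably rebooting; try again after the delay.
            pass
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A caller would pass a function that opens the connection and one that returns whether the health check has finished; the `sleep` parameter exists only so the delay can be stubbed out in tests.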

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pre-commit.check-secrets: ENABLED
@agullon

agullon commented Jun 23, 2026

Contributor Author

/retest

@agullon

agullon commented Jun 24, 2026

Contributor Author

/retest

ARM (Graviton) CI instances take significantly longer to complete
reboot cycles compared to x86. SSH typically becomes available in
~60-120s on ARM vs ~30-50s on x86. When tests involve destructive
operations like removing /var/lib/microshift followed by a fresh
bootstrap (certificate generation, etcd initialization), the total
reboot cycle on ARM can reach ~400s, exceeding the previous 5m
(300s) timeout.

This was observed in PR openshift#6905 where the new prerun-data-management
tests triggered reboot timeouts exclusively on ARM jobs, while
identical x86 runs passed comfortably.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pre-commit.check-secrets: ENABLED
@agullon

agullon commented Jun 24, 2026

Contributor Author

/test e2e-aws-tests-release
/test e2e-aws-tests-release-arm

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Contributor

@agullon: The following tests failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-tests-bootc-release-el10 | 5cc97a2 | link | true | /test e2e-aws-tests-bootc-release-el10 |
| ci/prow/e2e-aws-tests-bootc-release-arm-el10 | 5cc97a2 | link | true | /test e2e-aws-tests-bootc-release-arm-el10 |
| ci/prow/e2e-aws-tests | 5cc97a2 | link | true | /test e2e-aws-tests |
| ci/prow/e2e-aws-tests-bootc-release-el9 | 5cc97a2 | link | true | /test e2e-aws-tests-bootc-release-el9 |
| ci/prow/e2e-aws-tests-release-arm | 5cc97a2 | link | true | /test e2e-aws-tests-release-arm |
| ci/prow/e2e-aws-tests-release | 5cc97a2 | link | true | /test e2e-aws-tests-release |
