USHIFT-6438: Greenboot healthcheck service restart replacement#6074
Conversation
|
@ggiguash: This pull request references USHIFT-6438 which is a valid jira issue. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
/test ? |
|
/test e2e-aws-tests-bootc-upstream |
|
/test e2e-aws-tests |
|
/test e2e-aws-tests |
|
/retest |
1 similar comment
|
/retest |
|
/test e2e-aws-tests |
1 similar comment
|
/test e2e-aws-tests |
4f65a08 to
bfa51bf
Compare
|
/test e2e-aws-tests-bootc-upstream |
|
/retest-required |
|
/test e2e-aws-tests-bootc |
1 similar comment
|
/test e2e-aws-tests-bootc |
|
/test e2e-aws-tests-bootc |
|
/test e2e-aws-tests-bootc-upstream |
|
@ggiguash: This pull request references USHIFT-6438 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
test/resources/microshift-host.resource (1)
127-139: Prefer${FALSE}over bareFalsefor idiomatic RF.Line 135 returns the string
"False".Should Be Truehappens to evaluate it correctly as PythonFalse, so behavior is correct — but bare string literals for booleans are non-idiomatic in Robot Framework.♻️ Suggested fix
- RETURN False ${old_bootid} + RETURN ${FALSE} ${old_bootid}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/resources/microshift-host.resource` around lines 127 - 139, The keyword "Is System Rebooted" returns a bare Python-like literal "False" on the empty-bootid branch; change that to Robot's boolean variable ${FALSE} so the return uses idiomatic RF booleans. Locate the IF branch in "Is System Rebooted" (the line that currently does RETURN False ${old_bootid}) and replace the bare False with ${FALSE}; keep the rest of the return values and the existing "Get Current Boot Id" usage intact.test/suites/tuned/workload-partitioning.robot (1)
258-261: Stale keyword name after Greenboot replacement.The keyword is still named
Setup Suite And Wait For Greenbootbut now callsWait For MicroShift Healthcheck Success. The suite-level reference on line 11 (Suite Setup Setup Suite And Wait For Greenboot) will also keep propagating the misleading name.♻️ Suggested rename
-Setup Suite And Wait For Greenboot - [Documentation] Run setup suit and wait for greenboot to become ready +Setup Suite And Wait For Healthcheck + [Documentation] Run suite setup and wait for MicroShift healthcheck to succeed Setup Suite Wait For MicroShift Healthcheck SuccessAnd update line 11:
-Suite Setup Setup Suite And Wait For Greenboot +Suite Setup Setup Suite And Wait For Healthcheck🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/suites/tuned/workload-partitioning.robot` around lines 258 - 261, Rename the Robot Framework keyword "Setup Suite And Wait For Greenboot" to a name that reflects its new behavior (e.g., "Setup Suite And Wait For MicroShift Healthcheck") and update the Suite Setup reference that currently calls "Setup Suite And Wait For Greenboot" to the new name; locate the keyword definition block containing "Setup Suite And Wait For Greenboot" and the top-level "Suite Setup Setup Suite And Wait For Greenboot" reference and replace both occurrences so the keyword name matches the called keyword "Wait For MicroShift Healthcheck Success".test/resources/ostree-health.resource (1)
29-29: Nit: missing=on${rc}assignment — inconsistent with codebase style.Every other
Execute Commandmulti-return in the repo uses${rc}=. Robot Framework works either way, but the inconsistency stands out.✏️ Proposed fix
- ${stdout} ${stderr} ${rc} Execute Command + ${stdout} ${stderr} ${rc}= Execute Command🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/resources/ostree-health.resource` at line 29, Update the Robot Framework multi-return line so the return variable for the exit code follows the repo style: change the `Execute Command` call that currently uses `${stdout} ${stderr} ${rc} Execute Command` to assign the exit code with `=${rc}` (i.e., `${stdout} ${stderr} ${rc}=`) so it reads `${stdout} ${stderr} ${rc}=` followed by `Execute Command`; this keeps `${stdout}`, `${stderr}`, `${rc}` and the `Execute Command` invocation consistent with the rest of the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/resources/libostree.py`:
- Around line 141-147: get_last_reboots_count unpacks remote_sudo() as a tuple
but remote_sudo returns a str causing ValueError or wrong results; fix by
calling stdout = remote_sudo("last reboot | grep -c '^reboot' || true") (no
unpack), then return int(stdout.strip()) to handle trailing newlines/spaces;
update the get_last_reboots_count function accordingly and keep reference to
remote_sudo and get_last_reboots_count.
---
Nitpick comments:
In `@test/resources/microshift-host.resource`:
- Around line 127-139: The keyword "Is System Rebooted" returns a bare
Python-like literal "False" on the empty-bootid branch; change that to Robot's
boolean variable ${FALSE} so the return uses idiomatic RF booleans. Locate the
IF branch in "Is System Rebooted" (the line that currently does RETURN False
${old_bootid}) and replace the bare False with ${FALSE}; keep the rest of the
return values and the existing "Get Current Boot Id" usage intact.
In `@test/resources/ostree-health.resource`:
- Line 29: Update the Robot Framework multi-return line so the return variable
for the exit code follows the repo style: change the `Execute Command` call that
currently uses `${stdout} ${stderr} ${rc} Execute Command` to assign
the exit code with `=${rc}` (i.e., `${stdout} ${stderr} ${rc}=`) so it
reads `${stdout} ${stderr} ${rc}=` followed by `Execute Command`; this
keeps `${stdout}`, `${stderr}`, `${rc}` and the `Execute Command` invocation
consistent with the rest of the codebase.
In `@test/suites/tuned/workload-partitioning.robot`:
- Around line 258-261: Rename the Robot Framework keyword "Setup Suite And Wait
For Greenboot" to a name that reflects its new behavior (e.g., "Setup Suite And
Wait For MicroShift Healthcheck") and update the Suite Setup reference that
currently calls "Setup Suite And Wait For Greenboot" to the new name; locate the
keyword definition block containing "Setup Suite And Wait For Greenboot" and the
top-level "Suite Setup Setup Suite And Wait For Greenboot" reference and
replace both occurrences so the keyword name matches the called keyword "Wait
For MicroShift Healthcheck Success".
|
@ggiguash: This pull request references USHIFT-6438 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@ggiguash: This pull request references USHIFT-6438 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@coderabbitai perform full review |
|
✅ Actions performedFull review triggered. |
|
/test e2e-aws-tests-bootc-periodic |
|
@ggiguash: This pull request references USHIFT-6438 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
test/suites/optional/cert-manager.robot (1)
411-417:⚠️ Potential issue | 🟡 Minor
Cleanup DNS For Testis missing a healthcheck wait afterRestart MicroShift.
Setup DNS For Testnow waits forWait For MicroShift Healthcheck Successpost-restart, but its counterpartCleanup DNS For Testdoes not. Any test or suite-level teardown that runs after this keyword may act against an unhealthy MicroShift.🛠️ Proposed fix
Cleanup DNS For Test [Documentation] Cleanup DNS configuration [Arguments] ${dns_name} Remove Entry From Hosts ${dns_name} Remove Drop In MicroShift Config 20-dns Restart MicroShift + Wait For MicroShift Healthcheck Success🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/suites/optional/cert-manager.robot` around lines 411 - 417, The teardown keyword Cleanup DNS For Test currently restarts MicroShift via the Restart MicroShift keyword but does not wait for readiness; add a call to Wait For MicroShift Healthcheck Success immediately after Restart MicroShift in the Cleanup DNS For Test keyword so teardown mirrors Setup DNS For Test and ensures MicroShift is healthy before subsequent tests or suite teardowns run.test/suites/backup/auto-recovery-extra.robot (1)
66-71:⚠️ Potential issue | 🟡 MinorFix teardown to remove
99-auto-recovery.shinstead of100-auto-recovery.sh.Setup at line 161 installs
/etc/greenboot/red.d/99-auto-recovery.sh, but teardown at line 69 removes/etc/greenboot/red.d/100-auto-recovery.sh. The mismatched filename leaves the installed script on disk after cleanup, which can trigger unintended greenboot auto-recovery on subsequent test runs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/suites/backup/auto-recovery-extra.robot` around lines 66 - 71, Teardown currently removes /etc/greenboot/red.d/100-auto-recovery.sh but the setup installs /etc/greenboot/red.d/99-auto-recovery.sh; update the teardown in the Run Keywords block (the [Teardown] Run Keywords sequence that calls Command Should Work rm -f /etc/greenboot/red.d/100-auto-recovery.sh) to remove /etc/greenboot/red.d/99-auto-recovery.sh instead so the installed script is properly cleaned up.
🧹 Nitpick comments (3)
test/suites/optional/cert-manager.robot (1)
379-388:Verify Cert Manager Kustomization Successalso restarts MicroShift without a healthcheck wait.The log-pattern retry loop confirms the kustomization applied, but doesn't guarantee MicroShift is fully healthy when the keyword returns — inconsistent with the PR's intent to replace all Greenboot gates.
♻️ Proposed fix
Verify Cert Manager Kustomization Success [Documentation] Verify that cert-manager kustomization was successfully applied by checking journalctl logs ${cursor}= Get Journal Cursor Restart MicroShift + Wait For MicroShift Healthcheck Success Pattern Should Appear In Log Output ... ${cursor} ... Applying kustomization at /usr/lib/microshift/manifests.d/060-microshift-cert-manager was successful ... unit=microshift ... retries=30 ... wait=10🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/suites/optional/cert-manager.robot` around lines 379 - 388, The test case "Verify Cert Manager Kustomization Success" restarts MicroShift via the "Restart MicroShift" keyword but never waits for MicroShift to become healthy before checking logs; update the test to wait for a successful health check after restart (either call an existing "Wait For MicroShift Healthy"/"Wait Until MicroShiftReady" keyword or add a retry loop that polls a health endpoint) before running "Pattern Should Appear In Log Output", or modify the "Restart MicroShift" keyword to perform that health wait internally so the subsequent log assertion only runs once MicroShift is confirmed healthy.test/suites/backup/auto-recovery-extra.robot (1)
57-64: Redundant equality assertion on line 62.
System Should Be Rebooted(used insideWait Until Keyword Succeeds) already assertscur_bootid ≠ old_bootidbefore returning. When that succeeds,${new_bootid} ≠ ${cur_bootid}is guaranteed, so the extraShould Not Be Equal As Stringscan never catch anything new.♻️ Proposed simplification
${new_bootid}= Wait Until Keyword Succeeds 5m 15s ... System Should Be Rebooted ${cur_bootid} - Should Not Be Equal As Strings ${cur_bootid} ${new_bootid} # Verify the system is healthy after auto-recovery🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/suites/backup/auto-recovery-extra.robot` around lines 57 - 64, The assertion Should Not Be Equal As Strings ${cur_bootid} ${new_bootid} is redundant because the Wait Until Keyword Succeeds call that runs System Should Be Rebooted already verifies the boot ID changed; remove the redundant Should Not Be Equal As Strings line from the test in auto-recovery-extra.robot so the test relies on System Should Be Rebooted (used inside Wait Until Keyword Succeeds) to validate reboot and then proceeds to Wait Until Greenboot Health Check Exited.test/suites/tuned/microshift-tuned.robot (1)
101-103: Line 103 assertion is redundant.
Is System RebootedreturnsFalseonly when the boot IDs are identical, soShould Be Equal As Strings ${cur_bootid} ${bootid}is a guaranteed no-op whenever line 102 passes.♻️ Simplification
${rebooted} ${cur_bootid}= Is System Rebooted ${bootid} Should Not Be True ${rebooted} - Should Be Equal As Strings ${cur_bootid} ${bootid}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/suites/tuned/microshift-tuned.robot` around lines 101 - 103, The assertion on line using "Should Be Equal As Strings ${cur_bootid} ${bootid}" is redundant because "Is System Rebooted" already returns False only when the boot IDs match; remove the redundant "Should Be Equal As Strings" line (or alternatively remove the "Should Not Be True ${rebooted}" and keep the equality check if you prefer an explicit bootid comparison) and keep the "Is System Rebooted ${bootid}" result assertion via "Should Not Be True ${rebooted}" to simplify the test; refer to the keywords Is System Rebooted, Should Not Be True, and the variables ${rebooted}, ${cur_bootid}, ${bootid} when editing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/suites/tuned/workload-partitioning.robot`:
- Around line 258-261: Update the docstring for the keyword "Setup Suite And
Wait For Healthcheck" to remove the stale reference to "greenboot" and instead
describe that it runs setup and waits for the MicroShift healthcheck to become
ready; ensure the documentation mentions the called keywords "Setup Suite" and
"Wait For MicroShift Healthcheck Success" so it's clear what the keyword does.
---
Outside diff comments:
In `@test/suites/backup/auto-recovery-extra.robot`:
- Around line 66-71: Teardown currently removes
/etc/greenboot/red.d/100-auto-recovery.sh but the setup installs
/etc/greenboot/red.d/99-auto-recovery.sh; update the teardown in the Run
Keywords block (the [Teardown] Run Keywords sequence that calls Command Should
Work rm -f /etc/greenboot/red.d/100-auto-recovery.sh) to remove
/etc/greenboot/red.d/99-auto-recovery.sh instead so the installed script is
properly cleaned up.
In `@test/suites/optional/cert-manager.robot`:
- Around line 411-417: The teardown keyword Cleanup DNS For Test currently
restarts MicroShift via the Restart MicroShift keyword but does not wait for
readiness; add a call to Wait For MicroShift Healthcheck Success immediately
after Restart MicroShift in the Cleanup DNS For Test keyword so teardown mirrors
Setup DNS For Test and ensures MicroShift is healthy before subsequent tests or
suite teardowns run.
---
Nitpick comments:
In `@test/suites/backup/auto-recovery-extra.robot`:
- Around line 57-64: The assertion Should Not Be Equal As Strings ${cur_bootid}
${new_bootid} is redundant because the Wait Until Keyword Succeeds call that
runs System Should Be Rebooted already verifies the boot ID changed; remove the
redundant Should Not Be Equal As Strings line from the test in
auto-recovery-extra.robot so the test relies on System Should Be Rebooted (used
inside Wait Until Keyword Succeeds) to validate reboot and then proceeds to Wait
Until Greenboot Health Check Exited.
In `@test/suites/optional/cert-manager.robot`:
- Around line 379-388: The test case "Verify Cert Manager Kustomization Success"
restarts MicroShift via the "Restart MicroShift" keyword but never waits for
MicroShift to become healthy before checking logs; update the test to wait for a
successful health check after restart (either call an existing "Wait For
MicroShift Healthy"/"Wait Until MicroShiftReady" keyword or add a retry loop
that polls a health endpoint) before running "Pattern Should Appear In Log
Output", or modify the "Restart MicroShift" keyword to perform that health wait
internally so the subsequent log assertion only runs once MicroShift is
confirmed healthy.
In `@test/suites/tuned/microshift-tuned.robot`:
- Around line 101-103: The assertion on line using "Should Be Equal As Strings
${cur_bootid} ${bootid}" is redundant because "Is System Rebooted" already
returns False only when the boot IDs match; remove the redundant "Should Be
Equal As Strings" line (or alternatively remove the "Should Not Be True
${rebooted}" and keep the equality check if you prefer an explicit bootid
comparison) and keep the "Is System Rebooted ${bootid}" result assertion via
"Should Not Be True ${rebooted}" to simplify the test; refer to the keywords Is
System Rebooted, Should Not Be True, and the variables ${rebooted},
${cur_bootid}, ${bootid} when editing.
538d91e to
1913f17
Compare
|
/test e2e-aws-tests-bootc-periodic |
|
@ggiguash: This pull request references USHIFT-6438 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/resources/microshift-host.resource`:
- Around line 50-73: The Robot keyword "Get Last Reboots Count" referenced in
the resource must exist in libostree.py; verify there is a Python function named
get_last_reboots_count (or a method) that returns the reboot count and is
exposed as a Robot keyword (either decorated with `@keyword` or present in a class
registered as a library) and if missing implement it: add def
get_last_reboots_count(self): -> int (or module-level def) that computes/reads
the last reboot count and ensure Robot sees it as "Get Last Reboots Count"
(decorate with robot.api.deco.keyword or adjust class/library export), then run
the grep/search to confirm the symbol exists and update any import/Library
declarations so the resource can use the keyword.
|
/test e2e-aws-tests-bootc-periodic |
pacevedom
left a comment
There was a problem hiding this comment.
/lgtm
One minor nit, but it can be addressed in a different PR.
There was a problem hiding this comment.
Wondering if this should be renamed now, its not strictly about greenboot anymore? More like restarts/self-healing.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ggiguash, pacevedom The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Release jobs are failing due to the OLM manifest problems |
|
@ggiguash: Overrode contexts on behalf of ggiguash: ci/prow/e2e-aws-tests-bootc-release, ci/prow/e2e-aws-tests-bootc-release-arm DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Periodic tests failures are not related to greenboot |
|
/verified by ci |
|
@ggiguash: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@ggiguash: Overrode contexts on behalf of ggiguash: ci/prow/e2e-aws-tests-bootc-periodic, ci/prow/e2e-aws-tests-periodic DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@ggiguash: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary by CodeRabbit
Bug Fixes
Chores
Tests