[release-4.21] Backport Gateway API upgrade test#31232
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: gcs278. |
Update gatewayAPIController tests to skip certain tests with OLM dependencies when the GatewayAPIWithoutOLM FeatureGate is enabled. This will unblock openshift/cluster-ingress-operator#1354 by skipping any tests that require OLM capabilities, without causing failures in origin testing. JIRA link: https://redhat.atlassian.net/browse/NE-2292
This includes 5 unique tests that will be used to graduate the feature gate from Tech Preview to GA. Some tests are common to both gatewayAPIController and GatewayAPIWithoutOLM. JIRA: https://redhat.atlassian.net/browse/NE-2292
Dual-stack support for Gateway API is not yet declared, so skip the GatewayAPIController tests on AWS dual-stack clusters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add support for running Gateway API e2e tests on vSphere and baremetal while gracefully handling missing LoadBalancer and DNS capabilities.
On vSphere/baremetal without LoadBalancer/DNS:
- Tests GatewayClass, Gateway, HTTPRoute creation/attachment
- Tests OSSM/Istio integration
- Skips LoadBalancer service validation
- Skips DNS record validation
- Skips HTTP connectivity tests
https://redhat.atlassian.net/browse/NE-2286
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed multiple instances where o.Expect(err).NotTo(o.HaveOccurred()) was used inside wait.PollUntilContextTimeout loops, causing tests to fail immediately instead of retrying when resources were not found. The pattern now matches the existing Subscription check which correctly handles errors by logging and returning false to retry. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
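To illustrate the retry pattern described in this commit, here is a minimal, standard-library-only sketch. `pollUntil` is a simplified, hypothetical stand-in for `wait.PollUntilContextTimeout` (the real helper takes additional arguments), and `errNotFound`, `waitForResource`, and `fakeGetter` are names invented for the example. The point is only that the condition logs the error and returns `false, nil` so the loop retries, rather than asserting inside the loop.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotFound stands in for a Kubernetes "not found" API error (assumption).
var errNotFound = errors.New("not found")

// conditionFunc mirrors the usual poll contract: (true, nil) stops with
// success, (false, nil) retries, and a non-nil error aborts the poll.
type conditionFunc func(ctx context.Context) (bool, error)

// pollUntil is a simplified, hypothetical stand-in for a polling helper.
func pollUntil(ctx context.Context, interval, timeout time.Duration, cond conditionFunc) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	for {
		done, err := cond(ctx)
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
}

// waitForResource retries while get fails and returns the number of attempts.
func waitForResource(get func() error) (int, error) {
	attempts := 0
	err := pollUntil(context.Background(), time.Millisecond, time.Second,
		func(ctx context.Context) (bool, error) {
			attempts++
			if err := get(); err != nil {
				// Log and retry instead of asserting, so a resource that
				// is not there yet does not fail the test immediately.
				fmt.Printf("attempt %d: %v, retrying\n", attempts, err)
				return false, nil
			}
			return true, nil
		})
	return attempts, err
}

// fakeGetter returns a getter that reports errNotFound until the nth call.
func fakeGetter(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls < n {
			return errNotFound
		}
		return nil
	}
}

func main() {
	attempts, err := waitForResource(fakeGetter(3))
	fmt.Println("attempts:", attempts, "err:", err)
}
```

An assertion placed where the `fmt.Printf` call sits would end the test on the first miss, which is the failure mode this commit removes.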
Gateway API tests fail on IPv6 and dual-stack clusters, particularly on baremetal platforms where catalog sources are typically disabled. This prevents OSSM operator installation via OLM.
Replace the AWS-specific dual-stack check with platform-agnostic detection that checks the cluster's ServiceNetwork CIDRs for IPv6 addressing. This will skip Gateway API tests on:
- Baremetal/vSphere/EquinixMetal IPv6 or dual-stack clusters
- AWS dual-stack clusters
- Any other platform with IPv6 networking
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
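The CIDR-based detection can be sketched with the standard library alone. `hasIPv6CIDR` is a hypothetical helper name, and the example assumes the ServiceNetwork CIDRs have already been read from the cluster as plain strings; it is not the code in this PR.

```go
package main

import (
	"fmt"
	"net"
)

// hasIPv6CIDR reports whether any CIDR in the list is an IPv6 range.
// Hypothetical helper: the input is assumed to be the cluster's
// ServiceNetwork CIDRs as strings.
func hasIPv6CIDR(cidrs []string) (bool, error) {
	for _, c := range cidrs {
		ip, _, err := net.ParseCIDR(c)
		if err != nil {
			return false, fmt.Errorf("parsing %q: %w", c, err)
		}
		// To4 returns nil when the address is not an IPv4 address.
		if ip.To4() == nil {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, cidrs := range [][]string{
		{"172.30.0.0/16"},               // IPv4 single-stack
		{"172.30.0.0/16", "fd02::/112"}, // dual-stack
		{"fd02::/112"},                  // IPv6 single-stack
	} {
		v6, err := hasIPv6CIDR(cidrs)
		fmt.Println(cidrs, "ipv6:", v6, "err:", err)
	}
}
```

Because the check looks only at address families, it treats dual-stack and IPv6 single-stack clusters the same way on any platform.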
Add upgrade test validating Gateway API migration from OLM-based Istio to CIO-managed Sail Library during 4.21 to 4.22 upgrades. Setup creates Gateway/HTTPRoute with OLM provisioning and tests connectivity. Test validates migration: Gateway remains programmed, Istiod running, Istio CRDs stay OLM-managed, GatewayClass has CIO finalizer, Istio CR deleted, subscription persists. Teardown cleans up all resources.
…ip logic The Gateway API upgrade test was calling g.Skip() from Setup(), which runs inside a goroutine managed by the disruption framework. Since g.Skip() panics and Ginkgo can only recover panics inside leaf nodes, this caused unrecoverable panics on IPv6/dual-stack, OKD, and unsupported platform clusters. Implement the upgrades.Skippable interface with a Skip() method that the disruption framework calls before Setup, avoiding the goroutine panic. Refactor checkPlatformSupportAndGetCapabilities into shouldSkipGatewayAPITests (safe outside Ginkgo nodes) and getPlatformCapabilities (returns LB/DNS support). https://redhat.atlassian.net/browse/OCPBUGS-83267 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
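A rough sketch of the ordering this commit relies on: the framework asks the test whether to skip before it starts Setup in a goroutine, so the skip path is never reached from inside that goroutine. The interfaces below are local stand-ins written for this example; the real `upgrades.Skippable` interface and the disruption framework's runner may have different signatures, and `gatewayUpgradeTest` and `run` are hypothetical.

```go
package main

import "fmt"

// upgradeTest and skippable are local stand-ins (assumptions) for the
// framework's test and skip interfaces.
type upgradeTest interface {
	Name() string
	Setup()
}

type skippable interface {
	Skip() bool
}

// gatewayUpgradeTest is a hypothetical test that skips on IPv6 clusters.
type gatewayUpgradeTest struct {
	ipv6     bool
	setupRan bool
}

func (t *gatewayUpgradeTest) Name() string { return "gateway-api-upgrade" }
func (t *gatewayUpgradeTest) Setup()       { t.setupRan = true }

// Skip only returns a decision; it never panics, so it is safe to call
// outside a Ginkgo leaf node.
func (t *gatewayUpgradeTest) Skip() bool { return t.ipv6 }

// run consults Skip first and only then launches Setup in a goroutine.
func run(t upgradeTest) string {
	if s, ok := t.(skippable); ok && s.Skip() {
		return "skipped"
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		t.Setup()
	}()
	<-done
	return "ran"
}

func main() {
	fmt.Println(run(&gatewayUpgradeTest{ipv6: true}))  // skipped
	fmt.Println(run(&gatewayUpgradeTest{ipv6: false})) // ran
}
```

Returning a boolean keeps the skip decision as ordinary control flow, which is why it avoids the panic that `g.Skip()` raises when called from Setup.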
The Gateway API controller tests tracked Gateways in a shared in-memory gateways slice, deleting them during AfterEach cleanup. However, openshift-tests distributes tests across separate parallel worker processes. The annotation-based checkAllTestsDone coordination works correctly because annotations are stored on the cluster-scoped GatewayClass, but the gateways slice is not shared across processes. The process that runs the final AfterEach cleanup has an empty gateways slice, so it deletes the GatewayClass and istiod but never deletes the Gateways created by other processes. This leaves gateway deployments orphaned on the cluster.
As a secondary issue, even when gateways were deleted, the GatewayClass and istiod were removed without waiting for the gateway proxy deployments to be fully cleaned up by GC. Since the deployments have an owner reference to the Gateway (not a finalizer), the cascade deletion is asynchronous, creating a race where gateway pods lose their control plane and crash-loop.
Fix both issues by cleaning up gateways at the individual test level using defer deleteGateway, which deletes the Gateway and waits for its proxy deployment to be removed by GC. Add deleteGateway and waitForGatewayDeploymentDeletion helpers shared by both the controller tests and the upgrade test Teardown. Cleanup errors now hard fail to surface leftover resources immediately rather than causing confusing downstream test failures.
https://redhat.atlassian.net/browse/OCPBUGS-83281
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Grant Spence <gspence@redhat.com>
Co-Authored-By: Ishmam Amin <iamin@redhat.com>
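The per-test cleanup described above could look roughly like the following. This is a sketch against an in-memory `fakeCluster` invented for the example, not the helpers from this PR: the real `deleteGateway` and `waitForGatewayDeploymentDeletion` talk to the API server, and their signatures are not shown here. The fake simulates asynchronous GC by keeping the deployment around for a few polls after its owning Gateway is deleted.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fakeCluster is an in-memory stand-in for the API server (assumption).
type fakeCluster struct {
	gateways    map[string]bool
	deployments map[string]int // polls left before GC removes the deployment
}

func newFakeCluster() *fakeCluster {
	return &fakeCluster{gateways: map[string]bool{}, deployments: map[string]int{}}
}

// deploymentExists simulates asynchronous GC: once the owning Gateway is
// gone, the deployment lingers for a few more polls before disappearing.
func (c *fakeCluster) deploymentExists(name string) bool {
	if c.gateways[name] {
		return true
	}
	left, ok := c.deployments[name]
	if !ok {
		return false
	}
	if left == 0 {
		delete(c.deployments, name)
		return false
	}
	c.deployments[name] = left - 1
	return true
}

// deleteGateway deletes the Gateway, then waits until its proxy deployment
// has been garbage-collected or the context expires.
func deleteGateway(ctx context.Context, c *fakeCluster, name string) error {
	delete(c.gateways, name)
	for c.deploymentExists(name) {
		select {
		case <-ctx.Done():
			return fmt.Errorf("deployment for gateway %q still present: %w", name, ctx.Err())
		case <-time.After(time.Millisecond):
		}
	}
	return nil
}

// runTest creates a Gateway and defers its cleanup, so the process that
// created the Gateway is also the one that removes it.
func runTest(c *fakeCluster, name string) (err error) {
	c.gateways[name] = true
	c.deployments[name] = 2
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	defer func() {
		// A cleanup failure fails the test instead of being swallowed.
		if derr := deleteGateway(ctx, c, name); derr != nil && err == nil {
			err = derr
		}
	}()
	// ... test body would go here ...
	return nil
}

func main() {
	c := newFakeCluster()
	fmt.Println("err:", runTest(c, "test-gateway"), "leftover deployments:", len(c.deployments))
}
```

Tying cleanup to the test that created the Gateway removes the dependency on state shared across worker processes, and waiting for the deployment to disappear closes the race with removing the control plane.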
6bca8a5 to 6e20b57
The GatewayAPIWithoutOLM feature gate is not available on this release branch. Hardcode isNoOLMFeatureGateEnabled to return false so tests run correctly with OLM-based provisioning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add pre/post upgrade check on the Subscription's status.installedCSV to detect when the OSSM operator InstallPlan approval flow fails. This catches the case where OLM generates an InstallPlan for the channel head (e.g. v3.3.3) but the CIO only approves plans matching its pinned version (e.g. v3.2.0), leaving the operator stuck on the old version. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6e20b57 to e14c7a9
|
Upgrades are actually not working for OLM (this only affects 4.20->4.21 right now). https://redhat.atlassian.net/browse/OCPBUGS-86778 Let's test this via: |
|
@gcs278: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/payload-job periodic-ci-openshift-release-main-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade |
|
@gcs278: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/71de0a60-5b7e-11f1-9d30-2d7158c6abc3-0 |
Summary
isNoOLMFeatureGateEnabled to always return false since the GatewayAPIWithoutOLM feature gate has not been backported to openshift/api yet
Cherry-picked commits
- cf1f8260f2 NE-2292: Add Gateway API OLM to NO-OLM migration upgrade test (NE-2561: Add Gateway API OLM to NO-OLM migration upgrade test #30897)
- 8ef51c3945 OCPBUGS-83267: Use upgrades.Skippable for Gateway API upgrade test skip logic
- 3f8a12d619 OCPBUGS-83281: Fix Gateway cleanup in parallel e2e test workers
- e29073f79d fail the test if it fails on test cleanup
- ca41c3642d update the OLM resources cleanup for non-OLM clusters
- isNoOLMFeatureGateEnabled to return false until feature gate is backported
Dependencies
Test plan
go build ./test/extended/router/ ./test/e2e/upgrade/)
🤖 Generated with Claude Code