[release-4.21] Backport Gateway API upgrade test #31232

Draft
gcs278 wants to merge 18 commits into openshift:release-4.21 from gcs278:backport-gwapi-upgrade-test-4.21

@gcs278 gcs278 commented May 29, 2026

Summary

Cherry-picked commits

  • cf1f8260f2 NE-2292: Add Gateway API OLM to NO-OLM migration upgrade test (NE-2561: Add Gateway API OLM to NO-OLM migration upgrade test #30897)
  • 8ef51c3945 OCPBUGS-83267: Use upgrades.Skippable for Gateway API upgrade test skip logic
  • 3f8a12d619 OCPBUGS-83281: Fix Gateway cleanup in parallel e2e test workers
  • e29073f79d fail the test if it fails on test cleanup
  • ca41c3642d update the OLM resources cleanup for non-OLM clusters
  • Stub commit: hardcode isNoOLMFeatureGateEnabled to return false until feature gate is backported

Dependencies

Test plan

  • Tests compile (go build ./test/extended/router/ ./test/e2e/upgrade/)
  • Gateway API upgrade test runs and validates OLM-based provisioning
  • NO-OLM code paths are dead code on this branch (feature gate always off)

🤖 Generated with Claude Code


coderabbitai Bot commented May 29, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 09974842-5e90-4bf6-a9cd-653727f01d8b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2026

openshift-ci Bot commented May 29, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gcs278
Once this PR has been reviewed and has the lgtm label, please assign stbenjam for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ShudiLi and others added 16 commits May 28, 2026 21:12
Update gatewayAPIController tests to skip tests with OLM dependencies when the
GatewayAPIWithoutOLM FeatureGate is enabled. This will unblock
openshift/cluster-ingress-operator#1354 by skipping any
tests that require OLM capabilities without causing failures in the
origin testing.

JIRA link: https://redhat.atlassian.net/browse/NE-2292

This includes 5 unique tests that will be used to graduate
the feature gate from Tech Preview to GA. Some tests are
common to both gatewayAPIController and
GatewayAPIWithoutOLM.

JIRA: https://redhat.atlassian.net/browse/NE-2292

Dual-stack support for Gateway API is not yet declared, so skip the
GatewayAPIController tests on AWS dual-stack clusters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add support for running Gateway API e2e tests on vSphere and baremetal
while gracefully handling missing LoadBalancer and DNS capabilities.

On vsphere/baremetal without LoadBalancer/DNS:
- Tests GatewayClass, Gateway, HTTPRoute creation/attachment
- Tests OSSM/Istio integration
- Skips LoadBalancer service validation
- Skips DNS record validation
- Skips HTTP connectivity tests

https://redhat.atlassian.net/browse/NE-2286

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
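
The graceful degradation described in the commit above can be sketched as a small capability lookup that decides which validation steps run. This is only an illustration: `platformCapabilities`, `capabilitiesFor`, `validationSteps`, and the platform strings are hypothetical names, and treating every other platform as fully capable is an assumption, not something the commit states.

```go
package main

import "fmt"

// platformCapabilities is a hypothetical struct describing which optional
// validations a Gateway API test can run on a given platform.
type platformCapabilities struct {
	LoadBalancer bool // LoadBalancer Services receive an external address
	DNS          bool // DNS records are managed for the cluster
}

// capabilitiesFor returns assumed capabilities for a platform name.
// Per the commit message, vSphere and baremetal are treated as lacking
// LoadBalancer and DNS support. The default branch is an assumption.
func capabilitiesFor(platform string) platformCapabilities {
	switch platform {
	case "VSphere", "BareMetal":
		return platformCapabilities{}
	default:
		return platformCapabilities{LoadBalancer: true, DNS: true}
	}
}

// validationSteps lists the checks a test would run for the capabilities.
// The first two steps always run; the rest depend on LB and DNS support.
func validationSteps(c platformCapabilities) []string {
	steps := []string{
		"GatewayClass, Gateway, HTTPRoute creation/attachment",
		"OSSM/Istio integration",
	}
	if c.LoadBalancer {
		steps = append(steps, "LoadBalancer service validation")
	}
	if c.DNS {
		steps = append(steps, "DNS record validation")
	}
	if c.LoadBalancer && c.DNS {
		steps = append(steps, "HTTP connectivity")
	}
	return steps
}

func main() {
	for _, p := range []string{"AWS", "VSphere"} {
		fmt.Println(p, validationSteps(capabilitiesFor(p)))
	}
}
```

Returning a capabilities value instead of skipping the whole test keeps the resource-level checks running on platforms that cannot expose the Gateway externally.
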
Fixed multiple instances where o.Expect(err).NotTo(o.HaveOccurred())
was used inside wait.PollUntilContextTimeout loops, causing tests to
fail immediately instead of retrying when resources were not found.

The pattern now matches the existing Subscription check which correctly
handles errors by logging and returning false to retry.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Gateway API tests fail on IPv6 and dual-stack clusters, particularly on
baremetal platforms where catalog sources are typically disabled. This
prevents OSSM operator installation via OLM.

Replace AWS-specific dual-stack check with platform-agnostic detection
that checks the cluster's ServiceNetwork CIDRs for IPv6 addressing.

This will skip Gateway API tests on:
- Baremetal/vSphere/EquinixMetal IPv6 or dual-stack clusters
- AWS dual-stack clusters
- Any other platform with IPv6 networking

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
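
A platform-agnostic check of this kind can be sketched as a function over CIDR strings. This is a minimal sketch, not the PR's code: `hasIPv6ServiceNetwork` is a hypothetical name, and fetching the ServiceNetwork CIDRs from the cluster is left out.

```go
package main

import (
	"fmt"
	"net/netip"
)

// hasIPv6ServiceNetwork reports whether any CIDR in the list is an IPv6
// prefix. A single-stack IPv6 cluster and a dual-stack cluster both
// return true; a single-stack IPv4 cluster returns false.
func hasIPv6ServiceNetwork(cidrs []string) (bool, error) {
	for _, c := range cidrs {
		prefix, err := netip.ParsePrefix(c)
		if err != nil {
			return false, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		if prefix.Addr().Is6() {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, cidrs := range [][]string{
		{"172.30.0.0/16"},               // IPv4 only
		{"172.30.0.0/16", "fd02::/112"}, // dual-stack
		{"fd02::/112"},                  // IPv6 only
	} {
		v6, err := hasIPv6ServiceNetwork(cidrs)
		fmt.Println(cidrs, v6, err)
	}
}
```

Checking the address family of the configured networks avoids a per-platform list, so the same skip decision applies to AWS dual-stack and baremetal IPv6 clusters alike.
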
Add upgrade test validating Gateway API migration from OLM-based Istio
to CIO-managed Sail Library during 4.21 to 4.22 upgrades.

Setup creates Gateway/HTTPRoute with OLM provisioning and tests
connectivity. Test validates migration: Gateway remains programmed,
Istiod running, Istio CRDs stay OLM-managed, GatewayClass has CIO
finalizer, Istio CR deleted, subscription persists. Teardown cleans
up all resources.

…ip logic

The Gateway API upgrade test was calling g.Skip() from Setup(), which
runs inside a goroutine managed by the disruption framework. Since
g.Skip() panics and Ginkgo can only recover panics inside leaf nodes,
this caused unrecoverable panics on IPv6/dual-stack, OKD, and
unsupported platform clusters.

Implement the upgrades.Skippable interface with a Skip() method that
the disruption framework calls before Setup, avoiding the goroutine
panic. Refactor checkPlatformSupportAndGetCapabilities into
shouldSkipGatewayAPITests (safe outside Ginkgo nodes) and
getPlatformCapabilities (returns LB/DNS support).

https://redhat.atlassian.net/browse/OCPBUGS-83267

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
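
The ordering the commit relies on (ask the test whether to skip before Setup is launched in a goroutine) can be sketched as follows. The `upgradeTest` and `skippable` interfaces here are simplified stand-ins, not the real `upgrades` package types, whose method signatures may take additional arguments. `gatewayAPIUpgradeTest` and `runSetup` are hypothetical.

```go
package main

import "fmt"

// upgradeTest and skippable are simplified stand-ins for the framework's
// interfaces. They are assumptions for illustration only.
type upgradeTest interface {
	Name() string
	Setup()
}

type skippable interface {
	Skip() bool
}

// gatewayAPIUpgradeTest is a hypothetical test that must be skipped on
// IPv6 or dual-stack clusters.
type gatewayAPIUpgradeTest struct {
	ipv6     bool
	setupRan bool
}

func (t *gatewayAPIUpgradeTest) Name() string { return "gateway-api-upgrade" }
func (t *gatewayAPIUpgradeTest) Setup()       { t.setupRan = true }

// Skip returns a plain bool, so no panic-based skip is needed in Setup.
func (t *gatewayAPIUpgradeTest) Skip() bool { return t.ipv6 }

// runSetup mimics a framework that consults Skip on the calling goroutine
// and only then runs Setup in a separate goroutine.
func runSetup(t upgradeTest) (ran bool) {
	if s, ok := t.(skippable); ok && s.Skip() {
		return false
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		t.Setup()
	}()
	<-done
	return true
}

func main() {
	for _, t := range []*gatewayAPIUpgradeTest{{ipv6: true}, {ipv6: false}} {
		ran := runSetup(t)
		fmt.Printf("%s ipv6=%v setup ran=%v\n", t.Name(), t.ipv6, ran)
	}
}
```

Because the skip decision is a return value evaluated before any goroutine starts, nothing inside Setup has to panic to signal a skip.
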
The Gateway API controller tests tracked Gateways in a shared
in-memory gateways slice, deleting them during AfterEach cleanup.
However, openshift-tests distributes tests across separate parallel
worker processes. The annotation-based checkAllTestsDone coordination
works correctly because annotations are stored on the cluster-scoped
GatewayClass, but the gateways slice is not shared across processes.
The process that runs the final AfterEach cleanup has an empty
gateways slice, so it deletes the GatewayClass and istiod but never
deletes the Gateways created by other processes. This leaves gateway
deployments orphaned on the cluster.

As a secondary issue, even when gateways were deleted, the GatewayClass
and istiod were removed without waiting for the gateway proxy
deployments to be fully cleaned up by GC. Since the deployments have
an owner reference to the Gateway (not a finalizer), the cascade
deletion is asynchronous, creating a race where gateway pods lose
their control plane and crash-loop.

Fix both issues by cleaning up gateways at the individual test level
using defer deleteGateway, which deletes the Gateway and waits for
its proxy deployment to be removed by GC. Add deleteGateway and
waitForGatewayDeploymentDeletion helpers shared by both the controller
tests and the upgrade test Teardown. Cleanup errors now hard fail to
surface leftover resources immediately rather than causing confusing
downstream test failures.

https://redhat.atlassian.net/browse/OCPBUGS-83281

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Grant Spence <gspence@redhat.com>
Co-Authored-By: Ishmam Amin <iamin@redhat.com>
@gcs278 gcs278 force-pushed the backport-gwapi-upgrade-test-4.21 branch from 6bca8a5 to 6e20b57 Compare May 29, 2026 01:12
gcs278 and others added 2 commits May 28, 2026 21:21
The GatewayAPIWithoutOLM feature gate is not available on this
release branch. Hardcode isNoOLMFeatureGateEnabled to return false
so tests run correctly with OLM-based provisioning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add pre/post upgrade check on the Subscription's status.installedCSV
to detect when the OSSM operator InstallPlan approval flow fails.
This catches the case where OLM generates an InstallPlan for the
channel head (e.g. v3.3.3) but the CIO only approves plans matching
its pinned version (e.g. v3.2.0), leaving the operator stuck on the
old version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
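
One way such a check could look is a comparison of the Subscription's `status.installedCSV` against the version the operator is expected to be on. This is a sketch under assumptions: `checkInstalledCSV` is a hypothetical helper, the CSV names below are made-up examples, and the assumption that the version appears as a dot-separated suffix of the CSV name is not confirmed by the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// checkInstalledCSV returns an error when installedCSV is empty or does
// not end in the expected version (for example "v3.2.0").
func checkInstalledCSV(installedCSV, wantVersion string) error {
	if installedCSV == "" {
		return fmt.Errorf("subscription has no installedCSV")
	}
	if !strings.HasSuffix(installedCSV, "."+wantVersion) {
		return fmt.Errorf("installedCSV %q does not match expected version %q", installedCSV, wantVersion)
	}
	return nil
}

func main() {
	// Hypothetical CSV names for illustration only.
	fmt.Println(checkInstalledCSV("example-operator.v3.2.0", "v3.2.0"))
	fmt.Println(checkInstalledCSV("example-operator.v3.1.0", "v3.2.0"))
}
```

Running the same check before and after the upgrade would surface an operator that never moved to the expected version because its InstallPlan was not approved.
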
@gcs278 gcs278 force-pushed the backport-gwapi-upgrade-test-4.21 branch from 6e20b57 to e14c7a9 Compare May 29, 2026 15:08

gcs278 commented May 29, 2026

Upgrades are actually not working for OLM (this only affects 4.20->4.21 right now). https://redhat.atlassian.net/browse/OCPBUGS-86778

Let's test this via:
/payload-job periodic-ci-openshift-release-release-4.21-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade


openshift-ci Bot commented May 29, 2026

@gcs278: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command


gcs278 commented May 29, 2026

/payload-job periodic-ci-openshift-release-main-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade


openshift-ci Bot commented May 29, 2026

@gcs278: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/71de0a60-5b7e-11f1-9d30-2d7158c6abc3-0

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

4 participants