From 25086996dfa4d62abddec63a61b83a1a63931f67 Mon Sep 17 00:00:00 2001
From: Komh
Date: Fri, 24 Apr 2026 07:59:26 +0000
Subject: [PATCH] [configure] Egress IP for Hosted Control Plane Tenant
 Clusters Running on ACP Virtualization

---
 ..._Clusters_Running_on_ACP_Virtualization.md | 105 ++++++++++++++++++
 1 file changed, 105 insertions(+)
 create mode 100644 docs/en/solutions/Egress_IP_for_Hosted_Control_Plane_Tenant_Clusters_Running_on_ACP_Virtualization.md

diff --git a/docs/en/solutions/Egress_IP_for_Hosted_Control_Plane_Tenant_Clusters_Running_on_ACP_Virtualization.md b/docs/en/solutions/Egress_IP_for_Hosted_Control_Plane_Tenant_Clusters_Running_on_ACP_Virtualization.md
new file mode 100644
index 00000000..ad04d2d4
--- /dev/null
+++ b/docs/en/solutions/Egress_IP_for_Hosted_Control_Plane_Tenant_Clusters_Running_on_ACP_Virtualization.md
@@ -0,0 +1,105 @@
+---
+kind:
+  - Information
+products:
+  - Alauda Container Platform
+ProductsVersion:
+  - 4.1.0,4.2.x
+---
+## Overview
+
+A tenant cluster provisioned through Hosted Control Planes (HCP) runs its worker "nodes" as VMs inside an ACP Virtualization management cluster. Each tenant-cluster worker is therefore a `virt-launcher` pod at the management layer, and the tenant cluster's egress traffic has to cross two OVN-Kubernetes SDN layers before it reaches the physical network: the tenant cluster's own SDN, then the management cluster's SDN, then the host's physical NIC.
+
+A common migration requirement is **static egress IP per application namespace**: traffic from a specific namespace in the tenant cluster should egress with a predictable IP the upstream firewall can allowlist. On a traditional hypervisor (vSphere, KVM bare metal) this is straightforward: the hypervisor can translate a namespace-level egress to a chosen physical IP. On HCP-on-Virtualization the layering is more complex: the tenant's OVN-Kubernetes decides the namespace-level egress, then the management cluster's OVN-Kubernetes has to let that egress pass through a bridge to a bonded NIC that carries the chosen IP.
+
+## Current support
+
+The OVN-Kubernetes egress-IP feature has **platform-support constraints** documented at the release level. As of the current supported builds, the feature is supported only when both of the following hold:
+
+- The egress IP is carried on the **primary host network** (the node's normal pod egress path).
+- The tenant cluster runs on a platform in the supported hypervisor list (check the platform's egress-IP support matrix for the current version).
+
+Running the tenant cluster as VMs on another ACP cluster (HCP-on-Virtualization) is a **stacked-cluster** topology. The feature's end-to-end correctness depends on both the management cluster's OVN and the tenant cluster's OVN cooperating over a shared secondary network, which requires extra plumbing that the feature's current implementation does not set up automatically.
+
+This case is being tracked as a platform RFE; until the RFE lands, Egress IP is not the right tool for tenant-cluster namespace-level egress pinning.
+
+## Workarounds until the feature supports HCP-on-Virtualization
+
+Two workaround patterns are available today, plus the option of simply waiting for native support (Option C). Pick based on the operational requirement.
+
+### Option A — tenant-side egress routing via NodePort / external LB
+
+If the application's egress target is an HTTP / API endpoint, route egress through a **management-cluster** LoadBalancer or external LB that you control. The tenant application sends its outbound requests to a forward proxy (Envoy, Squid, Traefik, or a hand-rolled proxy) that lives in a management-cluster namespace you own, and the proxy egresses from a specific management-cluster node whose physical NIC carries the allowlisted IP.
+
+- Trade-off: egress is no longer transparent: tenant applications must configure their HTTP client to use the proxy.
+- Benefit: the egress IP is a management-cluster artifact, so all the stacked-cluster plumbing lives in one place.
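+
+A minimal sketch of the management-cluster side, assuming Squid as the proxy; the `egress-infra` namespace, the node name, and the Service exposure are illustrative placeholders, not a fixed recipe:
+
+```bash
+# Illustrative names throughout (egress-infra, egress-proxy, mgmt-node-01).
+# Pinning the proxy to the node whose NIC carries the allowlisted IP is the
+# key step: with default OVN-Kubernetes SNAT behaviour, the proxy's outbound
+# traffic then leaves with that node's IP. Adjust squid.conf ACLs so the
+# tenant's client ranges are allowed.
+kubectl create namespace egress-infra
+kubectl apply -f - <<'EOF'
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: egress-proxy
+  namespace: egress-infra
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: egress-proxy
+  template:
+    metadata:
+      labels:
+        app: egress-proxy
+    spec:
+      # Pin to the management-cluster node that owns the allowlisted IP.
+      nodeSelector:
+        kubernetes.io/hostname: mgmt-node-01
+      containers:
+      - name: squid
+        image: ubuntu/squid
+        ports:
+        - containerPort: 3128
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: egress-proxy
+  namespace: egress-infra
+spec:
+  type: LoadBalancer   # or NodePort, per this option's heading
+  selector:
+    app: egress-proxy
+  ports:
+  - port: 3128
+    targetPort: 3128
+EOF
+```
+
+Tenant applications then point their HTTP clients at the exposed address, for example `HTTPS_PROXY=http://<lb-address>:3128`.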
+
+### Option B — network-level egress via a dedicated bridge on the management cluster
+
+If the egress must be IP-transparent (the application has no proxy support), provision a dedicated secondary network on the management cluster that pins to a specific physical NIC / IP, and attach the tenant-cluster workers to that network as a secondary interface. Tenant traffic that should egress through the allowlisted path uses the secondary interface as its default route; all other traffic uses the primary tenant SDN.
+
+On the management cluster, this requires:
+
+- A `NetworkAttachmentDefinition` referencing the chosen physical NIC via OVN's `localnet` or bridge topology (sketched below).
+- Coordination with the node configuration so the physical NIC is bonded / tagged / allowed to carry the egress traffic.
+
+On the tenant cluster, this requires:
+
+- A per-namespace routing policy that directs target traffic through the secondary interface.
+- DNS / firewall coordination so the allowlist on upstream receivers sees the fixed IP.
+
+Trade-off: this is a substantial network design change. Document the layers so the team that later debugs a routing issue understands that tenant traffic leaves through a management-layer NIC that is not obvious from the tenant's own routing tables.
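+
+A minimal sketch of the management-cluster half, assuming OVN-Kubernetes `localnet` topology; the network name, namespace, and subnet are assumptions to adapt:
+
+```bash
+# The localnet name must also be mapped to an OVS bridge on the nodes (for
+# example via an NMState policy that adds an OVN bridge-mapping for the
+# bonded NIC); that half is environment-specific and omitted here.
+kubectl apply -f - <<'EOF'
+apiVersion: k8s.cni.cncf.io/v1
+kind: NetworkAttachmentDefinition
+metadata:
+  name: egress-localnet
+  namespace: default
+spec:
+  config: |
+    {
+      "cniVersion": "0.3.1",
+      "name": "egress-localnet",
+      "type": "ovn-k8s-cni-overlay",
+      "topology": "localnet",
+      "netAttachDefName": "default/egress-localnet",
+      "subnets": "192.168.50.0/24"
+    }
+EOF
+```
+
+The tenant-cluster worker VMs then reference `default/egress-localnet` as an additional network interface in their VM templates, and the per-namespace routing policy inside the tenant cluster steers target traffic out through that interface.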
+
+### Option C — wait for the RFE-tracked feature
+
+For teams that can tolerate the wait, Egress IP support for HCP-on-Virtualization is on the roadmap. Track platform release notes and re-evaluate the native feature when it becomes supported in the cluster's version.
+
+## Why this is more involved than on a raw hypervisor
+
+On a vSphere-hosted tenant:
+
+1. The vSphere virtual switch sees the VM's traffic.
+2. vSphere's NIOC / port-group configuration can pin the VM's egress to a specific uplink with a specific IP on the physical network.
+3. No coordination is required between the tenant-cluster SDN and the hypervisor SDN; vSphere handles the mapping.
+
+On HCP-on-Virtualization:
+
+1. The tenant SDN (OVN-Kubernetes inside the tenant cluster) decides which `EgressIP` applies.
+2. The traffic lands on the virtual node (a `virt-launcher` pod's network), which is on the **management** cluster's pod network.
+3. The management cluster's OVN-Kubernetes now has to honour the tenant's egress choice by forwarding the traffic through the right bridge to the right physical NIC.
+
+That multi-layer coordination, letting a tenant-cluster configuration dictate management-cluster network behaviour, is the part the RFE adds. Doing it today requires the manual bridge configuration (Option B) or stepping outside the native feature entirely (Option A).
+
+## Diagnostic steps
+
+Confirm the topology first. On the management cluster:
+
+```bash
+# List tenant clusters (HostedControlPlane objects).
+kubectl get hostedcontrolplane -A -o \
+  custom-columns='NAME:.metadata.name,NS:.metadata.namespace,PLATFORM:.spec.platform.type'
+```
+
+A `PLATFORM: KubeVirt` (or the equivalent platform enum for ACP Virtualization-hosted tenants) confirms the tenant's workers are VMs on this cluster.
+
+Inspect a tenant node's underlying `virt-launcher` pod to see the management-cluster networks it is attached to:
+
+```bash
+# Pick a tenant node to inspect.
+MGMT_NS=
+VM=
+
+kubectl -n "$MGMT_NS" get pod -l kubevirt.io/domain="$VM" \
+  -o jsonpath='{.items[0].metadata.annotations.k8s\.v1\.cni\.cncf\.io/networks}{"\n"}'
+```
+
+The `k8s.v1.cni.cncf.io/networks` annotation lists every network attached to the VM. A secondary network attached for Option B shows up as an additional entry here.
+
+From inside the tenant cluster, list any `EgressIP` objects the tenant's OVN-Kubernetes would honour if supported:
+
+```bash
+# Run inside the tenant cluster (use the kubeconfig for the hosted cluster).
+kubectl get egressip -o yaml
+```
+
+If `EgressIP` objects exist but tenant egress does not actually leave through the expected IP, that confirms the feature is being requested on a topology it does not yet fully support. The workarounds above are the way forward until native support lands.
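+
+A quick way to observe the source IP an external receiver actually sees, assuming an IP-echo endpoint you control (`echo.example.com` here is a placeholder):
+
+```bash
+# Run a throwaway curl pod in the tenant namespace under test; the response
+# shows the source IP of the tenant's egress as seen from outside.
+kubectl run egress-check --rm -it --restart=Never \
+  --image=curlimages/curl --command -- curl -s https://echo.example.com/ip
+```
+
+Compare the reported IP against the allowlisted address to confirm which path the traffic is really taking.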