From 82db295a9edf1971933973dcfe273ff23a09b35f Mon Sep 17 00:00:00 2001 From: Komh Date: Fri, 24 Apr 2026 07:35:06 +0000 Subject: [PATCH] [virtualization] VM Migration to a User-Defined Primary Network Fails Because Multus Looks Up a `default` NAD in the Wrong Namespace --- ...Up_a_default_NAD_in_the_Wrong_Namespace.md | 132 ++++++++++++++++++ 1 file changed, 132 insertions(+) create mode 100644 docs/en/solutions/VM_Migration_to_a_User_Defined_Primary_Network_Fails_Because_Multus_Looks_Up_a_default_NAD_in_the_Wrong_Namespace.md diff --git a/docs/en/solutions/VM_Migration_to_a_User_Defined_Primary_Network_Fails_Because_Multus_Looks_Up_a_default_NAD_in_the_Wrong_Namespace.md b/docs/en/solutions/VM_Migration_to_a_User_Defined_Primary_Network_Fails_Because_Multus_Looks_Up_a_default_NAD_in_the_Wrong_Namespace.md new file mode 100644 index 00000000..763e6130 --- /dev/null +++ b/docs/en/solutions/VM_Migration_to_a_User_Defined_Primary_Network_Fails_Because_Multus_Looks_Up_a_default_NAD_in_the_Wrong_Namespace.md @@ -0,0 +1,132 @@ +--- +kind: + - Troubleshooting +products: + - Alauda Container Platform +ProductsVersion: + - 4.1.0,4.2.x +--- +## Issue + +A migration plan from a VMware source to ACP Virtualization maps the VM's network to a cluster **user-defined primary network** (a UDN defined as the namespace's default pod network, rather than a secondary network). The migration reports success at the plan level, but the target VMI does not come up — the `virt-launcher` pod fails to start with a Multus error: + +```text +ERRORED: error configuring pod [cudn-ns/virt-launcher-] + networking: Multus: [cudn-ns/virt-launcher-]: + error loading k8s delegates k8s args: + TryLoadPodDelegates: error in loading K8S cluster default network from pod annotation: + tryLoadK8sPodDefaultNetwork: failed getting the delegate: + getKubernetesDelegate: cannot find a network-attachment-definition (default) + in namespace (kube-system): + networkattachmentdefinition.k8s.cni.cncf.io "default" not found +``` + +Multus cannot resolve the pod's default network: the migration-toolkit has configured the pod to fetch a NAD named `default` from the `kube-system` namespace, which does not exist on the cluster (the UDN's default NAD lives in the **VM's own namespace**, not in the CNI operator's namespace). + +## Root Cause + +The migration toolkit, when handed a mapping where the source VM's network becomes the destination's UDN primary network, synthesises a Multus annotation on the `virt-launcher` pod that names the default network to attach. In affected toolkit versions, that annotation names the cluster-wide NAD location (a conventional CNI-operator namespace) rather than the UDN's per-namespace default NAD. + +Multus reads the annotation literally, looks for the named NAD in the named namespace, does not find it, and aborts pod startup. The VM never boots on the destination cluster. + +There is currently no documented user-level workaround that does not involve editing the reconciled `virt-launcher` pod spec directly (which the migration toolkit immediately reconciles back). The fix is in the migration toolkit: the synthesised annotation needs to point at the UDN's per-namespace default NAD. That fix has been tracked by engineering and will ship in an upcoming toolkit release. + +## Resolution + +### Preferred — upgrade the migration toolkit + +Follow the operator-upgrade channel for the migration toolkit to a release that includes the UDN-primary-network fix. After the upgrade, re-run the migration plan; the toolkit now writes the correct NAD reference into the `virt-launcher` annotation and Multus resolves the default network from the destination namespace. + +Verify: + +```bash +kubectl -n get csv -o custom-columns='NAME:.metadata.name,VERSION:.spec.version' | \ + grep -i forklift +``` + +Check the version against the fix's release notes. + +### Workaround while the upgrade is pending + +Two options, both time-boxed: + +**Map the VM's network to a secondary (non-primary) UDN.** A secondary-network UDN migration does not rely on the broken primary-network annotation path; it attaches the UDN through the regular Multus secondary-network mechanism, which works correctly. Adjust the plan's network-mapping to point at a secondary NAD in the destination namespace: + +```yaml +apiVersion: forklift.konveyor.io/v1beta1 +kind: NetworkMap +metadata: + name: my-network-map +spec: + map: + - source: + namespace: + name: + destination: + # Instead of the default pod network (UDN primary), map to a + # namespaced secondary NAD. + namespace: + name: + type: multus +``` + +Trade-off: the VM's primary interface is now a secondary network in Kubernetes terms. If the workload expects the VM's default route to go through this network, adjust the in-guest routing table accordingly. + +**Migrate to the cluster's pod default network, not a UDN.** If the VM does not strictly need to be on the UDN, map the network to the plain `pod: {}` default. The toolkit's default-pod-network path is not affected by this bug: + +```yaml +spec: + map: + - source: + namespace: + name: + destination: + type: pod # regular cluster pod default network +``` + +The VM runs on the default SDN; UDN integration has to wait for the upgrade. + +### Do not + +- Do not hand-edit the `virt-launcher` pod after migration to fix the annotation. The pod is reconciled by the VM operator and your change reverts in seconds, leaving the VM stuck on the next reconcile. +- Do not create a NAD named `default` in `kube-system` to satisfy the broken lookup. That namespace is managed by the CNI operator and manual resources there are unpredictable. + +## Diagnostic Steps + +Confirm the signature on the failing `virt-launcher`: + +```bash +NS= +POD=$(kubectl -n "$NS" get pod -l kubevirt.io/domain= \ + -o jsonpath='{.items[0].metadata.name}') +kubectl -n "$NS" describe pod "$POD" | \ + grep -A4 -E 'Multus|tryLoadK8sPodDefaultNetwork|NetworkAttachment' +``` + +The `networkattachmentdefinition.k8s.cni.cncf.io "default" not found in namespace ` line identifies this bug — note the `` the toolkit is pointing at (the CNI-operator namespace), not the VM's own namespace. + +Read the pod's annotations to see the broken reference: + +```bash +kubectl -n "$NS" get pod "$POD" \ + -o jsonpath='{.metadata.annotations}{"\n"}' | jq '."k8s.v1.cni.cncf.io/networks"' +``` + +A reference to `/default` (rather than `/`) is the bug's footprint. + +Check the destination cluster's actual NAD layout for comparison: + +```bash +# The UDN's primary NAD sits in the VM's own namespace. +kubectl -n "$NS" get network-attachment-definitions +``` + +The expected default NAD for the UDN should appear in this listing. When the upgrade rolls out, the toolkit's annotation will correctly target this NAD name. + +After applying the fix (or the workaround), re-run the plan and watch `virt-launcher` startup: + +```bash +kubectl -n "$NS" get pod -l kubevirt.io/domain= -w +``` + +`Running` status and a subsequent `VMI Ready` condition confirms the network path works end-to-end.