diff --git a/docs/en/solutions/Pod_Pending_didnt_find_available_persistent_volumes_to_bind_Local_PVs_Pin_to_a_Specific_Node.md b/docs/en/solutions/Pod_Pending_didnt_find_available_persistent_volumes_to_bind_Local_PVs_Pin_to_a_Specific_Node.md new file mode 100644 index 00000000..32385ac2 --- /dev/null +++ b/docs/en/solutions/Pod_Pending_didnt_find_available_persistent_volumes_to_bind_Local_PVs_Pin_to_a_Specific_Node.md @@ -0,0 +1,186 @@ +--- +kind: + - Troubleshooting +products: + - Alauda Container Platform +ProductsVersion: + - 4.1.0,4.2.x +--- +## Issue + +A pod (typically a StatefulSet member — Prometheus, database, queue broker) stays Pending. `kubectl describe pod` shows the scheduler found **no** node where both a matching local PV and the pod's own node affinity can coexist: + +```text +0/6 nodes are available: + 3 node(s) didn't find available persistent volumes to bind, + 3 node(s) didn't match Pod's node affinity/selector. +preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling. +``` + +The message is unusually clear about the intersection failure: three nodes carry local PVs but fail the pod's nodeSelector, and the other three nodes satisfy the nodeSelector but have no matching local PV. The scheduler has no candidate that satisfies both constraints simultaneously, so the pod never lands. + +## Root Cause + +Local PVs created by the Local Storage Operator (or any equivalent tool that provisions PVs from a node's attached disks) carry a `nodeAffinity` field that hard-pins the PV to the specific node whose disk it was cut from: + +```yaml +apiVersion: v1 +kind: PersistentVolume +metadata: + name: local-pv-abcd +spec: + nodeAffinity: + required: + nodeSelectorTerms: + - matchExpressions: + - key: kubernetes.io/hostname + operator: In + values: [node-3] +``` + +That affinity is a hard constraint: the PV can only be consumed by a pod that runs on the named node. The PV cannot "migrate" to another node in any meaningful sense — the disk it represents is physically attached to one machine. + +When a pod's template carries its own `nodeSelector` (or `affinity` stanza) that narrows the pod to a *different* set of nodes — for example, a monitoring pod that expects to run on the control plane, while the local PVs were carved out of worker disks — the intersection of the two constraints is empty. The scheduler truthfully reports "no node satisfies both" and the pod stays Pending. + +Neither constraint can be relaxed without consequences: dropping the PV's nodeAffinity would allow the pod to try to mount a disk that isn't there, and dropping the pod's nodeSelector might land the pod on a node that lacks the unrelated requirements the selector was meant to enforce. + +## Resolution + +Align the pod's scheduling constraints with the nodes that own the local PVs. + +### Step 1 — enumerate which nodes hold the local PVs the pod expects + +```bash +# List the bound PVs and their hostname affinity. +kubectl get pv -o json | \ + jq -r '.items[] + | select(.spec.nodeAffinity != null) | + (.spec.nodeAffinity.required.nodeSelectorTerms[0] + .matchExpressions[] | + select(.key == "kubernetes.io/hostname") | + .values[]) as $host | + "\(.metadata.name)\t\(.status.phase)\t\($host)\t\(.spec.storageClassName)"' +``` + +Output: + +```text +local-pv-abcd Available node-3 local-block +local-pv-efgh Available node-4 local-block +local-pv-ijkl Available node-5 local-block +``` + +The PVs that the pod's PVC could bind to are the rows whose `storageClassName` matches the PVC's storage class. Note their `hostname` values — those are the nodes the pod **must** be able to land on. + +### Step 2 — inspect the pod's current scheduling constraints + +```bash +kubectl -n get deployment -o jsonpath='{.spec.template.spec.affinity}{"\n"}' | jq +kubectl -n get deployment -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}' | jq +``` + +Identify whatever `nodeSelector` or `affinity` is narrowing the pod to nodes that do not include the PV-owning ones. + +### Step 3 — update the pod's template to permit the PV-owning nodes + +Two shapes, pick the one closer to the intent: + +**If the pod's nodeSelector is incidental** (e.g. a label was copied from another workload without thought), remove it. The scheduler will then freely place the pod wherever both the node-affinity of the PV and normal constraints allow: + +```yaml +# Before +spec: + template: + spec: + nodeSelector: + node-role.kubernetes.io/infra: "" + +# After (nodeSelector removed) +spec: + template: + spec: + # Let the PVC's PV-level nodeAffinity drive placement. + # The scheduler will place this pod on whichever node has the local PV. +``` + +**If the pod's nodeSelector is intentional** (e.g. the pod really should run on a specific subset of nodes), **provision local PVs on those nodes instead**. Either extend the Local Storage Operator's `LocalVolumeSet` / equivalent CR to include the intended-placement nodes, or schedule the workload such that the PV-owning nodes are included in the selector: + +```yaml +# LocalVolumeSet with a broader nodeSelector. +apiVersion: local.storage.alauda.io/v1alpha1 +kind: LocalVolumeSet +metadata: + name: local-block-osds +spec: + nodeSelector: + nodeSelectorTerms: + - matchExpressions: + - key: node-role.kubernetes.io/infra + operator: Exists + # ... filters ... +``` + +Then wait for new PVs to provision on the `infra` nodes before re-deploying the pod. + +### Step 4 — watch the pod schedule + +After aligning the constraints, trigger a new pod attempt (rollout restart, or simply delete the pending pod so the controller recreates it): + +```bash +kubectl -n rollout restart deployment/ +# or +kubectl -n delete pod -l +``` + +Monitor: + +```bash +kubectl -n get pod -l -w +``` + +The pod should transition to `Running` within a scheduling cycle. Verify the bound PV: + +```bash +kubectl -n get pvc -o jsonpath='{.spec.volumeName}{"\n"}' +``` + +The returned PV name should be one from the list in Step 1. + +## Diagnostic Steps + +Confirm the specific failure pattern. The telling part is the scheduler's two-clause explanation: + +```bash +kubectl get events -n --field-selector reason=FailedScheduling | \ + grep "didn't find available persistent volumes to bind" +``` + +Finding this message with the complementary `didn't match Pod's node affinity/selector` in the same line is the exact signature — if the messages differ (e.g. "didn't have free ports", "exceeded quota"), the issue is different. + +Cross-check the PVC's binding state: + +```bash +kubectl -n get pvc +``` + +A PVC in `Pending` state that should have been bound to a local PV is the downstream symptom. The events on the PVC itself will repeat the scheduler's complaint: + +```bash +kubectl -n describe pvc +``` + +List candidate PVs that could satisfy the PVC (matching storage class, available, capacity large enough): + +```bash +PVC_SC= +PVC_SIZE_GIB= +kubectl get pv -o json | \ + jq -r --arg sc "$PVC_SC" '.items[] + | select(.spec.storageClassName == $sc) + | select(.status.phase == "Available") + | "\(.metadata.name) \(.spec.capacity.storage) \(.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[])"' +``` + +If no PV shows up, the issue is not scheduling but provisioning — the Local Storage Operator has not carved enough PVs. If PVs show up on nodes that the pod's template forbids, the fix is Step 3 above. + +After the fix, the PVC transitions to `Bound`, the pod to `Running`, and no further `FailedScheduling` events should accrue.