From 1b7672e11159a1eb6aa5f5c66f3d80322557574f Mon Sep 17 00:00:00 2001
From: Komh
Date: Sun, 26 Apr 2026 02:58:01 +0000
Subject: [PATCH] [storage] PostgreSQL StatefulSet CrashLoops with Wrong Data-Directory Ownership

---
 ...ops_with_Wrong_Data_Directory_Ownership.md | 127 ++++++++++++++++++
 1 file changed, 127 insertions(+)
 create mode 100644 docs/en/solutions/PostgreSQL_StatefulSet_CrashLoops_with_Wrong_Data_Directory_Ownership.md

diff --git a/docs/en/solutions/PostgreSQL_StatefulSet_CrashLoops_with_Wrong_Data_Directory_Ownership.md b/docs/en/solutions/PostgreSQL_StatefulSet_CrashLoops_with_Wrong_Data_Directory_Ownership.md
new file mode 100644
index 00000000..ee89e324
--- /dev/null
+++ b/docs/en/solutions/PostgreSQL_StatefulSet_CrashLoops_with_Wrong_Data_Directory_Ownership.md
@@ -0,0 +1,127 @@
---
kind:
  - Troubleshooting
products:
  - Alauda Container Platform
ProductsVersion:
  - 4.1.0,4.2.x
---

## Issue

A PostgreSQL pod that runs as part of a managed service (object-storage metadata, operator internal state, audit DB) enters `CrashLoopBackOff` after a restore, PVC remount, or pod reschedule. The container exits early with:

```text
FATAL: data directory "/var/lib/pgsql/data/userdata" has wrong ownership
HINT: The server must be started by the user that owns the data directory.
```

The PVC still holds the correct data — nothing is lost — but the pod cannot become `Ready` because the in-container user and the on-disk ownership disagree.

## Root Cause

PostgreSQL refuses to start when the UID that owns its data directory differs from the UID the server process runs as. Two common ways the two get out of sync:

- **Credential Secret was rotated without updating on-disk ownership**. The operator that manages the DB reads a Secret such as `<name>-credentials` to derive the container's `postgres` user; when that Secret is regenerated with a new username, the running container picks up the new UID, but the PVC's data directory is still owned by the previous one.
- **Restore or node move left stale ownership**. Backups restored with a different storage class (CSI driver A → driver B) or an `fsGroup` change on the StatefulSet can swap the directory owner without touching Postgres's expectation.

Either way, the data is intact — only the owner metadata is wrong.

## Resolution

Pick whichever side is correct: the on-disk ownership is what the data *actually* has; the Secret is what the container *thinks* the user is. Reconcile by changing the side that is behind.

### Option A — Fix the on-disk ownership to match the container user

This is the right option when the Secret is authoritative (operator-driven rotation was intentional).

1. **Discover the container's expected user**. For the NooBaa-style deployment in ACP object storage, the user lives in the credentials Secret:

   ```bash
   NS=<namespace>               # e.g. cpaas-system or the operator's own namespace
   SECRET=<credentials-secret>
   EXPECTED_USER=$(kubectl -n "$NS" get secret "$SECRET" -o jsonpath='{.data.user}' | base64 -d)
   echo "expected user: $EXPECTED_USER"
   ```

2. **Chown the data directory inside the pod's filesystem**. Use a debug session so the PVC is mounted but Postgres isn't trying to start:

   ```bash
   # Use a debug image that ships a shell and chroot (e.g. busybox); a pause image
   # such as registry.alauda.cn:60070/tkestack/pause:3.10 contains neither.
   kubectl debug node/<node-name> -it \
     --image=<image-with-a-shell> \
     -- chroot /host sh -c '
     # Find the kubelet mount path for the PVC
     find /var/lib/kubelet/pods -path "*/volumes/*noobaa-db-storage*" -type d -name "mount" | head -1
     '
   ```

   In most clusters it's faster to start an ephemeral container attached to the pod. Note that `alpine` has no `postgres` user, so chown by the numeric UID/GID the database container actually runs as (its `securityContext.runAsUser`); and because ephemeral containers do not mount the pod's volumes, fall back to the node session above or the helper-pod sketch after this option if the data path is not visible:

   ```bash
   kubectl -n "$NS" debug <db-pod> --image=alpine --target=<db-container> -- \
     sh -c 'chown -R <uid>:<gid> /var/lib/pgsql/data/userdata && ls -ld /var/lib/pgsql/data/userdata'
   ```

3. **Delete the StatefulSet pod** so it reconciles onto the freshly-owned directory:

   ```bash
   kubectl -n "$NS" delete pod <db-pod>
   ```
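If neither debug route can see the data directory, a throwaway pod that mounts the claim directly is the most predictable path. The following is a minimal sketch, not a platform-provided manifest: `pg-chown-helper`, `<pvc-name>`, `<uid>`/`<gid>`, and the image are placeholders you substitute from step 1, and it assumes the PVC holds the `userdata` directory at its root (the DB pod mounts it at `/var/lib/pgsql/data`). Scale the StatefulSet to 0, or pause the operator if it keeps recreating the pod, so a ReadWriteOnce volume is free to attach:

```bash
kubectl -n <namespace> apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pg-chown-helper            # hypothetical one-off pod; delete it when done
spec:
  restartPolicy: Never
  containers:
  - name: chown
    image: <image-with-a-shell>    # e.g. a busybox image from your registry
    command: ["sh", "-c", "chown -R <uid>:<gid> /data/userdata && ls -ld /data/userdata"]
    volumeMounts:
    - name: pgdata
      mountPath: /data
  volumes:
  - name: pgdata
    persistentVolumeClaim:
      claimName: <pvc-name>        # the DB pod's claim, e.g. from `kubectl -n <namespace> get pvc`
EOF
kubectl -n <namespace> logs pg-chown-helper      # should print the new ownership
kubectl -n <namespace> delete pod pg-chown-helper
```

Once the helper has exited successfully, scale the StatefulSet back up (or unpause the operator) and continue with step 3.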
### Option B — Fix the Secret to match what's on disk

This is the right option when the data directory's ownership is authoritative (post-restore, post-migration).

1. **Read the on-disk owner**:

   ```bash
   kubectl -n "$NS" debug <db-pod> --image=alpine --target=<db-container> -- \
     ls -ld /var/lib/pgsql/data/userdata
   # drwx------ 1 54321 54321 4096 Mar 10 09:00 /var/lib/pgsql/data/userdata
   ```

   Convert that numeric UID (or username) to the string form the Secret uses. For systems that store the user as a text name in the Secret, use `getent passwd <uid>` inside the database container to find the name; if the image has no such user, pick the string the operator expects (check the operator's reconciliation code or its reconciled `Deployment`/`StatefulSet` `env:`).

2. **Patch the Secret**:

   ```bash
   NEW_USER=<on-disk-user>
   kubectl -n "$NS" patch secret "$SECRET" --type merge \
     -p "{\"stringData\":{\"user\":\"$NEW_USER\"}}"
   ```

3. **Restart the pod**:

   ```bash
   kubectl -n "$NS" delete pod <db-pod>
   ```

### Guardrails

- Operator-managed workloads: the operator may overwrite your change on reconcile. Always fix the side the operator considers authoritative (for most DB operators, that's the Secret).
- Never chown under a running Postgres — even a stopped container in CrashLoop is safer than a Postgres that's mid-startup when ownership flips.
- Backups: snapshot the PVC (if the CSI driver supports it) before either change; a minimal sketch follows this list. A wrong chown on a StatefulSet data directory is easy to reverse; a wrong Secret change that then triggers an automated re-init is not.
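A minimal pre-change snapshot, assuming the cluster has the CSI snapshot CRDs and a snapshot class installed; `pg-data-before-chown`, `<snapshot-class>`, and `<pvc-name>` are placeholders, not names the platform guarantees:

```bash
kubectl -n <namespace> apply -f - <<'EOF'
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-before-chown
spec:
  volumeSnapshotClassName: <snapshot-class>
  source:
    persistentVolumeClaimName: <pvc-name>
EOF
# Wait for readyToUse: true before touching ownership or the Secret
kubectl -n <namespace> get volumesnapshot pg-data-before-chown -o jsonpath='{.status.readyToUse}'
```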
## Diagnostic Steps

Confirm the pod's error matches this failure mode:

```bash
kubectl -n <namespace> logs <db-pod> --previous \
  | grep -E 'data directory.*wrong ownership|FATAL|HINT'
```

Read both sides explicitly and compare:

```bash
# Container-expected user
kubectl -n <namespace> get secret <credentials-secret> \
  -o jsonpath='{.data.user}' | base64 -d ; echo

# Actual on-disk owner
kubectl -n <namespace> debug <db-pod> --image=alpine --target=<db-container> -- \
  ls -ld /var/lib/pgsql/data/userdata
```

If the two strings/UIDs disagree, apply Option A or Option B accordingly. If they agree and the pod still fails, the problem is elsewhere: usually Postgres complaining about `pg_hba.conf`, WAL corruption, or a locked-out superuser; follow the specific log line rather than assuming ownership.

After restart, the pod should reach `Running` and `Ready` within one reconcile cycle (~30s). If the pod goes back into `CrashLoopBackOff`, the operator likely reverted your change; validate that the operator now considers the two sides consistent before the next reconcile window.
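To confirm the fix took, a quick check after deleting the pod (namespace, pod, and container names are placeholders; the quoted text is PostgreSQL's normal startup message):

```bash
kubectl -n <namespace> wait pod/<db-pod> --for=condition=Ready --timeout=120s
# A healthy restart logs "ready to accept connections" and no new wrong-ownership FATAL:
kubectl -n <namespace> logs <db-pod> --tail=20 \
  | grep -E 'ready to accept connections|wrong ownership'
```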