Environment
- Cozystack: v1.2.2 (verified unchanged on v1.3.0-rc.1 and master)
- Chart: packages/apps/kubernetes
- Kubernetes: v1.35.0+k3s3
Symptom
helm install for any apps.cozystack.io/Kubernetes application fails with context deadline exceeded after the 5-minute chart wait. The underlying error is a mount-time failure on three Deployments inside the chart:
```
FailedMount: MountVolume.SetUp failed for volume "kubeconfig":
secret "<release>-admin-kubeconfig" not found
```
The affected workloads are:
- <release>-cluster-autoscaler
- <release>-kccm (cloud-controller-manager)
- <release>-kcsi-controller (CSI controller)
HelmRelease / HelmController output
```
creating 35 resource(s)
beginning wait for 35 resources with timeout of 5m0s
Deployment is not ready: <ns>/<release>-cluster-autoscaler. 0 out of 1 expected pods are ready
(148 duplicate lines omitted)
Error received when checking status of resource <release>-cluster-autoscaler.
Error: 'client rate limiter Wait returned an error: context deadline exceeded'
wait for resources failed after 5m0s: context deadline exceeded
```
Root cause
The chart renders 35 resources in a single Helm release and relies on the default Helm wait to serialize their readiness. Three workloads in that set mount <release>-admin-kubeconfig as a non-optional Secret volume:
- packages/apps/kubernetes/templates/cluster-autoscaler/deployment.yaml → secretName: {{ .Release.Name }}-admin-kubeconfig
- Same pattern in kccm/manager.yaml and csi/controller.yaml
That Secret is not produced by the chart — it is produced asynchronously by the CAPI Kubeadm control-plane provider after the Cluster CR (also created in this same release) has finished provisioning the workload-cluster control plane. On clusters where control-plane provisioning exceeds the 5-minute --wait budget (typical, especially on first boot / slow node images), the Deployment pods stay ContainerCreating for the whole window, Helm reports them not-ready, and the install times out.
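Reconstructed from the template reference above, the rendered volume has roughly this shape (a sketch; the exact field layout in the chart may differ):

```yaml
# Without an `optional` field, kubelet treats the Secret as required, so the
# pod sits in ContainerCreating (FailedMount) until
# <release>-admin-kubeconfig exists.
volumes:
  - name: kubeconfig
    secret:
      secretName: {{ .Release.Name }}-admin-kubeconfig
```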
For anyone following along on the CAPI side, the Cluster state during the wait window:
```
Cluster kubernetes-demo-k8s   PHASE: Provisioning   AGE: 3m15s
```
The Secret appears only once the control plane reaches Ready, but by then the Helm operation has already failed and the HelmRelease is marked install-failed.
Reproduction
- Cozystack v1.2.2 (also reproduced with the chart from v1.3.0-rc.1).
- Create a Tenant (any name).
- Create an apps.cozystack.io/Kubernetes Application inside that tenant with default values.
- Wait ~5 minutes. The HelmRelease transitions to install-failed.
Suggested fixes (pick one)
Ordered from most surgical to most invasive:
- Mark the admin-kubeconfig secret volume optional on the three Deployments. The workloads would start, crash-loop briefly while the Secret is absent, and recover once CAPI produces it. A single-attribute change per Deployment (optional: true on the secret volume source). Avoids the helm-wait timeout entirely but trades it for a short CrashLoopBackOff window.
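As a sketch, the single-field change on each of the three volumes (optional is a standard SecretVolumeSource field; the container will still restart until CAPI writes the Secret):

```yaml
volumes:
  - name: kubeconfig
    secret:
      secretName: {{ .Release.Name }}-admin-kubeconfig
      optional: true   # pod starts even while the Secret is still absent
```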
- Move cluster-autoscaler / kccm / kcsi-controller into a separate HelmRelease that dependsOn the workload-cluster control-plane Secret being present. This is the FluxCD-idiomatic shape and mirrors how the chart already splits kubernetes-demo-k8s-cilium, kubernetes-demo-k8s-coredns, etc. into child HelmReleases that dependsOn the parent (visible in the HelmRelease list during reproduction).
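A hypothetical shape for that split (all names illustrative; note that Flux's dependsOn gates on another HelmRelease reaching Ready rather than on a Secret directly, so the parent release would need to report Ready only once the control plane is up):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kubernetes-demo-k8s-addons   # hypothetical child release
spec:
  interval: 5m
  dependsOn:
    - name: kubernetes-demo-k8s      # parent release owning the Cluster CR
  chart:
    spec:
      chart: kubernetes-addons       # hypothetical split-out chart
      sourceRef:
        kind: HelmRepository
        name: cozystack-apps         # hypothetical repository name
```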
- Add an init container per Deployment that polls for the Secret with a short sleep loop. Inelegant but fully contained.
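A sketch of that init container, assuming the pod's ServiceAccount is granted get on Secrets and using an illustrative kubectl image:

```yaml
initContainers:
  - name: wait-for-kubeconfig
    image: bitnami/kubectl:latest        # illustrative image
    command:
      - /bin/sh
      - -c
      - |
        # Block pod startup until CAPI has written the admin kubeconfig.
        until kubectl get secret {{ .Release.Name }}-admin-kubeconfig; do
          echo "waiting for admin kubeconfig secret"
          sleep 5
        done
```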
- Raise the Helm wait timeout in the cozystack-operator past the 95th-percentile CAPI provisioning time (15m). This only papers over the race, since a slow node can still miss the window, but it shrinks the failure window considerably.
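If that route is taken, the relevant knobs live on the HelmRelease spec (whether cozystack-operator exposes them is an assumption):

```yaml
spec:
  timeout: 15m        # Helm action timeout applied to install/upgrade waits
  install:
    remediation:
      retries: 3      # lets a timed-out install retry instead of sticking at install-failed
```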
Why this matters
First-time-user experience: creating a Kubernetes cluster from the dashboard produces an install-failed HelmRelease, which the UI surfaces as Failed. The cluster eventually does come up (CAPI keeps going even after the HelmRelease gives up), but the retry path is non-obvious — operators typically flux suspend && resume the HelmRelease or delete and recreate the Application.
Related
No matching existing issues found in the repo for admin-kubeconfig, FailedMount, cluster-autoscaler timeout. The last commit touching templates/cluster-autoscaler/ is f6d4541 (March 2025, resource limits). The chart structure has been in this shape since the v0.0.1 prep in February 2024.