diff --git a/docs/en/solutions/Measure_the_Time_Between_Pod_Creation_and_Pod_Ready_Using_Prometheus_or_kubectl.md b/docs/en/solutions/Measure_the_Time_Between_Pod_Creation_and_Pod_Ready_Using_Prometheus_or_kubectl.md
new file mode 100644
index 00000000..b47d3a81
--- /dev/null
+++ b/docs/en/solutions/Measure_the_Time_Between_Pod_Creation_and_Pod_Ready_Using_Prometheus_or_kubectl.md
@@ -0,0 +1,84 @@
+---
+kind:
+  - How To
+products:
+  - Alauda Container Platform
+ProductsVersion:
+  - 4.1.0,4.2.x
+---
+## Issue
+
+Capacity planners and platform SREs want to know how long Pods take to go from `creationTimestamp` to `Ready`. The intuitive metric `kube_pod_created` is documented upstream by `kube-state-metrics`, but several distributions ship a `kube-state-metrics` Deployment with a `--metric-denylist` that hides every `kube_*_created` series. The query then has to be rewritten to use the lifecycle timestamp metrics that remain exposed.
+
+## Root Cause
+
+`kube-state-metrics` accepts a `--metric-denylist` regex argument. Some platform monitoring stacks populate it with patterns like `^kube_.+_created$` to reduce cardinality. As a result, `kube_pod_created` is never exposed for scraping, even though it is part of the upstream metric catalog.
+
+Two stable alternative metrics are exposed by all reasonably recent `kube-state-metrics` releases and are not matched by that denylist pattern:
+
+- `kube_pod_status_scheduled_time`: when the scheduler bound the Pod to a node.
+- `kube_pod_status_ready_time`: when the Pod's `Ready` condition last transitioned to `True`.
+
+Their difference is the on-node startup latency that operators usually mean by "how long did the Pod take to come up".
+
+## Resolution
+
+### Steps
+
+1. Confirm whether `kube_pod_created` is exposed in the local Prometheus. If the result is empty, fall back to the alternative metrics in step 3:
+
+   ```promql
+   kube_pod_created
+   ```
+
+2. Inspect the `kube-state-metrics` Deployment to see whether a denylist is in effect. The argument list usually appears under `containers[].args`:
+
+   ```bash
+   kubectl get deploy -A -l app.kubernetes.io/name=kube-state-metrics \
+     -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].args}{"\n"}{end}'
+   ```
+
+   If the output contains `--metric-denylist=...kube_.+_created...`, the source metric is intentionally absent.
+
+3. Use any of the following PromQL expressions to compute the Pod startup duration in seconds. Each returns one series per Pod that reported both timestamps within the lookback window. Pick the expression whose semantics best match the question being asked:
+
+   ```promql
+   # Time between scheduling and readiness: closest to "scheduling and start-up cost"
+   last_over_time(kube_pod_status_ready_time{namespace="test"}[1d])
+     - last_over_time(kube_pod_status_scheduled_time{namespace="test"}[1d])
+
+   # Time between init-container completion and readiness: isolates main container work
+   last_over_time(kube_pod_status_ready_time{namespace="test"}[1d])
+     - last_over_time(kube_pod_status_initialized_time{namespace="test"}[1d])
+
+   # Time between the kubelet acknowledging the Pod (status.startTime) and readiness: includes image pulls and init containers
+   last_over_time(kube_pod_status_ready_time{namespace="test"}[1d])
+     - last_over_time(kube_pod_start_time{namespace="test"}[1d])
+   ```
+
+4. To verify a single value out-of-band, use `kubectl` directly. The `Ready` condition's `lastTransitionTime` minus the Pod's `metadata.creationTimestamp` gives the creation-to-ready duration, which exceeds the scheduled-to-ready value from Prometheus by however long the Pod waited to be scheduled:
+
+   ```bash
+   kubectl -n test get pod -o json | jq -r '
+     .items[]
+     | .metadata.creationTimestamp as $created
+     | (.status.conditions[]? | select(.type=="Ready" and .status=="True") | .lastTransitionTime) as $ready
+     | "Pod: \(.metadata.name)\tCreated: \($created)\tReady: \($ready)\tDelta: \(($ready | fromdateiso8601) - ($created | fromdateiso8601)) s"
+   '
+   ```
+
+   Example output:
+
+   ```text
+   Pod: httpd-5c4cfd69b4-mr7tq Created: 2026-04-22T23:45:33Z Ready: 2026-04-22T23:46:05Z Delta: 32 s
+   ```
+
+## Diagnostic Steps
+
+If both `kube_pod_created` and the `kube_pod_status_*` metrics are empty in Prometheus, `kube-state-metrics` is probably not running in the cluster or is not being scraped:
+
+```bash
+kubectl get pods -A -l app.kubernetes.io/name=kube-state-metrics
+```
+
+If `kube_pod_status_ready_time` returns values but the difference comes out negative, the two timestamps do not describe the same startup (for example, the Pod was scheduled but did not reach `Ready` within the query window). Verify the Pod's current `Ready` condition with `kubectl describe pod`.
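
The out-of-band check in step 4 can be extended to show where the startup time goes. The following is a minimal Python sketch, not part of the procedure above: it takes one Pod object in the shape returned by `kubectl get pod -o json` and reports the seconds from `metadata.creationTimestamp` to each condition that is currently `True`. The helper names `parse_ts` and `startup_phases` are hypothetical, and in `SAMPLE_POD` only the creation and `Ready` timestamps come from the example output; the `PodScheduled` and `Initialized` times are invented for illustration.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a Kubernetes timestamp such as 2026-04-22T23:45:33Z (second precision, UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def startup_phases(pod):
    """Return {condition type: seconds since creation} for conditions whose status is True."""
    created = parse_ts(pod["metadata"]["creationTimestamp"])
    phases = {}
    for cond in pod.get("status", {}).get("conditions", []):
        # Conditions that are not True (for example Ready=False) carry no
        # meaningful "reached at" time, so they are skipped.
        if cond.get("status") == "True" and cond.get("lastTransitionTime"):
            reached = parse_ts(cond["lastTransitionTime"])
            phases[cond["type"]] = (reached - created).total_seconds()
    return phases


# Illustrative Pod object. ASSUMPTION: the PodScheduled and Initialized times
# are made up; only creationTimestamp and the Ready time match the example output.
SAMPLE_POD = {
    "metadata": {
        "name": "httpd-5c4cfd69b4-mr7tq",
        "creationTimestamp": "2026-04-22T23:45:33Z",
    },
    "status": {
        "conditions": [
            {"type": "PodScheduled", "status": "True", "lastTransitionTime": "2026-04-22T23:45:33Z"},
            {"type": "Initialized", "status": "True", "lastTransitionTime": "2026-04-22T23:45:35Z"},
            {"type": "Ready", "status": "True", "lastTransitionTime": "2026-04-22T23:46:05Z"},
        ]
    },
}

if __name__ == "__main__":
    for cond_type, seconds in startup_phases(SAMPLE_POD).items():
        print(f"{cond_type}: {seconds:.0f} s after creation")
```

For live data, the same function could be applied to each element of `.items` after loading the `kubectl -n test get pod -o json` output with `json.load(sys.stdin)`. The `Ready` entry corresponds to the `Delta` column of the `jq` command, and the gap between `PodScheduled` and `Ready` corresponds to the first PromQL expression in step 3.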