---
kind:
- How To
products:
- Alauda Container Platform
ProductsVersion:
- 4.1.0,4.2.x
---
## Issue

Capacity planners and platform SREs want to know how long Pods take to go from `creationTimestamp` to `Ready`. The intuitive metric, `kube_pod_created`, is documented upstream by `kube-state-metrics`, but several distributions ship the `kube-state-metrics` Deployment with a `--metric-denylist` that hides every `kube_<resource>_created` series. On those clusters the query has to be rewritten against lifecycle timestamp metrics that are still exposed.

## Root Cause

`kube-state-metrics` accepts a `--metric-denylist` argument containing a comma-separated list of regular expressions. Some platform monitoring stacks populate it with patterns like `^kube_.+_created$` to reduce cardinality. As a result, `kube_pod_created` is never exposed by the exporter, so Prometheus has nothing to scrape, even though the metric is part of the upstream catalog.
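To see which metric names such a pattern hides, the match can be approximated locally. This is a minimal sketch, assuming `grep -E` behaves like the exporter's regex engine for this simple pattern; the `is_denied` helper and the list of metric names are illustrative and not part of `kube-state-metrics`.

```bash
#!/usr/bin/env bash
# Illustrative only: approximates the denylist match with grep -E.
# The pattern is the one quoted above; the metric names are examples.
denylist='^kube_.+_created$'

is_denied() {
  # Prints "denied" if the metric name matches the pattern, else "exposed".
  if printf '%s\n' "$1" | grep -Eq "$denylist"; then
    echo "denied"
  else
    echo "exposed"
  fi
}

for metric in kube_pod_created kube_deployment_created \
              kube_pod_status_scheduled_time kube_pod_status_ready_time; do
  printf '%-32s %s\n' "$metric" "$(is_denied "$metric")"
done
```

The two `_created` names are reported as denied, while the two `kube_pod_status_*` names do not match the pattern.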

Two stable alternative metrics are exposed by all reasonably recent `kube-state-metrics` releases and do not match that denylist pattern:

- `kube_pod_status_scheduled_time`: when the scheduler bound the Pod to a node.
- `kube_pod_status_ready_time`: when the Pod's `Ready` condition last transitioned to `True` (the condition's `lastTransitionTime`). For a Pod that has never lost readiness, this is the moment it first became `Ready`.

Their difference is the on-node startup latency that operators usually mean by "how long did the Pod take to come up".

## Resolution

### Steps

1. Confirm whether `kube_pod_created` is exposed in the local Prometheus. If the result is empty, fall back to the alternative metrics in step 3:

```promql
kube_pod_created
```

2. Inspect the `kube-state-metrics` Deployment to see whether a denylist is in effect. The argument list usually appears under `containers[].args`:

```bash
kubectl get deploy -A -l app.kubernetes.io/name=kube-state-metrics \
-o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].args}{"\n"}{end}'
```

If the output contains `--metric-denylist=...kube_.+_created...`, the source metric is intentionally absent.
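When the argument list is long, the denylist can be pulled out of it mechanically. The sketch below assumes the arguments are printed as a JSON-style array, as recent `kubectl` versions do for jsonpath output; `sample_args` is a made-up example and `extract_denylist` is a hypothetical helper. Splitting on commas will mis-split a pattern that itself contains a comma.

```bash
#!/usr/bin/env bash
# Hypothetical sample of the args column; real output will differ.
sample_args='["--port=8080","--metric-denylist=^kube_.+_created$,^kube_example_.+$"]'

extract_denylist() {
  # Prints one denylist pattern per line, or nothing if the flag is absent.
  printf '%s\n' "$1" \
    | sed -n 's/.*"--metric-denylist=\([^"]*\)".*/\1/p' \
    | tr ',' '\n'
}

extract_denylist "$sample_args"
```

If a printed pattern matches `kube_pod_created`, the metric is filtered at the exporter and no change on the Prometheus side will bring it back.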

3. Use one of the following PromQL expressions to compute the Pod startup duration in seconds. Each returns one sample per Pod seen in the lookback window; replace `namespace="test"` and `[1d]` with the namespace and window of interest. Pick the expression whose semantics best match the question being asked:

```promql
# Time between scheduling and readiness — closest to "scheduling and start-up cost"
last_over_time(kube_pod_status_ready_time{namespace="test"}[1d])
- last_over_time(kube_pod_status_scheduled_time{namespace="test"}[1d])

# Time between init-container completion and readiness — isolates main container work
last_over_time(kube_pod_status_ready_time{namespace="test"}[1d])
- last_over_time(kube_pod_status_initialized_time{namespace="test"}[1d])

# Time between the kubelet acknowledging the Pod (status.startTime) and readiness: includes image pulls, init containers and probes
last_over_time(kube_pod_status_ready_time{namespace="test"}[1d])
- last_over_time(kube_pod_start_time{namespace="test"}[1d])
```
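The first expression can also be run from a shell against the Prometheus HTTP API (`/api/v1/query`). This is a sketch under assumptions: `PROM_URL` is a placeholder for a reachable Prometheus endpoint, `startup_query` is a hypothetical helper that only builds the expression, and authentication is not handled.

```bash
#!/usr/bin/env bash
# Builds the "scheduled to Ready" expression for a namespace and lookback window.
startup_query() {
  local ns="$1" window="${2:-1d}"
  printf 'last_over_time(kube_pod_status_ready_time{namespace="%s"}[%s]) - last_over_time(kube_pod_status_scheduled_time{namespace="%s"}[%s])' \
    "$ns" "$window" "$ns" "$window"
}

# Only query when PROM_URL is set (assumed value, e.g. a port-forwarded Prometheus).
if [ -n "${PROM_URL:-}" ]; then
  curl -sG "${PROM_URL}/api/v1/query" \
    --data-urlencode "query=$(startup_query test 1d)" \
    | jq -r '.data.result[] | "\(.metric.pod)\t\(.value[1])"'
fi
```

Each output line would pair a Pod name with its startup duration in seconds, which is convenient for sorting or feeding into a spreadsheet.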

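4. To verify a single value out-of-band, use `kubectl` directly. Subtracting the Pod's `metadata.creationTimestamp` from the `Ready` condition's `lastTransitionTime` gives the creation-to-Ready duration. Expect it to be equal to or slightly larger than the scheduled-to-Ready value from step 3, because it also includes the time the Pod spent waiting to be scheduled: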

```bash
kubectl -n test get pod -o json | jq -r '
.items[]
| .metadata.creationTimestamp as $created
| (.status.conditions[]? | select(.type=="Ready" and .status=="True") | .lastTransitionTime) as $ready
| "Pod: \(.metadata.name)\tCreated: \($created)\tReady: \($ready)\tDelta: \(($ready | fromdateiso8601) - ($created | fromdateiso8601)) s"
'
```

Example output:

```text
Pod: httpd-5c4cfd69b4-mr7tq Created: 2026-04-22T23:45:33Z Ready: 2026-04-22T23:46:05Z Delta: 32 s
```
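The delta in the example output can be reproduced without `jq` as a sanity check. This sketch assumes UTC timestamps with a trailing `Z`, as in the example, and a `date` that accepts `-d` (GNU coreutils or BusyBox; the BSD/macOS `date` does not). `to_epoch` and `startup_delta` are hypothetical helper names.

```bash
#!/usr/bin/env bash
to_epoch() {
  # Converts "2026-04-22T23:45:33Z" to Unix seconds.
  local ts="${1%Z}"                         # drop the trailing Z
  date -u -d "${ts%%T*} ${ts#*T}" +%s       # "YYYY-MM-DD HH:MM:SS", read as UTC
}

startup_delta() {
  # Arguments: created timestamp, ready timestamp. Prints the difference in seconds.
  echo $(( $(to_epoch "$2") - $(to_epoch "$1") ))
}

startup_delta "2026-04-22T23:45:33Z" "2026-04-22T23:46:05Z"   # prints 32
```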

## Diagnostic Steps

If both `kube_pod_created` and every `kube_pod_status_*` series are empty in Prometheus, `kube-state-metrics` is most likely not running, or Prometheus is not scraping it. Start by checking that its Pods exist:

```bash
kubectl get pods -A -l app.kubernetes.io/name=kube-state-metrics
```

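If `kube_pod_status_scheduled_time` returns a value for a Pod but the subtraction returns no sample for it, or the result looks wrong (for example, negative), the Pod was probably scheduled without reporting `Ready` inside the query window. PromQL only returns a difference for Pods that have both series. Verify the Pod's current `Ready` condition with `kubectl describe pod`.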