
Alert when TLS Secret deleted while pod is still running #261

@bdchatham


Problem

Surfaced by the security + Kubernetes cross-review on #258 (round 3). LLD §4 documents the intended behavior: if the operator-provisioned TLS Secret is deleted (or its SANs drift so they no longer match), kube-rbac-proxy keeps serving the certificate it currently has loaded until the pod restarts for an unrelated reason (node drain, OOM kill, etc.). The plan-creation gate prevents the controller from cycling the pod itself, so this unhealthy state can persist with no visible signal.

The controller publishes `SidecarTLSSecretReady=False`, so `kubectl describe seinode` reveals the problem, but there is no Prometheus/Alertmanager-level signal. Operators who are not actively inspecting per-SeiNode status would miss it.

Proposed scope

  • Emit a controller metric `seinode_sidecar_tls_secret_ready{namespace, name}` as a gauge whose value reflects the condition status (1 when ready, 0 otherwise).
  • Define an alert rule on `seinode_sidecar_tls_secret_ready == 0` with `for: 5m` that pages.
  • Document the expected operator response in the runbook.

Why deferred from #258

Observability tooling lives in the platform repo, not in the controller; the controller's job here is to publish the signal. The metric and alert rule are platform follow-up work.
