WaitForSidecarTLSSecret: per-task deadline + diagnostic surfacing #253

@bdchatham

Problem

#247 adds a terminal-failure path on Certificate.Ready=False/Reason=Failed, which catches the most common stuck case (bad IssuerName). Two gaps remain:

  1. No wall-clock deadline. Other terminal cert-manager states (e.g., the issuer CA crashing mid-issuance, or a network partition to a remote ACME server) leave Certificate.Ready=False with a reason other than Failed, so the task polls forever.
  2. No visibility into transient states. While issuance is in progress (Certificate.Ready=False/Reason=Issuing), operators see only condition=NodeUpdateInProgress, reason=TLSToggleStarted, with no insight into why issuance is slow. The message field on cert-manager's Ready condition would help.

Proposed scope

  • Add a per-task deadline (configurable; default of roughly 10 minutes for cert-manager issuance). Return Terminal if Secret.tls.crt is still empty past the deadline.
  • Surface Certificate.Ready.message in the task error (non-Terminal) on each poll, so it lands in status.plan.tasks[i].error and operators can grep for it via kubectl describe seinode.

Why deferred from #247

The Terminal-on-Failed fix lands the primary blocker. The deadline and diagnostic surfacing are defense-in-depth and a UX improvement, not a correctness gap on the rollout path.
