Sidecar TLS via externally-provisioned Secret #255

@bdchatham

Problem

Operators need to enable TLS on the sei-sidecar API across the arctic-1, atlantic-2, and pacific-1 fleets. Today the controller silently ignores spec.sidecar.tls changes on Running SeiNodes. Only init-time TLS works, and enabling it on an existing node means deleting that node to re-trigger init, which is operationally hostile.

The original investigation (#247) explored an in-place toggle mechanism with a drift detector, observer task, and status mirror (landed in PR #254, since closed). That direction was abandoned after cross-review surfaced architectural concerns: the controller was minting cert-manager Certificate resources itself (coupling cert lifecycle to plan-task flow), the Issuer trust-anchor was operator-spec-supplied (security gap, #251), and the unified-plan task-list-sniffing discrimination was getting brittle.

Proposed approach

See docs/design-seinode-sidecar-tls-toggle-lld.md for the full LLD. Summary:

  • spec.sidecar.tls.secretName: string — references an externally-provisioned kubernetes.io/tls Secret. Replaces IssuerName/IssuerKind.
  • spec.sidecar.tls is immutable post-creation (CEL validation). Toggle via delete + recreate with PVC retention.
  • Controller publishes status.sidecarTLS.{secretName, requiredDNSNames} — machine-readable contract for platform tooling.
  • Pre-flight reconcile branch validates the Secret's cert SANs against requiredDNSNames; sets ConditionSidecarTLSSecretReady (mirrors SigningKeyReady / NodeKeyReady / OperatorKeyringReady).
  • Init plan gates on SidecarTLSSecretReady=True when TLS is enabled.
  • ApplyRBACProxyConfig stays (controller-owned, namespace/name-dependent). ApplySidecarCert, ObserveSidecarTLS, sidecarTLSDrift, status mirror — all dropped.
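The API shape and the init gate described above could look roughly like the following Go sketch. It is illustrative only: the struct layout, the placement and exact wording of the CEL rule, and the helper name initGateOpen are assumptions, not taken from the LLD. The condition status is passed as a plain string ("True" / "False" / "Unknown") to keep the example free of Kubernetes dependencies.

```go
package main

import "fmt"

// SidecarTLSSpec replaces the former IssuerName/IssuerKind fields.
type SidecarTLSSpec struct {
	// SecretName references an externally-provisioned kubernetes.io/tls Secret.
	SecretName string `json:"secretName"`
}

// SidecarSpec carries the immutability rule on the parent so that adding or
// removing tls after creation is rejected as well as editing it.
// ASSUMPTION: the rule text below is a guess at what the CEL validation might
// look like; the real rule lives in the design doc and CRD.
//
// +kubebuilder:validation:XValidation:rule="has(self.tls) == has(oldSelf.tls) && (!has(self.tls) || self.tls == oldSelf.tls)",message="spec.sidecar.tls is immutable"
type SidecarSpec struct {
	TLS *SidecarTLSSpec `json:"tls,omitempty"`
}

// SidecarTLSStatus is the machine-readable contract for platform tooling.
type SidecarTLSStatus struct {
	SecretName       string   `json:"secretName,omitempty"`
	RequiredDNSNames []string `json:"requiredDNSNames,omitempty"`
}

// initGateOpen reports whether the init plan may progress (hypothetical helper).
// secretReady is the status of the SidecarTLSSecretReady condition.
func initGateOpen(spec SidecarSpec, secretReady string) bool {
	if spec.TLS == nil {
		return true // TLS disabled: nothing to wait for
	}
	return secretReady == "True" // "False" and "Unknown" both hold the plan
}

func main() {
	withTLS := SidecarSpec{TLS: &SidecarTLSSpec{SecretName: "sidecar-tls"}}
	fmt.Println(initGateOpen(SidecarSpec{}, "Unknown"))
	fmt.Println(initGateOpen(withTLS, "False"))
	fmt.Println(initGateOpen(withTLS, "True"))
}
```

Treating "Unknown" the same as "False" keeps the gate fail-closed: the init plan only proceeds once the pre-flight branch has positively validated the Secret.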

Relevant experts

  • kubernetes-specialist — CEL immutability rule, finalizer-strip-owner-ref pattern (if retain-on-delete also lands), preflight reconcile shape
  • platform-engineer — env contract for the new condition reasons, dashboard/alert impact
  • security-specialist — x509 SAN-validation correctness, condition-as-trust-signal scope

Acceptance criteria

  • spec.sidecar.tls.secretName field replaces IssuerName/IssuerKind; CRD regenerated
  • CEL immutability rule on spec.sidecar.tls rejects mutations on existing SeiNodes
  • status.sidecarTLS.{secretName, requiredDNSNames} populated when TLS enabled
  • ConditionSidecarTLSSecretReady const + reasons (Ready / NotFound / Malformed / SANsMismatch)
  • Pre-flight reconcile branch: get Secret, type check, non-empty tls.crt/tls.key, x509.ParseCertificate, SAN superset check
  • Init plan won't progress when SidecarTLSSecretReady=False (gating parallel to SigningKeyReady)
  • buildBasePlan emits only ApplyRBACProxyConfig (not ApplySidecarCert) under TLS
  • buildRunningPlan reverts to its pre-#254 shape, i.e. no TLS handling (#254: "feat(planner): TLS toggle on Running SeiNodes via NodeUpdate plan")
  • Drop: GenerateSidecarCertificate, ApplySidecarCert task, ObserveSidecarTLS task, sidecarTLSDrift, CurrentSidecarTLS status field. Rewrite the SidecarTLSStatus type under the new contract
  • Unit tests for validateTLSSecret matrix (missing, wrong type, empty data, unparseable, SAN match, SAN mismatch)
  • envtest: SeiNode with TLS but missing Secret stays Pending; provisioning Secret transitions to Running
  • envtest: CEL rejects post-creation mutation of spec.sidecar.tls
  • End-to-end on harbor: create SeiNode with manually-applied Secret; verify pod serves on :8443; verify controller TLS client connection
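As a rough illustration of the pre-flight checks and the validateTLSSecret matrix listed above, here is a minimal Go sketch. The Secret struct is a local stand-in for corev1.Secret so the example needs no Kubernetes imports, a nil Secret stands for a Get that returned NotFound, and the function signature and the missingSANs helper are assumptions. It compares SANs by exact string match and inspects only the first PEM block of tls.crt; wildcard or chain handling would need a deliberate decision in the real implementation.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
)

// Secret is a stand-in for corev1.Secret (assumption, for self-containment).
type Secret struct {
	Type string
	Data map[string][]byte
}

// Condition reasons named in the acceptance criteria.
const (
	ReasonReady        = "Ready"
	ReasonNotFound     = "NotFound"
	ReasonMalformed    = "Malformed"
	ReasonSANsMismatch = "SANsMismatch"
)

// missingSANs returns the required DNS names absent from the certificate.
func missingSANs(certNames, required []string) []string {
	have := make(map[string]struct{}, len(certNames))
	for _, n := range certNames {
		have[n] = struct{}{}
	}
	var missing []string
	for _, r := range required {
		if _, ok := have[r]; !ok {
			missing = append(missing, r)
		}
	}
	return missing
}

// validateTLSSecret maps a Secret to a condition reason plus a diagnostic error.
func validateTLSSecret(s *Secret, required []string) (string, error) {
	if s == nil {
		return ReasonNotFound, fmt.Errorf("secret not found")
	}
	if s.Type != "kubernetes.io/tls" {
		return ReasonMalformed, fmt.Errorf("unexpected secret type %q", s.Type)
	}
	if len(s.Data["tls.crt"]) == 0 || len(s.Data["tls.key"]) == 0 {
		return ReasonMalformed, fmt.Errorf("tls.crt or tls.key is empty")
	}
	block, _ := pem.Decode(s.Data["tls.crt"])
	if block == nil || block.Type != "CERTIFICATE" {
		return ReasonMalformed, fmt.Errorf("tls.crt has no PEM CERTIFICATE block")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return ReasonMalformed, fmt.Errorf("parse certificate: %w", err)
	}
	if missing := missingSANs(cert.DNSNames, required); len(missing) > 0 {
		return ReasonSANsMismatch, fmt.Errorf("certificate lacks SANs %v", missing)
	}
	return ReasonReady, nil
}

func main() {
	// The DNS name here is a made-up example, not the controller's real naming.
	reason, err := validateTLSSecret(nil, []string{"node-0.default.svc"})
	fmt.Println(reason, err)
}
```

Returning the reason alongside an error lets the reconcile branch set the condition reason directly and put the error text in the condition message, which keeps the unit-test matrix a flat table of (input, expected reason) pairs.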

Out of scope (deferred / unrelated)

  • spec.dataVolume.retainOnDelete: bool — future ergonomic improvement on PVC retention; existing spec.dataVolume.import.pvcName is the supported escape hatch today
  • SND-level retain-on-delete — independently useful, separate concern
  • #253 (WaitForSidecarTLSSecret: per-task deadline + diagnostic surfacing): preflight gate deadline; reduced scope, not blocking
