Problem
Operators need to enable TLS on the sei-sidecar API across arctic-1, atlantic-2, and pacific-1 fleets. Today the controller silently ignores spec.sidecar.tls changes on Running SeiNodes; only init-time TLS works, and that requires deleting each node to re-trigger init — operationally hostile.
The original investigation (#247) explored an in-place toggle mechanism with a drift detector, observer task, and status mirror (landed in PR #254, since closed). That direction was abandoned after cross-review surfaced architectural concerns: the controller was minting cert-manager Certificate resources itself (coupling cert lifecycle to plan-task flow), the Issuer trust-anchor was operator-spec-supplied (security gap, #251), and the unified-plan task-list-sniffing discrimination was getting brittle.
Proposed approach
See docs/design-seinode-sidecar-tls-toggle-lld.md for the full LLD. Summary:
spec.sidecar.tls.secretName: string — references an externally-provisioned kubernetes.io/tls Secret. Replaces IssuerName/IssuerKind.
spec.sidecar.tls is immutable post-creation (CEL validation). Toggle via delete + recreate with PVC retention.
Controller publishes status.sidecarTLS.{secretName, requiredDNSNames} — machine-readable contract for platform tooling.
Pre-flight reconcile branch validates the Secret's cert SANs against requiredDNSNames; sets ConditionSidecarTLSSecretReady (mirrors SigningKeyReady / NodeKeyReady / OperatorKeyringReady).
Init plan gates on SidecarTLSSecretReady=True when TLS is enabled.
ApplyRBACProxyConfig stays (controller-owned, namespace/name-dependent). ApplySidecarCert, ObserveSidecarTLS, sidecarTLSDrift, status mirror — all dropped.
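The pre-flight check in the summary above could look roughly like the following. This is a minimal sketch, not the LLD's implementation: `tlsSecret` is a hypothetical stand-in for `corev1.Secret` so the example compiles without Kubernetes dependencies, the `validateTLSSecret` signature and the `missingDNSNames` helper are assumptions, and SANs are compared by exact string match (wildcard and IP SANs are not handled). The reason strings follow the ones listed for `ConditionSidecarTLSSecretReady`.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// Condition reasons, as named in the issue for ConditionSidecarTLSSecretReady.
const (
	ReasonReady        = "Ready"
	ReasonNotFound     = "NotFound"
	ReasonMalformed    = "Malformed"
	ReasonSANsMismatch = "SANsMismatch"
)

// tlsSecret is a hypothetical stand-in for corev1.Secret (assumption:
// only the type and data fields matter for validation).
type tlsSecret struct {
	Type string
	Data map[string][]byte
}

// missingDNSNames returns every required name that is absent from the
// certificate's SAN list. An empty result means the SANs are a superset.
func missingDNSNames(certSANs, required []string) []string {
	have := make(map[string]struct{}, len(certSANs))
	for _, n := range certSANs {
		have[n] = struct{}{}
	}
	var missing []string
	for _, n := range required {
		if _, ok := have[n]; !ok {
			missing = append(missing, n)
		}
	}
	return missing
}

// validateTLSSecret maps a Secret onto a condition reason. A nil secret
// models "not found"; the error carries the detail for the condition message.
func validateTLSSecret(s *tlsSecret, required []string) (string, error) {
	if s == nil {
		return ReasonNotFound, errors.New("secret not found")
	}
	if s.Type != "kubernetes.io/tls" {
		return ReasonMalformed, fmt.Errorf("secret type %q, want kubernetes.io/tls", s.Type)
	}
	crt, key := s.Data["tls.crt"], s.Data["tls.key"]
	if len(crt) == 0 || len(key) == 0 {
		return ReasonMalformed, errors.New("tls.crt or tls.key is empty")
	}
	block, _ := pem.Decode(crt)
	if block == nil || block.Type != "CERTIFICATE" {
		return ReasonMalformed, errors.New("tls.crt holds no PEM certificate")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return ReasonMalformed, fmt.Errorf("parse tls.crt: %w", err)
	}
	if missing := missingDNSNames(cert.DNSNames, required); len(missing) > 0 {
		return ReasonSANsMismatch, fmt.Errorf("certificate lacks required SANs %v", missing)
	}
	return ReasonReady, nil
}
```

In a reconciler, the returned reason would presumably set `SidecarTLSSecretReady` to True only for `Ready`, with the error text used as the condition message. Only the leaf certificate's DNS SANs are checked here; chain verification and key/cert pairing are out of this sketch's scope.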
Relevant experts
kubernetes-specialist: CEL immutability rule, finalizer-strip-owner-ref pattern (if retain-on-delete also lands), preflight reconcile shape
platform-engineer: env contract for the new condition reasons, dashboard/alert impact
security-specialist: x509 SAN-validation correctness, condition-as-trust-signal scope
Acceptance criteria
spec.sidecar.tls.secretName field replaces IssuerName/IssuerKind; CRD regenerated
spec.sidecar.tls rejects mutations on existing SeiNodes
status.sidecarTLS.{secretName, requiredDNSNames} populated when TLS enabled
ConditionSidecarTLSSecretReady const + reasons (Ready/NotFound/Malformed/SANsMismatch)
Pre-flight validation: tls.crt/tls.key, x509.ParseCertificate, SAN superset check
Init plan does not proceed while SidecarTLSSecretReady=False (gating parallel to SigningKeyReady)
buildBasePlan emits only ApplyRBACProxyConfig (not ApplySidecarCert) under TLS
buildRunningPlan reverts to its shape before PR #254 (feat(planner): TLS toggle on Running SeiNodes via NodeUpdate plan), with no TLS handling
Drop: GenerateSidecarCertificate, ApplySidecarCert task, ObserveSidecarTLS task, sidecarTLSDrift, CurrentSidecarTLS status field; SidecarTLSStatus type rewritten under the new contract
Unit tests for validateTLSSecret matrix (missing, wrong type, empty data, unparseable, SAN match, SAN mismatch)
envtest: SeiNode with TLS but missing Secret stays Pending; provisioning Secret transitions to Running
envtest: CEL rejects post-creation mutation of spec.sidecar.tls
End-to-end on harbor: create SeiNode with manually-applied Secret; verify pod serves on :8443; verify controller TLS client connection
Out of scope (deferred / unrelated)
spec.dataVolume.retainOnDelete: bool is a future ergonomic improvement on PVC retention; existing spec.dataVolume.import.pvcName is the supported escape hatch today
SND-level retain-on-delete: independently useful, separate concern
References
docs/design-seinode-sidecar-tls-toggle-lld.md: revised LLD with §0.1 design-choice section