feat(seinode): sidecar TLS via externally-provisioned Secret#258
Open
bdchatham wants to merge 2 commits into
Open
feat(seinode): sidecar TLS via externally-provisioned Secret#258bdchatham wants to merge 2 commits into
bdchatham wants to merge 2 commits into
Conversation
Implements #255 per the merged LLD at docs/design-seinode-sidecar-tls-toggle-lld.md. Cert ownership moves out of the controller. Operator/platform tooling provisions a kubernetes.io/tls Secret; the controller references it by name, validates its presence + cert SANs in a steady-state preflight branch, and gates init-plan creation on ConditionSidecarTLSSecretReady. Mirrors signingKey / nodeKey / operatorKeyring / imported-PVC patterns already in the controller. spec.sidecar.tls is immutable post-creation (CRD CEL); toggle = delete + recreate. API: - SidecarTLSSpec rewritten: drop IssuerName/IssuerKind, add SecretName - SidecarTLSStatus added: {SecretName, RequiredDNSNames} — machine- readable contract for platform tooling, replacing naming-convention docs - SeiNodeStatus.SidecarTLS *SidecarTLSStatus - ConditionSidecarTLSSecretReady with reasons Ready / NotFound / Malformed / SANsMismatch - CEL immutability rule on spec.sidecar.tls Controller: - reconcileSidecarTLSReady method in internal/controller/node/preflight_sidecar_tls.go - Reads Secret via APIReader (bypass cache), validates type + non-empty tls.crt/tls.key + PEM-decode + x509.ParseCertificate + SAN superset - Publishes status.sidecarTLS.requiredDNSNames whenever TLS is enabled - APIReader plumbed through SeiNodeReconciler Planner: - ResolvePlan gates init-plan creation when TLS enabled and condition not Ready; steady-state (Running) plans bypass the gate so image-drift rollouts proceed independently of TLS posture - buildBasePlan / buildBootstrapPlan / buildGenesisPlan emit only ApplyRBACProxyConfig under TLS (ApplySidecarCert removed) Code deletion: - internal/task/apply_sidecar_cert.go (task + deserializer registry entry) - GenerateSidecarCertificate from internal/noderesource/noderesource.go Tests: - preflight_sidecar_tls_test.go: validateTLSSecret matrix, requiredDNSNames, reconcileSidecarTLSReady lifecycle (enable/disable transitions, missing Secret, SANs mismatch). Uses in-memory ECDSA P-256 self-signed certs. - Existing sidecar_tls / plan_execution / reconciler / noderesource tests updated for the new spec shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… + doc Addresses platform-engineer cross-review against the implementation commit. One blocker (CEL too permissive), rest are non-blocker / ergonomics fixups. - api: tighten the spec.sidecar.tls immutability CEL rule. The prior shape short-circuited to true whenever oldSelf.sidecar or oldSelf.sidecar.tls was unset, which silently permitted enabling TLS on an existing Running SeiNode via spec mutation. Under the new rule, TLS state at creation is final: a SeiNode that started without TLS stays without TLS for its lifetime, and a SeiNode that started with TLS stays at the same TLS spec for its lifetime. Matches LLD §0.1(b). - api: prefix reason string values (TLSSecretReady, TLSSecretNotFound, TLSSecretMalformed, TLSSecretSANsMismatch) to match the existing SigningKey/NodeKey/OperatorKeyring family so SRE tooling can grep across condition types consistently. - task: prefix the transient-error message in validateTLSSecret so an operator who sees Reason=NotFound but finds the Secret present can disambiguate "transient error" from "Secret missing" without grepping controller logs. - api: doc-comment on SidecarTLSStatus.RequiredDNSNames notes the names are pre-computable from the target spec; published for verification, not as a prerequisite handshake. - noderesource: doc-comment on SidecarTLSSecretName notes callers should gate on SidecarTLSEnabled before relying on the empty return. - test: clarify the TransitionEnabledToDisabled test comment now that the CEL rule actually blocks the mutation in prod. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace. To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
docs/design-seinode-sidecar-tls-toggle-lld.md.kubernetes.io/tlsSecret; the controller validates SANs in a steady-state preflight branch and gates init-plan creation onConditionSidecarTLSSecretReady. MirrorsSigningKeyReady/NodeKeyReady/OperatorKeyringReady/ imported-PVC patterns already in the controller.spec.sidecar.tlsis immutable post-creation via CRD CEL. Toggle = delete + recreate.Closes #255.
Design at a glance
SidecarTLSSpec{IssuerName, IssuerKind}{SecretName}cert-manager.io/v1 CertificateviaApplySidecarCertplan taskkubernetes.io/tlsSecret externallytls.crt/tls.key,x509.ParseCertificate, SAN superset checkstatus.sidecarTLS.{secretName, requiredDNSNames}published as machine-readable contractConditionSidecarTLSSecretReadywithTLSSecretReady/TLSSecretNotFound/TLSSecretMalformed/TLSSecretSANsMismatchWhat's deleted
internal/task/apply_sidecar_cert.go(task + deserializer registry entry)GenerateSidecarCertificatefrominternal/noderesource/noderesource.goSidecarTLSSpec.IssuerName/IssuerKindfieldsOperator workflow
{name}.{ns}.svc.cluster.local+{name}-0.{name}.{ns}.svc.cluster.local. (Also published instatus.sidecarTLS.requiredDNSNamesafter creation for verification.)kubernetes.io/tlsSecret with those SANs (e.g., via a cert-managerCertificatein GitOps).spec.sidecar.tls.secretName: <name>.PendingwithSidecarTLSSecretReady=Falseuntil the Secret resolves; transitions toRunningonce SANs validate.To toggle TLS on existing nodes: delete + recreate with PVC retained via
spec.dataVolume.import.pvcName. At ~10 nodes this is manual; documented in LLD §5.Cross-review
platform-engineer ran against the diff. Verdict: Blockers → fixed; Ready with non-blocker fixups applied.
Blocker (fixed in commit 2):
(!has(oldSelf.sidecar.tls)) ? true : ...short-circuit allowed enabling TLS on an existing Running SeiNode via spec mutation, which would publishstatus.sidecarTLSand gate init plans but the planner'sphase != Runningbypass meantbuildRunningPlanonly handled image drift — StatefulSet pod template (no proxy container, no Secret mount) never re-applied. Status would lie about transport mode. Tightened the rule: TLS state at creation is final.Non-blocker fixups (commit 2):
Ready→TLSSecretReady, etc.) to match the existingSigningKey*/NodeKey*/OperatorKeyring*family for SRE grep consistency.validateTLSSecretprefixed so operators who find the Secret present can disambiguate from genuineNotFound.SidecarTLSStatus.RequiredDNSNamesclarifies the names are pre-computable; published for verification, not as a prerequisite handshake.SidecarTLSSecretNamenotes callers should gate onSidecarTLSEnabled.Deferred (non-blocker, tracked as future work or LLD-aligned intent):
SidecarTLSSecretReady=Falsecondition is the breadcrumb; operators see it viakubectl describe. Ifkubectl getconsumers complain, add an event in a follow-up.SidecarTLSSecretReady. LLD §4 explicitly allows this — TLS readiness is observability on Running nodes, not enforcement. Pod cycles on image drift even if the cert went bad. Operators should check the condition before rolling.Test plan
make test— full suite greengolangci-lint run --new-from-rev=origin/main ./...— 0 issuesmake manifests generate— diff clean, CEL rule reflects the tightened immutabilityvalidateTLSSecretmatrix (Ready, NotFound, WrongType, EmptyCert, EmptyKey, NotPEM, PEMButUnparseable, SANsMismatch, PartialSANsMismatch, SupersetSANsOK)requiredDNSNamesfor service + pod0 SANs, varying namespace/namereconcileSidecarTLSReadylifecycle (TLS-enabled+ready, missing, SANs-mismatch, disabled-clears, enable→disable transition)spec.sidecar.tlsmutation (deferred — adds envtest infra cost; CEL rule is human-verifiable in the generated CRD YAML):8443and controller TLS client connects. Delete + recreate with TLS removed; verify plain HTTP on legacy port.References
docs/design-seinode-sidecar-tls-toggle-lld.md— merged LLD🤖 Generated with Claude Code