Skip to content

feat(seinode): in-place TLS enable on Running SeiNodes#264

Closed
bdchatham wants to merge 3 commits into
mainfrom
feat/seinode-sidecar-tls-enable-on-running
Closed

feat(seinode): in-place TLS enable on Running SeiNodes#264
bdchatham wants to merge 3 commits into
mainfrom
feat/seinode-sidecar-tls-enable-on-running

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

Summary

Closes #262.

Design

Surface Behavior
CEL on spec.sidecar.tls nil → set allowed; set → nil (disable) and A → B (swap) rejected
Status.CurrentSidecarTLSSecretName Mirror stamped by ObserveSidecarTLS after rollout convergence (ReadyReplicas >= *Replicas)
SidecarURLForNode + cmd/main.go newSidecarClient Both key on the same observed status field — lockstep transport flip
sidecarTLSEnableDrift Narrow detector: spec set + status empty + preflight Ready
buildNodeUpdatePlan Composes image-drift + TLS-enable drift in one pod cycle
Init plans (base / bootstrap / genesis) Inject ObserveSidecarTLS between StatefulSet apply and first sidecar HTTP task
classifyPlan labels New: tls-enable, node-update-tls-enable

LLD §0.1(b), §4, §5, §6 rewritten in this PR to reflect set-once semantics — the original LLD said immutable; un-deferred per operator validation that per-node delete+recreate is more friction than warranted.

Cross-review

Trio (platform / kubernetes / security) ran against 77a28c1. Three commits address findings:

  • 77a28c1 — implementation
  • e28c92b — round-1 fixups (cmd/main.go transport gate, ReadyReplicas check, classifyPlan separator, nil-guard, LLD rewrite)
  • 1f33673 — comment sweep

Blockers addressed:

Non-blocker fixups applied:

  • ObserveSidecarTLS now requires ReadyReplicas >= *Replicas (k8s) — stricter than ObserveImage because stamping flips transport
  • classifyPlan label node-update+tls-enablenode-update-tls-enable (security; avoids PromQL regex footgun)
  • sidecarTLSEnableDrift has explicit nil-guard (k8s)
  • buildBasePlan adjacent SidecarTLSEnabled blocks consolidated (platform)
  • Bootstrap ObserveSidecarTLS placement comment clarifies Phase-2 vs Phase-5 (k8s)
  • Test comment contradiction fixed (k8s)
  • Added SidecarURLForNode tests for both mid-rollout quadrants

Deferred (tracked):

Test plan

  • make test — full suite green
  • golangci-lint run --new-from-rev=origin/main ./... — 0 issues
  • make manifests generate — CRD diff is additive (new field optional, CEL change is a relaxation)
  • Unit: sidecarTLSEnableDrift matrix (fires / converged / not-ready / disabled)
  • Unit: plan composition across (image-only, tls-only, both) on Running nodes
  • Unit: init-plan ObserveSidecarTLS placement (after ApplyStatefulSet, before ConfigApply)
  • Unit: SidecarURLForNode mid-rollout quadrants (spec-set/status-empty stays HTTP; status-set drives HTTPS)
  • Unit: ObserveSidecarTLS crashlooping-proxy scenario (UpdatedReplicas=1, ReadyReplicas=0 → stays Running)
  • Unit: classifyPlan new labels
  • Unit: condition lifecycle on TLS-enable plans (sets and clears)
  • envtest: CEL allows nil→set, rejects set→nil + A→B (deferred — envtest harness not yet in repo)
  • End-to-end on harbor: patch spec.sidecar.tls.secretName on a Running SeiNode with a pre-applied Secret; pod cycles; serves on :8443

References

🤖 Generated with Claude Code

bdchatham and others added 3 commits May 17, 2026 18:14
Builds on #258's externalized-cert foundation. Operators can now add
spec.sidecar.tls.secretName to an existing Running SeiNode; the
controller composes a NodeUpdate-style plan that cycles the pod with
the proxy container mount. Disable and swap remain CEL-blocked.

API:
- Relax spec.sidecar.tls CEL: allow nil → set transition, still block
  set → nil and set → different. Set-once semantics; immutability is
  only loosened in the direction that has a coherent rollout path.
- Add Status.CurrentSidecarTLSSecretName: live transport-mode source
  of truth, distinct from the platform-tooling contract published in
  Status.SidecarTLS.

Transport-mode consistency:
- SidecarURLForNode now reads from Status.CurrentSidecarTLSSecretName
  rather than Spec.Sidecar.TLS. Mid-rollout (spec set, pod not yet
  cycled), the controller's sidecar client keeps using HTTP until
  ObserveSidecarTLS stamps the status field — this is the windowing
  bug that bit PR #254. Test covers the mid-rollout state explicitly.

Planner:
- sidecarTLSEnableDrift: narrow detector (spec.tls set + status
  CurrentSidecarTLSSecretName empty + preflight Ready).
- buildRunningPlan composes image-drift + TLS-enable drift into one
  pod cycle.
- buildNodeUpdatePlan emits ApplyRBACProxyConfig pre-task + the
  ObserveSidecarTLS observer when TLS-enable drift fires; co-composes
  with ObserveImage when both drift.
- Init plans (buildBasePlan + buildBootstrapPlan + buildGenesisPlan)
  inject ObserveSidecarTLS between ApplyStatefulSet/Service and the
  first sidecar HTTP task so the transport-mode flip happens before
  ConfigApply / GenerateIdentity / etc. fire.

Task:
- New ObserveSidecarTLS task. Polls the StatefulSet for rollout
  convergence (mirrors ObserveImage), then stamps
  Status.CurrentSidecarTLSSecretName from spec.
- Registered in task.go deserializer + paramsForTaskType.

classifyPlan emits new labels: tls-enable, node-update+tls-enable.

Tests:
- Drift detector matrix (fires / converged / not-ready / disabled).
- Plan composition across (image-only, tls-only, both).
- Init plan ObserveSidecarTLS placement before the first sidecar HTTP
  task (validates the transport-mode-flip ordering invariant).
- SidecarURLForNode mid-rollout regression: spec-set + status-empty
  keeps HTTP.
- Condition lifecycle on TLS-enable plans.

Closes #262.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Blockers:
- cmd/main.go: newSidecarClient now keys the TLS Doer on
  Status.CurrentSidecarTLSSecretName (not spec). Symmetric source of
  truth with SidecarURLForNode. Closes the mid-rollout window where
  mTLS-attempting transport would hit plain HTTP and the bootstrap-pod
  Phase-2 case where the bootstrap pod has no proxy. k8s-flagged
  blocker — corrected platform's "non-blocker" framing.
- LLD §0.1(b), §4, §5, §6 rewritten to reflect set-once semantics
  (enable-on-existing allowed; disable + swap CEL-rejected). Doc
  contradiction with the implementation was the platform + security
  blocker; design history captured so future readers see the spec is
  set-once-not-immutable and why.
- Document load-bearing trust assumption: Status.CurrentSidecarTLS-
  SecretName drives controller transport; manifests/role.yaml scopes
  seinodes/status patch to the controller SA only. Followup tracked.

Non-blockers / nits:
- ObserveSidecarTLS now requires ReadyReplicas >= *Replicas (k8s).
  Stricter than ObserveImage: stamping status flips SidecarURLForNode
  to HTTPS, so a crashlooping proxy must not flip transport
  prematurely. New test covers the crashloop scenario.
- classifyPlan label node-update+tls-enable → node-update-tls-enable
  (security). Hyphen avoids PromQL/dashboard regex-escape footguns.
- sidecarTLSEnableDrift now has an inline nil-guard (k8s). Removes
  implicit dependency on SidecarTLSEnabled invariants.
- buildBasePlan: consolidate two adjacent SidecarTLSEnabled blocks
  with ordering comment (platform).
- buildBootstrapPlan: clarify ObserveSidecarTLS comment now that the
  Phase-2 (bootstrap-pod, HTTP) vs Phase-5 (production-pod, HTTPS)
  split is explicit (k8s).
- Test comment contradiction fixed (k8s).
- New SidecarURLForNode test: status-set-without-spec quadrant.

Deferred (tracked):
- ValidatingAdmissionPolicy on seinodes/status writes (security #263).
- envtest harness for CEL transitions (k8s #264).
- Shared rollout-readiness primitive ObserveImage/ObserveSidecarTLS
  (platform — defer to when a third observer lands).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strip rationale paragraphs, cross-references, historical framing, and
restatements of what the code obviously does. Keep present-tense
one-liners where the WHY is non-obvious.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor
Copy link
Copy Markdown

cursor Bot commented May 18, 2026

You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace.

To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard.

@bdchatham
Copy link
Copy Markdown
Collaborator Author

Closing — directional pivot. We're removing sidecar TLS management from the controller entirely. Rationale: controller↔sidecar runs on the private K8s cluster network; TLS termination doesn't earn its keep here. kube-rbac-proxy stays for its SAR-based authz value (in --insecure-listen-address HTTP mode); all cert provisioning, Secret reference contract, preflight SAN validation, status mirror, and CEL immutability surface go away.

Subtraction PR incoming.

@bdchatham bdchatham closed this May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable TLS on existing Running SeiNodes via in-place plan

1 participant