CRD rename: TemporalWorkerDeployment → WorkerDeployment, TemporalConnection → Connection#294
Open
CRD rename: TemporalWorkerDeployment → WorkerDeployment, TemporalConnection → Connection#294
Conversation
…ration test coverage
Adds documentation that explains how we think about releases of the application and Helm Charts and when we will bump major, minor and patch versions for both Application and Chart Releases. Issue #264 Signed-off-by: Jay Pipes <jay.pipes@temporal.io>
…blocking unversioned workers Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
…ion finalizer - Add 5-minute deletionCleanupTimeout to prevent TWD stuck in Terminating state indefinitely if Temporal server is unavailable - Return errors from version/deployment deletion to trigger requeue until versions actually clear (pollers disappear as pods terminate) - Add update/patch verbs and finalizers RBAC marker for TemporalConnections - Fix comment-spacing lint on new kubebuilder:rbac markers
…y from Go with make generate
Move TemporalWorkerDeployment spec validation from reconcile time to apply time by embedding CEL validation rules in the CRD schema. Previously validation only fired when the optional TWD webhook was enabled; now the API server enforces it universally on every create/update. Rules added to worker_types.go (regenerated in the CRD manifest): - metadata.name length <= 63 (root object) - Progressive strategy requires non-empty steps (RolloutStrategy) - rampPercentage strictly increasing across steps (RolloutStrategy) - gate: input and inputFrom are mutually exclusive (RolloutStrategy) - gate.inputFrom: exactly one of configMapKeyRef/secretKeyRef (RolloutStrategy) - pauseDuration >= 30s per step (RolloutStep) The ValidateCreate() call and 5-minute requeue in the reconciler are removed — the CRD schema now guarantees the spec is valid before the reconciler sees it. The webhook Go code is kept for defence-in-depth when webhook.enabled=true. Closes #62 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues with the initial CEL rules surfaced in k8s 1.27 envtest: 1. gate.input is x-kubernetes-preserve-unknown-fields: true — CEL cannot reference it structurally, so has(self.gate.input) failed to compile. Removed the gate input/inputFrom mutual-exclusivity rule; the webhook Go code still enforces it when enabled. 2. Any list traversal on self.steps at the RolloutStrategy level (map, isSorted) hits the schema-level CEL cost budget in k8s 1.27 because the cost estimator does not propagate maxItems from the steps array schema up to the parent object's rule. Removed the rampPercentage ordering rule from the CRD; the webhook Go code still enforces it when enabled. Remaining CRD rules (all O(1)): name length, Progressive requires steps, pauseDuration >= 30s per step, gate.inputFrom exclusivity. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CRD CEL rules cannot express rampPercentage ordering or gate input/inputFrom exclusivity (list traversal hits the schema cost budget; gate.input is x-kubernetes-preserve-unknown-fields). These checks live only in the webhook Go code, which is optional. Without a fallback, users who deploy without the webhook and submit an invalid spec (e.g. decreasing rampPercentage) would silently get no rollout progress and no signal — a dangerous failure mode since a decreasing ramp could cause auto-upgrade workflows to roll back. Re-add ValidateCreate() in the reconciler as a fallback: - On failure, emit a Warning event and set ConditionProgressing=False / ConditionReady=False with reason InvalidSpec. - Return early with a 5-minute requeue so the invalid spec is never acted on (no plan generated, no Temporal API calls). - Add ReasonInvalidSpec constant to worker_types.go. - Add TestReconcile_InvalidSpec_EmitsEventAndSetsCondition to cover the new path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Returning ctrl.Result{Requeue: true, RequeueAfter: 5m} is unnecessary:
the controller watches TWD objects, so any spec correction the user
applies triggers an immediate reconciliation automatically. Remove the
timer and return ctrl.Result{}, nil instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CRD schema enforces name length, Progressive-requires-steps, pauseDuration >= 30s, and gate.inputFrom exclusivity at admission time. Keeping them in the webhook Go code was dead redundancy. validateRolloutStrategy now only checks the two constraints the CRD cannot enforce (gate.input is x-kubernetes-preserve-unknown-fields, rampPercentage ordering requires list traversal that exceeds the k8s 1.27 CEL cost budget): - rampPercentage strictly increasing across steps - gate.input and gate.inputFrom mutually exclusive Removed the corresponding test cases from the webhook unit tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests hit a real kube-apiserver via envtest (the existing webhook suite infrastructure) so they verify that the x-kubernetes-validations blocks in the generated CRD are syntactically correct and semantically enforced by the API server — independently of the webhook Go code. Covers all four CRD-level rules: - name length > 63 → rejected - Progressive strategy with no steps → rejected - Progressive step with pauseDuration < 30s → rejected - gate.inputFrom with both configMapKeyRef and secretKeyRef → rejected - valid TWD → accepted Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…traint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
carlydf
commented
Apr 28, 2026
| name: {{ .Release.Name }}-{{ .Release.Namespace }}-temporalconnection-editor-role | ||
| rules: | ||
| - apiGroups: | ||
| - temporal.io.temporal.io |
Collaborator
Author
There was a problem hiding this comment.
mistake previously in main meant no one was using this anyway, but just fix it for symmetry with the other roles. will remove in v1.8
WorkerDeployment never had a mutating webhook in main (only TWD had a validating one). The Default logic is moved to WorkerDeploymentSpec.ApplyDefaults, called directly by the controller as it already was before the webhook existed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+kubebuilder:default markers on ScaledownDelay and DeleteDelay mean Kubernetes applies these at admission time, so nil values never reach the controller. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
carlydf
commented
Apr 28, 2026
| client.Client | ||
| } | ||
|
|
||
| func (r *DeprecatedTCReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { |
Collaborator
Author
There was a problem hiding this comment.
other actually interesting piece
carlydf
commented
Apr 28, 2026
| switch { | ||
| case err == nil: | ||
| reason = "MigratedToConnection" | ||
| message = "Migration complete. Delete this TemporalConnection." |
Collaborator
Author
There was a problem hiding this comment.
as soon as a Connection exists with the same name, migration can be considered complete. No ownership transfers needed.
carlydf
commented
Apr 28, 2026
| // Migration: if a deprecated TemporalWorkerDeployment with the same name/namespace | ||
| // exists and has not yet been migrated, transfer ownership of its child Deployments | ||
| // and WorkerResourceTemplates to this WorkerDeployment. | ||
| if err := r.migrateFromDeprecatedTWD(ctx, l, &workerDeploy); err != nil { |
Collaborator
Author
There was a problem hiding this comment.
interesting to review
…temporal.io The deletion block was duplicated during merge resolution — the stale copy (\"TemporalWorkerDeployment is being deleted\") is identical to the correct one above it but with old names. Removed the stale copy. Also fixed groups=temporal.io.temporal.io → temporal.io in the mutating webhook marker, which was introduced by the same merge. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The old CRDs (TemporalWorkerDeployment, TemporalConnection) are not "fully functional" in v1.7: they cannot be created, are not reconciled, and will never become Ready. They exist only to allow migration of resources already on the cluster. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cation warnings and CEL validation
497d545 to
821945d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What was changed
Renames the two primary CRDs and a
WorkerResourceTemplatefield:TemporalWorkerDeploymentWorkerDeploymentTemporalConnectionConnectionWorkerResourceTemplate.spec.temporalWorkerDeploymentRefWorkerResourceTemplate.spec.workerDeploymentRefIn v1.7, the old CRD kinds are not actively managed: existing objects are not reconciled against Temporal server state, and new objects of these kinds cannot be created. The deprecated CRDs exist only to support zero-downtime migration of resources already on the cluster.
Key implementation details
DeprecatedTWDReconciler,DeprecatedTCReconciler) set aReady=Falsestatus condition on deprecated resources, progressing throughDeprecated→WorkerDeploymentExists(ownership transfer in progress) →MigratedToWorkerDeployment(safe to delete).WorkerDeploymentwith the same name as an existingTemporalWorkerDeployment, it patches the ownerRefs on all child Deployments and WorkerResourceTemplates to point to the new owner — no pods are restarted.make manifestsalongside the new CRDs. CEL ruleoldSelf != nullblocks new creates;deprecated: true+deprecationWarningsurfaces a warning on every API call.deprecated_temporalworkerdeployment_types.go,deprecated_temporalconnection_types.go) retain the full v1.6 typed spec with renamed sub-structs (e.g.DeprecatedWorkerOptions) to avoid collision with the new types, keepingjson:tags identical so the schema is unchanged.Why?
The
Temporalprefix was redundant given thetemporal.ioAPI group. We wanted to clean this up before GA.Checklist
Closes remove
Temporalprefix on CRD names #256Tested via existing integration test suite (dual-field tests cover both old and new CRD kinds)
TODO: test migration in real life
Docs: all existing docs updated to use new names;
docs/migration-crd-rename.mdadded with zero-downtime migration steps