Skip to content

CRD rename: TemporalWorkerDeployment → WorkerDeployment, TemporalConnection → Connection#294

Open
carlydf wants to merge 81 commits intomainfrom
crd-rename-changes
Open

CRD rename: TemporalWorkerDeployment → WorkerDeployment, TemporalConnection → Connection#294
carlydf wants to merge 81 commits intomainfrom
crd-rename-changes

Conversation

@carlydf
Copy link
Copy Markdown
Collaborator

@carlydf carlydf commented Apr 23, 2026

What was changed

Renames the two primary CRDs and a WorkerResourceTemplate field:

Old New
TemporalWorkerDeployment WorkerDeployment
TemporalConnection Connection
WorkerResourceTemplate.spec.temporalWorkerDeploymentRef WorkerResourceTemplate.spec.workerDeploymentRef

In v1.7, the old CRD kinds are not actively managed: existing objects are not reconciled against Temporal server state, and new objects of these kinds cannot be created. The deprecated CRDs exist only to support zero-downtime migration of resources already on the cluster.

Key implementation details

  • Migration helper reconcilers (DeprecatedTWDReconciler, DeprecatedTCReconciler) set a Ready=False status condition on deprecated resources, progressing through DeprecatedWorkerDeploymentExists (ownership transfer in progress) → MigratedToWorkerDeployment (safe to delete).
  • Ownership transfer: when the v1.7 controller first sees a WorkerDeployment with the same name as an existing TemporalWorkerDeployment, it patches the ownerRefs on all child Deployments and WorkerResourceTemplates to point to the new owner — no pods are restarted.
  • Deprecated CRD manifests are generated by make manifests alongside the new CRDs. CEL rule oldSelf != null blocks new creates; deprecated: true + deprecationWarning surfaces a warning on every API call.
  • Deprecated Go types (deprecated_temporalworkerdeployment_types.go, deprecated_temporalconnection_types.go) retain the full v1.6 typed spec with renamed sub-structs (e.g. DeprecatedWorkerOptions) to avoid collision with the new types, keeping json: tags identical so the schema is unchanged.
  • RBAC changes are additive only: all existing permissions kept, new permissions added only for the two new CRDs.
  • Zero-downtime migration: create new resources before upgrading the controller so ownership transfer is immediate at controller startup with no management gap.

Why?

The Temporal prefix was redundant given the temporal.io API group. We wanted to clean this up before GA.

Checklist

  1. Closes remove Temporal prefix on CRD names #256

  2. Tested via existing integration test suite (dual-field tests cover both old and new CRD kinds)
    TODO: test migration in real life

  3. Docs: all existing docs updated to use new names; docs/migration-crd-rename.md added with zero-downtime migration steps

carlydf and others added 30 commits April 21, 2026 23:38
Adds documentation that explains how we think about releases of the
application and Helm Charts and when we will bump major, minor and patch
versions for both Application and Chart Releases.

Issue #264

Signed-off-by: Jay Pipes <jay.pipes@temporal.io>
…blocking unversioned workers

Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
…ion finalizer

- Add 5-minute deletionCleanupTimeout to prevent TWD stuck in Terminating
  state indefinitely if Temporal server is unavailable
- Return errors from version/deployment deletion to trigger requeue until
  versions actually clear (pollers disappear as pods terminate)
- Add update/patch verbs and finalizers RBAC marker for TemporalConnections
- Fix comment-spacing lint on new kubebuilder:rbac markers
Move TemporalWorkerDeployment spec validation from reconcile time to
apply time by embedding CEL validation rules in the CRD schema.
Previously validation only fired when the optional TWD webhook was
enabled; now the API server enforces it universally on every create/update.

Rules added to worker_types.go (regenerated in the CRD manifest):
- metadata.name length <= 63 (root object)
- Progressive strategy requires non-empty steps (RolloutStrategy)
- rampPercentage strictly increasing across steps (RolloutStrategy)
- gate: input and inputFrom are mutually exclusive (RolloutStrategy)
- gate.inputFrom: exactly one of configMapKeyRef/secretKeyRef (RolloutStrategy)
- pauseDuration >= 30s per step (RolloutStep)

The ValidateCreate() call and 5-minute requeue in the reconciler are
removed — the CRD schema now guarantees the spec is valid before the
reconciler sees it. The webhook Go code is kept for defence-in-depth
when webhook.enabled=true.

Closes #62

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues with the initial CEL rules surfaced in k8s 1.27 envtest:

1. gate.input is x-kubernetes-preserve-unknown-fields: true — CEL
   cannot reference it structurally, so has(self.gate.input) failed
   to compile. Removed the gate input/inputFrom mutual-exclusivity
   rule; the webhook Go code still enforces it when enabled.

2. Any list traversal on self.steps at the RolloutStrategy level
   (map, isSorted) hits the schema-level CEL cost budget in k8s 1.27
   because the cost estimator does not propagate maxItems from the
   steps array schema up to the parent object's rule. Removed the
   rampPercentage ordering rule from the CRD; the webhook Go code
   still enforces it when enabled.

Remaining CRD rules (all O(1)): name length, Progressive requires
steps, pauseDuration >= 30s per step, gate.inputFrom exclusivity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CRD CEL rules cannot express rampPercentage ordering or gate
input/inputFrom exclusivity (list traversal hits the schema cost
budget; gate.input is x-kubernetes-preserve-unknown-fields). These
checks live only in the webhook Go code, which is optional.

Without a fallback, users who deploy without the webhook and submit
an invalid spec (e.g. decreasing rampPercentage) would silently get
no rollout progress and no signal — a dangerous failure mode since
a decreasing ramp could cause auto-upgrade workflows to roll back.

Re-add ValidateCreate() in the reconciler as a fallback:
- On failure, emit a Warning event and set ConditionProgressing=False /
  ConditionReady=False with reason InvalidSpec.
- Return early with a 5-minute requeue so the invalid spec is never
  acted on (no plan generated, no Temporal API calls).
- Add ReasonInvalidSpec constant to worker_types.go.
- Add TestReconcile_InvalidSpec_EmitsEventAndSetsCondition to cover
  the new path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Returning ctrl.Result{Requeue: true, RequeueAfter: 5m} is unnecessary:
the controller watches TWD objects, so any spec correction the user
applies triggers an immediate reconciliation automatically. Remove the
timer and return ctrl.Result{}, nil instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CRD schema enforces name length, Progressive-requires-steps,
pauseDuration >= 30s, and gate.inputFrom exclusivity at admission
time. Keeping them in the webhook Go code was dead redundancy.

validateRolloutStrategy now only checks the two constraints the CRD
cannot enforce (gate.input is x-kubernetes-preserve-unknown-fields,
rampPercentage ordering requires list traversal that exceeds the k8s
1.27 CEL cost budget):
  - rampPercentage strictly increasing across steps
  - gate.input and gate.inputFrom mutually exclusive

Removed the corresponding test cases from the webhook unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests hit a real kube-apiserver via envtest (the existing webhook suite
infrastructure) so they verify that the x-kubernetes-validations blocks
in the generated CRD are syntactically correct and semantically enforced
by the API server — independently of the webhook Go code.

Covers all four CRD-level rules:
  - name length > 63 → rejected
  - Progressive strategy with no steps → rejected
  - Progressive step with pauseDuration < 30s → rejected
  - gate.inputFrom with both configMapKeyRef and secretKeyRef → rejected
  - valid TWD → accepted

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…traint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
name: {{ .Release.Name }}-{{ .Release.Namespace }}-temporalconnection-editor-role
rules:
- apiGroups:
- temporal.io.temporal.io
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mistake previously in main meant no one was using this anyway, but just fix it for symmetry with the other roles. will remove in v1.8

carlydf and others added 6 commits April 27, 2026 18:39
WorkerDeployment never had a mutating webhook in main (only TWD had a
validating one). The Default logic is moved to WorkerDeploymentSpec.ApplyDefaults,
called directly by the controller as it already was before the webhook existed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+kubebuilder:default markers on ScaledownDelay and DeleteDelay mean
Kubernetes applies these at admission time, so nil values never reach
the controller.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
client.Client
}

func (r *DeprecatedTCReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other actually interesting piece

switch {
case err == nil:
reason = "MigratedToConnection"
message = "Migration complete. Delete this TemporalConnection."
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as soon as a Connection exists with the same name, migration can be considered complete. No ownership transfers needed.

// Migration: if a deprecated TemporalWorkerDeployment with the same name/namespace
// exists and has not yet been migrated, transfer ownership of its child Deployments
// and WorkerResourceTemplates to this WorkerDeployment.
if err := r.migrateFromDeprecatedTWD(ctx, l, &workerDeploy); err != nil {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting to review

carlydf and others added 6 commits April 27, 2026 19:28
…temporal.io

The deletion block was duplicated during merge resolution — the stale copy
(\"TemporalWorkerDeployment is being deleted\") is identical to the correct
one above it but with old names. Removed the stale copy.

Also fixed groups=temporal.io.temporal.io → temporal.io in the mutating
webhook marker, which was introduced by the same merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@carlydf carlydf marked this pull request as ready for review April 28, 2026 03:01
@carlydf carlydf requested review from a team and jlegrone as code owners April 28, 2026 03:01
@carlydf carlydf changed the title [Blocked on all other GA PRs, will go last] CRD Rename CRD rename: TemporalWorkerDeployment → WorkerDeployment, TemporalConnection → Connection Apr 28, 2026
carlydf and others added 3 commits April 27, 2026 20:38
The old CRDs (TemporalWorkerDeployment, TemporalConnection) are not
"fully functional" in v1.7: they cannot be created, are not reconciled,
and will never become Ready. They exist only to allow migration of
resources already on the cluster.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@carlydf carlydf force-pushed the crd-rename-changes branch from 497d545 to 821945d Compare April 28, 2026 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

remove Temporal prefix on CRD names

3 participants