Skip to content

feat(decofile): cascade-delete via Revision ownerReference#15

Merged
nicacioliveira merged 2 commits into
mainfrom
feat/decofile-revision-ownerref
May 22, 2026
Merged

feat(decofile): cascade-delete via Revision ownerReference#15
nicacioliveira merged 2 commits into
mainfrom
feat/decofile-revision-ownerref

Conversation

@nicacioliveira
Copy link
Copy Markdown
Collaborator

@nicacioliveira nicacioliveira commented May 22, 2026

Summary

The DecofileReconciler now sets the matching Knative Revision as an ownerReference (controller=false) on the Decofile. When the Revision is later garbage-collected — either by Knative GC (maxNonActiveRevisions: 1) or by the cluster's knative-clean-revisions CronJob — Kubernetes garbage collection cascades:

Revision deleted
   ↓ K8s GC
Decofile deleted (its only owner is gone)
   ↓ K8s GC (existing ownerReference set by this operator)
ConfigMap deleted

No CronJob, no script, no heuristics. Just declarative Kubernetes GC.

Why

Today, every build produces a new Decofile + new Revision pair. When the Revision is later cleaned up (either by Knative or by knative-clean-revisions), the Decofile and its ConfigMap are left as orphans. The cluster currently carries ~2276 Decofiles, the majority of which are orphans of long-deleted Revisions.

The alternative considered was a periodic cleanup CronJob (https://github.com/decocms/infra_applications/pull/92 — kept open as fallback). It works but accumulates complexity (multi-pass with safety cap, TOCTOU guard, dry-run gates). OwnerReference is one mechanism instead of three, native to the platform.

Mechanism

Reconcile path

After fetching the Decofile, syncRevisionOwnerRefs lists Revisions in the same namespace whose app.deco/deploymentId label matches the Decofile's effective deploymentId (the explicit spec.deploymentId if set, otherwise metadata.name), and appends any not-yet-present Revision as an ownerReference. Refetch-before-update prevents optimistic concurrency conflicts with the later status update.

Failures are logged but non-fatal — if Revisions can't be listed for any reason (RBAC drift, API hiccup, transient timeout), the rest of the reconcile (ConfigMap creation, pod notification) continues without the ownerRef. The legacy cleanup fallback handles those rare cases.

Watch path

A new Watches(&servingv1.Revision{}, ...) with a CreateFunc-only predicate enqueues any Decofile in the same namespace whose effective deploymentId matches a newly-created Revision's label. This covers the common case where the admin/build pipeline creates the Decofile slightly before the KSvc, so the Revision doesn't exist yet at the time the Decofile is reconciled.

Update/Delete events on Revisions are intentionally NOT watched — they would only trigger no-op reconciles (existing ownerRef already in place; Delete is handled by K8s GC natively).

Multiple owners (rollback case)

A rollback that re-uses an existing deploymentId creates a new Revision with a different UID but the same label. Both Revisions become owners; the Decofile is only GC'd once both are gone. This is the correct semantics — the Decofile is needed as long as either Revision can scale up.

Tests

internal/controller/decofile_owner_test.go (new file, fake-client based — no envtest extension needed) covers:

  • AddsOwnerWhenRevisionExists — happy path
  • NoMatchingRevisionDoesNothing — early-life Decofile with no Revision yet
  • IsIdempotent — running 3× produces 1 ownerRef
  • SkipsRevisionBeingDeleted — Revision with DeletionTimestamp is ignored
  • MultipleRevisionsBecomeMultipleOwners — rollback / canary case
  • RespectsExplicitDeploymentIdspec.deploymentId wins over metadata.name
  • mapRevisionToDecofile — finds Decofile by defaulted ID, by explicit ID, ignores Revisions without the label

All 9 unit tests pass locally; the existing Ginkgo suite continues to pass (make test).

RBAC

The kubebuilder marker +kubebuilder:rbac:groups=serving.knative.dev,resources=revisions,verbs=get;list;watch was added. make generate manifests regenerated config/rbac/role.yaml and chart/templates/clusterrole-operator-manager-role.yaml (note: the revisions permission joins the existing services permission already granted to the same ClusterRole by the Service webhook).

Backfill of existing Decofiles

The new operator will retroactively patch ownerRefs to any Decofile whose Revision is still alive, on its next reconcile. For Decofiles whose Revision has already been GC'd (likely the majority of the cluster's ~2276), no Revision exists to point to — those remain orphans and need a one-off manual cleanup (separate PR / script in infra_applications).

Test plan

  • Merge + tag a new beta release
  • Bump dependency version in decocms/infra_applications/provisioning/deco-operator/main/Chart.yaml
  • ArgoCD syncs the new operator image on eks-serverless
  • Verify on a sample Decofile that ownerRef is added:
    kubectl -n sites-abracadabra get decofile mhsygflbgo -o jsonpath='{.metadata.ownerReferences}'
  • Trigger or wait for a Knative Revision deletion and confirm the Decofile + ConfigMap cascade
  • Run one-off cleanup script for legacy orphans (separate PR)

Risks

  • Race window (Decofile created before its Revision): brief; covered by the Watch on Revision Create which retriggers reconcile when the Revision appears.
  • Behavior change: Decofile + ConfigMap will now be deleted shortly after their Revision is (vs. lingering indefinitely today). This is the desired behavior — old Decofiles serve no purpose without a Revision — but it is a visible cadence change for anyone monitoring kubectl get decofile -A.
  • No BlockOwnerDeletion — Revisions are not blocked from deletion by their Decofile ownership; cascading deletes proceed normally.

🤖 Generated with Claude Code


Summary by cubic

Adds cascade deletion for Decofiles by setting the matching Knative Revision as an ownerReference, so when a Revision is GC’d the Decofile and its ConfigMap are removed too. Existing Decofiles get ownerRefs on their next reconcile; run a one-off cleanup for legacy orphans whose Revisions are already gone.

  • New Features

    • Add non-controller ownerReference from Decofile to matching Revision (matched by deploymentId); supports multiple Revisions and is idempotent.
    • Watch Revision create events to add ownerRefs when Revisions appear after Decofiles.
    • Update RBAC to get/list/watch Revisions; add unit tests for matching, idempotency, deletes, rollbacks, and explicit deploymentId.
  • Bug Fixes

    • Satisfy golangci-lint: preallocate slices and remove an unused test helper param; no behavior change.

Written for commit 2585560. Summary will update on new commits. Review in cubic

The DecofileReconciler now adds the Knative Revision matching a Decofile's
deploymentId as an ownerReference (controller=false) on the Decofile
itself. When the Revision is later garbage-collected — either by Knative
GC (maxNonActiveRevisions) or by the cluster's knative-clean-revisions
CronJob — Kubernetes garbage collection cascades through
Revision -> Decofile -> ConfigMap, eliminating orphan accumulation.

Mechanism:

* Reconcile path: after fetching the Decofile, syncRevisionOwnerRefs
  lists Revisions in the same namespace whose app.deco/deploymentId
  label matches the Decofile's deploymentId (or metadata.name when
  spec.deploymentId is empty), and appends any not-yet-present Revision
  as an ownerReference. Refetch-before-update avoids optimistic
  concurrency conflicts with the status update that happens later in
  the same Reconcile. Failures are logged but non-fatal — the operator
  continues without owner refs rather than blocking ConfigMap creation.

* Watch path: Revision Create events enqueue any Decofile in the same
  namespace whose effective deploymentId matches the new Revision's
  label, so newly-created Revisions trigger ownerRef sync even when
  the corresponding Decofile already existed (the common case where
  the admin/build pipeline creates the Decofile slightly before the
  KSvc).

Edge cases covered by the implementation and the unit tests:

* No matching Revision yet: nothing is added; reconcile completes.
* Revision in DeletionTimestamp: skipped (don't link to dying owners).
* Multiple Revisions with same deploymentId (rollback): both become
  owners. Kubernetes GC waits for ALL owners to be deleted before
  reclaiming the Decofile.
* Re-running Reconcile is idempotent — existing UIDs are detected and
  not duplicated.
* Explicit spec.deploymentId is respected; decoys matching only the
  Decofile name are ignored.

Defense-in-depth: the controller-cluster CronJob (PR
decocms/infra_applications#92, deferred) remains useful as a fallback
for legacy Decofiles created before this change, and for the rare
case where the Revision is deleted before the operator manages to
patch the ownerRef. Those orphans are still detectable by the script
in infra_applications.

Backfill of the existing fleet (~2276 Decofiles cluster-wide): the
new operator will retroactively patch ownerRefs to any Decofile that
still has a matching Revision, the next time each one is reconciled.
Decofiles whose Revisions are already gone (likely the majority of
the backlog) won't gain an owner — those require a one-off cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 4 files

Re-trigger cubic

hugo-ccabral
hugo-ccabral previously approved these changes May 22, 2026
- prealloc: pre-size toAdd slice to len(revs.Items)
- unparam: drop unused namespace param from test helpers (always "sites-foo")
@nicacioliveira nicacioliveira merged commit b38c2eb into main May 22, 2026
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants