docs(networking): Cilium Gateway API — architecture, security, migration #509

Open
lexfrei wants to merge 5 commits into main from docs/gateway-api-cilium

@lexfrei lexfrei commented Apr 23, 2026

What this PR does

Adds a new networking/gateway-api.md page to the next/ docs trunk describing the Cilium-backed Gateway API feature that lands in cozystack/cozystack#2470 (and its dependency stack on #2464 / #2468).

The page is intentionally detailed because the feature introduces:

  • a new platform-level toggle (gateway.enabled) and a new tenant-level toggle (tenant.spec.gateway);
  • a migration away from ingress-nginx for every cozystack-native exposed service (dashboard, keycloak via HTTPRoute; kubeapiserver, vm-exportproxy, cdi-uploadproxy via TLSRoute passthrough; harbor and bucket attached to per-tenant Gateways);
  • a new per-tenant cert-manager Issuer that gives every tenant an isolated ACME account, so child tenants no longer share HTTP-01 state with the parent;
  • a four-layer runtime admission defence against cross-tenant hostname hijacking (cozystack-gateway-hostname-policy, cozystack-tenant-host-policy, cozystack-namespace-host-label-policy, cozystack-gateway-attached-namespaces-policy) plus the listener allowedRoutes namespace whitelist;
  • a render-time safety net against misconfiguring publishing.gateway.attachedNamespaces with tenant namespaces.
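
To make the per-tenant Issuer point concrete, here is a minimal sketch of what a namespaced cert-manager ACME Issuer could look like. It is not copied from the Cozystack templates: the resource names, namespace, e-mail address and the choice of the `gatewayHTTPRoute` HTTP-01 solver are assumptions for illustration. The isolation comes from each tenant namespace holding its own Issuer and its own ACME account key Secret.

```yaml
# Illustrative only: names, namespace and e-mail below are hypothetical.
apiVersion: cert-manager.io/v1
kind: Issuer                      # namespaced, unlike ClusterIssuer
metadata:
  name: letsencrypt-prod          # hypothetical Issuer name
  namespace: tenant-example       # hypothetical tenant namespace
spec:
  acme:
    # Let's Encrypt production directory; a staging directory can be used
    # instead while testing to stay clear of production rate limits.
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.org      # hypothetical contact address
    privateKeySecretRef:
      # Secret in the tenant namespace that stores this tenant's ACME
      # account key, so the account is not shared with the parent tenant.
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          # Assumption: HTTP-01 challenges are solved through the tenant's
          # own Gateway rather than through an Ingress.
          gatewayHTTPRoute:
            parentRefs:
              - name: tenant-gateway   # hypothetical Gateway name
                kind: Gateway
```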

Sections:

  • Overview — one-paragraph summary, opt-in defaults, coexistence with ingress-nginx.
  • Architecture — traffic-path mermaid, listener layout per tenant Gateway.
  • Enabling Gateway API — platform-level Package example and per-tenant Tenant example, with full attachedNamespaces list.
  • Per-service routing — tables for HTTPRoute (termination) and TLSRoute (passthrough), mapping service → namespace → route name → backend → listener.
  • Security — mermaid diagram and one paragraph per admission layer, explaining what each one enforces and what is explicitly left to trust boundaries (cluster-admin credentials, DNS control, shared LB IP pool).
  • Certificates — per-tenant Issuer, supported ACME servers, Let's Encrypt rate limits and mitigations.
  • Migration from ingress-nginx — step-by-step for new and existing clusters.
  • Known limitations — shared IP pool, TLSRoute v1alpha2, tenant.spec.host admin responsibility, upstream application gaps.
  • Troubleshooting — concrete kubectl commands for the four most likely "stuck" states.
  • See also — upstream Gateway API, Cilium docs, KEP-5707, Let's Encrypt rate-limits.

Target branch

next/ — the version-agnostic trunk. When cozystack/cozystack#2470 lands in a minor release, this page ships with that version's docs automatically.

Not included

The legacy v1/networking/gateway-api.md page on the abandoned docs/gateway-api branch (from the Envoy Gateway proposal in cozystack/cozystack#2213) is unrelated to this PR. That PR proposed a different architecture that has since been superseded. This PR ships fresh docs for the new Cilium-based design.

Release note

NONE

Summary by CodeRabbit

  • Documentation
    • New Gateway API guide describing the opt-in, platform-level and per-tenant gateway model with Cilium-based traffic flow.
    • Instructions for migrating from ingress-nginx to HTTPRoute/TLSRoute, TLS termination vs SNI passthrough, and listener behaviors.
    • Per-tenant cert-manager ACME setup (prod/staging), rate-limit mitigation strategies, security/isolation controls, and step-by-step troubleshooting.

…r-tenant ingress

Covers the architecture, the two-step opt-in (gateway.enabled at
platform level, tenant.spec.gateway per tenant), per-service routing
(HTTPRoute for termination, TLSRoute for passthrough), the four
independent ValidatingAdmissionPolicies that guard cross-tenant
hostname hijacking plus the listener allowedRoutes whitelist, the
per-tenant cert-manager Issuer that enables isolated ACME state for
child tenants, migration from ingress-nginx, rate-limit
considerations, and operational troubleshooting.

Weight 15 places the page between 'Architecture' (5) and 'HTTP
Cache' (20) in the networking section sidebar.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>

netlify Bot commented Apr 23, 2026

Deploy Preview for cozystack ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 9113a50 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cozystack/deploys/69ea80f7694c660009153071 |
| 😎 Deploy Preview | https://deploy-preview-509--cozystack.netlify.app |

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5d91740-2eb1-4946-aaeb-d3983856baa1

📥 Commits

Reviewing files that changed from the base of the PR and between 2a68b49 and 9113a50.

📒 Files selected for processing (1)
  • content/en/docs/next/networking/gateway-api.md
✅ Files skipped from review due to trivial changes (1)
  • content/en/docs/next/networking/gateway-api.md

📝 Walkthrough

Adds a new Cozystack Gateway API documentation page describing an opt-in Cilium-based per-tenant Gateway model, traffic flows (Envoy DaemonSet, HTTPRoute/TLSRoute), cert-manager ACME setup, admission policies and isolation, migration steps, and kubectl troubleshooting.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Gateway API Documentation: `content/en/docs/next/networking/gateway-api.md` | Adds a comprehensive guide for Cilium Gateway API on Cozystack: opt-in platform and per-tenant enablement, listener behavior, HTTPRoute vs TLSRoute routing, cert-manager ACME configuration and rate-limit mitigation, ValidatingAdmissionPolicies and namespace-level allowedRoutes, ingress-nginx migration steps, and kubectl troubleshooting for Gateway/cert/LB issues. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Kubernetes
    participant PlatformConfig as Platform Helm/Values
    participant CertManager
    participant CiliumEnvoy as Cilium Envoy DaemonSet
    participant LoadBalancerPool as Cilium LB IP Pool
    participant TenantResources as Tenant (Namespace)

    User->>PlatformConfig: enable gateway.enabled (opt-in)
    PlatformConfig->>Kubernetes: render templates (gateway templates, policies)
    User->>Kubernetes: create TenantResources with spec.gateway: true
    Kubernetes->>TenantResources: create Gateway, Issuer, Certificate, HTTPRoute/TLSRoute
    TenantResources->>CertManager: request ACME cert (prod/stage)
    CertManager->>TenantResources: Certificate ready
    Kubernetes->>LoadBalancerPool: allocate LB IP (CiliumLoadBalancerIPPool)
    CiliumEnvoy->>Kubernetes: attach Gateway listeners (base listeners on port 443)
    User->>CiliumEnvoy: client traffic (TLS termination or SNI passthrough)
    CiliumEnvoy->>TenantResources: route to services per HTTPRoute/TLSRoute

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 I hopped through YAML and charts so bright,

Tenants get gateways, each shining light,
Certs stitched with care, routes set just right,
Envoy hums softly through day and night,
Hooray for docs — I scribbled with delight! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly summarizes the main change: documentation for Cilium Gateway API covering architecture, security, and migration from ingress-nginx. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@lexfrei lexfrei marked this pull request as ready for review April 23, 2026 17:43
@lexfrei lexfrei requested review from kvaps and lllamnyp as code owners April 23, 2026 17:43
@lexfrei lexfrei self-assigned this Apr 23, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces comprehensive documentation for the Gateway API support in Cozystack, detailing its architecture, security model, and migration path from ingress-nginx. The review feedback identifies opportunities to improve technical accuracy and consistency, specifically by clarifying that namespace whitelisting applies to both HTTPRoute and TLSRoute resources and resolving a naming inconsistency for the Kubernetes API route.

- The exposed-service templates (dashboard, keycloak) stop rendering their `Ingress` and start rendering their `HTTPRoute`.
- TLS-passthrough services (cozystack-api, vm-exportproxy, cdi-uploadproxy) stop rendering their `Ingress` and start rendering a `TLSRoute` attached to a dedicated Passthrough listener.

The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.

medium

The documentation mentions that attachedNamespaces restricts HTTPRoute attachments. However, the architecture also utilizes TLSRoute for services like the Kubernetes API and KubeVirt proxies (as shown in the routing tables). It would be more accurate to state that this list applies to both HTTPRoute and TLSRoute (or Gateway API routes in general).

Suggested change
The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.
The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s and `TLSRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.

Contributor Author

Fixed in 1ea0093: the paragraph now says HTTPRoute or TLSRoute. The allowedRoutes whitelist on the listener is route-kind-agnostic, so in practice it restricts every route type that attaches to the Gateway — including the TLSRoutes used for the Kubernetes API, vm-exportproxy, and cdi-uploadproxy.


| Service | Namespace | `TLSRoute` name | Backend | Listener |
|---|---|---|---|---|
| Kubernetes API | `default` | `kubernetes-api` | `kubernetes:443` | `tls-api` |

medium

There is an inconsistency in naming the Kubernetes API route. It is referred to as cozystack-api in the Mermaid diagram (line 27) and the migration section (line 265), but as kubernetes-api in this table. Using a consistent name throughout the document would improve clarity.

Suggested change
| Kubernetes API | `default` | `kubernetes-api` | `kubernetes:443` | `tls-api` |
| Kubernetes API | `default` | `cozystack-api` | `kubernetes:443` | `tls-api` |

Contributor Author

Fixed in dcb805c by aligning the Mermaid diagram to the real resource name. The TLSRoute is literally named kubernetes-api (see packages/system/cozystack-api/templates/api-tlsroute.yaml), so the table at line 144 is the source of truth. cozystack-api in the diagram referred to the cozystack package that ships this route, which was misleading. The diagram now says kubernetes-api and the migration prose clarifies the relationship (cozystack-api (Kubernetes API)).
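
For readers following the thread, the table row discussed here corresponds to a resource roughly like the sketch below. Only the route name (`kubernetes-api`), its namespace (`default`), the backend (`kubernetes:443`), the listener (`tls-api`) and the `v1alpha2` API version come from this PR; the Gateway name, the Gateway namespace and the hostname are hypothetical placeholders, and the real template in the cozystack-api package may differ.

```yaml
# Sketch of the TLSRoute from the routing table; placeholders are marked.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: kubernetes-api            # real resource name per the routing table
  namespace: default
spec:
  parentRefs:
    - name: tenant-gateway        # hypothetical Gateway name
      namespace: tenant-root      # hypothetical publishing tenant namespace
      sectionName: tls-api        # dedicated Passthrough listener
  hostnames:
    - api.example.org             # hypothetical SNI hostname
  rules:
    - backendRefs:
        - name: kubernetes        # the kubernetes Service in default
          port: 443
```

Because the listener is in Passthrough mode, the route matches on SNI only and the TLS session terminates at the backend rather than at the Gateway.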


Every listener on a tenant Gateway pins `allowedRoutes.namespaces.from: Selector` to a `matchExpressions` whitelist against the built-in `kubernetes.io/metadata.name` label. That label is written by kube-apiserver on every namespace and cannot be spoofed.

The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` to the Gateway.

medium

Similar to the comment on line 101, this section should clarify that the whitelist applies to both HTTPRoute and TLSRoute, as both are used in the described architecture.

Suggested change
The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` to the Gateway.
The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` or `TLSRoute` to the Gateway.

Contributor Author

Fixed in e15d865: the Layer 1 description now explicitly says HTTPRoute or TLSRoute. Same root cause as the line 101 comment — the listener-level whitelist applies to every route kind attaching to that listener.
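
As an illustration of the Layer 1 whitelist described in this thread, a listener could pin `allowedRoutes` as in the sketch below. The `allowedRoutes.namespaces.from: Selector` field and the `kubernetes.io/metadata.name` label are standard Gateway API and Kubernetes; the Gateway name, namespaces, hostname, certificate Secret and GatewayClass name are assumptions made for the example and are not taken from the rendered Cozystack templates.

```yaml
# Sketch of a tenant Gateway listener with a namespace whitelist.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: tenant-gateway            # hypothetical
  namespace: tenant-example       # hypothetical publishing tenant namespace
spec:
  gatewayClassName: cilium        # assumed GatewayClass name
  listeners:
    - name: https                 # hypothetical listener name
      protocol: HTTPS
      port: 443
      hostname: "*.example.org"   # hypothetical
      tls:
        mode: Terminate
        certificateRefs:
          - name: tenant-gateway-tls   # hypothetical Secret
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchExpressions:
              # Label set by kube-apiserver on every namespace.
              - key: kubernetes.io/metadata.name
                operator: In
                values:
                  - tenant-example     # the publishing tenant itself
                  - cozy-dashboard     # hypothetical attachedNamespaces entry
```

The selector does not name a route kind, which is why the same whitelist covers `HTTPRoute` on HTTPS listeners and `TLSRoute` on Passthrough listeners.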

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@content/en/docs/next/networking/gateway-api.md`:
- Line 56: The in-page anchor "#tls-passthrough" in the sentence "Plus one extra
listener per TLS-passthrough service (see [TLS passthrough](`#tls-passthrough`)
below)" doesn't match the actual heading ID; locate the "TLS passthrough"
section heading in this document and either rename that heading (or add an
explicit HTML anchor/id) to produce the ID tls-passthrough, or update the link
fragment to the existing heading ID (for example whatever the generated slug
is); ensure the link target and the heading ID for the TLS passthrough section
are identical so the anchor works.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a003edc-54d7-4122-a90c-e40d9592e1c7

📥 Commits

Reviewing files that changed from the base of the PR and between 5415111 and 2a68b49.

📒 Files selected for processing (1)
  • content/en/docs/next/networking/gateway-api.md

Comment thread content/en/docs/next/networking/gateway-api.md Outdated
lexfrei added 4 commits April 23, 2026 23:27
…d TLSRoute

Address review feedback from gemini-code-assist on
content/en/docs/next/networking/gateway-api.md:101: the whitelist guards both
HTTPRoute attachments (dashboard, keycloak, harbor, bucket) and TLSRoute
attachments (Kubernetes API, vm-exportproxy, cdi-uploadproxy), not only
HTTPRoute.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…TLSRoute name kubernetes-api

Address review feedback from gemini-code-assist on
content/en/docs/next/networking/gateway-api.md:144: the routing table listed
the TLSRoute as kubernetes-api (the real resource name in the cozystack-api
package, pointing at the kubernetes Service in the default namespace), but
the Mermaid diagram labelled it cozystack-api. Update the diagram to match
the actual resource name and add a parenthetical clarification in the
migration section that the cozystack-api package ships the Kubernetes API
TLSRoute.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…te and TLSRoute

Address review feedback from gemini-code-assist on
content/en/docs/next/networking/gateway-api.md:185: the Security section's
Layer 1 description said the listener allowedRoutes whitelist blocks
HTTPRoute attachments, but listener.allowedRoutes in Gateway API applies to
every route kind attaching to that listener — HTTPRoute on the HTTPS
listeners and TLSRoute on the tls-* Passthrough listeners.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…section

Address review feedback from coderabbitai on
content/en/docs/next/networking/gateway-api.md:56: the link fragment
#tls-passthrough did not match the heading ID Hugo generates for
'TLSRoute (TLS passthrough)' (which slugifies to tlsroute-tls-passthrough),
so the jump target was broken and markdownlint-cli2 flagged MD051.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>