A simple kube-proxy addon for 1:1 NAT services in Kubernetes using an NFT backend.
This project ensures a one-to-one mapping between a service and a pod in Kubernetes.
At Cozystack, we strive to follow the standard Kubernetes network architecture by separating the pod network, service networks, and external load balancers. However, our platform also runs virtual machines that sometimes require an external IP address.
There are several ways to achieve this:
- Using a separate Kube-OVN subnet and exposing it via BGP with kube-ovn-speaker.
- Adding a secondary interface with Multus.
- Using native Kubernetes services with externalIPs and exposing them via MetalLB.
The last option is the simplest and most flexible, but it has a limitation: Kubernetes services do not forward all traffic, but only traffic on specific ports (see: Kubernetes Issue #23864). Additionally, kube-proxy does not perform SNAT, which causes outgoing traffic from the pod to use the default gateway of the host where it is running.
To address these issues, we have added an additional controller that performs 1:1 NAT for services selected by either the service.kubernetes.io/service-proxy-name: cozy-proxy label or the networking.cozystack.io/wholeIP annotation.
cozy-proxy is a simple Kubernetes controller that watches for services selected by either of the following:
- service.kubernetes.io/service-proxy-name: cozy-proxy label (recommended): the standard Kubernetes mechanism for delegating a service to a non-default proxy. kube-proxy skips services carrying this label, so cozy-proxy becomes the sole handler and no rules collide.
- networking.cozystack.io/wholeIP annotation: also selects the service for management. The annotation value additionally drives the ingress mode (see below).
When it finds such a service, it creates NFT rules that forward traffic from the service's external IP to the pod's IP and vice versa, performing source-IP preservation for egress traffic.
This controller can be used alongside either kube-proxy or Cilium in kube-proxy replacement mode.
- If your cluster runs plain kube-proxy (iptables or IPVS mode), for example a default RKE2/kubeadm install with Calico or Flannel, use the service.kubernetes.io/service-proxy-name: cozy-proxy label. Without it, kube-proxy installs its own LoadBalancer rules that conflict with cozy-proxy's NAT and break outbound SNAT.
- If your cluster runs Cilium in kube-proxy replacement mode (as in the reference Cozystack environment), either selector works.
You can safely set both on the same service.
The networking.cozystack.io/wholeIP annotation value selects the ingress mode:
| Value | Behavior |
|---|---|
| "true" | Whole-IP passthrough. All TCP/UDP traffic to the LoadBalancer IP is forwarded to the backend pod. |
| "false" | Per-port filtering. Only TCP/UDP traffic to ports listed in Service.spec.ports is forwarded; the rest is dropped. |
| absent | Defaults to passthrough (services selected by label only behave the same as wholeIP: "true"). |
In both managed modes, egress traffic from the backend pod is SNATed to the LoadBalancer IP for source-IP preservation.
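As an illustration of port-filter mode, here is a minimal sketch of a Service that sets both the label and the annotation. The service name, the selector, and the choice of ports 80 and 443 are assumptions made for this example only; they do not come from the project.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-filtered            # hypothetical name
  labels:
    # Delegates the service to cozy-proxy; kube-proxy skips it.
    service.kubernetes.io/service-proxy-name: cozy-proxy
  annotations:
    # "false" selects per-port filtering instead of whole-IP passthrough.
    networking.cozystack.io/wholeIP: "false"
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: false
  externalTrafficPolicy: Local
  selector:
    app: web                    # hypothetical pod label
  ports:
  - name: http
    port: 80
    protocol: TCP
  - name: https
    port: 443
    protocol: TCP
```

With a manifest like this, only TCP traffic to ports 80 and 443 on the LoadBalancer IP should reach the pod, while egress from the pod is still SNATed to that IP.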
The optional networking.cozystack.io/allowICMP: "true" annotation, only
meaningful in port-filter mode (wholeIP: "false"), accepts ICMP traffic
toward the backend pod IP that would otherwise be dropped by the per-port
filter. Without it, all ICMP to a port-filtered pod is dropped — which also
blocks ping, PMTU discovery (ICMP "fragmentation needed"), and ICMP
unreachable signalling. Recommended for any service where path-MTU mismatches
or observability matter.
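A sketch of the metadata for a port-filtered service that also accepts ICMP might look like the fragment below. It is a fragment rather than a complete manifest (the spec is omitted), and the service name is a placeholder.

```yaml
# Fragment of a Service manifest; spec omitted.
metadata:
  name: example-service         # placeholder name
  labels:
    service.kubernetes.io/service-proxy-name: cozy-proxy
  annotations:
    # Port-filter mode: only ports listed in spec.ports are forwarded.
    networking.cozystack.io/wholeIP: "false"
    # Accept ICMP toward the backend pod (ping, PMTU discovery, unreachables).
    networking.cozystack.io/allowICMP: "true"
```

Note that both annotation values are quoted strings, since Kubernetes annotation values must be strings rather than YAML booleans.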
The nftables ruleset placed in table ip cozy_proxy consists of:
- Chain egress_snat at priority raw (-300): rewrites packet source IP via the pod_svc map for outbound traffic from managed pods. Runs before conntrack so the recorded tuple has saddr=LB_IP.
- Chain ingress_dnat at priority mangle (-150): rewrites packet destination IP via the svc_pod map for inbound traffic to a LoadBalancer IP. Runs after conntrack so reply packets of egress flows are matched correctly.
- Chain port_filter at priority filter (0): for Services in port-filter mode (wholeIP: "false"), drops ingress packets whose (daddr, l4proto, dport) is not in allowed_ports. The chain accepts packets in conntrack states established or related first, so reply packets of egress flows bypass the filter even when their dport is the VM's ephemeral source port. ICMP is dropped by default; if the allowICMP: "true" annotation is set, the pod IP is added to icmp_allowed_pods and ICMP toward it is accepted before the drop rule.
Install the controller using the Helm chart:

    helm install cozy-proxy charts/cozy-proxy -n kube-system

Create a LoadBalancer service with the service.kubernetes.io/service-proxy-name: cozy-proxy label. This also tells kube-proxy to stay away from the service:

    apiVersion: v1
    kind: Service
    metadata:
      name: example-service
      labels:
        service.kubernetes.io/service-proxy-name: cozy-proxy
    spec:
      allocateLoadBalancerNodePorts: false
      externalTrafficPolicy: Local
      ports:
      - port: 65535 # any
      selector:
        app: nginx
      type: LoadBalancer
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: docker.io/library/nginx:alpine

Check that the service has an external IP:

    kubectl get svc

Example output:

    NAME              TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
    example-service   LoadBalancer   10.96.195.46   1.2.3.4       65535/TCP   84s

Now try to access the service using ICMP and TCP; both should work:

    ping 1.2.3.4
    curl 1.2.3.4

Check the external IP from inside the pod:

    kubectl exec -ti nginx -- curl icanhazip.com

The output should match the service's external IP:

    1.2.3.4

This controller was developed primarily for the Cozystack platform and has been tested in the following environment:
- OS: Talos Linux
- CNI: Kube-OVN with Cilium in chaining mode.
- Kube-proxy: Cilium in kube-proxy replacement mode.
- LoadBalancer: MetalLB in L2 mode with externalTrafficPolicy: Local.
If you have tested it in other environments, please let us know.
- @kvaps – for the implementation.
- @hexchain – for the Stateless NAT with NFTables snippet.
- @danwinship – for the idea regarding the annotation.