
Upgrading my Homelab Kubernetes Cluster from Ingress to Gateway API

Dark days are upon us, as Ingress NGINX, a most beloved ingress implementation, is soon to be retired due to a lack of maintainers. Unfortunately, a common story with many FOSS projects. Per their recommendation, I have decided to migrate my homelab clusters to the Gateway API — the successor to the Ingress mechanism.

There are quite a few implementations of the Gateway API to pick from. I also found this handy GitHub repo by howardjohn, which provides a performance-based comparison between some of the implementations. From that comparison, there are only two "full pass" choices, Istio and Kgateway. I didn't do much more research to pick between them. Both use Envoy as a reverse proxy, and some light Googling suggests Istio might be more performant, but more complex to set up (as it is much more than just a gateway), so I went with Kgateway.



Ingress NGINX

First, some details about my old setup. I won't go into too much detail about my homelab config (perhaps in another post), but part of my lab involves a couple of Kubernetes clusters. Some are single-node, one is multi-node. I typically split the stuff running on my clusters into workloads (e.g. a regular old app like Plex) and add-ons (e.g. operators like Ingress NGINX). I manage all the add-ons with ansible and workloads get deployed via a GitLab pipeline and kapp.

I used both of these methods to deploy Ingress NGINX. Let's first take a look at the ansible playbooks. For this I just installed Ingress NGINX with the Helm chart, using an ansible template to generate the values. Let's see how that template is composed. First we have a section that allows setting the ingress pod to use the host network:

controller:

{% if kubernetes.ingress_nginx.host_network.enabled %}
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true
{% endif %}

  service:
    <<: {}
{% if kubernetes.ingress_nginx.host_network.enabled %}
    enabled: false
{% endif %}

Next, if an IP address is supplied, we add some annotations to request that address from kube-vip or metallb, whichever is installed.

    annotations:
      <<: {}
{% if kubernetes.ingress_nginx.load_balancer_ip is defined %}
{% if kubernetes.kube_vip_cloud_controller.enabled %}
      kube-vip.io/loadbalancerIPs: {{ kubernetes.ingress_nginx.load_balancer_ip }}
{% endif %}
{% if kubernetes.metallb.enabled %}
      metallb.io/loadBalancerIPs: {{ kubernetes.ingress_nginx.load_balancer_ip }}
{% endif %}
{% endif %}

Lastly, if a NodePort is desired, it can be enabled, along with a dedicated port for HTTPS traffic.

{% if kubernetes.ingress_nginx.node_port.enabled %}
    type: NodePort
    externalTrafficPolicy: null
{% if kubernetes.ingress_nginx.node_port.https is defined %}
    nodePorts:
      https: {{ kubernetes.ingress_nginx.node_port.https }}
{% endif %}
{% else %}
    externalTrafficPolicy: "Local"
{% endif %}
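The tasks that consume this template just render the values to a file and feed them to the Helm module; roughly something like this (a sketch — the file paths, release name, and namespace here are assumptions):

- name: Template ingress-nginx helm values
  template:
    src: values.yaml.j2 # the template shown above; file name assumed
    dest: "{{ haondt.artifacts }}/ingress-nginx-values.yaml"

- name: Install ingress-nginx helm chart
  kubernetes.core.helm:
    name: ingress-nginx # release name assumed
    chart_ref: ingress-nginx
    chart_repo_url: https://kubernetes.github.io/ingress-nginx
    release_namespace: ingress-nginx
    create_namespace: true
    values_files:
      - "{{ haondt.artifacts }}/ingress-nginx-values.yaml"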

Next let's see what gets deployed on the workload side. I generate manifests from a config file (we'll call it a 'service file') and pass them to kapp to deploy to the cluster. The service file will have a section like

ingresses:
- port: http
  host: "vaultwarden.marble.local.haondt.dev"

Which gets translated into an ingress manifest:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vaultwarden-primary-vaultwarden-marble-local-haondt-dev
  namespace: deployments-vaultwarden
spec:
  ingressClassName: nginx
  rules:
  - host: vaultwarden.marble.local.haondt.dev
    http:
      paths:
      - backend:
          service:
            name: vaultwarden-primary
            port:
              number: 80
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - '*.marble.local.haondt.dev'
    secretName: vaultwarden-primary-vaultwarden-marble-local-haondt-dev-tls


I also use TLS on these services. I am using cert-manager for this, but one of the quirks with this setup is that if you just generate certs from ingresses, you end up with one cert per ingress. I have many apps all hosted under the same base domain, and I would prefer a single wildcard cert shared between them. To do this, I first generate a single cert from ansible. The template for this resource looks like this:

{% if kubernetes.cert_manager.certificates | length > 0 %}
{% for k,v in  kubernetes.cert_manager.certificates.items() %}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ k }}
  namespace: cert-manager
spec:
{% if 'dns_names' in v %}
  dnsNames:
{% for dns_name in v.dns_names %}
  - "{{ dns_name }}"
{% endfor %}
{% else %}
  dnsNames: []
{% endif %}
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-prod
  secretName: {{ k }}-tls
  usages:
    - digital signature
    - key encipherment
  secretTemplate:
    annotations:
      reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
{% endfor %}
{% endif %}
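For reference, the vars feeding this template look something like the following (the exact dns_names are illustrative; the certificate name lines up with the haondt-dev-default-tls secret that gets reflected below):

kubernetes:
  cert_manager:
    certificates:
      haondt-dev-default:
        dns_names:
          - "*.marble.local.haondt.dev"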

You can see the reflector annotation that allows the matching secret to be reflected into the workload namespaces. The mirror secret is referenced in the aforementioned ingress manifest. When I generate that ingress manifest I also generate the mirror, which is populated by kubernetes-reflector, since an ingress cannot reference a secret in a different namespace:

apiVersion: v1
data:
  tls.crt: ''
  tls.key: ''
kind: Secret
metadata:
  annotations:
    reflector.v1.k8s.emberstack.com/reflected-version: ''
    reflector.v1.k8s.emberstack.com/reflects: cert-manager/haondt-dev-default-tls
  name: vaultwarden-primary-vaultwarden-marble-local-haondt-dev-tls
  namespace: deployments-vaultwarden
type: kubernetes.io/tls


Installing the Gateway API

Now that we've established where we came from, we can talk about where we're going. I mostly followed the excellent Kgateway docs to get everything up and running, translating each step into an ansible role as I went. The Gateway API itself consists entirely of CRDs so installation is straightforward. I went ahead and created a standalone role for just the Gateway API, and set it as a dependency for the Kgateway role. This way I can reuse it if I switch to a different Gateway API implementation down the road.

- name: Install kubernetes gateway api crds
  k8s:
    state: present
    src: "https://github.com/kubernetes-sigs/gateway-api/releases/download/{{ kubernetes.kubernetes_gateway_api_crds.version }}/standard-install.yaml"
    server_side_apply:
      field_manager: ansible


Installing Kgateway

Then I wrote a role to install the Kgateway CRDs and Kgateway itself with Helm.

- name: Install kgateway crds via helm
  block:
    - name: Install kgateway-crds Helm chart
      kubernetes.core.helm:
        name: kgateway-crds
        chart_ref: oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
        release_namespace: kgateway-system
        create_namespace: true
        release_state: present

- name: Install kgateway via helm
  block:
    - name: Install kgateway Helm chart
      kubernetes.core.helm:
        name: kgateway
        chart_ref: oci://cr.kgateway.dev/kgateway-dev/charts/kgateway
        release_namespace: kgateway-system
        release_state: present
        create_namespace: true

Finally, I added a step to that role to create some custom resources.

- name: Template custom resources
  template:
    src: custom-resources.yaml.j2
    dest: "{{ haondt.artifacts }}/kgateway-custom-resources.yaml"

- name: Apply custom resources
  k8s:
    state: present
    src: "{{ haondt.artifacts }}/kgateway-custom-resources.yaml"

For my homelab I only need one Gateway per cluster, so my custom resources template includes a single Gateway and GatewayParameters. Starting with the Gateway, the first thing I noticed was that cert-manager can issue certificates directly to a gateway. To make this work you need cert-manager >= 1.15 and config.enableGatewayAPI: true when installing it with Helm. More info here.
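In the cert-manager Helm values, that roughly translates to something like this (the apiVersion/kind wrapper is what the cert-manager docs show for the config block):

# cert-manager helm values (>= 1.15)
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  enableGatewayAPI: true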

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: {{ kubernetes.kgateway.gateway_name }}
  namespace: {{ kubernetes.kgateway.namespace }}
{% if kubernetes.cert_manager.enabled %}
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
{% endif %}

Filling in the Gateway spec, we reference the GatewayParameters resource that we will add later.

spec:
  gatewayClassName: kgateway
  infrastructure:
    parametersRef:
      name: {{ kubernetes.kgateway.gateway_name }}
      group: gateway.kgateway.dev
      kind: GatewayParameters

Next we define the listeners. These are all the hostname + port combos the gateway should listen on. You can use wildcards here, which for my purposes lets me have a single gateway for an entire cluster. Using the same secret name for all the listeners also means cert-manager will only generate one cert for the whole gateway, so I no longer have to define a certificate separately.

I am also adding allowedRoutes.namespaces.from: All to my listeners. This means the HTTPRoutes (the Ingress equivalent) that I create for each workload can live in their own namespaces. Because the cert is in the Gateway's namespace, I no longer need the kubernetes-reflector workaround.

For the purposes of my clusters, everything gets routed through https, so I have protocol: HTTPS and port: 443 for all the listeners.

  listeners:
{% for name, l in kubernetes.kgateway.listeners.items() %}
    - protocol: HTTPS
      port: 443
      name: "{{ name }}"
      hostname: "{{ l.hostname }}"
      allowedRoutes:
        namespaces:
          from: All
      tls:
        mode: Terminate
        certificateRefs:
          - name: haondt-kgateway-default-tls # same secret name = single cert
{% endfor %}
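For reference, the vars driving that loop look something like this (the listener name and hostname are just illustrative, reusing the domain from earlier in this post):

kubernetes:
  kgateway:
    gateway_name: haondt-default
    namespace: kgateway-system
    listeners:
      marble-wildcard: # listener name is illustrative
        hostname: "*.marble.local.haondt.dev"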

And the last part of the Gateway template is the IP address. Support for this field is implementation-dependent. I don't think Kgateway will actually assign the IP given here, and I'm not relying on it for my cluster; I am just putting it here for visibility.

{% if kubernetes.kgateway.load_balancer_ip is defined %}
  addresses:
    - type: IPAddress
      value: '{{ kubernetes.kgateway.load_balancer_ip }}'
{% endif %}

Moving on to the GatewayParameters. This is one of the ways Kgateway lets you provide extra configuration for the Envoy proxy. The first thing I templated was the NodePort, which maps an external node port to an internal gateway port.

---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayParameters
metadata:
  name: {{ kubernetes.kgateway.gateway_name }}
  namespace: {{ kubernetes.kgateway.namespace }}
spec:
  kube: 
    service:
{% if kubernetes.kgateway.node_port.enabled %}
      type: NodePort
      externalTrafficPolicy: "Cluster"
{% if kubernetes.kgateway.node_port.ports | length > 0 %}
      ports:
{% for node_port, gateway_port in kubernetes.kgateway.node_port.ports.items() %}
        - nodePort: {{ node_port }}
          port: {{ gateway_port }}
{% endfor %}
{% endif %}
{% else %}
      externalTrafficPolicy: "Local"
{% endif %}
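The node_port.ports var is just a mapping of external node port to internal gateway port; for example (values are illustrative):

kubernetes:
  kgateway:
    node_port:
      enabled: true
      ports:
        30443: 443 # node port 30443 -> gateway listener on 443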

Next we populate extraAnnotations with the IP, to request it from kube-vip or metallb.

      extraAnnotations:
        <<: {}
{% if kubernetes.kgateway.load_balancer_ip is defined %}
{% if kubernetes.kube_vip_cloud_controller.enabled %}
        kube-vip.io/loadbalancerIPs: {{ kubernetes.kgateway.load_balancer_ip }}
{% endif %}
{% if kubernetes.metallb.enabled %}
        metallb.io/loadBalancerIPs: {{ kubernetes.kgateway.load_balancer_ip }}
{% endif %}
{% endif %}

And that's it for the GatewayParameters. There is no config for host networking; as far as I can tell, Kgateway doesn't support setting host networking on the gateway pod. It can be done with some external automation, but ultimately I don't mind setting a dedicated IP for workload ingress, as I was already doing this on multi-node clusters anyway.

There are more ways to configure the underlying Envoy proxy. I wanted to enable websocket upgrades globally, for all connections, so I also added an HTTPListenerPolicy:

---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: HTTPListenerPolicy
metadata:
  name: {{ kubernetes.kgateway.gateway_name }}-upgrades
  namespace: {{ kubernetes.kgateway.namespace }}
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: {{ kubernetes.kgateway.gateway_name }}
  upgradeConfig:
    enabledUpgrades:
      - websocket


Setting up the Workloads

For the workloads, all I had to do was convert my ingress generator into an HTTPRoute generator. In my service files I just added a new config section with the same values:

gateway.http_routes:
- port: http
  host: "vaultwarden.marble.local.haondt.dev"

This was translated into HTTPRoutes:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: primary-vaultwarden-marble-local-haondt-dev
  namespace: deployments-vaultwarden
spec:
  hostnames:
  - vaultwarden.marble.local.haondt.dev
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: haondt-default
    namespace: kgateway-system
  rules:
  - backendRefs:
    - kind: Service
      name: vaultwarden-primary
      port: 80

The route references the gateway it belongs to and the service it should point at. It's quite similar to an Ingress, except you no longer need to reference the cert. Accordingly, I also skipped the part that generated a secret reflector.


Cleaning up NGINX

So at this point, everything was available both through the ingress and through the gateway. There are lots of options for how to smoothly transition from one to the other. In my case, since I self-host my DNS server, I just changed the DNS rules for each of my clusters to point at the gateway's assigned IP address. For my multi-node cluster, I actually uninstalled Ingress NGINX before assigning the IP address to the gateway. This way I could keep the same IP address (and avoid having to wait for DNS cache invalidations) at the cost of a few minutes of downtime — not an issue in a low-stakes homelab.

From here I just had to tear down Ingress NGINX, which meant:

  • Deleting all the ingresses
  • Deleting the manually-created certificates
  • Deleting the secrets and secret reflectors associated with the certificates
  • Uninstalling the Ingress NGINX Helm chart
  • Deleting the namespace

Getting all this done was trivial. An ansible role to uninstall the Helm chart and delete the custom resources took care of the add-ons. For the workload stuff (ingresses and reflectors) I just removed the ingress section from the service files and kapp took care of the rest.
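That add-on teardown role boils down to a handful of tasks like these (a sketch; the release and namespace names are assumptions):

- name: Uninstall ingress-nginx helm chart
  kubernetes.core.helm:
    name: ingress-nginx # release name assumed
    release_namespace: ingress-nginx
    release_state: absent

- name: Delete the manually-created certificate
  k8s:
    state: absent
    api_version: cert-manager.io/v1
    kind: Certificate
    name: haondt-dev-default # certificate name assumed
    namespace: cert-manager

- name: Delete the ingress-nginx namespace
  k8s:
    state: absent
    api_version: v1
    kind: Namespace
    name: ingress-nginx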


Final Thoughts

Actually putting everything to use, the change has been... unnoticeable. Which is exactly what I want. I don't notice any latency or connection issues or anything. I am also happy to have been able to clean up some of the mess with the certificates and reflectors. I think the fact that I was able to get rid of the reflectors helps show how the Gateway API is a much better fit, architecturally, compared to the Ingress system. Granted, my proxy config for these clusters is pretty basic. The most custom part was the websocket support, which meant I didn't have to get too deep into the Envoy weeds. The whole migration was maybe a couple of weekends' worth of work, and the majority of that was just reading documentation. Since everything is done through scripts and ansible roles, once I figured it out for one workload on one cluster, it was trivial to apply to the others.

That being said, I don't think it will necessarily be smooth sailing for everyone. I have another machine in my lab that is using NGINX, and it uses the configuration features of NGINX much more heavily. It acts as the gateway for all WAN traffic and makes use of features like proxy auth, path-based IP whitelisting, and Cloudflare-based proxy header trust. Luckily I am planning on keeping that one on docker-compose for the foreseeable future, but if I were to migrate it, I'm sure it would be quite a headache converting all that to Envoy config. In fact, the Gateway API docs have a dedicated guide for Ingress NGINX users, and the Kubernetes team has built a tool, ingress2gateway, to help with migrating resources.

If you're thinking about making the move yourself, I do think it's worth the effort. Ultimately, more dev effort is going to be put towards the Gateway API, and I think it's better to reduce the risk of finding yourself stuck with an out-of-support tool. The API is mature (GA) and there are multiple fully conformant implementations, some with direct integrations with other K8s networking tools like Cilium or Istio. With Ingress NGINX going out of support in a few short months, I'd say now is a good time to switch. Not to mention you might end up like me, with a migration that reduced the complexity of your resource topology.

All the config described in this article is on my GitHub: you can check out the ansible playbooks here, the service files here, and the manifest generation script here.