Pi-hole on Kubernetes with MetalLB and a Ruckus SSID

🚧 UNDER CONSTRUCTION 🚧 Switch and Ruckus configuration pending. Screenshots to be added.

Pi-hole was already running on the cluster. It worked fine from inside the cluster and via NodePort on weird high ports. The problem was that no actual client device could use it as a DNS server, because DNS clients expect port 53 and NodePort gives you 30054.

What started as “just give Pi-hole a real IP” turned into a new VLAN, a new DHCP server, firewall policy changes, switch configuration, and a Ruckus SSID. The usual homelab scope creep.

The Problem#

Pi-hole ran as a Deployment in the heezy namespace with a ClusterIP service and a NodePort service. The NodePort exposed DNS on ports 30054 (TCP) and 30055 (UDP). The web UI was on 30053.

This is useless for actual DNS. Every DNS client on every operating system expects to talk to port 53. You can’t configure a Ruckus AP (or most DHCP servers) to hand out a DNS server on a non-standard port, because the standard DHCP DNS option (option 6) carries only IP addresses, with no port field. The NodePort was fine for the admin UI and for testing from inside the cluster, but it couldn’t serve as a real DNS resolver for client devices.

The cluster already had MetalLB running with a single VIP (192.168.1.25) assigned to the SWAG LoadBalancer for reverse proxying. The solution was to expand the MetalLB pool and give Pi-hole its own VIP on port 53.

But then the next problem: I can’t just point the existing USERS VLAN DHCP at Pi-hole. That would force every device in the house through Pi-hole, and I’m not ready to do that yet. Some things break with ad blocking: smart TVs, certain apps, my wife’s patience. I needed a separate SSID that opts in to Pi-hole, which means a separate VLAN, which means a new FortiGate interface, a new DHCP server, switch trunk changes, and a Ruckus WLAN config. Scope creep.

MetalLB LoadBalancer#

The existing MetalLB pool was a single IP: 192.168.1.25-192.168.1.25. I expanded it to 192.168.1.25-192.168.1.27 and created a new LoadBalancer service for Pi-hole requesting .27:

apiVersion: v1
kind: Service
metadata:
  name: pihole-lb
  namespace: heezy
  annotations:
    metallb.universe.tf/loadBalancerIPs: "192.168.1.27"
spec:
  type: LoadBalancer
  selector:
    app: pihole
  ports:
  - port: 53
    targetPort: 53
    protocol: TCP
    name: dns-tcp
  - port: 53
    targetPort: 53
    protocol: UDP
    name: dns-udp
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http

The metallb.universe.tf/loadBalancerIPs annotation pins the VIP to a specific address instead of letting MetalLB auto-assign from the pool. Without this, MetalLB could hand out any address from the pool depending on creation order, and the SWAG VIP could end up on the wrong IP.

I also removed the node affinity that was pinning Pi-hole to a specific node. With a MetalLB VIP, the pod can float to any node in the cluster and the VIP still reaches it. That’s the whole point. If a node goes down, the pod reschedules somewhere else and MetalLB keeps announcing the VIP from a healthy node. Pinning to a node would defeat the purpose.
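A quick way to confirm the VIP really answers on port 53 is to send it a hand-built query. This is a minimal Python sketch, not something from the repo: `build_dns_query` and `resolver_answers` are illustrative names, `example.com` is a placeholder lookup, and the check only confirms that a reply with the matching ID comes back, not that the answer is correct.

```python
import random
import socket
import struct


def build_dns_query(hostname: str, query_id: int) -> bytes:
    """Build a minimal DNS query packet asking for an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(server: str, hostname: str = "example.com",
                     timeout: float = 2.0) -> bool:
    """Return True if `server` replies to a UDP DNS query on port 53."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_dns_query(hostname, query_id), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts and unreachable-host errors alike
            return False
    if len(reply) < 4:
        return False
    # A valid reply echoes our ID and has the QR (response) bit set
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == query_id and bool(flags & 0x8000)
```

Calling `resolver_answers("192.168.1.27")` from a machine that can reach the VIP should return True once MetalLB is announcing it. `dig @192.168.1.27 example.com` does the same job if dig is installed.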

The IP Collision#

I originally assigned the Pi-hole VIP to 192.168.1.26. Immediately after deploying, I couldn’t reach my Cisco 3560 switch anymore. Turns out .26 was the switch’s static management IP. It had been sitting there for years with a DHCP reservation holding its MAC address (00:26:98 is a Cisco OUI, which should have been a clue). MetalLB started answering ARP requests for .26, and the switch lost.

The fix was to move the Pi-hole VIP to .27 and restore the .26 reservation with the switch’s real MAC. The DHCP reservation table now looks like:

IP	MAC	Description
192.168.1.25	00:00:00:00:00:25	MetalLB VIP, SWAG LoadBalancer
192.168.1.26	00:26:98:a6:59:40	Cisco 3560 switch management
192.168.1.27	00:00:00:00:00:27	MetalLB VIP, Pi-hole LoadBalancer

Lesson: check what’s actually using an IP before you hand it to MetalLB. The dummy MAC reservations for VIPs are there to prevent DHCP from handing the IP out, but they don’t help if something already has it statically configured.
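That check can be as simple as comparing the proposed pool against a list of known static assignments. The sketch below is illustrative only: the `static_hosts` inventory is a hypothetical stand-in for whatever record of static IPs you keep, and `pool_addresses` and `find_collisions` are made-up helper names.

```python
import ipaddress


def pool_addresses(first: str, last: str) -> list:
    """Expand an inclusive first-last range into individual addresses."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return [ipaddress.IPv4Address(n) for n in range(start, end + 1)]


def find_collisions(pool: list, static_hosts: dict) -> dict:
    """Return the pool addresses that are already statically assigned."""
    pool_set = {str(addr) for addr in pool}
    return {ip: owner for ip, owner in static_hosts.items() if ip in pool_set}


# Hypothetical inventory of statically configured devices
static_hosts = {"192.168.1.26": "Cisco 3560 switch management"}

# The expanded MetalLB pool from this post
pool = pool_addresses("192.168.1.25", "192.168.1.27")
collisions = find_collisions(pool, static_hosts)
print(collisions)  # {'192.168.1.26': 'Cisco 3560 switch management'}
```

An inventory only catches what someone remembered to write down. Pinging or ARP-probing each candidate address before the change covers the devices nobody documented.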

VLAN 201: PIHOLE-USERS#

Rather than forcing every device through Pi-hole, I created a dedicated VLAN for it. Devices that connect to the Pi-hole SSID get ad-blocked DNS. Everything else stays on the regular USERS VLAN with dnsmasq.

VLAN	Subnet	Interface	DNS	Purpose
200	192.168.2.0/24	users-vlan-200	dnsmasq (192.168.1.29)	Regular WiFi
201	192.168.201.0/24	pihole-vlan-201	Pi-hole (192.168.1.27)	Ad-blocked WiFi

The new VLAN 201 interface is on the same physical port as VLAN 200 (internal7 on the FortiGate), and it’s added to the existing USERS zone. This means all existing firewall policies that use srcintf = "USERS" automatically apply to VLAN 201 traffic too.

resource "fortios_system_interface" "pihole_users_vlan_201" {
  name                  = "pihole-vlan-201"
  vdom                  = "root"
  ip                    = "192.168.201.1 255.255.255.0"
  allowaccess           = "ping https ssh http"
  device_identification = "enable"
  role                  = "lan"
  interface             = "internal7"
  vlanid                = 201
}

Note the interface name is pihole-vlan-201 (15 characters). FortiGate interface names have a 15-character limit. The first attempt with pihole-users-vlan-201 (21 characters) failed at terraform plan with a validation error. One of those things you don’t think about until the provider yells at you.

The Firewall Rules#

Since VLAN 201 is in the USERS zone, the srcintf on existing policies already matches. But the srcaddr fields referenced heezy-users-192.168.2.0-24, a specific subnet object that doesn’t include 192.168.201.0/24.

The fix was to create an address group called all-users containing both subnets, and replace the individual subnet reference in every policy that used it:

resource "fortios_firewall_addrgrp" "all_users" {
  name = "all-users"

  member {
    name = "heezy-users-192.168.2.0-24"
  }
  member {
    name = "pihole-users-192.168.201.0-24"
  }
}

Policies updated to use all-users:

301: Users → FortiGate management UI
309: Users → dnsmasq DNS
314: Users → Plex (nebula nodes)
316: Users → Pi-hole DNS
317: Users → WAN (outbound internet, NAT enabled)
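The srcaddr mismatch is easy to model. The sketch below is a simplified illustration, not how FortiOS evaluates policies internally: it treats srcaddr as a list of subnets and checks whether a client address falls inside any of them. The client address `192.168.201.50` and the function name `srcaddr_matches` are hypothetical.

```python
import ipaddress

# Subnets from the VLAN table; ALL_USERS mirrors the all-users address group
HEEZY_USERS = ipaddress.ip_network("192.168.2.0/24")
PIHOLE_USERS = ipaddress.ip_network("192.168.201.0/24")
ALL_USERS = [HEEZY_USERS, PIHOLE_USERS]


def srcaddr_matches(client_ip: str, srcaddr: list) -> bool:
    """True if client_ip falls inside any network listed in srcaddr."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in srcaddr)


vlan201_client = "192.168.201.50"  # hypothetical client on the new VLAN

old_match = srcaddr_matches(vlan201_client, [HEEZY_USERS])  # original srcaddr
new_match = srcaddr_matches(vlan201_client, ALL_USERS)      # address group
print(old_match, new_match)  # False True
```

The zone match gets VLAN 201 traffic as far as the policy, but the policy only applies once the source address is covered too.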

The admin policies (311, 312, 313) still use macbook-m4-admin which is a specific host. That stays as-is since my laptop will be on whichever VLAN I connect to.

All terraform-managed policies now carry a comments = "managed-by: terraform" tag so I can tell them apart from the legacy manual rules in the FortiGate GUI. There are about 20 manual policies from the pre-IaC era that still need to be imported or replaced. That’s a separate project.

DHCP Configuration#

Three DHCP changes:

USERS VLAN 200: Shrunk the pool from .2-.254 to .2-.240. The .241-.254 range is now headroom for static assignments if needed.
PIHOLE-USERS VLAN 201: New DHCP server, pool .2-.240, with dns_server1 = 192.168.1.27 (Pi-hole VIP) and dns_server2 = 1.1.1.1 as fallback.
SHARED: Both MetalLB VIPs (.25 and .27) have dummy MAC reservations to keep them out of the DHCP pool. The switch at .26 has its real MAC reserved.

resource "fortios_systemdhcp_server" "pihole_users_vlan_201_dhcp" {
  fosid           = 7
  interface       = "pihole-vlan-201"
  status          = "enable"
  lease_time      = 86400
  default_gateway = "192.168.201.1"
  netmask         = "255.255.255.0"
  dns_service     = "specify"
  dns_server1     = "192.168.1.27"
  dns_server2     = "1.1.1.1"

  ip_range {
    id       = 1
    start_ip = "192.168.201.2"
    end_ip   = "192.168.201.240"
  }
}
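For a sanity check on the pool arithmetic, here is a short Python sketch using the VLAN 201 values from the resource above. It counts the leases available and the static headroom left above the pool, and confirms the gateway sits outside the range. The variable names are illustrative.

```python
import ipaddress

subnet = ipaddress.ip_network("192.168.201.0/24")
gateway = ipaddress.ip_address("192.168.201.1")
pool_start = ipaddress.ip_address("192.168.201.2")
pool_end = ipaddress.ip_address("192.168.201.240")

# Inclusive range: .2 through .240
pool_size = int(pool_end) - int(pool_start) + 1

# Usable host addresses above the pool: .241 through .254 (.255 is broadcast)
headroom = int(subnet.broadcast_address) - 1 - int(pool_end)

# The pool must sit inside the subnet and must not include the gateway
assert pool_start in subnet and pool_end in subnet
assert not (pool_start <= gateway <= pool_end)

print(pool_size, headroom)  # 239 14
```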

Switch Configuration#

The Cisco 3560 is the one piece of gear that’s still manually configured. VLAN 201 needs to exist on the switch and be carried on both the trunk to the FortiGate and the ports that feed the Ruckus APs, which are currently access ports.

The FortiGate uplink (gi2/0/48) is already a dot1q trunk with no VLAN filter, so VLAN 201 passes through automatically once it exists on the switch. No change needed there.

The AP ports (gi2/0/1 and gi2/0/2) were configured as access ports on VLAN 200:

interface GigabitEthernet2/0/1
 description access-point
 switchport access vlan 200
 switchport mode access
 spanning-tree portfast

They need to become trunks carrying both VLAN 200 (untagged, native) and VLAN 201 (tagged) so the Ruckus APs can serve both SSIDs:

conf t
vlan 201
  name PIHOLE-USERS
exit
interface GigabitEthernet2/0/1
  switchport trunk encapsulation dot1q
  switchport mode trunk
  switchport trunk native vlan 200
  spanning-tree portfast trunk
exit
interface GigabitEthernet2/0/2
  switchport trunk encapsulation dot1q
  switchport mode trunk
  switchport trunk native vlan 200
  spanning-tree portfast trunk
exit
end
write memory

The native vlan 200 is important. It keeps untagged traffic on VLAN 200 so existing clients on the regular SSID don’t break. The AP sends VLAN 201 traffic tagged for the Pi-hole SSID.
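The tagging behavior can be pictured with a toy model of how the AP-facing trunk port classifies incoming frames. This is a deliberately simplified Python sketch, not switch code: `classify_frame` is a made-up name, and `EXISTING_VLANS` assumes only VLANs 200 and 201 exist on the switch, which is almost certainly fewer than the real VLAN database holds.

```python
from typing import Optional

NATIVE_VLAN = 200
EXISTING_VLANS = {200, 201}  # assumption: the only VLANs defined on the switch


def classify_frame(dot1q_tag: Optional[int]) -> Optional[int]:
    """Return the VLAN a frame from the AP lands in, or None if dropped."""
    if dot1q_tag is None:
        # Untagged frames on a trunk belong to the native VLAN
        return NATIVE_VLAN
    if dot1q_tag in EXISTING_VLANS:
        # Tagged frames stay in the VLAN named by their 802.1Q tag
        return dot1q_tag
    # Tagged for a VLAN the switch does not know about
    return None


print(classify_frame(None))  # 200, regular SSID
print(classify_frame(201))   # 201, Pi-hole SSID
print(classify_frame(999))   # None
```

It also matches the symptom in the timeline at the end of this post: while the AP was sending the new SSID’s traffic untagged, the phone landed on VLAN 200 and got a 192.168.2.x lease.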

This change has to be done from a wired connection. If you’re on WiFi through one of these APs and you change the port from access to trunk, you’ll cut yourself off mid-command. The AP might recover once it sees the trunk, or it might not. Don’t risk it.

Ruckus SSID Configuration#

The Ruckus controller web UI is at https://192.168.2.66/admin/login.jsp. The master AP bounced between .45 and .66 at some point. Both have DHCP reservations now so they won’t move again. Finding the master was a pain because Brave didn’t follow the redirect from the old IP like Safari did.

To create the Pi-hole SSID:

Create a new WLAN
Set the SSID name
Set the VLAN to 201
The DHCP server on the FortiGate handles DNS assignment; clients on VLAN 201 automatically get 192.168.1.27 as their DNS server

No client-side configuration needed. Connect to the SSID, get ad blocking. Connect to the regular SSID, no ad blocking.

Ad Blocking in Action#

What Changed#

Repo	Changes
heezy-k8s	Added pihole LoadBalancer service, removed node affinity, expanded MetalLB pool to .25-.27, added MetalLB config to base, fixed cross-dir kustomization ref
terraform-heezy	Added pihole-vlan-201 interface and USERS zone membership, pihole-users address object, all-users address group, updated policies 301/309/314/316 to use group, added policy 317 USERS to WAN, VLAN 201 DHCP with pihole DNS, shrunk USERS pool to .240, fixed .26/.27 DHCP reservations, tagged all policies managed-by terraform
Cisco 3560	VLAN 201 creation, AP ports gi2/0/1 and gi2/0/2 converted to trunk with native VLAN 200
Ruckus	New SSID on VLAN 201

Project Timeline#

timeline
    title Pi-hole MetalLB and VLAN 201 Deployment
    section K8s and Firewall
        0133 UTC : LoadBalancer service created
                 : MetalLB pool expanded
                 : Firewall address and policy added
                 : DHCP reservation created
        0135 UTC : Terraform apply succeeded
                 : DNS verified through VIP
    section IP Collision
        0136 UTC : Switch unreachable on .26
                 : MetalLB ARP conflict with Cisco 3560
    section VLAN 201 Buildout
        0151 UTC : VLAN 201 interface on FortiGate
                 : Added to USERS zone
                 : all-users address group created
                 : Five firewall policies updated
                 : VLAN 201 DHCP server added
        0152 UTC : Interface name exceeded 15 char limit
        0153 UTC : Renamed to pihole-vlan-201
        0155 UTC : Terraform apply succeeded
    section VIP Collision Fix
        0156 UTC : managed-by terraform tags added
        0203 UTC : VIP moved from .26 to .27
                 : Switch restored on .26
        0205 UTC : Terraform apply confirmed
                 : DNS verified on .27
    section Switch and Ruckus
        0230 UTC : Cisco 3560 VLAN 201 created
                 : AP ports converted to trunk
                 : Ruckus SSID created on VLAN 201
        0240 UTC : Phone getting 192.168.2.x not 201.x
                 : Ruckus ignoring VLAN tag
                 : AP using native VLAN 200 instead
    section SSID Troubleshooting
        0245 UTC : Ruckus VLAN tagging not working
                 : Deleted and recreated SSID
                 : Phone gets 192.168.201.x address
    section Blocklists and Testing
        0300 UTC : Added 7 community blocklists
                 : 1.5M unique domains blocked
                 : Reddit wildcard block added
                 : Tailscale MagicDNS was bypassing pihole
                 : Fixed Tailscale DNS override
                 : Ad blocking confirmed on laptop
                 : Reddit blocked on laptop