Pi-hole on Kubernetes with MetalLB and a Ruckus SSID
🚧 UNDER CONSTRUCTION 🚧 Switch and Ruckus configuration pending. Screenshots to be added.
Pi-hole was already running on the cluster. It worked fine from inside the cluster and via NodePort on weird high ports. The problem was that no actual client device could use it as a DNS server, because DNS clients expect port 53 and NodePort gives you 30054.
What started as “just give Pi-hole a real IP” turned into a new VLAN, a new DHCP server, firewall policy changes, switch configuration, and a Ruckus SSID. The usual homelab scope creep.
Contents
- The Problem
- MetalLB LoadBalancer
- The IP Collision
- VLAN 201: PIHOLE-USERS
- The Firewall Rules
- DHCP Configuration
- Switch Configuration
- Ruckus SSID Configuration
- Ad Blocking in Action
- What Changed
The Problem
Pi-hole ran as a Deployment in the heezy namespace with a ClusterIP service and a NodePort service. The NodePort exposed DNS on ports 30054 (TCP) and 30055 (UDP). The web UI was on 30053.
This is useless for actual DNS. Every DNS client on every operating system expects to talk to port 53. You can’t configure a Ruckus AP (or most DHCP servers) to hand out a DNS server on a non-standard port. The NodePort was fine for the admin UI and for testing from inside the cluster, but it couldn’t serve as a real DNS resolver for client devices.
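To make the port problem concrete, here is a minimal sketch of a hand-rolled DNS query using only the Python standard library. The function names are illustrative and not part of any tool used in this setup. The point is the `port` argument: a custom client like this can aim at a NodePort, but stock OS resolvers and DHCP options have no equivalent knob.

```python
import socket
import struct


def build_dns_query(hostname: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for an A record."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, then three zero counts.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, then a terminating zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def query_udp(server: str, port: int, hostname: str, timeout: float = 2.0) -> bytes:
    """Send the query over UDP to server:port and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), (server, port))
        response, _ = sock.recvfrom(4096)
        return response
```

Calling `query_udp(node_ip, 30055, "example.com")` with a real node address would exercise the UDP NodePort. Nothing a phone or laptop ships with will do that on its own.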
The cluster already had MetalLB running with a single VIP (192.168.1.25) assigned to the SWAG LoadBalancer for reverse proxying. The solution was to expand the MetalLB pool and give Pi-hole its own VIP on port 53.
But then the next problem: I can’t just point the existing USERS VLAN DHCP at Pi-hole. That would force every device in the house through Pi-hole, and I’m not ready to do that yet. Some things break with ad blocking: smart TVs, certain apps, my wife’s patience. I needed a separate SSID that opts in to Pi-hole, which means a separate VLAN, which means a new FortiGate interface, a new DHCP server, switch trunk changes, and a Ruckus WLAN config. Scope creep.
MetalLB LoadBalancer
The existing MetalLB pool was a single IP: 192.168.1.25-192.168.1.25. I expanded it to 192.168.1.25-192.168.1.27 and created a new LoadBalancer service for Pi-hole requesting .27:
apiVersion: v1
kind: Service
metadata:
  name: pihole-lb
  namespace: heezy
  annotations:
    metallb.universe.tf/loadBalancerIPs: "192.168.1.27"
spec:
  type: LoadBalancer
  selector:
    app: pihole
  ports:
    - port: 53
      targetPort: 53
      protocol: TCP
      name: dns-tcp
    - port: 53
      targetPort: 53
      protocol: UDP
      name: dns-udp
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
The metallb.universe.tf/loadBalancerIPs annotation pins the VIP to a specific address instead of letting MetalLB auto-assign from the pool. Without this, MetalLB could hand out any address from the pool depending on creation order, and the SWAG VIP could end up on the wrong IP.
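A pinned address only helps if it actually falls inside the pool and no two services ask for the same one. As a rough sketch, a pre-deploy sanity check could look like this. The helper names and the `swag` service key are made up for illustration; only `pihole-lb` and the addresses come from the setup above.

```python
import ipaddress


def parse_pool(pool: str):
    """Parse a 'start-end' range string into a pair of IPv4Address objects."""
    start, end = (ipaddress.IPv4Address(part.strip()) for part in pool.split("-"))
    if start > end:
        raise ValueError(f"pool start {start} is after end {end}")
    return start, end


def in_pool(ip: str, pool: str) -> bool:
    """Return True if ip lies within the inclusive pool range."""
    start, end = parse_pool(pool)
    return start <= ipaddress.IPv4Address(ip) <= end


def check_pinned_vips(pinned: dict, pool: str) -> list:
    """Return a list of problems: VIPs outside the pool or pinned twice."""
    problems = []
    seen = {}
    for service, ip in pinned.items():
        if not in_pool(ip, pool):
            problems.append(f"{service}: {ip} is outside pool {pool}")
        if ip in seen:
            problems.append(f"{service}: {ip} is already pinned by {seen[ip]}")
        seen[ip] = service
    return problems
```

With the old single-address pool, `in_pool("192.168.1.27", "192.168.1.25-192.168.1.25")` is false, which is why the pool had to grow before the new service could get its address.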
I also removed the node affinity that was pinning Pi-hole to a specific node. With a MetalLB VIP, the pod can float to any node in the cluster and the VIP stays reachable. That’s the whole point. If a node goes down, the pod reschedules somewhere else and MetalLB re-announces the VIP from a healthy node if it has to. Pinning to a node would defeat the purpose.
The IP Collision
I originally assigned the Pi-hole VIP to 192.168.1.26. Immediately after deploying, I couldn’t reach my Cisco 3560 switch anymore. Turns out .26 was the switch’s static management IP. It had been sitting there for years with a DHCP reservation holding its MAC address (00:26:98 is a Cisco OUI, which should have been a clue). MetalLB started answering ARP requests for .26, and the switch lost.
The fix was to move the Pi-hole VIP to .27 and restore the .26 reservation with the switch’s real MAC. The DHCP reservation table now looks like:
| IP | MAC | Description |
|---|---|---|
| 192.168.1.25 | 00:00:00:00:00:25 | MetalLB VIP, SWAG LoadBalancer |
| 192.168.1.26 | 00:26:98:a6:59:40 | Cisco 3560 switch management |
| 192.168.1.27 | 00:00:00:00:00:27 | MetalLB VIP, Pi-hole LoadBalancer |
Lesson: check what’s actually using an IP before you hand it to MetalLB. The dummy MAC reservations for VIPs are there to prevent DHCP from handing the IP out, but they don’t help if something already has it statically configured.
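Part of that check can be automated against the reservation table. The sketch below assumes the convention visible in the table, where VIP placeholders use a `00:00:00:00:00:xx` dummy MAC and anything else is a real device. The function names are illustrative. It only catches addresses that appear in the table, so a statically configured device with no reservation would still need an ARP probe on the wire.

```python
# Reservation table copied from the section above.
RESERVATIONS = {
    "192.168.1.25": "00:00:00:00:00:25",
    "192.168.1.26": "00:26:98:a6:59:40",
    "192.168.1.27": "00:00:00:00:00:27",
}

# Assumed convention: dummy reservations for VIPs start with this prefix.
DUMMY_PREFIX = "00:00:00:00:00:"


def is_dummy_mac(mac: str) -> bool:
    """Return True for placeholder MACs used to hold an address for a VIP."""
    return mac.lower().startswith(DUMMY_PREFIX)


def vip_conflicts(candidates, reservations):
    """Return the candidate VIPs that are already reserved for a real device."""
    return [
        ip
        for ip in candidates
        if ip in reservations and not is_dummy_mac(reservations[ip])
    ]
```

Running `vip_conflicts(["192.168.1.26", "192.168.1.27"], RESERVATIONS)` flags only the `.26` address, the one held by the switch.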
VLAN 201: PIHOLE-USERS
Rather than forcing every device through Pi-hole, I created a dedicated VLAN for it. Devices that connect to the Pi-hole SSID get ad-blocked DNS. Everything else stays on the regular USERS VLAN with dnsmasq.
| VLAN | Subnet | Interface | DNS | Purpose |
|---|---|---|---|---|
| 200 | 192.168.2.0/24 | users-vlan-200 | dnsmasq (192.168.1.29) | Regular WiFi |
| 201 | 192.168.201.0/24 | pihole-vlan-201 | Pi-hole (192.168.1.27) | Ad-blocked WiFi |
The new VLAN 201 interface is on the same physical port as VLAN 200 (internal7 on the FortiGate), and it’s added to the existing USERS zone. This means all existing firewall policies that use srcintf = "USERS" automatically apply to VLAN 201 traffic too.
resource "fortios_system_interface" "pihole_users_vlan_201" {
  name                  = "pihole-vlan-201"
  vdom                  = "root"
  ip                    = "192.168.201.1 255.255.255.0"
  allowaccess           = "ping https ssh http"
  device_identification = "enable"
  role                  = "lan"
  interface             = "internal7"
  vlanid                = 201
}
Note the interface name is pihole-vlan-201 (15 characters). FortiGate interface names have a 15 character limit. The first attempt with pihole-users-vlan-201 (21 characters) failed at terraform plan with a validation error. One of those things you don’t think about until the provider yells at you.
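The limit is easy to check before Terraform ever runs. A tiny sketch, taking the 15-character figure from the paragraph above (the constant and function names are made up for illustration):

```python
# Limit as described in the text above; treat it as an assumption if your
# FortiOS version documents a different value.
FORTIGATE_IFNAME_MAX = 15


def interface_name_ok(name: str, limit: int = FORTIGATE_IFNAME_MAX) -> bool:
    """Return True if the interface name fits within the length limit."""
    return len(name) <= limit


for candidate in ("pihole-vlan-201", "pihole-users-vlan-201"):
    verdict = "ok" if interface_name_ok(candidate) else "too long"
    print(f"{candidate} ({len(candidate)} chars): {verdict}")
```

The shorter name lands exactly on the limit, so there is no room left for a longer VLAN ID or prefix.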
The Firewall Rules
Since VLAN 201 is in the USERS zone, the srcintf on existing policies already matches. But the srcaddr fields referenced heezy-users-192.168.2.0-24, a specific subnet object that doesn’t include 192.168.201.0/24.
The fix was to create an address group called all-users containing both subnets, and replace the individual subnet reference in every policy that used it:
resource "fortios_firewall_addrgrp" "all_users" {
  name = "all-users"

  member {
    name = "heezy-users-192.168.2.0-24"
  }

  member {
    name = "pihole-users-192.168.201.0-24"
  }
}
Policies updated to use all-users:
- 301: Users → FortiGate management UI
- 309: Users → dnsmasq DNS
- 314: Users → Plex (nebula nodes)
- 316: Users → Pi-hole DNS
- 317: Users → WAN (outbound internet, NAT enabled)
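The reason the zone match alone was not enough can be shown with a few lines of `ipaddress`. This is only a model of the source-address half of a policy match, not how FortiOS evaluates policies internally. The client address `192.168.201.50` is an arbitrary example from the new pool.

```python
import ipaddress

# The two subnet objects named in the address group above.
HEEZY_USERS = ipaddress.ip_network("192.168.2.0/24")
PIHOLE_USERS = ipaddress.ip_network("192.168.201.0/24")
ALL_USERS = [HEEZY_USERS, PIHOLE_USERS]


def srcaddr_matches(client_ip: str, networks) -> bool:
    """Return True if the client address falls inside any listed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


client = "192.168.201.50"  # example VLAN 201 client
print(srcaddr_matches(client, [HEEZY_USERS]))  # old srcaddr: single subnet
print(srcaddr_matches(client, ALL_USERS))      # new srcaddr: the group
```

A VLAN 201 client fails the old single-subnet check and passes the group check, which is the behavior the policy updates were after.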
The admin policies (311, 312, 313) still use macbook-m4-admin which is a specific host. That stays as-is since my laptop will be on whichever VLAN I connect to.
All terraform-managed policies now carry a comments = "managed-by: terraform" tag so I can tell them apart from the legacy manual rules in the FortiGate GUI. There are about 20 manual policies from the pre-IaC era that still need to be imported or replaced. That’s a separate project.
DHCP Configuration
Three DHCP changes:
- USERS VLAN 200: Shrunk the pool from .2-.254 to .2-.240. The .241-.254 range is now headroom for static assignments if needed.
- PIHOLE-USERS VLAN 201: New DHCP server, pool .2-.240, with dns_server1 = 192.168.1.27 (Pi-hole VIP) and dns_server2 = 1.1.1.1 as fallback.
- SHARED: Both MetalLB VIPs (.25 and .27) have dummy MAC reservations to keep them out of the DHCP pool. The switch at .26 has its real MAC reserved.
resource "fortios_systemdhcp_server" "pihole_users_vlan_201_dhcp" {
  fosid           = 7
  interface       = "pihole-vlan-201"
  status          = "enable"
  lease_time      = 86400
  default_gateway = "192.168.201.1"
  netmask         = "255.255.255.0"
  dns_service     = "specify"
  dns_server1     = "192.168.1.27"
  dns_server2     = "1.1.1.1"

  ip_range {
    id       = 1
    start_ip = "192.168.201.2"
    end_ip   = "192.168.201.240"
  }
}
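For a sense of what the pool change costs in addresses, here is the arithmetic as a short sketch. The helper name is illustrative; the ranges are the ones listed above.

```python
import ipaddress


def pool_size(start: str, end: str) -> int:
    """Count the addresses in an inclusive start-end DHCP range."""
    return int(ipaddress.IPv4Address(end)) - int(ipaddress.IPv4Address(start)) + 1


old_users = pool_size("192.168.2.2", "192.168.2.254")    # before the change
new_users = pool_size("192.168.2.2", "192.168.2.240")    # after the change
headroom = pool_size("192.168.2.241", "192.168.2.254")   # left for statics
pihole = pool_size("192.168.201.2", "192.168.201.240")   # new VLAN 201 pool

print(old_users, new_users, headroom, pihole)
```

The USERS pool gives up 14 addresses for static assignments, and the new VLAN 201 pool is the same size as the trimmed USERS pool.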
Switch Configuration
The Cisco 3560 is the one piece of gear that’s still manually configured. VLAN 201 needs to exist on the switch and be tagged on the trunk to the FortiGate and on the ports feeding the Ruckus APs.
The FortiGate uplink (gi2/0/48) is already a dot1q trunk with no VLAN filter, so VLAN 201 passes through automatically once it exists on the switch. No change needed there.
The AP ports (gi2/0/1 and gi2/0/2) were configured as access ports on VLAN 200:
interface GigabitEthernet2/0/1
 description access-point
 switchport access vlan 200
 switchport mode access
 spanning-tree portfast
They need to become trunks carrying both VLAN 200 (untagged, native) and VLAN 201 (tagged) so the Ruckus APs can serve both SSIDs:
conf t
vlan 201
 name PIHOLE-USERS
 exit
interface GigabitEthernet2/0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk native vlan 200
 spanning-tree portfast trunk
 exit
interface GigabitEthernet2/0/2
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk native vlan 200
 spanning-tree portfast trunk
 exit
end
write memory
The native vlan 200 is important. It keeps untagged traffic on VLAN 200 so existing clients on the regular SSID don’t break. The AP sends VLAN 201 traffic tagged for the Pi-hole SSID.
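As a mental model, the trunk port's handling of incoming frames boils down to one rule. The sketch below is a deliberate simplification that ignores allowed-VLAN lists and everything else a real switch does. The function name is made up.

```python
from typing import Optional

NATIVE_VLAN = 200  # matches "switchport trunk native vlan 200" above


def classify_frame(dot1q_tag: Optional[int], native_vlan: int = NATIVE_VLAN) -> int:
    """Return the VLAN a trunk port assigns to an incoming frame.

    Untagged frames (tag is None) land on the native VLAN; tagged frames
    keep the VLAN ID carried in their 802.1Q tag.
    """
    return native_vlan if dot1q_tag is None else dot1q_tag


print(classify_frame(None))  # regular SSID traffic arrives untagged
print(classify_frame(201))   # Pi-hole SSID traffic arrives tagged
```

That is why the existing SSID keeps working without touching the AP: its traffic was untagged before the change and still is afterwards.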
This change has to be done from a wired connection. If you’re on WiFi through one of these APs and you change the port from access to trunk, you’ll cut yourself off mid-command. The AP might recover once it sees the trunk, or it might not. Don’t risk it.
Ruckus SSID Configuration
The Ruckus controller web UI is at https://192.168.2.66/admin/login.jsp. The master AP bounced between .45 and .66 at some point. Both have DHCP reservations now so they won’t move again. Finding the master was a pain because Brave didn’t follow the redirect from the old IP like Safari did.
To create the Pi-hole SSID:
- Create a new WLAN
- Set the SSID name
- Set the VLAN to 201
- The DHCP server on the FortiGate handles DNS assignment; clients on VLAN 201 automatically get 192.168.1.27 as their DNS server
No client-side configuration needed. Connect to the SSID, get ad blocking. Connect to the regular SSID, no ad blocking.
Ad Blocking in Action
What Changed
| Repo | Changes |
|---|---|
| heezy-k8s | Added pihole LoadBalancer service, removed node affinity, expanded MetalLB pool to .25-.27, added MetalLB config to base, fixed cross-dir kustomization ref |
| terraform-heezy | Added pihole-vlan-201 interface and USERS zone membership, pihole-users address object, all-users address group, updated policies 301/309/314/316 to use group, added policy 317 USERS to WAN, VLAN 201 DHCP with pihole DNS, shrunk USERS pool to .240, fixed .26/.27 DHCP reservations, tagged all policies managed-by terraform |
| Cisco 3560 | VLAN 201 creation, AP ports gi2/0/1 and gi2/0/2 converted to trunk with native VLAN 200 |
| Ruckus | New SSID on VLAN 201 |