The Heezy: A Homelab That Got Out of Hand

What Even Is a Heezy#

If you grew up in the early 2000s, you probably remember when Snoop Dogg had everyone adding “-izzle” to everything. “For sheezy” was peak vocabulary for a 12-year-old who spent too much time on Counter-Strike and not enough time on homework. “Heezy” rhymes with “easy,” which is what I told myself this project would be. It was not easy. But the name stuck, and now my entire infrastructure is named after slang that peaked in 2003. No regrets.

I’m a millennial who grew up as a gamer, became an infrastructure guy, picked up some development along the way, and eventually decided that paying for cloud services to host my own stuff was offensive. So I built a homelab. Then I kept building it. Then I started managing it like production infrastructure because apparently I can’t help myself.

This is the architecture overview. If you want the gory details of any specific component, the other blog posts go deep on individual war stories.

The Network#

Everything starts with the network. Four VLANs, one FortiGate firewall doing all the routing, and a healthy amount of paranoia about zone isolation.

VLAN	Subnet	Zone	What Lives Here
native	10.x.x.0/24	SHARED	Servers, k8s nodes, NAS, monitoring
200	10.x.x.0/24	USERS	Desktops, laptops, phones, wireless
3	10.x.x.0/24	DMZ	Public-facing game servers
2000	10.x.x.0/24	PROD	Reserved for future use

The FortiGate sits at x.x.x.1 on every subnet and handles all inter-VLAN routing. No direct routes between VLANs exist. If traffic needs to cross zones, there’s a firewall policy for it or it gets dropped. Every policy is managed in Terraform. No clicking around in the GUI, no “just this one quick rule” in the CLI. Every rule is in git or it doesn’t exist.

The DMZ is the strictest zone. Game servers live there on dedicated VMs, one service per host. DMZ hosts never initiate connections into SHARED or USERS, with one narrow exception: a firewall policy that lets them push logs to Loki on TCP/3100. They don’t use internal DNS. Beyond that log shipping, they don’t talk to the monitoring stack. They face the internet and that’s it.
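To make the default-deny idea concrete, here is a minimal sketch of how cross-zone traffic gets evaluated. It is a toy model, not the FortiGate's actual logic: the `Policy` shape and `is_allowed` helper are hypothetical, the two rules are taken from the DMZ-to-Loki and runner-to-DMZ paths described elsewhere in this post, and SSH on port 22 is an assumption. Real policies match on address objects and services, not just zones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One explicit allow rule between two zones (hypothetical shape)."""
    src_zone: str
    dst_zone: str
    proto: str
    port: int


# Illustrative rules only; the real policies live in Terraform.
POLICIES = frozenset({
    Policy("DMZ", "SHARED", "tcp", 3100),  # DMZ Promtail -> Loki
    Policy("SHARED", "DMZ", "tcp", 22),    # runner -> DMZ (port 22 is an assumption)
})


def is_allowed(src_zone, dst_zone, proto, port, policies=POLICIES):
    """Return True only if an explicit policy matches; everything else is dropped."""
    if src_zone == dst_zone:
        # Same-VLAN traffic is switched at L2 and never reaches the firewall,
        # so this model has nothing to say about it.
        return True
    return Policy(src_zone, dst_zone, proto, port) in policies
```

With these rules, `is_allowed("DMZ", "USERS", "tcp", 443)` is `False` simply because nobody wrote a policy for it. That is the whole point of the design.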

Ideally I’d also be running private VLANs on the Cisco 3560 to isolate hosts within the same broadcast domain. PVLANs would let the DMZ game servers sit on the same subnet without being able to talk to each other, which is the right way to do it when you have multiple untrusted hosts on one VLAN. The switch supports it. I know the syntax. But setting up Ansible automation for a single 48-port switch that changes maybe twice a year is not a smart use of time, and I’m not adding more hand-managed switch config either. So for now, the FortiGate policies isolate the zones from each other at L3, DMZ hosts on the same subnet can still reach one another at L2, and the switch keeps its existing manual config. It’s on the list. It’ll stay on the list.

The Kubernetes Cluster#

Five nodes. Five identical Ubuntu 24.04 boxes named nebula-1 through nebula-5, running MicroK8s 1.32.9 with Calico CNI and VXLAN encapsulation.

nebula-1  10.x.x.x  HA standby
nebula-2  10.x.x.x  HA master
nebula-3  10.x.x.x  HA master
nebula-4  10.x.x.x  HA standby
nebula-5  10.x.x.x  HA master

This cluster runs a mix of self-hosted applications: a media server, music streaming, a personal website, DNS ad-blocking via Pi-hole, uptime monitoring, a Tailscale exit node, monitoring exporters, and a reverse proxy handling TLS termination for public-facing services. At any given time there are 25+ pods spread across the nodes.

The lab has been through a few eras before this. At one point I had a Cisco ASA sitting in the rack because I wanted to learn firewall syntax for a job I was chasing. I also, somewhat shamefully, stood up a Windows Server domain at one point so I could “learn Active Directory.” At the time it actually served me well and helped me land a role. In retrospect, I’m glad I didn’t pursue anything with Microsoft professionally or as a specialty. But the homelab has always been the place where I try things, and not everything I’ve tried aged well.

Storage is split between Longhorn (replicated block storage for app configs) and NFS (a FreeNAS box at 10.x.x.x serving up shared data). The split matters: Longhorn gives you replicated, node-failure-tolerant config storage. NFS gives you a single massive shared pool that every pod can read from.

Ingress to the cluster comes through two paths:

Cloudflare Tunnel via SWAG for public access (various services on yourdomain.tld)
NodePorts for LAN-only access (everything gets a NodePort in the 30000-32767 range)

MetalLB provides a single VIP at 10.x.x.x that the SWAG LoadBalancer service claims. Internal DNS overrides yourdomain.tld subdomains to point at this VIP so LAN clients skip the Cloudflare round-trip entirely.

What Runs on the Cluster#

Service	What It Does
Plex	Media server
Navidrome	Music streaming (Subsonic-compatible)
Aurral	Music discovery
SWAG	Reverse proxy, TLS termination, personal website
Pi-hole	DNS-level ad blocking
Uptime Kuma	Service uptime monitoring
Tailscale	VPN exit node for remote access
Promtail	Log shipping (DaemonSet on all nodes)
kube-state-metrics	Cluster state metrics for Prometheus
Exportarr	Application metrics for Prometheus

Plus a handful of other self-hosted services that handle various automation tasks. Everything communicates internally via Kubernetes DNS using FQDNs like plex.heezy.svc.cluster.local:32400. Short names don’t work reliably in all container images, because they depend on the search domains in the pod’s /etc/resolv.conf and not every image’s resolver applies them the same way. Always use the FQDN.
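The naming pattern is `<service>.<namespace>.svc.<cluster-domain>`. A small helper makes it hard to get wrong. This is a sketch: the `service_fqdn` function is hypothetical, and `cluster.local` is the Kubernetes default cluster domain, which I'm assuming is unchanged here.

```python
def service_fqdn(service, namespace, port=None, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name for a Kubernetes Service.

    Returns "host" or "host:port" depending on whether a port is given.
    """
    host = f"{service}.{namespace}.svc.{cluster_domain}"
    return host if port is None else f"{host}:{port}"


print(service_fqdn("plex", "heezy", 32400))
# plex.heezy.svc.cluster.local:32400
```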

Some services run behind VPN sidecars (gluetun) that route all their traffic through NordVPN. The gluetun container handles the VPN tunnel and firewall rules. You have to explicitly open ports with FIREWALL_INPUT_PORTS or the sidecar blocks all inbound traffic, including from other pods in the cluster. Learned that one the hard way.
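For illustration, here is a sketch of building that setting. `FIREWALL_INPUT_PORTS` takes a comma-separated list of ports; the helper name and the port numbers in the usage note are hypothetical, not what my manifests actually contain.

```python
def gluetun_firewall_env(ports):
    """Build the FIREWALL_INPUT_PORTS entry for a gluetun sidecar.

    Deduplicates and sorts the ports, and rejects anything outside 1-65535.
    """
    cleaned = sorted({int(p) for p in ports})
    for p in cleaned:
        if not 1 <= p <= 65535:
            raise ValueError(f"invalid port: {p}")
    return {"FIREWALL_INPUT_PORTS": ",".join(str(p) for p in cleaned)}
```

Calling `gluetun_firewall_env([8080, 9696, 8080])` yields `{"FIREWALL_INPUT_PORTS": "8080,9696"}`, which would go into the sidecar container's environment.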

The DMZ#

Game servers get their own dedicated VMs on VLAN 3. Each one is a Proxmox VM running Docker Compose with a single service.

Host	IP	What
dmz-minecraft	10.x.x.x	Minecraft Bedrock (UDP 19132/19133)
dmz-cs16	10.x.x.x	Counter-Strike 1.6 (UDP 27015, TCP 80)
dmz-minecraft-java	10.x.x.x	Minecraft Java (TCP 25565)

Yes, I still run a CS 1.6 server. Some things are sacred.

Each DMZ host gets its own FortiGate address object, VIP for inbound NAT, and firewall policies. The GitHub Actions runner on SHARED has SSH access for Ansible deployments, but that’s the only inbound path from the internal network.

IPs are DHCP-assigned on first boot, then locked down with DHCP reservations on the FortiGate so they don’t shuffle around on reboot.

Monitoring#

The LGTM stack (Loki, Grafana, Tempo, Mimir) plus Prometheus runs on a dedicated VM at 10.x.x.x via Docker Compose.

Service	Port	What
Grafana	3000	Dashboards and alerting
Prometheus	9090	Metrics collection
Loki	3100	Log aggregation
Tempo	3200	Distributed tracing
Mimir	9009	Long-term metrics storage

Promtail runs as a DaemonSet on all 5 k8s nodes, shipping pod logs to Loki. DMZ hosts also run standalone Promtail agents that push logs across the firewall (there’s a specific policy for DMZ-to-Loki traffic on TCP/3100).
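What crosses the firewall on TCP/3100 is a plain HTTP POST to Loki's push endpoint (`/loki/api/v1/push`). Promtail builds this for you, but a minimal sketch of the JSON body shows what the policy is actually letting through. The function name and the label values are hypothetical.

```python
import json
import time


def loki_push_payload(labels, lines, now_ns=None):
    """Build the JSON body for Loki's /loki/api/v1/push endpoint.

    Each entry is a [timestamp, line] pair, where the timestamp is a
    string of nanoseconds since the Unix epoch.
    """
    ts = str(time.time_ns() if now_ns is None else now_ns)
    return {
        "streams": [
            {
                "stream": dict(labels),                    # label set for the stream
                "values": [[ts, line] for line in lines],  # log entries
            }
        ]
    }


body = json.dumps(loki_push_payload({"job": "dmz", "host": "dmz-cs16"}, ["server started"]))
```

The resulting `body` would be POSTed with `Content-Type: application/json`.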

Prometheus scrapes metrics from:

kube-state-metrics (cluster state)
Exportarr instances (application-level metrics)
Node exporters on all hosts
SNMP exporter for the FortiGate

DNS#

Split-horizon DNS via dnsmasq at 10.x.x.x. Two domains:

heezy.local: Internal only. Every host, every k8s service, every piece of infrastructure gets a name. Auto-generated from Ansible inventory. K8s services round-robin across all 5 nodes via NodePort.

yourdomain.tld: Public domain on Cloudflare, but dnsmasq overrides SWAG-proxied subdomains to point at the MetalLB VIP (10.x.x.x) so LAN clients go direct instead of hairpinning through Cloudflare.

Everything else forwards to 1.1.1.1 and 8.8.8.8. DMZ hosts don’t use internal DNS at all. That’s by design.
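A rough sketch of what generating that dnsmasq config could look like. This is not the actual Ansible template: the `render_dnsmasq` function is hypothetical, the addresses in the usage note are documentation placeholders, and it leaves out the NodePort round-robin records. It uses three real dnsmasq directives: `host-record` for internal names, `address` for the public-domain overrides, and `server` for upstream forwarders.

```python
def render_dnsmasq(hosts, overrides, vip,
                   upstreams=("1.1.1.1", "8.8.8.8"),
                   internal_domain="heezy.local"):
    """Render a dnsmasq config fragment for split-horizon DNS.

    hosts:     mapping of short hostname -> IP (e.g. from an inventory)
    overrides: public subdomains that should resolve to the internal VIP
    vip:       the internal load balancer address
    """
    lines = []
    for name, ip in sorted(hosts.items()):
        lines.append(f"host-record={name}.{internal_domain},{ip}")
    for subdomain in sorted(overrides):
        lines.append(f"address=/{subdomain}/{vip}")
    for upstream in upstreams:
        lines.append(f"server={upstream}")
    return "\n".join(lines) + "\n"
```

For example, `render_dnsmasq({"nebula-1": "192.0.2.11"}, ["plex.yourdomain.tld"], "192.0.2.50")` produces one `host-record` line, one `address` override, and the two `server` lines. Note that `address=/name/ip` also matches subdomains of `name`.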

Infrastructure as Code#

Three repos, three tools, one workflow: edit, commit, push, let GitHub Actions handle it.

terraform-heezy: FortiGate firewall rules, Proxmox VMs, DHCP config. Organized by environment (shared, production, dmz). The FortiGate Terraform provider has some sharp edges (zone names not interface names, auto-assign policyids, NAT required for cross-zone traffic) but once you learn its quirks it works.

ansible-heezy: Server configuration. Roles for everything from baseline OS setup to the full monitoring stack. Playbooks run in Docker containers on a self-hosted GitHub Actions runner. Never run Ansible locally. Never SSH in and make manual changes. The runner at 10.x.x.x handles all of it.

heezy-k8s: Kubernetes manifests. Kustomize-based, one directory per app. Push to main triggers auto-deploy via GitHub Actions. The runner has kubectl access and applies manifests directly.

The self-hosted runner is the linchpin. It sits on the SHARED VLAN with SSH access to all hosts, kubectl access to the cluster, and firewall policies allowing it to reach the DMZ for Ansible deployments. It’s provisioned by Ansible (yes, the runner that runs Ansible is itself configured by Ansible, bootstrapped manually once).

The AWS Bootstrap: CDK First, Everything Else After#

Before any of the automation works, AWS needs to be set up. The Terraform state has to live somewhere. The GitHub Actions runner needs IAM credentials. OIDC federation needs to exist so GitHub-hosted runners can assume roles. All of this is bootstrapped with AWS CDK in a separate private repo.

CDK deploys the foundation:

OIDC identity provider for GitHub Actions (so workflows can assume roles without static keys)
GitHubActions-MultiRepo IAM role assumable by OIDC and by the self-hosted runner’s static keys
productionATerraformStateBackend role for Terraform state access (S3 + DynamoDB locking)
Static IAM keys for the self-hosted runner (zero permissions on their own, can only assume the above roles)
S3 bucket for Terraform state
CloudTrail for audit logging

This is the one piece that lives outside the Terraform/Ansible loop. CDK runs once to create the IAM plumbing, and then everything else bootstraps from there. The self-hosted runner gets its static keys from Secrets Manager, assumes the backend role, and that’s how Terraform and Ansible get their permissions.

Secrets#

AWS Secrets Manager stores everything sensitive. Secrets follow a strict path convention:

production/heezy/<service>/<secret-type>

Examples:

production/heezy/github_runner/aws_credentials (runner static keys)
production/heezy/ubuntu/cloud-init-credentials (VM provisioner creds)
production/heezy/grafana/discord-webhook (alerting webhook URL)
production/heezy/terraform/fortigate/secret (FortiGate API token)
all/heezy/github/runner/personal-access-token (GitHub PAT for workflow triggers)

The production/ prefix scopes secrets to the production environment. The all/ prefix is for secrets shared across environments. Every secret is JSON-formatted so individual fields can be extracted with jq at runtime.
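A minimal sketch of the convention in code. The helper names are hypothetical, and since some of the real paths have more than two segments after `heezy`, the builder accepts any number of parts. `extract_field` is roughly what `jq -r .<field>` does on a JSON secret.

```python
import json
import re

_SEGMENT = re.compile(r"^[A-Za-z0-9_-]+$")
_PREFIXES = ("production", "all")


def secret_path(prefix, *parts, project="heezy"):
    """Build a Secrets Manager name following the path convention."""
    if prefix not in _PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    if not parts:
        raise ValueError("at least one path segment is required")
    for part in parts:
        if not _SEGMENT.match(part):
            raise ValueError(f"invalid path segment: {part!r}")
    return "/".join((prefix, project, *parts))


def extract_field(secret_string, field):
    """Pull one field out of a JSON-formatted secret value."""
    return json.loads(secret_string)[field]


print(secret_path("production", "grafana", "discord-webhook"))
# production/heezy/grafana/discord-webhook
```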

The k8s cluster runs External Secrets Operator which syncs secrets from AWS into Kubernetes Secret objects. Ansible pulls secrets at runtime via the aws_secret lookup plugin on the runner. Terraform reads provider credentials (FortiGate, Proxmox) from Secrets Manager during plan/apply. No secrets in git, no secrets in environment variables on the runner itself.

The credential flow for a typical Ansible workflow:

GitHub-hosted runner assumes the OIDC role
Reads the runner’s static keys from production/heezy/github_runner/aws_credentials
Passes them to the self-hosted runner
Self-hosted runner assumes the backend role
Ansible container runs with assumed role credentials
Inside the playbook, aws_secret lookups fetch service-specific secrets (SSH keys, API tokens, webhook URLs)
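The two AWS calls at the heart of that flow can be sketched like this. It is an illustration under assumptions, not the real workflow code: the clients are passed in (in practice they would be boto3 `secretsmanager` and `sts` clients), the JSON field names inside the secret are hypothetical, and handing the result to the container as environment variables is my guess at a reasonable mechanism.

```python
import json


def fetch_runner_keys(secrets_client,
                      secret_id="production/heezy/github_runner/aws_credentials"):
    """Read the runner's static keys from Secrets Manager (second step)."""
    raw = secrets_client.get_secret_value(SecretId=secret_id)["SecretString"]
    data = json.loads(raw)
    # Field names are hypothetical; the post only says secrets are JSON.
    return data["access_key_id"], data["secret_access_key"]


def assume_backend_role(sts_client, role_arn, session_name="ansible-run"):
    """Trade the static identity for short-lived role credentials (fourth step)."""
    creds = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    # Returned in the shape the AWS SDKs read from the environment.
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
```

The static keys have no permissions beyond `sts:AssumeRole`, so leaking them alone gets an attacker nothing without a role to assume.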

What I’d Do Differently#

Honestly? Not much. The biggest lesson was that MicroK8s is great for getting started but has some rough edges at scale (the Calico BPF incident, kubelite bundling kube-proxy, HA failover quirks). If I were starting over I might go with k3s or vanilla kubeadm. But the cluster works, it’s been stable for months, and I’m not about to migrate 25+ services to prove a point.

The other thing: start with proper DNS from day one. I spent months typing IP:port combos before building the dnsmasq setup. Should have done it first.

The Name#

Look, I know “The Heezy” is a dumb name for a homelab. But every time I SSH into a box and see <service>.internal in my prompt, it makes me smile. And that’s the whole point of a homelab. It’s yours. Name it whatever you want.

For sheezy.