Documentation and Automation: Repo as Blueprint
I treat the homelab repo as the single source of truth. Everything that can be described or configured in files lives there, with placeholders for secrets. Here’s how that works and what I run to keep it honest.
Why it’s worth it
- Recreate from scratch. If a host or VM dies, I have the intended state in the repo: topology, host roles, nginx config, keepalived config, Docker Compose files, scripts, systemd units. I substitute real values from a local file and deploy. No “how did I set that up?” archaeology.
- See drift. If I change something by hand on a server and forget to update the repo, a comparison step will show the diff. So “reality vs repo” is something I can check.
- Safe to share. The repo has no IPs, domains, or keys in committed files. I can push to a remote or show snippets without leaking anything. Colleagues can clone and fill their own values.
Placeholders and values
Every doc and config that needs a hostname, IP, domain, or path uses a placeholder like <HOST_01_IP>, <DOMAIN>, <NGINX_VIP>, <SUBNET>. The names match keys in a values file. I keep one file, values.yaml.local, with the real values (gitignored). No other file in the repo contains those values.
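A minimal sketch of what the substitution step could look like, assuming placeholders are always an upper-case key in angle brackets and the values have already been loaded into a dict. The function name, the regex, and the example addresses are illustrative assumptions, not the repo's actual generator.

```python
import re

# Assumed placeholder shape: <HOST_01_IP>, i.e. angle brackets around an upper-case key.
PLACEHOLDER = re.compile(r"<([A-Z][A-Z0-9_]*)>")


def substitute(template: str, values: dict[str, str]) -> str:
    """Replace every <KEY> with values[KEY]; fail loudly on unknown keys."""
    missing = sorted({key for key in PLACEHOLDER.findall(template) if key not in values})
    if missing:
        raise KeyError(f"no value for placeholder(s): {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Hypothetical values standing in for what values.yaml.local would hold.
values = {"NGINX_VIP": "192.0.2.10", "HOST_01_IP": "192.0.2.11"}
template = "upstream app { server <HOST_01_IP>:8080; }  # VIP <NGINX_VIP>"
print(substitute(template, values))
```

Raising on an unknown key, rather than leaving the placeholder in place, means a typo in a template shows up at generate time instead of on a server.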
When I need to:
- Generate docs for local reading (with real values filled in), I run make generate; it reads the values and writes substituted docs into an output dir (also gitignored).
- Generate configs for deployment, I run make generate-configs; it writes substituted nginx, keepalived, Docker Compose, scripts, systemd units, etc. into dist/. I copy from dist/ to the servers. No deploy script in the repo, just “copy these files to these paths,” documented per component.
- Validate before commit, I run make test. It loads the values file and scans every committed doc and config for any of those values. If one appears (e.g. I pasted an IP by mistake), the script fails. So I don’t commit secrets.
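The no-secrets check behind make test can be sketched roughly as follows. This is an assumption about the shape of such a scan, not the actual script: find_leaks, its skip parameter, and the plain substring match are all hypothetical choices.

```python
from pathlib import Path


def find_leaks(root: Path, values: dict[str, str], skip: set[str]) -> list[tuple[str, int, str]]:
    """Return (file, line number, key) for every real value found under root."""
    leaks = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories, plus any top-level entry named in `skip`
        # (for example .git, dist, or the values file itself).
        if not path.is_file() or relative.parts[0] in skip:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            for key, value in values.items():
                # Plain substring match; empty values are ignored.
                if value and value in line:
                    leaks.append((str(relative), number, key))
    return leaks
```

A wrapper could call find_leaks(Path("."), values, skip={".git", "dist", "values.yaml.local"}) and exit non-zero when the list is not empty. Substring matching is deliberately crude: very short values (a two-digit port, say) would produce false positives, so a real check might need a minimum length or word-boundary matching.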
Audit: does live match the repo?
I run an audit script that SSHs into the VMs (and optionally the router, APs, TrueNAS, Windows) and:
- Checks that services are running (nginx, keepalived, Docker containers, Pi-hole, Plex, etc.).
- Optionally compares live files to repo-generated files: e.g. nginx.conf and reverse-proxy.conf on both nginx nodes vs the configs in the repo (after substitution). Same for keepalived, scripts, systemd units, SSH config. Any difference is reported as “config drift.”
- Optionally checks that device state (router DHCP/NAT, AP config, TrueNAS datasets/shares/apps, Windows hosts file and VM list) matches expected files in the repo.
So “make generate-configs, then compare live to dist” tells me if I’ve edited something on the server and not updated the repo (or the other way around). I run a “thorough” mode periodically; it takes a few minutes and spits out a report. Drift isn’t automatically fixed—I either change the server to match the repo or update the repo to match the server, then re-run.
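The comparison at the heart of the drift check can be illustrated with Python's standard difflib. This sketch assumes the live file's text has already been fetched (for instance by reading it over SSH) and only shows the diff step; config_drift and the dist/ and live/ labels are hypothetical names.

```python
import difflib


def config_drift(expected: str, live: str, name: str) -> str:
    """Return a unified diff of generated vs live content; empty string means no drift."""
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        live.splitlines(keepends=True),
        fromfile=f"dist/{name}",
        tofile=f"live/{name}",
    )
    return "".join(diff)


# Made-up file contents: the live copy was edited by hand.
expected = "worker_processes auto;\nkeepalive_timeout 65;\n"
live = "worker_processes auto;\nkeepalive_timeout 30;\n"
report = config_drift(expected, live, "nginx.conf")
print(report or "no drift")
```

Returning the diff text rather than a boolean keeps the report useful: it shows which side changed, which is what I need to decide whether to fix the server or update the repo.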
What lives in the repo
- docs/ — Network topology, firewall, DHCP, DNS, domains, host docs (hardware, roles, VMs, services per host), service docs (nginx, Pi-hole, Plex, Bitwarden, Mealie, etc.), procedures (SSH setup, VM creation, adding a service, SSL). All with placeholders.
- configs/ — nginx, keepalived, fail2ban, router expected state (for comparison), CAPsMAN expected state, TrueNAS/Windows expected state, Docker Compose files per service, scripts (nginx-sync, sync-certs, certbot, DDNS), systemd units, SSH config template. Again, placeholders only.
- scripts/ — Validation (no-secrets check), values parsing test, and anything else that doesn’t need to be on a server.
- audit/ — Load-values, generate-docs, generate-configs, and the audit script that SSHs and compares. Reports go to audit/reports (gitignored).
No Ansible or Terraform; just “generate from template + copy + reload” and “audit to compare.” Simple enough that I can fix it when it breaks.
Workflow in practice
- Edit a doc or config in the repo (with placeholders).
- Run make test so I don’t commit a value from values.yaml.local.
- When I’m ready to deploy that config, run make generate-configs, then copy the relevant files from dist/ to the right hosts and reload (e.g. nginx, systemd).
- Periodically run the audit (quick or thorough) to see if anything’s out of sync or broken.
That’s it. The payoff is that the homelab is describable and checkable; the cost is keeping the repo up to date when I change things by hand (or changing by hand less and deploying from the repo more).