This is a repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Terraform, Kubernetes, Flux, Renovate, and GitHub Actions.
My Kubernetes cluster is deployed with Talos. Workloads run on Intel NUC devices, with all storage provided over NFS and iSCSI shares mounted from a Synology RackStation appliance. I also keep offsite backups in Cloudflare R2 blob storage and take hourly snapshots using VolSync.
There is a template over at onedr0p/cluster-template if you want to try and follow along with some of the practices I use here.
- cert-manager: Creates SSL certificates for services in my cluster.
- cilium: Internal Kubernetes container networking interface.
- cloudflared: Enables Cloudflare secure access to certain ingresses.
- external-dns: Automatically syncs ingress DNS records to a DNS provider.
- external-secrets: Managed Kubernetes secrets using 1Password Connect.
- ingress-nginx: Kubernetes ingress controller using NGINX as a reverse proxy and load balancer.
- sops: Managed secrets for Kubernetes and Terraform which are committed to Git.
- spegel: Stateless cluster local OCI registry mirror.
- volsync: Backup and recovery of persistent volume claims.
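As a minimal sketch of what an hourly VolSync backup could look like, the manifest below defines a ReplicationSource for a PVC. The resource names are borrowed from the dependency diagram later in this README. The restic mover, the copy method, and the retention values are assumptions for illustration, not a copy of my actual configuration.

```yaml
# Illustrative sketch only: the mover (restic), copyMethod, and retention are assumptions.
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: radarr-r2
spec:
  sourcePVC: radarr                      # the PVC to back up
  trigger:
    schedule: "0 * * * *"                # cron expression: top of every hour
  restic:
    # Secret expected to hold the restic repository URL, password, and
    # S3-compatible credentials for the bucket.
    repository: radarr-volsync-r2-secret
    copyMethod: Snapshot                 # assumes the storage class supports CSI snapshots
    pruneIntervalDays: 7                 # assumed value
    retain:                              # assumed retention policy
      hourly: 24
      daily: 7
```

If the storage backend cannot take CSI snapshots, `copyMethod: Clone` or `copyMethod: Direct` are the alternatives VolSync offers.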
Flux watches the clusters in my kubernetes folder (see Directories below) and makes the changes to my clusters based on the state of my Git repository.
The way Flux works for me here is that it recursively searches the kubernetes/apps folder until it finds the top-most kustomization.yaml in each directory and then applies all the resources listed in it. That kustomization.yaml generally contains only a namespace resource and one or more Flux kustomizations (ks.yaml). Each of those Flux kustomizations in turn applies a HelmRelease or other resources related to the application.
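To make that layout concrete, here is a rough sketch of the two files for a single app. The directory names (`default`, `radarr`), the `flux-system` source name, and the interval are hypothetical placeholders chosen for illustration. Only the overall shape (a namespace plus one or more ks.yaml entries) comes from the description above.

```yaml
# File 1 (hypothetical path): kubernetes/apps/default/kustomization.yaml
# A plain Kustomize file listing the namespace and each app's Flux kustomization.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./namespace.yaml
  - ./radarr/ks.yaml
---
# File 2 (hypothetical path): kubernetes/apps/default/radarr/ks.yaml
# A Flux Kustomization pointing at the directory holding the HelmRelease
# and any other resources for the app.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: radarr
  namespace: flux-system
spec:
  targetNamespace: default
  path: ./kubernetes/apps/default/radarr/app   # assumed location of the HelmRelease
  prune: true                                  # remove resources deleted from Git
  interval: 30m                                # assumed reconcile interval
  sourceRef:
    kind: GitRepository
    name: flux-system                          # assumed GitRepository name
```

The two documents live in separate files; they are shown in one block only for brevity.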
Renovate watches my entire repository looking for dependency updates; when they are found, a PR is automatically created. When some (minor/patch) PRs are merged, Flux applies the changes to my cluster.
This Git repository contains the following directories under kubernetes.
📁 kubernetes
├── 📁 apps # applications
├── 📁 bootstrap # bootstrap procedures
├── 📁 flux # core flux configuration
└── 📁 templates # reusable components
This is a high-level look at how Flux deploys my applications with dependencies. Below there are 3 Flux kustomizations: postgres, postgres-cluster, and radarr. postgres is the first app that needs to be running and healthy before postgres-cluster, and once postgres-cluster is healthy radarr will be deployed.
graph TD;
id1>Kustomization: cluster] -->|Creates| id2>Kustomization: cluster-apps];
id2>Kustomization: cluster-apps] -->|Creates| id3>Kustomization: postgres];
id2>Kustomization: cluster-apps] -->|Creates| id5>Kustomization: postgres-cluster];
id2>Kustomization: cluster-apps] -->|Creates| id8>Kustomization: radarr];
id3>Kustomization: postgres] -->|Creates| id4[HelmRelease: postgres];
id5>Kustomization: postgres-cluster] -->|Depends on| id3>Kustomization: postgres];
id5>Kustomization: postgres-cluster] -->|Creates| id10[Postgres Cluster];
id8>Kustomization: radarr] -->|Creates| id9[HelmRelease: radarr];
id8>Kustomization: radarr] -->|Creates| id11[PersistentVolumeClaim: radarr];
id8>Kustomization: radarr] -->|Creates| id12[ExternalSecret: radarr-volsync-r2-secret];
id8>Kustomization: radarr] -->|Creates| id13>ReplicationSource: radarr-r2];
id8>Kustomization: radarr] -->|Creates| id14>ReplicationDestination: radarr-dst];
id11>PersistentVolumeClaim: radarr] -->|SourceRef| id13>ReplicationSource: radarr-r2];
id14>ReplicationDestination: radarr-dst] -->|Depends on| id12[ExternalSecret: radarr-volsync-r2-secret];
id8>Kustomization: radarr] -->|Depends on| id5>Kustomization: postgres-cluster];
id9>HelmRelease: radarr] -->|DependsOn| id11[PersistentVolumeClaim: radarr];
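The ordering between postgres, postgres-cluster, and radarr can be expressed with `spec.dependsOn` on each Flux Kustomization. The sketch below is illustrative: the paths, the interval, and the `flux-system` source name are hypothetical. Setting `wait: true` on a dependency is what makes "healthy" mean that its applied resources have become ready, rather than merely applied.

```yaml
# Illustrative sketch: paths, interval, and source name are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: postgres
  namespace: flux-system
spec:
  path: ./kubernetes/apps/database/postgres/app       # hypothetical path
  prune: true
  wait: true            # report ready only once applied resources are healthy
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: postgres-cluster
  namespace: flux-system
spec:
  dependsOn:
    - name: postgres    # not applied until postgres is ready
  path: ./kubernetes/apps/database/postgres/cluster   # hypothetical path
  prune: true
  wait: true
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: radarr
  namespace: flux-system
spec:
  dependsOn:
    - name: postgres-cluster   # not applied until the database cluster is ready
  path: ./kubernetes/apps/default/radarr/app          # hypothetical path
  prune: true
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: flux-system
```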
While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about three things: (1) dealing with chicken/egg scenarios, (2) services I critically need whether my cluster is online or not, and (3) the "hit by a bus" factor: what happens to critical apps (e.g. email, password manager, photos) that my family relies on when I am no longer around.
Alternative solutions to the first two of these problems would be to host a Kubernetes cluster in the cloud and deploy applications like HCVault, Vaultwarden, ntfy, and Gatus; however, maintaining another cluster and monitoring another group of workloads would be more work and would probably cost as much as or more than the services described below.
| Service | Use | Cost |
|---|---|---|
| 1Password | Secrets with External Secrets | ~$65/yr |
| Cloudflare | Domain and S3-compatible storage (R2) | ~$30/yr |
| GitHub | Hosting this repository and continuous integration/deployments | Free |
| Fastmail | Email hosting | ~$20/yr |
| Pushover | Kubernetes alerts and application notifications | $5 (one-time) |
| UptimeRobot | Monitoring internet connectivity and external-facing applications | ~$58/yr |
| | | Total: ~$20/mo |
In my cluster there are two ExternalDNS instances deployed. One is deployed with the ExternalDNS webhook provider for UniFi, which syncs DNS records to my UniFi router. The other ExternalDNS instance syncs DNS records to Cloudflare only when the ingresses and services have an ingress class name of external and contain the ingress annotation external-dns.alpha.kubernetes.io/target. All local clients on my network use my UniFi router as the upstream DNS server.
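For illustration, an Ingress that the Cloudflare-facing ExternalDNS instance would pick up might look like the sketch below. The app name, hostnames, port, and target value are hypothetical placeholders. Only the `external` ingress class name and the annotation key come from the description above.

```yaml
# Illustrative sketch: app name, hosts, port, and target value are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-server
  annotations:
    # Tells ExternalDNS which hostname the created record should point at.
    external-dns.alpha.kubernetes.io/target: external.example.com
spec:
  ingressClassName: external     # marks this ingress as publicly exposed
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-server
                port:
                  number: 8080
```

An ingress using a different class name, or lacking the annotation, would only be synced to the UniFi router by the other instance.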
| Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
|---|---|---|---|---|---|---|
| Intel NUC7i5BEH | 3 | 512GB NVMe | - | 32GB | Talos | Kubernetes Controllers |
| Intel NUC7i5BEH | 3 | 512GB NVMe | - | 32GB | Talos | Kubernetes Controllers |
| Intel NUC9i7BEH | 1 | 512GB NVMe | - | 64GB | Talos | Kubernetes Workers |
| Synology RS2423RP+ | 1 | 2TB SSD | 8x6TB HDD | 32GB | DSM 7.x | NFS + iSCSI |
| Synology DS1511+ | 1 | 2TB SSD | 8x4TB HDD | 16GB | DSM 6.x | NFS + Backup |
| UniFi UDMP | 1 | - | 1x8TB HDD | - | - | Router & NVR |
| USW Pro 48 PoE | 1 | - | - | - | - | 10Gb PoE Switch |
| USW Flex | 3 | - | - | - | - | Distributed PoE Switches |
| CyberPower PDU41001 | 1 | - | - | - | - | Server Remote PDU |
| APC SMT1500RM2U | 1 | - | - | - | - | UPS |
Thanks to all the people who donate their time to the Home Operations Discord community. A special thanks to onedr0p for the inspiration, templates, and support. Be sure to check out kubesearch.dev for ideas on how to deploy applications or get ideas on what you could deploy.