If you’re a “systemd hater”, please actually give it a chance before you decry it as “complicated bad lol”. Shit’s complicated because life is complicated.
I’m a systemd hater because its interfaces are terrible. The file format is frustrating, the directory structures are frustrating, systemctl is frustrating, journalctl is frustrating, the way things are named is frustrating (yes, I know naming things is hard, but lots of other systems do better), etc etc etc. It kind of reminds me of Nix, where I actually agree with the core philosophy, but in every decision concerning usability they opt for the unfamiliar, surprising, complicated thing, such that no one can intuit anything and everyone has to commit every surprising detail to memory. 🥲
My workaround is to do the minimal systemd necessary: usually that’s a single systemd unit that starts a Docker container or a Compose stack, and then I do every interesting thing in containers. I’d need to double check, but my homelab k8s cluster is just Ubuntu with Tailscale installed, and then I use k3sup to install k3s. The only systemd unit there is a thing that automounts my external drives.
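To make that workaround concrete, a do-almost-nothing unit along these lines would do the trick (a sketch, not anyone’s actual config; the unit name and the /srv/homelab path are made up):

    # /etc/systemd/system/homelab-stack.service  (name and paths are placeholders)
    [Unit]
    Description=Homelab compose stack
    Wants=network-online.target
    After=docker.service network-online.target
    Requires=docker.service

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    WorkingDirectory=/srv/homelab
    ExecStart=/usr/bin/docker compose up -d
    ExecStop=/usr/bin/docker compose down

    [Install]
    WantedBy=multi-user.target

Enable it with systemctl enable --now homelab-stack.service, and everything interesting then lives inside the containers.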
Gonna put you on my list of people to email for contributions when I yak-shave my way down to writing My Own Systemd But Better someday. :-P
haha hey, I have plans for a “systemd but better” of my own too… Spoiler alert: I want to do a pure “mechanism not policy” version that does not have any on-disk file formats or anything like that, just a varlink socket for you to fully drive it. So a simple embedded-ish or immutable system (which is what I want to build) could just have a single rc script that just creates the services by writing static definitions to the socket and nothing more. And if someone wants to define a general-purpose sysadminable thing with config formats on top of it, that’d be an external project and not my problem xD
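For flavor, the rc script in that scheme might amount to a handful of calls against the socket. Everything below except varlinkctl itself (which ships with recent systemd) is invented, since this init doesn’t exist yet: the socket path, interface, and method names are all hypothetical.

    #!/bin/sh
    # Hypothetical: an rc script pushing static service definitions into a
    # varlink-driven pid 1. Socket path, interface and method names are made up.
    varlinkctl call /run/init.varlink io.example.Init.CreateService \
        '{"name":"sshd","exec":["/usr/sbin/sshd","-D"],"restart":"always"}'
    varlinkctl call /run/init.varlink io.example.Init.CreateService \
        '{"name":"getty1","exec":["/sbin/agetty","tty1","linux"],"restart":"always"}'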
That’s an interesting idea; I think you would still definitely want/need a standard-ish system atop it so that people from different Linux distros can have a prayer of figuring out what’s going on, instead of having the equivalent of 2000s-2010s network configuration files where every single one is gratuitously incompatible. HOWEVER, done right, this could be the layer beneath that, similar to how ninja-build is a “mechanism not policy” layer you can put beneath cmake, bazel, etc…
Mixing Ninja and Bazel? I don’t think I’ve ever seen that; I’m curious whether you got mixed up or have sources to look at?
Nah, I got mixed up, my bad.
I recently shifted my entire homelab to Nomad+NixOS, and it was super fun. I referenced Xe’s blogpost on setting up Tailscale etc. Things are tight! But at times I did think, ah, this little part k8s does better than Nomad, etc etc. But overall, for my 1-user setup I am happy with it.
On a sidenote: just wanted to thank Xe for the blogposts that are being put out, always a fun read!
Heh, I’ve been firmly in the “burnt out on sysadmin and would prefer to have exactly zero always-on unix boxes running” camp for a while now (ended up with one not-really-at-home box in the form of a free 256 MB fly.io instance that provides a bunch of things inside of my tailnet), but this made me think…
why isn’t there like a “bare metal micro-fly.io” agent for OpenBMC that would, on command (including from an orchestrator like nomad), reimage the box with the contents of an OCI image? With A/B partitions to avoid downtime while the image is being extracted.
What large internal storage are you anticipating that the BMC will have access to while the machine is running, such that it can do the A/B thing?
Hmmm. Cursed PCIe injection? xD Slightly-less-cursed version of that for NVMe drives that support SR-IOV? Two separate drives for A/B? I’m not actually sure what PCIe capabilities BMCs have heh.
Yeah lol this might actually be better as an ultra-thin hypervisor that wouldn’t virtualize anything except the boot drive…
I’m working on that, but not at the BMC level, just a hosted service that installs cloud init images to A/B drives. It’s been a fun project!
…cloud-init? That’s not an image format though, that’s a YAML config read by an OS on boot?
Anyway yes, it could work as an early-boot micro-OS that applies an OCI image to a partition and reroots/kexecs/etc., but to be able to reboot into a new image on-demand it would have to inject a daemon into the image (e.g. be pid 1; fly.io does force its own pid 1), and the problem with that is it loses the reliability guarantee that being fully out-of-band gives you. Well, combining this with a BMC just-for-rebooting-on-demand would be the most practical way of doing it.
UPD: ohh, it could be an ultra-thin hypervisor that doesn’t virtualize anything except the boot drive! it could then hog a NIC for itself as an OOB (from the perspective of the hypervised system) management interface.
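The “apply an OCI image to the inactive slot” step from this thread could look roughly like the following, assuming podman is available in whatever early-boot or hypervisor environment does the work; the image name, device, and bootloader step are placeholders:

    #!/bin/sh
    # Sketch of an A/B reimage: flatten an OCI image's rootfs onto the inactive
    # slot, then point the bootloader at it. All names here are placeholders.
    set -eu
    IMAGE=registry.example.com/homelab/base:latest
    INACTIVE=/dev/nvme0n1p3          # whichever slot is not currently running

    mkfs.ext4 -F "$INACTIVE"
    mount "$INACTIVE" /mnt/slot

    podman pull "$IMAGE"
    cid=$(podman create "$IMAGE")
    podman export "$cid" | tar -x -C /mnt/slot   # flatten the image to a rootfs
    podman rm "$cid"

    umount /mnt/slot
    # ...then flip the bootloader's default entry to the new slot and reboot/kexec.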
I had a kid and realized that all of my time to tinker with stuff like this completely vanished.
I am now COMPLETELY a slave to “pay for whatever works right now”
I’m still in mourning.
I have to plug NixOS as something worth learning here however (BEFORE you have a kid). There’s definitely a learning curve upfront but it does pay off.
Which is poignant, because my main NixOS tower has been down since November: I somehow messed up the ONE THING it doesn’t let you roll back from (accidental permissions changes to the ESP), and since I don’t know whether it will take 1 hour or 2 days to fix, I haven’t been able to get to it yet thanks to the toddler :/
I know exactly 1 guy who homelabs with kids and it’s only possible because it’s his job
I have kids and do some homelab stuff. Spinning up some containers is pretty easy. But I definitely don’t have time to futz around with k8s or endless tweaks or trying the latest stuff right away. I mainly use the relatively stable stuff and run it on a home server with debian, and for the most part it’s set and forget.
I have no plans to touch my shellbox or the NAS, those have complicated setups that I don’t want to mess with. … I’m also scared to touch the NAS because that has all my media on it and I don’t want to risk losing it. It has more space than the rest of the house combined.
I hear this! I made my local fileserver durable about a decade ago, and the only reason I’m not scared to modify it is because I’ve intentionally taken steps to maintain it. The main concept is to think of the disks as a fundamental unit of storage. For me, this is a RAID array. If it’s encrypted, then the disks and key together are the unit. As long as this unit is intact, it doesn’t matter what boots the machine, only whether the mounted root has mdadm or the like.
My fileserver currently boots off a USB drive. I have a script which bakes the boot drive, including SSH keys and networking configuration, and so I can survive the death of the boot drive. (Yes, it’s NixOS, but you can use anything you like. This server was originally Debian when it lived in my dorm room.) I’ve done the whole distributed-system N+Y analysis to figure out how to recover the homelab from the loss of various machines, including the loss of the fileserver and the typical image-generating laptop; you might not care about this, but it might bring down your level of anxiety regarding fileserver configuration changes.
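The “disks plus key as the unit” idea is what keeps recovery boring: from any freshly booted OS the storage comes back with a couple of commands. A sketch; the array device, mapping name, key file, and mount point are placeholders, and the cryptsetup line only applies if the array is encrypted:

    #!/bin/sh
    # Reassemble and mount the storage unit from whatever OS happens to be booted.
    mdadm --assemble --scan                                   # find and assemble the RAID array
    cryptsetup open /dev/md0 tank --key-file /root/tank.key   # only if encrypted
    mount /dev/mapper/tank /srv/tank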
Loved to see the Xenoblade Chronicles game reference in the author’s server names.
Fun fact: when I originally specced out these servers, it was supposed to be three nodes (logos, ontos, pneuma) that were administered by my tower, which was gonna have the hostname galea. My tower ended up becoming where I play games and was named shachi (Japanese for orca; the build has black and white themes to it like a killer whale). I ended up getting more in a tax refund and had enough to build four nodes. I ran a poll on Twitter and the majority of nerds picked kos-mos over t-elos and a few other options I don’t remember. The cluster itself is named alrest.
For a while I used Proxmox but I always ended up having trouble every time it was upgraded. So I simply moved over to Alpine with libvirt and qemu and I’ve never looked back. Upgrades are very simple: just change a simple line in my file and run apk upgrade --available. It works really well. I spend very little time on it and it just runs and automatically patches itself. I will admit that I really prefer Wholesale Internet for most of my remote stuff. They’re cheap and have excellent service. If you are not US-based you might not like them because they are in Kansas. I’ve been thinking about using Incus instead, but that’s a project for a different year. I’d rather be out biking on my recumbent trike.
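Assuming “my file” is the usual /etc/apk/repositories, that Alpine upgrade flow looks roughly like this (the release numbers are just examples):

    # /etc/apk/repositories pins the release, e.g.
    #   https://dl-cdn.alpinelinux.org/alpine/v3.19/main
    #   https://dl-cdn.alpinelinux.org/alpine/v3.19/community
    sed -i 's/v3\.19/v3.20/g' /etc/apk/repositories   # bump the pinned release
    apk update
    apk upgrade --available    # upgrade/reinstall all packages against the new release
    sync && reboot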
Oh, they look potentially neat– ahahaha wow I used to work in a datacenter literally a few blocks away from them. Kansas City is one of the big east-west network arteries in the US, if what I’ve been told is true, so it’s a good place for networking infrastructure.
I’m surprised I’ve never heard of WholeSaleInternet until now – will definitely keep in mind the next time I need a remote server, thank you!
I haven’t had a homelab since university, when our flat had an old Linux machine that got the outside internet and routed it to the other computers using a coax network cable. That was the machine on which I learned Linux network/server administration and Perl programming.
Haven’t had one since. Doesn’t exactly look like I’m missing out… or?
Oh wow, those servers are quite beefy! — I was expecting to head over to the blog post and read yet another article on a couple of raspberry pis but this was a pleasant surprise. Thanks!
I don’t really have a homelab as much as a beefy hand-built development Linux server, a Raspberry Pi SSH bastion, and UniFi consumer networking gear to tie it all together.
But, I recently helped out an organization with setting up XCP-ng on a spare server, and I was pretty impressed! It got me thinking maybe I should replace my Ubuntu 20.04 LTS bare metal dev server with an install of XCP-ng instead. For the uninitiated, this is a Linux distro based on CentOS which is meant to be a simple host for a Xen hypervisor, which you then admin using an HTTPS API, web control panel, and/or Terraform/Ansible. The control plane stuff, called “Xen Orchestra,” all runs inside a Xen VM (with elevated permissions, of course), too, so it can be easily upgraded and so that the bare metal Linux distro can be kept super minimal. https://en.wikipedia.org/wiki/XCP-ng
p.s. to head off any suggestions in this direction, I have no interest in a control plane for containers or Kubernetes. I just want a way to get to Linux via ssh and to have relatively direct and uncomplicated access to CPUs, memory, disks, networking, and GPU hardware.
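For the “just ssh in” style of admin, it’s worth noting that XCP-ng’s dom0 still ships the classic xe CLI, which talks to the same xapi that Xen Orchestra and the HTTPS API sit on top of. A quick taste; the template and VM names below are placeholders:

    # Over ssh on the XCP-ng host (dom0). Names are placeholders.
    xe vm-list                                  # what's running
    xe template-list params=name-label          # available install templates
    xe vm-install template="Debian Bookworm 12" new-name-label=devbox
    xe vm-start vm=devbox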
I thought Citrix put XCP out to pasture a long time ago.
The Wikipedia page has the blow by blow. As of 2018, the whole thing is fully a community project, after some flip-flopping by Citrix on community engagement.
I’m so sorry. At least I’m paid to deal with this sh*t.