HACKER Q&A
📣 xrd

Best solution for homelab service monitoring?


I'm running a lot of different amazing software tools on a variety of hardware. I've got a bunch of Vagrant machines on my Linux boxen, I'm experimenting with ARM machines on my M2 Mac using Multipass, and much more.

And, occasionally, one of the disks on my machines fills up, and then my Vagrant machine suspends.

Or, I reboot a machine and the VM does not come up automatically, because I forgot that the service isn't running in Docker with a restart policy set.
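For the Docker side of that problem, here is a minimal sketch of a check that lists containers with no restart policy. It assumes the `docker` CLI is on the PATH and that the restart policy shows up as `no` (or empty on older versions) in `docker inspect`; the function names are made up for illustration.

```python
import subprocess


def containers_without_restart(inspect_output: str) -> list[str]:
    """Return names of containers whose restart policy is unset.

    Expects one "<name> <policy>" pair per line, as produced by
    list_restart_policies() below.
    """
    missing = []
    for line in inspect_output.strip().splitlines():
        name, _, policy = line.strip().partition(" ")
        # Assumption: "no" (or an empty string) means no restart policy.
        if policy in ("", "no"):
            missing.append(name.lstrip("/"))  # inspect prefixes names with "/"
    return missing


def list_restart_policies() -> str:
    """Ask the local Docker daemon for every container's restart policy."""
    ids = subprocess.run(
        ["docker", "ps", "-aq"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    if not ids:
        return ""
    fmt = "{{.Name}} {{.HostConfig.RestartPolicy.Name}}"
    return subprocess.run(
        ["docker", "inspect", "--format", fmt, *ids],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `containers_without_restart(list_restart_policies())` on each Docker host gives a list of containers that will stay down after a reboot.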

What are people using to make sure homelab services are running correctly?

My goal would be something that:

  * Has a simple dashboard
  * Should it be an agent on the VM, or an external process that checks from afar?
  * Good backup and restore story so I can rebuild the service and move to another server if I need to.
  * Polyglot checks: I want to see CPU, memory, disk space usage. But, I also want to check services on the VM, like docker. And, use SSH to do manual checks.
  * Lightweight learning curve
  
I'm using Uptime Kuma for monitoring HTTP and SSH services and I love it. But Uptime Kuma cannot SSH into a server; it only checks whether the SSH daemon is up. I would love to actually log in to the machine and run a customizable health check.
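To make the "SSH in and do a health check" idea concrete, here is a rough sketch of what such a check could look like as a standalone script. It is not an Uptime Kuma feature. It assumes key-based SSH login already works for the host, that the remote side has a POSIX `df`, and the 90% threshold and function names are arbitrary choices for illustration.

```python
import subprocess


def parse_df(output: str) -> dict[str, int]:
    """Map mount point -> percent used, from `df -P` output."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        percent = fields[4].rstrip("%")
        if not percent.isdigit():  # some filesystems report "-"
            continue
        # Mount points may contain spaces, so rejoin the tail.
        usage[" ".join(fields[5:])] = int(percent)
    return usage


def full_disks(usage: dict[str, int], threshold: int = 90) -> list[str]:
    """Return mount points at or above the threshold, sorted by name."""
    return sorted(m for m, pct in usage.items() if pct >= threshold)


def check_host(host: str, threshold: int = 90) -> list[str]:
    """SSH into host, run `df -P`, and return the nearly full mounts."""
    result = subprocess.run(
        # BatchMode=yes fails fast instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "df -P"],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        raise RuntimeError(f"ssh to {host} failed: {result.stderr.strip()}")
    return full_disks(parse_df(result.stdout), threshold)
```

Something like `check_host("my-vagrant-box")` (hostname hypothetical) could run from cron on one machine. The same pattern should work for other remote commands, for example `systemctl is-active docker`, by swapping the command string and the parser.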

Is there a single service that I could use for this? Or, should I wire together a bunch of smaller tools and put them into a centralized dashboard somehow?

I'm looking at dashy and it looks great. But, as of yet, I'm unsure how I can just "add SSH check for this hostname" and "add HTTP status check for this hostname" and "check diskspace for this hostname" and get it working without a lot of confusion.

https://live.dashy.to/


  👤 notsahil Accepted Answer ✓
Grafana might be the right fit for you. It can be easily self-hosted: https://github.com/grafana/grafana. It also has an agent that can be run on the host to monitor it: https://github.com/grafana/agent. And alerts can be set up easily!

👤 speedgoose
Personally, I use Kubernetes (k3s is kind of lightweight) with the Prometheus operator.

https://k3s.io/

https://prometheus-operator.dev/

Kubernetes is not for everyone and is far from perfect but you already use Docker and you seem to seek many features offered by Kubernetes.


👤 torqu3e
TLDR - use Grafana Cloud and configure the agent to do the needed things. There should be almost no need to check for SSH access if the machine is up, but yes, you can set up process monitoring for sshd.

Bit of a longer dump for an answer...

I've been running services at home for way too long now, and my day job is running the cloud for large businesses, so I am tired of special-snowflake, hand-rolled solutions that are prone to breaking. My idea of a well-run infrastructure is that I should be able to walk away from it for extended periods of time and it just keeps running and self-heals. To that effect, this is what I've come down to:

- 3 node k8s cluster on a bunch of random mini nucs

- GitHub repo with Helm charts/manifests hooked to ArgoCD (runs on the cluster) for CD. All changes get checked into the repo and auto-deploy to the cluster. https://www.argonaut.dev/ is an option if you don't want to run your own ArgoCD

- Grafana Cloud free tier for shipping machine/cluster metrics and monitoring. Alerting is via Pushover; you can use email too

- Uptime Kuma on a fly.io free instance for inbound HTTP/DNS/cert etc. monitoring from the outside, hooked to Techulus Push/Pushover for alerting

- Terraform for DNS/Cloudflare management, run via the Terraform Cloud offering, again for automated deploys


👤 yuppie_scum
Grafana. It’s also a useful career skill