HACKER Q&A
📣 metadat

Monitoring Hard-Disk Drive and Array Health in Linux in 2022?


What is the best way to automate monitoring and alerting on the health and SMART status for disks and arrays in a Linux machine?

The most complex case I have is a machine housing 15 hard-disk drives along with 1x RAID-0, 1x RAID-1, and 1x RAID-5 arrays.

Curious what a highly-effective / cutting-edge / best-practices end-to-end setup looks like in 2022.

Thank you!


  👤 toast0 Accepted Answer ✓
Quick and easy: run smartctl [1] at least once an hour. Add up the bad sectors (reallocated, pending, offline uncorrectable) and alert if the total grows by 10 or so in one day, or if it hits 100 or so. Also alert if any of the other metrics say failed; if you've got a helium drive, there's a metric for that and you might want a threshold, but I don't have enough experience there.
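
A rough sketch of that check in Python, in case it helps. It assumes an ATA drive and the usual attribute table printed by `smartctl -A`, where IDs 5, 197 and 198 are the reallocated, pending and offline-uncorrectable counts. The function names, the defaults of 10 and 100, and the device path in the usage comment are placeholders of mine, not anything smartctl defines. NVMe and SAS drives report differently and would need their own parsing.

```python
import subprocess

# Attribute IDs assumed to hold the bad-sector counts on ATA drives:
# 5 = Reallocated_Sector_Ct, 197 = Current_Pending_Sector,
# 198 = Offline_Uncorrectable
BAD_SECTOR_IDS = {5, 197, 198}


def attribute_rows(smartctl_output):
    """Yield the split fields of each attribute row in `smartctl -A` output."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            yield fields


def bad_sector_total(smartctl_output):
    """Sum the raw values of the reallocated, pending and uncorrectable rows."""
    total = 0
    for fields in attribute_rows(smartctl_output):
        raw = fields[9]
        if int(fields[0]) in BAD_SECTOR_IDS and raw.isdigit():
            total += int(raw)
    return total


def failed_attributes(smartctl_output):
    """Return names of attributes whose WHEN_FAILED column is not '-'."""
    return [f[1] for f in attribute_rows(smartctl_output) if f[8] != "-"]


def should_alert(total_now, total_day_ago, growth_limit=10, total_limit=100):
    """Alert on roughly 10 new bad sectors in a day, or roughly 100 overall."""
    grew = (total_now - total_day_ago) >= growth_limit
    return grew or total_now >= total_limit


def read_attributes(device):
    """Run smartctl (usually needs root) and return its text output."""
    # smartctl uses its exit status as a bitmask, so a nonzero status is
    # not treated as a hard error here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout


# Hypothetical hourly use, e.g. from cron, with yesterday's total kept
# somewhere of your choosing:
#   out = read_attributes("/dev/sda")
#   if should_alert(bad_sector_total(out), yesterdays_total) or failed_attributes(out):
#       send an alert however you normally do
```

Storing one total per drive per day is enough to evaluate the growth rule; how the alert gets delivered is left open.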

If you really want to spend time on it, you could monitor disk transfer speeds and seek times and alert if the speeds drop or the seek times increase. But I'd guess that's unlikely to be worth the time.
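
If someone does want to try the speed idea, here is a minimal sketch of the throughput half only (not seek times). It times a sequential read and compares it against a baseline you recorded earlier. The function names and the 50% tolerance are assumptions of mine. Reads of a regular file can be served from the page cache, so a meaningful number would need the raw device (as root) or some way of bypassing the cache, neither of which this sketch handles.

```python
import time


def read_throughput_mb_s(path, max_bytes=64 * 1024 * 1024, chunk_size=1024 * 1024):
    """Read up to max_bytes sequentially from path and return MiB per second."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(min(chunk_size, max_bytes - total))
            if not chunk:
                break  # end of file or device
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return 0.0
    return (total / (1024 * 1024)) / elapsed


def speed_dropped(current_mb_s, baseline_mb_s, tolerance=0.5):
    """True if the current speed fell below tolerance * baseline."""
    return current_mb_s < baseline_mb_s * tolerance
```

A single sample is noisy on a busy machine, so comparing a median over several runs against the baseline is probably the safer way to use it.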

[1] Or the controller's own utility, if the controller gets in the way and you have to use that instead.