Context: one of my teams manages infrastructure for customers, e.g. installing Linux in VMs, managing storage, restarting servers, upgrading the OS, etc. The incident count per installation is low, so support engineers don't spend enough time with any one system to "get used" to it, but the total incident count across all installations is high enough to make me think I oughta do something about it. Integrated monitoring/management automation isn't an option unless the customer pays for it, which they usually don't. Customer installations are also highly individualised.