Cleaner code, proper testing, reproducible and reliable build processes, actually studying and applying CS concepts [0]. I've since come to the conclusion that any place suffering from repeated "fires" is a place that ought to burn down. It's a sign of misprioritization, and it will lead either to the early deaths of its employees from extreme stress or to the early death of the company/org from incompetence. Hopefully the latter more than the former.
[0] Concurrency issues. How many places have a policy of just "clear the queue, the data will be resubmitted if it's needed" when the real problem is that they have more data to process than systems to process it? Wait, that's not actually the real problem; it's a poorly designed system that's barely taking advantage of the available hardware. I was partially responsible for a system (which I thankfully escaped) that involved millions of dollars in hardware for...200 concurrent users. It couldn't go past that. They'd created so many bottlenecks in their design and dedicated hardware entirely to certain tasks (tasks that used maybe 1% of that hardware's capability) that scaling beyond that was literally impossible. The hardware was more than capable of handling more users (I'd seen less hardware handle more users for more complex tasks in the past), but a lack of comprehension in the design of the system led to this incredible failure. More money was thrown at it year after year to try to scale it, but they kept hitting the fundamental walls created by a design they wouldn't address or even entertain addressing.
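To make that bottleneck concrete, here's a minimal sketch of the general anti-pattern (the task names and worker counts are made up, not taken from that system): statically dedicating a slice of capacity to each task type lets the hot path saturate its small slice while the rest of the machine sits idle, whereas a single shared pool lets spare capacity absorb whatever load actually arrives.

    from concurrent.futures import ThreadPoolExecutor
    import time

    def handle(request):
        """Stand-in for real request processing."""
        time.sleep(0.01)
        return request

    # Anti-pattern: capacity statically partitioned per task type.
    # "ingest" is the hot path and saturates its 2 workers, while the
    # 30 workers reserved for "reports" sit almost entirely idle.
    dedicated = {
        "ingest": ThreadPoolExecutor(max_workers=2),
        "reports": ThreadPoolExecutor(max_workers=30),
    }

    # Alternative: one shared pool sized for the whole machine, so any
    # idle worker can pick up whatever work actually shows up.
    shared = ThreadPoolExecutor(max_workers=32)

    def submit(task_type, request, use_shared=False):
        pool = shared if use_shared else dedicated[task_type]
        return pool.submit(handle, request)

With these made-up numbers, the dedicated layout caps the hot path at roughly 2 workers / 0.01 s = 200 requests per second no matter how much hardware gets added, which is the same flavor of wall described above.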
If you're talking about minor fires (phone calls for network blips or something similarly trivial), then yes, 20 seems about right.
Tech fires can happen as often as 8-10 times a month in our work environment, which is consulting simultaneously for several clients. They usually flare up a few weeks or months after information failed to make it into a larger organization and get routed to the right people. The second most common cause is misconfiguration. We do root-cause analysis and often find that the person who made the misconfiguration was very surprised at what they'd overlooked. Even the best-intentioned people cannot trust themselves when they're busy. We have fewer issues when engineers are in the habit of attaching screenshots when they report having done something. The screenshots occasionally prompt a review from a colleague, but more often, the act of taking the screenshot, or the glimpse of it while attaching it, helps the engineer catch the issue themselves.
He had 140 open critical tickets in the queue that had not even been touched. That first month was the definition of being thrown to the wolves, but I knocked it out.
Closing that last ticket on day 24 was so satisfying!