HACKER Q&A
📣 ozten

What are the most interesting or worst outages you have seen?


Inspired by the comments on the Klarna incident thread, please share your most interesting / worst outage stories from your past.

I'll go first: We had a project to rewrite the cart UI of a large e-commerce platform. The project was later cancelled. I requested that all the new project hardware be de-provisioned, but out of muscle memory and tiredness I accidentally used the backend infrastructure name. The infra techs didn't double-check whether those hosts were taking traffic, and I accidentally removed all capacity for our shopping cart fleets in all regions :| It took about 45 minutes to recover. I wanted to crawl into a hole and disappear.


  👤 stephenr Accepted Answer ✓
I was the (gov) department tech assigned to a statewide upgrade/merge project bringing a bunch of Novell servers into a new eDirectory tree. The server migration was mostly the responsibility of external sub-contractors; I handled migrating all the other on-site equipment, plus a bunch of related prep work.

Anyway, at a smaller site the two guys were doing their bit. I had a staging server I lugged around from campus to campus, so I’d usually migrate the classroom and office equipment without needing to talk to them much until we took a break or I was finished.

This was a tiny campus, so I finished really quickly and went to see how they were doing.

Turns out they’d managed to skip an important step in the “bible” procedure they normally follow: they hadn’t changed the NetWare install type to “custom”, so it had wiped all the disks in the server.

I spent the rest of the day with my boss (who happened to be up in town for the weekend) and her family, riding their Jet Ski and so on, while the two contractors restored all the data volumes from LTO backups and then started the whole migration over again.