HACKER Q&A
📣 sva_

What is the gnarliest bug you have ever encountered?




  👤 LinuxBender Accepted Answer ✓
Adding a few, because what I think is the gnarliest and what others think may vary.

One was a quirky GBIC on an old Foundry router that corrupted specific bit patterns. In practice, downloads of Microsoft Word .doc files from an old Xerox DocuShare server arrived corrupted. It took a while to convince the network team to replace the GBIC. This was around the year 2000.

Another was a bad NIC. The firmware went offline, so to speak, and started looping/spraying pieces of VLAN packets at the network at a rate I did not know was even possible. That single NIC took a massive router offline.

Another was a coding bug in an Ericsson mainframe that required me to telnet in from my Nokia 9000 and reload the mainframe, taking northern California off their cell phones for ~40 minutes. This was in the late '90s.

There was a CIFS bug in RHEL4 that would kernel panic under certain conditions when the NetApp export was in mixed mode, meaning the export supported both NFS and CIFS. I emailed the kernel dev and he fixed it when he got home from bicycling with his kids.

Another was a Solaris 8 server in California having slow transfer rates to England. It was a GRE tunnel bug. I had to convince the network team to encapsulate my traffic into a VPN inside that GRE tunnel. They were convinced it would go slower. It went from ~80 kbps to 43 Mbps, and only hit that limit because the England side was a DS3.

Many other fun ones came from someone deciding to shim a proxy, usually a web server, between services without factoring in SO/HTTP keep-alives (or the lack thereof), and thus without factoring in source port depletion, TCP timeouts, etc. Those are more people issues than technical issues.
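
To put a rough number on source port depletion: without keep-alives, every request through the proxy opens a new outbound connection, and each closed connection holds its source port in TIME_WAIT for a while. A back-of-the-envelope sketch, assuming the common Linux default ephemeral range of 32768-60999 and a 60 second TIME_WAIT (both are assumptions, check your own system; the function name is just for illustration):

```python
def max_new_connections_per_second(port_low: int, port_high: int,
                                   time_wait_seconds: float) -> float:
    """Rough ceiling on new connections/s from one source IP to one
    destination ip:port when every connection is closed after one request.

    Each closed connection keeps its source port busy for time_wait_seconds,
    so the sustainable rate is (usable ports) / (TIME_WAIT duration).
    """
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


# Assumed values: Linux default ephemeral port range and a 60 s TIME_WAIT.
rate = max_new_connections_per_second(32768, 60999, 60)
print(f"about {rate:.0f} new connections/s before ports run out")
```

With those assumed numbers the ceiling is only a few hundred new connections per second to a single backend, which is why a proxy that drops keep-alives can fall over at traffic levels that look modest on paper.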

I could probably keep going for a few hundred pages.


👤 not_your_vase
It was a bug in an AM335x SoC from TI. Our HW engineers accidentally missed an erratum, which I managed to rediscover and track down independently. Once in a while it caused a CPU timer to get "stuck", essentially halting most of the operating system, due to an improperly grounded crystal. Sometimes it happened twice a week; sometimes it hid for half a month. I had pulled out half of my hair by the time I got to the bottom of it; it took months.

👤 h2odragon
I don't recall the details anymore, but there was a nasty one with some 75 MHz SPARC CPUs: if your code ran too tight a loop, you could burn out the i-cache and ruin the CPU. I think the fix, for the circumstance I hit it in, was to make sure my innermost loop only ran for something like 64 iterations before doing something else that flushed the i-cache.