HACKER Q&A
📣 speedylight

What is your strategy for debugging unexpected behavior in software?

  👤 takoid Accepted Answer ✓
I usually start by trying to reproduce the behavior. This helps me narrow down the cause and figure out if it's something specific to a certain input or environment. Next, I'll check the code to see if there's anything obvious that's causing the issue. If not, I'll add some print statements or log messages to help me track down the source of the problem. Once I have a better idea of what's going on, I'll try to come up with a solution and test it to see if it fixes the issue. If it doesn't, I'll repeat the process until I figure out what's going on.
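Here is a minimal Python sketch of the "log messages" step. It uses the standard `logging` module instead of bare `print`, so the extra output can be silenced later by raising the level. `apply_discount` is a hypothetical function standing in for whatever code is under suspicion:

```python
import logging

# Configure once at startup; DEBUG shows everything, raise it to WARNING to quiet it later.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger(__name__)


def apply_discount(price: float, percent: float) -> float:
    # Hypothetical function under investigation: log inputs and output.
    log.debug("apply_discount(price=%r, percent=%r)", price, percent)
    result = price * (1 - percent / 100)
    log.debug("apply_discount -> %r", result)
    return result


if __name__ == "__main__":
    apply_discount(200.0, 15)  # emits two DEBUG lines: the call and the result
```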

👤 PaulHoule
Figure out how to make it reproducible first. If you can do that, the debugging often takes care of itself.
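Hidden randomness is one common reason a bug won't reproduce. The sketch below assumes the nondeterminism comes from Python's `random` module. It gives the code its own seeded generator so a failing run can be replayed exactly. `shuffle_orders` is a made-up example function:

```python
import random
from typing import List, Optional


def shuffle_orders(orders: List[str], seed: Optional[int] = None) -> List[str]:
    # Hypothetical code whose output depends on randomness.
    # A private Random instance keeps this seed from affecting other code;
    # seed=None falls back to an unpredictable seed (the normal behaviour).
    rng = random.Random(seed)
    shuffled = list(orders)  # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return shuffled


# Same seed, same "random" order on every run, so a failure can be replayed.
assert shuffle_orders(["a", "b", "c", "d"], seed=1234) == shuffle_orders(["a", "b", "c", "d"], seed=1234)
```

In practice you would log the seed each run uses, so the one that failed can be passed back in.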

👤 DamonHD
That's a very broad question!

I guess the first thing to do is work out why it is unexpected, and then find out how long it has been going on, or whether it is new.


👤 red_Seashell_32
Ensure I have the same, or at least similar, data locally, where I can properly replicate it.

👤 bjourne
Reproduce, then narrow down.
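When the trigger is a large input, such as a file or a list of records, you can narrow it down mechanically by halving it. This is a simplified sketch in the spirit of delta debugging, not a full implementation. `fails` is a hypothetical predicate you would write: it reruns the reproduction on a candidate input and reports whether the bug still shows up.

```python
from typing import Callable, List


def narrow_down(items: List[int], fails: Callable[[List[int]], bool]) -> List[int]:
    """Shrink a failing input by repeatedly keeping whichever half still fails.

    Only halves are tried, so the result is smaller but not guaranteed minimal.
    """
    if not fails(items):
        raise ValueError("start from an input that reproduces the bug")
    current = items
    while len(current) > 1:
        mid = len(current) // 2
        left, right = current[:mid], current[mid:]
        if fails(left):
            current = left
        elif fails(right):
            current = right
        else:
            break  # the bug needs elements from both halves; stop shrinking
    return current
```

For example, `narrow_down(list(range(20)), lambda xs: 13 in xs)` returns `[13]`.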

👤 gregjor
Writing new code requires creativity, iteration, experimenting, testing. In contrast, debugging comes down to pure scientific method.

Start with describing:

- Expected behavior

- Observed behavior

- Steps to reproduce

The bug report forms I use follow that template.
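If you want the same three fields in code (say, for a small tool that files reports), a plain data structure is enough. This is only an illustration. `BugReport` and `to_text` are hypothetical names, not any particular tracker's format:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BugReport:
    # The three parts of the template: expected, observed, steps to reproduce.
    expected: str
    observed: str
    steps: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        lines = [
            "Expected behavior:", f"  {self.expected}",
            "Observed behavior:", f"  {self.observed}",
            "Steps to reproduce:",
        ]
        # Number the steps starting at 1, as a person would write them.
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


report = BugReport(
    expected="Cart total shows 30",
    observed="Cart total shows 25",
    steps=["Add items priced 10 and 20", "Open the cart"],
)
print(report.to_text())
```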

Next try to reproduce the problem. With some bugs you can easily reproduce the unexpected behavior. Others only happen some of the time, or only in production but not in development, or may depend on previous steps the user didn't report because they didn't seem important.

For web applications, you may want to try clearing the browser cache or using different browsers. You may have to use the same browser and version, or the same OS, as the user. Your instinct about the nature of the bug can guide this. A bug that shows a column of numbers summed wrong probably doesn't come from OS or browser differences, whereas a bug that shows a block of text overlapping an image on a web page may have something to do with a specific browser.

Once you can reproduce the bug, you find where in the code it manifests itself. The actual problem may happen somewhere else, but you start looking where the bad output got emitted, the log message got written, the application crashed, etc.

You form a hypothesis (or a few) about how the unexpected behavior might happen, then you make changes or log variables/state to test it. You work back up the code and into functions and classes, adding logging (or using a debugger, if that works for the problem), then reproduce the problem while gathering more information.
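You can speed up adding logging function by function with a decorator that records each call's arguments and result. Here is a Python sketch. `traced` and `parse_quantity` are hypothetical names for illustration:

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("trace")


def traced(func):
    """Log every call's arguments and its result (or exception)."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Logs at ERROR level with the traceback, then re-raises unchanged.
            log.exception("%s raised", func.__qualname__)
            raise
        log.debug("%s returned %r", func.__qualname__, result)
        return result
    return wrapper


@traced
def parse_quantity(text: str) -> int:
    # Hypothetical function suspected of producing bad values.
    return int(text.strip())
```

You would remove the decorator once the hypothesis is confirmed or ruled out.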

Eliminate variables. Narrow your testing down by temporarily removing parts of the code, kind of like a binary search. Try stubbing functions, i.e. making them return known-valid dummy values. Try commenting out async tasks to rule out timing or race problems.
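Here is a sketch of stubbing in Python. It assumes the suspect dependency can be passed in as a parameter. When it can't, patching it with `unittest.mock.patch` is the usual alternative. `fetch_exchange_rate` and `convert` are hypothetical:

```python
from typing import Callable
from unittest import mock


def fetch_exchange_rate(currency: str) -> float:
    # Hypothetical real dependency: imagine a slow or flaky network call.
    raise RuntimeError("no network in this sketch")


def convert(amount: float, currency: str,
            rate_lookup: Callable[[str], float] = fetch_exchange_rate) -> float:
    return round(amount * rate_lookup(currency), 2)


# Stub: a known, valid dummy value in place of the real lookup.
stub = mock.Mock(return_value=2.0)
print(convert(10.0, "EUR", rate_lookup=stub))  # 20.0
stub.assert_called_once_with("EUR")  # confirms the stub was actually used
```

If `convert` gives the right answer with the stub, suspicion shifts to the real dependency. If it is still wrong, the bug is in `convert` itself.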

If the bug just started happening, look through recent commits that even remotely relate to what you see. Changes to a code base very often introduce or reveal bugs, sometimes far away from the actual changes. Sometimes programmers "fix" something they saw while working on the code and don't mention it in the commit message. Read the commit history and see if any commits suggest a hypothesis. Try rolling back to previous versions and reproducing the bug; that at least eliminates a set of changes as the culprit.
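Rolling back version by version is itself a binary search over history, and Git automates it with `git bisect`. The sketch below shows the underlying idea. It assumes the commits are ordered oldest to newest, the first is known good, the last is known bad, and the bug persists once introduced. `is_bad` is a hypothetical callback that would check out a commit and run the reproduction steps.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an oldest-to-newest commit list for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that the bug stays
    in every later commit once it appears.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # bug already present: look earlier
        else:
            good = mid  # still fine here: look later
    return commits[bad]
```

With ten commits `c0` to `c9` and a bug introduced in `c6`, this finds `c6` after checking only three commits (`c4`, `c6`, `c5`).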

Some bugs have obvious causes and simple solutions. Others can take a lot of time and research to track down. Timing-related problems, for example, can look intermittent and hard to reproduce in a controlled (development) environment.

Keep notes while debugging so you don't waste time looking for things over and over, and you can write down additional hypotheses as they occur to you.

Get another set of eyes on the problem. Ask for help.

If you suspect a third-party library or API (a dependency), search Google for the exact error message or code. If you don't have one, search with keywords that describe the problem. Common libraries and APIs can have bugs and unexpected behavior too, but very often other users have run into the same problems and posted solutions, or at least some guidance.
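Searching works best when you copy the message verbatim rather than paraphrase it. In Python, the standard `traceback` module can pull out the final "Type: message" line of an exception. Here is a minimal sketch:

```python
import traceback


def exact_error_message(exc: BaseException) -> str:
    # format_exception_only returns the "Type: message" line(s) with a
    # trailing newline; join and strip to get a searchable string.
    return "".join(traceback.format_exception_only(type(exc), exc)).strip()


try:
    int("abc")
except ValueError as err:
    print(exact_error_message(err))
    # ValueError: invalid literal for int() with base 10: 'abc'
```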

Have patience. Don't just make random code changes to see what happens. Debug methodically with a plan.