I joined a new company two months ago, and when facing a problem where code wouldn't compile, I dug deep into the codebase but, for the life of me, couldn't find a solution.
I wonder: is that normal? And how do I get to the point where I can say, "I don't care if I'm alone, I can figure this out and solve it"?
Did anyone here evolve from this state of reliance on others and turn themselves into an "I can do it myself in a reasonable amount of time" person?
* Split the problem space. Payment messages are not getting to the slack channel. Well, is it a bug in receiving payments, or a bug sending slack messages? Check if the payments are hitting the database. If no, you know it's on the input side. If yes, you know it's on the output side.
* Explain the problem to someone else; if you're alone, write it up like a question you're going to post on a forum, with as much detail as possible. This must engage a different part of the brain, because I'll often figure out the issue while writing it up, or reveal some clue while logging example output to add to the post.
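The first bullet is essentially a binary search over pipeline stages. A minimal sketch of that idea in Python; the stage names and the `works()` probe are invented for the payments-to-Slack example, not taken from any real system:

```python
def locate_failure(stages, works):
    """Return the first stage whose output is broken.

    `stages` is an ordered list of pipeline stage names and
    `works(stage)` reports whether the data looks correct *after*
    that stage (e.g. "did the payment hit the database?").
    """
    lo, hi = 0, len(stages) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if works(stages[mid]):
            lo = mid + 1   # everything up to mid is fine; look downstream
        else:
            hi = mid       # the breakage is at or before mid
    return stages[lo]

stages = ["receive_payment", "write_to_db", "format_message", "post_to_slack"]
# Pretend the first two stages are healthy and the formatter is broken:
broken = locate_failure(stages, lambda s: s in {"receive_payment", "write_to_db"})
```

Each probe halves the remaining search space, so even a long pipeline only needs a handful of checks.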
Longer-term, teach a programming class! You'll get very good at debugging issues because you'll encounter a lot of other people's bugs, and you'll get a feel for what causes particular failures.
Also, don't feel any shame in asking others for help. Even senior devs get blocked, and they ask other seniors, or juniors, or if juniors are not available a rubber duck will suffice.
Keep a debugging journal. Take notes on every step you take and its results. It's easy to go in circles because you forget what you tried or forgot some detail of its outcome. Seeing a summary of what you already know helps you rule out possibilities and inspires new ones.
I often forget to do this or feel like "I can handle this bug without a crutch". Yet every time I actually journal the process it's helpful.
- People look down on console/logging, but it's useful specifically when there are race conditions or too many variables. It's often better to have several outputs and logs than a "standstill" picture you can only look at in the debugger.
- "Learn to debug" - this advice is thrown around too vaguely, but I'll tell you the 2 essential pieces that helped me debug: 1- using watches to keep an eye on the variables and props relevant to the problem; 2- using conditional breakpoints. About 90% of the people I've pair programmed with don't know about them, or know they exist but don't use them, and when I put some conditional breakpoints in place they look on in awe at the difference it makes.
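A cheap stand-in for conditional breakpoints when you're outside an IDE; this is a sketch, with the `DEBUG` flag and `conditional_break()` helper invented for illustration (in pdb proper you'd type something like `b myfile.py:42, total < 0`):

```python
import pdb

DEBUG = False  # flip to True while hunting the bug

def conditional_break(condition):
    """Poor man's conditional breakpoint: drop into pdb only when the
    condition you actually care about holds, instead of on every hit."""
    if DEBUG and condition:
        pdb.set_trace()
    return condition

# Only the suspicious orders would ever stop execution:
totals = [10, -5, 3, -1]
suspicious = [t for t in totals if conditional_break(t < 0)]
```

The point is that the loop can run thousands of healthy iterations untouched, and the debugger only engages on the one pathological case.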
- A somewhat counterintuitive, or controversial, piece of advice: read the code involved and look for inconsistencies, dumb scenarios, stray flags, and awfully named variables, and clean them up! Sometimes my attention span and memory get entangled with garbage code that makes the problem harder to reason about. By throwing away the pieces I don't like and improving on them, I also make the code easier to maintain and reason about in the future.
Have a hypothesis and try to prove otherwise.
Start cordoning off parts of the codebase where the problem ISN'T.
Leave printf breadcrumbs to trace execution.
I find my less capable colleagues make the mistake of false assumptions, such as assuming some subroutine is executing, or assuming what state an object is in, without proving it to themselves. That keeps your investigation from making progress on the larger problem.
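One way to prove those assumptions rather than trust them is to wrap the printf breadcrumbs in a decorator. A sketch (the `breadcrumb`/`CRUMBS` names are mine, not from the comment above):

```python
import functools

CRUMBS = []  # the recorded execution path

def breadcrumb(fn):
    """Print and record every call, so you can *prove* a subroutine ran
    (and in what order) instead of assuming it did."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        CRUMBS.append(fn.__name__)
        print(f"-> {fn.__name__}{args!r}")
        return fn(*args, **kwargs)
    return wrapper

@breadcrumb
def parse(raw):
    return int(raw)

@breadcrumb
def double(raw):
    return parse(raw) * 2
```

Calling `double("21")` leaves `CRUMBS == ["double", "parse"]`, which is evidence, not assumption, that both subroutines executed.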
"You need a few more years of study before you can fully understand. But! You have a 600 page textbook. Do every exercise in that book. When you can do that without assistance, get another 600 page textbook and repeat."
He was trying to get us to build an intuition for a certain class of problem solving, while at the same time saying "Shut up and calculate".
I find that problem-solving in the programming space is the same. Just keep doing things. You'll develop an intuition for it eventually.
What also helps is just shutting the computer off and going for a walk, generally the moment you step away you'll figure it out!
The very first thing should probably be to carefully check which commit introduced the error, and then carefully study the code changes in that one for clues.
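Version control can do that first step mechanically: `git bisect` binary-searches the commit history for the change that introduced the failure. A sketch against a throwaway repo; all file names and commit messages here are invented:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you

echo "works v1"  > app.txt; git add app.txt; git commit -qm "good: v1"
echo "works v2"  > app.txt; git commit -qam "good: v2"
echo "broken v3" > app.txt; git commit -qam "bad: introduces the bug"
echo "broken v4" > app.txt; git commit -qam "later change"

# Binary-search between known-bad HEAD and known-good HEAD~3; the
# script's exit code tells bisect whether each checked-out commit is good.
git bisect start HEAD HEAD~3
out=$(git bisect run sh -c 'grep -q works app.txt')
git bisect reset >/dev/null
echo "$out"
```

In a real project the `sh -c '…'` test would be your compile command or a failing unit test; bisect then pinpoints the first bad commit in O(log n) builds.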
After that, and possibly based on that, most people will probably start their debugging process by running a debugger through some suspicious parts of the code, based on varying degrees of well-informed suspicion.
For problems that escape those first tries though, what you often need to do is to start a systematic process of ruling out possible sources for the bug. In many ways this resembles scientific studies where you try to control as many variables as possible, and also include control samples with known states, trying to zoom in on only the particular variable you are studying without noise from other things.
That can mean feeding the system or code under study with carefully set up data for which you know what the effect should be, and then carefully trying to change each part of it and observing the outcomes. Things like that.
In my experience, this type of effort will most often eventually lead to the solution. The main challenge, I think, is realizing how deep you might need to go with the systematization and automation to really rule out possible sources and start zooming in on the general area of the bug. You might need to take some really drastic measures, and this is where most people who fail don't go far enough. Here you might need to get away from the screen to get your thoughts flowing more freely ... not in an undirected way, but rather trying to answer the question: "How can I do this in an even more systematic way, to rule out even more possible sources or identify unexpected behavior?"
Not super easy to put into words, but this is in my experience the way to go.
Finally, one caveat: there are certain things you should probably check before even going down the systematic path - things that can totally screw things up so that, whatever you do, you never get any systematic pattern of behavior or behavior change. These are often related to caches in various forms. Make sure to turn off any and all kinds of caches in the system. They will almost certainly drive you insane otherwise.
Debugging has been one of those for me.
A long time ago I helped my mother with a murder mystery by researching how doctors diagnose things (I was a professional information broker then, with Dialog, Lexis/Nexis, Grateful Med, etc.), as much as I could without going to medical school. I learned a lot about differential diagnosis. That got me hooked on medical shows, where I learned a little more. At some point after that I wondered if I could apply differential diagnosis methods to debugging. Because there are often multiple possible causes for a bug, I found the differential diagnosis approach to work amazingly well for me.
I call this process D3 (Differential Diagnosis Debugging).
Below is roughly how I apply it. I am working on a book including this as a couple chapters, but that won't be out for at least another year. The material in this post is in the book, so I am told I must copyright anything smacking of an excerpt.
First, capture all the relevant details of the expected behavior. Create a unit test (or tests) to confirm the expected behavior.
Next, capture all the differences between the observed behavior and the expected behavior (the 'symptoms').
Then, examine those differences to come up with possible hypotheses about the causes.
After that, use a concept similar to Karnaugh maps [1] to determine a sequence of small, discrete unit tests whose outcomes (true if the hypothesis could still be true) assign a T or F to each hypothesis. If you wind up with more than one T, you need more tests (diagnostic testing).
Once you have a confirmed hypothesis, apply a fix and rerun all your tests. Rinse and repeat as needed, if needed (treatment), until all of your expected behavior tests pass.
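A toy version of that elimination table, as I understand the idea; the hypotheses, experiments, and falsification mapping below are all invented for illustration and not taken from the D3 material itself:

```python
# experiment -> the hypotheses it falsifies when its observation is True
eliminations = {
    "fresh_cache_still_fails":   {"stale_cache"},
    "valid_input_still_fails":   {"bad_input"},
    "single_thread_still_fails": {"race_condition"},
}

def diagnose(hypotheses, observations):
    """Strike out every hypothesis falsified by an observed result;
    whatever survives is still a candidate cause."""
    remaining = set(hypotheses)
    for experiment, observed in observations.items():
        if observed:
            remaining -= eliminations[experiment]
    return remaining

left = diagnose(
    {"stale_cache", "bad_input", "race_condition"},
    {"fresh_cache_still_fails": True,
     "valid_input_still_fails": True,
     "single_thread_still_fails": False},
)
```

If more than one hypothesis survives, that's the signal to design another discriminating experiment, exactly as the diagnostic-testing step above says.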
Unpublished Work © Copyright 2022 William A. Barnhill, Jr. Some rights reserved. You may apply the D3 process as described herein; you may not incorporate the D3 process into a written work, a web site, or an email; you may discuss the D3 process if full attribution to the author is given.
Please don't hate me for the above folks. Been told I need to include that if I want to get published.
Write down a list of hypotheses. Then write an experiment to test it.
Sometimes there's this intuition to comment out a block of code, run, and then comment out another block of code. Or revert to something before a bug happens. That's all fine, just make sure it's linked to a hypothesis or multiple ones.
Or sometimes you don't have a hypothesis. This is where the scientific method is also useful - you know what you don't know!
Then you use the Monte Carlo method, or as I call it, throwing darts and seeing if one hits the bug. Basically you slice out random blocks of possibly offending code, compile, slice out more (or put some back), compile, until you narrow down an area.
From that area, you might formulate a hypothesis. Or you may need to throw more darts until you see a pattern.
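The dart-throwing loop can be mechanized; it's a crude cousin of delta debugging. In this sketch, `still_fails()` stands in for "comment out, compile, rerun", and everything here is illustrative:

```python
import random

def shrink(lines, still_fails, rounds=200, seed=1):
    """Randomly delete a chunk of lines; keep each deletion only if the
    bug survives. The result is a smaller reproduction of the failure."""
    rng = random.Random(seed)
    current = list(lines)
    for _ in range(rounds):
        if len(current) <= 1:
            break
        i = rng.randrange(len(current))
        j = min(len(current), i + rng.randint(1, max(1, len(current) // 4)))
        candidate = current[:i] + current[j:]
        if still_fails(candidate):   # deleted chunk was irrelevant; keep the cut
            current = candidate
    return current

program = [f"line {n}" for n in range(20)] + ["BUG"] + \
          [f"line {n}" for n in range(20, 40)]
reduced = shrink(program, lambda ls: "BUG" in ls)
```

Because a deletion is only kept when the failure still reproduces, the offending line can never be thrown away; the noise around it gradually is.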
Scientific method is bloody slow but you'll get to the answer eventually. It's not for everything.
It's a mix of search space reduction + heuristics. You start by looking at the most probable areas of fault. This includes new commits. Then you try to partition the places where the bug could be by probabilistic reasoning, logging as you go.
If these approaches don't work, then you start questioning your assumptions. For example, if there is an eatApple() method, what did you assume about it?
Did you assume all apples are red, or is the functionality actually eatFruit()?
* When did this start happening?
* What changed between when it was working and when it stopped working?
* Have you asked person Y who worked on that change?
This advice would be very obvious to many, but I'm constantly surprised by how many of my coworkers don't know how to do this and test code by pushing logging statements into a production environment.
tl;dr, when debugging, people naturally form a hypothesis about what is going wrong and then set about gathering evidence to support or refute that hypothesis. When you do so unconsciously you are very prone to biases of all sorts, most importantly confirmation bias.
When you do so mindfully, and Write It All Down then you are much more likely to a) come up with an evidence gathering exercise which will usefully falsify your hypothesis, and b) respect the gathered evidence, reject the now false hypothesis, and move on to a new one.
They get triggered more often than you'd think.
The most important thing is understanding the problem itself. So in the case of compile issues, rust for me has just blown me away. But I won't harp on.
There are times too where it's just not clicking and you shouldn't be afraid to step away from it for a bit. I like taking a walk. But find something that lets you step away from it.
I do have crazy lows when I'm just stuck and feel so stupid at that moment. But solving it then gives that addictive high.
As for tools and those obscure errors that seem impossible: I'm always reminding myself of the XY problem, and making sure I'm not getting trapped in it!
Power through though; you're not alone, and you'll remember all the problems you've solved!
One time I solved a very complex circular dependency issue this way. I think just writing down thoughts and process in a very human way can help with technical issues.
Short term - while debugging a problem:
1. Gather whatever information you can. Look in the logs!
2. Debugging means tracking the flow of execution from beginning to end (and then back to the front in many cases). Where along that chain is the breakdown?
3. Often you can stop after 2, since once the point of failure is determined the issue becomes obvious. However, in some cases it is not. In those cases, you have to look for 'something that does not make sense' or can't be true.
Case study: timeouts.
We started having errors on a service we owned. Calls would be made to our service (call it service A), and then calls to an upstream service (service B) would time out; we didn't catch this exception, so the request to A would end with a 500 response. However, when we looked at the logs for B we could see our requests from A, and they were not timing out at all - they were taking the usual amount of time. This is a huge contradiction!
After a day of poking around we could reproduce the timeout with a script which called our service. Thinking it might be the load balancer, we started sending requests to the IPs of workers directly, which never timed out. On a hunch, we sent requests to the IP of the load balancer (this can't be done directly, since the LB uses the hostname as part of how it routes requests to servers, but we could do it by adding the LB with the proper hostname to the /etc/hosts file). Lo and behold, the timeouts were gone even though we were doing 'the exact same thing' as before! Removing the entry from the hosts file, the timeouts immediately came back for the same small percentage of requests.
Long term: Being good at debugging really consists of two things, knowing how things work and spending the time required to debug things. Interestingly, one great way to learn how things work is to debug things. Whenever something does not make sense, or you can't explain it, dig in and get to the bottom of it. Every time you do this you'll learn things that you never would have guessed, and your ability to debug some random future issue will go way up when that future issue just happens to intersect something that you had to dig into previously.
Knowing who knows what, and who can help, and asking for help the right way is key to getting better at problem solving.
Perhaps it is the school system that discourages problem solving through social means, and people feel guilty about taking help.
However, consider this perspective. What if the ones who could put together people into functional groups could be considered great problem solvers? Consider any of the great American enterprises, there are people at the top who figured out how to put people together to get things done.
Another approach is to use tools (system stats, logging, debuggers like rr/pernosco, etc).
Learn as much as you can about the system and tools you're using.
Reduce the size of the problem (reduce code, isolate, etc).
Read the code, and try to break it. Build scenarios 'this thing I see in the log can only happen if...'
Finally, there's no shame in rubber ducking a problem with a colleague!
Here's a trick you might find useful when you are really stuck and don't see any path forward: change some random related things in the code. If the code does not respond to those changes as you expected then that is a clue! There is something there you don't understand and the unexpected behavior is a clue to it. It's kind of like writing unit tests, checking for what you expect to be true.
Keeping a journal is essential also.
If the problem is too obscure, I 'just' rebuild the system like I imagined it.
But I don't think everyone is as suited to this form of tackling problems. You might get more enjoyment out of creating new things?
Comment out or delete the code until the fundamentals of the system work. Then add stuff back until errors happen. Also add better logging to track down the problem. Tear down the system and build it back up. Delete nonstandard approaches you can’t figure out and use standard ones from QuickStart tutorials. It might be slow progress but progress will be made.
Also, print statements or use a debugger to confirm assumptions. (I'll be honest, I rarely bring out a debugger unless a print based workflow is time consuming enough to justify remembering everything about the debugger).
1. Recreate problem in prod - How was it triggered? I did a bit of digging and noticed that it happened when very specific conditions occurred.
2. Recreate it locally - Got an error message, i.e. `SomeVar is not defined`.
3. SomeVar - Why is it `undefined`? I started working backwards and realized that the `SomeVar` functionality was working as intended, but it's what was being fed into it that was the issue.
4. Working backwards - I had a hunch as to why the error was occurring when I started, and when I worked backwards I realized that I was partially correct.
5. More research - In regards to point 4, I learned what else was missing (i.e. why I was partially correct about my hunch).
6. Start coding - Since the issue was in prod, I put in a pretty shoddy hotfix. It worked, but it tacked on some logic into code that was already pretty confusing (due to a lot of conditional scenarios in the UI).
7. PR - I opened a PR, but as my Sr Dev pointed out, while the solution worked, the code wasn't clean.
8. Sr Dev chat - We had a quick chat, and even though the issue was in prod, only a few people were hitting it (the feature wasn't widely used). Also, unless we were actively losing money due to a prod issue, a hotfix wasn't needed, and we could take the time to write clean code.
9. Coding - I realized that there was an even easier way to write the code without introducing confusing logic. I scrapped about 95% of what I had and put in something that not only worked, but was much cleaner. I also wrote some tests, as per the Sr's recommendation.
10. Updated PR - I followed-up with the Sr, who thanked me for revising the code and for writing the tests.
---
Although I reached the original solution and the new one on my own, having discussions with the Senior Dev was incredibly beneficial.
Actually I think I enjoy problem solving more than the actual programming. Though frankly programming and problem solving are so closely related they are almost the same thing.
I can solve most problems I come up against day to day. I guess I can't solve them all because I do ask questions on Stack Overflow sometimes.
I'm not sure why, I guess it's a few things:
1: As I said, I enjoy problem solving - it feels like a game
2: I have more than 35 years of hard computer problem solving behind me, so there's a lot of experience to draw on, which helps.
3: I understand, and make a deliberate effort to understand, as much as I possibly can about every aspect of computer systems from hardware to operating system to database, to cloud to front end to back end.
4: I use the tried and tested problem solving method of divide and conquer - when you have a problem, break the software in half and keep doing so until you find where the problem is.
5: If you are still stuck, try to create a minimal test case that demonstrates the problem - this often solves it, and if it doesn't then you can post the minimal test case on StackOverflow.
6: Only ever work on one thing at a time.
7: Work hard to learn how to use problem solving tools: the debugger in the browser; a little strace for things like tracking which file the operating system is trying to open; ngrep so you can watch data passing over the network.
8: Be relentless. Sometimes I think this is a personality "issue", but I will literally work for 10 hours at a time trying to solve a problem until I crack it. I really don't like finishing work without having all the day's problems solved. I just grind and grind and grind on problems until I work them out.
9: Have a really thorough knowledge of the technologies you are working with - don't be satisfied with learning as you go - actually take the time to read the language specification for the languages you program with, stuff like that.
10: Use a really good IDE and make sure you know how to use its ability to jump through the code, to jump to function definitions. You need to be good at following the logic of the program.
11: And finally, if no matter what you do the error remains the same, then you are probably not even working on the correct code.
In the case, for example, of why a program won't compile: just start cutting code out until something compiles, then add it back in little by little until you work out which line fails. Even better is to use appropriate debugging tools, but divide and conquer always works.
There's also no shame in scattering these everywhere you suspect the problem might be:
print('got here!')
unless you didn't even make any changes, and it's a matter of the codebase requiring a specific environment or option to compile that you don't have or don't know about
on getting better at debugging, I would try to isolate possible sources of the problem, form a hypothesis, test it experimentally
you'll need to know what you know
is it the environment you have? networking? OS? etc., whatever applies to your system
I think having a known working state is probably the most useful, though it might be difficult to get
something to consider, I do remember there being a paper on getting wrong results even when doing nothing wrong
That said, you mention not getting your code to compile. I've never (at least not in recent memory) not been able to get my code to compile by myself after a while, but I also don't know what language you're working with. Are you struggling with the type system? That can usually be tackled by taking expressions apart, assigning them explicit types and then reasoning through the inputs and outputs (with pen and paper if need be).
Use version control. If the order processing worked last week and johnny made a change to order processing in this release it's highly likely it's johnny's change.
You make an assumption of where the problem is and start looking. Always understand your assumption can be wrong and you are looking in the wrong place.
Avoid code with side effects. I've taken code with bugs in it and completely rewrote all of it without side effects just to avoid those types of bugs and magically all of the hard to find bugs disappeared.
Instead, I've found much more deliberate practice and therefore value from reverse engineering: just like in debugging, you want to understand the underlying logic through program analysis.
You can pick security CTFs, crackmes, malware samples, proprietary software with bugs you want to fix, or functionalities you want to add, or even games where you want to find some hidden content, extract some resources, or modify some behaviors. In all that you will find something of your interest, and apply approaches that I think translate well to software development:
* Differential tracing: Want to know how some action reflects in the codebase? Take k instruction traces where you do everything except that action, then take a trace where you do that action, and find the unique differences in that last trace. Want to know what data gets written? Same approach with memory dumps and breakpoints. How do different inputs affect these changes? You will learn to be methodical and thorough in what you test and log.
* Recognizing patterns: Sometimes you don't have symbols for your functions, are you able to identify printf at a glance, or will you waste time following the logic of the function? Do you see relative offsets being used and recognize accesses to an array? With source code all this happens on a more macro view, such as algorithms or design patterns hidden in all those coupled functions and classes. But the micro view also applies: figure out what constants relate to specific functionalities, and you can grep your way to relevant functions or documentation.
* Avoiding boilerplate: This follows from the previous point: you want to recognize the flow of data through the codebase in order to have a call hierarchy to follow; otherwise it's easy to waste time on functionality that is irrelevant to you. Start with how data enters the application: stdin, files, database connections, http endpoints... Tests, examples, or client apps will also help here.
Oh, and don't worry about having to learn some assembly language - just make that investment, since that's the straightforward, well-defined, predictable part.
It's hard to debug a problem if you have no idea what most of the code is supposed to do. You can go for speed - asking colleagues that do know - or you can just start following the code until you understand it. That will take more time of course, but in time you will start having enough knowledge of the code base that you can just start throwing guesses when there's a problem.
My 2 cents, in case I read your question right. If I haven't, there are a few answers describing debugging techniques :)
Of course I've seen people who could pull out the same without having that much time. I respect and admire them.
Sometimes I allow myself to be annoyed that the code isn't working and that is the wrong mindset.
Everyone will experience problems but suffering is a choice.
Divide and conquer.
And define things out of existence.
I’m grunching this thread. But I have 20ish years of experience and consider myself a decent debugger and considered enough so by others that I’ve been asked this line of questioning while in a mentorship role on a number of occasions.
The number one thing is simply humility; asking the dumb question in the smartest way you can, as quickly as possible, to the person who can most likely give you the best answer.
As you’re learning and starting out, you may not have great resources for such, but within most orgs you should be able to find where the answers are, and first ask the dumb questions (caveated by what you do understand), then eventually ask for a brain dump.
It is through the accumulation of tons of disparate idiosyncratic knowledge that little clues and common patterns emerge that allow people to sniff out a root cause hypothesis before the facts are even in, and often be right.
But people who hide their ignorance, refuse to ask questions that may make them look silly, refuse to be the idiot in the room, they learn much less quickly because they don’t get unblocked as quickly.
get good at finding and reaching out to contractable experts for rabbit-holes, whether for the spot-work or to pair on solving with their expertise
Rubber duck it.
Never quit.