2. If the project uses a version control system (Git, Mercurial, Subversion, etc.), look at the most recent additions, modifications, and deletions in the version control log (git-log, or the equivalent for your tool). Often the most relevant files in a project are the ones people modify the most. Ignore files that belong to third-party dependencies (vendor, node_modules, that kind of stuff).
3. Install a Language Server Protocol (LSP) server [3] with support for the programming language(s) that you are going to use. Configure your favourite code editor to take advantage of as many LSP features as possible, with emphasis on “Jump To Definition” and “Find References” [4].
Tell us which programming language(s) the project is written in and we can give you more suggestions.
[1] https://en.wikipedia.org/wiki/Grep
[2] https://github.com/BurntSushi/ripgrep
[3] https://microsoft.github.io/language-server-protocol/impleme...
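For the "most modified files" idea in suggestion 2, here is a rough Python sketch. It assumes you feed it the text printed by `git log --name-only --pretty=format:` (one path per line, blank lines between commits). The function name `most_changed_files` and the ignore list are made up for illustration, so adjust them to your project.

```python
from collections import Counter

# Directory names treated as third-party code (an assumption; edit as needed).
IGNORED_DIRS = {"vendor", "node_modules"}


def most_changed_files(log_text, top=10):
    """Rank paths by how often they appear in `git log --name-only` output."""
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if not path:
            continue  # blank lines separate commits
        if IGNORED_DIRS.intersection(path.split("/")):
            continue  # skip third-party dependencies
        counts[path] += 1
    return counts.most_common(top)
```

The files at the top of the result are only a hint about where the activity is; a generated file or a changelog can rank high without being important.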
- Pull up the commit history for the file to see what other files were modified along with it the last few times. This will give you dependencies and linkages.
- Make your change and then ask your ide/command line to find all typing/lint errors in your project which will help you find other dependencies you may have missed.
- If you get stuck, reach out to the authors or reviewers of previous PRs. (Hint: you may want to include them as reviewers. They'll give good feedback and you'll engender good will by keeping them in the loop when you touch their corner of the codebase)
- Write a few solid unit tests. Maybe even clean up the testing code a little bit while you're there.
- Write a concise but informative description of your changes in your PR. If you made two or more logical changes, split your PR and stack them. Your teammates will appreciate the shorter PRs and you will get feedback more quickly.
- Land the PR in a timely manner and keep an eye on it until it hits prod.
- Once in prod, test it yourself and keep an eye on the logs for a day or three.
- Bonus: put all changes behind feature flags so you can do slow rollouts and quickly revert without waiting for a deploy. Make a task to remind yourself to remove the dead code behind the flag in a month or so, when you're pretty sure it's stable.
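As an illustration of the feature-flag bonus point, here is a minimal percentage-rollout sketch in Python. It is not any particular flag library: `is_enabled`, `checkout_total` and the flag name `new-rounding` are all hypothetical, and a real system would read the percentage from configuration rather than a parameter.

```python
import hashlib


def is_enabled(flag_name, user_id, rollout_percent):
    """Return True if this user falls inside the flag's rollout percentage."""
    # Hash flag + user so each user gets a stable answer for a given flag.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in 0..99
    return bucket < rollout_percent


def checkout_total(prices, user_id, rollout_percent=0):
    # "new-rounding" is a hypothetical flag guarding the new code path.
    if is_enabled("new-rounding", user_id, rollout_percent):
        return round(sum(prices), 2)  # new behaviour
    return sum(prices)  # old behaviour; delete once the flag is retired
```

Setting the percentage to 0 is the quick revert, and raising it step by step is the slow rollout.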
What happens next will be an iterative process.
Do some initial investigation - don't waste time figuring things out, we just want a set of questions and areas to explain at this stage.
Ask for some help from the team - perhaps 30-60 mins, perhaps with different people to cover off the topics. It's essential that the team contribute - firstly they owe it to you, secondly only they know the background. Which bits of the codebase were experimental, which bits are ancient relics being removed, what's the direction of travel, which parts have subtleties. "Unknown unknowns" (to you).
Now "pay it forward". Ensure that the next person to join the team has a better time. Maybe this is writing some documentation - be it architectural, a glossary of domain-specific terms, updating/culling outdated docs. Maybe parts of the codebase could do with some re-organising or renaming to make it more self-evident. Perhaps the build/test process could do with some care...
I've done this several times, and after 3-4 hour sessions with "the one person who understands" (and a few hours writeup), we now have good documentation, better standards and multiple people including new joiners who understand. The guru also appreciates having more people to bounce ideas around with and no longer being overloaded with this work.
But as you do this, keep an eye out for assumptions that may have changed. A feature now obsolete requiring weird code. Out-of-date assumptions about the behaviour of computers or other systems. New language features that can simplify or improve code. Talk them over with the people who know the code, and maybe you'll be the one to delete that awful code everyone hated.
Also, take notes, not just about the code, but about its environment, release process, surrounding systems, use cases, and people. Knowing who to ask about a given issue is gold.
If there are post-mortems available, they can give a great insight into how the system works and fails. Design docs to a certain extent, too, but they can be misleading especially if they are not kept up to date.
Pair programming can be a very effective way of learning, too.
- Don't try to learn domain knowledge from the code. If you need to get familiar with the code of a JPEG decoder, learn how JPEG works first.
- Before reading code, make sure you can jump to definition and find references. The easiest and language agnostic way is to use ctags and grep.
- Start reading code from main() (or an API call in the case of a library) and then start to dig deeper. This way you will get a feeling for where the "important" code is.
One of my favorites is “how did you know to do that?”
For example, let's say I'm working on an ecommerce system and I'm trying to understand what happens when a buyer adds an item to their cart. I put logs / debugger breakpoints on the important steps of that operation. Next to each one, I add a comment that describes why this step is important. The important thing is to label the comments with incrementing numbers. Those numbers allow me to keep track of the order of execution.
Finally, commit this to a dummy branch and share it with your co-workers if both of you are new to this code base.
That's one of the first things I do when I jump into a new codebase. Pick something that interests you, and log the whole operation.
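Here is roughly what that numbered-step logging could look like, sketched in Python. The `add_to_cart` function is a made-up stand-in for whatever operation you are tracing, and the `step` helper and `trace_steps` list are hypothetical names, there only so the order of execution can be inspected afterwards.

```python
import logging

log = logging.getLogger("cart-trace")
trace_steps = []  # step numbers in the order they actually ran


def step(number, message):
    """Record and log one numbered step of the operation being traced."""
    trace_steps.append(number)
    log.debug("[%d] %s", number, message)


# Hypothetical add-to-cart flow standing in for the real code.
def add_to_cart(cart, stock, item):
    # [1] Entry point: every cart change from the UI goes through here.
    step(1, f"add_to_cart item={item}")
    if stock.get(item, 0) <= 0:
        # [2] Rejection path: nothing is reserved when stock is empty.
        step(2, "out of stock, rejecting")
        return False
    # [3] Stock is reserved before the cart changes, so two buyers
    #     cannot both get the last item.
    stock[item] -= 1
    step(3, "stock reserved")
    # [4] The cart mutation is what the buyer finally sees.
    cart.append(item)
    step(4, "cart updated")
    return True
```

After one successful call, `trace_steps` reads 1, 3, 4, which tells you the rejection branch at step 2 was never taken.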
Find large utilities that are not coupled with a particular part of the application, and put them under heavy testing. Do not fix any failing test before having completed the test suite.
If you have access to other developers that know the codebase, review the tests with them, and fix failing tests together.
Do the same for the frontend / interface code, but do not go for unit testing imho, go for visual regression testing.
At the end of the process you will have a very broad knowledge of the codebase, and you will have improved it at the same time.
Make sure you're able to navigate through the code efficiently. If your IDE supports jumping to declarations and usages, make sure you get it working.
Look at old commits to see how changes were added before, that will hint at how future changes should look.
Ask others for help if something doesn't make sense.
If it’s not already TypeScript, add a tsconfig.json, with allowJs: true, checkJs: true. You’ll probably need some tweaks from there. But merely adding the config is enough to kick VSCode or other language service providers into finding references and a lot more.
Apart from that:
- Keep notes as you uncover how things flow and interact.
- The best way to keep those notes is as part of your project’s documentation. Use JSDoc and TypeScript declaration files. Learn how these docs interact and aid navigation.
- @see is a particularly useful JSDoc tag!
- Speed up the build if it’s slow. No really, it’s going to help with discoverability in the codebase! Breaking flow to wait for a build is a sure way to lose track of your journey.
- Backfill tests before you touch implementation. Even if you suspect the tests might be redundant. Odds are they’re not, but even if they are, (1) that’s more documentation you can reference and (2) any other maintainers/contributors seeing redundancy can use that as a signal to shortcut your familiarization.
- Git blame is your friend. Some days it’s your best friend.
- Look in weird places! Sometimes that old version of a dependency is pinned for a reason. Sometimes it’s pinned to a fork, again for a reason.
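The "backfill tests before you touch implementation" point applies in any language. Here is a small sketch in Python (rather than JS, purely for illustration) of a test that pins down what the code does today. `legacy_slug` is a hypothetical stand-in for an existing function you have not changed yet.

```python
import unittest


# Stand-in for existing code you have not touched yet (hypothetical).
def legacy_slug(title):
    return "-".join(title.lower().split())


class LegacySlugCharacterization(unittest.TestCase):
    """Records what the code does today, not what it should do."""

    def test_spaces_become_hyphens(self):
        self.assertEqual(legacy_slug("Hello World"), "hello-world")

    def test_punctuation_is_kept(self):
        # Surprising, but this is current behaviour; ask before "fixing" it.
        self.assertEqual(legacy_slug("Hi, there!"), "hi,-there!")

    def test_extra_whitespace_collapses(self):
        self.assertEqual(legacy_slug("  a   b "), "a-b")
```

Run it with `python -m unittest` before and after your change. A test that starts failing tells you which existing behaviour you just altered.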
It is a fine line between asking for help coming up to speed on a codebase, and asking for people to hold your hands while you do your job. Your team will help show you where they draw that line (each team is different.) But be aware of it because I have seen people fired for landing on the wrong side of the line.
I'm not saying to be afraid to ask for help - I'm saying to be sure to focus that help on learning and understanding, not trivia.
FWIW, I'm fully on board with all the answers that say, in short, "Search"
Start by adding breakpoints for some key actions of the app. Then step through the flows with the debugger. First pass you can just step over functions to get the high-level idea. In subsequent passes, you can step into functions that seem important. Rinse and repeat until you understand those actions well. Then move on to other areas in the app.
This works because you can see the actual end-to-end execution flow (not always clear from reading code), inspect runtime data (impossible by reading code) and even change the runtime data (variables, DOM) to validate assumptions about how the code works.
This method has also worked for me when building features in a large codebase. Write a unit test first and then keep checking where the code breaks and fix those until your test succeeds. This is effectively TDD. Note that you might have to refactor the code for better design but it gets you started towards understanding the flow.
Find out the deployment process. Find out what to do if anything goes really, really wrong. Try to find patterns in the code. Just explore for a while and even try to create your own mental model of how things are structured.
When dealing with frontend code, my attitude remains the same, but instead of looking at the database, I'm typically looking at the API. The goal is to see how data flows through the app. That is certainly one way of understanding the code. It used to be easier with classic RESTful APIs; I find this approach a bit more difficult when using GraphQL.
Also, just to state the obvious, run the unit tests. If the code base lacks good test coverage, ask if you can start there, writing more tests -- that will give you a purpose and a structured way of diving into the code. I also suggest that you deliberately break things, and then see which tests fail -- if a test fails and you weren't expecting it to fail, then you just discovered a linkage in the code that you didn't think would be there.
Not sure when a bug was introduced? Write a test then bisect (if you can't automatically bisect frontend code you have other problems, but manual bisects work in a pinch).
Not sure why something was written the way it was? Hey, sometimes this stuff just evolves naturally. Try looking back through the blame history to see how different commits affected the code.
Not sure what needs touching to add a new feature? Try seeing if you can find a similar feature that someone else wrote in the past.
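On the bisect point: if it helps to see why bisecting is fast, here is the idea as a plain binary search in Python. This is only a sketch of the principle, not how git implements it. `first_bad_commit` and `is_bad` are hypothetical names, and `is_bad` stands for "check out that commit and run your test".

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an oldest-to-newest commit list for the first bad one.

    Assumes the first commit is good, the last is bad, and that once the
    bug appears it stays in every later commit.
    """
    low, high = 0, len(commits) - 1  # commits[low] good, commits[high] bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid  # bug is already present at mid
        else:
            low = mid  # bug was introduced after mid
    return commits[high]
```

Each check halves the range, so about a thousand commits take around ten test runs. In practice `git bisect run` with your test command does the checking out and bookkeeping for you.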
Then, the custom idioms and weirdness that develops in every project, often it's a set of patterns that the team just 'acquires' - often it's in every file, every function. It's like learning 'words' specific to the language they have created for themselves.
Then get someone to explain the build systems, tooling etc..
And then, just for the sake of it - make sure to build it. Make a change and build it again. I think there's a weird leap in subconscious confidence that comes along with 'building it'. Like riding a wild horse for the first time, if you can do it for one second, you can do it for longer.
Once you have your head wrapped around the system and the 'systematic things' (like idioms), and can navigate the tools ... then it's a matter of breaking things down into details. Which is where the work is, of course.
But you need a 'map' and a 'cart' and your 'hammer' before you can start waltzing around Campus thinking about fixing things.
Git grep (or ripgrep) to find usage - useful when refactoring and to see how data is used/accessed and where it's passed around. Also useful to note where certain data doesn't show up: you can infer some structure from this.
Looking at when something was last changed with git blame can be useful. Is something suddenly broken, but hasn't been changed in 5 years? Could give an indication of where not to look on a first pass.
Break some things (locally) on purpose. Get a feel for how errors bubble up through the application and how dependent code behaves when something is wrong.
Look over the last handful of PRs/merged patches. It can be helpful to see these smaller pieces of code, the changeset, and their associated context - whether it was a feature, a bugfix, and what the code was supposed to achieve.
Use existing code in the codebase as a styleguide. Most work on large codebases isn't groundbreaking or innovative, so you're likely to find existing code similar to what you're currently trying to achieve that you can use to guide you.
If possible, make use of code reviews with colleagues.
- Familiarizing yourself with vim is great and one of the best timesaving skills I decided to pick up randomly. You can install vim plugins for most IDEs
- Pick a small feature whose workings you are curious about, and focus entirely on how it was implemented. The simpler the better, really. Use git history to find the commit where the feature was introduced, and look at how that engineer implemented it. It's much easier for me to learn how something works by focusing on the Pull Request for the implementation. Just focus on how various things were implemented, and you'll (hopefully) see a pattern in code practices & design patterns.
- If you have questions, you should feel free to ask other engineers that work on the project. I was in a pretty senior position at my last job and we had quite a large codebase, and I personally never minded helping out junior and new engineers ever when they were coming up to speed on a new codebase. I always took the viewpoint that it's always in my best interests to help out the newer team members get up to speed quicker and have a better understanding of the product so they are able to effectively contribute. Hopefully engineers on the project you are working on think similarly.
- Absolutely learn how to perform proper debugging if you are not familiar already. Learn how to hook the browser to your IDE, how to troubleshoot things on the backend if you get to the point where you're doing fullstack work. Effective debugging is an incredibly important skill to pick up.
- Learn how to use IDE / other editor tooling such as jump to definition, refactor, inspect, find all occurrences. If this is a react project learn how to use browser extensions like devtools to help with debugging and understanding document structure.
A lot of people in this thread are recommending grep and ripgrep, but you do not have context with those tools, so it isn't as helpful, imo. With an IDE, you can find usages of symbols, trace connections, build graphs, etc.
If it's a different language, you could try looking around in the docs if anyone's generated callgraphs, or you could look up ways to do so.
To trace the code you could use a debugger -- e.g. if it's gdb just issue the command 'start' and then step through from the main function to see how things go. Or (assuming it's something like a C program) you could get an strace (example usage: strace -vvvttf -o strace.log ./program) and maybe get a feeling for the config/etc. files read or written to, network services accessed, etc.
It would help if you could tell us what kind of program it is, or what kind of programming language it's written in.
Other than that, there is not much that you would not do in other code bases as well. Set up your IDE properly so you have all the linters ready and can easily navigate the code base.
Ask your lead or mentor to show you the low-prio backlog tickets that have 1 or 2 story points and start there to get your feet wet. Once you have solved one, ask how your commit process works and follow it. Rinse and repeat three times, then pick a ticket from the Sprint filter and start contributing.
IMO it’s by far the fastest way to understand how an application bootstraps itself and produces its output.
Maybe between Christmas and New Year I might have some time to figure out what is wrong.
* If there's no architecture-document that gives a short (3-5) page overview on the layout of the code then start writing one.
* Skim through the folders / packages / classes and write down a few obvious questions that pop into your mind while "speedreading" the code.
* Try to make an improvement to see how the whole CI/CD process works.
* Look through the list of direct dependencies / libraries and see where they are used.
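For the last point, finding where each dependency is used can be scripted. Below is a rough sketch that assumes the codebase is Python and uses the standard `ast` module; for other languages you would need a different parser or plain text search. `imports_by_module` is a made-up name, and it takes a dict of filename to source text so it can be tried without touching the disk.

```python
import ast
from collections import defaultdict


def imports_by_module(sources):
    """Map each top-level imported package to the files that import it.

    `sources` is a dict of {filename: source code}.
    """
    usage = defaultdict(set)
    for filename, code in sources.items():
        for node in ast.walk(ast.parse(code, filename=filename)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    usage[alias.name.split(".")[0]].add(filename)
            elif isinstance(node, ast.ImportFrom):
                # Relative imports (level > 0) point inside the project.
                if node.module and node.level == 0:
                    usage[node.module.split(".")[0]].add(filename)
    return {name: sorted(files) for name, files in usage.items()}
```

A dependency that shows up in one file is easy to reason about. One that shows up everywhere is probably part of the architecture and worth reading about first.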
Personally, i've also had good experiences with Sourcetrail, for seeing how bits of code fit together within a codebase, although the development on it has ceased recently: https://www.sourcetrail.com/
Also, some are recommending text editors with plugins or specialized tools, but i'd also like to suggest just getting a really good IDE that's integrated with the tech stack that you use. Personally, JetBrains fills that niche for me: https://www.jetbrains.com/products/
Depending on the language and framework support, it can lead to an amazing development, refactoring and testing experience, although it has some drawbacks in comparison to text editors like Visual Studio Code: it uses more memory and CPU resources (especially when indexing the project, a tradeoff that most IDEs out there make in one way or another) and it is also a paid product. I just got the Ultimate package of all tools for my personal and work needs.
Apart from that, i'm not even sure - jumping around definitions in source code and seeing how different things are connected, what the dependency graphs are like and so on is nice, but understanding why things were built that way might require ADRs (https://adr.github.io/), which many companies still don't use, or trudging through issue management systems (like seeing what issue a piece of code was developed under, then reading the user story in Jira etc.). Of course, having READMEs and automated scripts for project setup or common actions, ideally versioned alongside the code, can also be really nice!
Personally, i think that we as an industry would benefit a lot from more focus on DX (developer experience), both in regards to tooling to explore codebases, as well as practices in regards to documentation for the actual devs to use and dogfood.
To see how it uses the backend, open the network tab in Chrome developer tools (or equiv in other browsers), look at what requests it makes, and then find corresponding controllers within the code.
Look at git commit messages and issues/PRs to see which parts of the codebase/process are involved in making which types of changes.
Any codebase, especially a large one, may be inconsistent due to legacy code and many other reasons.
Use an IDE + grep/ag.
grep -HIron 'your search pattern' *
Sometimes I'll add -i (case-insensitive), or drop the -o (for some context but beware minified files), etc.
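For anyone unsure what those flags do: -H prints the file name, -I skips binary files, -r recurses into directories, -o prints only the matched text, and -n prints line numbers. Here is a rough Python equivalent as a sketch. `grep_tree` is a made-up name, the binary check is cruder than grep's, and Python regex syntax differs from grep's default pattern syntax.

```python
import os
import re


def grep_tree(pattern, root, ignore_case=False):
    """Yield (path, line_number, matched_text) for every match under root."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for dirpath, _dirnames, filenames in os.walk(root):  # like -r
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file, skip it
            if b"\0" in data:
                continue  # crude binary check, roughly what -I is for
            text = data.decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):  # -n
                for match in regex.finditer(line):
                    yield path, number, match.group(0)  # -H and -o
```

Passing `ignore_case=True` plays the role of -i. To get the whole line for context, as when dropping -o, yield `line` instead of `match.group(0)`.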
Step 2: Document their knowledge