What are the ways you do it? Is there anything similar available for functional programming?
1. I go straight to the entry point, the main(), and then follow how the initial configuration, flow of data, sanitisation, and routing is done.
2. I look for bugs. Fixing bugs reveals the complexity as you need to look for side effects of the fixes when you don't yet know the system. Writing tests for those fixes also helps understand the system.
3. I look for the least changed part. I find these are usually the oldest and most core part of how the program works, whereas more recent changes are business logic and feature addition.
But of these, the first yields the greatest initial understanding and allows me to change things with less fear.
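For point 3, a minimal sketch of how one might rank files by how often they change. It assumes the project lives in a git repository; `count_changes`, `least_changed`, and `git_name_only_log` are hypothetical helper names. The input is the output of `git log --name-only --pretty=format:`, which prints only the touched paths for each commit.

```python
import subprocess
from collections import Counter


def count_changes(log_output: str) -> Counter:
    """Count how many commits touched each path in `git log --name-only` output."""
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def least_changed(counts: Counter, n: int = 10) -> list[tuple[str, int]]:
    """Return the n paths with the fewest commits, rarest first (ties sorted by path)."""
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:n]


def git_name_only_log(repo_dir: str = ".") -> str:
    """Ask git for the paths touched by every commit, one path per line."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `least_changed(count_changes(git_name_only_log()))` from the repository root gives the candidates. Deleted and renamed files show up under their old paths, and a low count can also just mean the file is new, so treat the list as a starting point rather than an answer.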
Get a good flame graph up, and you'll have a really solid visual representation of what's going on.
Bonus: on almost any project, nobody has done a profiling pass in at least a few months, so you'll probably discover some extremely easy performance improvements and you'll look like a goddamn hero when you speed up e.g. the test suite by a factor of 3 in your first week on the job.
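Flame graphs normally come from a dedicated profiler, but the underlying data is just time per function. As a rough stdlib-only starting point in Python, here is a sketch that collects that data with `cProfile` and reports the most expensive calls. `slow_step`, `run_workload`, and `profile_top` are made-up names standing in for real code.

```python
import cProfile
import io
import pstats


def slow_step(n: int) -> int:
    """Stand-in for an expensive function somewhere in the code base."""
    return sum(i * i for i in range(n))


def run_workload() -> int:
    """Stand-in for the entry point you want to understand."""
    return sum(slow_step(20_000) for _ in range(20))


def profile_top(func, limit: int = 10) -> str:
    """Profile func() and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

Printing `profile_top(run_workload)` shows which functions dominate the run; the same stats can be fed to a visualiser if you want the actual flame graph.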
I suggest:
1. Find a senior dev, ask them for pointers to existing good documentation so you can self-learn.
2. give that a go, make note of all the questions you have
3. then have a session with that dev for a platform walkthrough. Take lots of notes and ask your questions.
4. offer to update docs where you found errata or missing steps or even complete topics not mentioned
5. suggest to the team anything about onboarding that can be improved.
Hopefully there's an overview of the code base in an `ARCHITECTURE.md` file[1]. Read through it, along with the documentation and tests for the main modules it mentions.
If you assume their tests cover the important business logic / stuff "they want to keep" (ref. "Beyonce Rule"[2]), they should inform you about the most important stuff.
> [1] https://matklad.github.io/2021/02/06/ARCHITECTURE.md.html
> [2] https://www.oreilly.com/library/view/software-engineering-at...
Someone can explain a code path, what it should do, what the bug is and with that you can get familiar with a path through the application.
Instead, I get myself a couple specific tiny bugfixes/features to do first. Just finding out where those are, one by one, tells you a lot and may not be as simple as it sounds.
I was once hired to help with polishing a code base for imminent shipping. I fixed one bug. The fix was one line, but not trivial at all. Took me a whole week of reading code. The customer was ecstatic. There were like 12-15 years' worth of layers of code to read.
I'd start from the highest level abstraction of the code and work downwards until I reach a domain I'm either interested in or asked to work on and specialize on that vertical for a while (this can be anything from a few weeks to say 6 months). I then repeat the process on other verticals if needed/wanted.
So going from highest level of abstraction down to actual code:
1. read docs or converse with others around what the value proposition/s are of the product/service/app.
2. Understand the main use-cases or if not obvious, read product brochures or w/e you have in terms of "sales" material for end-users.
3. Try to map the main use-cases back to high-level architecture diagrams (if available).
4. After doing above steps if there are multiple domains I would pick one based on either personal interest or assigned work.
5. When starting with a business domain (meaning some high level grouping of code based on their business function), I tend to focus first on the design of the persistence layers, as it's usually less dense and less sprawling than other parts of a code base and can give you some idea of state management.
6. From here I generally start up the service/s or apps related to the domain I'm studying and try to play around with it, trying to tie previous steps together in my mind with what I'm observing with my interactions.
7. At this point I would generally have documented my findings (whatever means / form it is done) and ask for a session with someone that's familiar with this domain and ask their opinion of my documentation, making corrections where needed.
8. After this it's generally best in my opinion to just jump into work.
9. Personally I find doing support work fixing bugs for about 6 months gives you a very good lay of the land and people.
Jumping straight into feature work is not optimal in my experience as it's less likely to provide as wide an array of exposure as support.
This obviously only fits certain scenarios, but for your garden variety product/s this is how I'd go about understanding the code base.
Oh, commit history is also a very very rich source of info if there's an established culture of good commit messages.
Usually at least one of them stands out, so I at least read this through (usually diagonally).
I might also pick different things based on my goals.
Once I think I have a grasp of the high level aspects, I start pairing or validate with tiny feedback loops.
Update: I also create my own (naive) helicopter view diagram of the context and validate it with people on different levels.
Someone on my team has been giving the dev team demos of the functionality and thinking behind the product a few days a week. My one request at the beginning was that they should learn enough about the product to be able to give a demo back to us. It took them about 2-3 weeks (maybe eight 45-minute overview sessions from my team, which owns the product requirements), but it showed that they know what the tool is supposed to do.
They spent another 3 weeks “getting comfortable”, and 6 weeks from the start they finally felt comfortable enough to start implementing small features and bug fixes. I’d have preferred that they start fixing bugs right away (it might take 2-3 weeks to fix the first bug because they need to figure out how to get access to systems, documentation, deployment, etc.) because it’s more tangible, but I know I’m impatient and let them do it this way. It seems to work OK so far, but it will be another month or so before I can decide whether or not they are actually competent. I guess the good news for them is that they (a team of 10 in Eastern Europe) aren’t being bugged by the client and, if they actually are good, should be enjoying the freedom to do things their way and implement their own processes.
Projects, especially messy ones, often behave like lava flows where there is an active and ever expanding edge where changes are currently being made. Beneath this are layer upon layer of nearly impenetrable and often implicitly deprecated code from former developers.
This practice came from a time when I was brought in midway through a rewrite to get rid of unmaintainable code from some offshore contractors. I saw a repository where half the code lacked any organizing principles and had massive security issues. The second half was textbook (pedantic even) OOP, the kind taught in Java textbooks. It was beautifully executed except for using a few outdated tricks to do OOP in early versions of PHP (no longer needed in the version used for this project).
Because I didn't look at the dates, I assumed the neat OOP code was the result of the rewrite. I was wrong.
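A minimal sketch of checking the dates before deciding which half of a repository is the "new" one. It assumes git and parses the output of `git log --name-only --date=short --pretty=format:@%ad`; the `@` prefix is an arbitrary marker chosen here to tell date lines from path lines, and `last_touched` and `git_dated_log` are hypothetical helper names.

```python
import subprocess


def last_touched(log_output: str) -> dict[str, str]:
    """Map each path to the date of the most recent commit that touched it."""
    latest: dict[str, str] = {}
    current_date = ""
    for raw in log_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@"):
            current_date = line[1:]       # a commit header: remember its date
        elif line not in latest:
            latest[line] = current_date   # log is newest first, so keep the first date seen
    return latest


def git_dated_log(repo_dir: str = ".") -> str:
    """Ask git for each commit's author date followed by the paths it touched."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--date=short", "--pretty=format:@%ad"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Sorting the result of `last_touched(git_dated_log())` by date shows where the active edge of the lava flow is and which directories nobody has touched in years.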
Sort of unrelated, but I've got a story about a project I was looking through that confused the hell out of me. It was a C# library that would allow you to render an element from a shockwave flash file (it was either .swf or .fla).
I spent ages digging through the code. The example worked really well, but I couldn't get it to work with one of my files.
Eventually I contacted the author and he told me the library used reflection to get the name of your variable and would look for that variable name in the flash document.
0) Read the code base docs (or README).
1) Pair with someone with knowledge of the code base or ask them to walk you through the code base.
2) Identify the public interface to interact with the app/api. How do consumers use the software? Play around with the app or api to get a sense of how things link up.
3) Identify various tools used in the code base (db, messaging, external api, etc). Now you know each tool is set up somewhere and used in one or more places.
4) Identify the patterns and conventions used (CQRS, mediator, dependency injection, middleware, pipelines, logging, etc). Now map the flow of each public interface using this knowledge.
Second - I learn how different components are wired together.
Try actually using the program as an end user would.
Read error messages, read code, make predictions about what the code does, find out if your predictions are true.
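One way to make the "predict, then check" loop concrete is to write each prediction down as a small test. A sketch, assuming Python and `unittest`; `shipping_cost` is an invented stand-in for some undocumented function you are trying to understand.

```python
import unittest


def shipping_cost(weight_kg: float, express: bool = False) -> float:
    """Hypothetical legacy function whose rules are not documented anywhere."""
    cost = 5.0 + 1.5 * weight_kg
    if express:
        cost *= 2
    if weight_kg > 20:
        cost += 10.0
    return round(cost, 2)


class ShippingCostPredictions(unittest.TestCase):
    def test_light_parcel(self):
        # Prediction from reading the code: base fee plus a per-kg rate.
        self.assertEqual(shipping_cost(2), 8.0)

    def test_express_doubles_before_surcharge(self):
        # Prediction: the heavy-parcel surcharge is added after doubling, not before.
        self.assertEqual(shipping_cost(30, express=True), 110.0)
```

Run it with `python -m unittest`. A failing test here is not a bug in the code base; it means the prediction was wrong, which is exactly the thing you wanted to find out.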
Another strategy I like is picking parts of the codebase and trying to refactor them. You don't even need to commit anything if you're not supposed to go around changing things: just spending some time moving things around and seeing what breaks will give you a better understanding of the code and what it does.
1: https://www.scitools.com/spaghetti-code
2: https://blog.ndepend.com/visualize-code-with-software-architecture-diagrams/
3: https://www.sourceinsight.com/#call-graphs
I find the only way for me is to actually run the code locally, play around with it until I understand the data flow.
I recently did this on a huge, very specific codebase, and after 2-3 months I understood it (what to add where, not just what's happening where) relatively well.
Reading the codebase does surprisingly little for me; you essentially have to change it and see what happens.
Also, attempting to draw a sequence diagram and fixing it as you go trains the brain to handle a mental model of a large process.
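If the code base is Python, a rough first draft of that sequence diagram can come from the running program. The sketch below uses `sys.setprofile` to record caller/callee pairs for one call. `record_calls` is a hypothetical helper, and `handle_order`, `validate`, and `save` are an invented flow used only to exercise it.

```python
import sys


def record_calls(func, *args, **kwargs):
    """Run func and return (caller, callee) name pairs for each Python-level call."""
    calls = []

    def on_event(frame, event, arg):
        if event == "call":  # skip returns and C-function events
            caller = frame.f_back.f_code.co_name if frame.f_back else "<top>"
            calls.append((caller, frame.f_code.co_name))

    sys.setprofile(on_event)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always switch the hook off again
    return calls


# Invented request flow, only here to demonstrate the tracer.
def validate(order):
    return bool(order)


def save(order):
    return "saved"


def handle_order(order):
    if validate(order):
        return save(order)
```

Each pair returned by `record_calls(handle_order, {"id": 1})` is one arrow in the diagram. On a real system the list gets long quickly, so filtering by module name is usually the next step.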
1. Fix a couple of bugs
2. Add a small feature
3. Refactor a small piece of it
Always start with a few small things and keep increasing the complexity of the things you do. Working on a codebase is the best way to understand it.
Also try to read tests, because they can show a lot about how the components are used and their properties.
Given this background, “What are the ways you go about getting comfortable with a new codebase?” Here is some of what I feel/think/do:
1. In these companies, it is “understood” that a new hire will take about a year+ before they know enough to be useful. These systems took people’s entire careers to learn about and build. It is “understood” that a new hire will not learn the system “overnight”.
2. Even knowing that it is “impossible” to know the system quickly, I have a personal desire to contribute to the company (e.g. be of use) as quickly as possible. It is very important to have reasonable personal expectations. Chill out and allow yourself time to learn. You can’t avoid paying the TIME cost of learning a complex system. You can’t skip the Time/Exposure cost required to learn a complex system. If you do skip the TIME/Exposure cost, you are fooling yourself into thinking you “know” something you do not. It will show.
3. Get access. You need access to the code base repository(s). Get access to the database(s). Get usernames and passwords (if allowed in your company). Get login credentials to your applications. Get access to the documentation store(s), ticket tracking system(s), applicable laws and regulation(s) locations, etc.
4. Get a development environment running. Checkout the code. Get it running on your local machine. I want to be able to run in debug mode where possible.
5. Define what it means for you to feel like you know the system. For me I feel like I “know” the system when: a. all the names and acronyms are familiar (every company and system has its own “language”); b. I can look at a random page in the system and know what database tables are referenced when the page loads, and what database table(s) are changed when the UI buttons on the page are pressed; c. I have a mental model of the data and business validation that will occur when a given UI button is pressed; d. I am comfortable enough in the language(s) and technology stack(s) to make changes, review code, and deploy code. This standard is pretty easy for me to “say” but takes YEARS of hard work to achieve.
6. Commit to keeping a log of your daily activities. Writing what you are doing, seeing and learning can act as a form of team coding with yourself. As you write, you are forcing yourself to “teach” and “explain” what you are doing and why. This activity helps me internalize the system quicker. This can help a lot if you are in a company that requires you change problem contexts a lot during the day or week. Also take LOTS of screen shots in your log (using Snagit or ShareX, etc).
7. Commit to creating system documentation. This can be in markdown, confluence, word, etc. It can be in a git repository or file system. It must support screenshots and page links to related pages. I tend to do UML diagrams in Drawio or Visio (exported as images) but if your tool supports UML all the better. I like to organize my system documentation by UI Page. For every major UI Page, I have a system documentation page of the same name. It may have many child pages (depending on how complex the page is). This simple structure means that I can quickly find the documentation that I have 6 months later when working on the same page again. Having a documentation space to put documentation makes it easier for me to create more documentation.
8. Depending on the company, you may ask to job shadow another developer for a period of time. This can mean spending an hour a day or 8 hours a day with them in a team coding environment. Ask another developer to do this. If they say yes, you can learn a lot more quickly about the system.
9. Depending on your company you may have leeway in how you use your time. You may be given issue tickets on day one or you may be sheltered from the storm of requests and given time to learn for a month to a year+.
10. If you are given leeway to research and learn (which is rare), one of the best ways for me to learn is to try to build a mimic system. I look at the existing system and try to build my own system to do what it does. If I can produce a screen that loads data that looks like the original system then I have a high degree of confidence that I “know” how it works. If I can produce a button that saves data (and does all of the complex validation, database changes, etc) that the original system’s save button does, then I have a much higher degree of confidence that I “know” how it works. As I mimic the system, I attempt to document what I am learning about the original system. These notes can become invaluable later (when in a time crunch to fix some critical issue). Building a system mimic makes it much harder to “fool yourself” into thinking you know what is going on. Let the compiler and screen comparison be a mentor. They can be brutally honest but effective teachers.
11. If you are not given such leeway to research and learn (which is common – people pay you to produce), you can learn a lot by reviewing prior completed issue tracking tickets. You can see the most common topics and rough patches in the system (for that time period in the year – problem hotspots change over the year(s)). The existing system documentation (if any exists) may be very helpful. If there are Unit Tests, these can be very valuable learning aids as well. As you work issue tracking tickets you will be forced to learn details about that specific area of the code. It may take longer to get a “holistic” view of the system this way (as opposed to building a mimic system) but it will eventually get you to a similar level of understanding, via a different route.
12. Some of these companies’ architectures have and support Unit Tests. Count yourself lucky. Make use of them and add to them. Other companies’ architectures do not (or make it really hard).
13. Spend as much time as you can playing with the Application UI. Pretend to be an end-user, loading pages, entering data, clicking through the process and saving. You will see a lot of validation errors. You can learn a lot about the workflow and pain points in the workflow. You can learn a lot about how the front end affects the database. Systems tend to get better if you (as a developer) “eat your own dogfood” that you make your end users eat. Be in and use the application(s) on a daily basis. This will help you learn the company language and help you communicate with your users. It will also help you see why users are having trouble with a process and perhaps see ways to make the workflow better. It will also expose bugs and give you an opportunity to fix those bugs and learn more about that particular part of the system.
14. One of the most useful things I can do to learn a new system is to build “UNDO” features. I am not certain why this isn't more 'popular'. It feels like it grants SUPER POWERS to me. If I am looking at an Application UI button that does a complex Batch process, the best way to “know” that I know what it does is to build a script (and another button) to undo what that Batch button did. Many of the systems I have worked with do not have an UNDO button for complex processes. This means that everyone is scared to run them because they do so much. This means they do not get run or tested as often as they should. This builds up additional fear of clicking the button. If I can create an UNDO script, I will have to document everything it does. I have to “Know” every database record that it changes. Once I have the UNDO script, it becomes easy to run the button to do the complex batch process and it becomes easy to get back to a virgin state again. The easier it is to run, the more roundtrip testing you can do and the more likely it is for you to make the code better. This all can greatly reduce fears of that process in the future and can greatly speed up supporting the feature in the future.
I am sure more could be added to my list. Of course every company and every position is different. You'll have to figure out what makes sense in your situation.
Hopefully it gives you some ideas. Good luck learning your system(s).
- Check out the source code and look at how the codebase is laid out, i.e., the directory names. This often gives me an understanding of what is vendor code or third-party dependencies, test, and 'core' source code.
- How to build the product, and what artifacts are produced. This will tell you a lot about where to start looking.
- Take a small bug where further information is requested to help triage/solve the bug. This could be reproducing the issue by clicking around or performing a workflow by issuing commands, or rooting through logs and database entries to ascertain the state of the system. This is often an instructive exercise because it requires that you understand how the system is deployed, which is always a good thing to know, and how to troubleshoot an area that you know little about, which you will often be called upon to do. Reading the logs gives you a pretty good idea of the system initialization sequence, particularly in a cloud product. Ditto with looking at the core metrics or example traces of the system. Traces, if available, instantly lay bare a 30000 ft view of the system before you.
- For web apps, the router file. There is usually a file that contains various route definitions and the entrypoints to them. This is a great start for figuring out what links to what. Something basic like a simple GET of a collection or a health check is a great way to get your feet wet.
- For web apps and others that use a database, the database schema. Often I just run a COUNT(*) on each table and look at the schema of the tables that contain the largest number of entries.
- Unit Tests. For a particular functional area, these are an excellent aid to understanding what the expectations are and how they are tested. I also write unit tests for areas that do not have them as a way to get familiar with the codebase.
- Results of Smoke Test and Integration Test runs. These often give you a bigger picture idea of what the system does in relation to others that surround it, and the major 'compartments' of the system as it were.
- Fixing bugs of varying complexity. This is an excellent way to instantly get familiar with building, testing, code reviewing, and putting your change through standard precommit testing. The change itself is incidental; the things that you learn during this process will help you get productive quickly.
- Writing small features that are very self-contained and interact with one or two areas of the system. This helps you understand a few system areas inside-out, and you can slowly grow your understanding of the system by doing features that touch newer areas that you'd like to know.
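The COUNT(*) trick from the database-schema bullet above can be scripted. This sketch assumes SQLite, where table names are listed in `sqlite_master`; other databases expose the same information through their own catalog tables. `table_row_counts` is a hypothetical helper name.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (table, row_count) pairs, largest table first."""
    names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        if not row[0].startswith("sqlite_")  # skip SQLite's internal tables
    ]
    counts = []
    for name in names:
        # Identifiers cannot be bound as parameters, so quote the name instead.
        quoted = '"' + name.replace('"', '""') + '"'
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        counts.append((name, row_count))
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

The tables at the top of the list are usually the ones worth reading the schema for first.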