I never know where to start looking. Specifically, if it's a language I am not too familiar with, I have no idea where to start, and sometimes I have no idea where the program execution starts.
Is there maybe a ladder of small to large open source projects that can get you there?
For example, I have no idea how to begin reading the Flask (1) open source code.
What approach can I take to get to a point where I can analyze a project like Flask and get something meaningful from it?
(1) https://github.com/pallets/flask
When I was contracting, every few months I was looking at a new-to-me code base. First - if you don't know the language the code was written in, then you need to learn that at some basic level. Next, if there was some framework or set of libraries involved, see how many of those you can identify and how many you might already know. Start with trying to run through a build, as you'll either have success or it will throw errors and that will teach you something about both the health and the components of the project.
Once you actually get into the codebase, look for patterns. Often there will be pretty clear layers or types of components, identifiable either by name or organizational structure. Sometimes you'll see patterns that tell you "Oh, George must have written all this code, and Steve must have written all that code" because of personal idiosyncrasies. That's the worst case, but sometimes it's what you have to go on.
I spent a lot of years in application triage, where I was called in to fix serious problems in apps with which I had no experience. Like today's Apple outage - I used to parachute into situations like that and had to get things running, fix, post-mortem, etc. There are great tools out there that you can inject into running code to watch real / test users execute it. That's a great way to very quickly learn how everything is assembled, i.e. why did that piece of code get executed there, or why didn't that piece of code get executed.
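To give a feel for what those tools do, here is a toy version in Python using the standard library's `sys.settrace`. It is only a sketch: real tracing/APM tools do far more, and `load_user` and `handle_request` are hypothetical stand-ins for code in the project you are exploring.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of the Python functions called, in order)."""
    calls = []

    def tracer(frame, event, arg):
        # The global trace function receives a "call" event for every new frame.
        if event == "call":
            # frame.f_code.co_filename and frame.f_lineno are also available here.
            calls.append(frame.f_code.co_name)
        return None  # no line-by-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return result, calls


# Hypothetical stand-ins for unfamiliar application code.
def load_user(user_id):
    return {"id": user_id, "name": f"user-{user_id}"}


def handle_request(user_id):
    user = load_user(user_id)
    return f"Hello, {user['name']}"
```

Calling `trace_calls(handle_request, 7)` returns the normal result plus the list of functions that actually ran, which answers the "why did that piece of code get executed" question for one code path at a time.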
- It just takes a long time: I find I'll spend a good 1.5-2 years at a company before I can finally say with confidence I have a decent grasp on the entire codebase;
- I try to focus on making small changes/bugfixes to specific components. Do that enough and you start to see how things fit together;
- Running the code with lots of logging sometimes helps;
- Finding some sort of high-level architecture diagram or documentation works wonders. Usually in the industry there's one dude(ine) who's been there for a while to set you on the right path, not sure about FLOSS...
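On the logging point, a minimal sketch in Python of what "lots of logging" can look like without editing every function body: a decorator that logs each call and its return value. The logger name `exploration` and the function `apply_discount` are made up for illustration.

```python
import functools
import logging

logger = logging.getLogger("exploration")  # hypothetical logger name


def log_calls(func):
    """Log arguments and return value every time func is called."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("-> %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("<- %s returned %r", func.__qualname__, result)
        return result

    return wrapper


@log_calls
def apply_discount(price, percent):
    # Hypothetical stand-in for a function in the codebase you are exploring.
    return price * (100 - percent) / 100
```

With `logging.basicConfig(level=logging.DEBUG)` set once at startup, every decorated function announces itself as it runs, and you can remove the decorators again once the flow is clear.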
Using blame or navigating closed issues can do a lot to put specific problems into plain English to help with this kind of inward-out exploration (which can also help with understanding the use cases of the application). Same with reading any tests that might be there. And if there aren't tests, writing them is a great way to get into understanding specific functionality.
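A small sketch of what such an exploratory test can look like in Python with `unittest`. The idea is to pin down what the code does today, not what you think it should do. `normalize_path` is a hypothetical function standing in for something you found in the project.

```python
import unittest


def normalize_path(path):
    # Hypothetical stand-in for existing project code you want to understand.
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join(parts)


class NormalizePathCharacterization(unittest.TestCase):
    """Each test records one observed behaviour of the existing code."""

    def test_collapses_repeated_slashes(self):
        self.assertEqual(normalize_path("//users//42/"), "/users/42")

    def test_empty_path_becomes_root(self):
        self.assertEqual(normalize_path(""), "/")
```

Run it with `python -m unittest` from the directory containing the file. When a guess about the behaviour turns out wrong, the failure message tells you what the code really does, which is exactly the information you were after.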
2. Ask the developers.
3. Play with the software and trace through the source code. For Flask, make a simple application and grep for the definitions of the functions/methods/classes you’re referencing. While doing this, you will find references to other Flask functions. Find and read those definitions ad infinitum and eventually it’ll “click”.
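As an alternative to grep, Python can tell you where something is defined through the standard `inspect` module. A rough sketch, using the stdlib `json` package as a stand-in so it runs without installing anything; `where_defined` is a helper name I made up.

```python
import inspect
import json  # stand-in for the package you are exploring


def where_defined(obj):
    """Return (source file, first line number) for a function, method or class."""
    source_file = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return source_file, first_line


if __name__ == "__main__":
    for thing in (json.dumps, json.JSONDecoder):
        path, line = where_defined(thing)
        print(f"{thing.__name__}: {path}:{line}")
```

Assuming Flask is installed, the same helper should work on what your simple application references, e.g. `where_defined(flask.Flask)` after `import flask`. It only works for objects written in Python; things implemented in C have no source file to point at.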
I hope, however, that if you are going to analyse a codebase in a language you are not familiar with, one of two things applies:
1. There is a part of the codebase that uses some language you are very familiar with, or that touches a specific knowledge domain (like frontend development or database querying) that you are very familiar with. If so, start at the point of greatest familiarity, try to walk backwards, and document for yourself the bits of code that touch the stuff you do know well.
2. You are a secondary developer to someone who is an expert in the language. Hopefully they can give you a tour of the codebase.
If neither of these is the case, I guess you are going to have to learn the language and get a beginner's guide to Flask or something like that. Maybe ask on a forum for the language what the best learning sources are.
As far as reading and understanding the code, I start with the entry point and make my way through, building a mental map. Once I understand how the app structure is set up and some of the conventions, I dive into specific features and try to make changes.
Additionally, I like to diagram things as well, just for my own mental model. Depending on the tech stack, there could be libs out there that diagram the app in a navigable way, which is also really helpful.
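If no diagramming lib is at hand, a crude import map is easy to build yourself. Here is a sketch for a Python project using only the standard library (`ast` and `pathlib`). The function names are my own, and the output is a plain dict rather than a picture, but it is usually enough to spot the entry point and the layers.

```python
import ast
from pathlib import Path


def imported_modules(source):
    """Return the sorted module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Keep leading dots so relative imports stay recognisable.
            # A bare "from . import x" has no module name and is skipped.
            found.add("." * node.level + node.module)
    return sorted(found)


def import_graph(package_dir):
    """Map each .py file under package_dir to the modules it imports."""
    graph = {}
    for path in sorted(Path(package_dir).rglob("*.py")):
        relative_name = str(path.relative_to(package_dir))
        graph[relative_name] = imported_modules(path.read_text(encoding="utf-8"))
    return graph
```

Files that nothing else imports are good candidates for entry points, and files imported from everywhere tend to hold the core abstractions worth reading first.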