How do you learn a massive codebase?

Question

Kernels, browsers, schedulers, and similar projects easily reach 100k to 1M+ lines of code. What strategies do you use to understand the high-level architecture and start contributing in a meaningful way? What tools do you use? What is your thought process? How do you get up to speed with developers who have been working on the project for years, if not decades, so that you can become an equal contributor?

iamstupidsimple · Accepted Answer

Figure out the core data structures, and do this at different levels of abstraction. I'm currently reading through bits of the TensorFlow codebase, and TF's GraphDef[0] was a good place to start. It's high-level enough that I can understand how TF models are structured on disk, and I can follow the deeper threads (e.g. how does device acceleration work) if needed.

[0] https://github.com/tensorflow/tensorflow/blob/master/tensorf...
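
One rough way to get a first guess at which data structures are "core" is to count how often each class is referenced across the codebase. The sketch below is only an illustration of that idea for Python sources, using the standard `ast` module; it is not taken from TensorFlow (whose GraphDef is a protobuf definition), and `rank_classes` and the `SAMPLE` files are hypothetical names made up for the example.

```python
import ast
from collections import Counter


def rank_classes(sources):
    """Rank class names by how often they are referenced across sources.

    `sources` maps a file name to its Python source text (hypothetical input).
    """
    defined = set()
    trees = []
    for name, text in sources.items():
        tree = ast.parse(text, filename=name)
        trees.append(tree)
        # First pass: collect every class defined anywhere in the sources.
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                defined.add(node.name)

    refs = Counter()
    for tree in trees:
        for node in ast.walk(tree):
            # ast.Name covers bare uses such as `Graph()` or `n: Node`.
            if isinstance(node, ast.Name) and node.id in defined:
                refs[node.id] += 1
            # ast.Attribute covers qualified uses such as `graph.Node`.
            elif isinstance(node, ast.Attribute) and node.attr in defined:
                refs[node.attr] += 1
    return refs.most_common()


# Tiny made-up codebase, just to show the shape of the result.
SAMPLE = {
    "graph.py": (
        "class Node:\n    pass\n\n"
        "class Graph:\n    def add(self, n: Node):\n        pass\n"
    ),
    "run.py": (
        "import graph\n\n"
        "def build():\n"
        "    g = graph.Graph()\n"
        "    g.add(graph.Node())\n"
        "    g.add(graph.Node())\n"
        "    return g\n"
    ),
}

ranking = rank_classes(SAMPLE)
for class_name, count in ranking:
    print(f"{count:4d}  {class_name}")
```

The matching is by name only, so two unrelated classes with the same name are counted together. Treat the ranking as a list of places to start reading, not as an answer.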

tyroh · Answer

1. If the original authors are still around, we do a knowledge transfer.
2. Create diagrams of the relationships between all the systems involved: UML diagrams, ERDs, and so on.
3. Clone the system locally and trace how the data flows from one point to the next. I'd start with login, since we all know how it should work, so it's easy to follow.
4. Expect lots of trial and error when contributing new features. If the project is well made, tests will catch your bugs. If you have a good team, they'll cut you some slack, since you're new and they were new once, too.
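
For the tracing step, one option in a Python project is to record which functions run during a single request. Below is a minimal sketch using the standard `sys.settrace` hook. The login flow (`login`, `find_user`, `check_password`, `USERS`) is a made-up stand-in for whatever entry point the real system has, and `record_calls` is a hypothetical helper, not part of any library.

```python
import sys


def record_calls(func, *args, **kwargs):
    """Run func and return (result, [(depth, function_name), ...])."""
    calls = []
    depth = 0

    def tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            # A Python-level function was entered: log it at the current depth.
            calls.append((depth, frame.f_code.co_name))
            depth += 1
        elif event == "return":
            depth -= 1
        # Returning the tracer keeps tracing inside the new frame.
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever trace function was active before.
        sys.settrace(previous)
    return result, calls


# Hypothetical login flow standing in for the real system under study.
USERS = {"alice": "s3cret"}


def find_user(name):
    return USERS.get(name)


def check_password(stored, given):
    return stored == given


def login(name, password):
    stored = find_user(name)
    if stored is None:
        return False
    return check_password(stored, password)


ok, calls = record_calls(login, "alice", "s3cret")
for level, name in calls:
    print("  " * level + name)
```

Only Python-level functions show up; calls into C extensions or built-ins are not reported by `sys.settrace`. On a real codebase the output gets long quickly, so filtering on `frame.f_code.co_filename` to keep only the project's own files is usually worth adding.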

karmakaze · Answer

Go deep. Find something interesting, pull on that thread, and keep going until you figure something out. Whether or not you get that far, start changing things, see how the system (mis)behaves, and iterate.

db48x · Answer

The same way you eat an elephant: one bite at a time.

goy · Answer

Check this: https://mitchellh.com/writing/contributing-to-complex-projec...

Flankk · Answer

Talk to the people who understand it well. Unless the project is extremely well documented, that is the only sane method. If you can get a class diagram, it will help.
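
If nobody has a class diagram on hand, a rough one can sometimes be generated. As a sketch, assuming the code is Python (3.9+ for `ast.unparse`), the following emits Graphviz DOT text with one edge per subclass/base pair. `inheritance_dot` and the `SAMPLE` classes are hypothetical names for illustration.

```python
import ast


def inheritance_dot(source):
    """Return Graphviz DOT text with one edge per (subclass -> base) pair."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                # ast.unparse turns the base expression back into text,
                # so both `Base` and `pkg.Base` are handled.
                edges.append((node.name, ast.unparse(base)))
    lines = ["digraph classes {"]
    lines += [f'    "{child}" -> "{parent}";' for child, parent in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)


# Made-up classes, just to show the output format.
SAMPLE = (
    "class Task:\n    pass\n\n"
    "class Job(Task):\n    pass\n\n"
    "class CronJob(Job):\n    pass\n"
)

dot = inheritance_dot(SAMPLE)
print(dot)
```

The resulting text can be rendered with Graphviz's `dot` tool. It only shows inheritance, not composition or call relationships, so it complements talking to people rather than replacing it.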