Do you have favorite repos that highlight this?
I have an irrational fear of unknown codebases since it feels like most of the code is either boilerplate or tied to some framework.
Do you have tips and tricks you use to read codebases?
The reason it always impresses me is that C can look like gobbledygook, yet this codebase is clean and understandable.
If so, then here's distributed consensus in Zig:
https://github.com/coilhq/tigerbeetle/blob/main/src/vsr/repl...
Something that differentiates this from many consensus implementations is that there's no boilerplate networking/multithreading code leaking through, it's all message passing, so that it can be deterministically fuzz tested.
I learned so much, and had so much fun writing this, that I also hope it's an enjoyable read—or please let me know what can be improved!
Here's a window manager (dwm) and its docs and build system in 13 files and only about 3000 lines of code.
https://git.suckless.org/dwm/files.html
And sbase, a sort of "busybox-like" set of common *NIX base utils written to be small and portable. Some of the commands are just a few dozen lines.
It's funny because I remember comparing it to mine that I had tried to write during college, and appreciating how much better it is.
Pay attention to how there's a bunch of different types of chess in there too, and how that's factored.
e.g. "assert_equal" is really just "expected == actual" at its core, but it both uses a block param (a kind of closure) for composing a default message and calls "diff", which is a dumb wrapper around the system "diff" utility (horrors!). There is even some evolved nastiness in there for an API change that uses the existing assert/refute logic to raise an informative message. This is handled with a simple if and not some sort of complex hard-to-follow factory pattern or dependency injection misuse.
https://github.com/seattlerb/minitest/blob/master/lib/minite...
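To make the shape of that concrete, here is a rough sketch in Python rather than Ruby. It is not minitest's code: the names `assert_equal` and `default_message` are only illustrative, a callable stands in for the Ruby block param, and the standard-library `difflib` stands in for shelling out to the system "diff" utility.

```python
import difflib


def default_message(expected, actual):
    """Build a unified diff of the two values' reprs (hypothetical helper)."""
    diff = difflib.unified_diff(
        repr(expected).splitlines(),
        repr(actual).splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    )
    return "\n".join(diff)


def assert_equal(expected, actual, message=None):
    """Core check is just `expected == actual`.

    `message` may be a string or a zero-argument callable. The callable is
    only invoked on failure, so composing the message costs nothing when
    the assertion passes.
    """
    if expected == actual:
        return True
    if callable(message):
        detail = message()
    elif message is not None:
        detail = message
    else:
        detail = default_message(expected, actual)
    raise AssertionError(detail)
```

The point is the same as in the original: a plain comparison, a lazily built message, and a simple if/elif instead of a factory or injected collaborator.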
The best way to level up is to code. Reading code can be a complementary activity that can bring insights but it's not a way to level up. Active > passive.
> Do you have favorite repos that highlight this?
For what language? Desktop, mobile? Systems programming or web development? Linux/BSD/etc all have source code available. I believe Microsoft has open sourced the .NET Framework or parts of it.
It's like you are learning a foreign language and want us to recommend good books? Can't really help you if you don't tell us the foreign language and your goals for the language ( casual conversation, business, translation, etc ).
I don't know boilerplate-heavy systems like Rails or Django too well. But I just wouldn't suggest starting with reading web app code (though maybe I've ignored reading too much web app code over time).
The easiest code to start thinking about is libraries and things you use today already like the nginx code base or the CPython code base or your logging library or your web server library code.
In these cases maybe you download the repo, build it, see how you could make a small tweak and run it. And soon you're looking through its code to understand how it works.
Another, maybe easier, technique to start reading more: when you are programming and hit an error in a 3rd party library, use grep to find that error in the library's code and just start poking around. Maybe add some print statements to it so you can see more of what goes wrong. Try to solve the problem just by looking at the code and modifying it instead of using Google.
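If grep isn't handy, the same search is a few lines of Python. This is only a sketch: `find_error_sources` is a made-up name, and it assumes the library is plain source files on disk and that the error text appears literally in the code rather than being assembled at runtime.

```python
from pathlib import Path


def find_error_sources(root, needle, suffixes=(".py",)):
    """Return (path, line number, line) for every source line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For an installed Python package, the directory to pass as `root` is usually the parent of the module's `__file__`. Each hit is a place to put a print statement or a breakpoint.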
If you ever get into it I'd love to hear from you. Email is on my site and Discord is in my HN profile.
It feels like it has more comments than code. The comments are written in very nice, understandable language that even actively teaches about concepts that are only adjacent to the code at hand.
E.g. https://github.com/grbl/grbl/blob/master/grbl/stepper.c#L142 or https://github.com/grbl/grbl/blob/master/grbl/stepper.c#L233
I'd suggest this codebase as an excellent lesson in how bloat and complexity enter into the picture over time - I wish the actual commit history was available, but unfortunately the open source release was just a snapshot in time.
Often this will be along the lines of "How does it do X?" - where X is something I either didn't know was possible or that I suspect to be really difficult.
Then I can dive in to the codebase (usually starting with GitHub code search) and try to figure out how they do it.
This helps me skip straight past the boilerplate and means I often get to a satisfying conclusion - where I've learned something new - in a very small amount of time.
And along the way I pick up knowledge about how their code is organized and often a few other tricks too.
Both languages are extremely readable, even when looking at unfamiliar code.
The Zig standard library is small, yet covers a lot of common tools and structures. Every file contains implementations of one particular thing, so you can casually browse random files and understand what's going on without having to understand the entire context.
It is easy to read and has taught me some neat Python-isms.
I find a lot of code fairly alienating to read. Lots of codebases require you to get into the "mindset" of the person who wrote the code: their idioms, assumptions, patterns they lean on, etc. So unless you've got the time to get deep into it, the insights you can draw from reading it are minimal.
Ramda, by comparison, is just a library of utility functions, and all of those utilities perform very simple operations: merging, plucking, appending, equality checking, etc.
There's a lot of intention in the Ramda API as well. All functions are "data last," meaning that the actual piece of data you're operating on is the final argument to every function. This enables you to write Ramda code that is very structurally consistent: function parameters first, data last, every time.
It gives me a sense of empowerment, reading the code. It's like "This doesn't have to be rocket science. If you just start from these basic operations, and write those basic operations with a simple but strict ideology of 'data last' every time, and stick them together like lego blocks using compose, then you can achieve some very cool stuff with very little code."
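Here is roughly what that looks like, sketched in Python instead of JavaScript. None of this is Ramda's code: the helpers borrow Ramda's names (`pluck`, `append`, `compose`) but are hand-curried stand-ins, and `filter_by` and `long_names` are made up for the example.

```python
from functools import reduce


def compose(*funcs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return lambda value: reduce(lambda acc, f: f(acc), reversed(funcs), value)


# Each helper takes its parameters first and returns a function that is
# still waiting for the data: "data last".
def pluck(key):
    return lambda rows: [row[key] for row in rows]


def filter_by(predicate):
    return lambda items: [item for item in items if predicate(item)]


def append(item):
    return lambda items: [*items, item]


# Because the data is always the final argument, the pieces snap together
# without ever naming the data being passed between them.
long_names = compose(
    append("(end)"),
    filter_by(lambda name: len(name) > 3),
    pluck("name"),
)

users = [{"name": "Ada"}, {"name": "Grace"}, {"name": "Linus"}]
print(long_names(users))  # ['Grace', 'Linus', '(end)']
```

Every helper is a one-liner, and the only "architecture" is the convention that data comes last.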
I’ve got three main strategies:
1) I look at the part of the app I want to modify when I use the app and search for that part in the code. Once I’ve found that code I roughly try to find out how it works by adding exploratory code (you can also use a debugger). Once I “think” I know what is going on I try to modify the code. This is where you usually find some exceptions or misunderstandings on your part if you haven’t touched the code before. If you are lucky and work in a team, somebody can tell you in a code review what you didn’t understand. If you are alone you will have to see things blow up, debug and fix the problem.
2) You can try to figure out from the main entry point how the app works. This works better for some apps than for others. If you have an event based app this is most likely just a supplement to method 1, if you have a cli app or some type of data munching app this can replace method 1.
3) You can try looking at early versions of a code base in Git to get an understanding of its architecture before the app became “more complex”.
You will always be a bit overwhelmed by any code base, and many code bases are just too large for a single person, so get comfortable working on “parts” of an app first rather than working on or understanding “the whole thing”. Also, code reading is not like reading books. Code is way, way denser than any book you can read (and that includes Heidegger), so you will not just “read” it, you will need to work with it. Zed Shaw’s “Learn X the Hard Way” series relies on you working with the code to understand it. The same holds true for code you “read”: you will at least need to try to “run” the code in your mind if you can’t run it for real.
You might also want to get over your thing about frameworks. Qt, GTK, Ruby on Rails, React, ncurses: frameworks and libs are in just about any app, and many apps that get larger might extract significant parts of their functionality into libs or frameworks. A lot of boilerplate is usually a good indication that an app could benefit from a framework. I never understood the “I want to be free from the constraints of frameworks” people. Their code bases usually have the start of multiple architectures and a lot of boilerplate code. I think they always search for some “perfect” solution and just can’t find it. The truth is, libs and frameworks are great, they give you an easy in on a new app and they give you documentation that probably wouldn’t exist on fully home grown code. In other words, they make “reading” code easier.
HashiCorp projects also seem very well done, especially given how extensible they are.
If you are using ruby, for instance, just search for https://github.com/search?q=language%3Aruby and look for popular codebases. You can decide which are beautiful for yourself.
In terms of tips and tricks, I often start looking at new code by trying to write out, in plain English prose, a bit of a story of how the code works. Almost like I'm writing a blog post explaining how things work to someone else. Often this process uncovers rabbit holes that I need to go down to understand isolated bits of logic before I can return to building this big picture view, which is sort of the point.
If you have a Linux machine, you can compile and install manually by just following the instructions on the README.
Then you can customize the window manager by copying and pasting the patches into your version and recompiling. That forces you to learn how to build and extend your own window manager in pure C. And it isn’t hard at all, even to a beginner.
That inspired the creation of many tiling window managers, because people understood the code and decided to build their own, like i3 or xmonad.
The project also features other easy to read C apps, like ST terminal and the surf web browser.
My trick is to dig in when something doesn’t work the way I expect. Or someone says “I don’t think there’s a way to do X with blah”. My immediate reaction is to clone the code and take a look. I have a “tools” folder on my local machine that contains many of the tools / libraries I use.
Orientation is easier than you expect. The easiest scenarios are around “why did I get that error” situations. Grep for the error and away you go. But having a question to answer will definitely give you a direction to investigate.
Are you interested in any particular languages?
For Python, take a look at: https://github.com/psf/requests
Every day I see code that is elegant but has bugs, ugly code that is foolproof, optimized code that performs abysmally because of some architecture change that happened in between, and a lot of abominations that make the code bad for guy A and good for guy B (e.g. a neat typechecked, object-oriented, very elegant, Pythonic numerical code that is 100 times more confusing for your research level numerical analyst than an uglier but functional Matlab script).
What I agree on is "the best way to improve X in my code" is "read code that has quality X".
Given the broadness of your question I suspect you are still finding your way around programming in general. If that's the case my method is to be driven by curiosity.
- Why does macOS behave this way? Let's look up xnu's code
- I wonder about list implementation... Let's look at CPython code for appending items to a list
And so on... There is a lot of open code for stuff we are using everyday. It is interesting to get into it.
For C++, try Chromium: https://chromium.googlesource.com/chromium/chromium/+/refs/h...
One upside of this might also be that it's not, as you said, boilerplate, because it's very foundational and not heavily using other stuff. It is also well documented, so you'll find good explanations of why things are the way they are.
I've also heard good things said for OpenBSD's readability.
Working through some badly written code that actually performs well can be a real eye opener. I mainly work in C and reading some legacy code (sometimes even my own) can be a challenge to work out exactly what's going on.
If you want to learn how an algorithm works, then a good clean codebase with lots of comments is a good way to go. If you want to learn the details of a particular language, then just read a lot of code in that language whether it’s good or bad.
https://www.doomworld.com/idgames/utils/level_edit/deu/deu52...
For jumping into new codebases I stick to the Jetbrains toolbox because it’s usually a consistent enough environment to investigate a new codebase. I also greatly appreciate the indexing.
Also, a lot of "clean code" stuff can be confusing dogma.
You should try building things you find interesting, and try to build them in a way that "feels correct", and try to empathize: what if someone else was reading this? What if someone else dived into this codebase to add this feature? Could they?
- Lua
- Redis
- idtech3
- libuv
- linux kernel
- sqlite
As much as Ruby, Python, and Go are touted as elegant or clean to read, they are pretty horrible to read in the wild. C is where it's at.
#1: If the codebase is huge, you can't read all of it. So you'd best know how to navigate it.
#2: You need an IDE or cscope-like tool to navigate a codebase. The codebase is like a web of, say, Wikipedia articles, and you're going to have to browse it a lot like how you'd browse Wikipedia. Symbols are links!
#3: It helps to understand the big picture. What does this codebase implement? Where are the "entry points" -- where to start reading? What's the architecture? (E.g., Java is a byte-compiled language with a bytecode interpreter known as a JVM.) What's the design look like?
#4: If it's just for fun, well, just browse till you find something interesting, then read it carefully, and go spelunking like it's a Wikipedia article.
#5: If you're reading it to debug something, you need to first find the relevant entry points.
#6: If you're reading it to add features, you really need to read the developer docs (if they exist), the internals docs (if they exist), and figure out a lot of things like APIs exported, internal utilities libraries, portability layers, external dependencies, protocols, etc. This will take time, and that's ok. Start with small features, and work your way up. You'll build a deeper understanding as you go.
#7: You don't have to understand all that much about the codebase in question, and it might not be possible to if we're talking about a codebase that's in the hundreds of millions of lines of code. You'll have to specialize as you dive deep, and generalize as you wade "near the top".
#8: It can take time to pick up these skills to the point where you can do this quickly. And even then, it can take time to understand a large codebase well enough. There's just a ton of detail that you have to digest into a mental picture that's sufficiently high-level that you can use it productively. So be patient, and keep on going. Just because it's a lot to learn, you shouldn't be discouraged.
To really deal with huge codebases, you have to be a bit like a generalist who can specialize as needed.
For example, if you're reading the OpenJDK, you'll want to understand what Java is, what the JVM is, and so on. You won't have to understand all of that if you just want to read the OpenJDK implementation of, say, TLS, but you will sometimes have to navigate outside that particular bit of the OpenJDK. And if you tease out code threads far enough, you'll probably learn a thing or three about seemingly unrelated things like the GC.
Get comfortable doing these things, and you'll be able to deal with codebases in the millions of lines of code.
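To make "symbols are links" from #2 concrete, here is a toy version of what a cscope-like tool does, sketched for Python source using the standard-library `ast` module. `build_symbol_index` is a made-up name, and a real tool also tracks references and call sites, not just definitions.

```python
import ast
from collections import defaultdict
from pathlib import Path


def build_symbol_index(root):
    """Map each function/class name to the (file, line) places it is defined."""
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError, OSError):
            continue  # skip files that can't be read or parsed
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[node.name].append((str(path), node.lineno))
    return index
```

Looking a name up in the index is the "click the link" step: it tells you which file and line to jump to next.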
Adding on Tailwind, nothing locks you in.