HACKER Q&A
📣 impjohn

Codebases with great, easy to read code?


A colleague told me the best way to level up coding skills is to read excellent code.

Do you have favorite repos that highlight this?

I have an irrational fear of unknown codebases since it feels most of the code is either boilerplate or tied to some framework.

Do you have tips and tricks you use to read codebases?


  👤 tbrock Accepted Answer ✓
Redis. Read the redis source code if you want to see nice C.

The reason it always impresses me is that C can look like gobbledygook, yet this codebase is clean and understandable.


👤 jorangreef
Are we allowed to share repos we've written? :)

If so, then here's distributed consensus in Zig:

https://github.com/coilhq/tigerbeetle/blob/main/src/vsr/repl...

Something that differentiates this from many consensus implementations is that there's no boilerplate networking/multithreading code leaking through, it's all message passing, so that it can be deterministically fuzz tested.

I learned so much, and had so much fun writing this, that I also hope it's an enjoyable read—or please let me know what can be improved!
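
To illustrate the idea of keeping networking and threading out of the logic so that runs can be replayed, here is a minimal Python sketch. It is not taken from the linked Zig code; the `Counter` node, the `simulate` function, the message values, and the 10% drop rate are all hypothetical. Nodes only receive and return messages, and a seeded random generator stands in for the network.

```python
import random


class Counter:
    """A toy node: its only input is a message, its only output is more messages."""

    def __init__(self, name, peer):
        self.name = name
        self.peer = peer
        self.total = 0

    def on_message(self, value):
        self.total += value
        # Return outgoing messages instead of touching a socket or a thread.
        return [(self.peer, value - 1)] if value > 1 else []


def simulate(seed, steps=100):
    """Deliver queued messages in a seed-determined order and return the trace."""
    rng = random.Random(seed)  # the only source of nondeterminism
    nodes = {"a": Counter("a", "b"), "b": Counter("b", "a")}
    queue = [("a", 5), ("b", 3), ("a", 4)]
    trace = []
    for _ in range(steps):
        if not queue:
            break
        # Simulated reordering: pick any pending message.
        target, value = queue.pop(rng.randrange(len(queue)))
        if rng.random() < 0.1:
            trace.append(("drop", target, value))  # simulated message loss
            continue
        trace.append(("deliver", target, value))
        queue.extend(nodes[target].on_message(value))
    return trace, {name: node.total for name, node in nodes.items()}


# The same seed replays the same schedule, so a failing run can be reproduced.
assert simulate(seed=42) == simulate(seed=42)
```

A fuzzer built this way would loop over many seeds and check invariants on the final state, reporting the seed of any failing run.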


👤 oso2k
Almost anything from suckless.org.

Here's a window manager (dwm) with its docs and build system in 13 files and only around 3000 lines of code.

https://git.suckless.org/dwm/files.html

And sbase, a sort of "busybox-like" set of common *NIX base utils written to be small and portable. Some of the commands are just a few dozen lines.

https://git.suckless.org/sbase/files.html


👤 sgc
Along with the other recommendations, I was introduced to The Architecture of Open Source Applications from an HN post some time back, and have found it quite interesting. You can use it together with a more detailed walk through the respective projects' source code, to get a great idea of what some big names are doing.

http://aosabook.org/en/index.html


👤 foobarbed
I've always enjoyed lichess's chess API: https://github.com/lichess-org/scalachess/tree/master/src/ma...

It's funny because I remember comparing it to mine that I had tried to write during college, and appreciating how much better it is.

Pay attention to how there's a bunch of different types of chess in there too, and how that's factored.


👤 amichal
https://github.com/seattlerb/minitest really removed the FUD for me when I started learning Ruby and Rails. It's full of metaprogramming and fancy tricks but is also quite small, practical and informal in its style.

e.g. "assert_equal" is really just "expected == actual" at its core, but it uses both a block param (a kind of closure) for composing a default message and a call to "diff", which is a dumb wrapper around the system "diff" utility (horrors!). There is even some evolved nastiness in there for an API change that uses the existing assert/refute logic to raise an informative message. This is handled with a simple if and not some sort of complex hard-to-follow factory pattern or dependency injection misuse.

https://github.com/seattlerb/minitest/blob/master/lib/minite...
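
As a rough Python analogue of the shape described above (not minitest's actual Ruby code), the sketch below keeps the check to a single comparison and builds the default message in a closure, so the diff is only computed on failure. It swaps in the standard library's `difflib` where minitest shells out to the system `diff`; the function name and signature are assumptions made for illustration.

```python
import difflib


def assert_equal(expected, actual, msg=None):
    """Fail with a diff-based message unless expected == actual."""

    # The default message is a closure, so the diff is only computed on failure.
    def default_message():
        diff = difflib.unified_diff(
            repr(expected).splitlines(),
            repr(actual).splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
        return "\n".join(diff)

    if expected == actual:  # the whole check
        return True
    raise AssertionError(msg if msg is not None else default_message())
```

Passing an explicit `msg` bypasses the diff entirely, which mirrors the "simple if" style the comment praises.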


👤 qiskit
> A colleague told me the best way to level up coding skills is to read excellent code.

The best way to level up is to code. Reading code can be a complementary activity that can bring insights but it's not a way to level up. Active > passive.

> Do you have favorite repos that highlight this?

For what language? Desktop, mobile? Systems programming or web development? Linux/BSD/etc all have source code available. I believe microsoft has open sourced the .Net Framework or parts of it.

It's like you are learning a foreign language and want us to recommend good books? Can't really help you if you don't tell us the foreign language and your goals for the language ( casual conversation, business, translation, etc ).


👤 eatonphil
I've been thinking a lot recently how to get devs to read more code and it's a very interesting reason you give that you don't want to wade through boilerplate. I never thought of that before.

I don't know boilerplate-heavy systems like Rails or Django too well. But I just wouldn't suggest starting with reading web app code (though maybe I've ignored reading too much web app code over time).

The easiest code to start thinking about is libraries and things you use today already like the nginx code base or the CPython code base or your logging library or your web server library code.

In these cases maybe you download the repo, build it, see how you could make a small tweak and run it. And soon you're looking through its code to understand how it works.

Another, maybe easier, technique to start reading more: when you are programming and hit an error in a 3rd party library, use grep to find that error in the library's code and just start poking around. Maybe add some print statements to it so you can see more of what goes wrong. Try to solve the problem just by looking at the code and modifying it instead of using Google.
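
For readers without grep at hand, the search step can be sketched in a few lines of Python. This is only an illustration of the approach described above; the function name `find_error_sites` and the default of searching `.py` files are assumptions.

```python
from pathlib import Path


def find_error_sites(root, needle, suffixes=(".py",)):
    """Yield (path, line_number, line) for every source line containing `needle`."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()


# Example use: point `root` at the library's source directory and pass the
# error text exactly as it appeared in the traceback or log.
```

Each hit is a place to start reading or to add a print statement.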

If you ever get into it I'd love to hear from you. Email is on my site and Discord is in my HN profile.


👤 ramboldio
GRBL, the CNC firmware for Arduinos:

https://github.com/grbl/grbl/

It feels like it has more comments than code. The comments are written in very nice, understandable language that even actively teaches concepts that are only adjacent to the code at hand.

E.g. https://github.com/grbl/grbl/blob/master/grbl/stepper.c#L142 or https://github.com/grbl/grbl/blob/master/grbl/stepper.c#L233


👤 thom
SerenityOS, especially the userland, has always seemed very elegant to me:

https://github.com/SerenityOS/serenity


👤 iamricks
How do you guys approach the "start" of reading a codebase? I never know where to start looking. Specifically, if it's a language I am not too familiar with, I have no idea where to start, and sometimes I have no idea where the program execution starts.

👤 munk-a
I have a very different suggestion. This codebase (RPI Engine [1]) is what I initially cut my teeth on, and I learned a lot about good program design just by seeing what worked and what didn't. Reading and understanding code that's stood the test of time can also be quite valuable because you can see which patterns survive lots of people touching them and which patterns start to fall apart when the original designer isn't available to onboard new people. MUDs develop through time with a few concurrent developers at most, and generally have stretches where there are no active developers, or the people executing code changes are learning it as they go.

I'd suggest this codebase as an excellent lesson in how bloat and complexity enter into the picture over time - I wish the actual commit history was available, but unfortunately the open source release was just a snapshot in time.

1. https://github.com/webbj74/RPI-Engine


👤 simonw
Something I find really helpful is to start with a question that I want to answer.

Often this will be along the lines of "How does it do X?" - where X is something I either didn't know was possible or that I suspect to be really difficult.

Then I can dive in to the codebase (usually starting with GitHub code search) and try to figure out how they do it.

This helps me skip straight past the boilerplate and means I often get to a satisfying conclusion - where I've learned something new - in a very small amount of time.

And along the way I pick up knowledge about how their code is organized and often a few other tricks too.


👤 jedisct1
Anything written in Zig or Go.

Both languages are extremely readable, even when looking at unfamiliar code.

The Zig standard library is small, yet covers a lot of common tools and structures. Every file contains implementations of one particular thing, so you can casually browse random files and understand what's going on without having to understand the entire context.


👤 dzuc
A bit old now of course but both Underscore [1] and Backbone [2] have annotated sources and are a pleasure to read.

1. https://underscorejs.org/docs/underscore-esm.html

2. https://backbonejs.org/docs/backbone.html


👤 cperciva
In past threads, people have mentioned enjoying my Tarsnap (https://github.com/Tarsnap/tarsnap) code. I personally think that the spiped (https://github.com/Tarsnap/spiped) code is even better.

👤 bikingbismuth
When people ask this question about Python codebases, I always recommend the Shodan Python client - https://github.com/achillean/shodan-python

It is easy to read and has taught me some neat Python-isms.


👤 afry1
I find Ramda very easy to read! It's a functional Javascript library based on currying and composition. https://github.com/ramda/ramda/

I find a lot of code fairly alienating to read. Lots of codebases require you to get into the "mindset" of the person who wrote the code: their idioms, assumptions, patterns they lean on, etc. So unless you've got the time to get deep into it, the insights you can draw from reading it are minimal.

Ramda, by comparison, is just a library of utility functions, and all of those utilities perform very simple operations: merging, plucking, appending, equality checking, etc.

There's a lot of intention in the Ramda API as well. All functions are "data last," meaning that the actual piece of data you're operating on is the final argument to every function. This enables you to write Ramda code that is very structurally consistent: function parameters first, data last, every time.

It gives me a sense of empowerment, reading the code. It's like "This doesn't have to be rocket science. If you just start from these basic operations, and write those basic operations with a simple but strict ideology of 'data last' every time, and stick them together like lego blocks using compose, then you can achieve some very cool stuff with very little code."
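
To make the "data last" idea concrete outside JavaScript, here is a small Python sketch. The names `compose`, `pluck`, `reject`, and `append` mirror Ramda functions, but these are hand-written illustrations, not Ramda's implementation; in particular, Ramda curries automatically, while this sketch curries by hand by returning a function that waits for the data.

```python
from functools import reduce


def compose(*funcs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return lambda data: reduce(lambda acc, f: f(acc), reversed(funcs), data)


# Each helper takes its configuration first and returns a function that
# waits for the data, so the data is always the final argument.
def pluck(key):
    return lambda rows: [row[key] for row in rows]


def reject(predicate):
    return lambda items: [item for item in items if not predicate(item)]


def append(item):
    return lambda items: [*items, item]


# Read bottom to top: pluck the names, drop the blanks, add a default.
names_without_blanks = compose(
    append("guest"),
    reject(lambda name: name == ""),
    pluck("name"),
)

users = [{"name": "ada"}, {"name": ""}, {"name": "linus"}]
print(names_without_blanks(users))
```

Because every helper has the same shape, the pipeline can be assembled before any data exists and reused on any list of records.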


👤 johntdaly
To be honest, I don’t know any code bases I would call “great” or “easy to read” but I can tell you what I do when I need to work in codebases I don’t know.

I’ve got two main strategies:

1) I look at the part of the app I want to modify when I use the app and search for that part in the code. Once I’ve found that code I roughly try to find out how that code works by adding exploratory code (you can also use a debugger). Once I “think” I know what is going on I try to modify the code. This is where you usually find some exceptions or misunderstandings on your part if you haven’t touched the code before. If you are lucky and work in a team, somebody can tell you in a code review what you didn’t understand. If you are alone you will have to see things blow up, debug and fix the problem.

2) You can try to figure out from the main entry point how the app works. This works better for some apps than for others. If you have an event based app this is most likely just a supplement to method 1, if you have a cli app or some type of data munching app this can replace method 1.

3) You can try looking at early versions of a code base in Git to get an understanding of its architecture before the app became “more complex”.

You will always be a bit overwhelmed by any code base, and many code bases are just too large for a single person, so get comfortable working on “parts” of an app first rather than working on or understanding “the whole thing”. Also, code reading is not like reading books. Code is way, way denser than any book you can read (and that includes Heidegger), so you will not just “read” it, you will need to work with it. Zed Shaw’s “Learn X the Hard Way” series relies on you working with the code to understand it. The same holds true for code you “read”: you will at least need to try to “run” the code in your mind if you can’t run it for real.

You might also want to get over your thing about frameworks. Qt, GTK, Ruby on Rails, React, ncurses: frameworks and libs are in just about any app, and many apps that get larger might extract significant parts of their functionality into libs or frameworks. A lot of boilerplate is usually a good indication that an app could benefit from a framework. I never understood the “I want to be free from the constraints of frameworks” people. Their code bases usually have the start of multiple architectures and a lot of boilerplate code. I think they always search for some “perfect” solution and just can’t find it. The truth is, libs and frameworks are great, they give you an easy in on a new app and they give you documentation that probably wouldn’t exist on fully home grown code. In other words, they make “reading” code easier.


👤 twothumbsup
I've found the Chef project (https://github.com/chef/chef) to be high quality and easily readable but I've been working with Chef for like 8 years at this point which might be influencing how I view it.

Hashicorp projects also seem very well done, especially given how extensible they are.


👤 lazyweb
Pihole [1] is mostly written in bash, which reads rather well, as far as I am concerned.

[1] https://github.com/pi-hole/pi-hole


👤 gorjusborg
I have found using github's language search to be helpful for this sort of thing.

If you are using ruby, for instance, just search for https://github.com/search?q=language%3Aruby and look for popular codebases. You can decide which are beautiful for yourself.


👤 khalladay
I think my favourite open source project to poke around in recently is [Reshade](https://github.com/crosire/reshade). The code is pretty readable and is doing a lot of interesting stuff. Every time I've taken a look at it I've learned something new. Definitely super light on boilerplate, given that it's solving a bit of a unique problem.

In terms of tips and tricks, I often start looking at new code by trying to write out, in plain English prose, a bit of a story of how the code works. Almost like I'm writing a blog post explaining how things work to someone else. Often this process uncovers rabbit holes that I need to go down to understand isolated bits of logic before I can return to building this big picture view, which is sort of the point.


👤 sirodoht
Every time that I can't figure out how to do something with Django, I just read the code [1] and then everything is easy and clear.

[1]: https://github.com/django/django


👤 malkosta
I really like DWM: https://git.suckless.org/dwm/

If you have a Linux machine, you can compile and install manually by just following the instructions on the README.

Then you can customize the window manager by copying and pasting the patches into your version and recompiling. That forces you to learn how to build and extend your own window manager in pure C. And it isn’t hard at all, even to a beginner.

That inspired the creation of many tiling window managers, because people understood the code and decided to build their own, like i3 or xmonad.

The project also features other easy to read C apps, like ST terminal and the surf web browser.


👤 aidos
Look through the stack you’re familiar with. For me that means nginx, uwsgi, flask, sqlalchemy, alembic - but I’ll look at anything I have a question about.

My trick is to dig in when something doesn’t work the way I expect. Or someone says “I don’t think there’s a way to do X with blah”. My immediate reaction is to clone the code and take a look. I have a “tools” folder on my local machine that contains many of the tools / libraries I use.

Orientation is easier than you expect. The easiest scenarios are around “why did I get that error” situations. Grep for the error and away you go. But having a question to answer will definitely give you a direction to investigate.


👤 ramesh31
Doom 3 is a perennial favorite for "most beautiful C++ codebase" lists [0]

[0] https://github.com/id-Software/DOOM-3-BFG


👤 aitoehigie
This is a very interesting question.

Are you interested in any particular languages?

For Python, take a look at: https://github.com/psf/requests


👤 sharikous
I hate to be the "it's complicated" guy but "excellent" is too broad.

I see every day code that is elegant but has bugs, ugly code that is foolproof, optimized code that performs abysmally because of some architecture change that happened in between, and a lot of abominations that make the code bad for guy A and good for guy B (e.g. a neat typechecked, object-oriented, very elegant, Pythonic numerical code that is 100 times more confusing for your research level numerical analyst than an uglier but functional Matlab script).

What I agree on is "the best way to improve X in my code" is "read code that has quality X".

Given the broadness of your question I suspect you are still finding your way around programming in general. If that's the case my method is to be driven by curiosity.

- Why does macOS behave this way? Let's look up xnu's code

- I wonder about list implementation... Let's look at CPython code for appending items to a list

And so on... There is a lot of open code for stuff we are using everyday. It is interesting to get into it.


👤 pogopaule
You might want to join the https://codereading.club/

👤 asimpletune
I'm surprised no one has mentioned reading tests as a good starting point. Anyway, besides main, tests are usually good too.

👤 NWoodsman
This guy made an HN mobile reader and put all the code on GitHub for his NDC Oslo presentation. It was good and shows off very readable asynchronous code in C#:

https://github.com/brminnick/AsyncAwaitBestPractices


👤 shrikant
I see that you're primarily looking into Python work, so I'd recommend `smart_open` as a nice, compact way to get started.

https://github.com/RaRe-Technologies/smart_open


👤 anonymoushn
The zig stdlib has been good reading so far. You also basically have to read it if you want to use it.

👤 dataflow
For C, I've yet to see better code than ReactOS. Look at how they keep even monstrous functions readable: https://github.com/reactos/reactos/blob/3fa57b8ff7fcee47b8e2...

For C++, try Chromium: https://chromium.googlesource.com/chromium/chromium/+/refs/h...


👤 maxehmookau
GitLab is an excellent example of a large, complex Rails codebase: https://gitlab.com/gitlab-org/gitlab/

👤 happy-dude
Stockfish is well written, commented, and documented C++ code:

https://github.com/official-stockfish/Stockfish


👤 elcapitan
I remember back in the day reading parts of the Python standard library. I don't know if that's generally good advice or still viable, but that's what I did, and I found it helpful. It was directly available, and usually connected to things I used with Python.

One upside of this might also be that it's not as you said boilerplate, because it's very foundational and not heavily using other stuff. It also is well documented, so you'll find good explanations why things are the way they are.


👤 ibraheemdev
A lot of the Java concurrency primitives written by Doug Lea and co. are great reads, and very well commented. See the source of `ConcurrentHashMap` for example: https://github.com/openjdk/jdk/blob/master/src/java.base/sha...


👤 srvmshr
Depending on your interest, I could vouch for OpenBSD having a very clean readable codebase. Often it has some of the best practices coded in with useful commentary.

👤 markstos
For TypeScript, Ghost: https://github.com/TryGhost/Ghost

👤 65
Wordpress is pretty great.

https://github.com/WordPress/WordPress


👤 HeckFeck
I've had a look at NetBSD's codebase before. It was fairly easy to follow.

I've also heard good things said for OpenBSD's readability.


👤 sethlivingston
Reading and using YUI3 (https://github.com/yui/yui3) took my JavaScript to the next level. It's no longer relevant because of improvements to the language, but it's the best model of readable JavaScript I've ever seen.

👤 numtel
Postgres

👤 everyone
Box2D https://github.com/erincatto/box2d I went over every file of this writing a Unity plugin for it in work once. I was really impressed, learned a lot.

👤 tanaygahlot
The book The Programmer's Brain contains a lot of tips on improving code reading skills - https://www.manning.com/books/the-programmers-brain

👤 SteveMoody73
I think it can be hard to recommend a particular codebase. Well written code can be good to read, but if you want to become better at a language or problem domain then sometimes reading badly written code may be a better way to learn.

Working through some badly written code that actually performs well can be a real eye opener. I mainly work in C and reading some legacy code (sometimes even my own) can be a challenge to work out exactly what's going on.

If you want to learn how an algorithm works, then a good clean codebase with lots of comments is a good way to go. If you want to learn the details of a particular language, then just read a lot of code in that language whether it’s good or bad.


👤 exyi
For anyone looking for a (nontrivial) C# project, I can only recommend going through ILSpy decompiler. https://github.com/icsharpcode/ilspy

👤 e9
I had to modify FFmpeg for a job and I found it surprisingly accessible and easy to read/modify: https://github.com/FFmpeg/FFmpeg

👤 makk
It's been years since I've looked but I remember being impressed by the NGINX codebase. https://github.com/nginx/nginx

👤 traviscj
I’ve learned a TON from the [okhttp3](https://square.github.io/okhttp/) codebase, highly recommend studying it.


👤 redocecin
Read the source code of the libraries used in your current projects. It helps you understand them better and improves your coding skills. You can start with a small feature, an API, a util or a configuration.

👤 jamescodesthing
Not my current employer.

For jumping into new codebases I stick to the Jetbrains toolbox because it’s usually a consistent enough environment to investigate a new codebase. I also greatly appreciate the indexing.


👤 jmkni
Noda time is very clean/well written IMO -> https://github.com/nodatime/nodatime

👤 morelandjs
Prefect workflow orchestrator: https://github.com/PrefectHQ/prefect


👤 renewiltord
I think `xsv` is easy to read. I have a fork of it for personal use and it was easy to add features to it even though I'm not a rust daily user.

👤 tristor
Honestly, the SQLite codebase is a fantastic read.

👤 winrid
You don't get good at a language by just listening to it all the time. You get good by engaging. Same goes for programming.

Also, a lot of "clean code" stuff can be confusing dogma.

You should try building things you find interesting, and try to build them in a way that "feels correct", and try to emphasize - what if someone else was reading this? What if someone else dived into this codebase to add this feature? Could they?


👤 throwaway9233
- Anything from suckless

- Lua

- Redis

- idtech3

- libuv

- linux kernel

- sqlite

As much as Ruby, Python, and Go are touted as elegant or clean to read, they are pretty horrible to read in the wild. C is where it's at.


👤 samoit
Anything written in Lisp/Scheme

👤 spullara
I really enjoyed working with the Redis codebase. Great, easy to understand C code.

👤 hobabaObama
I wish people mentioned the language of the repo they are sharing, in their posts.

👤 cryptonector
> Do you have tips and tricks you use to read codebases?

#1: If the codebase is huge, you can't read all of it. So you'd best know how to navigate it.

#2: You need an IDE or cscope-like tool to navigate a codebase. The codebase is like a web of, say, wikipedia articles, and you're going to have to browse it a lot like how you'd browse wikipedia. Symbols are links!

#3: It helps to understand the big picture. What does this codebase implement? Where are the "entry points" -- where to start reading? What's the architecture? (E.g., Java is a byte-compiled language with a bytecode interpreter known as a JVM.) What's the design look like?

#4: If it's just for fun, well, just browse till you find something interesting, then read it carefully, and go spelunking like it's a wikipedia article.

#5: If you're reading it to debug something, you need to first find the relevant entry points.

#6: If you're reading it to add features, you really need to read the developer docs (if they exist), the internals docs (if they exist), and figure out a lot of things like APIs exported, internal utilities libraries, portability layers, external dependencies, protocols, etc. This will take time, and that's ok. Start with small features, and work your way. You'll build a deeper understanding as you go.

#7: You don't have to understand all that much about the codebase in question, and it might not be possible to if we're talking about a codebase that's in the hundreds of millions of lines of code. You'll have to specialize as you dive deep, and generalize as you wade "near the top".

#8: It can take time to pick up these skills to the point where you can do this quickly. And even then, it can take time to understand a large codebase well enough. There's just a ton of detail that you have to digest into a mental picture that's sufficiently high-level that you can use it productively. So be patient, and keep on going. Just because it's a lot to learn, you shouldn't be discouraged.

To really deal with huge codebases, you have to be a bit like a generalist who can specialize as needed.

For example, if you're reading the OpenJDK, you'll want to understand what Java is, what the JVM is, and so on. You won't have to understand all of that if you just want to read the OpenJDK implementation of, say, TLS, but you will sometimes have to navigate outside that particular bit of the OpenJDK. And if you tease out code threads far enough, you probably will learn a thing or three about seemingly unrelated things like the GC.

Get comfortable doing these things, and you'll be able to deal with codebases in the millions of lines of code.
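
The "symbols are links" point in #2 can be shown with a toy index. The sketch below uses Python's standard `ast` module to record where each function is defined and where it is called by bare name. It is only a rough illustration of what cscope-like tools do, not how any particular tool works, and `index_symbols` and `SAMPLE` are made-up names.

```python
import ast
from collections import defaultdict


def index_symbols(source):
    """Map each function name to its definition line and the lines that call it."""
    tree = ast.parse(source)
    definitions = {}
    calls = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            definitions[node.name] = node.lineno
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only bare-name calls such as parse(...); method calls are skipped.
            calls[node.func.id].append(node.lineno)
    return definitions, dict(calls)


SAMPLE = '''\
def parse(text):
    return text.split()

def main():
    words = parse("read more code")
    print(len(words))
'''

definitions, calls = index_symbols(SAMPLE)
print(definitions)  # where each symbol is defined
print(calls)        # where each symbol is referenced
```

Following a name from a call site to its definition and back is the "link" being clicked; real tools add cross-file resolution, types, and much more.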


👤 ppg677
LevelDB

👤 todotask
My trick in Go projects is to use sqlc to generate code from SQL, which saves a lot of time and reduces errors. I'm glad to avoid an ORM for as long as possible and to keep frameworks minimal. It gets the job done and lets me spend more time on business logic.

Add Tailwind on top, and nothing locks you in.


👤 heavyset_go
I was always impressed with Near's emulators, RIP.

👤 hoten
cs.chromium.org is an example of how tooling can drastically help with readability. It's incredibly easy to navigate the codebase.

👤 mirntyfirty
I’m a fan of both SQLite and Postgres

👤 xixixao
Codemirror 6

👤 gebt
git, curl & nginx.