HACKER Q&A
📣 agomez314

How to Measure the Comprehensibility of a Codebase?


I was recently working on a codebase that was probably the most incomprehensible I've ever worked with: build files call shell scripts, which call Python scripts, which call Ansible scripts. It was very hard to trace where one repository's responsibility started and another's ended. There was no logging whatsoever except for a few "print" statements. It can't even be made to run locally!

This made me think: are there ways to quantitatively measure how "understandable" or "comprehensible" or "easy to learn" a codebase is? I feel this should have been studied in some way already, since a well-defined codebase makes onboarding developers easier.


  👤 mlhpdx Accepted Answer ✓
Sorry for your experience. I’m not going to be very helpful, but nonetheless…

This should be a solved problem, but isn’t. Looking at the code alone is only effective if there is some hint as to the context, meaning the runtime environment. And that only works if the runtime context doesn’t include “wet” components (people doing manual things).

A short shell script may look simple, but is it dependent on running inside a Linux container on CodeBuild with certain storage and cache, permissions and base image? Those are all knowable, and should be discoverable in the codebase, but typically aren’t.
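One way to make that kind of hidden context discoverable is a preflight check that lives in the repository and states its assumptions out loud. The following is only a minimal sketch: the function name `check_runtime_context` and the specific variables and tools passed to it are hypothetical, chosen for illustration rather than taken from any real project.

```python
import os
import platform
import shutil


def check_runtime_context(required_env, required_tools, expected_os="Linux"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # The script's assumed operating system, written down instead of implied.
    if platform.system() != expected_os:
        problems.append(f"expected OS {expected_os}, found {platform.system()}")

    # Environment variables the script silently relies on.
    for name in required_env:
        if name not in os.environ:
            problems.append(f"missing environment variable: {name}")

    # Executables that must be on PATH for the script to work.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append(f"missing executable on PATH: {tool}")

    return problems


if __name__ == "__main__":
    # Illustrative values only; a real project would list its own requirements.
    for problem in check_runtime_context(["CODEBUILD_BUILD_ID"], ["ansible-playbook"]):
        print("preflight:", problem)
```

This does not measure comprehensibility, but it turns assumptions that would otherwise live in someone's head into something a newcomer can read and run.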

On the other hand, this is a perfect AI/ML problem - “On a scale from 1 to 10, how f*ked up is this codebase: …”

Maybe with large enough context windows we could get answers. ;)


👤 elviejo
Code complexity in software is measured using:

- lines of code
- cyclomatic complexity, aka McCabe's complexity (branches)
- dependency on external modules
- Halstead metrics (volume of code)

Go to Wikipedia for those concepts.
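
To give a feel for the first three metrics, here is a rough sketch for a single Python source string using only the standard-library `ast` module. The function name `rough_metrics` is made up for this example, and the cyclomatic count is an approximation (one plus the number of branch points it recognizes), not a faithful implementation of McCabe's definition. Halstead metrics are left out.

```python
import ast

# Node types treated as one extra decision point each (an approximation).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.IfExp, ast.comprehension)


def rough_metrics(source: str) -> dict:
    """Approximate LOC, cyclomatic complexity, and top-level imported modules."""
    tree = ast.parse(source)

    # Lines of code: non-blank lines that are not pure comments.
    loc = sum(1 for line in source.splitlines()
              if line.strip() and not line.strip().startswith("#"))

    complexity = 1  # a single straight-line path to start with
    imports = set()
    for node in ast.walk(tree):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit paths.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.Import):
            imports.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y"; relative imports are skipped.
            imports.add(node.module.split(".")[0])

    return {"loc": loc, "cyclomatic": complexity, "imports": sorted(imports)}
```

Numbers like these say something about the size and branching of individual files, but they do not capture the cross-repository, cross-tool indirection described in the question.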