This made me think: are there ways to quantitatively measure how "understandable" or "comprehensible" or "easy to learn" a codebase is? I feel this must have been studied in some form already, since a comprehensible codebase makes onboarding new developers much easier.
This should be a solved problem, but isn't. Looking at the code alone is only effective if there is some hint as to the context: the runtime environment. And even that only works if the runtime context doesn't include "wet" components (people doing things manually).
A short shell script may look simple, but does it depend on running inside a Linux container on CodeBuild with a particular storage setup, cache, permissions, and base image? Those are all knowable, and should be discoverable in the codebase, but typically aren't.
On the other hand, this is a perfect AI/ML problem: "On a scale from 1 to 10, how f*ked up is this codebase?"
Maybe with large enough context windows we could get answers. ;)
Some candidate metrics:

- lines of code
- cyclomatic complexity, a.k.a. McCabe's complexity (number of branches)
- dependency on external modules
- Halstead metrics (volume of code)
Go to Wikipedia for those concepts.
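To make one of these concrete, here is a minimal sketch of cyclomatic complexity for Python code, using the standard-library `ast` module. It simply counts branching constructs and adds one; real tools like radon handle more node types (comprehension conditions, each `and`/`or` operand, etc.), so treat this as an illustration, not a reference implementation. The `classify` snippet is a made-up example.

```python
import ast

# Simplified set of "decision point" nodes. Counting a whole BoolOp as one
# branch (rather than one per and/or operand) is a deliberate simplification.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of branch points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

snippet = """
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(3):
        if x % 2 == 0:
            return "even-ish"
    return "other"
"""

print(cyclomatic_complexity(snippet))  # 1 + (if, for, if) = 4
```

Even this toy version captures the intuition: more branches means more paths a reader must hold in their head.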