For example, some folks say "readability" and "understandability", but how do you define those, and how would they translate into a quantitative metric (if that's even possible)?
I'd also be looking at the density of interesting ideas (as per the "an engineer's job is to solve a problem with a minimum of new ideas" metric). If I saw a lot of novelty, I would start questioning whether the codebase was being treated like a sandbox for trying out random stuff.
Number of obsolete technologies used would be a big one. If there are 50 things about to lose support, there's a maintenance nightmare waiting to happen.
I'm assuming the black-box behavior of both is truly identical in all cases, but in the real world it probably wouldn't be, so edge-case handling, unnecessary disk writes, and the like would be a high priority for me.
A common metric is cyclomatic complexity, for which lower is usually better (but you'll find exceptions as soon as you enforce upper bounds in a large project). There are also metrics for modularity and cohesion that might be useful, tho I think it depends on the size of the codebase you're comparing.
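If you want a rough number without pulling in a dedicated tool, here's a minimal sketch of the decision-point counting idea using Python's ast module (the "+1 per branch" rule is a simplification of the real metric, and all the names here are mine):

    import ast

    # Node types that each add one decision point (a simplified take on the metric).
    DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)

    def cyclomatic_complexity(func):
        """1 + number of branching constructs anywhere inside the function."""
        return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(func))

    def report(source):
        """Map every function name in the source to its rough complexity score."""
        tree = ast.parse(source)
        return {node.name: cyclomatic_complexity(node)
                for node in ast.walk(tree)
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}

    if __name__ == "__main__":
        import sys
        path = sys.argv[1] if len(sys.argv) > 1 else __file__  # default: score this file
        print(report(open(path).read()))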
I would say smaller executable would tend to be better too, but for short programs you might not see a ton of difference. Variable names/comments that match the intent and semantics... Good luck automating that without replacing humans as programmers. Tho for well defined requirements maybe a language model could correlate them somewhat.
1 - Safety. Does it handle errors, etc.?
2 - Readability. Did the author make it easy for me to read? Includes comments.
3 - LOC. The shorter the better. Too many people overengineer things. YAGNI.
4 - Dependencies. Are they reasonable? Too many is a code smell, to me (quick counting sketch below).
5 - Performance, if applicable.
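Points 3 and 4 are the easiest to turn into numbers. A rough, stdlib-only sketch for a Python repo (it doesn't separate stdlib imports from third-party packages, and the function name is made up):

    import ast, sys
    from pathlib import Path

    def repo_stats(root):
        """Count non-blank source lines and distinct top-level imports under root."""
        loc, imports = 0, set()
        for path in Path(root).rglob("*.py"):
            text = path.read_text(errors="ignore")
            loc += sum(1 for line in text.splitlines() if line.strip())
            try:
                tree = ast.parse(text)
            except SyntaxError:
                continue  # skip files that don't parse
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    imports.update(a.name.split(".")[0] for a in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    imports.add(node.module.split(".")[0])
        return loc, imports

    if __name__ == "__main__":
        loc, deps = repo_stats(sys.argv[1] if len(sys.argv) > 1 else ".")
        print(f"{loc} non-blank lines, {len(deps)} distinct imports: {sorted(deps)}")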
They all make sense to a point... but can also be gamed and misused once they start being tracked.
Writing good code is often about making good trade-offs depending on the requirements. For example, using a 3rd party library might be quicker and make your code more concise, but there are security, extensibility and maintenance concerns that typically need to be balanced when doing so. And how you balance those will largely depend on your requirements. For example for security-critical code you might not want to use 3rd party libraries at all.
There are also times when I've written awful code on purpose. If I give a developer a task in which the code they're writing will be thrown away then I'd probably want them to prioritise writing that code quickly rather than worrying too much about how clean and maintainable the code is.
I think what you're asking is kind of like asking, "how do I grade writing quality?" Well, what kind of text are you grading? Is it a kids' book? A shopping list? A textbook?
Just a thought anyway, I think I might be overcomplicating things to be honest.
Imagine you're a (co-)owner of a business planning sales for Black Friday. You're not interested in coverage/traceability/etc.; you need to handle client requests, and the bigger the share you handle successfully, the better. Either way, 10% is better than 0%, even if the code that achieves 0% has 50% coverage and the code that achieves 10% has 0% coverage.
2. Readme file. Buildme file. Can I understand what the repo is about just by reading the readme file? Good. Can I run the repo locally just by following the Buildme file? Good.
3. Data examples alongside code
The first thing I would do is look at usage of global state. Are there data objects used from the global scope? Are dependencies imported and then used directly from the global scope rather than being passed to a constructor? Are there any mutable singletons?
The second thing I would do is look at constructors. Are constructors calling a lot of functions (implication of global state usage), are they doing much besides taking arguments and storing them as object state? Do objects require an initialization method be called?
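To make those first two checks concrete, a hypothetical before/after in Python (all the class and method names here are invented for illustration):

    # A made-up payment gateway, just for illustration.
    class Gateway:
        def charge(self, amount):
            pass

    _default_gateway = Gateway()  # mutable module-level singleton

    class OrderServiceGlobal:
        """What I'd flag: reaches into global scope and does real work in __init__."""
        def __init__(self):
            self.gateway = _default_gateway  # hidden dependency on global state
            self.cache = {}
            self._warm_cache()               # constructor doing work beyond storing args

        def _warm_cache(self):
            self.cache["recent"] = []

    class OrderService:
        """What I'd rather see: dependencies are passed in and merely stored."""
        def __init__(self, gateway, cache):
            self.gateway = gateway
            self.cache = cache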
The third thing I would do is look for "law of Demeter" violations (https://en.wikipedia.org/wiki/Law_of_Demeter):
Explicit violation:
doesSomething.callSomethingElse().thatCallsSomethingElse().thatCallsEvenAnotherThing()
Not any less of a violation but looks better:
a = doesSomething.callSomethingElse()
b = a.thatCallsSomethingElse()
c = b.thatCallsEvenAnotherThing()
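For contrast, the usual way out is "tell, don't ask": each object exposes what its callers actually need instead of handing out its collaborators. A toy sketch with made-up names:

    class Account:
        def __init__(self, balance):
            self.balance = balance

    class Customer:
        def __init__(self, account):
            self._account = account

        def balance(self):
            return self._account.balance  # expose what callers need, not the Account

    class Order:
        def __init__(self, customer):
            self._customer = customer

        def customer_balance(self):
            return self._customer.balance()

    # The chained version would be: order._customer._account.balance
    order = Order(Customer(Account(100)))
    print(order.customer_balance())  # the caller only talks to its direct collaborator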
Next I would make percentiles of lines of code per scope. How many lines of code does the average function have? How many lines of code does the average class have? I would also look at unscoped functions (which imply global state usage) and at the outliers for lines per scope (what does the class with the most lines of code look like, what does the longest function look like?). I would probably keep raw counts of the number of functions, classes, imports, ifs, and loops themselves.
Obviously anywhere you see lines of code, it might make more sense to look at number of ifs and loops since that is probably a more accurate measure of complexity.
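A minimal sketch of the percentile idea for a Python codebase (statement counts per function via the ast module; the helper name is mine, and nested functions get counted inside their parents too, which is fine for a rough pass):

    import ast, statistics, sys
    from pathlib import Path

    def function_sizes(root):
        """Statement count of every function in every .py file under root."""
        sizes = []
        for path in Path(root).rglob("*.py"):
            try:
                tree = ast.parse(path.read_text(errors="ignore"))
            except SyntaxError:
                continue
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    sizes.append(sum(isinstance(n, ast.stmt) for n in ast.walk(node)))
        return sizes

    if __name__ == "__main__":
        sizes = sorted(function_sizes(sys.argv[1] if len(sys.argv) > 1 else "."))
        if sizes:
            print("median:", statistics.median(sizes),
                  "p90:", sizes[int(0.9 * (len(sizes) - 1))],
                  "max:", sizes[-1])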
I would definitely (and actually probably first) look at the database tables/how they are represented in the code.
More qualitatively, I would look at what kind of logging and time series data are exported.
I would look at how exceptions are handled in the main loop.
I would look at separation of business logic from server logic.
I would look for strong layers (business logic probably shouldn't be intermingled with presentation logic).
I would probably grep a sample of TODOs and grep a sample of comments.
I would look at test coverage and test implementation (unit and integration).
I would look at test run time.
I would look at build time.
I would try to look at a dependency graph (crude sketch at the end of this list).
I might look at git blame to see how many people edit how many different files.
I would look for Bazel/build logic.
I would look at the build script itself to see how assets are generated and stored.
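For the dependency graph point above, a crude sketch for a Python repo that only tracks flat module names (it ignores packages and relative imports, and the function name is made up):

    import ast, sys
    from collections import defaultdict
    from pathlib import Path

    def internal_import_graph(root):
        """Edges between modules inside the repo: module -> internal modules it imports."""
        root_path = Path(root)
        internal = {p.stem for p in root_path.rglob("*.py")}  # crude: flat module names only
        graph = defaultdict(set)
        for path in root_path.rglob("*.py"):
            try:
                tree = ast.parse(path.read_text(errors="ignore"))
            except SyntaxError:
                continue
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    names = [a.name for a in node.names]
                elif isinstance(node, ast.ImportFrom) and node.module:
                    names = [node.module]
                else:
                    continue
                for name in names:
                    top = name.split(".")[0]
                    if top in internal and top != path.stem:
                        graph[path.stem].add(top)
        return graph

    if __name__ == "__main__":
        for mod, deps in sorted(internal_import_graph(sys.argv[1] if len(sys.argv) > 1 else ".").items()):
            print(mod, "->", ", ".join(sorted(deps)))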