For example, some folks say "readability" and "understandability", but how do you define those, and how would they translate into a quantitative metric (if that's even possible)?
I'd also be looking at the density of interesting ideas (as per the "an engineer's job is to solve a problem with a minimum of new ideas" metric). If I saw a lot of novelty, I would start questioning whether the codebase was being treated like a sandbox for trying out random stuff.
Number of obsolete technologies used would be a big one. If there are 50 things about to lose support, there's a maintenance nightmare waiting to happen.
I'm assuming the black-box behavior of both is truly identical in all cases, but in the real world it probably wouldn't be, so edge-case handling, unnecessary disk writes, and the like would be a high priority for me.
A common metric is cyclomatic complexity, for which lower is usually better (but you'll find exceptions as soon as you enforce upper bounds in a large project). There are also metrics for modularity and cohesion that might be useful, tho I think it depends on the size of the codebase you're comparing.
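If you want a rough number without pulling in a dedicated tool, here's a minimal sketch of the decision-point counting idea using Python's ast module (the "+1 per branch" rule is a simplification of the real metric, and all the names here are mine):

    import ast

    # Node types that each add one decision point (a simplified take on the metric).
    DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)

    def cyclomatic_complexity(func):
        """1 + number of branching constructs anywhere inside the function."""
        return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(func))

    def report(source):
        """Map every function name in the source to its rough complexity score."""
        tree = ast.parse(source)
        return {node.name: cyclomatic_complexity(node)
                for node in ast.walk(tree)
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}

    if __name__ == "__main__":
        import sys
        path = sys.argv[1] if len(sys.argv) > 1 else __file__  # default: score this file
        print(report(open(path).read()))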
I would say smaller executable would tend to be better too, but for short programs you might not see a ton of difference. Variable names/comments that match the intent and semantics... Good luck automating that without replacing humans as programmers. Tho for well defined requirements maybe a language model could correlate them somewhat.
1 - Safety. Does it handle errors, etc.?
2 - Readability. Did the author make it easy for me to read? Includes comments.
3 - LOC. The shorter the better. Too many people overengineer things. YAGNI.
4 - Dependencies. Are they reasonable? Too many is a code smell, to me (quick counting sketch below).
5 - Performance, if applicable.
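Points 3 and 4 are the easiest to turn into numbers. A rough, stdlib-only sketch for a Python repo (it doesn't separate stdlib imports from third-party packages, and the function name is made up):

    import ast, sys
    from pathlib import Path

    def repo_stats(root):
        """Count non-blank source lines and distinct top-level imports under root."""
        loc, imports = 0, set()
        for path in Path(root).rglob("*.py"):
            text = path.read_text(errors="ignore")
            loc += sum(1 for line in text.splitlines() if line.strip())
            try:
                tree = ast.parse(text)
            except SyntaxError:
                continue  # skip files that don't parse
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    imports.update(a.name.split(".")[0] for a in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    imports.add(node.module.split(".")[0])
        return loc, imports

    if __name__ == "__main__":
        loc, deps = repo_stats(sys.argv[1] if len(sys.argv) > 1 else ".")
        print(f"{loc} non-blank lines, {len(deps)} distinct imports: {sorted(deps)}")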
They all make sense to a point... but can also be gamed and misused once they start being tracked.
Writing good code is often about making good trade-offs depending on the requirements. For example, using a 3rd party library might be quicker and make your code more concise, but there are security, extensibility and maintenance concerns that typically need to be balanced when doing so. And how you balance those will largely depend on your requirements. For example for security-critical code you might not want to use 3rd party libraries at all.
There are also times when I've written awful code on purpose. If I give a developer a task in which the code they're writing will be thrown away then I'd probably want them to prioritise writing that code quickly rather than worrying too much about how clean and maintainable the code is.
I think what you're asking is kind of like asking, "how do I grade writing quality?" Well, what kind of text are you grading? Is it a kids' book? A shopping list? A textbook?
Just a thought anyway, I think I might be overcomplicating things to be honest.
Imagine you're a (co-)owner of a business planning sales for Black Friday. You're not interested in coverage/traceability/etc.; you need to handle client requests, and the bigger the share you handle successfully, the better. Either way, 10% is better than 0%, even if the code that achieves 0% has 50% coverage and the code that achieves 10% has 0% coverage.
2. Readme file. Buildme file. Can I understand what the repo is about just by reading the readme file? Good. Can I run the repo locally just by following the Buildme file? Good.
3. Data examples alongside code
The first thing I would do is look at usage of global state. Are there data objects used from the global scope? Are dependencies imported and then used directly from the global scope rather than being passed to a constructor? Are there any mutable singletons?
The second thing I would do is look at constructors. Are constructors calling a lot of functions (implication of global state usage), are they doing much besides taking arguments and storing them as object state? Do objects require an initialization method be called?
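To make those first two checks concrete, a hypothetical before/after in Python (all the class and method names here are invented for illustration):

    # A made-up payment gateway, just for illustration.
    class Gateway:
        def charge(self, amount):
            pass

    _default_gateway = Gateway()  # mutable module-level singleton

    class OrderServiceGlobal:
        """What I'd flag: reaches into global scope and does real work in __init__."""
        def __init__(self):
            self.gateway = _default_gateway  # hidden dependency on global state
            self.cache = {}
            self._warm_cache()               # constructor doing work beyond storing args

        def _warm_cache(self):
            self.cache["recent"] = []

    class OrderService:
        """What I'd rather see: dependencies are passed in and merely stored."""
        def __init__(self, gateway, cache):
            self.gateway = gateway
            self.cache = cache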
The third thing I would do is look for "law of Demeter" violations (https://en.wikipedia.org/wiki/Law_of_Demeter):
Explicit violation:
doesSomething.callSomethingElse().thatCallsSomethingElse().thatCallsEvenAnotherThing()
Not any less of a violation but looks better:
a = doesSomething.callSomethingElse()
b = a.thatCallsSomethingElse()
c = b.thatCallsEvenAnotherThing()
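For contrast, the usual way out is "tell, don't ask": each object exposes what its callers actually need instead of handing out its collaborators. A toy sketch with made-up names:

    class Account:
        def __init__(self, balance):
            self.balance = balance

    class Customer:
        def __init__(self, account):
            self._account = account

        def balance(self):
            return self._account.balance  # expose what callers need, not the Account

    class Order:
        def __init__(self, customer):
            self._customer = customer

        def customer_balance(self):
            return self._customer.balance()

    # The chained version would be: order._customer._account.balance
    order = Order(Customer(Account(100)))
    print(order.customer_balance())  # the caller only talks to its direct collaborator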
Next I would make percentiles of lines of code per scope. How many lines of code does the average function have? How many lines of code does the average class have? I would also look at unscoped functions (which imply global state usage) and at the outliers for lines per scope (what does the class with the most lines of code look like, what does the longest function look like?). I would probably keep raw counts of the number of functions, classes, imports, ifs, and loops themselves.
Obviously anywhere you see lines of code, it might make more sense to look at number of ifs and loops since that is probably a more accurate measure of complexity.
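A minimal sketch of the percentile idea for a Python codebase (statement counts per function via the ast module; the helper name is mine, and nested functions get counted inside their parents too, which is fine for a rough pass):

    import ast, statistics, sys
    from pathlib import Path

    def function_sizes(root):
        """Statement count of every function in every .py file under root."""
        sizes = []
        for path in Path(root).rglob("*.py"):
            try:
                tree = ast.parse(path.read_text(errors="ignore"))
            except SyntaxError:
                continue
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    sizes.append(sum(isinstance(n, ast.stmt) for n in ast.walk(node)))
        return sizes

    if __name__ == "__main__":
        sizes = sorted(function_sizes(sys.argv[1] if len(sys.argv) > 1 else "."))
        if sizes:
            print("median:", statistics.median(sizes),
                  "p90:", sizes[int(0.9 * (len(sizes) - 1))],
                  "max:", sizes[-1])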
I would definitely (and actually probably first) look at the database tables/how they are represented in the code.
More qualitatively, I would look at what kind of logging and time series data are exported.
I would look at how exceptions are handled in the main loop.
I would look at separation of business logic from server logic.
I would look for strong layers (business logic probably shouldn't be intermingled with presentation logic).
I would probably grep a sample of TODOs and grep a sample of comments.
I would look at test coverage and test implementation (unit and integration).
I would look at test run time.
I would look at build time.
I would try to look at a dependency graph (crude sketch at the end of this list).
I might look at git blame to see how many people edit how many different files.
I would look for Bazel/build logic.
I would look at the build script itself to see how assets are generated and stored.
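For the dependency graph point above, a crude sketch for a Python repo that only tracks flat module names (it ignores packages and relative imports, and the function name is made up):

    import ast, sys
    from collections import defaultdict
    from pathlib import Path

    def internal_import_graph(root):
        """Edges between modules inside the repo: module -> internal modules it imports."""
        root_path = Path(root)
        internal = {p.stem for p in root_path.rglob("*.py")}  # crude: flat module names only
        graph = defaultdict(set)
        for path in root_path.rglob("*.py"):
            try:
                tree = ast.parse(path.read_text(errors="ignore"))
            except SyntaxError:
                continue
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    names = [a.name for a in node.names]
                elif isinstance(node, ast.ImportFrom) and node.module:
                    names = [node.module]
                else:
                    continue
                for name in names:
                    top = name.split(".")[0]
                    if top in internal and top != path.stem:
                        graph[path.stem].add(top)
        return graph

    if __name__ == "__main__":
        for mod, deps in sorted(internal_import_graph(sys.argv[1] if len(sys.argv) > 1 else ".").items()):
            print(mod, "->", ", ".join(sorted(deps)))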