But I'm having a hard time justifying karma on HN. I would rather see the number of upvotes/downvotes to find out whether a comment is generally accurate and correct or not (or whether the comment conforms to most people's ideology).
To make me feel more important than other people, of course. When you suck at everything else in life, at least there's the HN karma score showing you aren't totally worthless.
Comments and stories “in the spirit of HN” tend to accumulate points.
Comments and stories antithetical to community norms tend not to.
To me, using visible upvotes and downvotes would no more be judging comments and stories on their own merit than using user karma. Both are external signals. Votes integrate over the short term, karma over the long term.
YMMV.
A few things about public comment karma:
Expanded karma may give a glimpse into divisiveness (e.g. +50/-50) but also encourages it, by definition. Downvotes on HN, as I understand it, are only for bad manners or obvious wrongness, even if cast subjectively. In these terms, +50/-50 basically means the forum is dead as we know it. There’s no reason to show downvotes, because if there are so many then the comment should have been flagged long before. If there’s divisiveness, we should just discuss (or poll) it; that’s what the forum is for.
Otherwise, visible comment score doesn’t feel too bad. But hiding it is a good thing, imo. This makes the content less stackoverflow-y, and this place is not a Q&A forum either.
Also, voting ranks comments. As such, it is a means of shifting priority based upon agreement. When a negative vote is cast, it becomes one of two means to enforce a heckler's veto. The other form of heckler's veto is clicking the link to "flag" a comment or submission: after an item is flagged 4 times, it is hidden from everybody. Notice that those two mechanisms are completely unrelated in their functionality even though the results are similar.
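To make the distinction concrete, here is a minimal Python sketch of the two mechanisms as I described them. The threshold of 4 is my own claim from above, not something taken from HN's code, and `Comment`, `visible_ranked` and `FLAG_THRESHOLD` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Assumption: the "4 flags" figure comes from the comment above,
# not from any published HN source.
FLAG_THRESHOLD = 4


@dataclass
class Comment:
    text: str
    score: int = 1   # net votes: affects ordering only
    flags: int = 0   # flag count: affects visibility only


def visible_ranked(comments, flag_threshold=FLAG_THRESHOLD):
    """Hide comments that reached the flag threshold, then rank the rest by score."""
    shown = [c for c in comments if c.flags < flag_threshold]
    return sorted(shown, key=lambda c: c.score, reverse=True)


thread = [
    Comment("unpopular", score=-2),
    Comment("helpful", score=5),
    Comment("flagged", score=10, flags=4),
]
print([c.text for c in visible_ranked(thread)])  # ['helpful', 'unpopular']
```

A downvoted comment sinks but stays readable, while a flagged one disappears regardless of its score. That is the sense in which the two mechanisms are unrelated.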
The problem with voting mechanisms is abuse. They are extremely good within small communities, but once a venue grows past a certain size they become a means for insecure people to impose echo chambers.
On Reddit, I usually just scan the page looking for the comments with high points, which usually sways me into thinking they're the best comments.
Personally I prefer HN's system, as the number of people that clicked ^ doesn't really matter, it's the content of the comment that matters...
Also, this was discussed thoroughly [1] here (12 years ago)
Anonymous web surfers can't be forced into providing correct commentary. Even if they knew enough to be correct, humans will say inaccurate things just because.
Filtering algorithms like a ChatGPT gatekeeper defeat the point of receiving comments from humans. Just talk to ChatGPT.
I'd suggest it's difficult to justify karma because it carries little meaning. It has a mob justice function to it that keeps thousands of commenters from making a big mess.
Works as intended, though the reward for division and counter-argument behaviour is maybe a little high.
But also, keep in mind that limiting downvotes to max -4 is how HN is able to maintain quality in many ways.
Karma points on HN are a psychological gamification to encourage posters to write good posts and avoid bad posts with concrete quantized peer feedback.
People like getting points. Currency itself is an ancient precursor to such motivation-driving/measuring point systems. Grades are another from the late 1800s. Experience points are another from gaming in the 1970s. Points give a person a sense of power, progress and control and more subtly may be interpreted by the owner to bolster some piece of their identity (as smart, helpful, knowledgeable, right, etc).
Quantization probably also helps with implicit or explicit goals. It covers a good chunk of what you need for successful/SMART human goal setting (Specific, Measurable, Achievable, Relevant, Time-bound) or AI goal setting: basically, efficient feedback loops in general.
Karma points systems on forums are also a form of distributed moderation that helps the forum to scale quality discussions with reduced centralized moderator activity. The approach was first used widely on an internet forum called Slashdot in the late 1990s. Using karma techniques, Slashdot was able to host relatively meaningful distributed discussions that exceeded the posting volumes of previous online forums such as Usenet, where the largest forums only handled 3000 comments a day (alt.fan.rush-limbaugh... if I remember my analysis correctly). Usenet had more centralized human moderation and had trouble scaling quality conversations as a result. A good moderator encourages good conversation and cuts off the bad stuff few want to waste time reading. But there are only so many hours in the day to moderate and only so much patience and tolerance for complaints that come with the task. One human or two or three can only give so much feedback. (Until we get LLM/AI moderators trained on all these point histories! Sigh.)
HN, like other sites, uses a blend of high-effort centralized human moderator feedback and low-effort peer karma feedback (and high-effort peer replies/posts). The choice of exactly what powers someone with points can get, who can see point totals or history, and whether points are capped or attributable per topic (Stack Overflow, kinda) are all forum design decisions that can affect human motivation and/or the potential for abuse.
Some sites show you upvotes/downvotes separately (like YouTube videos, Amazon feedback) so, as you point out, you see the "volatility" or controversy of a post more than you do with a single karma number on a post as on HN.
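As a rough illustration of what a single number hides, here is a small Python sketch. `net_score` and `controversy` are hypothetical helpers, and the controversy formula is just one simple choice I picked for the example, not something HN, YouTube or Amazon is known to use.

```python
def net_score(up: int, down: int) -> int:
    """The single number a reader sees when votes are collapsed."""
    return up - down


def controversy(up: int, down: int) -> float:
    """Hypothetical measure: 0.0 for unanimous votes, 1.0 for an even split."""
    total = up + down
    if total == 0:
        return 0.0
    return 2 * min(up, down) / total


# Two very different posts with the same net score of 0:
print(net_score(50, 50), controversy(50, 50))  # 0 1.0
print(net_score(0, 0), controversy(0, 0))      # 0 0.0
```

With only the net score, a heavily contested post and one nobody voted on look identical. Showing both counts is what exposes the difference.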
I don't have firsthand knowledge of HN's motivation for the approach (comments, dang?) but since HN has stated they want the forum to stimulate and reward curiosity, having metrics that help posters more clearly measure their posts' controversy levels may be perceived by HN as potentially doing more harm than providing benefit.
If a design choice is perceived as encouraging or helping trolls measure their success at creating controversial posts, or more precisely measuring the effect of their downgrade brigade, it's not a good thing.
Or it may just be a path-dependent, arbitrary early design/implementation decision, and the risk of change outweighs the benefit of tweaking it.
Don't know, sorry, but hope this gives you food for thought if you don't/can't get an official/authoritative answer from HN and are looking for bystander educated guesses.