For instance, the same article might get submitted 5 times to the same web site and get 0, 0, 9, 35, and 172 votes. That’s the output variable. The input variables are derived from the text of the article; these could be “bag of words” features, the output of a BERT-like embedding, or, ideally, a fine-tuned model that outputs a number. I’ve got models working that treat it as a classification problem and predict whether an article crosses a vote threshold, and those definitely “work”, but I get the feeling they are throwing information away. On the other hand, the L2-norm and even the L1-norm don’t seem appropriate as loss functions here because of the noisiness and large range of the output variable.
Is there a better way to do this?
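To make the concern concrete, here is a minimal sketch using only the five vote counts from the example above. The threshold of 10 is an arbitrary, illustrative value (not one taken from the working models), and the "best constant prediction" is used purely as a stand-in for what each loss rewards.

```python
import numpy as np

# The five vote counts for the same article, from the example above.
votes = np.array([0, 0, 9, 35, 172])

# Classification framing: did a submission cross the threshold?
# THRESHOLD = 10 is a hypothetical value chosen only for illustration.
THRESHOLD = 10
labels = (votes >= THRESHOLD).astype(int)

# The best constant prediction under squared (L2) error is the mean;
# under absolute (L1) error it is the median.
l2_best = votes.mean()
l1_best = np.median(votes)

# How much of the total squared error comes from the single largest outcome?
sq_err = (votes - l2_best) ** 2
share_of_largest = sq_err.max() / sq_err.sum()

print("labels:", labels.tolist())
print("L2-optimal constant:", l2_best)
print("L1-optimal constant:", l1_best)
print("share of squared error from the largest outcome:", round(share_of_largest, 2))
```

With these numbers, the thresholded labels collapse 35 and 172 into the same class, while under squared error the single 172-vote outcome accounts for roughly three quarters of the total loss. That is the trade-off the question is about.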