HACKER Q&A
📣 caydenm

Benchmarks for models other than LLMs


I have seen some amazing benchmarks used to rank LLMs' abilities, and it got me thinking: are there similar benchmarks for propensity modelling, churn prediction, or other types of models?

Are there best practices for comparing model performance beyond benchmark data, when the models may have different underlying datasets?


  👤 luke-stanley Accepted Answer ✓
On PapersWithCode, many datasets have their own benchmarks: https://paperswithcode.com/datasets

You can also browse the state of the art by task here: https://paperswithcode.com/sota

For churn, you might go to time series forecasting first: https://paperswithcode.com/task/time-series-forecasting

There is also this subtask, which is a bit different because it's about novel products rather than continued sales, for example:

https://paperswithcode.com/task/new-product-sales-forecastin...

But you get the idea of how they organise by task. I'm curious about other benchmarks and interfaces too, and would like to hear about them.

I think HuggingFace and Kaggle also overlap here, with benchmarks for a range of tasks.
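Kaggle competitions rank entries by scoring every submission against the same hidden test labels with one fixed metric. The same idea is a common answer to the second question (comparing models built on different data): score both models on one shared held-out set. Below is a minimal, stdlib-only Python sketch of that, assuming binary churn labels (1 = churned, 0 = retained) and ROC AUC as the metric. The function names `roc_auc` and `paired_bootstrap_auc_diff` are hypothetical, chosen for illustration, and are not from any library.

```python
import random
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties.

    Assumes labels are 0/1 and higher scores mean "more likely to churn".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Sort indices by score, then give tied scores their average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def paired_bootstrap_auc_diff(
    labels: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Rough 95% percentile interval for AUC(model A) - AUC(model B).

    Both models must be scored on the SAME held-out rows; each resample
    draws row indices once and reuses them for both models (paired).
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        # Skip resamples that happen to contain only one class.
        if 0 < sum(y) < n:
            auc_a = roc_auc(y, [scores_a[i] for i in idx])
            auc_b = roc_auc(y, [scores_b[i] for i in idx])
            diffs.append(auc_a - auc_b)
    if not diffs:
        raise ValueError("no valid bootstrap resamples")
    diffs.sort()
    lo = diffs[int(0.025 * len(diffs))]
    hi = diffs[max(int(0.975 * len(diffs)) - 1, 0)]
    return lo, hi
```

If the interval excludes 0, the gap between the two models is unlikely to be just resampling noise on that held-out set. AUC is only one reasonable choice for churn; a business-specific metric (for example, precision at the top-k customers you can actually contact) can be swapped in the same way.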