Are there best practices for comparing model performance beyond benchmark data when they may have different underlying datasets?
You can also break down by task here: https://paperswithcode.com/sota
For churn, you might go to time series forecasting first: https://paperswithcode.com/task/time-series-forecasting
They have this subtask which is a bit different because it's about novel products rather than continued sales, for example:
https://paperswithcode.com/task/new-product-sales-forecastin...
But it gives you an idea of how they organise things by task. I'm curious about other benchmarks and interfaces and would like to see what else is out there.
I think HuggingFace and Kaggle have some overlapping tasks with benchmarks of their own as well.
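On the original question of comparing models trained on different underlying data: one common approach is to score both on the same held-out set with the same metric, so the comparison is at least like-for-like at evaluation time. A minimal sketch with scikit-learn (the models, features, and labels here are hypothetical stand-ins, not from any benchmark above):

```python
# Sketch: compare two churn models on one shared holdout set,
# regardless of what data each was originally trained on.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                            # stand-in features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)     # stand-in churn labels

# The shared holdout is the key: both models are scored on identical rows.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=0)

model_a = LogisticRegression().fit(X_train, y_train)
model_b = GradientBoostingClassifier().fit(X_train, y_train)

for name, model in [("logistic", model_a), ("gbm", model_b)]:
    auc = roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1])
    print(f"{name}: holdout AUC = {auc:.3f}")
```

The same idea carries over when the models come pre-trained on different datasets: skip the fitting step and just evaluate both on one holdout you control.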