HACKER Q&A
📣 boringg

Does anyone still use Random Forest models?


It feels like they were hugely popular 5 years ago and have fallen off with the rise of neural nets. Curious to see if they are still in use.


  👤 PaulHoule Accepted Answer ✓
On and off I've trained logistic regression models to sort texts in order of "relevance". That could be submissions to HN, job listings, etc.

I like logit because it can be easily calibrated so I can tune up for 90% precision and furthermore understand the trade-off I make between precision and recall. I often have done "shoot-outs" with other algorithms in scikit-learn.

Random Forests would have worked for my application, but they don't calibrate very well:

https://scikit-learn.org/stable/auto_examples/calibration/pl...
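The kind of "shoot-out" described above can be sketched with scikit-learn on synthetic data (this is my illustration, not the commenter's actual pipeline): fit logistic regression and a random forest, then find the recall each model retains once you demand ~90% precision.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a "relevance" labeling task
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=0)):
    model.fit(X_tr, y_tr)
    probs = model.predict_proba(X_te)[:, 1]
    precision, recall, thresholds = precision_recall_curve(y_te, probs)
    # Best recall achievable at any threshold giving >= 90% precision
    # (precision[:-1] aligns element-wise with thresholds)
    ok = precision[:-1] >= 0.90
    best_recall = recall[:-1][ok].max() if ok.any() else 0.0
    print(type(model).__name__, f"recall at >=90% precision: {best_recall:.2f}")
```

On real text-classification data the gap depends heavily on features and calibration, which is the commenter's point.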

Personally I think calibration is the difference between "finished near the top of the pack on Kaggle" and "coupled it to a Kelly bettor and made big $$$".
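To see why calibration matters for Kelly betting, here is a toy illustration (mine, not the commenter's system): the Kelly fraction for a binary bet with win probability p and net odds b is f* = p - (1 - p) / b, so a miscalibrated probability directly distorts the stake.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Fraction of bankroll to stake, given win probability p and net odds b."""
    return p - (1.0 - p) / b

# A model that reports p = 0.60 when the true rate is 0.55 (at even odds)
# stakes twice the correct fraction of bankroll:
overconfident = kelly_fraction(0.60, 1.0)  # ~0.20
calibrated = kelly_fraction(0.55, 1.0)     # ~0.10
print(overconfident, calibrated)
```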


👤 quantumofalpha
Still pretty useful in the high noise-to-signal regime. Though at my last two gigs in this area they were largely obsoleted in favor of generalized additive models, which are more interpretable and robust and have more powerful fitters: e.g. modern boosted (xgboost/lightgbm/whatever) or bagged depth-1-2 tree ensembles, which then basically fit curves to scatterplots.
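A minimal sketch of the GAM-via-stumps idea (my illustration, assuming scikit-learn rather than xgboost/lightgbm): boosting depth-1 trees yields an additive model, since each stump splits on a single feature, so the learned function decomposes into per-feature shape curves you can plot and inspect.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=5,
                           n_informative=3, random_state=1)

# max_depth=1 stumps -> the ensemble's decision function is a sum of
# one-dimensional functions of individual features (a GAM-like model)
gam_like = GradientBoostingClassifier(max_depth=1, n_estimators=300,
                                      random_state=1)
gam_like.fit(X, y)

# Slice the learned function along feature 0, holding the other features
# at their medians; for a depth-1 ensemble this traces feature 0's
# additive shape curve (up to a constant offset).
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
pts = np.tile(np.median(X, axis=0), (50, 1))
pts[:, 0] = grid
curve = gam_like.decision_function(pts)
print(curve[:3])
```

Libraries like interpret (EBMs) or pyGAM package this workflow more directly; the stump trick just shows why boosted shallow trees and GAMs are close cousins.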

More generally in industry the bottleneck is usually data, not models.


👤 tgflynn
I don't have up-to-date information, but I think the answer is probably yes. Deep learning works well when you have huge amounts of data and computational resources, but for smaller-scale problems other models, including Random Forests, seem to perform better. At least that was my understanding several years ago when I was participating in Kaggle competitions, and I would be surprised if it has changed very much (unless the problem is such that some existing DL model can be adapted to it).

I think kaggle.com has quite a few resources on machine learning so you might try looking around there for better information.


👤 wizwit999
Saw it used at a FAANG.