HACKER Q&A
📣 Maro

Data Scientists, what libraries do you use for timeseries forecasting?


I default to Prophet (formerly FBProphet) for my work [which is business-y timeseries data], curious what others are doing.


  👤 hrzn Accepted Answer ✓
I would recommend Darts in Python [1]. It's easy to use (think fit()/predict()) and includes

* Statistical models (ETS, (V)ARIMA(X), etc)

* ML models (sklearn models, LGBM, etc)

* Many recent deep learning models (N-BEATS, TFT, etc)

* Seamlessly works on multi-dimensional series

* Models can be trained on multiple series

* Several models support taking in external data (covariates), known either in the past only, or also in the future

* Many models offer rich support for probabilistic forecasts

* Model evaluation is easy: Darts has many metrics and offers backtesting, etc.

* Deep learning scales to large datasets, using GPUs, TPUs, etc

* You can do reconciliation of forecasts at different hierarchical levels

* There's now even an explainability module for some of the models, showing you what matters for computing the forecasts

* (coming soon): an anomaly detection module :)

* (also, it even includes FB Prophet if you really want to use it)

Warning: I'm probably biased because I'm the creator of Darts.
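
To give a flavor of the fit()/predict() workflow, here is a rough sketch along the lines of the quickstart (the DataFrame df with "date" and "value" columns is a placeholder):

  from darts import TimeSeries
  from darts.models import ExponentialSmoothing

  # df: a pandas DataFrame with a "date" column and a "value" column (placeholder)
  series = TimeSeries.from_dataframe(df, time_col="date", value_cols="value")
  train, val = series.split_before(0.8)   # 80/20 train/validation split

  model = ExponentialSmoothing()
  model.fit(train)
  forecast = model.predict(len(val))      # forecast as many steps as the validation set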

[1] https://github.com/unit8co/darts


👤 nerdponx
Lots of people have already made good library recommendations, so I will make a non-recommendation for all the data science students out there: stop thinking about libraries, and start thinking about models.

"What library do I use?" is the wrong question. "What model do I use?" is the right question. Libraries are just part of the process of answering that question.

That said, high-quality implementations of interesting time series models seem hard to come by, so it's still a legitimate question to ask about libraries. But consider the goal of asking about libraries: you want to find high-quality implementations of useful models, not a magic black box that you can crank data through.


👤 bayan1234
The forecast package in R is quite useful. Even if you don't use R, this book by Rob Hyndman is very approachable and easy to follow.

https://otexts.com/fpp2/


👤 isoprophlex
Can you reframe the problem to suit a more classical approach - regression using xgboost or lgbm? If so, go for that!

As an example, imagine you want to calculate only a single sample into the future. Say furthermore that you have six input timeseries sampled hourly, and you don't expect meaningful correlation beyond 48h old samples.

You create 6x48 input features, take the single target value you want to predict as the output, and feed this into your run-of-the-mill gradient-boosted tree.

The above gives you a less complex approach than reaching for bespoke time-series tooling; I've personally had success doing something like this.

If your regressor does not support multiple outputs, you can always wrap it in sklearn's MultiOutputRegressor (or optionally RegressorChain; check it out). This is useful if, in the above example, you are not looking to predict only the next sample, but maybe the next 12 samples.

https://scikit-learn.org/stable/modules/generated/sklearn.mu...
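
A rough sketch of that reframing with plain sklearn (the hourly DataFrame df, its column names, and the 12-step horizon are placeholders for illustration):

  import numpy as np
  from sklearn.ensemble import HistGradientBoostingRegressor
  from sklearn.multioutput import MultiOutputRegressor

  # df: hourly DataFrame with six input columns s0..s5 and a "target" column (placeholder)
  def make_lag_features(df, input_cols, target_col, n_lags=48, horizon=12):
      inputs = df[input_cols].to_numpy()
      target = df[target_col].to_numpy()
      X, y = [], []
      for t in range(n_lags, len(df) - horizon + 1):
          X.append(inputs[t - n_lags:t].ravel())   # 6 x 48 = 288 lagged features
          y.append(target[t:t + horizon])          # the next 12 target samples
      return np.array(X), np.array(y)

  X, y = make_lag_features(df, [f"s{i}" for i in range(6)], "target")

  # Wrap a single-output regressor so it predicts all 12 future samples at once
  model = MultiOutputRegressor(HistGradientBoostingRegressor())
  model.fit(X, y)
  next_12 = model.predict(X[-1:])   # shape (1, 12)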


👤 ollysb
Darts gives you a lot of options, including newer deep learning approaches like NBEATS and NHiTS.

https://unit8co.github.io/darts/


👤 dxbydt
Peter Cotton has at least a dozen very credible studies/results on Prophet vs other timeseries libraries. Before committing to Prophet, please check out a few of these (all over LinkedIn). His tone is acerbic because he believes Prophet is suboptimal & makes poor forecasts compared to the other contenders. That said, you can ignore the tone, just download the packages & test out the scenarios for yourself. I personally will not use Prophet. Like most stat tools in the Python ecosystem, it is super easy to deploy & code up, but often inaccurate if you actually care about the results. Of course, if it's some sales prediction forecast where everything's pretty much made up & data is sparse/unverifiable, then Prophet ftw.

👤 Fiahil
XGBoost, LGBM, pmdarima, stanpy (for Bayesian modelling). Plus a few others.

Don't ask me what they do with all of these, I'm just the guy who makes sure the forecasts stay reproducible.


👤 d4rti
Stuff I've used:

  - Prophet - seems to be the current 'standard' choice
  - ARIMA - Classical choice
  - Exponential Moving Average - dead simple to implement, works well for stuff that's a time series but not very seasonal
  - Kalman/Statespace model - used by Splunk's predict[1] command (pretty sure I always used LLP5)
I did some anomaly detection work on business transactions and found the best way was to create a sort of ensemble model: we applied all the models, kept any anomalies, then used simple rules to only alert on 'interesting' anomalies, like:

  - 2-3 anomalies in a row
  - high deviation from expected
  - multiple models all detected anomaly
These rules improved the signal-to-noise ratio.
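
As a toy illustration of the EMA variant plus the 'interesting anomaly' rules (the column name, span, and thresholds are made up):

  import pandas as pd

  def ema_anomalies(series: pd.Series, span: int = 24, k: float = 3.0) -> pd.Series:
      ema = series.ewm(span=span, adjust=False).mean()
      resid = series - ema
      sigma = resid.ewm(span=span, adjust=False).std()
      return resid.abs() > k * sigma   # True where the point deviates strongly from the EMA

  flags = ema_anomalies(df["transactions"])   # df is a placeholder hourly DataFrame
  # One simple "interesting" rule: only alert on 2+ anomalies in a row
  alerts = df.index[flags & flags.shift(1, fill_value=False)]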

[1] https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchRef...


👤 qsort
It really depends on what the goal is.

Get some forecasts quickly => FB Prophet. It's not as good as they'd have you believe, but it's fast and analysts can play with it to some extent.

Outlier detection => Hand-rolled C++ ETS framework.

Multilevel predictions and/or more complex tasks => That's where neural models start to have the edge, but at that point it's a costly project. I like simpler stuff to start, moving to the big guns if/when it's needed.


👤 tfehring
For cases that Prophet doesn't cover I recommend bsts [0], which is much more flexible and powerful. Anything too complicated for bsts, I'll typically implement in Stan.

[0] https://cran.r-project.org/web/packages/bsts/bsts.pdf


👤 venk12
I once built a forecasting framework for a unicorn startup. Revenue and pipeline predictability were key as the company was going through its IPO phase. I took three approaches and ensembled them to predict revenue and pipeline.

1. Time-series forecast based on revenue (the one OP is referring to). All the statistical time-series models come in here. I primarily used H2O.ai for this.

2. Conversion-based revenue forecast (input -> pipeline, output -> revenue). This proved to be quite tricky as there was a time lag between pipeline creation and revenue conversion.

3. Delphi method: got the sales/pre-sales folks on the ground to predict a bottom-up number and used that as a forecast.

Finally, I combined them by applying weights to the above approaches, based on how accurate each was on the test dataset.
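
For example, the weighting step could look roughly like this (the forecast arrays, y_test, and the inverse-MAE weights are just placeholders for illustration):

  import numpy as np

  def inverse_error_weights(y_true, forecasts):
      # Weight each approach by the inverse of its mean absolute error on the test set
      maes = np.array([np.mean(np.abs(y_true - f)) for f in forecasts])
      inv = 1.0 / maes
      return inv / inv.sum()

  forecasts = [ts_forecast, conversion_forecast, delphi_forecast]   # placeholder arrays
  w = inverse_error_weights(y_test, forecasts)
  combined = sum(wi * f for wi, f in zip(w, forecasts))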

IMHO, like many here have pointed out, the model/assumptions are more important than the library. The job of a data scientist is to make the prediction as reliable and explainable as possible.


👤 plutonic
As a few other people have mentioned, I find R to be the easiest tool for this job, specifically the forecast package [0]. I had to use this package for an applied econometrics course in college a few years ago, and I have been using it ever since. I find the syntax to be more straightforward than comparable libraries in Python. I also assume that this library (and other libraries in R) offers higher-quality models and results than their counterparts in Python, but this is just an assumption.

[0] https://github.com/robjhyndman/forecast


👤 thegginthesky
Sktime is the best toolkit for time series out there. It provides a sklearn-like API for many models, plus modules for validation, metrics for evaluation, and all that sklearn jazz.

Besides that, I also like statsmodels as the docs are pretty good.


👤 Imanari
For feature engineering check out tsfresh and sktime, especially the minirocket algorithm.

https://tsfresh.readthedocs.io/en/latest/

https://www.sktime.org/en/v0.8.2/api_reference/auto_generate...
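
A minimal tsfresh sketch (the long-format DataFrame long_df with id/time/value columns and the per-series target y are placeholders):

  from tsfresh import extract_features, select_features
  from tsfresh.utilities.dataframe_functions import impute

  # long_df: one row per observation, with columns "id", "time", "value" (placeholder)
  features = extract_features(long_df, column_id="id", column_sort="time")
  impute(features)                         # replace NaN/inf left by undefined features
  relevant = select_features(features, y)  # keep only features predictive of the target y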


👤 boringg
I'll add that the better question is which libraries you use to clean up all the data before fitting the model, a.k.a. the real heavy lifting ;)


👤 jstx1
Prophet, statsmodels, tf.keras for RNNs.

👤 mharig
I like statsmodels. So far it has all the methods I need, and it is very well documented. But I am just fiddling a little bit with my 'weather station'. No bleeding edge here.
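
For example, a Holt-Winters fit in statsmodels looks roughly like this (an hourly series with daily seasonality is just an assumed example):

  from statsmodels.tsa.holtwinters import ExponentialSmoothing

  # series: a pandas Series of hourly temperature readings (placeholder)
  fit = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=24).fit()
  forecast = fit.forecast(48)   # next 48 hours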

👤 dmfolgado
For feature extraction check out tsfel:

https://github.com/fraunhoferportugal/tsfel


👤 nl
I really like FB's Prophet: https://facebook.github.io/prophet/

👤 boredemployee
What about adding external variables like weather/rain to check the impact on sales? What do you guys recommend?

👤 crimsoneer
For time series, classical methods (ARIMA etc) still perform very well for most problems.

👤 ssequeira
I'll often use TensorFlow Probability's time series package.

👤 angrycontrarian
State of the art is 1D convnets, bleeding edge is transformers.

👤 fzliu
PyTorch for recurrent nets (TensorFlow would work too).

👤 speedgoose
I was told to start with XGBoost. Is Prophet better?

👤 oneoff786
Everything is either light gbm or an early exploratory experiment.

👤 heloitsme22
Depends on the day. Sometimes it may be good, sometimes it may be shit.

👤 curiousgal
Time series analysis is where R shines compared to Python.

👤 pruthvishetty
Sktime

👤 nceasy
Anyone ever tried pycaret timeseries?


👤 siilats
The easiest is to use cvxpy with your own objective function. You can easily add seasonality, regularization, etc.; other things are too much of a black box. Also pivot tables: they are free now in the online versions of Google Sheets and Excel. Set the time as a row field and it will automatically aggregate. Or, if you want irregular spacing, you can group by 100 samples.
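
A rough sketch of what that can look like (the data array y, the weekly period, and the penalty weights are placeholders):

  import cvxpy as cp
  import numpy as np

  # y: observed values as a 1-D numpy array (placeholder)
  n = len(y)
  trend = cp.Variable(n)
  seasonal = cp.Variable(n)

  objective = (
      cp.sum_squares(y - trend - seasonal)                    # fit the data
      + 10.0 * cp.sum_squares(cp.diff(trend, 2))              # smooth trend (l2 trend filtering)
      + 1.0 * cp.sum_squares(seasonal[7:] - seasonal[:-7])    # encourage a repeating weekly pattern
  )
  problem = cp.Problem(cp.Minimize(objective), [cp.sum(seasonal[:7]) == 0])
  problem.solve()
  fitted = trend.value + seasonal.value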

👤 jll29
Former Reuters Research Director here.

When modeling time series, you will want a model that is sensitive both to short-term and longer-term movements. In other words, a Long Short-Term Memory (LSTM) network.

Sepp Hochreiter invented this concept in his Master's thesis supervised by Jürgen Schmidhuber in Munich in the 1990s; today, it's the most-cited type of neural network.

Here are papers describing it: https://people.idsia.ch/~juergen/rnn.html

In Python, you can use TensorFlow's LSTMCell class: https://www.datacamp.com/tutorial/lstm-python-stock-market
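
A minimal Keras sketch of an LSTM forecaster (the window length, layer sizes, and the training arrays X and y are placeholders; the linked tutorial goes into more depth):

  import tensorflow as tf

  window = 48  # hours of history per training example (placeholder)

  model = tf.keras.Sequential([
      tf.keras.layers.Input(shape=(window, 1)),
      tf.keras.layers.LSTM(64),
      tf.keras.layers.Dense(1),   # predict the next value
  ])
  model.compile(optimizer="adam", loss="mse")
  # X has shape (samples, window, 1); y has shape (samples, 1)
  model.fit(X, y, epochs=10, batch_size=32)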