HACKER Q&A
📣 Maro

Data Scientists, what libraries do you use for timeseries forecasting?


I default to Prophet (formerly FBProphet) for my work [which is business-y timeseries data], curious what others are doing.


  👤 hrzn Accepted Answer ✓
I would recommend Darts in Python [1]. It's easy to use (think fit()/predict()) and includes

* Statistical models (ETS, (V)ARIMA(X), etc)

* ML models (sklearn models, LGBM, etc)

* Many recent deep learning models (N-BEATS, TFT, etc)

* Seamlessly works on multi-dimensional series

* Models can be trained on multiple series

* Several models support taking in external data (covariates), known either in the past only, or also in the future

* Many models offer rich support for probabilistic forecasts

* Model evaluation is easy: Darts has many metrics and offers backtesting, etc.

* Deep learning scales to large datasets, using GPUs, TPUs, etc

* You can do reconciliation of forecasts at different hierarchical levels

* There's now even an explainability module for some of the models, showing you what matters for computing the forecasts

* (coming soon): an anomaly detection module :)

* (also, it even includes FB Prophet if you really want to use it)

Warning: I'm probably biased because I'm the creator of Darts.
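
To give a flavor of the fit()/predict() workflow, here is a rough sketch along the lines of the quickstart (the DataFrame df with "date" and "value" columns is a placeholder):

  from darts import TimeSeries
  from darts.models import ExponentialSmoothing

  # df: a pandas DataFrame with a "date" column and a "value" column (placeholder)
  series = TimeSeries.from_dataframe(df, time_col="date", value_cols="value")
  train, val = series.split_before(0.8)   # 80/20 train/validation split

  model = ExponentialSmoothing()
  model.fit(train)
  forecast = model.predict(len(val))      # forecast as many steps as the validation set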

[1] https://github.com/unit8co/darts


👤 nerdponx
Lots of people have already made good library recommendations, so I will make a non-recommendation for all the data science students out there: stop thinking about libraries, and start thinking about models.

"What library do I use?" is the wrong question. "What model do I use?" is the right question. Libraries are just part of the process of answering that question.

That said, high-quality implementations of interesting time series models seem hard to come by, so it's still a legitimate question to ask about libraries. But consider the goal of asking about libraries: you want to find high-quality implementations of useful models, not a magic black box that you can crank data through.


👤 bayan1234
The forecast package in R is quite useful. Even if you don't use R, this book by Rob Hyndman is very approachable and easy to follow.

https://otexts.com/fpp2/


👤 isoprophlex
Can you reframe the problem to suit a more classical approach - regression using xgboost or lgbm? If so, go for that!

As an example, imagine you want to calculate only a single sample into the future. Say furthermore that you have six input timeseries sampled hourly, and you don't expect meaningful correlation beyond 48h old samples.

You create 6x48 input features, take the single target value you want to predict as the output, and feed this into your run-of-the-mill gradient-boosted tree.

The above gives you a less complex approach than reaching for bespoke time-series tooling; I've personally had success doing something like this.

If your regressor does not support multiple outputs, you can always wrap it in sklearn's MultiOutputRegressor (or optionally RegressorChain; check it out). This is useful if, in the above example, you are not looking to predict only the next sample, but maybe the next 12 samples.

https://scikit-learn.org/stable/modules/generated/sklearn.mu...
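
A rough sketch of that reframing with plain sklearn (the hourly DataFrame df, its column names, and the 12-step horizon are placeholders for illustration):

  import numpy as np
  from sklearn.ensemble import HistGradientBoostingRegressor
  from sklearn.multioutput import MultiOutputRegressor

  # df: hourly DataFrame with six input columns s0..s5 and a "target" column (placeholder)
  def make_lag_features(df, input_cols, target_col, n_lags=48, horizon=12):
      inputs = df[input_cols].to_numpy()
      target = df[target_col].to_numpy()
      X, y = [], []
      for t in range(n_lags, len(df) - horizon + 1):
          X.append(inputs[t - n_lags:t].ravel())   # 6 x 48 = 288 lagged features
          y.append(target[t:t + horizon])          # the next 12 target samples
      return np.array(X), np.array(y)

  X, y = make_lag_features(df, [f"s{i}" for i in range(6)], "target")

  # Wrap a single-output regressor so it predicts all 12 future samples at once
  model = MultiOutputRegressor(HistGradientBoostingRegressor())
  model.fit(X, y)
  next_12 = model.predict(X[-1:])   # shape (1, 12)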


👤 ollysb
Darts gives you a lot of options, including newer deep learning approaches like NBEATS and NHiTS.

https://unit8co.github.io/darts/


👤 dxbydt
Peter Cotton has at least a dozen very credible studies/results on Prophet vs other timeseries libraries. Before committing to Prophet, please check out a few of these (all over LinkedIn). His tone is acerbic because he believes Prophet is suboptimal & makes poor forecasts compared to the other contenders. That said, you can ignore the tone, just download the packages & test out the scenarios for yourself. I personally will not use Prophet. Like most stat tools in the Python ecosystem, it is super easy to deploy & code up, but often inaccurate if you actually care about the results. Of course, if it's some sales prediction forecast where everything's pretty much made up & data is sparse/unverifiable, then Prophet ftw.

👤 Fiahil
XGBoost, LGBM, pmdarima, stanpy (for Bayesian modelling). Plus a few others.

Don't ask me what they do with all of these, I'm just the guy who makes sure the forecasts stay reproducible.


👤 d4rti
Stuff I've used:

  - Prophet - seems to be the current 'standard' choice
  - ARIMA - Classical choice
  - Exponential Moving Average - dead simple to implement, works well for stuff that's a time series but not very seasonal
  - Kalman/Statespace model - used by Splunk's predict[1] command (pretty sure I always used LLP5)
I did some anomaly detection work on business transactions and found the best way was to create a sort of ensemble model: we applied all the models, kept any anomalies, then used simple rules to only alert on 'interesting' anomalies, like:

  - 2-3 anomalies in a row
  - high deviation from expected
  - multiple models all detected anomaly
These rules improved the signal-to-noise ratio.
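
As a toy illustration of the EMA variant plus the 'interesting anomaly' rules (the column name, span, and thresholds are made up):

  import pandas as pd

  def ema_anomalies(series: pd.Series, span: int = 24, k: float = 3.0) -> pd.Series:
      ema = series.ewm(span=span, adjust=False).mean()
      resid = series - ema
      sigma = resid.ewm(span=span, adjust=False).std()
      return resid.abs() > k * sigma   # True where the point deviates strongly from the EMA

  flags = ema_anomalies(df["transactions"])   # df is a placeholder hourly DataFrame
  # One simple "interesting" rule: only alert on 2+ anomalies in a row
  alerts = df.index[flags & flags.shift(1, fill_value=False)]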

[1] https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchRef...


👤 qsort
It really depends on what the goal is.

Get some forecasts quickly => FB Prophet. It's not as good as they'd have you believe, but it's fast and analysts can play with it to some extent.

Outlier detection => Hand-rolled C++ ETS framework.

Multilevel predictions and/or more complex tasks => That's where neural models start to have the edge, but at that point it's a costly project. I like simpler stuff to start, moving to the big guns if/when it's needed.


👤 tfehring
For cases that Prophet doesn't cover I recommend bsts [0], which is much more flexible and powerful. Anything too complicated for bsts, I'll typically implement in Stan.

[0] https://cran.r-project.org/web/packages/bsts/bsts.pdf


👤 venk12
I once built a forecasting framework for a unicorn startup. Revenue and pipeline predictability were key as the company was going through its IPO phase. I took three approaches and ensembled them to predict revenue and pipeline.

1. Time-series forecast based on revenue (the one OP is referring to). All the statistical time-series models come in here. I primarily used H2O.ai for this.

2. Conversion-based revenue forecast (input -> pipeline, output -> revenue). This proved to be quite tricky as there was a time lag between pipeline creation and revenue conversion.

3. Delphi method: got the sales/pre-sales folks on the ground to predict a bottom-up number and used that as a forecast.

Finally, I combined them by applying weights to the above approaches, based on how accurate each was on the test dataset.
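
For example, the weighting step could look roughly like this (the forecast arrays, y_test, and the inverse-MAE weights are just placeholders for illustration):

  import numpy as np

  def inverse_error_weights(y_true, forecasts):
      # Weight each approach by the inverse of its mean absolute error on the test set
      maes = np.array([np.mean(np.abs(y_true - f)) for f in forecasts])
      inv = 1.0 / maes
      return inv / inv.sum()

  forecasts = [ts_forecast, conversion_forecast, delphi_forecast]   # placeholder arrays
  w = inverse_error_weights(y_test, forecasts)
  combined = sum(wi * f for wi, f in zip(w, forecasts))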

IMHO, like many here have pointed out, the model/assumptions are more important than the library. The job of a data scientist is to make the prediction as reliable and explainable as possible.


👤 plutonic
As a few other people have mentioned, I find R to be the easiest tool for this job, specifically the forecast package [0]. I had to use this package for an applied econometrics course in college a few years ago, and I have been using it ever since. I find the syntax to be more straightforward than comparable libraries in Python. I also assume that this library (and other libraries in R) offers higher-quality models and results than their counterparts in Python, but this is just an assumption.

[0] https://github.com/robjhyndman/forecast


👤 thegginthesky
Sktime is the best toolkit for time series out there. It provides a sklearn-like API for many models, plus modules for validation, metrics for evaluation, and all that sklearn jazz.

Besides that, I also like statsmodels as the docs are pretty good.


👤 Imanari
For feature engineering check out tsfresh and sktime, especially the minirocket algorithm.

https://tsfresh.readthedocs.io/en/latest/

https://www.sktime.org/en/v0.8.2/api_reference/auto_generate...
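
A minimal tsfresh sketch (the long-format DataFrame long_df with id/time/value columns and the per-series target y are placeholders):

  from tsfresh import extract_features, select_features
  from tsfresh.utilities.dataframe_functions import impute

  # long_df: one row per observation, with columns "id", "time", "value" (placeholder)
  features = extract_features(long_df, column_id="id", column_sort="time")
  impute(features)                         # replace NaN/inf left by undefined features
  relevant = select_features(features, y)  # keep only features predictive of the target y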


👤 boringg
I'll add that the better question is which libraries you use to clean up all the data before fitting the model, a.k.a. the real heavy lifting ;)


👤 jstx1
Prophet, statsmodels, tf.keras for RNNs.

👤 mharig
I like statsmodels. So far it has all the methods I need, and it is very well documented. But I am just fiddling a little bit with my 'weather station'. No bleeding edge here.
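
For example, a Holt-Winters fit in statsmodels looks roughly like this (an hourly series with daily seasonality is just an assumed example):

  from statsmodels.tsa.holtwinters import ExponentialSmoothing

  # series: a pandas Series of hourly temperature readings (placeholder)
  fit = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=24).fit()
  forecast = fit.forecast(48)   # next 48 hours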

👤 dmfolgado
For feature extraction check out tsfel:

https://github.com/fraunhoferportugal/tsfel


👤 nl
I really like FB's Prophet: https://facebook.github.io/prophet/

👤 boredemployee
What about adding external variables like weather/rain to check the impact on sales? What do you guys recommend?

👤 crimsoneer
For time series, classical methods (ARIMA etc) still perform very well for most problems.

👤 ssequeira
I'll often use TensorFlow Probability's time series package.

👤 angrycontrarian
State of the art is 1D convnets, bleeding edge is transformers.

👤 fzliu
PyTorch for recurrent nets (TensorFlow would work too).

👤 speedgoose
I was told to start with XGBoost. Is Prophet better?

👤 oneoff786
Everything is either light gbm or an early exploratory experiment.

👤 heloitsme22
Depends on the day. Sometimes it may be good, sometimes it may be shit.

👤 curiousgal
Time series analysis is where R shines compared to Python.

👤 pruthvishetty
Sktime

👤 nceasy
Anyone ever tried pycaret timeseries?


👤 siilats
The easiest is to use cvxpy with your own objective function. You can easily add seasonality, regularization, etc.; other things are too much of a black box. Also pivot tables: they are free now in the online versions of Google Sheets and Excel. Set the time as a row field and it will automatically aggregate. Or, if you want irregular spacing, you can group by 100 samples.
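
A rough sketch of what that can look like (the data array y, the weekly period, and the penalty weights are placeholders):

  import cvxpy as cp
  import numpy as np

  # y: observed values as a 1-D numpy array (placeholder)
  n = len(y)
  trend = cp.Variable(n)
  seasonal = cp.Variable(n)

  objective = (
      cp.sum_squares(y - trend - seasonal)                    # fit the data
      + 10.0 * cp.sum_squares(cp.diff(trend, 2))              # smooth trend (l2 trend filtering)
      + 1.0 * cp.sum_squares(seasonal[7:] - seasonal[:-7])    # encourage a repeating weekly pattern
  )
  problem = cp.Problem(cp.Minimize(objective), [cp.sum(seasonal[:7]) == 0])
  problem.solve()
  fitted = trend.value + seasonal.value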

👤 jll29
Former Reuters Research Director here.

When modeling time series, you will want a model that is sensitive both to short-term and longer-term movements. In other words, a Long Short-Term Memory (LSTM) network.

Sepp Hochreiter invented this concept in his Master's thesis supervised by Jürgen Schmidhuber in Munich in the 1990s; today, it's the most-cited type of neural network.

Here are papers describing it: https://people.idsia.ch/~juergen/rnn.html

In Python, you can use TensorFlow's LSTMCell class: https://www.datacamp.com/tutorial/lstm-python-stock-market
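
A minimal Keras sketch of an LSTM forecaster (the window length, layer sizes, and the training arrays X and y are placeholders; the linked tutorial goes into more depth):

  import tensorflow as tf

  window = 48  # hours of history per training example (placeholder)

  model = tf.keras.Sequential([
      tf.keras.layers.Input(shape=(window, 1)),
      tf.keras.layers.LSTM(64),
      tf.keras.layers.Dense(1),   # predict the next value
  ])
  model.compile(optimizer="adam", loss="mse")
  # X has shape (samples, window, 1); y has shape (samples, 1)
  model.fit(X, y, epochs=10, batch_size=32)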