HACKER Q&A
📣 hacky_n00b

Where to get (free) trading data?


Looking for trades data in a file (or files) as opposed to an API, which I can parse and train an ML model on. Any country/exchange.


  👤 Jugglerofworlds Accepted Answer ✓
I went down this path about a month ago. Don't expect to find any good-quality stock data for free. If you do need stock data, I recommend IEX, but expect to pay a good amount of money for any sizable amount of it. I paid around $100 to scrape the historical daily data for the Russell 2000. Getting intraday data for any sizable time period would cost an astronomical amount of money because of the way IEX charges for its data.

For forex, the situation is a bit better - you can get information from Dukascopy/Tickstory for free. For cryptocurrency there is data available from Binance.

There is also QuantConnect, which is an online IDE/system for developing manually coded trade bots. They have historical data for a wide range of financial products and it's all available for free. The catch is that the data can't leave their system, which eliminates the possibility of training any sort of advanced machine learning model.

Edit: for these types of questions I would recommend searching /r/algotrading


👤 rokobobo
Probably the first place to look would be CRSP (crsp.org) and WRDS (wrds-www.wharton.upenn.edu). You would need a .edu email to set up an account and get the data. (If you don't have one, hopefully you can find a buddy who does.)

In terms of pulling price history on-demand, Yahoo Finance seems to be a popular choice, but I don't think it will allow you to download a large enough dataset for meaningful training--that said, I haven't tried it, so I invite anyone to share their experience.

Additionally, you may want to try searching for the most cited papers on "ML in trading" and see what kind of datasets they use. Be prepared for a lot of grunt work formatting the data before you feed it into your model: those papers might also give you a lot of context on how to do this grunt work right the first time.
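
To give a rough idea of what that grunt work can look like, here is a minimal sketch in Python. It assumes you already have a chronological list of closing prices; the prices, the function names, and the choice of log returns with a fixed lookback window are all made up for illustration and are not taken from any particular paper.

```python
import math

def log_returns(closes):
    """Convert a chronological list of closing prices into log returns."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

def make_windows(series, lookback):
    """Build (features, target) pairs: `lookback` past values, then the next value."""
    samples = []
    for i in range(lookback, len(series)):
        samples.append((series[i - lookback:i], series[i]))
    return samples

def chronological_split(samples, train_fraction=0.8):
    """Split without shuffling, so every test sample is later than every training sample."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

# Hypothetical prices, for illustration only.
closes = [100.0, 101.0, 99.5, 102.0, 103.5, 103.0, 104.2]
returns = log_returns(closes)
samples = make_windows(returns, lookback=3)
train, test = chronological_split(samples, train_fraction=0.67)
```

The split is deliberately not shuffled: with time series, a random split lets the model train on data from after the period it is tested on, which tends to make back-test results look better than they are.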



👤 _ah
There's some good historical data on SimFin (simfin.com), but it's missing the most recent data unless you pay. I suppose you could work on training your models with the free data dump, do some back-testing, and then pay for the latest stuff if you have a viable model. SimFin is nice because they offer fundamental data (quarterly reports) and not just price data.

👤 david-gpu
Alpha Vantage offers a free API to query some data. Fetching it and saving it to a CSV is not rocket science. If that is difficult, you may find that training an ML system is pretty hard, too.
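
For anyone who wants a starting point, here is a rough sketch of the fetch-and-save step using only the Python standard library. The endpoint, the `TIME_SERIES_DAILY` function name, and the JSON key names ("Time Series (Daily)", "1. open", and so on) are written from memory of the Alpha Vantage docs, so treat them as assumptions and check them against the current documentation. The helper names and the sample payload are made up for illustration.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.alphavantage.co/query"  # assumed endpoint, verify in the docs

def build_url(symbol, api_key):
    """Build the query URL for daily bars (parameter names are assumptions)."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    return BASE_URL + "?" + urllib.parse.urlencode(params)

def fetch_daily(symbol, api_key):
    """Download and decode the JSON payload. Needs network access and a real key."""
    with urllib.request.urlopen(build_url(symbol, api_key)) as resp:
        return json.load(resp)

def payload_to_rows(payload):
    """Flatten the assumed JSON layout into rows sorted by date, oldest first."""
    series = payload["Time Series (Daily)"]
    rows = []
    for date in sorted(series):
        bar = series[date]
        rows.append([date, bar["1. open"], bar["2. high"], bar["3. low"],
                     bar["4. close"], bar["5. volume"]])
    return rows

def write_csv(rows, out):
    """Write a header plus the rows to any text file-like object."""
    writer = csv.writer(out)
    writer.writerow(["date", "open", "high", "low", "close", "volume"])
    writer.writerows(rows)

# Made-up payload in the assumed layout, so the conversion runs offline.
sample = {
    "Time Series (Daily)": {
        "2020-01-03": {"1. open": "10.5", "2. high": "11.0", "3. low": "10.1",
                       "4. close": "10.8", "5. volume": "1200"},
        "2020-01-02": {"1. open": "10.0", "2. high": "10.6", "3. low": "9.9",
                       "4. close": "10.5", "5. volume": "1000"},
    }
}
rows = payload_to_rows(sample)
buf = io.StringIO()
write_csv(rows, buf)
csv_text = buf.getvalue()

# Real usage would look something like this (not run here):
# with open("IBM_daily.csv", "w", newline="") as f:
#     write_csv(payload_to_rows(fetch_daily("IBM", "YOUR_API_KEY")), f)
```

Expect the free tier to be rate limited, so pulling many symbols probably means adding a sleep between requests and caching what you have already downloaded.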

👤 jklein11
You get what you pay for.