HACKER Q&A
📣 hnthrowaway1099

What are the funnest data sets you’ve played with?


Trying to battle-test our startup product, which helps users prepare data for data science, and looking for some interesting data sets to play with. Ideally something publicly downloadable, non-controversial, and under 1 GB.


  👤 gennarro Accepted Answer ✓
US auto data is a lot of fun. Useful for all sorts of projects.

Data: https://www.transportation.gov/developer

Example: https://transportation.report/

Well under 1 GB if you focus on models and specs, not safety!


👤 giraffe_lady
Destiny 2 has an API that a lot of great third-party organizational tools are built around. But you can also download a SQLite file with all the set game info in it: weapons, abilities, perks, items, upgrades, etc. I haven't messed with it in a while, but I remember it being pretty small, one or two hundred MB.

It's highly normalized, to the point where it goes backwards and starts to actually interfere with human usability. Not sure if that helps your goal or hinders it. I used to use it for teaching both SQL and the pros, cons, and usefulness of the normal forms.
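
For anyone who wants a feel for what "normalized to the point of hurting usability" looks like, here's a rough sketch in Python with the standard sqlite3 module. The schema is entirely made up for illustration (the table names, columns, and rows are hypothetical and are not the real Destiny 2 tables). The point is only that answering a simple question like "which perks does each weapon have?" already takes three joins.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical normalized schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical tables, NOT the actual game schema.
        CREATE TABLE weapon_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE perk (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE weapon (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            type_id INTEGER NOT NULL REFERENCES weapon_type(id)
        );
        -- Many-to-many link table between weapons and perks.
        CREATE TABLE weapon_perk (
            weapon_id INTEGER NOT NULL REFERENCES weapon(id),
            perk_id INTEGER NOT NULL REFERENCES perk(id),
            PRIMARY KEY (weapon_id, perk_id)
        );
        INSERT INTO weapon_type VALUES (1, 'Hand Cannon'), (2, 'Bow');
        INSERT INTO perk VALUES (1, 'Fast Reload'), (2, 'Long Range');
        INSERT INTO weapon VALUES
            (1, 'Example Revolver', 1),
            (2, 'Example Longbow', 2);
        INSERT INTO weapon_perk VALUES (1, 1), (1, 2), (2, 2);
        """
    )
    return conn


def weapons_with_perks(conn):
    """Return one (weapon, type, perk) row per weapon/perk pair."""
    return conn.execute(
        """
        SELECT w.name, t.name, p.name
        FROM weapon AS w
        JOIN weapon_type AS t ON t.id = w.type_id
        JOIN weapon_perk AS wp ON wp.weapon_id = w.id
        JOIN perk AS p ON p.id = wp.perk_id
        ORDER BY w.id, p.id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in weapons_with_perks(build_demo_db()):
        print(row)
```

No fact is stored twice, which is the upside of the normal forms. The downside is that you can't just open one table and read it, which is the usability cost described above.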


👤 vital_beach
Wahapedia has a pretty unusual Warhammer 40k dataset

general overview: https://wahapedia.ru/wh40k9ed/the-rules/data-export/

spreadsheet with dataset links: https://wahapedia.ru/wh40k9ed/Export%20Data%20Specs.xlsx


👤 connordoner
I took a list of UK Parliamentary constituencies and spent a while matching each with its International Territorial Levels (the post-Brexit version of NUTS).

It’s here as a CSV if you’d like it: https://gist.github.com/connordoner/9cda1857b8fff5b8e042013d.... There’s no license attached so do as you wish.


👤 chunkyks
Early in COVID, I played with a famous 65-year-venerated dataset from where I work. Found some interesting minutiae and ended up on the front page of the WSJ over it.

https://github.com/RANDCorporation/milliondigits

It's probably not what you're looking for, but it's my own favorite dataset.


👤 trynewideas
I learned GIS from John Mechalas's effort to georeference and create a base world map for the Pathfinder Roleplaying Game: https://dungeonetics.com/golarion-geography/index.html

👤 hnthrowaway1099
Kaggle has a fair number of datasets

https://www.kaggle.com/datasets?fileType=csv


👤 mhh__
Not super exciting but tire data# is both hard to model and very expensive.

# Data from spinning a tire while crushing it into a "road" at an angle