The only problem is that I don't have any experience collecting data at that scale. Does anyone have experience scraping Reddit and can offer pointers on how to approach it? I'm also having a hard time figuring out whether Reddit even allows this.
As for the other part, getting the data: it's called scraping. Depending on your experience with scraping, you may need to pay for parts of it (e.g., a large list of proxies so Reddit does not block you, or a scraping API). Or maybe your project is small enough (or your deadline loose enough) that you can slowly siphon the data yourself.
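If you go the slow route, the core idea is just "one request, then wait, then the next." Here is a minimal sketch of that in Python using only the standard library. The names `fetch_url` and `crawl_slowly`, the 2-second delay, and the User-Agent string are all my own placeholders, not anything Reddit specifies, so treat them as assumptions and tune them to your project.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def fetch_url(url: str, user_agent: str = "example-research-script/0.1") -> bytes:
    """Download one page, identifying the script with a User-Agent header.

    The User-Agent value here is a hypothetical placeholder.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def crawl_slowly(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    delay_seconds: float = 2.0,  # assumed value, not an official limit
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) pairs, pausing between consecutive requests."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        yield url, fetch(url)
```

You would call it with your own list of page URLs, something like `for url, body in crawl_slowly(my_urls): ...`, and save each `body` to disk as it arrives. Passing `fetch` and `sleep` in as parameters is only there so you can try the loop with fake functions before pointing it at a real site.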
As for whether Reddit allows it: read up on the legality of scraping in general, then apply that to Reddit.