My system needs a fairly high volume of articles (thousands) to train a machine learning model, so I have been focusing on high-volume feeds like preprints from the arXiv, news from the Guardian, etc.
Superfeedr charges 10¢ per feed per month, and my bill right now is $3 a month. Subscribing to about 100 feeds on Superfeedr seems reasonable, but subscribing to 1000 feeds (about $100 a month) seems pricey to me.
There are a lot of independent blogs out there that publish an article every week or every six months. What they all have in common is that somebody has to poll an RSS file many, many times for every article ingested. With Superfeedr that means high costs, and it is a hassle even if I build my own RSS ingest system.
One thing that would help would be consolidated RSS feeds that aggregate posts from a large number of blogs. Are there good ones out there? Are there other answers to the problem of polling hundreds or thousands of independent blogs?
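For a self-built ingest system, much of the polling cost can be cut with HTTP conditional GET: when the server supports it, sending back the ETag and Last-Modified values from the previous response lets an unchanged feed answer with a tiny 304 Not Modified. Pairing that with an adaptive interval that backs off on quiet blogs means a blog that posts every six months gets checked rarely. The sketch below uses only the Python standard library. `FeedState`, `poll`, `next_interval`, and the 1-hour/7-day bounds are hypothetical choices for illustration, not part of Superfeedr or any existing tool, and it treats any 200 response as "changed" even though a real system would also compare item GUIDs.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional

MIN_INTERVAL_H = 1.0       # assumption: never poll a feed more than hourly
MAX_INTERVAL_H = 24.0 * 7  # assumption: still check dormant blogs weekly


@dataclass
class FeedState:
    url: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    interval_h: float = MIN_INTERVAL_H


def conditional_headers(state: FeedState) -> dict:
    """Build If-None-Match / If-Modified-Since headers from the last response."""
    headers = {}
    if state.etag:
        headers["If-None-Match"] = state.etag
    if state.last_modified:
        headers["If-Modified-Since"] = state.last_modified
    return headers


def next_interval(current_h: float, changed: bool) -> float:
    """Halve the interval when the feed changed, double it when it did not."""
    new_h = current_h / 2 if changed else current_h * 2
    return max(MIN_INTERVAL_H, min(MAX_INTERVAL_H, new_h))


def poll(state: FeedState, opener: Callable = urllib.request.urlopen) -> Optional[bytes]:
    """Fetch the feed; return its body on 200, or None on 304 Not Modified."""
    req = urllib.request.Request(state.url, headers=conditional_headers(state))
    try:
        with opener(req, timeout=30) as resp:
            body = resp.read()
            # Remember validators for the next conditional request.
            state.etag = resp.headers.get("ETag")
            state.last_modified = resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code != 304:
            raise
        # urllib raises HTTPError for 304; treat it as "nothing new" and back off.
        state.interval_h = next_interval(state.interval_h, changed=False)
        return None
    # Simplification: any 200 counts as new content.
    state.interval_h = next_interval(state.interval_h, changed=True)
    return body
```

A scheduler would then call `poll` for each feed once its `interval_h` has elapsed, so quiet feeds drift toward weekly checks.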
https://en.wikipedia.org/wiki/Planet_(software)
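A Planet-style aggregator still fetches every member feed; what it adds is merging their items into one time-ordered stream. A minimal sketch of that merge step, assuming RSS 2.0 input with `<link>` and RFC 822 `<pubDate>` elements (Atom feeds would need separate handling). `parse_rss_items` and `merge_feeds` are hypothetical names, and this is not how the Planet software itself is implemented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List, Tuple

Entry = Tuple[datetime, str, str]  # (published, title, link)


def parse_rss_items(xml_text: str) -> List[Entry]:
    """Extract (published, title, link) tuples from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries: List[Entry] = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        date_text = item.findtext("pubDate")
        if not link or not date_text:
            continue  # skip items that cannot be identified or dated
        try:
            published = parsedate_to_datetime(date_text)
        except (TypeError, ValueError):
            continue  # skip malformed dates
        if published.tzinfo is None:
            # Assumption: treat dates without an offset as UTC so they compare cleanly.
            published = published.replace(tzinfo=timezone.utc)
        entries.append((published, item.findtext("title", ""), link))
    return entries


def merge_feeds(feed_texts: List[str]) -> List[Entry]:
    """Merge items from many feeds into one newest-first list, deduplicated by link."""
    by_link: Dict[str, Entry] = {}
    for text in feed_texts:
        for entry in parse_rss_items(text):
            by_link.setdefault(entry[2], entry)  # keep the first copy of each link
    return sorted(by_link.values(), key=lambda e: e[0], reverse=True)
```

Deduplicating by link matters because the same post often shows up in several aggregated feeds.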
You may want to be careful about exactly what content from RSS feeds you train an ML model on. If you plan to commercialize your ML model, that may directly infringe the CC licenses under which many blogs provide their content, especially content offered through RSS and to aggregators like "Planets". (This applies to feed content you've already aggregated from Superfeedr as well.) Please use ML responsibly and ethically.