HACKER Q&A
📣 spronket_news

Alternatives to Common Crawl?


  👤 spronket_news Accepted Answer ✓
I'm trying to use Common Crawl for an ML project/search engine, but:

- Requests to download even a small amount of data get rate limited (the server says "slow down" / "too many requests").
- It seems like this is a known issue and that Common Crawl is no longer well maintained: https://groups.google.com/g/common-crawl/c/BvMGYUY-dro

Are there any alternatives for accessing a large amount of web crawl data?

Thanks!


👤 ccgreg
Please check out the blog post at:

https://commoncrawl.org/blog/oct-nov-2023-performance-issues

And the new status website at:

https://status.commoncrawl.org/
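
For the "slow down / too many requests" responses mentioned in the question, a common client-side workaround is to retry with exponential backoff and jitter. The following is only a minimal sketch, not an official Common Crawl recommendation. It assumes throttling shows up as HTTP 429 or 503; the names `fetch_with_backoff`, `backoff_delay`, and `RETRYABLE` are hypothetical, and the attempt count and delays are illustrative defaults.

```python
import random
import time
import urllib.error
import urllib.request

# Assumption: the server signals throttling with one of these status codes.
RETRYABLE = {429, 503}


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Upper bound in seconds for a 0-based attempt: base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(url, max_attempts=6,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, retrying throttled responses with exponential backoff.

    `opener` and `sleep` are injectable so the retry logic can be
    exercised without touching the network.
    """
    for attempt in range(max_attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Give up on non-throttling errors or once attempts run out.
            if err.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            # "Full jitter": sleep a random time up to the current bound,
            # so many clients do not retry in lockstep.
            sleep(random.uniform(0, backoff_delay(attempt)))
```

Call it as `fetch_with_backoff(url)` with whatever file URL you were requesting. Keeping requests sequential rather than heavily parallel will also likely reduce how often you hit the limit.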