HACKER Q&A
📣 kisamoto

Web-scraping – Do patterns/recipes exist for common scraping targets?


I'm fairly familiar with web scraping/crawling, but I was wondering whether there is a company or tool that offers reusable modules for scraping common websites.

Examples could include: scraping article texts from news websites; extracting recipes from Good Food etc.

Rather than rewriting what others have already built, is there an existing library of these scrapers/crawlers to use 'out of the box'?


  👤 abarrettwilsdon Accepted Answer ✓
Not exactly what you're looking for, but there's an OSS Chrome extension that lets you record your actions in the browser and transcribes them into Nightmare.js code:

https://github.com/segmentio/daydream

That's probably the best you're going to get: most things worth scraping are worth money, and as such are not freely available.


👤 mendelevium
For extracting news articles: https://newspaper.readthedocs.io/en/latest/
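
To give a rough idea of what an article-text extractor does, here is a minimal, dependency-free sketch. It is not newspaper's API or its algorithm: it swaps in a much cruder technique (collect the text of every <p> tag using Python's standard html.parser). The names ParagraphExtractor and extract_article_text are made up for illustration.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical helper: collects the text found inside <p> tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraphs, in document order
        self._buffer = None    # text pieces of the <p> currently open, else None
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def _flush(self):
        # Close the current paragraph, collapsing runs of whitespace.
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._flush()      # tolerate an earlier <p> that was never closed
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        # Keep text only while inside a <p> and outside script/style.
        if self._buffer is not None and not self._skip_depth:
            self._buffer.append(data)


def extract_article_text(html):
    """Return the page's paragraph text, one blank line between paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()            # pick up a trailing unclosed <p>
    return "\n\n".join(parser.paragraphs)
```

On real news pages this will also pick up cookie notices, captions and the like, which is the part a dedicated library tries to handle for you.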

👤 atmosx

👤 thedevindevops
What you're looking for sounds very open-ended, but the closest thing I can think of is the Huginn project on GitHub.

👤 kilroy123
This is sorely needed IMO.