I am trying to create a tool for myself that crawls a few websites I regularly visit, so I can compare the price of the same item across them. None of them has an API.
A couple of questions:
1. Is this legal?
2. If I am crawling, what is the best way to approach it? Does the crawling logic for each website have to be written by hand because every site is different, or is there a strategy that scales if I want to add more sites in the future?
Thank you!
-F75
A simple way would be a headless browser [1].
There are also hosted tools that let you set up a scraper visually, much like a website builder.
The best way is: keep it simple and hold back (check once an hour or once a day, not every minute).
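To make the "hold back" part concrete, here is a minimal sketch of a per-URL throttle. The class name `PoliteScheduler`, its parameters, and the example URL are all made up for illustration; the one-hour default just mirrors the "once an hour" suggestion above.

```python
import time


class PoliteScheduler:
    """Allows a fetch of a given URL at most once per min_interval_seconds.

    Hypothetical helper for illustration, not part of any library.
    """

    def __init__(self, min_interval_seconds=3600, clock=time.monotonic):
        self.min_interval = min_interval_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._last_fetch = {}       # url -> time of the last allowed fetch

    def should_fetch(self, url):
        """Return True (and record the time) if the URL is due for a check."""
        now = self.clock()
        last = self._last_fetch.get(url)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_fetch[url] = now
        return True


if __name__ == "__main__":
    scheduler = PoliteScheduler(min_interval_seconds=3600)
    url = "https://shop.example/item/1"  # placeholder URL
    print(scheduler.should_fetch(url))   # first check is allowed
    print(scheduler.should_fetch(url))   # an immediate re-check is refused
```

You would call `should_fetch` before every request and simply skip the page when it returns False. For a personal tool, a cron job that runs once a day gets you the same effect with even less code.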
Many shops embed Schema.org markup for their products. Where a site supports it, one generic extractor can read the price, so you don’t have to write a parser for every site.
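As a rough sketch of what such a generic extractor could look like, the following uses only the Python standard library and assumes the shop publishes its Schema.org data as JSON-LD in a `<script type="application/ld+json">` tag with a top-level `Product` object. Sites that use microdata attributes, an `@graph` wrapper, or a list-valued `@type` would need extra handling. The names `JsonLdParser`, `extract_offers`, and the sample page are invented for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_offers(html):
    """Return name, price and currency for each top-level Product found."""
    parser = JsonLdParser()
    parser.feed(html)
    results = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers")
            if isinstance(offers, list):  # several offers: take the first one
                offers = offers[0] if offers else None
            if not isinstance(offers, dict):
                offers = {}
            results.append({
                "name": item.get("name"),
                "price": offers.get("price"),
                "currency": offers.get("priceCurrency"),
            })
    return results


# Made-up sample page for demonstration.
SAMPLE = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}}
</script></head><body></body></html>"""

if __name__ == "__main__":
    print(extract_offers(SAMPLE))
```

The same function can be pointed at any shop that publishes this markup, which is what makes it scale. Sites without it still need their own extraction rules.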
You could also use a library that parses the raw HTML and supports CSS selectors. Then each site only needs its own small set of selectors for extraction.
[1] https://www.atlantbh.com/building-a-dynamic-crawler-with-pup...