HACKER Q&A
📣 dennisy

Why is there no adopted standard for web scraping?


I feel many sites benefit from scraping, or are happy to allow it within certain bounds. My question is why have no standards emerged around how data is structured and accessed for web scraping?

I am aware of standards such as Open Graph - is this the closest we have gotten to a standard of machine accessible data on the web?


👤 edent Accepted Answer ✓
Schema.org is probably the closest thing.

But, if people want to make their data available, an API is probably the best way.
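
For what it's worth, many sites that publish Schema.org data embed it as JSON-LD inside a `<script type="application/ld+json">` tag, so a consumer can read it without guessing at the page layout. Here is a rough sketch of that using only the Python standard library. The sample page, the product, and the names `JsonLdExtractor` and `extract_jsonld` are made up for illustration; real pages vary a lot, and some use Microdata or RDFa instead.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script tag opens.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Skip malformed blocks instead of failing the whole page.
                pass


def extract_jsonld(html):
    """Return a list of decoded JSON-LD objects found in the HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page, invented for this example.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}}
</script>
</head><body><h1>Example Widget</h1></body></html>
"""

items = extract_jsonld(SAMPLE_HTML)
for item in items:
    print(item["@type"], item["name"], item["offers"]["price"])
```

This only covers sites that choose to publish the markup, which is the same limitation an API has.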


👤 friend-monoid
I feel like there have been a lot of attempts at this: the semantic web, HTML5 semantic tags (article, header, ...), things like robots.txt. Developers really like to generalise.

Getting everyone to cooperate is the hard part.
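
As an example of the robots.txt part, here is a small sketch of how a scraper might check a site's rules before fetching, using `urllib.robotparser` from the Python standard library. The robots.txt contents, the `example-bot` user agent, and the example.com URLs are all invented; a real scraper would download the file from the site instead of hard-coding it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

# "example-bot" is a made-up user agent string.
allowed_public = rules.can_fetch("example-bot", "https://example.com/articles/1")
allowed_private = rules.can_fetch("example-bot", "https://example.com/private/data")

print("public page allowed:", allowed_public)
print("private page allowed:", allowed_private)
```

Note that this only tells a well-behaved scraper where it may go. It says nothing about how the data on those pages is structured, and nothing forces a scraper to honour it.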