HACKER Q&A
📣 Martin_Johnson

How do big companies get all their information?


A friend and I are starting a company that helps people optimize their TV subscriptions, so they pay the least amount possible for the shows they want to watch. The problem we're having is that we can't find a reliable source of data. I know big companies like Google have web crawlers, but as far as we can tell, Netflix will not let smaller companies scrape its site. Do you guys have any ideas on how to gather information about which online TV services have which shows, and then compile that into a database?


  👤 endisneigh Accepted Answer ✓
I can't speak to the titular question, but for Netflix, just scrape. There's not really anything they can do. Get a 5-person subscription, rotate IPs, and get to work using Playwright/Puppeteer or Selenium.

Personally I wouldn't feel bad morally since you're paying for a subscription and literally just scraping the service and not consuming bandwidth needlessly.
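
For the "compile that into a database" half of the question, here is a rough sketch of what the storage side could look like once you have (service, show) pairs from whatever scraper you end up with. Everything in it is an assumption for illustration: the table names, the `build_db` and `cheapest_bundle` helpers, the placeholder service names, and the prices are all made up. The bundle search is plain brute force, which is probably fine for a handful of services but not for hundreds.

```python
import sqlite3
from itertools import combinations


def build_db(rows, prices):
    """Load scraped data into an in-memory SQLite database.

    rows:   iterable of (service, show) pairs (hypothetical scraper output)
    prices: dict mapping service name -> monthly price (hypothetical values)
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE service (name TEXT PRIMARY KEY, monthly_price REAL)")
    db.execute(
        "CREATE TABLE availability ("
        " service TEXT REFERENCES service(name),"
        " show TEXT,"
        " PRIMARY KEY (service, show))"
    )
    db.executemany("INSERT INTO service VALUES (?, ?)", list(prices.items()))
    # OR IGNORE skips duplicate pairs, so re-running a scrape is harmless.
    db.executemany("INSERT OR IGNORE INTO availability VALUES (?, ?)", list(rows))
    return db


def cheapest_bundle(db, wanted):
    """Return (total_price, services) for the cheapest set of services
    that together carry every show in `wanted`, or None if no set does."""
    wanted = set(wanted)
    if not wanted:
        return (0.0, ())
    prices = dict(db.execute("SELECT name, monthly_price FROM service"))
    catalog = {}
    for service, show in db.execute("SELECT service, show FROM availability"):
        catalog.setdefault(service, set()).add(show)

    best = None
    # Brute force: try every combination of services, smallest first.
    for size in range(1, len(catalog) + 1):
        for combo in combinations(sorted(catalog), size):
            covered = set().union(*(catalog[s] for s in combo))
            if wanted <= covered:
                cost = sum(prices[s] for s in combo)
                if best is None or cost < best[0]:
                    best = (cost, combo)
    return best
```

You would call it as `cheapest_bundle(db, ["Show 1", "Show 3"])` and get back the price plus the tuple of services to subscribe to. Keeping availability as one row per (service, show) pair means a show that sits on several services is just several rows, and a re-scrape can simply insert again.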


👤 Tomte
You should rethink starting that company. I can think of a half dozen shows I'm interested in, and all of them are exclusive to their streaming service. I don't see that trend changing anytime soon.

👤 runawaybottle
Scrape Rotten Tomatoes? Scrape movie news sites? They will talk about the ‘latest on Netflix’, so you've really only got one keyword to look for :p

Reddit? Pirate Bay?
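
The keyword idea above could be sketched roughly like this: given a headline you have already fetched from some news site, check which service names it mentions. The `SERVICES` list and the `services_mentioned` helper are assumptions for illustration (only Netflix comes from this thread), and a bare keyword match will obviously produce false positives, so treat it as a starting filter rather than a data source.

```python
import re

# Hypothetical list of services to look for; extend as needed.
SERVICES = ["Netflix", "Hulu", "Disney+", "Prime Video"]


def services_mentioned(headline, services=SERVICES):
    """Return the service names that appear in `headline` as whole words,
    ignoring case, in the order they are listed in `services`."""
    found = []
    for name in services:
        # Lookarounds instead of \b so names ending in '+' still match,
        # while substrings like "Netflixing" do not.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, headline, flags=re.IGNORECASE):
            found.append(name)
    return found
```

Running every scraped headline through `services_mentioned` gives you candidate (headline, service) pairs that still need a second step to pull out the actual show title.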