HACKER Q&A
📣 jerdthenerd

Reddit API vs. Browser Requests


I have been following the Reddit API saga quite closely, and I understand how and why Reddit as a company has an incentive to effectively take third-party apps off the market.

My question is, what is stopping someone from simply writing a web scraper that acts as if it's a web browser, scrapes the actual subreddit pages (via reddit.com, not api.reddit.com), and stores the posts in a local cache? I'm picturing an app that runs on popular NAS software such as TrueNAS, Synology, etc., so storage is not an issue.

Is there a way for Reddit to detect that this isn't authentic traffic from an actual user? If the web scraper authenticates as a normal user, and respects the request throttling, wouldn't it just fly under the radar as a particularly addicted user?
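
To make the idea concrete, here is a minimal sketch of the throttle-and-cache part only. It is not tied to Reddit: `ThrottledCache`, `min_interval`, and the injected `fetch` callable are all hypothetical names I made up for illustration, and the 2-second default is an arbitrary assumption, not a documented limit.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledCache:
    """Fetch each URL at most once, spacing real requests by min_interval seconds.

    All names here are hypothetical; `fetch` is whatever function actually
    retrieves a page (it is injected so the sketch stays network-free).
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        min_interval: float = 2.0,  # assumed spacing, not a real Reddit limit
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None
        self._cache: Dict[str, str] = {}

    def get(self, url: str) -> str:
        # Serve from the local cache without touching the network.
        if url in self._cache:
            return self._cache[url]
        # Wait until at least min_interval has passed since the last request.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        body = self._fetch(url)
        self._cache[url] = body
        return body
```

On a NAS the in-memory dict would presumably be swapped for files or a database, but the throttling logic would stay the same.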


👤 alexdanilowicz Accepted Answer ✓
I imagine engagement metrics would make it pretty obvious whether traffic comes from a regular user (scrolling, stopping to read, upvoting) or from a robot.

Not to mention the sheer amount of content you'd have to scrape, which would definitely surpass "normal" user engagement.
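
As a rough illustration of the kind of signal I mean, here is a toy heuristic that looks only at request timestamps. It is a sketch under my own assumptions, not how Reddit actually detects bots: the function name `looks_automated` and both thresholds (`max_per_hour`, `min_cv`) are made up. The idea is that a throttled scraper tends to produce either too many page loads or suspiciously even gaps between them, while a person's gaps vary a lot.

```python
from statistics import mean, pstdev
from typing import Sequence


def looks_automated(
    timestamps: Sequence[float],
    max_per_hour: float = 600.0,  # hypothetical ceiling on page loads per hour
    min_cv: float = 0.3,          # hypothetical floor on gap variability
) -> bool:
    """Flag a session whose request timing looks machine-like.

    `timestamps` are request times in seconds. Returns False when there are
    too few requests to judge.
    """
    if len(timestamps) < 3:
        return False
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    duration = ts[-1] - ts[0]
    if duration <= 0:
        # Several requests at the same instant: not plausible for a person.
        return True
    rate_per_hour = len(gaps) / duration * 3600
    # Coefficient of variation: near 0 means perfectly regular spacing.
    cv = pstdev(gaps) / mean(gaps)
    return rate_per_hour > max_per_hour or cv < min_cv
```

A scraper that sleeps exactly two seconds between requests would trip the regularity check even though its volume is modest. Adding random jitter would get past this particular toy check, but it would not reproduce the scrolling, reading, and voting behavior mentioned above.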


👤 harrelchris
Scraping will only enable reading from Reddit. To write to Reddit or to read/write private user data, you would need to automate a browser and handle user credentials in plaintext.