HACKER Q&A
📣 DantesTravel

Have you ever used anti-detect browsers for web scraping?


I've been in the web scraping industry for a while, and I often spend time building my own "Swiss Army knife" with Playwright or Selenium for when things get tough. Thanks to a niche Substack I follow, I only discovered today that anti-detect browsers like GoLogin exist. From what I can see, they look like a good solution for small projects but are hard to scale to larger ones because of licensing and infrastructure costs (most of them require a Windows machine to run). Do any of you use these browsers at large scale? What does your tech stack look like?


  👤 jmt_ Accepted Answer ✓
How would you actually use an anti-detect browser programmatically? Would you need to write a custom Selenium driver for it, or the equivalent for Playwright? Even if the browser is built on something like Chrome, you'd still need a way to interact with the anti-detect features.

A good trick I discovered is using WebKit through Playwright to bypass fingerprinting and related anti-bot measures. Firefox and Chrome simply leak too much information, even with various "stealth" modifications. For example, I have been able to reliably scrape a well-known company's site that implemented a "state of the art, AI-powered, behavioral analysis, etc" anti-bot product. Using Chrome/Firefox plus stealth measures in Playwright did not work; simply switching to WebKit with no further modifications did the trick.

Not exactly what you're asking, but my point is that, with a little time and effort, I've usually been able to find fairly simple holes in most anti-bot measures. It probably wouldn't be terribly hard (especially since you're versed in scraping) to build out something similar to what you're looking to achieve without having to pay for sketchy anti-detect browsers.
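
For anyone who wants to try the WebKit switch mentioned above, here is a minimal sketch using Playwright's Python sync API. The function name `fetch_html` and the `ENGINES` tuple are made up for illustration, and whether WebKit gets past a given anti-bot product will depend on the site; this only shows that changing engines is a one-word change.

```python
# Sketch only: fetch_html and ENGINES are hypothetical names, not from the thread.
ENGINES = ("webkit", "firefox", "chromium")


def fetch_html(url: str, engine: str = "webkit") -> str:
    """Load `url` in the chosen Playwright engine and return the rendered HTML."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")

    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # p.webkit, p.firefox and p.chromium are the engines Playwright bundles.
        browser = getattr(p, engine).launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            return page.content()
        finally:
            browser.close()
```

Calling `fetch_html("https://example.com")` uses WebKit; passing `engine="chromium"` or `engine="firefox"` makes it easy to compare how the same target responds to each engine. The browsers have to be installed first with `playwright install`.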


👤 fxtentacle
I've found that it's almost never needed. Most of the "advanced AI human detection" things are glorified IP reputation systems. So you just need a few IPs that would be way too painful to block, for example US residential IPs, and you're good.

But if you really want to make sure, it's pretty easy to remote-control a cheap Android phone. Plus, detection thresholds tend to be much higher on mobile, because filling out a reCAPTCHA on a touch screen is just such a horrible user experience.
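
On the IP reputation point, a rough sketch of spreading requests across a small proxy pool with only the standard library might look like the following. The proxy URLs are placeholders and the helper names `proxy_cycle` and `open_via` are made up for illustration; a real setup would plug in whatever residential proxy endpoints you actually have.

```python
import itertools
import urllib.request

# Placeholder endpoints (assumptions): replace with your own proxies.
PROXIES = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
]


def proxy_cycle(proxies):
    """Return an iterator that yields the proxies round-robin, forever."""
    return itertools.cycle(proxies)


def open_via(proxy_url, url, timeout=30):
    """Fetch `url` through a single proxy and return the response body as bytes."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Usage would be along the lines of `rotation = proxy_cycle(PROXIES)` followed by `open_via(next(rotation), url)` for each request, so consecutive requests leave from different IPs.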


👤 darkpatterns
There's a good community on this topic called Scraping Enthusiasts: https://discord.gg/4fGEPZzs There's also a curated list of research papers if you want to go deep on the subject matter: https://github.com/prescience-data/dark-knowledge

👤 splatzone
The Hero browser is designed for this kind of sneaky scraping, and it’s very interesting: https://github.com/ulixee/hero

👤 ffgh
Can you share the substack?

👤 jnk345u8dfg9hjk
this smells like an ad for GoLogin

👤 decide1000
What do you mean by "most of them require a Windows machine to run"?

👤 QuadmasterXLII
If you don't want to be detected, run Chrome in a VM and move the mouse around with PyUserInput.
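
A rough sketch of the mouse-moving idea, assuming the PyUserInput package (which provides the `pymouse` module) is installed inside the VM. The curved-path helper `human_path` and the wrapper `move_mouse` are made-up names for illustration, and the Bezier-with-jitter path is just one plausible way to avoid perfectly straight, instant cursor jumps; nothing here is tuned against a real detector.

```python
import random
import time


def human_path(start, end, steps=40, jitter=3.0, rng=None):
    """Return integer points along a bent path from `start` to `end`.

    Uses a quadratic Bezier curve with a random control point, plus a little
    per-point jitter on every point except the last.
    """
    if rng is None:
        rng = random.Random()
    (x0, y0), (x1, y1) = start, end
    # Control point placed off the straight line so the path curves.
    cx = (x0 + x1) / 2 + rng.uniform(-100, 100)
    cy = (y0 + y1) / 2 + rng.uniform(-100, 100)
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        if i < steps:
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((int(round(x)), int(round(y))))
    return points


def move_mouse(start, end):
    """Move the real cursor along a generated path with small random pauses."""
    # Imported lazily; pymouse ships with the PyUserInput package.
    from pymouse import PyMouse

    mouse = PyMouse()
    for x, y in human_path(start, end):
        mouse.move(x, y)
        time.sleep(random.uniform(0.005, 0.02))
```

Something like `move_mouse((100, 100), (640, 480))` would then drag the cursor across the VM's screen along a slightly different curve each time.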