HACKER Q&A
📣 mstipetic

Why are we expected to train Google's NN for free through captchas?


I keep seeing more and more captchas whose obvious purpose is to train their self-driving NN, and they're getting out of hand. Sometimes I have to work through 5+ images. How is it legal that they can just interrupt me on a whim and ask me to do work for them? Is there an alternative solution available?


  👤 Crash0v3rid3 Accepted Answer ✓
Take a step back and consider the owners of the websites you are visiting. They need a way to filter out spammers and decided to use captchas, which is their choice.

As a consumer, it’s your choice to not support them if you feel it has become too burdensome to fill out the captcha.

The legality of this shouldn’t even be a question. No one is forcing you to use these sites.


👤 jsnell
If you think about it a bit, it should be obvious that nothing is being trained on your results. You're not being exploited for training data, because computers are already better at identifying traffic lights from street view images than you are.

The image recognition task has not changed for a very long time. How many captchas do you think have been solved by users in the last 5 years, all the same kind of recognition problems on the same kind of data? Trillions? It would be way past any kind of diminishing returns for this task to get more labels. If there was value there, they would have at least switched to a different kind of task.

In addition to that, they're trying to get sites to move to Recaptcha v3, which does not have the user solve any kind of challenge. Why would that be if the answers had any kind of value?

The fundamental problem with using captchas for training is that it undermines their value as a security tool, since it sets up conflicting incentives. And security is what Google is selling. (Yes, selling. Recaptcha Enterprise costs money to use at scale, and the list price seems to be $1 / 1000 calls.) The people paying for that service are using it to block attackers and let good users through, not to challenge the good users just to get some more labeling data from them.


👤 2pEXgD0fZ5cF
A relevant blog post titled "You (probably) don’t need ReCAPTCHA" [1] was shared recently.

[1]: https://nearcyan.com/you-probably-dont-need-recaptcha/


👤 kasey_junk
They provide an otherwise complicated service for free to the owner of the page you are visiting.

Blame the pages you go to.


👤 auslegung
I can’t stand CAPTCHAs. I assumed I’ve been seeing them more and more because I use a password manager and websites flag this as a possible bot. Anyone else?

👤 GeekFortyTwo
I wonder what is different between your online fingerprint and mine, in generalities.

I have not seen a captcha in months if not longer, I don't remember doing one anytime recently.

I have browser- and network-level ad and malware blockers in place and sometimes come upon sites I cannot access at all, but never ones that are unusable due to no captcha.

But, I also have a Google account linked to chrome that stays logged in, and I wonder if that allows me to avoid them.


👤 LinuxBender
A while back there was a concept for a captcha I liked, but it seemed to vanish. It was called something like "StupidFilter" and asked a free form question. If you answer the question correctly, you can then post data on the site. That doesn't seem complicated to me and if each site had their own unique set of questions it might be harder to make a generic bot to bypass it. The reason I like this method is that you can tailor it to the audience of your site. PC modding site? Have entry-level PC modding questions. Bots would have to become domain content aware. Not perfect, but I don't need perfect.

Asking the web developers on here, how difficult would it be to make something like that? Or does that already exist?
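For what it's worth, the core of the idea can be sketched in a few lines of Python. This is only a rough illustration, not how "StupidFilter" or any existing product works: the question bank, the `pick_question` and `check_answer` names, and the matching rule (case- and whitespace-insensitive comparison against a set of accepted answers) are all made up for the example. A real site would also need rate limiting and a way to tie the question to the user's session.

```python
import secrets
from typing import Dict, Set, Tuple

# Hypothetical question bank for a PC modding site.
# Every entry here is invented for illustration.
QUESTIONS: Dict[str, Tuple[str, Set[str]]] = {
    "gpu": ("What does GPU stand for?", {"graphics processing unit"}),
    "paste": ("Does thermal paste go on the CPU or the PSU?", {"cpu", "the cpu"}),
}


def normalize(answer: str) -> str:
    """Lowercase the answer and collapse runs of whitespace."""
    return " ".join(answer.lower().split())


def pick_question() -> Tuple[str, str]:
    """Return a random (question id, question text) pair to show the user."""
    qid = secrets.choice(list(QUESTIONS))
    return qid, QUESTIONS[qid][0]


def check_answer(qid: str, answer: str) -> bool:
    """True if the free-form answer matches an accepted answer for qid."""
    if qid not in QUESTIONS:
        return False
    return normalize(answer) in QUESTIONS[qid][1]
```

The hard part is not the code but writing questions whose acceptable answers are few enough to enumerate, since free-form input gets phrased in many ways.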


👤 grundoon
The thing that really ticks me off about those captchas -- besides that it seems I'm more likely to have them forced on me when I'm on a VPN, so presenting an AWS IP to their server -- is that sometimes they think I'm wrong about, e.g., which square has an image with a bicycle. No! It's YOU, captcha algo, that are incorrect! Your images are wrongly labeled! Grrr, so annoying :-(

👤 bllguo
this just seems self-centered. why do we expect access to everything on the internet for free? captchas are hardly a significant price..

👤 kenny11
I wonder about the quality of the data they're getting from all the random people training them. A few days ago I had to answer one by "selecting all the parking meters" and it wouldn't let me continue without also selecting a picture of what was clearly a rural mailbox on a post.

👤 emteycz
You're not required to do anything - you can simply close the page. It's as legal as any other code on that page the owner decided to put there. You aren't entitled to use the page in your way.

👤 nunodonato
A few days ago I thought of building my own captcha system. Was wondering, how would you test a captcha system? Like, how can I get a few clever bots to try and break it to see if it works?

👤 mam2
Why do you expect to have a say in it? They have power, you don't.

👤 rognjen
To commenters asking why paid sites use them: because fraud is real, and the fact that you pay doesn't mean you aren't a bot.

👤 ashalhashim
Why would this be illegal?

👤 trianglem
It is a way to prove you’re human and get useful work out of it. Is your issue that you’re giving free labor to Google specifically or would you have an issue irrespective of the company?