HACKER Q&A
📣 bookaway

Does HN have a net stance on AI training on HN user comments?


In light of Reddit's training deal with Google, I tried to find info on HN's stance. Neither the guidelines nor the FAQ says whether any third-party companies (YC itself, YC companies, or otherwise) are allowed to train AIs on HN user-generated data (i.e. comments).

When it's not explicitly stated, I don't know what the stance defaults to. Looking at recent court cases, it seems to be "Try it out, I guess. But don't be surprised if we, YC, sue you somewhere down the line." For the users: without an explicit statement, I guess we have to assume YC is silently/privately allowing some companies to train on this data.

I would expect an FAQ entry explicitly addressing this issue (updated accordingly when the situation changes), since I believe users have a right to know if their comments will be used for AI training by some companies with YC's blessing. I also understand why they would want to avoid addressing the issue (fearing a chilling effect on the discourse on the site).

(Apologies if this topic has been addressed somewhere earlier. I searched the site beforehand because I figured someone must have brought it up before. Alas, I couldn't find a post on it.)


  👤 mtmail Accepted Answer ✓
Looking at this court case, it seems public data is scrapeable and usable (even sellable): https://techcrunch.com/2024/02/26/meta-drops-lawsuit-against...

HN's /robots.txt doesn't disallow scraping.
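
If you want to check this kind of thing yourself, here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper and the `SAMPLE_ROBOTS` rules are made up for illustration and are not HN's actual file; paste in the text of a real robots.txt to test real paths.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only (NOT the contents of HN's robots.txt).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("https://example.com/item?id=1", "https://example.com/private/page"):
        print(url, is_allowed(SAMPLE_ROBOTS, url))
```

This only tells you what crawlers are asked to do; it says nothing about what the terms of service permit.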

I wouldn't expect an FAQ entry. What data is collected and how it is used is usually part of the terms of service, so I'd expect it to be a small part of https://www.ycombinator.com/legal/

The Reddit deal is probably a shortcut to deliver the data in a structured format, which saves both parties money compared to scraping millions (billions?) of pages. Wikipedia has a program to deliver data to such customers, too: https://enterprise.wikimedia.com/


👤 sinuhe69
Training on the content requires scraping it first. So maybe try looking for something about using the content or scraping it?

👤 pvg
Questions about HN are better addressed at hn@ycombinator.com