When it's not explicitly stated, I don't know what the stance defaults to. Judging by recent court cases, it seems to be "Try it out, I guess. But don't be surprised if we, YC, sue you somewhere down the line." For users: without an explicit statement, I guess we have to assume YC is silently/privately allowing some companies to train on this data.
I would expect a FAQ entry explicitly addressing this issue (and updated when the situation changes), since I believe users have a right to know whether their comments will be used for AI training by some companies with YC's blessing. I also understand why they would want to avoid addressing it (fearing a chilling effect on the discourse on the site).
(Apologies if this topic has been addressed somewhere earlier. I searched the site first because I figured someone must have brought it up before. Alas, I couldn't find a post on it.)
HN's /robots.txt doesn't disallow scraping.
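For anyone who wants to check a claim like this themselves, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt body, the `ExampleBot` user agent, and the `is_allowed` helper are all made up for illustration; this is not a copy of HN's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body for illustration only.
# This is NOT the actual content of any real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/item?id=1", "/private/notes"):
        url = "https://example.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

To test a live site instead, `RobotFileParser.set_url(...)` followed by `read()` fetches the real file. Either way, robots.txt is only an advisory convention for crawlers, so by itself it settles nothing about permission to train on the content.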
I wouldn't expect a FAQ. What data is collected and how it is used is usually part of the terms of service, so I'd expect it to be a small part of https://www.ycombinator.com/legal/
The Reddit deal is probably a shortcut: it delivers the data in a structured format and saves both parties money compared to scraping millions (billions?) of pages. Wikipedia has a program to deliver data to such customers, too. https://enterprise.wikimedia.com/