Has anyone found a reasonable heuristic to block them? Could someone collect a small dataset to train a classifier? If HN becomes a target for this, manual moderation may quickly prove insufficient.
Besides, how would the classifier scheme work? Validate the input or prune the threads? Good luck with either approach.
1. *Heuristic Identification*:
   - *Account Age and Karma*: As you mentioned, new accounts with low or negative karma could be a red flag. Filtering out comments from these accounts might help, although it might also block new, genuine users.
   - *Comment Content*: Look for patterns in the comments, such as generic or overly formal language, repetition, and lack of personal experience or detailed technical knowledge.
   - *Engagement Metrics*: Check the engagement these comments receive. Comments that are ignored or downvoted could be another indicator.
2. *Training a Classifier*:
   - *Data Collection*: You'd need a dataset of known AI-generated comments and genuine comments. This could be challenging but is necessary for an effective classifier.
   - *Features*: Potential features for the classifier include linguistic cues, metadata (account age, karma), and engagement metrics (upvotes, downvotes, replies).
   - *Community Involvement*: Encourage the community to flag suspected AI-generated comments. This could provide more training data and improve the classifier's accuracy.
3. *Manual Moderation*:
   - While manual moderation might not scale, especially if the volume increases, it is still crucial for edge cases where automated methods fail.
   - Moderators could focus on verifying flagged comments rather than monitoring all comments, making the process more efficient.
4. *Community Guidelines*:
   - Clear guidelines about AI-generated content could help. Encourage users who are experimenting with AI-generated comments to be transparent and to provide proper context.
5. *Technical Solutions*:
   - *CAPTCHA*: Implementing CAPTCHAs during account creation or before posting could deter automated systems from flooding the site.
   - *Rate Limiting*: Limiting the number of posts or comments a new account can make in a short period could reduce the impact of spam accounts.
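For the rate-limiting idea, here is a minimal sketch of a per-account sliding-window limiter. It is only an illustration: the class name, the `max_posts` and `window_seconds` parameters, and the in-memory storage are all assumptions, and nothing here reflects how HN actually throttles accounts.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_posts` accepted posts per `window_seconds` per account.

    Hypothetical helper for illustration; limits and storage are assumptions.
    """

    def __init__(self, max_posts: int, window_seconds: float) -> None:
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        # account id -> timestamps of posts accepted inside the current window
        self._history = defaultdict(deque)

    def allow(self, account_id: str, now: float) -> bool:
        stamps = self._history[account_id]
        # Forget posts that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_posts:
            return False  # over the limit: reject and do not record the attempt
        stamps.append(now)
        return True


limiter = SlidingWindowLimiter(max_posts=2, window_seconds=60)
results = [limiter.allow("new_account", t) for t in (0, 10, 20, 60)]
```

A real deployment would probably apply a tighter limit to new accounts than to established ones, which is the distinction the suggestion above draws.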
By combining these approaches, HN can better manage the influx of AI-generated content and maintain the quality of discussions.
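As a rough sketch of what combining the signals could look like, the snippet below adds up a few of the heuristics listed above into one suspicion score. Everything in it is an assumption made for illustration: the `Comment` fields, the 7-day and zero-karma thresholds, the weights, and the phrase list are invented, not tuned on any data.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    account_age_days: float
    karma: int
    text: str
    net_votes: int


# Hypothetical stock phrases; a real list would have to come from labeled data.
GENERIC_PHRASES = ("as you mentioned", "it is important to note", "in conclusion")


def suspicion_score(comment: Comment) -> float:
    """Return a higher score for comments that match more of the heuristics."""
    score = 0.0
    if comment.account_age_days < 7:  # assumed threshold for a "new" account
        score += 1.0
    if comment.karma <= 0:  # low or negative karma
        score += 1.0
    lowered = comment.text.lower()
    # Each stock phrase found in the text adds a small amount.
    score += 0.5 * sum(phrase in lowered for phrase in GENERIC_PHRASES)
    if comment.net_votes < 0:  # downvoted by the community
        score += 0.5
    return score


fresh = Comment(1, 0, "As you mentioned, this is key. In conclusion, it helps.", -1)
veteran = Comment(2000, 5000, "We hit this on our Postgres cluster last year.", 12)
```

Given the false-positive risk for genuine new users, a score like this seems better suited to queueing comments for moderator review than to blocking them outright.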