What's up with the ChatGPT spam here lately?

Question

Over the past few days I've noticed a large uptick in what are probably ChatGPT-generated comments. These accounts have low or negative karma, were registered in the past few months, started posting less than a week ago, and seem to just rephrase the title or the contents of a post with some faux "questions" at the end. Has anyone found a reasonable heuristic to block them? Could someone collect a small dataset to train a classifier? If HN becomes a target for this, manual moderation may quickly prove insufficient.
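The heuristic described above can be sketched as a simple additive score, one point per signal. This is a minimal sketch that assumes account creation time, first-post time and karma are available; the `Comment` type, `suspicion_score`, and every threshold (180 days, karma at most 1, 60% of title words reused) are hypothetical choices for illustration, not anything HN actually uses.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Comment:
    text: str
    post_title: str
    account_created: datetime
    first_post_at: datetime
    karma: int


def tokens(s: str) -> set[str]:
    """Lowercased word set, used for a crude overlap measure."""
    return set(re.findall(r"[a-z0-9']+", s.lower()))


def title_overlap(comment: str, title: str) -> float:
    """Fraction of the post title's words that reappear in the comment."""
    t = tokens(title)
    if not t:
        return 0.0
    return len(t & tokens(comment)) / len(t)


def suspicion_score(c: Comment, now: datetime) -> int:
    """One point per signal; all thresholds are hypothetical."""
    score = 0
    if now - c.account_created < timedelta(days=180):  # registered in the past few months
        score += 1
    if c.karma <= 1:  # low or negative karma
        score += 1
    if now - c.first_post_at < timedelta(days=7):  # started posting less than a week ago
        score += 1
    if title_overlap(c.text, c.post_title) >= 0.6:  # mostly rephrases the title
        score += 1
    if c.text.rstrip().endswith("?"):  # ends with a (faux) question
        score += 1
    return score


now = datetime(2024, 6, 10)
bot = Comment(
    text="Interesting question about Rust async runtimes! "
    "What do you think about Rust async runtimes?",
    post_title="Rust async runtimes",
    account_created=datetime(2024, 4, 1),
    first_post_at=datetime(2024, 6, 8),
    karma=-2,
)
print(suspicion_score(bot, now))  # 5
```

A moderation queue might surface comments scoring 4 or more, but that cutoff is also an assumption: every one of these signals fires for some genuine newcomers, so a score like this is better suited to prioritizing reports than to blocking outright.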

mtmail · Accepted Answer

Can you list examples? Or better, report them to the moderators (the 'Contact' link in the page footer)? I've reported some in the past, months ago, but haven't seen any recently.

low_tech_love · Answer

Never noticed it, but I'm interested; can you link some examples?

mediumsmart · Answer

I think you can just feed the AI real HN comments (as style examples for generation) to avoid detection. Besides, how would the classifier scheme work? Validate the input or prune the threads? Good luck with either approach.
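To make the "validate the input" option concrete: a text classifier trained on labeled comments could score each submission before it is posted. The sketch below uses multinomial naive Bayes with add-one smoothing, a technique chosen here for brevity rather than anything proposed in the thread, and trains on four made-up toy comments. The `NaiveBayes` class and the training examples are hypothetical; a usable model would need a real labeled dataset of the kind the original post asks for.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # class priors come from these
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, text: str, c: str) -> float:
        """Unnormalized log P(class) + sum of log P(word | class)."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.totals[c] + len(self.vocab)
        for w in tokenize(text):
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(text, c))


# Toy, invented examples for illustration only; not real HN data.
train_texts = [
    "Great question! Here are some thoughts and potential solutions to consider.",
    "It's a valid concern. Here are some potential solutions and thoughts.",
    "We ran this in prod for two years; the GC pauses killed us.",
    "I reported a few of these to dang last month, they were gone within a day.",
]
train_labels = ["ai", "ai", "human", "human"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Here are some thoughts on potential solutions."))  # ai
```

As the comment above points out, a generator prompted with real HN comments as style examples would push its output toward the "human" word distribution, so word-frequency signals like these are easy to evade and would likely need to be combined with account metadata.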

ilt · Answer

@dang

syndicatedjelly · Answer

It's a valid concern that you've raised about the potential increase in ChatGPT-generated comments on HN. Here are some thoughts and potential solutions:
1. Heuristic Identification:
   - Account Age and Karma: As you mentioned, new accounts with low or negative karma could be a red flag. Filtering out comments from these accounts might help, although it might also block new, genuine users.
   - Comment Content: Look for patterns in the comments, such as generic or overly formal language, repetition, and lack of personal experience or detailed technical knowledge.
   - Engagement Metrics: Check the engagement these comments receive. Comments that are ignored or downvoted could be another indicator.
2. Training a Classifier:
   - Data Collection: You'd need a dataset of known AI-generated comments and genuine comments. This could be challenging but necessary for creating an effective classifier.
   - Features: Potential features for the classifier could include linguistic cues, metadata (account age, karma), and engagement metrics (upvotes, downvotes, replies).
   - Community Involvement: Encourage the community to flag suspected AI-generated comments. This could provide more data for training and improve the classifier's accuracy.
3. Manual Moderation:
   - While manual moderation might not be scalable, especially if the volume increases, it is still crucial for edge cases where automated methods might fail.
   - Moderators could focus on verifying flagged comments rather than monitoring all comments, making the process more efficient.
4. Community Guidelines:
   - Clear guidelines about AI-generated content could help. Encourage users who experiment with AI-generated comments to say so and explain the context.
5. Technical Solutions:
   - CAPTCHA: Implementing CAPTCHAs during account creation or before posting could deter automated systems from flooding the site.
   - Rate Limiting: Limiting the number of posts or comments a new account can make in a short period could reduce the impact of spam accounts.
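The rate-limiting idea can be sketched as a sliding-window limiter that gives young accounts a much smaller quota. This is a minimal, in-memory sketch; the class name and every number (a 14-day "new account" period, 3 comments per hour for new accounts, 20 for established ones) are hypothetical, and a real deployment would also need persistent, shared state. The CAPTCHA suggestion is not covered here.

```python
from collections import defaultdict, deque


class NewAccountRateLimiter:
    """Sliding-window comment limit, stricter for young accounts.

    All thresholds are illustrative assumptions, not HN's actual policy.
    Times are Unix timestamps in seconds.
    """

    def __init__(self, new_account_days=14, new_limit=3, normal_limit=20, window_secs=3600):
        self.new_account_secs = new_account_days * 86400
        self.new_limit = new_limit
        self.normal_limit = normal_limit
        self.window_secs = window_secs
        # account -> timestamps of comments accepted within the current window
        self.history = defaultdict(deque)

    def allow(self, account: str, account_created: float, now: float) -> bool:
        recent = self.history[account]
        # Drop timestamps that have slid out of the window.
        while recent and now - recent[0] >= self.window_secs:
            recent.popleft()
        is_new = now - account_created < self.new_account_secs
        limit = self.new_limit if is_new else self.normal_limit
        if len(recent) >= limit:
            return False  # rejected comments are not recorded
        recent.append(now)
        return True


limiter = NewAccountRateLimiter()
one_day = 86400.0
# A day-old account gets three comments in quick succession, then is throttled.
print([limiter.allow("newbie", 0.0, one_day + i) for i in range(4)])
```

Keying the quota on account age rather than blocking new accounts outright keeps the cost to genuine newcomers small, which addresses the concern raised under item 1.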
By combining these approaches, HN can better manage the influx of AI-generated content and maintain the quality of discussions.