https://twitter.com/jakezward/status/1728032639402037610
But spammers are going to spam, and as long as there is a strong financial incentive to game the system, it's going to happen more and more. So assuming we are stuck with this kind of behavior, what can Google pragmatically put into place to minimize the impact on organic search ranking?
Some ideas off the top of my head:
- Specifically look to detect this sort of thing: a new website popping up out of nowhere with a huge number of never-before-seen articles. The articles closely mirror the sitemap of another currently high-ranking website (see the sketch after this list). Analysis of the articles suggests a high probability of AI generation (this part is tough, obviously). All of this directly leads to banishment from the first 100 pages of results at a minimum.
- Give massive preference to pages that have been up continuously for at least 2-3 years.
- If you can't find any organic links to the site on places like HN/Twitter/Reddit (where the comments/posts containing those links have at least a few upvotes), then it's probably not a "real" site.
- Boosting rankings for content that is clearly attributable to a real person whose existence can be verified in various ways (the same way you might look up who the poster of an HN comment is IRL).
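The sitemap-mirroring part of the first idea is mechanically simple. Here's a toy sketch, assuming you've already crawled both sitemaps and know the candidate domain's age; the function names, the 90-day cutoff, and the 0.8 overlap threshold are all invented for illustration, not anything Google is known to use:

```python
from urllib.parse import urlparse

def url_paths(sitemap_urls: list[str]) -> set[str]:
    """Reduce sitemap entries to their path component so sitemaps on
    different domains can be compared directly."""
    return {urlparse(u).path.rstrip("/").lower() for u in sitemap_urls}

def sitemap_overlap(candidate: list[str], established: list[str]) -> float:
    """Jaccard similarity of two sitemaps' URL paths, from 0.0 to 1.0."""
    a, b = url_paths(candidate), url_paths(established)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def looks_like_a_clone(candidate: list[str], established: list[str],
                       domain_age_days: int) -> bool:
    """Flag a young domain whose sitemap closely mirrors an established one."""
    return domain_age_days < 90 and sitemap_overlap(candidate, established) > 0.8
```

The AI-generation signal is the genuinely hard part; the rest is bookkeeping.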
I feel like this is absolutely a solvable problem if they take quick, decisive action. At the very least, they could try to minimize the impact. Otherwise search really will become useless and we will lose a very powerful tool.
where he was at about the same level of despair you are at, 10 years ago. Recent posts reveal a lot about ranking factors in modern search engines: just how bad the problem is from a technical perspective (link age being useful but pernicious in its own way) and from a social perspective (if Google's results were good, why would you ever click on an ad?).
To harp on link age: there is a tough balance between two factors:
1) Content that has stood the test of time is likely to be good, and
2) Content that is fresh is likely to be up to date
If we really privilege old sites decisively, then we entrench a set of old sites whose owners have no reason to keep them up to date. Even if this policy looks good in the short term, it would be a disaster in the long term, because it means there is no reason to start new sites, old sites face no competition, and the web gets worse.
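To make the tension concrete, here's a toy scoring function that rewards longevity but lets it saturate, so an old, stale site can still be beaten by an actively maintained one. The time constants and the 50/50 weighting are entirely invented:

```python
import math

def age_freshness_score(domain_age_days: float, days_since_update: float) -> float:
    """Toy signal balancing "stood the test of time" against "up to date".
    The 365/180-day constants and equal weights are assumptions."""
    longevity = 1.0 - math.exp(-domain_age_days / 365.0)  # approaches 1.0 after a few years
    freshness = math.exp(-days_since_update / 180.0)      # halves roughly every 4 months
    return 0.5 * longevity + 0.5 * freshness
```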
Anecdote: just today, I searched for a restaurant by name and town. The first several results were sites like TripAdvisor. The restaurant's own site was far down the list. Why? My bet: Google makes good money off TripAdvisor & Co., because they also advertise, but probably none from the restaurant.
Google has allowed their ad business to drive their priorities. Everything else is secondary.
And honestly, looking for ridiculous numbers of pages being added in short periods of time should be one of the first things Google flags on a website. Ignoring a few very unlikely exceptions (a site is bought out and merged into another one, a site goes viral to the same degree as TikTok, Amazon sets up another store), adding thousands or tens of thousands of pages in weeks, days, or hours should at least get a site flagged for manual review.
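This kind of check is cheap if the crawler already tracks indexed-page counts over time. A minimal sketch, assuming daily counts are available; the window and thresholds are pure guesses:

```python
def should_flag_for_review(daily_page_counts: list[int],
                           window_days: int = 14,
                           growth_ratio: float = 5.0,
                           min_new_pages: int = 1000) -> bool:
    """Flag a site whose indexed page count grew implausibly fast.
    daily_page_counts[-1] is today's count; thresholds are made up."""
    if len(daily_page_counts) <= window_days:
        return False
    baseline = daily_page_counts[-1 - window_days]
    current = daily_page_counts[-1]
    new_pages = current - baseline
    exploded = baseline == 0 or current / baseline >= growth_ratio
    return new_pages >= min_new_pages and exploded
```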
Similarly, sites with a large percentage of content copied from elsewhere (like those Stack Overflow ripoffs) should be easily downranked into oblivion. Forget AI: one of the biggest problems Google has right now is that obvious copycats rank as high as (or even higher than) the originals, despite the company clearly having numerous ways of knowing who the original is and who the ripoff is.
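Detecting near-verbatim copies is a long-solved problem; word shingling is the classic technique. A rough sketch (the 8-word shingle size and the downranking threshold below are assumptions):

```python
import hashlib

def shingles(text: str, k: int = 8) -> set[int]:
    """Hash every k-word window of the text (classic word shingling)."""
    words = text.lower().split()
    n = max(len(words) - k + 1, 1)
    return {
        int(hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest(), 16)
        for i in range(n)
    }

def copied_fraction(page_text: str, original_text: str) -> float:
    """Rough share of the page's shingles that also appear in the original."""
    p, o = shingles(page_text), shingles(original_text)
    return len(p & o) / len(p) if p else 0.0

# e.g. if copied_fraction(page, original) > 0.9 and the original was
# crawled first, the page is almost certainly a ripoff
```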
Also perhaps treat different types of sites differently too. For example, often when a wiki is forked, you'll have two sites with similar content; the original and fork. Prioritise the one with more edits and activity, and you'll see things like Wikia/Fandom get rightly wiped out SEO wise.
And yeah, definitely use social media as more of a benchmark for legitimacy, since virtually everyone uses it in some way to communicate online. If a site isn't mentioned by anyone who's been around for a few years or who has a decent reputation, then don't rank it as high as one that such people are mentioning. Stuff like Hacker News and Reddit karma, likes/retweets and their equivalents, mentions on blogs and personal sites, links from wikis, etc. should be treated as authenticity factors much more strictly than they are at the moment.
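Crudely, that could look something like a weighted score over reputation-filtered mentions. Every input and weight below is an assumption, and making these signals spam-resistant is the real work:

```python
from dataclasses import dataclass

@dataclass
class SocialSignals:
    # Illustrative inputs only; a real system would need spam-resistant
    # versions of every one of these.
    hn_mentions: int        # mentions in HN posts/comments with a few upvotes
    reddit_mentions: int    # ditto for Reddit
    blog_wiki_links: int    # links from personal blogs and wikis
    accounts_established: bool  # mentioning accounts are years old / reputable

def authenticity_score(s: SocialSignals) -> float:
    """Toy 'is anyone real talking about this site?' score. The weights
    and the 10x penalty are assumptions, not published ranking factors."""
    base = 2.0 * s.hn_mentions + 1.5 * s.reddit_mentions + 1.0 * s.blog_wiki_links
    return base if s.accounts_established else base * 0.1
```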
Hard disagree. That makes it impossible for newcomers to rank, because not everything is of interest to the social media crowd. If you're opening a new bakery, nobody will be talking about it on HN, Twitter or Reddit, even if you have the best stuff in town.
Why do you occupy yourself with trying to solve a problem for a trillion-dollar company that could easily solve it themselves? Why do you care about them? They sure don't care about you in the slightest.
If you want to use a good search engine that tries to fight SEO spam, then you have Kagi as a good option already.
Ideally we'd just get to the point where search results are themselves unnecessary because an AI can summarize the results, distinguish the spam from the good stuff on its own, and present original research without hallucination, citing prior sources (human or otherwise).
The spam and lists/summaries problems might actually get better once it's just AI vs. AI, there's no human in the loop, and nobody is putting money into SEO spam anymore because they can't outcompete the AI writers. Humans and AI can still both provide new material not yet in the training sets, but maybe we can get rid of an entire industry of junk writing?
However, for a while now, the name of the game in SEO has been authority.
Do the things that help establish your authority and you will rank well, no matter what AI or SEO tricks other sites use.