What are these low quality “code snippet” sites?
Whenever I try to google a code issue I have, there are countless low-quality sites just showing SO threads with no added value whatsoever.
It is so annoying it actually drives me mad.
Does anyone know what's up with that?
I am really disappointed because the guys creating these sites (I guess for some kind of monetization) must have some relation to coding. But I feel this is an attack against all of us. Every programmer should be grateful for the opportunity to find good-quality content quickly. Now my search results are flooded with copy-and-paste from SO.
They are killing that.
Am I the only one experiencing this or being that annoyed by it?
P.S: I don't name URLs because if you don't know what I am talking about already, you probably don't have that issue.
For years now I've run a programming site (stackabuse.com) and have closely followed the state of Google SERPs when it comes to programming content. A few thoughts/ramblings:
- The search results for programming content have been very volatile for the last year or so. Google has released a lot of core algorithm updates in the last year, which has caused a lot of high-quality sites to either lose traffic or stagnate.
- These low-quality code snippet sites have always been around, but their traffic has exploded this year after the algorithm changes. Just look at traffic estimates for one of the worst offenders - they get an estimated 18M views each month now, which has grown almost 10x in 12 months. Compare that to SO, which has stayed flat or even dropped in the same time-frame.
- The new algorithm updates seem to actually hurt a lot of high-quality sites as it seemingly favors code snippets, exact-match phrases, and lots of internal linking. Great sites with well-written long-form content, like RealPython.com, don't get as much attention as they deserve, IMO. We try to publish useful content, but consistently have our traffic slashed by Google's updates, which end up favoring copy-pasted code from SO, GitHub, and even our own articles.
- The programming content "industry" is highly fragmented (outside of SO) and difficult to monetize, which is why so many sites are covered in ads. Because of this, it's a land grab for traffic and increasing RPMs with more ads, hence these low-quality snippet sites. Admittedly, we monetize with ads but are actively trying to move away from it with paid content. It's a difficult task as it's hard to convince programmers to pay for anything, so the barrier to entry is high unless you monetize with ads.
- I'll admit that this is likely a difficult problem because of how programmers use Google. My guess is that because we often search for obscure errors/problems/code, their algorithm favors exact-match phrases to better find the solution. They might then give higher priority to pages that seem like they're dedicated to whatever you searched for (i.e. the low-quality snippet sites) over a GitHub repo that contains that snippet _and_ a bunch of other unrelated code.
Just my two cents. Interested to hear your thoughts :)
Somewhat tangential but I believe Google Search is going downhill, they seem to be losing the content junk spam SEO fight. Recently, I've had to append wiki/reddit/hn to queries I search for because everything near the top is shallow copied content marketing.
Not only SO threads, I particularly hate the ones that mirror GitHub Issues. They don't even link back to the original thread, for Christ's sake!
Sites that auto-translate original SO threads and pretend it's their original content are the worst. Google sometimes prefers to give me those results instead of the actual SO thread because I'm not in an English-speaking country. I have to waste time figuring out that it's just a stolen SO thread. And it's not even that useful, because some of them AUTO TRANSLATE THE CODE.
IMHO the biggest offender:
https://www.geeksforgeeks.org/
They are driving me insane with the modal login demand. I wonder if Google has downgraded the authoritative standing of Stack Overflow?
Just to chip in with a very minor annoyance, I hate how Google puts w3schools results above MDN for anything related to JS/HTML.
Really ticks me off that Google allows itself to be so easily gamed, it's your core business for christ's sake.
The answer is, if someone can make money by doing something shitty but not illegal, they will do it.
Almost everything on the web is some scheme to put ads in your face so someone can make some money.
My pet peeve is ApiDock, which has managed to SEO itself so high up the rankings when searching for anything connected to Ruby or Rails that it is actually quite difficult to get to the legitimate, official documentation.
What's worse is most of the results are outdated so you're looking at web-scraped API docs for Rails 3 or something.
Really frustrating.
It’s an easy way to make money. Scrape a popular site like Stack Overflow or Wikipedia and add a bunch of advertisements.
One of the many ways that scum ruin the web.
I really hate these. Especially when I'm trying to figure something out and I'm struggling to find answers, I end up haplessly finding the exact same wrong answer on three different sites.
An index of these sites would be helpful for mass-blacklisting them with the uBlacklist extension.
Anyone up for creating one so everyone can contribute to it?
The extension allows subscribing to blacklists via links, so a single txt file would be enough.
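If I remember the format right, a uBlacklist subscription file is just plain text with one match pattern per line (the domains below are placeholders, not an actual list of offenders):

```
*://*.example-snippet-clone.com/*
*://*.another-so-mirror.net/*
```

Host that file anywhere with a raw URL (e.g. in a GitHub repo) and point the extension's subscription settings at it, and everyone gets the updates.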
I spoke to a VP at Google in 2006 in London and discussed using a combination of curation and entropy to flush out duplicates. He seemed pretty excited by the idea, but I don't think anything materialized. Which is another way of saying this is not new - in those days these sites were copying newsgroups too.
Well, as I wrote, I understand that they try to monetize it.
But: why the sudden explosion?
I feel like more of these sites are going live all the time.
Many times they make up 80% of the first pages of search results, just repeating the SO threads listed above them.
So it‘s really getting difficult.
Something must be done…
The biggest problem is that they waste your time even when the content is ostensibly helpful, since the search result is usually listed after the Stack Overflow page it crawled from anyway. Each click steals a few seconds of developers’ time, which adds up, given how frequently these types of results pop up on Google search lately. That makes them worse than useless: they actively subtract value.
Not related to coding, but I've noticed a lot of "best of" and "top ten" sites that appear to be of the same ilk, possibly automated, that just combine pictures and paragraphs ranging from ad copy to pure drivel. On topics ranging from bicycles to Linux distros.
Important tidbit: SO's content is CC-licensed and this is probably completely legal (apart from those who fail to add a link to the original). Not that I don't want those sites to burn in hell, but they are not even in a grey zone legally.
As others have said: SO content is ripped-off (poorly) and mirrored. The page games Google's algorithm and shows up as a 'legit' result.
Probably more complex than simple keyword stuffing, which isn't supposed to work these days.
Scrape-and-paste is one of the easiest ways to make significant money if it takes off, and that’s why these sites are made.
I think Google does well in general with coding or SO questions, but will show you these low-quality sites when the questions are new, or very specific and difficult to answer. Maybe it's time to apply your head more.
*been on both sides
People are crawling content that is searched for frequently, then using SEO to rank higher in the results than the original content to make money from the ad revenue.
Code and recipes are two examples.
I'm also seeing politicians posting Tweets containing a link to their personal website, which has ads.
I've found that Bing does a better job at detecting spam like this. Not perfect, just better.
I’ve had to switch to !py on ddg because the official Python docs never make the first page. It’s really frustrating. :/
I wonder if there's a market for a software-engineering-specific search engine. Skip the shitty content farms, include code from open source projects, and potentially be smarter about finding package usages.
I've noticed that Google Alerts for my open source projects have been useless for years. Full of snippet sites as well as outright scam sites which take code from SO or my blog or just mixed up tech words and repost it.
Here's the Google Alert from yesterday (scammy URLs redacted):
Guestmount qcow2 - Casino en ligne fiable
It uses libguestfs for access to the guest filesystem, and FUSE (the
``filesystem in userspace'') to make it appear as a mountable device.
Stdin 1 libguestfs error usr bin supermin exited with error status 1 -
Aritco
Since libguestfs 1. sudo apt-get install libguestfs-tools mkdir sysroot #
Just a test file. Supermin and Docker-Toolbox #14. DIF/DIX increases the ...
Edit Qcow2 Image - A-ONE HEALTH BRIDGE
The libguestfs is a C library and a collection of tools on this library to
create, view, access and modify virtual machine disk images in Linux. img
Another thing I've noticed recently is that a lot of queries about computer graphics, especially tied to Unity's render pipelines, bring up what look like blog posts full of code snippets but the actual "article" seems nonsensical and impossible to follow. I suspect they are machine translated and they're really annoying.
edit: after doing a single Google search for "urp rendercontext" I found this: https://programmerall.com/article/71251053239/
Looking at it closely there seems to be some red thread and the images and code snippets do seem to follow a logical progression, but the text itself is a complete mess. I can tell it sometimes references things from the code snippets and hints at things I can see in the image, but it's certainly not informative.
Their site description says "Programmer All, we have been working hard to make a technical sharing website that all programmers love." I'm sorry, but I really don't.
I am with you on this. Lately I've noticed that I google for an issue, find a low-quality site with relevant results, and later discover that it's just a copy of the GitHub issues page from the original project. Why didn't the GitHub issue make it to the first page of Google, and why did this crappy knockoff, with no link back to the source material, beat it? So frustrating.
Just putting this out there... try brave search. The best answers from stackoverflow etc are all snippets and their results are getting better and better every day. I got sick of google after they made the BERT update. Really happy I switched (except for google maps data. google is still winning that game)
I mean, while we're at it, can we get rid of blogspam? Try googling for instructions for installing cellulose insulation. It takes AGES to find a site that isn't just garbage vague content. It should be possible to detect and demote this stuff. It is so obvious.
One thing that makes SO an easy target for this is that they let you download all their data and you don't even need to crawl and scrape the content from the website. Just download a dump, put it in an database, slap an HTML template on top of it, splash a few ads, and boom.
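To illustrate how low the barrier is, here is a minimal sketch of that dump-to-site pipeline. The sample XML is invented, but it mimics the flat `<row>` attribute shape of the Posts.xml file in the real Stack Exchange data dumps (questions have `PostTypeId="1"`, answers `PostTypeId="2"` with a `ParentId`):

```python
# Hypothetical sketch: turn a Posts.xml-style dump into static HTML pages.
import xml.etree.ElementTree as ET
from html import escape

SAMPLE_POSTS_XML = """<posts>
  <row Id="1" PostTypeId="1" Title="How do I reverse a list in Python?"
       Body="&lt;p&gt;What is the idiomatic way?&lt;/p&gt;" Score="42"/>
  <row Id="2" PostTypeId="2" ParentId="1" Score="17"
       Body="&lt;p&gt;Use &lt;code&gt;list(reversed(xs))&lt;/code&gt;.&lt;/p&gt;"/>
</posts>"""

def render_question_pages(posts_xml: str) -> dict[int, str]:
    """Return {question_id: html_page} for every question in the dump."""
    root = ET.fromstring(posts_xml)
    questions, answers = {}, {}
    for row in root.iter("row"):
        attrs = row.attrib
        if attrs["PostTypeId"] == "1":  # question row
            questions[int(attrs["Id"])] = attrs
        else:                           # answer row, keyed by its question
            answers.setdefault(int(attrs["ParentId"]), []).append(attrs)
    pages = {}
    for qid, q in questions.items():
        # Bodies in the dump are already HTML, stored in the attribute.
        parts = [f"<h1>{escape(q['Title'])}</h1>", q["Body"]]
        for a in sorted(answers.get(qid, []), key=lambda a: -int(a["Score"])):
            parts.append(a["Body"])
        pages[qid] = "\n".join(parts)
    return pages

pages = render_question_pages(SAMPLE_POSTS_XML)
print(pages[1])
```

A few dozen lines like this, plus an ad script tag in the template, is the entire "product" of these sites.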
The worst of all are those websites that only show the content to the search engine. When you click the link to their webpages, you only see random text that has nothing to do with the search result at all. There must be some really narcissistic programmers behind these.
At least the SO clones will still probably have content that you can make use of in some way; what's worse is when you search for an error code and you get back tons of pages that don't even have the exact code you searched for, which seems to be increasingly common recently.
It also used to be the case that you could dig into the second, third, ... sometimes even 20-30 pages in and hit the jackpot. Now, the results are even less relevant there, you soon get to "the end", and if you change your query slightly and try again a few times to search harder for what you want, you'll get hit with the unsolvable CAPTCHA hellban.
Same thing with various GitHub issues lookalikes.
There was an SO outage this last year, and I only found that out when I tried to go to SO and couldn't get to it. I checked page 2 of Google and found one of the mirrors that you're talking about. I grabbed the content from there and continued with work.
I think it's a matter of perspective. If you _know_ that you want a specific site, use google's 'site:' specifier. If you're looking and find something that is from SO, redo the search and get to the SO Q/A. As for me, I'm moderately grateful for the decentralized backups.
The issue is actually pretty old. There was a time when Google introduced blacklisting of search results and revenue of those sites dived. Sadly, later Google rolled back the blacklist.
All user-contributed content on Stack Overflow is under a CC-BY-SA license. So what the sites are doing is allowed under the license, as long as they're providing attribution.
Is it annoying? Sure. But neither Stack Overflow nor the authors of the content can do anything about it since they gave away a license to do it.
One of the things you have to accept when you release something under an open-source or Creative Commons license is that other people can take it and use it in ways that you don't like.
They get fed into a web crawler and then into a giant hopper whence they become the backbone of that shiny "No Code" technology you've been hearing about.
I have this problem, and contrary to a lot of people I don't protect a lot of my PI from Google. Google used to be good at giving me stuff I wanted in ads, especially in Gmail, but they don't really anymore. You would think that the more they knew about you, the better results they could give you, so maybe a large-scale test should be done: if Google knows your PI, do you get better or worse results, or doesn't it matter at all?
I suspect they were always there, but google and ddg are getting gamed more now. The quality of results has dropped quite a bit in the past 4 or 5 years in this regard.
If I recall correctly, the Stack Overflow dataset is open source, or at least made available to download, so I assume all these sites just download that data regularly.
Does anyone know of a good list of these copy sites? I just came across this Firefox extension which makes it possible to filter sites from search results: https://addons.mozilla.org/en-US/firefox/addon/hohser/. A community blocklist, like those for Pi-hole, would be great.
Same here; I even use an extension mainly to hide this stuff.
Not only mirroring SO, but also its siblings (like Server Fault and Ask Ubuntu), and others like GitHub.
But the most annoying part is that it keeps showing that mirrored and machine-translated stuff, which offers little to no benefit to me, and I've already been force-trained enough to identify it at first glance.
It even shows up when I'm searching in other languages, ahrr.
edit: formatting
- 1990 no big data, no data, google indexing a porn and no ads and black hack market, no open source code, no seo articles, no market, bbs only - $
- 2000 censorship, business, big data and ads ads - $$
- 2010 code learning projects, quora, reddit, iphone, spam indexing and seo ads ads ads ads - $$$
- 2020 ai indexing everywhere, no-code indexing and code is a porn of no-code now so ads and ads ads ads seo seo seo - $$$$$$
- 2030 profit $googleplex?
Plugging my own FF/Chrome browser extension that lets you add domains you want to block and will simply prepend matching text links with an angry emoji and prompt you to confirm whether you want to visit the page or not:
https://github.com/fnune/nay
This reminds me of Yahoo! Answers clones from 10+ years ago. To get traffic and cheat the search engine, they would index the Yahoo! Answers site for a specific niche category, create a garbage website with the questions and answers without crediting the source, and cram the site with ads everywhere to earn massive revenue.
I believe Google have hit a sweet spot (for them) where they can keep you browsing a specific topic for a long time while still showing you mildly interesting results. Since the results are consistently on topic, you are shown ads that are interesting to you time and time again, which results in a lot of clicks and a lot of revenue.
This is why I often search for solutions directly on Stack Overflow, and not via Google. Or I add "site:stackoverflow.com" to my search. Generally SO has all I ever need... I find vendor forums to be a total wasteland for help (i.e. Power BI forums), so I don't need them as part of Google results.
> Every programmer should be grateful for the opportunity to find good quality content quickly
Totally. There should be a better way to index SO.
you.com seems to try doing it that way. For most code issues, it's easier to navigate and decide what's worth reading from You than from google IMO.
Recently I've found duckduckgo to provide better coding results than Google, which really surprised me. I was only using duckduckgo at home for privacy, and Google at work because "best tool for the job", but I think that might not be the case anymore.
It's quite simple. SO has a huge easily indexable database of answers, and SEO scammers can make a quick buck by copying it all and making it seem like they have an answer for unanswered questions. Nothing to see here, blame your search engine.
I wonder if you are logged in to Google and allow your search history to be saved (assuming you're talking about Google search)? Because I don’t have that kind of problem, and I know Google uses your search history to improve your personal results.
I’ve got into the habit of clicking the 3-dot icon next to the search entry (often number 1 in Googles’s results) and reporting these sites as scrapers, stealing content from SO.
Maybe if we all did this, Google might eventually take notice?
It will probably take some time, but sites such as roseindia and expertsexchange also clouded search results in a similar manner. They are now history because Google and others deranked them to the depths of hell.
I do a lot of this sort of search
I have no idea what you are talking about (except for Apple's efforts at astroturfing, but that is not what you mean, I think).
I use DuckDuckGo, is that why this does not bother me?
I have not used Google for search for years.
Spammers. They mirror SO stuff on their own sites and put google ads on them
Google search is going downhill in my experience.
My question: What is the alternative?
Reminds me a little of an oldie but a goodie: expertsexchange.com
Annoyed by that too. Weirdly enough, they pop up for certain queries and not for others. I have also seen that for GitHub issues.
But I switched to https://www.ecosia.org/ as my main search engine, and I like the results much more than on Google - nothing special, but somehow more reliable/predictable. And meanwhile you plant some trees :P
It's for all kinds of sites lately. The uBlacklist extension has solved it for me - one click and you can remove an entire domain from future results.
This is just because Google doesn't need real search anymore. They're now the portal. Market cap is what drives them, not some geeks' needs.
Why don't you name the URL? Share so we know to avoid. It is not like we are going to dox the guy or something.
It is just Google, with its amazing algorithms that rank established websites way higher than a random spam website.
They are doing that because it's technically not violating the license the way they do it.
I hate those websites which just proxy github or npm with a different stylesheet so much.
The content farms get ahead of the organic results in many other areas too. Search for programming questions isn't so bad; at least the garbage is easy to recognize. Queries about products and services probably have the worst results.
Maybe code snippets are "enriched" in these sites?
This is an obvious case of SEO spam. But there are tons and tons of other examples worth mentioning.
For example, many news sites have soft paywalls that can easily be circumvented with a few clicks. The reason they don't have an _actual_ paywall is likely to come up in search results. So essentially they spam search results and obscure the content for technically illiterate users instead of just paying for ads. They want to have their cake and eat it too.
Now this whole dynamic is super weird. We often talk about these issues as if Google was some kind of public service that should make useful and fair search suggestions. Sure they have the incentive to do so, but they have conflicting interests at the same time.
Baeldung is the worst offender for the Java ecosystem.
stackoverflow.com doesn't have google ads. Those copy sites do. What is google's motive to fix that?
This shit is just the same as Quora, Pinterest, W3Schools, and ApiDock.
site:stackoverflow.com
That's what I do.
Is it bad that I've actually found my answer on some of these sites haha. But yeah, they're pretty low quality in general.
This is more of an issue with google results than the content itself.
Google is a shit product and you get shit results when you use it.
Just whitelist Stack Overflow in your head and avoid splogs / spamblogs?