Google might be deliberately making their search results "worse" (for whatever value of "worse" you prefer), but nobody else is doing all that much better, despite the obvious motivation to do so. I'm a DuckDuckGo user, but I don't think that the results I get are particularly better than Google results.
When people complain about Google, the root cause is usually that the thing they're looking for doesn't exist anywhere on the internet. Spam publishing sometimes drowns out the thing you're looking for, but often that's illusory - if you clicked through every single results page, you still wouldn't find what you're looking for, because it isn't there. A search engine can boost the signal-to-noise ratio in the results that it gives users, but it can't generate signal where none exists. Fixing that problem is altogether more difficult.
With good tools, one person could probably maintain the deshittifier for a few sites, at least until the sites started getting adversarial about it.
I played around with it, mostly typing in a bunch of DOS search terms. If this search engine is working the way it's supposed to, I should have thousands of very old results. After a page or three, I was quickly looking at results from 2006/2007.
This might have been a problem with the search engine or a (more serious) problem with Google throwing away much of the past. We already know this is a problem. We just don't know how serious it is.
A recreation of Google would involve reindexing to include what Google (DDG/Bing and others) have abandoned (the recent past).
PS: I use https://wiby.me and https://yandex.com with much greater success in finding older material.
What changes to the business model are you envisaging?
This has already been done --- see DuckDuckGo.
The problem is not ads --- it's privacy invasion (aka "personalized ads") and the advertisers who respond to the concept.
DuckDuckGo shows ads --- "context sensitive" ads --- you know, ads that are related to what you search for and might actually be helpful. Not something you did last week or last month that may no longer apply.
"Personalized ads" are one of the dumbest ideas ever --- virtually guaranteed to waste a lot of people's time and squander mental and electronic bandwidth. And yet Google makes billions --- because advertisers are stupid and lazy enough to literally turn over most of their ad budget to them.
We need to upend the idea that "No one ever got fired for using Google" --- and the only way to do that on a personal level is to stop using Google.
We need more ordinary people to grasp and respond to the idea that "Google" is just another word for "privacy invasion".
And overall, I don't think I suffer too much from SEO on Google Search. On the other hand, I'm very upset with the way YouTube has gone. It's harder and harder to find quality content even though it's there. I mostly don't want to see the professional YouTubers.
Google thought it had a business model that incentivized them to stay good. The opportunity to sell results didn't go away and eventually they took it.
SEO is in the structure of the internet now. Original Google was great because there was no incentive yet to buy a domain and blogspam it. Google getting shitty is just a natural instance of Goodhart's law, applied to domains and content.
Now, Google was originally based on PageRank, which treated every domain as a unit of authority. Domains have since been compromised and drowned out by SEO, but the concept remains valid, and we could choose people as the units of authority instead. For example, PageRank run over scientific papers accurately reproduces Nobel Prize attributions. A person publishing papers is a solid enough foundation for this unit of authority.
It remains to be organized, though. And if we take people as units of authority, it means they'd have to 'cite' or vote for each other. This has social consequences and might not be doable. Are you ready to refuse to cite your boss when he or she asks you to? Maybe if the vote is secret and delayed by five years?
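To make that concrete, here's a rough sketch (in Python, purely illustrative) of PageRank run over a graph of people citing each other. The names, edges, damping factor, and iteration count are all made up for the example:

    # Minimal power-iteration PageRank over a hypothetical "people citing people" graph.
    # 0.85 is the conventional damping factor.

    def pagerank(graph, damping=0.85, iterations=50):
        """graph: dict mapping each person to the list of people they cite."""
        people = list(graph)
        n = len(people)
        rank = {p: 1.0 / n for p in people}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / n for p in people}
            for citer, cited in graph.items():
                if not cited:
                    # Dangling node: spread its rank evenly over everyone.
                    for p in people:
                        new_rank[p] += damping * rank[citer] / n
                else:
                    share = damping * rank[citer] / len(cited)
                    for p in cited:
                        new_rank[p] += share
            rank = new_rank
        return rank

    citations = {
        "alice": ["bob", "carol"],
        "bob": ["carol"],
        "carol": ["alice"],
        "dave": ["carol", "alice"],
    }
    print(sorted(pagerank(citations).items(), key=lambda kv: -kv[1]))

The authority of a person is just the stationary share of rank flowing into them from everyone who cites them, weighted by how authoritative the citers themselves are.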
I think there’s an easier way. Train an ML model to tell legit web sites apart from garbage ones. It’s just binary classification: a site should either be blocked, or not.
Legit web sites being ones created by actual humans with actual content. Few to no ads. No malware, phishing, or other security threats. No content farms or SEO sites. No sites generated by other ML models. No paywalls, no pop-ups or other annoyances. Just real web sites.
You’re going to need a bunch of smart and trustworthy humans to spend hours and hours helping with this classification. But a model can multiply the effectiveness of their efforts.
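As a very rough sketch of what that classifier could look like, assuming a handful of human-labeled pages and scikit-learn. The feature choice here is just TF-IDF over page text; a real system would use far richer signals (ad density, markup structure, link patterns, and so on):

    # Rough sketch of the "legit vs. garbage" binary classifier.
    # The example pages and labels below are placeholders for human-labeled data.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    pages = [
        "personal blog post about restoring an old DOS machine",
        "top 10 best products 2024 affiliate review buy now",
        "university lecture notes on information retrieval",
        "keyword stuffed content farm article about everything",
    ]
    labels = [1, 0, 1, 0]  # 1 = legit, 0 = garbage

    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(pages)

    classifier = LogisticRegression()
    classifier.fit(features, labels)

    # Classify a page the crawler just fetched.
    new_page = ["hand-written tutorial on setting up FreeDOS"]
    print(classifier.predict(vectorizer.transform(new_page)))  # 1 = keep, 0 = block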
If the model works, then yes, you can make a very simple search engine: tell all the web crawlers to check the model and only add sites to the index if the model says they are good web sites and not garbage.
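The crawler-side gate could be as simple as this sketch, reusing the classifier and vectorizer from the previous snippet; the fetching and tag-stripping are deliberately naive stand-ins for a real crawler:

    # Sketch of the crawler-side gate: only pages the model approves get indexed.
    import re
    import urllib.request

    def fetch_text(url):
        # Naive fetch + tag stripping, just for illustration.
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
        return re.sub(r"<[^>]+>", " ", html)

    def crawl(urls, classifier, vectorizer, index):
        for url in urls:
            text = fetch_text(url)
            is_legit = classifier.predict(vectorizer.transform([text]))[0]
            if is_legit:
                index[url] = text  # only sites the model approves enter the index
            # garbage sites are simply never indexed, so they can't rank

    index = {}
    crawl(["https://example.com"], classifier, vectorizer, index)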