HACKER Q&A
📣 syedkarim

Why doesn't anyone create a search engine comparable to 2005 Google?


I seem to recall that Google consistently produced relevant results and strictly respected search operators in 2005 (?), unlike the modern Google. And back then, I think search results were the same for everyone, rather than being customized for each user.


  👤 gbmatt Accepted Answer ✓
Ha, yes, I've done that at https://gigablast.com/ . The biggest problems now are the following: 1) Too hard to spider the web. Gatekeeper companies like Cloudflare (owned in part by Google) and Cloudfront make it really difficult for upstart search engines to download web pages. 2) Hardware costs are too high. It's much more expensive now to build a large index (50B+ pages) to be competitive.

I believe my algorithms are decent, but the biggest problem for Gigablast is now the index size. You do a search on Gigablast and say, well, why didn't it get this result that Google got. And that's because the index isn't big enough because I don't have the cash for the hardware. btw, I've been working on this engine for over 20 years and have coded probably 1-2M lines of code on it.


👤 lolinder
The consistent theme every time this comes up is that dealing with the sheer weight of the internet is almost impossible today. SEO spam is hard to fight and the index gets too heavy. However, I wonder if this is a sign that we're looking at the problem wrong.

What if instead of even trying to index the entire web, we moved one step back towards the curated directories of the early web? Give users a search engine and indexer that they control and host. Allow them to "follow" domains (or any partial URLs, like subreddits) that they trust.

Make it so that you can configure how many hops it is allowed to take from those trusted sources, similar to LinkedIn's levels of connections. If I'm hosting on my laptop, I might set it at 1 step removed, but if I've got an S3 bucket for my index I might go as far as 3 or 4 steps removed.

There are further optimizations that you could do, such as having your instance not index Wikipedia or Stack Overflow or whatever (instead using the built-in search and aggregating results).

I'm sure there are technical challenges I'm not thinking of, and this would absolutely be a tool that would best serve power users and programmers rather than average internet users. Such an engine wouldn't ever replace Google, but I'd think it would go a long way to making a better search engine for a single user's (or a certain subset of users') everyday web experience.
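
The hop limit described above could be sketched as a depth-bounded breadth-first walk. This is only an illustration under assumptions: `crawl_within_hops`, the `LINKS` table, and the example URLs are all made up, and the link table stands in for a real fetcher and HTML parser.

```python
from collections import deque


def crawl_within_hops(seeds, links, max_hops):
    """Breadth-first walk from trusted seed URLs, stopping at max_hops.

    `links` is a hypothetical stand-in for a real fetcher: it maps a URL
    to the list of URLs that page links to.
    Returns a dict of {url: hops from the nearest seed}.
    """
    hops = {url: 0 for url in seeds}
    queue = deque(hops)
    while queue:
        url = queue.popleft()
        if hops[url] == max_hops:
            continue  # do not expand pages already at the configured limit
        for target in links.get(url, []):
            if target not in hops:  # first visit is the shortest path in BFS
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


# Toy link graph with made-up URLs.
LINKS = {
    "https://trusted.example/": ["https://a.example/", "https://b.example/"],
    "https://a.example/": ["https://c.example/"],
    "https://c.example/": ["https://d.example/"],
}

if __name__ == "__main__":
    for allowed in (1, 2):
        found = crawl_within_hops(["https://trusted.example/"], LINKS, allowed)
        print(allowed, sorted(found))
```

A laptop-hosted instance would call this with `max_hops=1`; a larger index could raise the limit, at the cost of the frontier growing quickly with each extra hop.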


👤 BitwiseFool
Natural Language Processing is a pox on modern search engines. I suspect that Google et al. wanted to transform their product into an answer engine that powers voice assistants like Siri and just assumed everyone would naturally like the new way better. I can't stand how Google is always trying to guess what I want, rather than simply returning non-personalized results solely based on exactly what I typed in the textbox.

While that may be good for most people, there is still a lot of power and utility in simple keyword-driven searches. Sadly, it seems like every major search engine has to follow Google's lead.



👤 abhaynayar
I'm probably the only person who doesn't think Google search has deteriorated. I play security CTFs, so a lot of times I have to search for peculiar technical details on various software. Also, like any other human being, I also make generic queries. In both cases, I feel like I almost always get to the desired webpage within the top few results.

👤 pydry
Early 2000s google index ran in a garage. The current google index has dedicated power stations.

It's a bit like the car industry - you could run a startup from your garage in the early days but you need titanic amounts of capital to compete now thanks to vertical integration.

Major governments and billionaires can compete but everybody else is locked out of the market (most "startups" use Bing's index).


👤 rovingEngine
I think Google was “better” from a user’s point of view in 2005 because it wasn’t that good at selling ads yet. I still remember the epiphany of the first time I used Google in 1999. It was amazing.

I’ve thought the same about pre-ad Twitter and Facebook.

Early on, startups with free services look a lot like non-profits and just maximize user benefit to grow. The problem is they’re not non-profits, and have to make money at some point. That has tended to mean ads.

I’d easily pay, say, $9/mo to have access to an ad-free search engine that made me feel the way 1999 Google did.


👤 wodenokoto
The web has changed drastically. I’d imagine 2005-google engine today would be nothing but abandoned Wordpress blogs with comment spam.

👤 nfriedly
I think DuckDuckGo is closer to what you want. Same results for everyone, better privacy, and they're proactive about improving their results.

https://duckduckgo.com/

Part of the problem is that there's a lot more low-quality content to wade through now than there was in 2005. I think the Google of 2005 would have trouble delivering quality results today also.


👤 mrkramer
They do[0] but nobody cares anymore. Google controls web distribution through Google Chrome. I think we are at the point of no return. There won't be any competition anytime soon no matter what the US government does. Only innovation can displace Google.

[0] https://search.marginalia.nu/


👤 pkamb
I would use a search engine that only indexed Reddit, Stack Exchange, Wikipedia, and a small number of other sites.

And that specifically blocked Pinterest, Quora, most non-personal “blogs”, etc.

People suggest DDG ! operators, but I don’t want to use a site’s (bad, single-site) search box. I want a multi-site SERP that only displays results from known good sites, which are customizable.
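
A customizable "known good sites" filter like the one described above might look roughly like this. It is a sketch under assumptions: `filter_results` and `host_matches` are hypothetical names, and the input is assumed to be a plain list of result URLs from some upstream index.

```python
from urllib.parse import urlparse


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def filter_results(urls, allowed, blocked=()):
    """Keep only URLs on allowed domains, dropping anything blocked.

    Blocking wins over allowing, so a blocked subdomain of an allowed
    domain is still removed. Input order is preserved.
    """
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host_matches(host, d) for d in blocked):
            continue
        if any(host_matches(host, d) for d in allowed):
            kept.append(url)
    return kept
```

Matching on the full hostname suffix (with a leading dot) rather than a bare substring avoids letting a look-alike domain such as `notreddit.com` slip through an allowlist entry for `reddit.com`.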


👤 moralestapia
Please do it! Google is now complete trash.

Also, Gmail used to have the best spam filters out there; now it's utter crap. Emails from my Google Analytics account, for whatever reason and regardless of how many times I have clicked on "Not Spam", go to spam, and it's their own service, while messages that are textbook spam ("Hi, I just got some inheritance ...") go to my inbox.

AI (in its current state) is crap, when is the industry going to accept these are the emperor's new clothes.


👤 nickpp
Because we're not having a 2005-Web anymore. More to the point, SEO & Google have evolved together. To have barely relevant results today you need to be good. That takes stellar talent which costs huge amounts of money.

Thus, the Google of today, which is optimized to extract that money from us.


👤 indymike
Brave's new search engine seems to work pretty well. Have been using it as my primary for about 10 days, and so far, I've only had to revert to Google once, and when I did the results were chock full of spam.

👤 freediver
We are building one [1] as well as a few other people that I am aware of with different approaches and business models.

We also need to be aware that when we remember past times, the memory usually carries a romantic, nostalgic note. The web is very different than it was 15 years ago and the problem of search has evolved.

What you are looking for is basically 'grep for the web', but that is just one facet of search as we use it today. 15 years ago you would not get an instant answer to a question like you do today, and many users would not be able to live without that now. There are also maps and location-based answers, all sorts of widgets like translation, etc. Also, the world has become more polarized, so an objective best search result has become more difficult to produce, especially for events covered in the news, which means bias inevitably starts to creep in.

This is not to say that Google is good or bad today; it is what it is and they are doing the best they can. Startups like ours see an opportunity in the market, in large part to help savvy users find what they want.

[1] https://kagi.com


👤 ColinHayhurst
You might call this a search engine based on the principle of Information Neutrality.

“Information Neutrality is the principle to treat all information provided (by a service) equally. The information provided, after being processed by an information-neutral service, is the same for every user requesting it, independent of the user’s attributes, including, e.g., origin, history or personal preferences and independent of the financial or influential interest of the service provider, as well as independent of the timeliness of information."

I wrote about this in relation to search [0]. We need to be allowed more freedom to choose search engines and services. One (default or selected) choice for search is unhealthy. We shouldn't have to choose between Google or Bing; DuckDuckGo or Startpage; Brave or Ecosia; Mojeek or Gigablast ..... Personally I use all 8 of these and more, as also explained [0].

[0] https://blog.mojeek.com/2021/09/multiple-choice-in-search.ht...


👤 boyter
I had a brief stab at this with https://bonzamate.com.au although it's Australia-specific to reduce the crawling and indexing requirements. Its main twist is that it runs entirely in AWS Lambda, meaning it costs nothing when it's not being used.

👤 ravenstine
I think what [some] people actually want isn't the Google of 2005 but to have a search engine where they don't feel like they're being manipulated.

👤 erpellan
Even if Google dusted off their 2005 codebase and ran it on today's web it wouldn't come close to the results quality of Google in 2005. The SEO industry has been in an arms race with the search engines for 16 years. 2005 Google would be like a goldfish in a piranha tank.

👤 jakub_g
Cliqz wanted to build a new search engine but failed. It's just too difficult to operate at that scale and break the existing monopoly of big G.

https://www.burda.com/en/news/cliqz-closes-areas-browser-and...

https://news.ycombinator.com/item?id=23031520

https://0x65.dev/blog/2019-12-06/building-a-search-engine-fr...


👤 ab_testing
I think a lot of people are ignoring the issue that the web has changed considerably since 2005. It is approximately 10 times larger in terms of number of websites and web pages. And a lot of it is SEO junk that is just designed for search engines to be easier to parse and show ads in your face.

Also, user preferences have changed in the last decade or so. I know millennials and users in their late 30s or early 40s still yearn for the old web where they would type a search term and correct results would astonish them. However, younger users tend to gravitate to videos and that is why a large portion of the Google results are now video results.


👤 greyman
1) Google is better at AI, for example let's take this sloppy search: "some joke where you can't tell if it is serious or joke"

It is called Poe's law, and Google returned it at #4. Bing or Duckduckgo don't have a clue...

2) They have years of user data: for a specific term, they can see what users clicked most, so they know which results were perceived as most relevant. It is hard to catch up if you don't have such data.

3) They developed anti-spamming tools during the years of fighting against SEO-spammers.


👤 gorgoiler
Random thought, based especially on using DuckDuckGo for two years:

Search engine isn’t singular, it’s plural.

(1) Search engine for something I know exists.

(2) Search engine for finding something new.

There’s a market for both, but you don’t have to solve both problems with the same product.

Sometimes I switch to Google for the former, but the latter works well enough for me that I don’t care what else Google would’ve shown me.

More often than not, my feeling is Google would only have shown me more ads in addition to whatever I could already find elsewhere.


👤 keddad
While I feel that Google has become worse in the last couple of years, I'm pretty sure it is still better now than 15 years ago. Maybe it is just some kind of nostalgia?

👤 ineedasername
SEO wars are at least part of it. Google's algorithm has evolved over time not just to optimize advertising views/clicks and take over more screen space, but also to battle the constant gaming of their algorithm by SEO. Once you eventually get to the real results, they will include less relevant, spammy, or scammy entries if Google doesn't constantly push back against the worst SEO abusers.

👤 phendrenad2
The 2005 Google model only made sense in the 2005 internet. Google had the luck to become a search monopoly, and they quickly created Chrome to ensure that no one would ever switch away from Google search, so they could maintain the monopoly.

Now that Google exists, you can't create another one. There's only room for one.

Another thing is the rise of "content sites", like this one (Hacker News). I'm sure YCombinator doesn't like getting hit by dozens of crawlers. The impulse to ban everything that crawls except (Google|Bing|Baidu|VK) is too great.

A lot of alternative suggestions are being thrown into this discussion. Let me throw in mine: Reverse the concept of the "crawler". Instead of following links around the internet randomly, require sites to register with you and request to be crawled and/or submit a sitemap. It would be hard to get started, but once something like this gained momentum, I believe that there's room for several of these reverse-search-engines to compete.


👤 chilling
Yesterday there was a discussion[1] about it and someone suggested yandex.com. I've been using it since then and really love it. It's like going back to 2003, when everything was just plain and simple.

[1]: https://news.ycombinator.com/item?id=29393467


👤 jerhewet
Do not force me into autocomplete mode when I'm typing in my search terms. I don't care what anyone's "reasons" are for forcing me to put up with flashing, irrelevant bullshit when I'm searching for something. I don't care how "fast" it is.

Just let me type stuff into the search box -- including typo corrections and modifications to what I'm searching for -- and hit ENTER to start the actual search.

When I'm ready to start my search I'll hit the fucking ENTER key. Stop annoying me with your stupid assumptions about what I'm looking for.

This ONE THING is why I switched to Webcrawler.com two years ago. I type in five or ten words with ZERO craptastic guesses flashing around on my screen, hit ENTER, and THEN it returns what I'm looking for.


👤 causi
Lately I've noticed Google has just started ignoring search operators. Search results are missing terms in quotes and include terms with a leading - sign on them. It's like they've decided we're too stupid to know what we're looking for.

👤 8bitsrule
Looks like millionshort.com (which I learned-of on HN) died recently. For me, its results were more useful than most others (even without the 'leave out the top nnn sources' feature). Hoping it was an experiment that will bear fruit.

👤 drcongo
I've been using kagi.com for a month or so now, and it consistently beats DDG and Ecosia for result quality. I'd guess it beats Google too, since last time I used Google it was nothing but ads and spam which is why I stopped.

👤 BbzzbB
No mention of DDG in the comments? Is there a reason I'm not seeing or it's just not the preferred alt-search on HN? Seems to have been working fine for me when I struggle to get past the funnels and content mills on Google.

👤 flipdot
Not sure if this is any close to what you’re trying to find, but there’s https://github.com/benbusby/whoogle-search

👤 llaolleh
Everyone runs in the other direction anytime a search engine is mentioned. The thought of competing with Google turns people off.

Even in 2021, despite how bad it's become, it's still miles ahead of other competitors.


👤 not2b
I think you're being nostalgic for something you don't remember very well.

In that era, Google would return a match based on words that appear in the links to a URL but not in the article itself, meaning that it was easy to produce "Googlebombs". For example, from 2005-2007 the top hit for "miserable failure" was the Wikipedia article for George W. Bush.

See https://www.screamingfrog.co.uk/google-bombs/ for some of the "better" ones.
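
To make the mechanism concrete, here is a toy sketch of ranking by anchor text alone. It is not Google's actual algorithm: `build_anchor_index`, `top_hit`, and the example links are all invented for illustration, and real engines combine anchor text with many other signals.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Index target URLs by the text of the links pointing at them.

    `links` is an iterable of (source_url, anchor_text, target_url).
    The target page's own content is never consulted.
    """
    index = defaultdict(Counter)
    for _source, anchor, target in links:
        index[anchor.strip().lower()][target] += 1
    return index


def top_hit(index, query):
    """Return the URL most often linked with exactly this anchor text."""
    hits = index.get(query.strip().lower())
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# Made-up example: many pages agree on an anchor phrase for one target.
LINKS = [
    ("https://blog1.example/", "silly phrase", "https://victim.example/bio"),
    ("https://blog2.example/", "Silly Phrase", "https://victim.example/bio"),
    ("https://blog3.example/", "silly phrase", "https://other.example/"),
]
```

With enough coordinated links, the target ranks first for a phrase that appears nowhere on the page itself, which is the essence of a Googlebomb.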


👤 karmasimida
Google does its job.

I heard HN constantly crying over its deteriorating quality, but I am not noticing it that much, not better not worse, it just does its job.

To create 05 Google, it is easily billions of dollars and years of investment, before people will treat you seriously.

The reason we didn't get 05 Google could only be because it is not profitable. Some nation-state attempt to demonopolize the search engine business might work, but I don't expect any for-profit organization to easily attempt doing this, let alone individual hobbyists.


👤 simonebrunozzi
These guys [0] have built something really close to 2005-Google, and possibly slightly better.

The parent company, Tiscali, was a huge hit in the 1990s, as it provided internet access to millions of Italians. It went through some struggle for several years, but lately the original founder, Renato Soru, came back to run the company.

The company is based in Cagliari, the capital of Sardinia, Italy.

[0]: https://www.istella.it/en/


👤 dave333
Does Gigablast ignore or downrate stuff on .info domains? Seems to like https://www.fiendishsudoku.com/ for "fiendish sudoku" search but doesn't know about https://www.extremesudoku.info/ for "extreme sudoku" search.

👤 jacquesm
I'd love a much simpler version of search engines: an engine that I can give a long list of websites to crawl, and to completely ignore the rest.

👤 criddell
Why don't you want personalized results? If I search for "subaru service" I want to find Austin Subaru, not Thorp Subaru in Cape Town.

👤 willcipriano
I'd like to see a "just search" engine: all it does is search for a specific string, case-insensitively, across the entire web. No curation or anything, just sorted in lexicographic order, closest match first, maybe falling back to page age if it has more than one exact match. Perhaps give me some regular expressions as well.
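
At toy scale, the core of such a "just search" engine is tiny. The sketch below makes several assumptions: `just_search` is a hypothetical name, the "web" is a dict of URL to page text held in memory, and results are simply sorted by URL since the exact ordering rule is left open above.

```python
import re


def just_search(pages, needle, regex=False):
    """Return URLs whose text contains `needle`, ignoring case.

    `pages` maps URL -> page text. With regex=False the needle is
    matched literally; with regex=True it is treated as a pattern.
    Results are sorted lexicographically by URL (an assumption).
    """
    source = needle if regex else re.escape(needle)
    pattern = re.compile(source, re.IGNORECASE)
    return sorted(url for url, text in pages.items() if pattern.search(text))


# Made-up corpus.
PAGES = {
    "b.example": "Hello World",
    "a.example": "say hello",
    "c.example": "nothing here",
}
```

The hard part is not this function but doing the same thing over billions of pages, where a linear scan is impossible and some form of index is needed.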

👤 swframe2
Have a look at gpt-3 if you want to see what the future dominant search engine will be. It will not find relevant results, it will write it on the fly customized for exactly what you want to read. (Maybe products will just ship to your door and be auto paid because the future ad targeting AIs will know you so well.)

👤 emodendroket
Well, Cuil had a lot of money and couldn't do it. I don't know how you quantify your assertions but I suspect that if you brought back 2005 Google it would be easily gamed and struggle to deal with social media sites where a lot of content people are looking for is now found.

👤 kumarsw
I feel like we are at the low point or even losing the battle between search engines and SEO spam. Maybe it is time for the Yahoo-style curated directory to return? We seem to be getting a microcosm of this with the awesome-* GitHub lists and Gemini with its near-nonexistent search.

👤 andrewclunn
What about a search engine that only indexed information and technology "alternative" sites, specifically to give you the results most likely to be purged or demoted from Google's results? Would be simple enough in scope and have a built in market and use.

👤 motohagiography
I do like the idea that instead of crawling and indexing, the next generation search will likely be more like a federated community search app that indexes the stuff members actually read. Google search isn't so much a repository as a consensus about what's important, hence why it's so politicized to the point of becoming unreliable, but also why it too is vulnerable to disruption.

Imo, 2005 google got initial traction because of its tech forum post indexing, as I remember my switch to it was because it became an extension and then replacement for manpages. In that sense, what made it good was it reflected the consensus of what its incredibly influential userbase thought was important and just managed that really well. The demographic impact of the U.S. Gen X all using it at once didn't hurt either.

The equivalent today, as a lot of us say, is that blockchains are in the 1997 internet phase, and the service that makes the content of those as navigable as the 90's internet, will likely grow in a similar way.

Search that provides young people with privacy and freedom to pursue their true interests will be the dominant strategy. Its success will be because it's a product that rides growth, and not because it "solved a problem." Imo, we all index too much on the privacy pattern because the freedom pattern is too risky.

What's changed since that time are the maturity of things like Bloom and other probabilistic filters, Apple's private set intersection, differential privacy, zksnarks, and everybody you'd ask an opinion from now gets their content through mobile devices. Apple's ecosystem is equipped to do this kind of search, but they're too exposed politically to get into it. Meta will likely go there, but nobody's going to trust them willingly.

A protocol that generated a cryptographically strong anonymous index from your browsing - and instead of putting it on google's servers, it was on a chain, or the content index information and its evolving consensus score was included in something like a DNS record - may still unseat these ensconced interests. IPFS and other P2P or torrents might do something like that as well. Blockchains may be good for that consensus/desire score.

It's not something you architect and design top down that has to solve all cases, it will be just another useful product that grows while riding a demographic change. It would be on the level of inventing HTML/HTTP again, which, when you think about it, was just another dude making a thing he needed.


👤 baggachipz
https://kagi.com/ is a new engine (and Orion Browser) which seems like what you're talking about. I've been using it some and like it so far. The browser is fantastic.

👤 hvasilev
All big tech businesses at their core are monopolies. Once a significant field has been figured out, it is very difficult to compete with the market standard, unless they screw up so hard that THE AVERAGE user starts searching for an alternative.

👤 WalterBright
I'd like to see categories like travel, science, history, art, etc. The web pages could pick which categories their page falls into using meta tags. The user has the option of selecting which category they are interested in searching within.
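
Reading self-declared categories out of a page is straightforward with the standard library. Note the assumption: there is no standard `category` meta tag, so the `<meta name="category" content="...">` convention and the names `CategoryParser` and `page_categories` are purely hypothetical.

```python
from html.parser import HTMLParser


class CategoryParser(HTMLParser):
    """Collects categories from <meta name="category" content="a, b">."""

    def __init__(self):
        super().__init__()
        self.categories = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "category":
            return
        content = attr.get("content") or ""
        # Comma-separated list, normalized to lowercase.
        self.categories.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def page_categories(html):
    """Return the set of categories a page declares for itself."""
    parser = CategoryParser()
    parser.feed(html)
    return parser.categories
```

The obvious weakness is that pages label themselves, so a spammy page could simply claim every category; an engine would likely need to verify or discount these tags.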

👤 lgrialn
What I miss most of all from the Good Old Days was getting as many hits back as I could read.

Rather than being told "No, there are only eight pages of results on anything in the goddamned world. Really. Would I lie to you?"


👤 dang
Ongoing related thread:

Gigablast Search Engine - https://news.ycombinator.com/item?id=29421898 - Dec 2021 (10 comments)


👤 marksbrown
I'd like a way of automatically filtering for websites that:

* Don't use JS
* Don't use Google analytics
* Don't weigh more than a few kB per page
* Don't show any sites with ads

That would be a place to begin.
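
Most of the criteria listed above can be approximated with crude string checks on the fetched HTML. This is a rough sketch under assumptions: `is_lightweight` is a made-up name, the 5 kB default is a guess at "a few kB", and ad detection is left out because it has no simple test.

```python
def is_lightweight(html, max_bytes=5_000):
    """Heuristic check: no scripts, no Google Analytics, small page.

    Only the HTML document itself is measured; images and stylesheets
    it references are not fetched or counted.
    """
    lowered = html.lower()
    if "<script" in lowered:
        return False  # any script tag counts as "uses JS"
    if "google-analytics.com" in lowered or "googletagmanager.com" in lowered:
        return False  # catches script-free tracking pixels as well
    return len(html.encode("utf-8")) <= max_bytes
```

A crawler could run this at index time and store the result as a flag, so the filter costs nothing at query time.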


👤 dragonwriter
> Why doesn't anyone create a search engine comparable to 2005-Google?

Because the universe being searched isn't the internet of 2005 and earlier, and because user expectations have moved on, too.

Plus the index expense.


👤 mrfusion
I’ve always wondered why you can’t use SEO optimizations for GOOGLE as a negative weight and penalize those pages.

For example if my search term appears in the URL I can almost guarantee I don’t want that page.
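
The URL heuristic above could be tried as a re-ranking step on top of existing scores. This is a sketch only: `penalize_keyword_urls`, the `penalty` factor, and the example scores are assumptions, and a real engine would need to avoid punishing legitimate pages such as documentation with descriptive URLs.

```python
import re


def penalize_keyword_urls(results, query, penalty=0.5):
    """Down-weight results whose URL contains every query term.

    `results` is a list of (url, score). Returns a new list sorted by
    adjusted score, highest first.
    """
    terms = [t for t in re.split(r"\W+", query.lower()) if t]
    rescored = []
    for url, score in results:
        lowered = url.lower()
        if terms and all(term in lowered for term in terms):
            score *= penalty  # keyword-stuffed URL: treat as an SEO signal
        rescored.append((url, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

For example, with the query `best cheap widgets`, a result at `https://spam.example/best-cheap-widgets` would have its score halved, while `https://forum.example/t/1234` would keep its original score.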


👤 tigerlily
Surely there must be some way to have distributed search compute a la folding/seti@home or those mersenne prime guys.

I'd gladly pool in some of my CPU time if it helps build a better search.


👤 gompertz
I've been having a lot of good luck with Lycos (yeah, that Lycos, from 1995!)... It never returns pay wall or "opinion" based results (I.e. Medium).

👤 est
Because today's web is full of walled gardens, and most content is going mobile, streaming, and SPA-rendered, which is no longer plain-text based.

👤 s1k3s
I don’t know how Google was in 2005, but in ~2010 I was able to pull a website on #1 with 0 cash spent, just by manipulating PR. That doesn’t seem great to me.

👤 vangelis
They have, sort of: https://search.marginalia.nu/

👤 beefman
Can you also create a web comparable to the 2005 web?

Well, it's wikipedia. So just create a search engine for that, since their search sucks rocks.


👤 amelius
Also, where are the books about writing a search engine?

Knuth's "Searching and Sorting" volume desperately needs an update.


👤 anotheraccount9
Check out the dead internet theory. If most people browse 1% of the web, what's up with popular search engines?

👤 axegon_
Two major reasons: costs to build and maintain and manpower needed. Both are practically impossible to come by.

👤 ChrisArchitect
related 2 days ago:

Ask HN: Has Google search become quantitatively worse?

https://news.ycombinator.com/item?id=29392702

Inviting all the paranoid/speculative/hearsay/personal experience responses. Lame Ask HNs!!!!!


👤 mkbkn
I am a non-dev and Ecosia and DuckDuckGo are perfect for me. I haven't used Google for more than 3 years now.

👤 ofou
An end-to-end machine learning system should compress all the web into a good search engine soon.

👤 hereforphone
Because the money lies in modulating your product according to the whims of the highest bidders.

👤 chrisgoman
too many crappy websites, probably needs a "committee" to whitelist domains (only good quality ones) but probably too much work for not enough money or needs some monetization strategy

👤 rasengan
This is how Private Search [1] works since it decouples the search from the user. This means nobody knows both who searched and what they searched for. This is a huge leap for privacy in search.

[1] https://private.sh


👤 peanut_worm
Isn’t that what DuckduckGo is?

DDG is pretty useless though unfortunately.


👤 richardsocher
you.com supports many of the standard operators and has specific reddit, stackoverflow, MDN apps for developers.

👤 fnord77
information-dense pages of yore have been replaced by really wordy, probably generated SEO optimized blog junk.

👤 Hakashiro
What is your gripe with DuckDuckGo?