HACKER Q&A
📣 rushingcreek

What would make a next-gen search engine?


With new search engines like you.com popping up and Google facing regulatory scrutiny, it's an interesting time to consider what a next-gen search engine would look like.

What unsolved problems would a next-gen search engine potentially address?

What if you could have a conversation with a search engine, asking follow-up questions within the context of the previous ones?


  👤 lkrubner Accepted Answer ✓
History. It’s been documented that Google no longer tries to index everything on the Web and is allowing older material to disappear. Tim Bray has a blog post about this. A full history would be useful, as would the ability to run a search with the PageRank of a given day. For instance, which site was the most-linked node on July 19, 2011?
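
To make the "PageRank of a given day" idea concrete, here is a rough sketch. It assumes a hypothetical link archive of `(source, target, first_seen)` tuples, which no engine is known to expose, and runs a plain textbook power iteration over only the links that existed on the chosen date. The function name, the tuple layout, and the damping value are illustrative assumptions, not anything Google has published about its historical rankings.

```python
from datetime import date

def pagerank_as_of(links, as_of, damping=0.85, iterations=50):
    """Rank pages using only links first seen on or before `as_of`.

    `links` is an iterable of (source, target, first_seen) tuples;
    this schema is a made-up stand-in for a real link archive.
    """
    edges = [(s, t) for s, t, seen in links if seen <= as_of]
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out = {n: [] for n in nodes}
    for s, t in edges:
        out[s].append(t)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new = {n: base for n in nodes}
        for s in nodes:
            for t in out[s]:
                new[t] += damping * rank[s] / len(out[s])
        rank = new
    return rank

# Hypothetical archive: the link to "d" appears only in 2012.
archive = [
    ("a", "c", date(2011, 1, 1)),
    ("b", "c", date(2011, 6, 1)),
    ("c", "a", date(2011, 7, 1)),
    ("a", "d", date(2012, 1, 1)),
]
ranks = pagerank_as_of(archive, date(2011, 7, 19))
top_site = max(ranks, key=ranks.get)
```

The hard part in practice would be the archive itself, not the arithmetic: someone has to have kept dated crawl snapshots for the filter on `first_seen` to mean anything.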

👤 ggm
* Tree of results, not a list. Group by function.

* Confidence ranks you can interrogate. "Why did you rank high?"

* On-the-fly refinement of specific hit terms. "Find more like this"


👤 RileyJames
What I’d like in a search engine? I’d like existing engines to be sources / indexes, upon which I’d add collaborative curation.

I think uBlacklist is a good implementation. [1]

Brave’s proposed “goggles” is better, but as yet I don’t believe there are any implementations of it. [2]

[1] https://github.com/iorate/ublacklist

[2] https://brave.com/wp-content/uploads/2021/03/goggles.pdf


👤 onionisafruit
I think the biggest challenge for a search engine is getting the corpus to search: the crawling and scraping. A collective of search engines could go in together on that. Then they could each go their own way on indexing it and making it useful.

👤 gkjrnmtmt
Related question: how can an individual start building a new search engine without initial seed funding? Bing’s free search tier is very easy to exhaust. My idea is to use someone else’s search results and remove results that don’t fit my criteria.
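
A minimal sketch of that "filter someone else's results" idea, assuming the upstream engine hands back a list of dicts with `url` and `title` keys. That shape, the function name, and the example criteria are all hypothetical; a real API would have its own response format and terms of use.

```python
from urllib.parse import urlparse

def filter_results(results, blocked_domains, predicates=()):
    """Drop results from blocked domains or failing any predicate.

    `results` is a list of dicts with 'url' and 'title' keys
    (an assumed shape, not any particular engine's response).
    """
    kept = []
    for r in results:
        host = (urlparse(r["url"]).hostname or "").lower()
        # Match the domain itself and any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in blocked_domains):
            continue
        if all(p(r) for p in predicates):
            kept.append(r)
    return kept

upstream = [
    {"url": "https://www.pinterest.com/pin/1", "title": "Desk ideas"},
    {"url": "https://example.org/desks", "title": "Building a desk"},
    {"url": "https://example.net/shop", "title": "Buy desks now"},
]
mine = filter_results(
    upstream,
    blocked_domains={"pinterest.com"},
    predicates=[lambda r: "buy" not in r["title"].lower()],
)
```

This only post-filters whatever the upstream engine already returned, so it cannot surface pages the upstream never ranked, and it does nothing about the query quota.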

👤 tboyd47
Search engines that are extremely good at finding info on a specific thing.

Search engines that filter out websites that don’t follow a certain code of conduct.

Search engines that are local to a certain area.

Search engines that are completely open-source.


👤 manx
I'd like to see a modern web directory that is built automatically but can be corrected collaboratively.

It should still feature search that finds websites or categories in the directory, but also allow you to explore from these points and discover related content.



👤 throwaway81523
Have a way to discard all results that don't contain your search terms.
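
Something like the following would do it as a post-filter, sketched under the assumption that each result carries `title` and `snippet` text fields (hypothetical names). It matches whole words case-insensitively and does no stemming, so "engines" will not satisfy "engine".

```python
import re

def contains_all_terms(query, text):
    """True only if every word of the query appears as a word in text."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term in words for term in re.findall(r"\w+", query.lower()))

def strict_filter(query, results):
    """Keep results whose title plus snippet contain every query term."""
    return [
        r for r in results
        if contains_all_terms(query, r["title"] + " " + r["snippet"])
    ]

hits = [
    {"title": "Rust borrow checker", "snippet": "Lifetimes explained"},
    {"title": "Rust removal tips", "snippet": "For garden tools"},
]
strict = strict_filter("rust lifetimes", hits)
```

A real engine would apply this against the full page text in the index rather than the snippet, which is an assumption this sketch cannot cover.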

👤 fuzzfactor
Should have an option to rank sites according to their lack of advertising.

👤 dredmorbius
NB: I really like the follow-up question concept, though I'd like to see what you have in mind for that. Usually that's something I accomplish by adding or removing terms / conditions to my query. That's often useful, but frequently not, and removing classes of results (e.g., commerce sites, social media, SEO / content farms) is often a frustration. Poorly detected date ranges would be another.

User-definable ranking and exclusion criteria would be great.

In the past few weeks on HN we've seen articles on:

- How universally reviled autoplay video is. Excluding sites which do this from SERPs would be a strong incentive for them to stop.

- Paywalls and privacy invasions, similarly.

- Known SEO-baiting sites, e.g., Pinterest and Quora.

- Excessive JS.

- User-hostile designs.

Simple site quality and reputation would be a huge factor for me. I've increasingly taken to searching specific sites rather than general Web search just to be able to cut through the crap.

Suggesting filters to apply might also be of interest --- say, "X filter returns / removes Y results".
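
As a rough illustration of the "X filter returns / removes Y results" suggestion, assuming the engine has already tagged each result with boolean quality flags such as `autoplay` and `paywall` (hypothetical field names; detecting those properties is the actual hard problem and is not shown here):

```python
def filter_impact(results, filters):
    """Report, per named filter, how many results it would remove or keep.

    `filters` maps a label to a predicate that returns True when a
    result should be removed.
    """
    report = {}
    for name, should_remove in filters.items():
        removed = sum(1 for r in results if should_remove(r))
        report[name] = {"removes": removed, "returns": len(results) - removed}
    return report

# Hypothetical pre-tagged results.
serp = [
    {"url": "https://example.com/a", "autoplay": True, "paywall": False},
    {"url": "https://example.org/b", "autoplay": False, "paywall": True},
    {"url": "https://example.net/c", "autoplay": True, "paywall": True},
    {"url": "https://example.edu/d", "autoplay": False, "paywall": False},
]
suggestions = filter_impact(serp, {
    "no autoplay video": lambda r: r["autoplay"],
    "no paywalls": lambda r: r["paywall"],
})
for name, counts in suggestions.items():
    print(f"{name}: returns {counts['returns']}, removes {counts['removes']}")
```

Each filter is counted independently here; combining several filters would need a second pass, since their removed sets can overlap.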

As noted in an earlier comment: establishing a self-indexing search standard, which would allow websites to create their own indices, and for those to be distributed to multiple search platforms. (Yes, cheats would need to be identified, and mechanisms for establishing reputation determined.)

Better metadata search, and inclusion / exclusion by category, would be great. Search exclusive to scientific, technical, or academic sites, or exclusive of commercial, paywall, erotic, gaming sites, for example. (Such filters could of course be reversed if that was your kink.) Inclusion/exclusion of social media is probably another big one.

Tools for leveraging site-specific search more cleanly, inspired by DDG's !bang searches, might also be interesting.

Really good date-ranged search. Google still beats DDG at this, though DDG can at least filter by past day/week/month/year, which is useful.

Search inclusive of the Internet Archive's Web and other holdings would also be great. A search not just of Web space but of Web time.

A true news / magazine archive search.

Government records searches.