If I were starting today, what are some good resources for building a search engine like Google (including in terms of quality, reliability and following web standards)?
If you've ever worked on anything like this, I'd like to know what challenges I may face when starting out or deploying in production, and why.
Power usage will dominate hardware costs, so that hardware ends up expensive rather than cheap, probably more expensive than hardware designed to run 24/7 in a data center.
> build a search engine like Google (including in terms of quality, reliability and following web standards).
You won’t get reliability from “an old Android phone”.
> what challenges I may face since starting or deploying in production.
Buy a napkin first, and use it to make some calculations. Starting numbers: according to https://blog.hubspot.com/marketing/google-search-statistics Google handles about 250k queries per second, and according to https://zyppy.com/seo/google-index-size/ its index holds about 400 billion documents.
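To make the napkin concrete, here is a rough sketch of that arithmetic. The query rate and index size are the figures quoted above; the average document size (100 KB) and the 30-day recrawl period are my own placeholder assumptions, so swap in whatever you think is realistic.

```python
SECONDS_PER_DAY = 86_400


def napkin(qps, docs, avg_doc_bytes, recrawl_days):
    """Back-of-the-envelope load figures for a web-scale search engine."""
    # Queries served per day at a constant query rate.
    queries_per_day = qps * SECONDS_PER_DAY
    # Raw (uncompressed, unreplicated) storage for one copy of every document.
    raw_storage_pb = docs * avg_doc_bytes / 1e15
    # Fetch rate needed to revisit every document once per recrawl period.
    crawl_pages_per_s = docs / (recrawl_days * SECONDS_PER_DAY)
    # Sustained download bandwidth implied by that fetch rate.
    crawl_gbit_per_s = crawl_pages_per_s * avg_doc_bytes * 8 / 1e9
    return {
        "queries_per_day": queries_per_day,
        "raw_storage_pb": raw_storage_pb,
        "crawl_pages_per_s": crawl_pages_per_s,
        "crawl_gbit_per_s": crawl_gbit_per_s,
    }


if __name__ == "__main__":
    estimate = napkin(
        qps=250_000,            # figure quoted in the thread
        docs=400e9,             # figure quoted in the thread
        avg_doc_bytes=100_000,  # assumption: 100 KB per document
        recrawl_days=30,        # assumption: refresh the index monthly
    )
    for name, value in estimate.items():
        print(f"{name}: {value:,.1f}")
```

Under those assumptions this comes out to roughly 21.6 billion queries per day, about 40 PB of raw documents, and about 154k page fetches per second just to keep the index a month fresh. None of that includes the index itself, replication, or ranking.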
I think the only viable solutions going forward in a post-AI world will be decentralized, small scale and non-general, curated by human beings based on a reputation system rather than algorithms, possibly even torrent-based and not touching the web at all.
https://www.crunchydata.com/blog/postgres-full-text-search-a...
The problem is getting all the data. If you try to scrape another search engine, it will punish you.
You could think of a distributed, collaborative search engine that scrapes the web from the IP address of each participating device.
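One way such a scheme might split the work is sketched below. It assigns each host to exactly one device using rendezvous (highest-random-weight) hashing, so two devices never hammer the same site and a device leaving only reshuffles the hosts it owned. This is my own illustration, not an existing project: the device IDs and the `assign_device` helper are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def score(device_id: str, host: str) -> int:
    """Deterministic pseudo-random weight for a (device, host) pair."""
    digest = hashlib.sha256(f"{device_id}|{host}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_device(url: str, device_ids: list[str]) -> str:
    """Pick the device responsible for crawling this URL's host.

    Every participant can compute this locally from the shared device list,
    so no central coordinator is needed to hand out work.
    """
    host = urlsplit(url).hostname or ""
    # Rendezvous hashing: the device with the highest score for the host wins.
    return max(device_ids, key=lambda device: score(device, host))


if __name__ == "__main__":
    devices = ["phone-a", "laptop-b", "pi-c"]  # hypothetical device IDs
    for url in ("https://example.com/a", "https://example.org/", "https://example.net/x"):
        print(url, "->", assign_device(url, devices))
```

Partitioning by host rather than by full URL keeps per-site politeness (crawl delay, robots.txt handling) on a single device.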