What relevant research or projects are trying to make these sorts of algorithms and data accessible as a future commodity people can build on top of?
I believe a simple two-stage system might be enough to produce a decent system. Stage 1: query expansion and inverted-index-based retrieval. Stage 2: re-ranking based on a combination of a few heuristics (PageRank + word embeddings + query analysis).
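Here is a rough sketch of what I mean by the two stages, not a real implementation. Everything in it is made up for illustration: the toy corpus, the `SYNONYMS` table, the `LINK_SCORE` values standing in for PageRank, and the `alpha` weight. Plain term overlap stands in for the word-embedding similarity to keep it dependency-free.

```python
from collections import defaultdict

# Toy corpus; document ids and text are invented for illustration.
DOCS = {
    "d1": "html is the hypertext markup language for web pages",
    "d2": "annotating hypertext documents with margin notes",
    "d3": "a recipe for sourdough bread",
}
# Hypothetical precomputed link scores standing in for PageRank.
LINK_SCORE = {"d1": 0.6, "d2": 0.3, "d3": 0.1}
# Hypothetical hand-written synonym table used for query expansion.
SYNONYMS = {"html": ["hypertext", "markup"]}


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def expand(query):
    """Stage 1a: add synonyms of each query term to the query."""
    terms = tokenize(query)
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def retrieve(index, terms):
    """Stage 1b: collect every document that matches at least one term."""
    candidates = set()
    for term in terms:
        candidates |= index.get(term, set())
    return candidates


def rerank(candidates, terms, docs, link_score, alpha=0.5):
    """Stage 2: order candidates by a weighted mix of term overlap and link score."""
    query_terms = set(terms)

    def score(doc_id):
        overlap = len(query_terms & set(tokenize(docs[doc_id]))) / len(query_terms)
        return alpha * overlap + (1 - alpha) * link_score[doc_id]

    return sorted(candidates, key=score, reverse=True)


def search(query):
    index = build_index(DOCS)
    terms = expand(query)
    return rerank(retrieve(index, terms), terms, DOCS, LINK_SCORE)
```

Calling `search("html")` expands the query to three terms, pulls the two hypertext documents from the index, and ranks the one about the HTML language first. Swapping the overlap term for an embedding similarity would only change the `score` function.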
Would you like to talk more about this? My email address is in my profile.
I'd also try to separate out multiple meanings of phrases if possible... for example "hypertext markup" could mean the HTML language, or it could mean actually marking up hypertext (annotation). I'd let the user have some way to disambiguate the meanings.
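One way the disambiguation could work, as a minimal sketch: keep a table of known senses per phrase, show the labels to the user, and add sense-specific terms to the query once they pick one. The `SENSES` table, its labels, and the extra terms are all hypothetical; a real system would have to derive them from data.

```python
# Hypothetical sense inventory: phrase -> list of possible meanings.
SENSES = {
    "hypertext markup": [
        {"label": "the HTML language", "extra_terms": ["html", "language"]},
        {"label": "annotating hypertext", "extra_terms": ["annotation", "annotate"]},
    ],
}


def senses_for(query):
    """Return the sense labels to offer the user, or [] if the phrase is unambiguous."""
    return [s["label"] for s in SENSES.get(query.lower().strip(), [])]


def disambiguate(query, choice=None):
    """Return query terms, extended with the chosen sense's terms if one was picked."""
    senses = SENSES.get(query.lower().strip(), [])
    terms = query.lower().split()
    if choice is None or not senses:
        return terms
    return terms + senses[choice]["extra_terms"]
```

For "hypertext markup" the user would see both labels, and choosing the second one turns the query into terms that favor annotation pages over HTML tutorials.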
The quality of the search algo wasn't better; the sites indexed were better.