I've investigated a few, such as those based on Solr, but my concern is that they don't handle multiple human languages without heavy configuration. Ideally I want something with stemming and tokenization for multiple languages out of the box, including East Asian languages such as Chinese, Japanese, and Korean, and ideally South Asian languages like Hindi and Urdu as well.
Unfortunately, a great many search engines out there seem to offer stemming and tokenization only for European languages, i.e. Romance (Latin-derived) and Germanic ones. People everywhere archive the web content they browse, and that content is in many human languages, so good full-text search over it needs to handle all of them. I considered writing something myself, such as a simple trie, but building good full-text search is a very long and convoluted rabbit hole, so I'd much rather plug in something that already exists.
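For context, here is roughly the kind of trie I had in mind; a minimal sketch in TypeScript (all names are mine, hypothetical). It handles prefix lookup fine and does nothing about stemming, ranking, or word segmentation, which is exactly where the rabbit hole starts:

```typescript
// Minimal trie index: maps each token prefix to the set of document ids
// containing a token with that prefix.
class TrieNode {
  children = new Map<string, TrieNode>();
  docIds = new Set<number>();
}

class TrieIndex {
  private root = new TrieNode();

  // Naive whitespace tokenization: already wrong for Chinese/Japanese,
  // which have no spaces between words to split on.
  add(docId: number, text: string): void {
    for (const token of text.toLowerCase().split(/\s+/)) {
      let node = this.root;
      for (const ch of token) {
        if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
        node = node.children.get(ch)!;
        node.docIds.add(docId); // every prefix of the token points here
      }
    }
  }

  // Prefix search: ids of documents containing a token starting with `prefix`.
  search(prefix: string): number[] {
    let node = this.root;
    for (const ch of prefix.toLowerCase()) {
      const next = node.children.get(ch);
      if (!next) return [];
      node = next;
    }
    return [...node.docIds];
  }
}

const index = new TrieIndex();
index.add(1, "full text search");
index.add(2, "searching archived pages");
console.log(index.search("sear")); // [1, 2]
```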
I really love what FlexSearch is doing, especially how they use signals from context; I think that's the future. But I'm concerned about how basic their stemming and tokenization support is. For example: https://github.com/nextapps-de/flexsearch/issues/207
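For anyone who hasn't seen it, the contextual scoring is just an index option; a minimal example, assuming the FlexSearch 0.7 API (check the README against the version you're on):

```typescript
import { Index } from "flexsearch";

// Contextual index: FlexSearch scores matches using surrounding words,
// not just isolated term hits. `context: true` enables the defaults;
// tokenize "forward" additionally indexes word prefixes.
const index = new Index({
  tokenize: "forward",
  context: true,
});

index.add(1, "archived pages in many human languages");
index.add(2, "human readable archive of web pages");

// Both documents contain the query terms, but contextual scoring
// prefers the one where they appear close together.
console.log(index.search("human languages"));
```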
AFAIK there is nothing out there for East Asian languages that works as well as the tokenizers for Latin-script languages. The existing ones do reasonably well on textbook material with perfect grammar and easy kanji, but they fall apart completely on casual human text and speech.
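You can see the gap even with the dictionary-based word breaker that ships in modern JS runtimes; a quick sketch using Intl.Segmenter (ICU under the hood, available in current browsers and Node 16+), which segments textbook Japanese fine but has no dictionary entries for slang or casual contractions:

```typescript
// Intl.Segmenter uses a dictionary-based word breaker (ICU), which is
// roughly the state of the art outside of ML-based tokenizers.
const segmenter = new Intl.Segmenter("ja", { granularity: "word" });

function tokenize(text: string): string[] {
  return [...segmenter.segment(text)]
    .filter((s) => s.isWordLike) // drop punctuation/whitespace segments
    .map((s) => s.segment);
}

// Textbook Japanese segments cleanly...
console.log(tokenize("私は学生です"));
// e.g. [ '私', 'は', '学生', 'です' ]

// ...but casual speech is where dictionary lookup breaks down, since
// contractions and net slang simply aren't in the dictionary.
console.log(tokenize("まじヤバくね"));
```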
Do not attempt to solve this problem yourself! I'd guess only the likes of Google and dedicated ML teams will be able to tackle it.