HACKER Q&A
📣 iambateman

Full Text Search


Hey HN,

I need full-text search for a large amount of text data (500 GB and up).

I've looked at Algolia, Meilisearch, Typesense, and Elastic. From what I can tell, they all require a server that can keep everything in memory.

MySQL full text search is probably sufficient, so maybe I'm asking about best practices to store a MySQL database at terabyte scale.

Thoughts on how to scale a database and do faceted full-text search without it costing a ton of $$?


  👤 snikolaev Accepted Answer ✓
Check out Manticore Search for your use case. It's open-source, cost-effective, and doesn't require keeping everything in memory.

Key points:

- Columnar Storage: Efficiently handles large datasets on disk, ideal for terabyte-scale data. It's not enabled by default but can be set up easily with "CREATE TABLE ... ENGINE='columnar'".

- Faceted Search: Probably easier than anywhere else: just append "FACET <field>" to your "SELECT" query.

- MySQL Protocol and SQL Support: If you’re familiar with SQL and MySQL, it's easier to get started compared to other search engines.
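
A rough sketch of the first two points, in case it helps. The table name `docs`, its columns, and the two helper functions are made up for illustration; the `%s` is a placeholder meant to be filled in by whatever MySQL client library you use. It only builds the statements, so you can see their shape before pointing a client at a Manticore instance:

```python
def create_table_sql(table, columns, columnar=True):
    """Build a CREATE TABLE statement, optionally with columnar storage.

    `columns` is a list of (name, type) pairs. Hypothetical helper.
    """
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    engine = " engine='columnar'" if columnar else ""
    return f"CREATE TABLE {table} ({cols}){engine}"


def faceted_search_sql(table, facets, limit=10):
    """Build a full-text SELECT with one FACET clause per attribute.

    The search phrase is left as a %s placeholder for the client driver.
    Hypothetical helper.
    """
    facet_clauses = "".join(f" FACET {name}" for name in facets)
    return f"SELECT * FROM {table} WHERE MATCH(%s) LIMIT {limit}{facet_clauses}"


if __name__ == "__main__":
    # Assumed schema: two full-text fields plus two attributes to facet on.
    print(create_table_sql("docs", [
        ("title", "text"),
        ("body", "text"),
        ("category", "string"),
        ("year", "int"),
    ]))
    print(faceted_search_sql("docs", ["category", "year"]))
```

Each FACET clause comes back as an extra result set after the main one, so the client has to be able to read multiple result sets from a single query.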


👤 throwup238
Have you seen tantivy-cli [1]? It builds and uses an on-disk index, which you can store separately from the data (e.g. data on a NAS, index locally).

[1] https://github.com/quickwit-oss/tantivy-cli


👤 qwm
What are your needs? I quite dislike MySQL's full-text search, as it's much less configurable than, say, Elasticsearch. I would recommend something like Elasticsearch if you need flexibility down the line. I've used it happily for a few years now as the search engine for a content site I work on.