HACKER Q&A
📣 vernondegoede1

Perfect tooling for high-availability statistics?


I’m working at a financial company that has 80K+ active clients throughout Europe. We have a dashboard that allows our users to see financial statistics. A couple of years ago, we introduced ElasticSearch, which indexes all payments. While this was initially used only for transaction searching & filtering, at some point we decided to also use it for transaction statistics.

Given our current scale (hundreds of millions of payments), however, we’re running into performance issues with our current ElasticSearch setup. While it still works perfectly for filtering, we realized that ElasticSearch may not be the right tool for statistics, as it’s becoming too resource intensive. We’re currently looking into alternatives, but we’re not sure what to use.

What tools would you suggest to use for this?


  👤 bckygldstn Accepted Answer ✓
Analytics workloads are often a good fit for columnar databases. Popular examples are Redshift (AWS), Vertica (enterprise), and ClickHouse (open source).

Columnar databases are awesome: for the right kinds of task, the speedup can be multiple orders of magnitude. They excel at filtering and aggregating a subset of columns, storing sparse or slowly-changing data, and time-series operations.

Of course there are always tradeoffs. Columnar databases tend to suck at reading individual rows, reading large numbers of columns, and heavy writes.
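
To make the layout difference concrete, here is a minimal Python sketch. It is only a toy illustration, not how Redshift, Vertica, or ClickHouse actually store data, and the payment fields (`id`, `country`, `amount_cents`, `status`) are made up for the example. It shows why summing one column is cheap in a column layout while rebuilding a single row has to touch every column.

```python
from array import array

# Hypothetical payments in a row-oriented layout: one record per payment.
rows = [
    {"id": 1, "country": "NL", "amount_cents": 1250, "status": "paid"},
    {"id": 2, "country": "DE", "amount_cents": 990, "status": "paid"},
    {"id": 3, "country": "NL", "amount_cents": 4300, "status": "refunded"},
]

# The same data in a column-oriented layout: one sequence per field.
columns = {
    "id": array("q", [r["id"] for r in rows]),
    "country": [r["country"] for r in rows],
    "amount_cents": array("q", [r["amount_cents"] for r in rows]),
    "status": [r["status"] for r in rows],
}


def total_row_store(rows):
    """Sum amounts by visiting every record, even though only one field is needed."""
    return sum(r["amount_cents"] for r in rows)


def total_column_store(columns):
    """Sum amounts by scanning a single contiguous column."""
    return sum(columns["amount_cents"])


def fetch_row(columns, index):
    """Rebuild one payment from the column layout: touches every column."""
    return {name: values[index] for name, values in columns.items()}
```

Both totals are identical; the difference is how much data each layout has to read. `fetch_row` is the flip side: a single-record lookup fans out across all columns, which is roughly why point reads are a weak spot.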

Another option is of course to put a caching layer between the dashboard and ElasticSearch, and precompute common queries e.g. daily.
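
A rough sketch of the precompute idea, assuming payments can be read in bulk once a day. The field names (`merchant_id`, `paid_at`, `amount_cents`) and both functions are hypothetical, and a real setup would keep the rollup in a table or cache rather than in an in-memory dict.

```python
from collections import defaultdict
from datetime import date, datetime


def build_daily_rollup(payments):
    """Aggregate raw payments into one bucket per (merchant, day).

    Intended to run as a scheduled job, e.g. once a day.
    """
    rollup = defaultdict(lambda: {"count": 0, "total_cents": 0})
    for payment in payments:
        key = (payment["merchant_id"], payment["paid_at"].date())
        rollup[key]["count"] += 1
        rollup[key]["total_cents"] += payment["amount_cents"]
    return dict(rollup)


def merchant_stats(rollup, merchant_id, start, end):
    """Answer a dashboard query from the rollup alone (inclusive date range)."""
    count = 0
    total_cents = 0
    for (merchant, day), bucket in rollup.items():
        if merchant == merchant_id and start <= day <= end:
            count += bucket["count"]
            total_cents += bucket["total_cents"]
    return {"count": count, "total_cents": total_cents}


# Made-up sample data for illustration only.
payments = [
    {"merchant_id": "m1", "paid_at": datetime(2020, 3, 1, 9, 30), "amount_cents": 1000},
    {"merchant_id": "m1", "paid_at": datetime(2020, 3, 1, 17, 5), "amount_cents": 2500},
    {"merchant_id": "m1", "paid_at": datetime(2020, 3, 2, 11, 0), "amount_cents": 700},
    {"merchant_id": "m2", "paid_at": datetime(2020, 3, 1, 12, 0), "amount_cents": 9900},
]

rollup = build_daily_rollup(payments)
march_first = merchant_stats(rollup, "m1", date(2020, 3, 1), date(2020, 3, 1))
```

The dashboard then reads a handful of daily buckets instead of scanning every payment. The cost is freshness: statistics are only as current as the last rollup run.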

Feel free to message me if you want to chat, email in profile.


👤 hodgesrm
ClickHouse is a common replacement for ElasticSearch in cases where the data consists of structured records. ContentSquare publicly reported an 11x decrease in cost and 10x faster 99th percentile queries after migration. [1] Others have seen similar results.

[1] https://github.com/ClickHouse/clickhouse-presentations/blob/...

Disclaimer: some of the "others" are customers of my employer Altinity, which offers support for ClickHouse.


👤 snikolaev
Do you need full-text search? I think the answer depends on that, as there are only a few open source technologies that do full-text search (ElasticSearch, Solr, Manticore Search, and a couple of others) and LOTS of others that don't, but are much better at pure analytics. If you don't need it, ClickHouse should be a good fit.

👤 verdverm
Time series DB maybe?

👤 amypinka
Kdb