HACKER Q&A
📣 ronnykylin

Is 30k QPS Per Node enough?


Developer of Apache Doris (an analytic database like ClickHouse) here. Kinda proud of what we've achieved lately but also want more input from HN. These are the nine methods we've done to make "30,000 QPS per node" happen (ignore the promotional discourse if it bothers you): (https://doris.apache.org/blog/High_concurrency) trying to collect advice for improvement, like what you are looking for in an OLAP engine?


  👤 Chyzwar Accepted Answer ✓
Depends on what is size of said node and what type of queries.

Example of something I was struggling in the past:

  - 200milion-4bilion rows dataset with properties like country, production, x etc 
  - support arbitrary filters on any of 70 columns
  - provide result in less than 4 sec and support different aggregations
Example of problematic query:

Calculate monthly aggregated (sum) time series of X for all where country is Y and another attribute is above Z. In Elasticsearch this query takes 10-20sec and with heavy caching it somehow works. It was hard to partition data in a way that was friendly to all queries.