Example of something I struggled with in the past:
- a dataset of 200 million to 4 billion rows with properties like country, production, X, etc.
- support arbitrary filters on any of 70 columns
- return results in under 4 seconds and support different aggregations
Example of a problematic query: calculate a monthly aggregated (sum) time series of X for all rows where country is Y and another attribute is above Z. In Elasticsearch this query takes 10 to 20 seconds, and it only works at all with heavy caching. It was hard to partition the data in a way that suited all queries.
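To make the shape of that query concrete, here is a minimal brute-force sketch in plain Python. It is only meant to illustrate the query semantics, not how any particular engine executes it. The field names `date`, `country`, `attr` and `x` and the function `monthly_sum` are hypothetical stand-ins for the real columns: filter on two columns, then group by (year, month) and sum X.

```python
from collections import defaultdict
from datetime import date


def monthly_sum(rows, country, attr_threshold):
    """Sum `x` per (year, month) over rows where country matches and attr > threshold.

    Hypothetical schema: each row is a dict with keys date, country, attr, x.
    This is a full scan, i.e. the work an engine has to do (or avoid via
    indexes/partitioning) on hundreds of millions of rows.
    """
    totals = defaultdict(float)
    for row in rows:
        # Both filters from the example query: country == Y and attribute > Z
        if row["country"] == country and row["attr"] > attr_threshold:
            key = (row["date"].year, row["date"].month)  # monthly bucket
            totals[key] += row["x"]
    # Return buckets in chronological order to form a time series
    return dict(sorted(totals.items()))


# Tiny illustrative dataset (made-up values)
rows = [
    {"date": date(2020, 1, 5), "country": "PL", "attr": 10, "x": 1.0},
    {"date": date(2020, 1, 20), "country": "PL", "attr": 3, "x": 5.0},
    {"date": date(2020, 2, 1), "country": "PL", "attr": 12, "x": 2.5},
    {"date": date(2020, 2, 9), "country": "DE", "attr": 50, "x": 9.0},
]
print(monthly_sum(rows, "PL", 5))  # {(2020, 1): 1.0, (2020, 2): 2.5}
```

Because the filter columns can be any of the 70, no single partitioning key lets this scan skip most of the data for every query.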