I want to be able to quickly select datapoints of a given dataset (ideally within a few seconds for result sets of 1,000 to 1,000,000 datapoints) and optionally filter them by their attribute values, e.g. with queries like "give me all datapoints belonging to dataset A for which x < 4.5 AND category = 'test' AND event_date >= '2009-04-10'". Once written, datapoints will not change, although I would like to attach additional information to specific datapoints (e.g. test results or additional labels); that could be stored in a separate data structure or table.
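To make the query shape concrete, here is a minimal sketch of that access pattern. It uses Python's built-in sqlite3 module purely as a self-contained stand-in for a real database; the table names (`datapoints`, `annotations`), the column names, the index, and the helper `select_datapoints` are all hypothetical and only illustrate the idea of immutable datapoints plus a separate table for labels added later.

```python
import sqlite3

# In-memory database as a stand-in; all names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datapoints (
    id         INTEGER PRIMARY KEY,
    dataset    TEXT NOT NULL,
    x          REAL,
    category   TEXT,
    event_date TEXT  -- ISO 8601 dates compare correctly as plain text
);
-- Composite index: equality columns first, range column last.
CREATE INDEX idx_dataset_category_date
    ON datapoints (dataset, category, event_date);
-- Datapoints are write-once, so later labels live in a separate table.
CREATE TABLE annotations (
    datapoint_id INTEGER NOT NULL REFERENCES datapoints(id),
    label        TEXT NOT NULL
);
""")

rows = [
    ("A", 3.2, "test", "2009-04-12"),   # matches every filter
    ("A", 5.0, "test", "2009-05-01"),   # x is too large
    ("A", 1.1, "train", "2009-06-01"),  # wrong category
    ("A", 2.0, "test", "2009-01-01"),   # event_date is too early
    ("B", 1.0, "test", "2009-04-15"),   # different dataset
]
conn.executemany(
    "INSERT INTO datapoints (dataset, x, category, event_date) VALUES (?, ?, ?, ?)",
    rows,
)


def select_datapoints(conn, dataset, max_x, category, min_date):
    """Return datapoints of one dataset that pass all three attribute filters."""
    query = """
        SELECT id, x, category, event_date
        FROM datapoints
        WHERE dataset = ? AND x < ? AND category = ? AND event_date >= ?
        ORDER BY id
    """
    return conn.execute(query, (dataset, max_x, category, min_date)).fetchall()


matches = select_datapoints(conn, "A", 4.5, "test", "2009-04-10")

# Attach a label to the first match without modifying the datapoint row itself.
conn.execute(
    "INSERT INTO annotations (datapoint_id, label) VALUES (?, ?)",
    (matches[0][0], "reviewed"),
)
print(matches)
```

Whatever system ends up being used would need to support this kind of combined equality and range filtering efficiently at the stated result-set sizes.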
Right now I'm solving this using a simple PostgreSQL database with auxiliary index tables, but I'm looking for more scalable alternatives.
I've considered software like Cassandra or ClickHouse, but I'm not sure they would fit my use case well. Do you have any recommendations, or have you built such a system in your work and can share some ideas or guidance? Thanks!
What is the downstream use? Training, labeling?