I am looking for experience-based comments on cost-effectively storing millions of web analytics events in a database. Which database did you choose, and if possible, why?
I have a few time-series databases in mind, but beyond that, if anyone has another solution that has worked for them in a highly cost-effective way for an extremely high-insert, low-read workload, especially for storing web analytics data, I'd love to hear about it.
Thanks
A few clarifying questions first:
1. What's the size of each insert?
2. At the end of one year, what's the total size of your dataset?
3. How long can your largest and most complex analytical query take to finish? Should it finish in a minute? Is it okay if it takes an hour? Is it okay if it takes up to 24 hours?
You will find folks recommending ClickHouse.
We use Kafka and Elasticsearch with Wide Angle Analytics.
Kafka gives us scalable, cheap storage, and Kafka Streams lets us easily build "live" aggregates; a sketch follows below.
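For flavor, here is a minimal Kafka Streams sketch of what such a "live" aggregate can look like: a per-page hit count over one-minute tumbling windows. The topic names ("pageviews", "pageview-counts-by-minute") and the assumption that events arrive keyed by page URL are mine for illustration, not Wide Angle's actual pipeline:

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class PageViewAggregator {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pageview-aggregator");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            StreamsBuilder builder = new StreamsBuilder();
            // Hypothetical input topic: key = page URL, value = raw event JSON.
            builder.stream("pageviews", Consumed.with(Serdes.String(), Serdes.String()))
                   .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                   // Tumbling one-minute windows -> continuously updated hit count per page.
                   .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
                   .count()
                   // Flatten the windowed key into "url@windowStartMillis" for the output topic.
                   .toStream((windowedKey, count) ->
                           windowedKey.key() + "@" + windowedKey.window().start())
                   .to("pageview-counts-by-minute",
                       Produced.with(Serdes.String(), Serdes.Long()));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

The raw events stay cheap in Kafka; only the small, continuously updated aggregate topic needs to be served to dashboards.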
Elasticsearch gives us fast data discoverability.
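As an illustration of that discoverability, here is roughly what a "top pages over the whole index" query looks like with the official Java client (elasticsearch-java). The index name "pageviews" and the keyword-mapped "path" field are assumptions, not Wide Angle's actual schema:

    import co.elastic.clients.elasticsearch.ElasticsearchClient;
    import co.elastic.clients.elasticsearch.core.SearchResponse;
    import co.elastic.clients.json.jackson.JacksonJsonpMapper;
    import co.elastic.clients.transport.rest_client.RestClientTransport;
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;

    public class TopPages {
        public static void main(String[] args) throws Exception {
            RestClient rest = RestClient.builder(new HttpHost("localhost", 9200)).build();
            ElasticsearchClient client = new ElasticsearchClient(
                    new RestClientTransport(rest, new JacksonJsonpMapper()));

            // Ten most-viewed paths, counted server-side; size(0) skips fetching documents.
            SearchResponse<Void> resp = client.search(s -> s
                            .index("pageviews")
                            .size(0)
                            .aggregations("top_pages", a -> a
                                    .terms(t -> t.field("path").size(10))),
                    Void.class);

            resp.aggregations().get("top_pages").sterms().buckets().array()
                .forEach(b -> System.out.println(b.key().stringValue() + " " + b.docCount()));

            rest.close();
        }
    }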
We chose our stack because of existing expertise in the team.
Is this the easiest setup? No.
Is it scalable? Yes.
Is it cheap? Can be.
Examples: https://clickhouse.com/docs/en/about-us/adopters
Datasets and blueprints: https://clickhouse.com/docs/en/getting-started/example-datas...
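If you go that route, the core of a ClickHouse setup for this workload is a MergeTree table sorted by time, which handles high-rate appends cheaply and prunes analytical scans by the sort key. A minimal sketch via the JDBC driver (com.clickhouse:clickhouse-jdbc); the table layout and column names are illustrative assumptions, not taken from the docs linked above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ClickHouseSchema {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:clickhouse://localhost:8123/default");
                 Statement stmt = conn.createStatement()) {
                // MergeTree ordered by (site, timestamp): inserts append in bulk,
                // and queries filtered on site/time read only the relevant parts.
                stmt.execute(
                    "CREATE TABLE IF NOT EXISTS pageviews (" +
                    "  timestamp DateTime," +
                    "  site LowCardinality(String)," +
                    "  path String," +
                    "  referrer String," +
                    "  visitor_id UInt64" +
                    ") ENGINE = MergeTree" +
                    " PARTITION BY toYYYYMM(timestamp)" +
                    " ORDER BY (site, timestamp)");
            }
        }
    }

Monthly partitions also make it trivial to age out or drop old data, which matters once the dataset grows into the billions of rows.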