HACKER Q&A
📣 r0ckysharma

Suitable database to store web analytics data for millions of insertions


Hi Hackers,

I am looking for experience-based comments on cost-effectively storing millions of web analytics records in a database. What database have you chosen, and if possible, tell us why as well!

I have a few time-series databases in mind, but apart from that, I'd love to hear from anyone with another solution that has worked for them in a highly cost-effective way for extremely high insert volume and low read volume, especially for storing web analytics data.

Thanks


  👤 iknownothow Accepted Answer ✓
1. How many insertions do you expect per second or per minute?

2. What's the size of each insert?

3. At the end of one year, what's the total size of your dataset? (A back-of-the-envelope estimate, like the sketch after these questions, is fine.)

4. How long can your largest and most complex analytical query take to finish? Should it finish in a minute? Is it okay if it takes an hour? Is it okay if it takes up to 24 hours?
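
A quick back-of-envelope for questions 1-3 helps a lot here. A minimal sketch of the arithmetic, where every number is a placeholder assumption to be swapped for your own measurements:

    # Rough sizing estimate -- all figures below are assumed placeholders, not measurements.
    inserts_per_second = 50        # ~4.3M events/day
    bytes_per_event = 300          # timestamp, URL, referrer, hashed user agent, geo, etc.

    events_per_day = inserts_per_second * 86_400
    raw_bytes_per_year = events_per_day * 365 * bytes_per_event

    print(f"events/day:  {events_per_day:,}")
    print(f"raw TB/year: {raw_bytes_per_year / 1e12:.2f}")
    # Columnar stores such as ClickHouse typically compress this kind of data
    # several-fold, so the on-disk footprint is usually much smaller than the raw figure.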


👤 openplatypus
Depends on the use case.

You will find folks recommending Clickhouse.

We use Kafka and Elasticsearch with Wide Angle Analytics.

Kafka gives us scalable and cheap storage potential. Kafka Streams means we can easily create "live" aggregates.
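
Kafka Streams itself is a Java/Scala library, but the "live aggregate" idea is easy to sketch. Here is a minimal Python version using a plain consumer; the broker address, topic name, and event fields are assumptions for illustration, not details from the thread:

    # Sketch: consume raw pageview events and keep rolling per-minute counts in memory.
    import json
    from collections import Counter
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "analytics-live-aggregates",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["pageviews"])

    counts = Counter()
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())          # e.g. {"ts": 1700000000, "path": "/pricing"}
        minute = event["ts"] // 60 * 60          # truncate the timestamp to the minute
        counts[(minute, event["path"])] += 1

Kafka Streams gives you the same windowed counts, but with fault-tolerant state stores instead of an in-memory Counter.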

Elasticsearch gives us fast data discoverability.

We chose our stack because of existing expertise in the team.

Is this the easiest setup? No.

Is it scalable? Yes.

Is it cheap? Can be.


👤 richraposa
Cloudflare famously uses ClickHouse for web analytics - inserting over 6M rows per second: https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

👤 zX41ZdbW
Almost every web analytics, mobile analytics, and ad tech company uses ClickHouse.

Examples: https://clickhouse.com/docs/en/about-us/adopters

Datasets and blueprints: https://clickhouse.com/docs/en/getting-started/example-datas...
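
For a concrete picture of what that tends to look like, here is a minimal sketch using the clickhouse-connect Python client; the table layout and column names are illustrative assumptions, not taken from the linked docs:

    # Sketch: a web-analytics table plus a batched insert into ClickHouse.
    from datetime import datetime, timezone
    import clickhouse_connect

    client = clickhouse_connect.get_client(host="localhost", port=8123)

    client.command("""
        CREATE TABLE IF NOT EXISTS pageviews (
            ts       DateTime,
            site_id  UInt32,
            path     String,
            referrer String,
            country  FixedString(2)
        )
        ENGINE = MergeTree
        PARTITION BY toYYYYMM(ts)
        ORDER BY (site_id, ts)
    """)

    # ClickHouse strongly prefers large, infrequent batches over row-by-row inserts.
    batch = [
        (datetime.now(timezone.utc), 1, "/pricing", "https://news.ycombinator.com", "US"),
        (datetime.now(timezone.utc), 1, "/docs", "", "DE"),
    ]
    client.insert("pageviews", batch,
                  column_names=["ts", "site_id", "path", "referrer", "country"])

Batching and a sensible ORDER BY key usually matter more for insert throughput and query speed than the client language does.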