HACKER Q&A
📣 HigherConscious

Where to Store Logs?


Say you have a web application and want to store events like page views, clicks, etc. for analytics.

Where should this data be stored? Is it considered acceptable for the web server to just INSERT every event directly into a SQL database table? If so, then at what volume of throughput does that break, and how should one handle higher scale?

Let's say that this is for a website where users can generate content (e.g. YouTube) and view detailed analytics on that content.


  👤 RulerOf Accepted Answer ✓
The big problem with stuffing logs in SQL is that a log search can bring down your app. You'll be tempted to implement log search via something like SELECT * FROM logs WHERE message LIKE '%query%'. A pattern with a leading wildcard can't use an ordinary B-tree index, so every search scans the whole table, and your DB will fall over when the log table gets big enough.
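To make the scan problem concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a production SQL database. The table layout, index name, and sample rows are made up for illustration, and other engines word their query plans differently, but the general shape (substring search scans, exact match uses the index) is the point.

```python
import sqlite3

# In-memory database with a hypothetical logs table and an index on message.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, ts TEXT, message TEXT)")
conn.execute("CREATE INDEX idx_logs_message ON logs (message)")
conn.executemany(
    "INSERT INTO logs (ts, message) VALUES (?, ?)",
    [("2024-01-01T00:00:00", f"user {i} viewed page {i % 7}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the planner's description of how SQLite would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column is the human-readable detail


# Substring search: the index on message cannot help, so the table is scanned.
substring_plan = plan("SELECT * FROM logs WHERE message LIKE ?", ("%page 3%",))

# Exact match: the planner can look the value up in the index.
exact_plan = plan("SELECT * FROM logs WHERE message = ?", ("user 3 viewed page 3",))

print(substring_plan)
print(exact_plan)
```

The first plan reports a scan of the table and the second a search using idx_logs_message. On a table with a few thousand rows the difference is invisible; on a log table with billions of rows, the scan competes with your application's own queries.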

It's common to ingest logs into something like Elasticsearch instead, for performance and reliability reasons.

This is a common enough problem that MongoDB Atlas has a feature that exposes searchable data through some Lucene-based backend.[0] I've never used it, but I found the concept interesting because it fits the convenient working pattern of "shove it all in the DB and figure it out later."

0: https://www.mongodb.com/atlas/search


👤 joshxyz
my current preferences are as follows

postgresql for transactional logs

clickhouse for analytics data

elasticsearch or quickwit for terabytes of data, disk persisted, if i need thorough search on structured jsons

---

others i use for different use cases

typesense for searching mbs to gbs of data, memory persisted

redis for caching kbs of data, memory persisted


👤 codegeek
Don't insert logs, events, or analytics into your application DB. Usually you send those to specialist datastores (OLAP, etc.) built to process such high volumes of data. This keeps load and storage on your app DB low, and if the analytics store stops working, it doesn't impact your core application.
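One way to picture that separation is a small in-process buffer that collects events and hands them to the analytics store in batches, swallowing failures so a broken analytics pipeline never fails a user request. This is only a sketch under assumptions: EventBuffer, the sink callable, and the batch size are hypothetical names, the sink would wrap whatever client your datastore provides, and a real setup would add thread safety and a timed flush.

```python
from typing import Callable, Dict, List


class EventBuffer:
    """Collects events in memory and passes them to a sink in batches.

    Not thread-safe; a real web server would need a lock or a queue.
    """

    def __init__(self, sink: Callable[[List[Dict]], None], batch_size: int = 500):
        self.sink = sink              # hypothetical: wraps your analytics client
        self.batch_size = batch_size
        self.pending: List[Dict] = []

    def record(self, event: Dict) -> None:
        """Queue one event; flush automatically once the batch is full."""
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send everything queued so far as a single batch."""
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        try:
            self.sink(batch)
        except Exception:
            # Analytics is treated as best-effort here: the batch is dropped
            # rather than letting the failure reach the request handler.
            pass


# Usage: a list stands in for the analytics store.
sent_batches: List[List[Dict]] = []
buffer = EventBuffer(sink=sent_batches.append, batch_size=3)
for i in range(7):
    buffer.record({"type": "page_view", "video_id": i})
buffer.flush()  # send the leftover partial batch
print([len(b) for b in sent_batches])  # [3, 3, 1]
```

Batching also answers part of the original question: one request carrying hundreds of events is much cheaper for most stores than hundreds of single-row inserts.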

You can use something like ClickHouse [0], for example, or a third-party SaaS solution like PostHog [1], which is built on top of ClickHouse.

[0] https://clickhouse.com

[1] https://posthog.com


👤 pighive
I've been contemplating various ways to achieve this at my job for the last few days. Here's something worth considering:

[0]https://clickhouse.com/blog/analyzing-aws-fow-logs-using-cli...


👤 ludjer
Highly recommend a managed service like Datadog or New Relic. Or, if you're in a cloud like AWS, you can use CloudWatch. Don't use your application DB to store operational data; keep the two separate.
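On the application side, keeping operational data out of the DB often just means writing structured log lines to stdout or a file and letting an agent or log driver ship them to the managed service. A minimal sketch with Python's standard logging module follows; the logger name and JSON field names are assumptions, and the collection setup is service-specific and not shown.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("app.events")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


logger = build_logger(sys.stdout)
logger.info("video %s viewed", "abc123")
```

One JSON object per line is easy for most log pipelines to parse, so fields like level and logger can be searched without regexes.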