HACKER Q&A
📣 collinc777

How did you scale your analytics workloads (Postgres)?


Our product has some reporting features that require aggregation/analytics functionality. Some of the analytics workloads are time series, others are not, and we generally expect these analytics queries to resolve in ~2.5s.

We recently decided to move these workloads to Snowflake because we want to protect our transactional workloads.

The Snowflake devex has been pretty bad because we'd need a Snowflake "instance" for each dev's localhost Postgres, and we like that localhost Postgres to be ephemeral. Additionally, it'd be nice to have this all work locally.

One interesting piece of software I came across is DuckDB. It's lightweight and needs no additional storage. It's an interesting direction for me to test, but I don't know if it'll satisfy our latency requirements.
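
For reference, here's roughly what I'd test first: a minimal sketch that points DuckDB at a dev's local Postgres and runs an aggregation there (connection string, schema, and table/column names are made up for illustration):

    # Attach the local Postgres read-only and let DuckDB's engine do the aggregation.
    # dbname/host/user and the events table are placeholders, not our real schema.
    import duckdb

    con = duckdb.connect()  # in-memory, no extra storage to manage
    con.install_extension("postgres")
    con.load_extension("postgres")

    con.sql("ATTACH 'dbname=app host=localhost user=dev' AS pg (TYPE postgres, READ_ONLY)")

    rows = con.sql("""
        SELECT date_trunc('day', created_at) AS day,
               count(*) AS events,
               sum(amount) AS total
        FROM pg.public.events
        WHERE created_at >= now() - INTERVAL 30 DAY
        GROUP BY day
        ORDER BY day
    """).fetchall()
    print(rows)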

How have you separated your analytics workloads from Postgres and scaled them out?


  👤 saisrirampur Accepted Answer ✓
It depends on the scale. If you expect to grow to at least a few hundred GBs (up to multiple TBs), ClickHouse would be a safe and proven option. The Postgres + ClickHouse duo is a common pattern adopted by production-grade customers as their default data stack for transactional (OLTP) and analytical (OLAP) use cases.

ClickHouse also recently released a Postgres CDC connector in ClickPipes for seamless Postgres integration. Now you can start streaming Postgres data into ClickHouse within a few minutes. https://clickhouse.com/cloud/clickpipes/postgres-cdc-connect... This was a result of the acquisition of PeerDB, an open-source Postgres CDC product - https://clickhouse.com/blog/clickhouse-welcomes-peerdb-addin...
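
For illustration, once the data is flowing, the reporting queries simply target ClickHouse instead of Postgres. A minimal sketch with the clickhouse-connect Python client (host, credentials, and the events table/columns are placeholders, not a real schema):

    # Run the reporting aggregation against ClickHouse so Postgres only handles OLTP.
    # Host, credentials, and the events table are placeholders.
    import clickhouse_connect

    client = clickhouse_connect.get_client(
        host="your-clickhouse-host", username="default", password="..."
    )

    result = client.query("""
        SELECT toStartOfDay(created_at) AS day,
               count() AS events,
               sum(amount) AS total
        FROM events
        WHERE created_at >= now() - INTERVAL 30 DAY
        GROUP BY day
        ORDER BY day
    """)
    for day, events, total in result.result_rows:
        print(day, events, total)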

Disclaimer: This is Sai from ClickHouse/PeerDB here. However, the answer above is based on multiple years of customer experiences. :)


👤 thenaturalist
DuckDB should give you everything you seem to need with regards to DevEx.

You haven't specified what volume we're talking about here, but I'd be surprised if you couldn't go a long way with DuckDB. Ultimately, the latency bottleneck in a non-big-data scenario will likely be memory.

You can run it in a browser: https://github.com/duckdb/duckdb-wasm

DuckDB's docs and blog are full of great content on optimization:

How to tune workloads: https://duckdb.org/docs/guides/performance/how_to_tune_worklo...

Performance guide: https://duckdb.org/docs/guides/performance/overview.html

Optimizers in DuckDB: https://duckdb.org/2024/11/14/optimizers.html

I'm sure there is more.

If I were in your shoes, even if you don't settle on DuckDB, it would be my go-to benchmark baseline due to its ease of use and out-of-the-box speed.
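
Even a crude timing harness tells you quickly whether the ~2.5s budget is realistic. A rough sketch (the Parquet file and query are placeholders; swap in your own export or the Postgres attach shown in the question):

    # Time the candidate reporting query a few times and look at the spread.
    # events.parquet and the query are placeholders for your own data.
    import time
    import duckdb

    con = duckdb.connect()
    query = """
        SELECT date_trunc('day', created_at) AS day, count(*) AS events
        FROM 'events.parquet'
        GROUP BY day
        ORDER BY day
    """

    timings = []
    for _ in range(5):
        start = time.perf_counter()
        con.sql(query).fetchall()
        timings.append(time.perf_counter() - start)

    print([f"{t:.3f}s" for t in timings])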

Anything else out there like chDB (ClickHouse) or Apache DataFusion should also be worth a look.

For orientation, see this detailed benchmark of DuckDB vs. ClickHouse: https://bicortex.com/duckdb-vs-clickhouse-performance-compar...