HACKER Q&A
📣 dbnewbie

120M rows Postgres – how can I speed up queries?


The table is flat (no related), but many queries we make (including aggregates) take hundreds of milliseconds.

Forgetting the specific queries for a moment (basically all queries in this table are relatively slow):

How would you handle such a scenario? Shard the database?


  👤 rapphil Accepted Answer ✓
Have you checked if you are using the correct settings in order to extract the max from your hardware?

https://pgtune.leopard.in.ua/#/

Are you monitoring your machine to check if it is not starving on cpu?

Re the queries, we have been using pgbadger to collect metrics about the usage of the dB and the slowest queries by type. This is helpful as it guides where you should put your efforts.

https://github.com/darold/pgbadger/blob/master/README.md

This is very good ref about scaling Postgres.

https://pyvideo.org/pycon-ca-2017/postgres-at-any-scale.html


👤 ezekg
Check out https://www.pgmustard.com/ if you haven't already. It has helped me optimize queries as much as possible when querying large datasets.

👤 xq3000
There is not a lot of specific details here but have you already tried and exhausted things like analyzing execution plans, index optimizations, index encoding, value encoding, denormalization, read replicas, and [front] caching?

👤 gigatexal
Have you thought about materialized views that pre-aggregate your queries and then updating those views with triggers? I got this idea from a forum so it’s not mine but it works.

Sharding by month or other bucket of time could help.

We have a very similar situation except it’s billions of rows. One benefit is it’s a bit denormalized in that we store the meat of our data in a hstore field


👤 lastofus
From my experience 7+ years ago, I saw an order of magnitude difference running Postgres on higher end bare metal hardware (think $6k rack mounted server w/ nice RAID controller) vs a supposed high end AWS VM w/ an EBS volume. That was testing with spinny disks before SSDs were the norm, but if you are on a cloud VM, it's probably worth investigating.

👤 usgroup
Use c_store fdw to store your data in columns and marvel at the 10x performance improvement without indexes.