120M rows Postgres – how can I speed up queries?

Question

The table is flat (no related), but many queries we make (including aggregates) take hundreds of milliseconds.Forgetting the specific queries for a moment (basically all queries in this table are relatively slow):How would you handle such a scenario? Shard the database?

rapphil · Accepted Answer

Have you checked if you are using the correct settings in order to extract the max from your hardware?
https://pgtune.leopard.in.ua/#/
Are you monitoring your machine to check if it is not starving on cpu?
Re the queries, we have been using pgbadger to collect metrics about the usage of the dB and the slowest queries by type. This is helpful as it guides where you should put your efforts.
https://github.com/darold/pgbadger/blob/master/README.md
This is very good ref about scaling Postgres.
https://pyvideo.org/pycon-ca-2017/postgres-at-any-scale.html

ezekg · Answer

Check out https://www.pgmustard.com/ if you haven't already. It has helped me optimize queries as much as possible when querying large datasets.

xq3000 · Answer

There is not a lot of specific details here but have you already tried and exhausted things like analyzing execution plans, index optimizations, index encoding, value encoding, denormalization, read replicas, and [front] caching?

gigatexal · Answer

Have you thought about materialized views that pre-aggregate your queries and then updating those views with triggers? I got this idea from a forum so it’s not mine but it works.
Sharding by month or other bucket of time could help.
We have a very similar situation except it’s billions of rows. One benefit is it’s a bit denormalized in that we store the meat of our data in a hstore field

lastofus · Answer

From my experience 7+ years ago, I saw an order of magnitude difference running Postgres on higher end bare metal hardware (think $6k rack mounted server w/ nice RAID controller) vs a supposed high end AWS VM w/ an EBS volume. That was testing with spinny disks before SSDs were the norm, but if you are on a cloud VM, it's probably worth investigating.

usgroup · Answer

Use c_store fdw to store your data in columns and marvel at the 10x performance improvement without indexes.