HACKER Q&A
📣 sonthonax

How to optimise databases for latency rather than throughput?


Databases are fast for querying large amounts of data, but terrible at fetching small amounts of data quickly.

Databases perform terribly on benchmarks that fetch single rows from small tables. In Oracle, fetching a 100x2 matrix of integers takes 500us, while fetching a 1000x2 matrix takes 1000us. I get similar (or worse) performance in MySQL and Postgres.
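
For concreteness, here's a minimal sketch of the kind of microbenchmark I mean, in Python with psycopg 3. The table t(a int, b int), the row counts and the connection string are illustrative placeholders, not my actual setup:

    # Hypothetical microbenchmark in the spirit of the numbers above:
    # times fetching a 100-row x 2-column integer result set. Assumes a
    # local Postgres with a table t(a int, b int) holding >= 100 rows.
    import time
    import psycopg

    with psycopg.connect("dbname=test") as conn:
        cur = conn.cursor()
        cur.execute("SELECT a, b FROM t LIMIT 100")  # warm-up, not timed
        cur.fetchall()

        n = 1000
        start = time.perf_counter()
        for _ in range(n):
            cur.execute("SELECT a, b FROM t LIMIT 100")
            cur.fetchall()
        elapsed = time.perf_counter() - start
        print(f"{elapsed / n * 1e6:.0f} us per 100x2 fetch")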

This is hundreds of times slower than a key-value store. It's crazy that databases are so slow in 2020. It massively influences how we build applications, and I don't understand why they have to be so bad at being key-value stores.

As far as I understand, modern databases are quite complex bits of software. For example, Oracle performs a lot of IPC between its listener, writer and log processes. Postgres uses a process per connection that has to be synchronised with the other backend processes.

What are some of the techniques that people on Hacker News have used to mitigate this problem?


  👤 anarazel Accepted Answer ✓
The biggest influence on small query performance for RDBMSs is server<->client latency. I just checked, and a single simple key-value style lookup in postgres over a local unix-socket connection takes 0.025ms on average.
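
For illustration (not necessarily how I measured it), here's a rough sketch of that kind of single-lookup measurement in Python with psycopg 3, assuming a kv(k int primary key, v int) table and a local unix-socket connection; the socket path, table and key range are placeholders:

    # One lookup per round trip over a local unix socket. Assumes a table
    # kv(k int primary key, v int) with keys 1..1000 already loaded; the
    # socket directory and table are illustrative placeholders.
    import time
    import psycopg

    N = 1000

    with psycopg.connect("host=/var/run/postgresql dbname=test") as conn:
        cur = conn.cursor()
        start = time.perf_counter()
        for k in range(1, N + 1):
            # prepare=True uses a server-side prepared statement, so the
            # query is parsed and planned once rather than per execution
            cur.execute("SELECT v FROM kv WHERE k = %s", (k,), prepare=True)
            cur.fetchone()
        elapsed = time.perf_counter() - start
        print(f"{elapsed / N * 1000:.4f} ms per lookup")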

However, if I use pipelining, I see 1000 lookups performed in 4.725ms (i.e. 0.0047ms per lookup).
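
In psycopg 3 terms, pipelining looks roughly like this (same illustrative table and connection as above; the pipeline() context manager wraps libpq's pipeline mode):

    # Same N lookups, but queued through a pipeline: queries are sent
    # without waiting for individual results, so round trips stop
    # dominating. Results are fetched after the pipeline block exits.
    import time
    import psycopg

    N = 1000

    with psycopg.connect("host=/var/run/postgresql dbname=test") as conn:
        start = time.perf_counter()
        with conn.pipeline():
            cursors = [conn.execute("SELECT v FROM kv WHERE k = %s", (k,))
                       for k in range(1, N + 1)]
        results = [cur.fetchone() for cur in cursors]
        elapsed = time.perf_counter() - start
        print(f"{N} lookups in {elapsed * 1000:.3f} ms "
              f"({elapsed / N * 1000:.4f} ms per lookup)")

The server does the same work per lookup in both cases; the difference is purely that the client no longer waits out a round trip before sending the next query.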

If you test the same over an actual network, the performance difference is going to be much bigger, because it's no longer just a context switch but actual network delay.

You're never going to beat a dedicated KV store with a shared-nothing event-listener architecture using a general-purpose RDBMS. Transactional semantics imply some overhead, a non-shared buffer pool isn't generally viable, ... But I don't think the difference is as big as you make it out to be.

Disclaimer: Postgres dev, so I'm certainly biased.