HACKER Q&A
📣 derefr

Does there exist a read-optimized distributed K-V store?


I have a large-ish (currently ~billions of keys, ~100TB) immutable/append-only dataset, with rare inserts (~1GB chunks, once every few minutes) done only by background bulk loads, and no need for any kind of consistency (i.e. inserted data doesn't come from users, so nobody expects it to be reflected immediately; and keys inserted together don't even need to become visible atomically, since clients will retry to fetch dependent keys).

And I need to serve huge numbers (~trillions) of single-key point queries out of this dataset to many consumers over the Internet, with as little read latency per fetch as possible — ideally, with round-trip times resembling e.g. a Redis GET.
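
To make the access pattern concrete, here's a sketch of the client-side read path I have in mind; `store` is a hypothetical client exposing a Redis-like `get(key)`, and the retry loop is how dependent keys get handled without any atomicity from the store:

    import time

    def fetch_with_retry(store, key, attempts=5, base_delay=0.05):
        """Point-read `key`, retrying until a background bulk load
        has made it visible. `store` is any client exposing a
        Redis-like get(key) -> bytes-or-None (hypothetical)."""
        for attempt in range(attempts):
            value = store.get(key)
            if value is not None:
                return value
            # Dependent key not visible yet: back off and retry, since
            # bulk-inserted keys only become visible eventually.
            time.sleep(base_delay * (2 ** attempt))
        raise KeyError(key)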

Basically, I want the performance properties that you'd get in a single-node scenario from using LMDB and dedicating all your RAM to OS disk cache — but "in the large", with data being sharded into vnodes and then those vnodes being spread+replicated across an elastically auto-scaled set of compute replicas.
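
The routing I'm picturing is nothing exotic — stable hashing of keys into a fixed vnode space, plus a separately published vnode-to-replica map. A sketch (all names hypothetical; assume some control plane republishes `assignment` when replicas scale):

    import hashlib
    import random

    NUM_VNODES = 4096  # fixed vnode count; scaling only changes the vnode->replica map

    def vnode_for(key: bytes) -> int:
        # Stable hash so every client routes a given key to the same vnode.
        return int.from_bytes(hashlib.sha1(key).digest()[:8], "big") % NUM_VNODES

    def pick_replica(key: bytes, assignment: dict[int, list[str]]) -> str:
        # `assignment` maps vnode -> replica addresses, republished by the
        # control plane whenever the replica set scales up or down.
        replicas = assignment[vnode_for(key)]
        # Any replica can serve a read of immutable data; pick one at random.
        return random.choice(replicas)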

You'd think the answer would be a thing that calls itself something like a "distributed KV store", "serverless NoSQL store", etc.

But it seems that all the big "distributed KV store" products and services — DynamoDB, Bigtable, Cassandra, Riak KV, FoundationDB, CockroachDB, ScyllaDB, etc. — are built to optimize for write throughput in a many-distributed-writers use case, with little concern for per-read latency. They're all "LevelDB in the large", not "LMDB in the large."
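
For contrast, this is the single-node read path I want to scale out — with the py-lmdb binding, a warm point read is a single B+tree descent through the OS page cache (path and key below are placeholders):

    import lmdb  # py-lmdb binding

    # Open an existing environment read-only. Reads traverse a
    # memory-mapped B+tree, so a warm point lookup is just pointer
    # chasing through cached pages.
    env = lmdb.open("/data/kv", readonly=True, lock=False)

    with env.begin(buffers=True) as txn:  # buffers=True returns zero-copy views
        value = txn.get(b"some-key")      # one B+tree descent, no read amplification from compaction levels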

Does HN know of any system that would suit my use case? Or, if not, any ideas why nobody has built one?


  👤 karma_daemon Accepted Answer ✓
You could try ClickHouse.

👤 yuppie_scum
Google shows TridentKV

👤 hyc_symas
Just use OpenLDAP.