HACKER Q&A
📣 zaghaghi

Do We Need an Embedded Distributed Key-Value Database?


One problem with key-value databases is that when values become large (e.g. 10 MB) and applications need frequent access to the latest values, network bandwidth becomes a bottleneck.

To address this, I think it's not a bad idea to scale the database and the application together on one node, with the database running either as an embedded library or as a container sidecar.

I found an example of this setup in Hazelcast (https://hazelcast.com/blog/hazelcast-sidecar-container-pattern).

What are your thoughts? Is this a valid configuration? Can a better design help us solve this?


  👤 jsdeveloper Accepted Answer ✓
My suggested optimization/scaling strategy:

The DB should contain only small key-value pairs. Don't store big blobs as values. Instead, store meta info (network path, size, name, ...) for each blob as the value, and keep the blobs themselves on network drives.

This way you can scale the two independently. The client can cache the meta info locally, so it doesn't have to fetch it from the database again and again.
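
To make the split concrete, here is a rough Python sketch. Everything in it is made up for illustration: the `BlobMeta` fields, the `MetaClient` class, the key `video:42` and the network path are all hypothetical, and a plain dict stands in for the key-value database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobMeta:
    """Small record stored in the KV database instead of the blob itself."""
    name: str
    size: int
    path: str  # hypothetical network path where the blob actually lives


class MetaClient:
    """Caches metadata locally so each key hits the KV store only once."""

    def __init__(self, kv_store):
        self.kv_store = kv_store  # any dict-like key-value store (assumption)
        self.cache = {}
        self.store_reads = 0      # counts real lookups, for demonstration

    def get_meta(self, key):
        if key not in self.cache:
            self.store_reads += 1
            self.cache[key] = self.kv_store[key]
        return self.cache[key]


# The database holds a few bytes of metadata, not the 10 MB blob.
kv = {"video:42": BlobMeta("intro.mp4", 10 * 1024 * 1024, "//nas01/blobs/intro.mp4")}
client = MetaClient(kv)
first = client.get_meta("video:42")
second = client.get_meta("video:42")  # served from the local cache
```

Note that this cache never expires anything. If a blob can move or change, you would need some invalidation on top of it.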

You can collect some sort of analytics on blob access and then keep more replicas of frequently accessed blobs to get more read throughput.
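
One naive way to turn access counts into replica counts could look like the sketch below. The function name `plan_replicas` and the thresholds (`reads_per_replica`, `max_replicas`) are arbitrary assumptions, not a recommendation. A real system would look at a time window and at actual load.

```python
from collections import Counter


def plan_replicas(access_log, reads_per_replica=100, max_replicas=5):
    """Map each blob to a replica count based on how often it was read.

    access_log is an iterable of blob names, one entry per read.
    Every blob gets at least 1 replica and at most max_replicas.
    """
    counts = Counter(access_log)
    plan = {}
    for blob, reads in counts.items():
        wanted = -(-reads // reads_per_replica)  # ceiling division
        plan[blob] = max(1, min(max_replicas, wanted))
    return plan


# Hypothetical log: blob "a" is hot, blob "b" is rarely read.
log = ["a"] * 250 + ["b"] * 10
plan = plan_replicas(log)
```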

Using HTTP "Range" headers to fetch or stream the data in pieces can also be helpful.
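
As a small illustration, this is how a client might build a request for just the first kilobyte of a blob with Python's standard `urllib`. The URL and the helper name `range_request` are placeholders. The request is only constructed here, not sent.

```python
import urllib.request


def range_request(url, start, length):
    """Build a GET request for `length` bytes starting at offset `start`."""
    end = start + length - 1  # the end position in a Range header is inclusive
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={start}-{end}")
    return req


# Hypothetical blob URL, for illustration only.
req = range_request("https://blobs.example.com/intro.mp4", 0, 1024)
```

Passing `req` to `urllib.request.urlopen` would send it. A server that supports ranges answers with status 206 (Partial Content). One that doesn't may ignore the header and return the whole body with status 200, so the client should check.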

Zero copy is another concept for getting the best throughput: it avoids copying the data through user-space buffers on its way from disk to the network. Familiarize yourself with it.
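
In Python, `socket.sendfile` is one place where this shows up. It uses `os.sendfile` where the platform supports it and falls back to plain `send` otherwise, so whether you really get zero copy depends on the OS. A minimal sketch follows. The helper name, the temp file and the socket pair are only there to make the example self-contained.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send a whole file over a connected stream socket.

    Returns the number of bytes sent. The file must be opened in binary mode.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


# Demo: write a small stand-in "blob" to disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
    blob_path = tmp.name

# A connected socket pair stands in for a real client/server connection.
sender, receiver = socket.socketpair()
sent = send_file_zero_copy(sender, blob_path)
sender.close()

# Read until the sender's close signals end of stream.
received = b""
while chunk := receiver.recv(65536):
    received += chunk
receiver.close()
os.remove(blob_path)
```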

The easiest and best solution is to store these blobs on some proprietary media server solution, as those already handle most of these optimizations. Your database then only delivers metadata, which contains the cloud path to each blob, and scaling your big blobs becomes the media server's headache.


👤 andrewfromx
My first thought is to prevent the values from growing past 10MB, with logic that breaks them up into multiple keys, each under that 10MB limit. Does your app really need the full 10MB? It probably needs only a small section of it. And if it doesn't, could it be refactored to achieve that?
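
A rough Python sketch of that idea, assuming a made-up `key:index` naming scheme and a dict standing in for the database. The chunk size is tiny here just to keep the example readable.

```python
def split_value(key, value, chunk_size):
    """Split one large value into chunks stored under key:0, key:1, ..."""
    chunks = {}
    for i, offset in enumerate(range(0, len(value), chunk_size)):
        chunks[f"{key}:{i}"] = value[offset:offset + chunk_size]
    return chunks


def read_slice(store, key, chunk_size, start, length):
    """Read `length` bytes (length > 0) at offset `start`.

    Only the chunks that overlap the requested range are fetched.
    """
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    data = b"".join(store[f"{key}:{i}"] for i in range(first, last + 1))
    skip = start - first * chunk_size  # offset of `start` inside the first chunk
    return data[skip:skip + length]


value = bytes(range(256)) * 4           # 1024 bytes of sample data
store = split_value("doc", value, 100)  # 11 chunks of at most 100 bytes
part = read_slice(store, "doc", 100, 250, 20)  # touches only chunk "doc:2"
```

With this layout an app that needs a small section fetches one or two chunks instead of the whole value.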