HACKER Q&A
📣 ccorcos

What's the state of P2P database sync?


CloudKit, Realm, and CouchDB/PouchDB offer offline syncing with some kind of cache on the client. This is great, but I'm curious whether there are any successful technologies (other than git) that are actually cloudless and P2P.

I recall reading about Noms a while ago, when it was acquired by Salesforce. It supposedly had promise as a P2P database, but I haven't heard anything since.

I've read a bit about the distributed hash tables used by torrents, but they are inherently public, which isn't suitable for most applications.

Have you heard of any new or interesting offline distributed databases?


  👤 mbalex99 Accepted Answer ✓
We've created a peer-to-peer database at https://www.ditto.live. It runs on servers, WASM, and mobile, all with a shared code base. Each row in the database is a robust CRDT that can sync deltas (or diffs) very efficiently. In addition, we are just about to add binary file sync that is quite reminiscent of BitTorrent and has retries, resumable downloads, canceling, and pausing all baked in.

The most distinctive feature of Ditto is really our networking stack. Devices that have the database can sync over any available transport: mDNS, Wi-Fi Direct, AWDL, Bluetooth Low Energy.

Here it is in action: https://www.youtube.com/watch?v=1P2bKEJjdec

Here's a video of how being P2P allowed us to architect a database that falls back to mesh networks if the server can't be reached: https://www.youtube.com/watch?v=0IAle9BYlxQ

We're launching with several massive enterprise companies this year and some mid-to-small ones as well. On your next flight, if you notice that the flight attendants and pilots are using mobile devices to collaborate in realtime even if the internet is down: it's us ;-)!

A couple of notes about how the SDK works:

1. The CRDT that backs our data structure for each row in the database looks like a JSON object to the end user.
2. It supports substring modifications as well as array and map mutations.
3. Conflict resolution is automatic and deterministic. The default behavior is last-write-wins (LWW), but it can be customized as long as it stays deterministic.
4. It's written completely in Rust!
5. If peers in the mesh network hold the same CRDT and only part of it has changed, they can sync just the changed subset to reduce bandwidth usage.
6. Our replication protocol is tied to queries. Peers broadcast what they are interested in, and devices that have relevant data will send it as long as the proper authorization certificate is present.
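
To make the LWW and delta-sync notes a bit more concrete, here is a rough Python sketch of a JSON-like document whose fields are last-write-wins registers. This is not Ditto's API or its actual CRDT: `LWWDocument`, `stamps`, `delta_for`, and `merge` are hypothetical names, and the (logical clock, peer id) stamp is just one assumed way to get a deterministic tie-break.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Hypothetical sketch, not Ditto's SDK.
# A stamp is (logical clock, peer id); the peer id breaks ties deterministically.
Stamp = Tuple[int, str]


@dataclass
class LWWDocument:
    peer_id: str
    clock: int = 0
    fields: Dict[str, Tuple[Stamp, Any]] = field(default_factory=dict)

    def set(self, key: str, value: Any) -> None:
        """Local write: bump the logical clock and stamp the field."""
        self.clock += 1
        self.fields[key] = ((self.clock, self.peer_id), value)

    def stamps(self) -> Dict[str, Stamp]:
        """Summary of what this peer already has, without the values."""
        return {k: s for k, (s, _) in self.fields.items()}

    def delta_for(self, remote: Dict[str, Stamp]) -> Dict[str, Tuple[Stamp, Any]]:
        """Only the fields the remote peer is missing or holds an older stamp for."""
        return {k: (s, v) for k, (s, v) in self.fields.items()
                if k not in remote or s > remote[k]}

    def merge(self, delta: Dict[str, Tuple[Stamp, Any]]) -> None:
        """Keep the higher stamp per field; every peer picks the same winner."""
        for key, (stamp, value) in delta.items():
            current = self.fields.get(key)
            if current is None or stamp > current[0]:
                self.fields[key] = (stamp, value)
            self.clock = max(self.clock, stamp[0])

    def to_json(self) -> Dict[str, Any]:
        return {k: v for k, (_, v) in self.fields.items()}


a, b = LWWDocument("a"), LWWDocument("b")
a.set("seat", "12A")
a.set("meal", "vegetarian")
b.merge(a.delta_for(b.stamps()))      # initial sync: b receives both fields

a.set("seat", "14C")                  # concurrent edits while disconnected
b.set("seat", "9F")

delta_to_a = b.delta_for(a.stamps())  # contains only the "seat" field
delta_to_b = a.delta_for(b.stamps())  # empty: a has nothing newer for b
a.merge(delta_to_a)
b.merge(delta_to_b)
```

After the exchange both peers hold the same document, and only the one changed field crossed the wire. A real implementation would also need tombstones for deletes and finer-grained types for strings and arrays, which this sketch leaves out.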


👤 ch_sm
Not exactly answering your question, but I’ve found researching CRDTs (Conflict-free Replicated Data Types) very enlightening. They’re what a lot of databases use these days for multi-master replication. Martin Kleppmann has done some great writing about their P2P use cases.
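
For anyone who hasn't seen one, a grow-only counter (G-Counter) is about the smallest CRDT there is. The Python below is a minimal sketch with made-up names (`GCounter`, `increment`, `merge`), not code from any particular database. Each replica only ever increments its own slot, and merging takes the per-slot maximum, so replicas converge regardless of the order or number of merges.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one slot per replica, merged with per-slot max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Max is commutative, associative, and idempotent, which is what
        # makes merges safe to apply in any order and any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


x, y = GCounter("x"), GCounter("y")
x.increment()
x.increment()
y.increment()          # concurrent writes on two "masters"

x.merge(y)
y.merge(x)
x.merge(y)             # repeating a merge changes nothing
```

Both replicas end up reporting the same total without any coordination, which is the property that makes this style of data type attractive for multi-master and P2P replication.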

As for your actual question: GunJS comes to mind and looks interesting, but I’m not sure it’s production ready.