HACKER Q&A
📣 wzvici

What happened to flatbuffers? Are they being used?


A few years ago, there was talk about flatbuffers[0] being a quicker and more memory-efficient alternative to JSON.

Anyone have any real world experience with it?

[0] https://github.com/google/flatbuffers


  👤 afavour Accepted Answer ✓
I love flatbuffers but they're only worthwhile in a very small problem space.

If your main concern is "faster than JSON" then you're better off using Protocol Buffers simply because they're way more popular and better supported. FlatBuffers are cool because they let you decode on demand. Say you have an array of 10,000 complex objects. With JSON or Protocol Buffers you're going to need to decode and load into memory all 10,000 before you're able to access the one you want. But with FlatBuffers you can decode item X without touching 99% of the rest of the data. Quicker and much more memory efficient.
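To make the decode-on-demand point concrete, here is a minimal Python sketch. It does not use the FlatBuffers library or its wire format: the layout (a count, an offset table, then length-prefixed payloads) and the names pack_records and read_record are made up for illustration. It only shows the general idea of following an offset to one item, compared with parsing a whole JSON document before indexing into it.

```python
import json
import struct


def pack_records(records):
    """Pack byte strings as: count, offset table, then length-prefixed payloads."""
    header_size = 4 + 4 * len(records)
    offsets, pos = [], header_size
    for rec in records:
        offsets.append(pos)
        pos += 4 + len(rec)  # 4-byte length prefix + payload
    out = bytearray(struct.pack("<I", len(records)))
    out += struct.pack(f"<{len(records)}I", *offsets)
    for rec in records:
        out += struct.pack("<I", len(rec)) + rec
    return bytes(out)


def read_record(buf, index):
    """Read one record by following its offset; the other payloads are not read."""
    (count,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    (offset,) = struct.unpack_from("<I", buf, 4 + 4 * index)
    (length,) = struct.unpack_from("<I", buf, offset)
    return buf[offset + 4 : offset + 4 + length]


records = [f"item-{i}".encode() for i in range(10_000)]
buf = pack_records(records)

# Offset-based access: three small reads, regardless of how many records exist.
one = read_record(buf, 7_777)

# JSON comparison: the whole array is parsed into Python objects before indexing.
as_json = json.dumps([r.decode() for r in records])
same = json.loads(as_json)[7_777]
```

The real format is considerably more involved (tables, vtables, alignment), so treat this as a picture of the access pattern only.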

But it's not simple to implement. You have to write a schema then turn that schema into source files in your target language. There's an impressive array of target languages but it's a custom executable and that adds complexity to any build. Then the generated API is difficult to use (in JS at least) because of course an array isn't a JavaScript array, it's an object with decoder helpers.

It's also quite easy to trip yourself up in terms of performance by decoding the same data over and over again rather than re-using the first decode like you would with JSON or PB. So you have to think about which decoded items to store in memory, where, for how long, etc... I kind of think of it as the data equivalent of a programming language with manual memory management. Definitely has a place. But the majority of projects are going to be fine with automatic memory management.
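A rough Python sketch of that pitfall, under stated assumptions: the buffer here is just packed float64 values rather than a real FlatBuffer, and decode_item, decode_item_cached and the stats counter are hypothetical names used only to count how often the decode step runs.

```python
import struct
from functools import lru_cache

# Stand-in buffer: 1,000 little-endian float64 values packed back to back.
BUF = struct.pack("<1000d", *(float(i) for i in range(1000)))
stats = {"decodes": 0}


def decode_item(index):
    """Decode one value straight from the buffer; this work repeats on every call."""
    stats["decodes"] += 1
    (value,) = struct.unpack_from("<d", BUF, 8 * index)
    return value


@lru_cache(maxsize=128)
def decode_item_cached(index):
    """Same decode, but recently used results are kept in memory."""
    return decode_item(index)


# Accessing the same item 100 times without caching decodes it 100 times.
for _ in range(100):
    decode_item(42)
uncached_decodes = stats["decodes"]

# With a bounded cache, only the first access pays for the decode.
stats["decodes"] = 0
for _ in range(100):
    decode_item_cached(42)
cached_decodes = stats["decodes"]
```

The maxsize bound is where the "which items, and for how long" decision shows up: a bigger cache saves more decodes but gives back some of the memory savings.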


👤 doomlaser
Apache Arrow uses flatbuffers in the .arrow format: https://arrow.apache.org/docs/format/Columnar.html

👤 pantsforbirds
Similarly, does anyone use https://capnproto.org/? It's a project I was really interested in a few years back, but I haven't heard much about it lately.

👤 trgn
They (and similar technologies) are used where it matters.

Games, data visualization, ... numerically heavy applications mainly.

On a side note: JSON has been somewhat of a curse. Its developer ergonomics are so good that web devs completely disregard how they should lay out their data. You know, sending a table as a bunch of nested arrays, that sort of thing. Yuck.

In web apps, data is essentially unusable until it has been unmarshalled. Fine for small things, horrible for data-heavy apps, which really so many apps are now.

Sometimes I wonder if it will change. I'm optimistic that the popularity of mem-efficient formats like this will establish a new base paradigm of data transfer, and be adopted broadly on the web.


👤 tinglymintyfrsh
They're used a lot in video games and embedded systems. They're not something you see advertised.

gRPC and Thrift are mostly used as backend service interconnects in lieu of RESTful APIs.

Capnproto is also awesome.


👤 nerpderp82
https://capnproto.org/ is a superior system due to its ability to handle malicious input.

👤 politician
Yes, flatbuffers are fantastic. Let me know if you have any specific questions; happy to respond.

👤 tlrss
FlatGeoBuf [1] is an encoding for geographic data (vector features, i.e. points, lines, polygons and so on) written around flatbuffers. It is increasingly well supported in geospatial software (GDAL, MapServer), and people are reporting experiments and demos on the @flatgeobuf Twitter account.

[1] https://flatgeobuf.org/


👤 mikaraento
It's used for several ML-related projects, including as the model format for TensorFlow Lite (TFLite). The TFLite format also has long-term support as part of Google Play Services. The main attraction is the ability to pass large amounts of data without having to serialize/deserialize all of it to access fields.

(I work for Google but don't speak for it.)


👤 sosodev
I'm using flatbuffers as the basis of communication for my multiplayer game. They're really quite pleasant to work with after you get into the flow of it.

👤 gavinray
Flatbuffers is used in Arrow, which is a ringing endorsement

👤 joeld42
Yeah, they're used a lot. I think the difference is that JSON is good for data or APIs you want to be easily shared, while flatbuffers (or protobuf or capnproto) are good for data that stays internal. That's just a guideline and there are plenty of exceptions, but it's a starting point for thinking about it.

👤 gwbas1c
Yes, I work on a product that uses Flatbuffers to control stormwater: https://optirtc.com/

Basically, we use a rather bandwidth-constrained link between our services running in the cloud and Particle-based IoT devices deployed in many locations. Some locations are remote, some are urban.

I personally haven't had to touch the Flatbuffers code since I joined the company two years ago. It's written and hasn't needed to be maintained.


👤 covom
TensorFlow Lite (tflite) uses flatbuffers. This format, and vendor-specific forks of it, ship on hundreds of millions of phones and other embedded devices.

👤 apendleton
I experimented with them (and with capnproto) at my last job for a use case involving dense numerical data, where being able to seek randomly within a dataset would have been really helpful for speed. Compared to protobuf, though, these formats were unacceptably bulky (lots of extra padding for word alignment, etc.). The added cost of just reading the extra data from disk mostly negated the savings from avoiding the explicit decode step, and the bulk would have had significant implications for storage cost. I ended up writing a custom wire format that allowed for seeking with less wasted space.

Seems like a neat idea, but as another commenter said, the use cases where it's the best choice seem pretty narrow.
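The size trade-off described here can be illustrated with a small Python sketch. It is not the commenter's data or custom format, and it uses neither library: it compares fixed-width 8-byte integers (which allow seeking to item i at byte 8 * i) with base-128 varints, the variable-length integer scheme Protocol Buffers uses, and shows how alignment padding inflates a small record. The helper names varint and read_fixed are made up for illustration.

```python
import struct


def varint(n):
    """Encode a non-negative int as a base-128 varint, 7 payload bits per byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


values = list(range(1000))  # small numbers, as in many dense datasets

# Fixed width: every value takes 8 bytes, so item i sits at byte 8 * i.
fixed = struct.pack(f"<{len(values)}q", *values)
# Variable width: small values take 1 or 2 bytes, but finding item i means scanning.
compact = b"".join(varint(v) for v in values)


def read_fixed(buf, index):
    """Random access into the fixed-width layout with no scan and no full decode."""
    (value,) = struct.unpack_from("<q", buf, 8 * index)
    return value


# Alignment padding: a 1-byte field followed by an 8-byte field.
padded_record = struct.calcsize("<B7xQ")  # 7 explicit pad bytes align the 8-byte field
packed_record = struct.calcsize("<BQ")    # no padding

fixed_size, compact_size = len(fixed), len(compact)
```

In this toy case the fixed-width buffer is 8,000 bytes against 1,872 for the varints, and the padded record is 16 bytes against 9. The real ratio depends entirely on the data, but it shows why seekable layouts can end up costing more in I/O than they save in decoding.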


👤 maximilianburke
Yep! Our platform uses flatbuffers as the primary format for both IPC, including for web responses, and for object persistence. It's a phenomenal format; I'm super happy with it.

👤 mcint
I don't have experience with this.

1) SQLite with BLOB storage gives you the benefits of a binary file layout, plus database-style solutions for metadata, versioning, and indexing into large structures.

2) FlexBuffers look like a more flexible solution within the FlatBuffers library.

    FlatBuffers was designed around schemas, because when you want maximum performance and data consistency, strong typing is helpful.

    There are however times when you want to store data that doesn't fit a schema, because you can't know ahead of time what all needs to be stored.

    For this, FlatBuffers has a dedicated format, called FlexBuffers. This is a binary format that can be used in conjunction with FlatBuffers (by storing a part of a buffer in FlexBuffers format), or also as its own independent serialization format.
https://google.github.io/flatbuffers/flexbuffers.html

https://stackoverflow.com/a/47799699/1020467

3) You might find previous discussions of serialization formats on HN:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
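Point 1) above (SQLite with BLOB storage) can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not a recommended schema: the table name chunks, its columns, and the packed float64 payload are all assumptions chosen for the example.

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    " id INTEGER PRIMARY KEY,"    # the primary key gives indexed lookup
    " version INTEGER NOT NULL,"  # metadata lives in ordinary columns
    " payload BLOB NOT NULL)"     # opaque binary data
)

# Store 100 chunks, each holding four little-endian float64 values.
for chunk_id in range(100):
    payload = struct.pack("<4d", *(chunk_id + 0.25 * k for k in range(4)))
    conn.execute(
        "INSERT INTO chunks (id, version, payload) VALUES (?, ?, ?)",
        (chunk_id, 1, payload),
    )
conn.commit()

# Fetch and decode a single chunk; the other 99 payloads are never decoded.
version, payload = conn.execute(
    "SELECT version, payload FROM chunks WHERE id = ?", (42,)
).fetchone()
values = struct.unpack("<4d", payload)
```

The database handles finding the right blob, and the application only decodes what it fetched. Whatever binary format goes inside the payload column is a separate choice.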


👤 felixguendling
If you're looking for something C++-specific that is faster, more compact in serialized size, and more efficient in serialization, you can try cista: https://github.com/felixguendling/cista (Disclaimer: I'm the author. Always happy to get feedback, e.g. in the GitHub issues.)

👤 jjtheblunt
Did you look at Wikipedia, which has 2 links including to one free software package using them?

https://en.wikipedia.org/wiki/FlatBuffers


👤 jhawk28
I think https://capnproto.org has gained more traction than flatbuffers.