HACKER Q&A
📣 jvanderbot

State-of-the-art distributed message-oriented middleware?


I'm generally a C, Rust, C++ programmer with a focus on embedded systems and robotics.

I'm quite familiar with the Robot Operating System (ROS). Its sweet spot as middleware is a single system with many processes and attached peripheral sensors. DDS seems at home here (ROS 2 ~= DDS). Connecting multiple robots, infrastructure, and sensors requires kludgy bridges.

I've read a lot about inter- and intra-data-center/web middleware for distributed systems, and played with some. The options here are varied: ZMQ, ActiveMQ (and alternatives), gRPC, AMQP implementations, NSQ, and many others. Anything that assumes a broker is less desirable, and Go/Python are less desirable for embedded (but still good).

I've enjoyed playing with smaller IoT-like applications, which seem to rely on MQTT, AMQP, or other systems that require static brokers and favor a small footprint, possibly even without a TCP stack.

Specifically:

I'd love to use ZMQ or NanoMsg, given their simplicity for new projects. Are these dead / obsolete?

What is the 'right' toolkit to learn for these domains?

What do you use (and in what language / application)?


  👤 DenseComet Accepted Answer ✓
If you have things running in a DC where every node can reach every other node, then ZMQ / NanoMsg will work, but having a broker tends to make connectivity easier, which is why IoT systems tend to use static brokers. Logging and monitoring might also be easier with a centralized broker, depending on your needs.

I've found NATS to be a pretty nice solution: it's very lightweight and capable of low-latency at-most-once delivery, as well as at-least-once. I've also found the client libraries to be pretty decent, and they include support for request/response, among other things.
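To make the two delivery guarantees concrete, here is a small plain-Python simulation over a lossy link. It is not the NATS API; the function names, the drop probability, and the retry cap are hypothetical. At-most-once sends each message once and accepts losses. At-least-once retransmits until an acknowledgement comes back, and because acks can be lost too, the receiver may see duplicates that it has to drop by sequence number.

```python
import random


def send_at_most_once(msgs, drop_prob, rng):
    """Fire-and-forget: each message is sent exactly once and never retried,
    so a message lost on the link is simply gone."""
    delivered = []
    for m in msgs:
        if rng.random() >= drop_prob:  # survived the lossy link
            delivered.append(m)
    return delivered


def send_at_least_once(msgs, drop_prob, rng, max_attempts=20):
    """Retransmit each message until its ack comes back (capped at max_attempts
    in this sketch). Lost acks cause retransmits, so duplicates can arrive."""
    received = []  # (seq, msg) pairs as seen by the receiver, duplicates included
    for seq, m in enumerate(msgs):
        for _ in range(max_attempts):
            if rng.random() < drop_prob:
                continue  # message lost in transit; sender times out and retries
            received.append((seq, m))  # receiver got a copy
            if rng.random() >= drop_prob:
                break  # ack reached the sender; stop retransmitting
            # ack lost: the sender retransmits and the receiver sees a duplicate
    return received


def dedupe(received):
    """Receiver-side dedup by sequence number, keeping first arrivals in order."""
    seen = set()
    out = []
    for seq, m in received:
        if seq not in seen:
            seen.add(seq)
            out.append(m)
    return out
```

With a nonzero drop probability, `send_at_most_once` loses some messages, while `dedupe(send_at_least_once(...))` recovers all of them unless the retry cap is exhausted. The trade-off is extra traffic, plus the need for the receiver to track sequence numbers (or make handlers idempotent).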


👤 captaindiego
Bit of a delayed and rambling reply.

ZMQ and NanoMsg aren't dead or obsolete; both are used quite a bit in real production products and middlewares. As an example, I've used them in things you'd classify as middleware for distributed systems, which then run the backend managing communications to many boxes in the wild over bad 3G connections.

You start out mentioning ROS, specifically ROS 2 and DDS. It's worth reading a bit about the history of ROS and the details of ROS 1 to understand why they chose DDS when designing ROS 2.

It's also worth looking a bit back in time at alternatives to DDS. One of these is JAUS, which originally targeted a lot of military applications. I'd personally consider JAUS a bit out of date in its approach, but understanding why DDS is the way it is has some value.

Next I'd recommend digging into the differences between the various things that all get lumped together as middleware. DDS and JAUS are very much in the category of "messaging/comms middleware", whereas ROS 1 and 2 have a much broader scope in terms of the capabilities they provide (process orchestration, node discovery, time synchronization, etc.).

Depending on what you're doing, you can often push some of these tasks onto the OS you run on, or onto other tools. E.g. a real-time OS may have facilities to help with process orchestration, or even real-time messaging between processes on the same computer. Relying on your OS layer has advantages and disadvantages that are worth understanding (if you rely on QNX for everything, it gets more complicated to just spin up a ton of Docker containers for testing).

Deciding the boundaries of your middleware allows you to make important simplifications that reduce complexity. If you decide you're only going to run on a single computer and rely on a hand-written proxy through a broker to talk to the outside world, you don't have to worry as much about time synchronization, serialization for networking, etc. A simple pub-sub messaging bus combined with a simple queue of work handled by a thread pool can often get you most of what you need for robotics/IoT applications in terms of messaging, logging, task execution.
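A minimal sketch of that single-machine pattern, assuming an in-process bus is all you need: topics map to subscriber callbacks, and `publish()` hands each callback to a thread pool (Python's `ThreadPoolExecutor`, whose internal queue doubles as the work queue). The class and method names are hypothetical; a C++ or Rust version would follow the same shape.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MessageBus:
    """In-process pub-sub bus. publish() submits one task per subscriber to a
    thread pool; the executor's internal queue acts as the work queue."""

    def __init__(self, workers=4):
        self._subs = {}                  # topic -> list of callbacks
        self._lock = threading.Lock()    # guards _subs when threads subscribe/publish concurrently
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def subscribe(self, topic, callback):
        with self._lock:
            self._subs.setdefault(topic, []).append(callback)

    def publish(self, topic, msg):
        # Snapshot the subscriber list under the lock, then dispatch without holding it.
        with self._lock:
            callbacks = list(self._subs.get(topic, []))
        return [self._pool.submit(cb, msg) for cb in callbacks]

    def shutdown(self):
        # Wait for queued work to drain, then stop the worker threads.
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    bus = MessageBus(workers=2)
    bus.subscribe("imu", lambda m: print("logger got", m))
    bus.subscribe("imu", lambda m: m["ax"] * 2)   # e.g. a processing node
    futures = bus.publish("imu", {"ax": 1.5})
    print([f.result() for f in futures])  # second result is 3.0
    bus.shutdown()
```

Logging falls out of the same mechanism: a logger is just another subscriber. Because everything stays in one process, there is no serialization or clock synchronization to worry about.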

If, however, you decide to create a middleware that works across multiple nodes over a network stack, you need to cross the time-sync bridge and then decide whether you need to know that X happened before Y when X is on node A and Y is on node B. The more you can relax constraints like this, the easier things get. If you also include the server back end, or other robots/devices, deciding to keep separate synced time domains can dramatically simplify your life (e.g. sync all time within each robot, but don't sync time between robots or with the server). Digging into the PTP (IEEE 1588) protocol is definitely a worthy endeavor. When I've gone really deep on fancier cross-node, time-synced logging mechanisms in middlewares, Garg's Elements of Distributed Computing and some of Leslie Lamport's papers have helped me.
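Lamport's logical clocks guarantee that if X happened before Y then X gets the smaller timestamp, but not the converse. If you actually need to answer "did X on node A happen before Y on node B?" without synchronized wall clocks, vector clocks (Fidge/Mattern) are a standard technique. The sketch below is a minimal illustration with hypothetical class and function names; each node keeps one counter per peer, and no PTP is involved.

```python
class VectorClock:
    """Per-node vector clock: one logical counter per node in the system."""

    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.clock = {n: 0 for n in nodes}

    def tick(self):
        """Local event: bump our own counter and return a timestamp snapshot."""
        self.clock[self.node_id] += 1
        return dict(self.clock)

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, remote):
        """Merge the sender's timestamp (element-wise max), then count the receive."""
        for n, t in remote.items():
            self.clock[n] = max(self.clock.get(n, 0), t)
        return self.tick()


def happened_before(a, b):
    """True iff a -> b: every entry of a <= b, and at least one strictly less."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def concurrent(a, b):
    """Neither event could have influenced the other."""
    return not happened_before(a, b) and not happened_before(b, a)


if __name__ == "__main__":
    nodes = ["robot_a", "robot_b"]
    a, b = VectorClock("robot_a", nodes), VectorClock("robot_b", nodes)
    x = a.tick()            # event X on robot_a
    msg_ts = a.send()       # robot_a sends a message to robot_b
    z = b.tick()            # unrelated event Z on robot_b
    y = b.receive(msg_ts)   # event Y on robot_b: receiving the message
    print(happened_before(x, y))  # True: X -> send -> receive
    print(concurrent(x, z))       # True: no message links X and Z
```

The cost is that timestamps grow with the number of nodes, which is one more reason to keep the set of nodes that must agree on ordering small, as suggested above.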

If you can rely on embedded hardware to meet serious real-time constraints, you can also relax more. Instead of trying to run a 1 kHz motor control loop on Linux with real-time extensions, you could send messages to an Arduino that handles that loop instead.
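A sketch of what such a message might look like on the Linux side. The frame layout (start byte, message type, motor id, int32 setpoint in milliradians per second, XOR checksum) and all names are entirely hypothetical; a real link would more likely use a CRC and a byte-stuffing scheme such as COBS.

```python
import struct

# Hypothetical wire format, 8 bytes total, little-endian:
#   uint8 start | uint8 msg type | uint8 motor id | int32 setpoint | uint8 checksum
START_BYTE = 0xAA
MSG_SET_VELOCITY = 0x01
_BODY = struct.Struct("<BBBi")  # 7 bytes, no padding because of "<"


def xor_checksum(data: bytes) -> int:
    """XOR of all bytes; cheap but weak (a CRC is the usual choice)."""
    c = 0
    for byte in data:
        c ^= byte
    return c


def encode_setpoint(motor_id: int, millirad_per_s: int) -> bytes:
    """Build a velocity-setpoint frame for the microcontroller running the 1 kHz loop."""
    body = _BODY.pack(START_BYTE, MSG_SET_VELOCITY, motor_id, millirad_per_s)
    return body + bytes([xor_checksum(body)])


def decode_setpoint(frame: bytes):
    """Inverse of encode_setpoint; mirrors the checks the firmware side would do."""
    if len(frame) != _BODY.size + 1:
        raise ValueError("bad frame length")
    body, csum = frame[:-1], frame[-1]
    if xor_checksum(body) != csum:
        raise ValueError("checksum mismatch")
    start, msg_type, motor_id, value = _BODY.unpack(body)
    if start != START_BYTE or msg_type != MSG_SET_VELOCITY:
        raise ValueError("unexpected header")
    return motor_id, value
```

The idea is that the host only sends setpoints at whatever rate it can manage, while the microcontroller validates each frame and keeps running its control loop at a fixed rate even if the Linux side stalls.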

Suffice it to say, there is no 'right' toolkit for these domains; it very much depends on your application. This is why you'll see a lot of companies working in these spaces building their own custom middlewares.

  What do you use (and in what language / application)?
For robotics and distributed backend middleware, I tend to go fully custom for simple applications (simple pub-sub messaging with a work queue serviced by a thread pool). For more complex things I'll start to add in a messaging middleware (DDS for robotics, NanoMsg/ZMQ for more server-targeted work). For both I'd currently favor C++; the industrial robotics world (as I've experienced it) is very much a C++ world for real-world applications. However, Rust is definitely coming along, and I've had good results with ZMQ+Python previously.

👤 enduku
QNX maybe?