HACKER Q&A
📣 hislaziness

Consolidating Multiple Data Sources


How do you connect multiple data sources? I have a use case with multiple data sources, both batch and streaming, that I need to analyze together. I have used a database to consolidate the various sources, but I do not get the real-time results I need. I am exploring https://getdozer.io/. Any suggestions or feedback?


  👤 gunnarmorling Accepted Answer ✓
Sounds like a great use case for Debezium (capturing changes from databases with low latency) and Apache Flink (for processing these change event streams, e.g. filtering them, joining them, applying pattern searches, putting aggregated data on a dashboard, etc.).
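
To make the idea concrete, here is a minimal sketch in plain Python (not Flink) of what a job consuming change events does: it applies each event to local state and recomputes a joined aggregate. The `op`, `before`, and `after` fields follow the Debezium change event envelope; the table names, columns, and helper functions are hypothetical and only for illustration.

```python
from collections import defaultdict


def apply_change(table, event, key="id"):
    """Apply one Debezium-style change event to an in-memory table.

    `table` is a dict keyed by primary key. The `key` column name is an
    assumption for this sketch.
    """
    op = event["op"]  # "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    if op in ("c", "r", "u"):
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        table.pop(event["before"][key], None)
    return table


def revenue_per_customer(customers, orders):
    """Join orders to customers and sum order amounts per customer name."""
    totals = defaultdict(float)
    for order in orders.values():
        customer = customers.get(order["customer_id"])
        if customer is not None:
            totals[customer["name"]] += order["amount"]
    return dict(totals)


# Hypothetical events from two sources (a customers table and an orders table).
events = [
    ("customers", {"op": "r", "before": None,
                   "after": {"id": 1, "name": "Ada"}}),
    ("orders", {"op": "c", "before": None,
                "after": {"id": 10, "customer_id": 1, "amount": 25.0}}),
    ("orders", {"op": "c", "before": None,
                "after": {"id": 11, "customer_id": 1, "amount": 5.0}}),
    ("orders", {"op": "u",
                "before": {"id": 10, "customer_id": 1, "amount": 25.0},
                "after": {"id": 10, "customer_id": 1, "amount": 30.0}}),
    ("orders", {"op": "d",
                "before": {"id": 11, "customer_id": 1, "amount": 5.0},
                "after": None}),
]

customers, orders = {}, {}
tables = {"customers": customers, "orders": orders}
for table_name, event in events:
    apply_change(tables[table_name], event)

result = revenue_per_customer(customers, orders)
print(result)  # {'Ada': 30.0}
```

A stream processor such as Flink does the same thing incrementally and with fault-tolerant state, instead of recomputing the join over in-memory dicts.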

Disclaimer: I work for Decodable, where we build a managed platform around these technologies and their use cases.


👤 ronnykylin
I’ve just learned about the Multi-Catalog feature of Apache Doris (an analytic database). It lets you connect to various data sources without worrying about data transfer, and query data from multiple external sources as simply as querying internal data. (https://doris.apache.org/docs/dev/lakehouse/multi-catalog/)

👤 iamdeedubs
Definitely check out https://debezium.io/. I've been using it to stream data out of MongoDB and PostgreSQL to great effect.

You can run it with Kafka Connect or as a standalone process.
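
For the Kafka Connect route, a connector is typically registered by POSTing a JSON config to the Connect REST API. The sketch below builds such a request for the Debezium PostgreSQL connector. The host, credentials, database, table, and connector name are placeholders, and `build_connector_request` is a hypothetical helper. It assumes a Connect worker on the default port 8083 and Debezium 2.x property names (`topic.prefix`; older versions used `database.server.name`).

```python
import json
import urllib.request


def build_connector_request(name, config, connect_url="http://localhost:8083"):
    """Build (but do not send) a POST request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; adjust to your own database and topics.
postgres_config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.dbname": "appdb",
    "topic.prefix": "app",
    "table.include.list": "public.orders",
}

request = build_connector_request("orders-connector", postgres_config)

# To actually register the connector against a running Connect worker:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The request is only built here, not sent, so the snippet runs without a Kafka Connect cluster.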