I built a prototype for a system that streamed JSON-like structures (implemented in RDF) through various processing “boxes”, like you’d see in a tool such as LabVIEW, Alteryx, or KNIME. The system used production rules to set up and tear down a reactive streaming fabric; the teardown is important if you want to get the right answers in batch mode. The system would download the data dump, do some data cleanup and indexing, merge it with other data sets, and build a web site for browsing that data. I made an AMI that built a new copy of the site when it booted up, so the site was updated every day.
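To make the “boxes” idea concrete, here is a minimal sketch in Python. The real prototype was RDF-based and rule-driven; the generator functions and names below (`clean`, `index_by`, `run_pipeline`) are purely illustrative, not its actual API.

```python
# Each "box" is a generator that consumes a stream of dict-shaped records
# (standing in for the RDF-backed JSON-like structures) and yields
# transformed records to the next box downstream.

def clean(records):
    """Drop records missing required fields and strip stray whitespace."""
    for r in records:
        if r.get("id") and r.get("title"):
            yield {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}

def index_by(records, key):
    """Terminal box: materialize an index for the browsing site."""
    index = {}
    for r in records:
        index.setdefault(r[key], []).append(r)
    return index

def run_pipeline(source):
    """Set up the pipeline, drain it, and let it finish, so a batch run
    has a definite end and produces complete answers."""
    stream = clean(iter(source))      # wire up the fabric
    return index_by(stream, "id")     # drain it; nothing is left running

if __name__ == "__main__":
    dump = [{"id": "a1", "title": " First "}, {"id": None, "title": "skip me"}]
    print(run_pipeline(dump))
```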
The theory behind it was that these data pipelines were more scalable than batch SPARQL queries and easier to create than with many other tools, so we could build a system that looks at the metadata, does some profiling, and automatically builds a draft of the data import script. That draft could then be whipped into shape by applying “patches” to it, and facilities for testing the ETL process and its parts would be built in too.
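As a rough sketch of that “profile, draft, patch” loop, here is what it might look like in Python. The real prototype generated scripts in its own pipeline language; the names and the list-of-steps representation below are hypothetical.

```python
# Profile a data sample, draft an import script from the profile, then
# apply a "patch" (here, just a function that edits the draft).

def profile(sample_rows):
    """Guess a column type from a handful of sample values."""
    schema = {}
    for col in sample_rows[0]:
        values = [row[col] for row in sample_rows]
        schema[col] = "int" if all(str(v).isdigit() for v in values) else "str"
    return schema

def draft_import(schema):
    """Build a first-cut import script from the profiled schema."""
    steps = [f"parse {col} as {typ}" for col, typ in schema.items()]
    steps.append("load into site index")
    return steps

def apply_patch(steps, patch):
    """Whip the draft into shape by applying a small edit to it."""
    return patch(steps)

sample = [{"year": "1999", "name": "Alice"}, {"year": "2003", "name": "Bob"}]
script = draft_import(profile(sample))
# e.g. a patch that inserts a normalization step before the load
script = apply_patch(script, lambda s: s[:-1] + ["lowercase name"] + s[-1:])
print(script)
```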
One bit of feedback we got from people in the data analytics space was that our system wouldn’t support columnar query processing, so it would be too slow, and they wanted nothing to do with it.
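The objection, roughly: a columnar engine can scan just the one column an aggregate needs, while a record-at-a-time stream like ours walks every field of every record. A toy illustration in plain Python (hypothetical data):

```python
# Row-oriented: iterate whole records to sum one field.
rows = [{"year": 2001, "price": 9.5}, {"year": 2002, "price": 11.0}]
row_total = sum(r["price"] for r in rows)

# Column-oriented: the column is already a contiguous array; the scan
# touches nothing else (and vectorizes well in a real engine).
columns = {"year": [2001, 2002], "price": [9.5, 11.0]}
col_total = sum(columns["price"])

assert row_total == col_total == 20.5
```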
When I look back at that time period and how it worked out, I now think the market for low-code development of applications is a better one than low-code support for analytics: the application is the thing that makes money, and the spend on analytics is almost a rounding error compared to operations.
If you want to see my decks, look up my profile.