HACKER Q&A
📣 orbOfOrthanc

What's your biggest complaint about Apache Spark?


It could be anything: the current state, missing features, or something else.


  👤 throw_gabriel Accepted Answer ✓
My biggest complaint is not about Spark itself, but what people make of it. I'm currently at a company (and I have seen this at others before) where we are spending millions of dollars a year to run huge Spark infrastructures for data processing that could be replaced by a couple dozen servers running well-architected apps.

I think there are certain use cases/environments where Spark makes a lot of sense, but I don't think it is viable for most cases/teams, especially if you don't plan to use Databricks. The vanilla developer experience is pretty rough: automation is lacking, the UI is pretty bad, local dev environments (beyond "hello world" level) are hard to set up, etc.; and that's not even accounting for the infrastructure deployment/management side of things.

Mix all this with the fact that (at least in my experience) DE and DS are not known for writing robust/defensive code, and you get systems that break very often.