HACKER Q&A
📣 leokster

Open-source alternative to Palantir Foundry


There was a discussion back in 2017 (https://news.ycombinator.com/item?id=13488795) regarding this topic. I am curious if there are any open-source projects available for creating "self-service data platforms," or a similar concept, akin to Palantir Foundry. Specifically, I'm looking for:

- A platform that includes an execution engine like Spark or another option for creating datasets (akin to Databricks).

- A system similar to Dagster, Airflow, or Prefect for building DAGs of datasets.

- A setup supporting multiple projects or repositories where users can easily define their transformations and version their code (similar to Git).

- Data lineage tracking (similar to what Palantir offers) that also drives access control: a user can access a given dataset only if they have access to all of its upstream datasets.

- A granular access management system for individual datasets.
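
To make the lineage-based access rule concrete, here is a minimal Python sketch of how I picture it. It is not how Foundry or any existing tool implements it; the dataset names, the users, and the `UPSTREAM`, `GRANTS`, `all_upstream`, and `can_read` names are all made up for illustration.

```python
from __future__ import annotations

# Hypothetical lineage graph: each dataset maps to the datasets it is built from.
UPSTREAM: dict[str, set[str]] = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "customer_revenue": {"clean_orders", "raw_customers"},
}

# Hypothetical per-dataset grants: what each user was given explicitly.
GRANTS: dict[str, set[str]] = {
    "alice": {"raw_orders", "raw_customers", "clean_orders", "customer_revenue"},
    "bob": {"raw_orders", "clean_orders", "customer_revenue"},
}


def all_upstream(dataset: str, upstream: dict[str, set[str]]) -> set[str]:
    """Return every transitive ancestor of `dataset` in the lineage graph."""
    seen: set[str] = set()
    stack = list(upstream.get(dataset, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(upstream.get(node, ()))
    return seen


def can_read(user: str, dataset: str) -> bool:
    """Allow access only if the user holds the dataset and all its ancestors."""
    required = {dataset} | all_upstream(dataset, UPSTREAM)
    return required <= GRANTS.get(user, set())
```

In this sketch `bob` holds a grant on `customer_revenue` but not on `raw_customers`, so `can_read("bob", "customer_revenue")` is false, while `can_read("bob", "clean_orders")` is true. A real platform would presumably keep the graph and the grants in a metadata store rather than in module-level dicts.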

While the individual components exist, Palantir's value comes from integrating these elements. An architecture combining these tools, along with a simple frontend for interaction, might suffice.

The ultimate aim is to empower business teams within an organization to independently develop "data products" and provide these products to other teams for further development into more complex data products.

Are you aware of such a project, have you faced similar issues in your work, or is anyone interested in discussing these topics further?


👤 guwop Accepted Answer ✓
It would be cool. But there are so many already open-sourced pieces that would need to fit together well... I think that's Palantir's moat.

👤 superchink
Databricks meets each of these requirements.

👤 rabbit_man_29
I would be willing to work on something like this with other interested people.