HACKER Q&A
📣 flimertrunk

Using Databricks and Snowflake Together


My company's IT team chose Snowflake as a new data platform without consulting the data science team (we are in a different department). Now they want to force us to move all of our work into it.

We are planning to propose a departmental Databricks instance to live alongside Snowflake. We would let Snowflake serve as the data warehouse and use Databricks for ML processing (R jobs, Python jobs, MLflow, AutoML, built-in notebooks/Git integration), pushing results back into Snowflake when appropriate (e.g. when they are needed in a BI tool).
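For anyone picturing the round trip, here is a rough sketch of what it could look like from a Databricks notebook. It assumes the Snowflake Spark connector, which Databricks exposes under the `snowflake` format name with the `sfUrl`/`sfUser`/... options. The helper names (`snowflake_options`, `read_table`, `write_table`) and any table names are made up for illustration, and in practice the password would come from a secret store rather than a plain argument.

```python
def snowflake_options(account_url, user, password, database, schema, warehouse):
    """Build the option dict the Snowflake Spark connector expects.

    All values are supplied by the caller; nothing here is specific
    to any real account.
    """
    return {
        "sfUrl": account_url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_table(spark, options, table):
    """Load a Snowflake table into a Spark DataFrame for ML processing."""
    return (
        spark.read.format("snowflake")
        .options(**options)
        .option("dbtable", table)
        .load()
    )


def write_table(df, options, table, mode="overwrite"):
    """Push a Spark DataFrame (e.g. model scores) back into Snowflake."""
    (
        df.write.format("snowflake")
        .options(**options)
        .option("dbtable", table)
        .mode(mode)
        .save()
    )
```

In a notebook this would be used as `features = read_table(spark, opts, "FEATURES")` before training and `write_table(scores, opts, "SCORES")` afterwards, with both table names being hypothetical.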

I'm expecting pushback on this and wondering if people could share questions/problems I might run into going this route so I can think through them and be prepared to answer.

The main one I know is coming is "why can't you use Snowpark?" My answer: we are heavy R users, Snowpark and UDFs are clunky, we have no desire to convert everything to Python, and there is no notebook interface built into Snowflake.

On expenses, there will be some extra cost from paying for both platforms and moving data back and forth, but I suspect Databricks might actually be cheaper for running processing-intensive ML jobs (I plan to test this). Storage cost is not a factor for us.
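The cost test could be framed as simple per-job arithmetic. This is only a sketch: it assumes a job's cost is runtime multiplied by an hourly rate, where the rate is billing units per hour times the price per unit (Snowflake credits, or Databricks DBUs), plus any separately billed cloud VM cost on the Databricks side. The function name and every number in the example are placeholders, not real prices or measurements.

```python
def job_cost(runtime_hours, units_per_hour, price_per_unit, infra_per_hour=0.0):
    """Estimate the cost of one job run.

    units_per_hour : billing units consumed per hour (credits or DBUs)
    price_per_unit : contract price of one billing unit
    infra_per_hour : separately billed infrastructure (e.g. cloud VMs), if any
    """
    hourly_rate = units_per_hour * price_per_unit + infra_per_hour
    return runtime_hours * hourly_rate


# Placeholder inputs only; substitute measured runtimes and contract rates.
snowflake_run = job_cost(runtime_hours=3.0, units_per_hour=8, price_per_unit=3.0)
databricks_run = job_cost(
    runtime_hours=2.0, units_per_hour=10, price_per_unit=0.5, infra_per_hour=4.0
)
cheaper = "databricks" if databricks_run < snowflake_run else "snowflake"
```

Running the same ML job on both platforms and plugging in the measured runtimes gives a like-for-like figure to bring to IT.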


  👤 buzzscale Accepted Answer ✓
Databricks employee here. Contact your Databricks account team, particularly your Account Executive; they can help articulate this to your IT team.

There was a thread on Reddit about this:

https://www.reddit.com/r/dataengineering/comments/121mm5c/ma...