It drives me nuts to work with auto-saved versions instead of clearly defined, explicit commits that can be proof-read, to lose the favourite tools from my local environment (like vim), and to struggle to test things properly.
I have already found some general hints like modularizing code, but I wanted to hear from people in the trenches: have you found a practice, or set of practices, that actually made a big difference in the development experience?
Another alternative is to split notebooks into "library notebooks" that just define transformations, and "orchestration notebooks" that use the library notebooks to execute the business logic.
Both approaches let you test the code, etc.
P.S. I have a demo of both approaches here: https://github.com/alexott/databricks-nutter-repos-demo
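To make the library/orchestration split concrete, here is a minimal plain-Python sketch of the idea (illustrative only; the names and record shapes are made up and are not from the demo repo, and real Databricks code would typically operate on Spark DataFrames instead). The "library" part is pure functions you can unit-test anywhere; the "orchestration" part just wires them together.

```python
# --- "library" side: pure, unit-testable transformations (hypothetical names) ---

def keep_active(records):
    """Business rule: drop inactive records."""
    return [r for r in records if r.get("active")]

def add_full_name(record):
    """Derive a full_name field from first/last name."""
    out = dict(record)
    out["full_name"] = f"{record['first']} {record['last']}"
    return out

# --- "orchestration" side: composes the library functions into a pipeline ---

def pipeline(records):
    return [add_full_name(r) for r in keep_active(records)]
```

Because the transformations are ordinary functions with no notebook state, they can be tested with pytest locally and only the thin orchestration layer needs to run inside Databricks.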
You probably want a pre-commit script that deletes all the output data and leaves just the code. Some people really hate that, because they like having notebooks with results in them in the git repository to read, but if you have data mixed in with your code you will have the worst time merging.
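A sketch of what such a script does, assuming notebooks are stored in the standard Jupyter `.ipynb` JSON format (this is essentially what the existing `nbstripout` tool automates; the function name here is made up):

```python
import json

def strip_outputs(nb_json: str) -> str:
    """Remove cell outputs and execution counts from a .ipynb JSON string,
    leaving only the source code so diffs and merges stay clean."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []           # drop rendered results/data
            cell["execution_count"] = None  # drop the In[n] counter
    return json.dumps(nb, indent=1)
```

Wired into a git pre-commit hook (or a git filter), this keeps results out of history so two people editing the same notebook only ever merge code, not embedded output blobs.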
Tools like Databricks lumber on with inadequate version control because people are used to everything being screwed up all the time when it comes to "data science".