1. All collaboration requires that collaborators are able to recreate a shared version of reality
2. This means version controlling all the things
3. For 'normal' software teams it's often ok to do this for just code and environment, hence git + docker
4. But for data science teams they need to worry about more variables; code, environment, training + test data, hyper-parameters, summery statistics...
GitLFS allows teams to track training and test data (up to 2GB unless you run your own server IIRC) which removes a lot of the headaches around building tooling to tie all these variables together with, for example, Git + Docker + S3.
Dotscience.com is a good example of a project trying to solve this neatly.
Disclaimer: I used to work there.