Do you have one you'd recommend for a new small business? Are there any that give better support for ML in addition to application code, such as data and model versioning options? What else would you ask if you were in my shoes?
To do otherwise is bikeshedding. Every comment you read in this thread, every FAQ you open, every free account you create to poke around: all of it is time and energy that could have been spent working on the product. Far more important, though, is the false sense of achievement you'd feel after spending time choosing Bitbucket or whatever wins out, while not having made any progress.
The only other modern alternative is Fossil, used for the SQLite project.
They are both good. Fossil is probably more reliable, but the difference is unlikely to affect your business. I have had Git repositories become corrupted to the point where I had to clone a fresh copy and abandon the old one. That was OK only because I was not maintaining private branches in them.
The problem with the commercial Git wrapper services is that they try to lock you in. None of the secondary parts are kept in the same repository, or in any repository, so if you ever want to migrate, it is a big chore. That is intentional.
I would like to know of an online Git service that keeps everything about a project in the same clonable archive, or at least in some clonable archive.
Use GitHub, ignore this thread, and spend your time on revenue-generating activities. You can switch later if you need to, and you aren't going to need to.
Generally I'd view data versioning and trained-model versioning as separate from, but linked to, the versioning of the training code. In an ideal world you end up with a system where the data version and the training code version are recorded in the metadata of a given model version, but there are plenty of other aspects of the data-science-themed add-ons to consider.
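To make the "data version + training code version in the model's metadata" idea a bit more concrete, here is a minimal Python sketch. It assumes the data version is a SHA-256 hash of the training data snapshot and the code version is the Git commit SHA of the training code; `ModelVersion`, `make_model_version`, `to_metadata_json`, and the example values are hypothetical names for illustration, not the API of any particular tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical record linking a trained model to what produced it."""
    model_id: str
    data_version: str   # assumption: SHA-256 of the training data snapshot
    code_version: str   # assumption: Git commit SHA of the training code


def data_fingerprint(training_data: bytes) -> str:
    """Content hash, so identical data always yields the same version string."""
    return hashlib.sha256(training_data).hexdigest()


def make_model_version(model_id: str, training_data: bytes, code_commit: str) -> ModelVersion:
    """Build the metadata record that ties a model to its data and code versions."""
    return ModelVersion(
        model_id=model_id,
        data_version=data_fingerprint(training_data),
        code_version=code_commit,
    )


def to_metadata_json(mv: ModelVersion) -> str:
    """Serialize the links so they can be stored alongside the model artifact."""
    return json.dumps(asdict(mv), sort_keys=True)


if __name__ == "__main__":
    mv = make_model_version("churn-model-3", b"id,label\n1,0\n2,1\n", "a1b2c3d")
    print(to_metadata_json(mv))
```

Because the data version here is a content hash, the same snapshot plus the same commit always produces the same metadata, which is what lets you trace any model back to the exact inputs that built it.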
At small-business scale it doesn't really matter and won't materially improve your product, and every second spent thinking about it is a second not spent thinking about your actual business.
I currently use GitHub, and my company uses Bitbucket. I've used SVN in the past.
I think a small company doesn't need Bitbucket, since its main extra is really just integration with JIRA.
I use GitHub and GitLab for clients. I use GitLab for my personal repos, because its free offerings were better. I don't recommend one over the other, per se; both have advantages. GitLab's all-inclusive CI is easier to use if you understand it well, but you need a tools person who really understands the trade-offs between building your own tooling and not building it. GitHub, as the de facto leader, has more third-party integrations. I would use CircleCI over GitHub Actions, because Actions is very inflexible at the moment. Improvements are planned, and I believe it will get better.
ML support is not on any of their radars. For that specifically, GitLab's ability to drop in your own runners would come in very handy, but data versioning support is not a first-class feature, though Git LFS works as well there as it does elsewhere.
Go with something simple, tried and tested. I prefer GitHub. I never warmed up to GitLab's UI, and a few years ago I tried its OSS version, which was slow as hell (this was in 2015, so it's been a while). I used to be on Bitbucket, but when it got acquired by Atlassian, it was game over.
Now, they all lock you into their own issue and release trackers. I would go with GitHub, just because it's pretty much the industry standard (some OSS projects moved to GitLab but keep a mirror on GitHub, because there are so many users there and it's where people expect to find the official repo).
I'm not familiar with the kinds of files ML deals with. But if you want to version control huge data dumps or non-textual files, Git is not ideal. Nor is it ideal as a general-purpose "share folder" for storing random things. A Git repo is best suited to storing the source code for a single project.
SVN might be OK if you insist on using a single source control system for both code and dumps.
But I don't think you need to force everything into one system. It would be perfectly acceptable to store pure source code in Git and the dumps in something else, perhaps using old-school backup strategies rather than source control if they are truly massive.
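One way to apply that split in practice, sketched in Python under loose assumptions: walk a project directory and sort files into ones that are reasonable to commit to Git and ones better left to a separate backup. The 10 MiB cutoff and the "contains a NUL byte in the first 8000 bytes" binary check are illustrative heuristics only, and `partition`, `looks_binary`, and `MAX_GIT_BYTES` are hypothetical names.

```python
from pathlib import Path

# Hypothetical cutoff: anything larger goes to backup storage instead of Git.
MAX_GIT_BYTES = 10 * 1024 * 1024


def looks_binary(path: Path, sniff_bytes: int = 8000) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with path.open("rb") as f:
        return b"\0" in f.read(sniff_bytes)


def partition(root: Path, max_bytes: int = MAX_GIT_BYTES) -> tuple[list[Path], list[Path]]:
    """Split files under root into (commit to Git, back up elsewhere)."""
    to_git: list[Path] = []
    to_backup: list[Path] = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the repository's own .git folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        # Large files and binary-looking files are treated as "dumps".
        if path.stat().st_size > max_bytes or looks_binary(path):
            to_backup.append(path)
        else:
            to_git.append(path)
    return to_git, to_backup


if __name__ == "__main__":
    code_files, dumps = partition(Path("."))
    print(f"{len(code_files)} files for Git, {len(dumps)} for backup")
```

The thresholds are deliberately crude; the point is only that the decision of where a file lives can be made mechanically, so code stays in Git and massive dumps go to whatever backup scheme you already trust.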