HACKER Q&A
📣 tomrod

Which version control system should a new business in 2022 use?


There seem to be a whole slew of version control systems out there. The three I have limited familiarity with are Github, Gitlab, and Bitbucket.

Do you have one you recommend for a new small business? Are there any that give better support for ML in addition to application code, such as data and model versioning options? What else would you ask if you were in my shoes?


  👤 austhrow743 Accepted Answer ✓
Imo with every single thing that's not your business's core advantage or differentiation, you should go with what you're familiar with. If I were in your shoes I wouldn't let myself turn it into a decision to make. Whatever I was already using would win by default.

To do otherwise is bikeshedding. Every comment you read in this thread, every FAQ you open, every free account you create to poke around: it's all time and energy that could have been spent working on the product. Far more important, though, is the false sense of achievement you'd feel having spent time choosing Bitbucket or whatever wins out, while not having progressed.


👤 ncmncm
All three of those are Git.

The only other modern alternative is Fossil, used for the SQLite project.

They are both good. Fossil is probably more reliable, but the difference is unlikely to affect your business. I have had Git repositories become corrupt, where I had to clone a new copy and abandon the old one. That was OK only because I was not maintaining private branches on them.

The problem with the commercial Git wrapper services is that they try to lock you in. All the secondary parts are not kept in the same, or any, repository, so if you come to want to migrate, it is a big chore. That is intentional.

I would like to know of an online Git service that keeps everything about a project in the same, or anyway some, clonable archive.


👤 smt88
This is definitely bikeshedding[1]. Your choice of git host has nothing to do with your chances of success.

Use Github, ignore this thread, and spend your time on revenue-generating activities. You can switch later if you need to, and you aren't going to need to.

1. https://en.m.wiktionary.org/wiki/bikeshedding


👤 fundamental
For code, git has won out among the other options. As for ML data and model versioning, that area is still evolving, and the right choices there depend on your ML frameworks as well as your approach to deploying new models.

Generally I'd view data versioning and trained model versioning as separate from, but linked to, the training code versioning. In an ideal world you end up with a system where data version + training code version is in the metadata of a given model version, but there are plenty of other aspects of the data-science-themed add-ons to consider.
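
A minimal sketch of what "data version + training code version in the metadata of a given model version" could look like. This is one illustration under stated assumptions, not any particular tool's format: the `ModelRecord` class, its field names, and the example version strings are all hypothetical.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ModelRecord:
    """Lineage metadata for one trained model (all field names are hypothetical)."""
    model_version: str  # e.g. a tag you assign to the trained artifact
    data_version: str   # e.g. a dataset snapshot ID or content hash
    code_version: str   # e.g. the git commit of the training code


def to_metadata_json(record: ModelRecord) -> str:
    """Serialize the record so it can be stored next to the model artifact."""
    return json.dumps(asdict(record), sort_keys=True)


def models_trained_on(records: list[ModelRecord], data_version: str) -> list[str]:
    """Return the model versions that were trained on a given data version."""
    return [r.model_version for r in records if r.data_version == data_version]


# Made-up example records: data and code are versioned separately,
# and each model version points at one of each.
records = [
    ModelRecord("model-1", "data-2022-01", "a1b2c3d"),
    ModelRecord("model-2", "data-2022-02", "a1b2c3d"),
    ModelRecord("model-3", "data-2022-02", "e4f5a6b"),
]
```

With records like these, `models_trained_on(records, "data-2022-02")` answers "which models need retraining if this data snapshot turns out to be bad?", which is the kind of question the linkage is there to support.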


👤 remram
Those are not version control systems.

👤 orf
If you don’t know any VCS, use git. If you know and are very familiar with another VCS, use that.

At the small business scale it doesn’t really matter, won’t materially improve your product and every second spent thinking about it is a second spent not thinking about your actual business.


👤 giantg2
I think technically those three are all git. I believe you can host a git repo/server yourself too.

I currently use GitHub and my company uses BitBucket. I've used SVN in the past.

I think a small company doesn't need BitBucket since it's really just extended for integrating with JIRA.


👤 GauntletWizard
The answer to the question you asked is unquestionably "Git", but the hosting provider for said git repository is open for debate.

I use GitHub and GitLab for clients. I use GitLab for my personal repos, because its free offerings were better. I don't recommend one over the other, per se; both have advantages. GitLab's all-inclusive CI is easier to use if you understand it well, but you need to have a tools guy who really understands the value of building your own and not building your own. GitHub, as the de facto leader, has more third-party integrations. I would use CircleCI over GitHub Actions, because Actions is very inflexible at the moment. It is planned to get better, and I believe it will.

ML support is not on any of their radars. For that specifically, GitLab's ability to drop in your own runners would come in very handy, but data versioning support is not a first-class feature, though Git LFS is as good there as elsewhere.


👤 codegeek
"New Small business"

Go with something simple, tried and tested. I prefer GitHub. I never warmed up to GitLab's UI, and a few years ago I tried its OSS version, which was slow as hell (this was in 2015, so it's been a while). I used to be on Bitbucket, but when they got acquired by Atlassian, game over.


👤 908B64B197
The three listed use git as their interface and back-end.

Now, they all lock you into their own issue and release trackers. I would go with GitHub, just because it's pretty much the industry standard (some OSS projects moved to GitLab but keep a mirror on GitHub just because there are so many users and it's where people expect to find the official repo).


👤 foobarbaz33
> better support for ML... such as data and model versioning options?

I'm not familiar with the kinds of files ML deals with. But if you want to version control huge data dumps or non-textual files, or use it as a general-purpose "share folder" to store random things, git is not ideal. A git repo is best suited to storing source code for a single project.

SVN might be OK if you insist on using 1 source control system for both code and dumps.

But I don't think you need to force everything into one system. It would be acceptable to store pure source code in git and dumps in something else, maybe using old-school backup strategies rather than source control if they are truly massive.
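
One way to keep code in git and dumps elsewhere, while still knowing which dump goes with which commit, is to commit only a small pointer file that records the dump's hash, size, and storage location. The sketch below is an assumption-laden illustration of that idea (loosely the same idea as the pointer files Git LFS uses, but not its actual format); the function names, the JSON layout, and the `location` string are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a massive dump never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path, pointer_path: Path, location: str) -> dict:
    """Write a small JSON pointer (hypothetical layout) that is safe to commit to git."""
    pointer = {
        "sha256": sha256_of(data_path),
        "size": data_path.stat().st_size,
        "location": location,  # wherever the dump really lives, e.g. a backup server
    }
    pointer_path.write_text(json.dumps(pointer, indent=2, sort_keys=True) + "\n")
    return pointer


def matches_pointer(data_path: Path, pointer_path: Path) -> bool:
    """Check that a local copy of the dump is the one the pointer describes."""
    pointer = json.loads(pointer_path.read_text())
    return (
        data_path.stat().st_size == pointer["size"]
        and sha256_of(data_path) == pointer["sha256"]
    )
```

The pointer file is a few hundred bytes no matter how big the dump is, so the git repo stays small, and the dump itself can sit in whatever backup system you already trust.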


👤 pkrumins
Definitely GitHub. Everyone is familiar with GitHub and can instantly start using it without thinking or learning. Hardly anyone has experience with GitLab or Bitbucket, and you'll spend hours figuring out how to get started and where everything is.

👤 quickthrower2
Generally Git/GitHub is a good choice. But you could throw a three-sided die and pick any of them and be fine.

👤 bluehuman
For database schema versioning, you could try bytebase.com; it also works seamlessly with GitLab.

👤 swah
Git + GitHub/GitLab, don't think about it.

👤 crumbits
I’d use CVS, it’s old but very well tested, it’s easy to use, and you can avoid dollar-sucking platforms and services easily with it.