I also hardly use Git submodules, but when I do, I don't struggle.
Yet people talk about Git submodules as though they are really hard. I presume I'm just not using them as much as other people, or that my use case for them happens to be on their happy path.
So why are Git submodules so bad?
1. Git clone not cloning submodules. You need `git submodule update --init` (or `git clone --recursive` up front), I think
2. Git submodules getting out of sync because I forgot to pull them specifically. I'm pretty sure `git submodule update` doesn't always fix this, though maybe only in case 3)
3. Git diff returns something even after I commit, because the submodule has a change. I have to go into the submodule and either commit/push that as well or revert it. Basically, every operation I do on the main repo I also need to do on the submodule if I modified files in both
4. Fixing merge conflicts and using Git in one repo is already hard enough. The team I was working on kept having issues with using the wrong submodule commit, not having the same commit/push requirements on submodules, etc.
All of these can be fixed by tools and smart techniques, like putting `git submodule update` in the Makefile. Git submodules aren't "bad", and honestly they're an essential feature of Git. But they are a struggle, and lots of people use monorepos instead (which have their own problems...).
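Gotchas 1 and 2 above are easy to see in a throwaway demo. Everything below happens in a temp directory with made-up repo names (`app`, `lib`); the only non-obvious bit is `protocol.file.allow`, which recent Git requires for file-path submodules:

```shell
#!/bin/sh
# Demo: a plain clone leaves submodule directories empty until you init them.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d) && cd "$tmp"

git init -q -b main lib
git -C lib commit -q --allow-empty -m "lib: initial"

git init -q -b main app
# Recent Git blocks file-protocol submodules by default, hence protocol.file.allow.
git -C app -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C app commit -qm "add lib submodule"

git clone -q app app-clone                      # note: no --recursive
test -z "$(ls app-clone/lib)" && echo "lib/ is empty after a plain clone"

git -C app-clone -c protocol.file.allow=always submodule --quiet update --init
test -e app-clone/lib/.git && echo "lib/ populated after submodule update --init"
```

Using `git clone --recursive` in the first place avoids the empty-directory step entirely.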
I think they're difficult to use because they break my mental model of how I expect a repository to work. They create a new level of abstraction for figuring out how everything is related and what commands you need to keep things in sync (as opposed to just a normal pull/branch/push flow). It's a whole new layer of your VCS that the consumer needs to understand.
The two alternatives are
1. Have a bunch of repositories with a shared understanding of the expected file structure, à la ./projects/repo_1, ./projects/repo_2. You have a master repo with a readme instructing people on how to set it up. In theory, the disadvantage is that this puts more work on the end user to set things up manually, but the advantage is a simpler understanding of how everything works together.
2. A monorepo. If you absolutely want all of the files to be linked together in a large repo, why not just put them in the same repo, rather than splitting everything out across many repos? You lose a little flexibility in being able to mix and match branches, but nothing a good cherry-pick can't fix when needed.
Either of these strategies solves the same problem submodules are usually used to solve, without creating a more burdensome mental model, in my opinion. So the question becomes: why use them and add more to understand, if there are simpler patterns to use instead?
There would be no gotcha of a non-recursive clone/checkout; your users wouldn't keep getting "broken" checkouts.
There would be no gotcha of state split between .gitmodules, the top-level .git state, and the submodule's .git, and no issues caused by them drifting out of sync.
There would be no gotcha of referencing an unpushed commit.
Submodules are weirdly special and let all of their primitive implementation details leak out to the interface. You can't just clone the repo any more, you have to clone recursively. You can't just checkout branches any more, you have to init and update the submodules too, with extra complications if the same submodules don't exist in all branches. You can't just commit/push, you have to commit/push the submodules first. With submodules every basic git operation gets extra steps and novel failure modes. Some operations feel outright buggy, e.g. rebase gets confused and fails when an in-tree directory has been changed to a submodule.
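The commit/push ordering really does matter: if the superproject records a submodule commit that was never pushed, fresh clones break. A throwaway reproduction, using made-up local repos in a temp directory:

```shell
#!/bin/sh
# Demo: the superproject references a submodule commit that its upstream
# never received, so a fresh recursive clone fails.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d) && cd "$tmp"

git init -q --bare -b main lib.git              # the submodule's "upstream"
git clone -q lib.git lib-seed 2>/dev/null
git -C lib-seed commit -q --allow-empty -m "lib: v1"
git -C lib-seed push -q origin HEAD:main

git init -q -b main app
git -C app -c protocol.file.allow=always submodule --quiet add "$tmp/lib.git" lib
git -C app commit -qm "add lib"

# Commit inside the submodule but *forget* to push it upstream...
git -C app/lib commit -q --allow-empty -m "lib: v2 (never pushed)"
git -C app add lib
git -C app commit -qm "bump lib to v2"

# ...and a fresh recursive clone now fails: lib.git lacks the referenced commit.
if git -c protocol.file.allow=always clone -q --recursive app app-clone 2>/dev/null; then
  echo "unexpected: clone succeeded"
else
  echo "recursive clone failed: upstream never got the v2 commit"
fi
```

Pushing the submodule before (or at least alongside) the superproject commit is what prevents this.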
Functionality provided by submodules is great, but the implementation feels like it's intentionally trying to make less-than-expert users feel stupid.
So you have to understand the tradeoffs and make every decision yourself, at every step. That's Git taking the "safe" option.
Like, what happens if you remove a submodule between revisions? Git won't remove the files, you could have stuff in there. So it just dangles there, as a pile of changed files that you now have to separately, manually remove or commit, because it's no longer tracked as a submodule. And then repeat this same kind of "X could sometimes be risky, so don't do it" behavior for dozens of scenarios.
All of which is in some ways reasonable, and is very much "Git-like behavior". But it's annoying, and if you don't really understand it all it seems like it's just getting in your way all the time for no good reason. Git has been very very slowly improving this behavior in general, but it's still taking an extremely conservative stance on everything, so it'll probably never be streamlined or automagic - making one set of decisions implicitly would get in the way of someone who wants different behavior.
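That dangling-directory behavior is easy to reproduce with throwaway local repos (all names made up, everything in a temp directory):

```shell
#!/bin/sh
# Demo: checking out a revision from before a submodule existed leaves the
# submodule's directory dangling, untracked, in the working tree.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d) && cd "$tmp"

git init -q -b main lib
git -C lib commit -q --allow-empty -m "lib: initial"

git init -q -b main app
git -C app commit -q --allow-empty -m "before the submodule existed"
git -C app -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C app commit -qm "add lib submodule"

# Jump back to the commit that predates the submodule...
git -C app checkout -q HEAD~1
# ...the checkout succeeds, but lib/ is still sitting there, now untracked.
test -d app/lib && echo "lib/ still on disk after checkout"
git -C app status --short
```

Git won't delete a directory that contains its own repository, so it's on you to remove or keep it.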
I've always thought of them as a way to "vendor" a git repository, i.e. declare a dependency on a specific version of another project. I thought they made sense to use only when you're not actively developing the other project (at least within the same mental context). If you did want to develop the other project independently, I thought it best to clone it as a non-submodule somewhere else, push any commits, then pull them down into the submodule.
One thing I haven't seen mentioned in this thread, though, is that they hard-code a choice of HTTPS vs. SSH for the remote repository URL.
If a developer usually uses SSH, their credential manager might not be authenticated over HTTPS (if they even have one configured at all!). If they usually use HTTPS, they might not even have an SSH keypair associated with their identity. And if they're on Windows, setting up SSH is generally even higher friction than it is on Linux.
For someone just casually cloning a repository this is a lot of friction right out of the gate, and they haven't even had to start dealing with deciphering your build instructions yet!
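One partial workaround (github.com here is purely an example host) is Git's URL rewriting, which lets `.gitmodules` stay on HTTPS while an SSH user fetches over SSH, without editing the repo:

```shell
#!/bin/sh
set -e
# Point the "global" config at a scratch file so this demo doesn't touch
# the developer's real settings (GIT_CONFIG_GLOBAL needs Git 2.32+).
GIT_CONFIG_GLOBAL="$(mktemp)"
export GIT_CONFIG_GLOBAL

# Fetch any https://github.com/... URL (including submodule URLs recorded
# in .gitmodules) over SSH instead:
git config --global url."git@github.com:".insteadOf "https://github.com/"

git config --global --get-regexp '^url\.'
```

HTTPS-preferring users can do the mirror-image rewrite, so neither camp has to touch the other's `.gitmodules`.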
-------
Personally I still use Git submodules despite their flaws because they mesh well with my workflow. (This is partially due to the fact that when I transitioned from Hg to Git it was at a job that used them heavily.)
The reality is every solution to bringing external code into your project (whether it's using submodules, subtrees, tools like Josh, scripts to clone separately, IDE features for multi-repo work, ensuring dependencies are always packages, or just plain ol' copy+pasting) all have different pros and cons. None of them are objectively bad, none are objectively best for all situations. You need to determine which workflow makes the most sense for you, your project, and your collaborators.
I've not seen many uses of submodules that weren't better served by adding the package from pypi/npm/crates/...
You end up with a lot of gotchas instead of them just working.
The mental model of juggling multiple repos in a non-atomic way also violates the rule of least astonishment. Working with read-only submodules smooths this part out, at least.
GUI support is slowly getting better at least.
I'd like to jettison the entire model of "reference another repository that you may or may not have", along with the `.gitmodules` file as anything other than optional metadata, and instead have them be a fully integrated thing that necessarily comes along with any pull/push/clone.
I once converted it into a submodule to test. It added nothing, and from that moment any change in the child created a change in the parent, which for my use case is totally unnecessary (I want both of them to be independent, even if hierarchically one sits inside the other).
Submodules are probably a good option for libraries that you rarely touch, so you can update/modify them as you would with a maven/gradle dependency. For most other use cases, submodules create more problems than advantages.
An even bigger problem is when you start substituting submodules for a dependency manager. Submodules have no way to deal with transitive or diamond dependencies. What are you going to do when lib A->B->D and A->C->D? Your workspace will now have two duplicate checkouts of D, and any update to D requires committing to three repos in sequence to update the hashes. If you are really unlucky, only one instance of D can run on the system, but the checkouts differ.
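The diamond case is straightforward to reproduce with four throwaway local repos (named A through D as above, all in a temp directory):

```shell
#!/bin/sh
# Demo: A depends on B and C, which each vendor D as a submodule.
# A recursive clone ends up with two independent checkouts of D.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d) && cd "$tmp"

for r in D B C A; do
  git init -q -b main "$r"
  git -C "$r" commit -q --allow-empty -m "$r: initial"
done

vendor() {  # vendor <parent> <child>: add child as a submodule of parent
  git -C "$1" -c protocol.file.allow=always submodule --quiet add "$tmp/$2" "$2"
  git -C "$1" commit -qm "$1: vendor $2"
}
vendor B D
vendor C D
vendor A B
vendor A C

git -c protocol.file.allow=always clone -q --recursive A A-clone
# Two separate copies of D that must be kept in sync by hand:
ls -d A-clone/B/D A-clone/C/D
```

A real dependency manager would resolve both edges to a single D; submodules just materialize each edge as its own checkout.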
The correct way to deal with this is to have only one top-level superproject where all repos, even transitive ones, are vendored and pinned. The difficulty is knowing whether your repo really is the top level, or whether someone else will include it in an even bigger context. A rule of thumb would be that your superproject shouldn't itself contain any code, only the submodules.
I actually ended up creating https://carvel.dev/vendir/ for some of the overlapping use cases. Aside from not being Git-specific (for source content or destination), it's entirely transparent to consumers of the repo, as they do not need to know how some subset of content is being managed. (I am, of course, a fan of committing vendored content into repos, and I consider the increase in repo size a small price.)
* some folks had to deal with a LOT of submodules back in the day; for instance, it wasn't uncommon to have a dozen+ in your "vendor/plugins" directory in a Rails 1.x app. More submodules, more problems
* sometimes submodules get used to "decompose a monolith" into a confusing tangle of interdependent repos. Actually changing stuff winds up requiring coordination across multiple repos (and depending on the org, multiple teams) and velocity drops to zero. Eventually somebody declares "submodules SUCK!!!!!one!!!" and fixes things by re-monolithing...
Annoying to set up and keep in sync "correctly" (for a given project; EDIT: especially edits).
Sure, this depends a bit on the other tooling and on what you use them for.
That doesn't mean they are hard or complicated. But the moment the defaults don't do what's needed most of the time, and you can't change the default at the project level (instead of in user settings), i.e. you need additional manual steps all the time, some people will hate it and many will dislike it.
Moreover, when you do the initial fetch, you can limit the depth. That will save space if you don't care for the full history of that repo.
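For example (throwaway local repos in a temp directory; the file:// URL is only there so Git uses a real transport, since plain local-path clones ignore `--depth`):

```shell
#!/bin/sh
# Demo: --depth 1 on the initial submodule fetch drops the older history.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d) && cd "$tmp"

git init -q -b main lib
git -C lib commit -q --allow-empty -m "lib: v1"
git -C lib commit -q --allow-empty -m "lib: v2"

git init -q -b main app
git -C app -c protocol.file.allow=always submodule --quiet add "file://$tmp/lib" lib
git -C app commit -qm "add lib"

git clone -q app app-clone
git -C app-clone -c protocol.file.allow=always submodule --quiet update --init --depth 1
git -C app-clone/lib rev-list --count HEAD      # only the tip commit was fetched
```

`git clone --recurse-submodules --shallow-submodules` gets you the same effect in one step on the initial clone.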
If you lack a dependency management system, work mostly solo or in a very small and tightly coordinated team, and just need some githubbed project to make yours work, submodules may be the right tool for you.
I hate them slightly more than submodules. Git describe? Totally broken. Everything else is a tangled mess of junk.
But what armchairhacker says is SPOT ON.
You know, the people that already carry everyone else through their job.
---
My colleagues and I lost work a few times on a project whose top-level Makefile / build script ran some "git submodule" commands from time to time, when it detected that a submodule appeared to be out of date.
Those commands wiped work in progress, because of submodules' tendency to leave junk around when switching top-level branches, when the set of submodules changed, or the same with recursion over vendored submodules. That junk caused a few end users to end up building and running incorrect code, or to get unexpected build errors, so the policy was for the build scripts to wipe it.
In other words, policy was to prioritise the experience for end-users who cloned the repo and occasionally updated or switched branches.
Unfortunately, that meant it occasionally clobbered work in progress by devs. If you were careful it wouldn't happen, but if you left a trail of small changes and forgot about them, then did a git pull or such to get on with other work, intending to come back to those in-progress changes later, they'd sometimes be gone. Such changes included ones that needed edits inside and outside a module (e.g. for API changes), improvements to a submodule's code that there was no hurry to finish or commit, changes to add diagnostics, spotted improvements to tests, etc.
Often when those changes were wiped, the dev wouldn't notice for a while. Then later, "eh, I thought I had already done this before...".
My solution was to stop using the standard build target, and remember to use "git submodule" myself at appropriate times such as when switching branches and updating. That way I never lost any work, but it was not how I was "supposed" to work.
The team discussed improvements to the auto-update-clean-and-hard-reset commands, to make it check more carefully so it wouldn't run as often. But the problem remained, and that refinement made the build options rather ugly, a kind of "make REALLY_REALLY_DONT_UPDATE=1" sort of thing. Sensible defaults that Just Work™ for everyone were never found.
I also found submodules annoyingly time-consuming when making many changes that spanned inside and outside modules, or across modules. The dance back and forth: make a PR in the module for something compatible with both before and after, then a PR in what uses it, then a PR to clean up the module, perhaps with multiple iterations, each step having a potentially slow review cycle in between, over GitHub's slow web interface as well. Understandable for stable APIs shared among multiple projects, but pointless busywork (and worse commit logs) for small changes that could be a single PR in a monorepo.
(ps. Please see first line of this comment.)