HACKER Q&A
📣 gavinhoward

Why are Git submodules so bad?


I have been a git user for a long time, but I've never used Subversion or any other VCS more than a little.

I also hardly use Git submodules, but when I do, I don't struggle.

Yet people talk about Git submodules as though they are really hard. I presume I'm just not using them as much as other people, or that my use case for them happens to be on their happy path.

So why are Git submodules so bad?


  👤 armchairhacker Accepted Answer ✓
Git submodules are fine and can be really useful, but they are really hard. I've run into problems like:

1. Git clone not cloning submodules. You need `git submodule update --init` or `git clone --recursive`, I think

2. Git submodules being out-of-sync because I forgot to pull them specifically. I'm pretty sure `git submodule update` doesn't always fix this, but maybe it only fails in the situation described in 3)

3. Git diff returns something even after I commit, because the submodule has a change. I have to go into the submodule, and either commit / push that as well or revert it. Basically, every operation I do in the main repo I also need to do in the submodule if I modified files in both

4. Fixing merge conflicts and using git in one repo is already hard enough. The team I was working on kept having issues with using the wrong submodule commit, not having the same commit / push requirements on submodules, etc.

All of these can be fixed by tools and smart techniques like putting `git submodule update` in the makefile. Git submodules aren't "bad" and honestly they're an essential feature of git. But they are a struggle, and lots of people use monorepos instead (which have their own problems...).
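
As a rough illustration of problems 2) and 3): `git submodule status` prefixes each line with `-` when a submodule is not initialized, `+` when the checked-out commit differs from the one recorded in the main repo, and `U` when there are merge conflicts. The sketch below assumes that output has already been captured as text; the function name, the paths and the hashes are made up for the example, and paths containing spaces are not handled.

```python
# Meaning of the one-character prefix on each `git submodule status` line.
PREFIX_MEANING = {
    " ": "in sync",
    "-": "not initialized",
    "+": "checked-out commit differs from recorded commit",
    "U": "merge conflict",
}


def classify_submodules(status_output):
    """Map each submodule path to a coarse state based on its line prefix."""
    result = {}
    for line in status_output.splitlines():
        if not line.strip():
            continue
        prefix, rest = line[0], line[1:]
        fields = rest.split()
        # fields[0] is the commit hash, fields[1] the path; an optional
        # "(describe output)" may follow and is ignored here.
        result[fields[1]] = PREFIX_MEANING.get(prefix, "unknown")
    return result


# Hypothetical captured output, not taken from a real repository.
SAMPLE_STATUS = (
    " 1111111111111111111111111111111111111111 libs/a (v1.0)\n"
    "-2222222222222222222222222222222222222222 libs/b\n"
    "+3333333333333333333333333333333333333333 libs/c (v2.1-3-g3333333)\n"
)

for path, state in classify_submodules(SAMPLE_STATUS).items():
    print(f"{path}: {state}")
```

A check like this could run from a makefile target before building, so that a stale submodule is reported rather than silently compiled.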


👤 d_watt
Git submodules aren't bad in the sense of being buggy; they do what the documentation suggests.

I think they're difficult to use because they break my mental model of how I expect a repository to work. They create a new level of abstraction for figuring out how everything is related, and what commands you need to keep things in sync (as opposed to just a normal pull/branch/push flow). That adds a whole new layer to the way your VCS works that the consumer needs to understand.

The two alternatives are

1. Have a bunch of repositories with an understanding of what the expected file structure is, à la ./projects/repo_1, ./projects/repo_2. You have a master repo with a readme instructing people on how to set it up. In theory, there's a disadvantage here in that it puts more work on the end user to manually set that up, but the advantage is there's a simpler understanding of how everything works together.

2. A mono repo. If you absolutely want all of the files to be linked together in a large repo, why not just put them in the same repo, rather than forking everything out across many repos? You lose a little flexibility in being able to mix and match branches, but nothing a good cherry-pick when needed can't fix.

Either of these strategies solves the same problem sub-modules are usually used to solve, without creating a more burdensome mental model, in my opinion. So the question becomes: why use them and add more to understand if there are simpler patterns to use instead?


👤 pornel
The SVN implementation worked pretty seamlessly, almost like a regular subdirectory.

There was no gotcha of a non-recursive clone/checkout. If you used this feature, your users wouldn't keep getting "broken" checkouts.

There was no gotcha of state split between .gitmodules, top-level .git state, and submodule's .git, and the issues caused by them being out of sync.

There was no gotcha of referencing an unpushed commit.

Submodules are weirdly special and let all of their primitive implementation details leak out to the interface. You can't just clone the repo any more, you have to clone recursively. You can't just checkout branches any more, you have to init and update the submodules too, with extra complications if the same submodules don't exist in all branches. You can't just commit/push, you have to commit/push the submodules first. With submodules every basic git operation gets extra steps and novel failure modes. Some operations feel outright buggy, e.g. rebase gets confused and fails when an in-tree directory has been changed to a submodule.

Functionality provided by submodules is great, but the implementation feels like it's intentionally trying to make less-than-expert users feel stupid.
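
To make the "state split" concrete: a submodule's URL is declared in `.gitmodules` and copied into the top-level `.git/config` when the submodule is initialized, and `git submodule sync` exists to bring the two back in line. The sketch below compares the two files as plain text. It is a simplified parser written for this example, not git's own config parser, and the repository names and URLs are invented.

```python
import re

# Matches section headers such as: [submodule "libs/theme"]
SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')


def submodule_urls(config_text):
    """Return {submodule name: url} from git-config-style text."""
    urls = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        match = SECTION.match(line)
        if match:
            current = match.group("name")
        elif line.startswith("["):
            current = None  # some other section, e.g. [core] or [remote "origin"]
        elif current and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "url":
                urls[current] = value
    return urls


def url_drift(gitmodules_text, local_config_text):
    """Submodules whose declared URL differs from (or is missing in) .git/config."""
    declared = submodule_urls(gitmodules_text)
    local = submodule_urls(local_config_text)
    return {
        name: (url, local.get(name))
        for name, url in declared.items()
        if local.get(name) != url
    }


# Hypothetical file contents.
GITMODULES = """[submodule "libs/theme"]
    path = libs/theme
    url = https://example.com/team/theme.git
[submodule "libs/parser"]
    path = libs/parser
    url = https://example.com/team/parser.git
"""

LOCAL_CONFIG = """[core]
    bare = false
[remote "origin"]
    url = https://example.com/team/site.git
[submodule "libs/theme"]
    active = true
    url = https://example.com/old-org/theme.git
"""

for name, (declared, local) in url_drift(GITMODULES, LOCAL_CONFIG).items():
    print(f"{name}: .gitmodules says {declared}, .git/config says {local}")
```

Here `libs/theme` has drifted to a stale URL and `libs/parser` was never initialized, which are two separate failure modes that look similar from the outside.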


👤 Groxx
Submodules are just complicated because Git makes no decisions at all about how they should behave, beyond "never make a decision that could lose data".

So you have to understand the tradeoffs and make every decision at every step. It's the safe option.

Like, what happens if you remove a submodule between revisions? Git won't remove the files, you could have stuff in there. So it just dangles there, as a pile of changed files that you now have to separately, manually remove or commit, because it's no longer tracked as a submodule. And then repeat this same kind of "X could sometimes be risky, so don't do it" behavior for dozens of scenarios.

All of which is in some ways reasonable, and is very much "Git-like behavior". But it's annoying, and if you don't really understand it all it seems like it's just getting in your way all the time for no good reason. Git has been very very slowly improving this behavior in general, but it's still taking an extremely conservative stance on everything, so it'll probably never be streamlined or automagic - making one set of decisions implicitly would get in the way of someone who wants different behavior.


👤 arjvik
What's the mental model for the use of a git submodule?

I've always thought of them as a way to "vendor" a git repository, i.e. declare a dependency on a specific version of another project. I thought they made sense to use only when you're not actively developing the other project (at least within the same mental context). If you did want to develop the other project independently, I thought it best to clone it as a non-submodule somewhere else, push any commits, then pull them down into the submodule.


👤 Pathogen-David
As many others in this thread have stated, the main issue is they have fairly poor UX and if you aren't used to them they can be pretty annoying. They especially have quirks when they're removed from (or moved within) an existing Git repository.

One thing I haven't seen mentioned in this thread though is that they force an opinion of HTTPS vs SSH for the remote repository.

If a developer usually uses SSH, their credential manager might not be authenticated over HTTPS (if they even have one configured at all!). If they usually use HTTPS, they might not even have an SSH keypair associated with their identity. If they're on Windows, setting up SSH is generally even higher friction than it is on Linux.

For someone just casually cloning a repository this is a lot of friction right out of the gate, and they haven't even had to start dealing with deciphering your build instructions yet!
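
For illustration, the two URL styles differ only in shape, which is why git can rewrite one into the other per user through its `url.<base>.insteadOf` setting. The sketch below shows the same mapping in Python. The helper names and the host are hypothetical, and only the common `git@host:path` and `https://host/path` forms are handled.

```python
import re

# scp-style SSH remote, e.g. git@example.com:team/lib.git
SCP_STYLE = re.compile(r"^git@(?P<host>[^:/]+):(?P<path>.+)$")


def ssh_to_https(url):
    """Rewrite git@host:path into https://host/path; leave other URLs alone."""
    match = SCP_STYLE.match(url)
    if match is None:
        return url
    return f"https://{match.group('host')}/{match.group('path')}"


def https_to_ssh(url):
    """Rewrite https://host/path into git@host:path; leave other URLs alone."""
    prefix = "https://"
    if not url.startswith(prefix):
        return url
    host, _, path = url[len(prefix):].partition("/")
    return f"git@{host}:{path}"


print(ssh_to_https("git@example.com:team/lib.git"))
print(https_to_ssh("https://example.com/team/lib.git"))
```

Whichever form is committed to `.gitmodules`, users of the other protocol have to apply a rewrite like this on their side.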

-------

Personally I still use Git submodules despite their flaws because they mesh well with my workflow. (This is partially due to the fact that when I transitioned from Hg to Git it was at a job that used them heavily.)

The reality is that every solution for bringing external code into your project (whether it's using submodules, subtrees, tools like Josh, scripts to clone separately, IDE features for multi-repo work, ensuring dependencies are always packages, or just plain ol' copy+pasting) has different pros and cons. None of them are objectively bad, none are objectively best for all situations. You need to determine which workflow makes the most sense for you, your project, and your collaborators.


👤 tommyjl
I recently started using git subtree[0] instead of dealing with all the problems with git submodules, and have been very happy with the experience so far. It does copy every file into your repository, though.

[0]: https://github.com/git/git/blob/master/contrib/subtree/git-s...


👤 FrenchyJiby
Beyond how hard to use they may or may not be, my personal hatred of git submodules is about bypassing your normal dependency management system. See 12 Factors on Dependencies[1].

I've not seen many uses of submodules that weren't better served by adding the package from pypi/npm/crates/...

[1]: https://12factor.net/dependencies


👤 jayd16
The security model seems to be "terrible UX defaults that you turn to unsafe defaults instead."

You end up with a lot of gotchas instead of them just working.

The mental model of juggling multiple repos in a non-atomic way also violates the principle of least astonishment. Working with read-only submodules smooths this part out at least.

GUI support is slowly getting better at least.


👤 TillE
It's really annoying that submodules give you a detached HEAD by default, so working on a submodule within a project is prone to mistakes. Otherwise they've been fine for me.
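
For context, `git submodule update` checks out the commit recorded in the parent repository rather than a branch, so the submodule's `HEAD` file holds a raw commit hash instead of `ref: refs/heads/<branch>`. A minimal sketch of telling the two apart, assuming the file's content has already been read into a string (the function name and the hash are made up):

```python
def head_state(head_file_text):
    """Describe what the content of a repository's HEAD file points at."""
    text = head_file_text.strip()
    prefix = "ref: refs/heads/"
    if text.startswith(prefix):
        return ("branch", text[len(prefix):])
    # Anything else is treated as a commit hash, i.e. a detached HEAD.
    return ("detached", text)


print(head_state("ref: refs/heads/main\n"))
print(head_state("4444444444444444444444444444444444444444\n"))
```

Commits made in the detached state belong to no branch, which is where the mistakes tend to come from.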

👤 JoshTriplett
The biggest reason I find git submodules painful: a "commitlink" object in a git tree does not count as a reference to that commit or anything that commit references, for the purposes of garbage collecting the repository or pushing and pulling changes. You can't have the only reference in your repository to a given commit be a commitlink within another tree.

I'd like to jettison the entire model of "reference another repository that you may or may not have", along with the `.gitmodules` file as anything other than optional metadata, and instead have them be a fully integrated thing that necessarily comes along with any pull/push/clone.
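
For readers who haven't seen one: a commitlink shows up in `git ls-tree` output as an entry with mode `160000` and type `commit`, next to ordinary blobs and trees. The sketch below picks those entries out of captured output. The sample tree and its hashes are invented, and the function name is hypothetical.

```python
GITLINK_MODE = "160000"


def gitlinks(ls_tree_output):
    """Return {path: commit hash} for every gitlink entry in `git ls-tree` output.

    Each line has the form "<mode> <type> <object>\t<path>".
    """
    links = {}
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        meta, _, path = line.partition("\t")
        mode, obj_type, sha = meta.split(" ")
        if mode == GITLINK_MODE and obj_type == "commit":
            links[path] = sha
    return links


# Hypothetical tree listing.
SAMPLE_TREE = (
    "100644 blob 5555555555555555555555555555555555555555\t.gitmodules\n"
    "040000 tree 6666666666666666666666666666666666666666\tsrc\n"
    "160000 commit 7777777777777777777777777777777777777777\tvendor/luajit\n"
)

print(gitlinks(SAMPLE_TREE))
```

The hash stored in such an entry is only a pointer. The commit it names lives in the other repository.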


👤 howinteresting
The problem with submodules is that they're read-write. Read-only submodules would be completely fine.

👤 anonymoushn
If you work on blorp which contains openresty which contains luajit, and you have a patch to luajit that you are making because it enables some work in the end product, you need to make a commit to luajit, make a commit to openresty with your changes, make another commit to openresty to change the luajit version, make a commit to blorp with your changes, and make another commit to blorp to change the openresty version. You will create 3 code reviews, none of which actually contains all of your changes together. Your coworkers have decent odds of not being able to build their software because they don't have a keybind for `git submodule update --init --recursive` yet.
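
The bookkeeping above can be sketched as a small function. It is only a model of the counting, assuming every repository in the chain receives a code change and each one is reviewed separately; the function name is made up.

```python
def commits_for_nested_change(chain, changed):
    """List the commits needed, innermost repository first.

    chain   -- repositories from outermost to innermost
    changed -- repositories that receive actual code changes
    """
    commits = []
    inner = None
    for repo in reversed(chain):
        if repo in changed:
            commits.append((repo, "code change"))
        if inner is not None:
            # The parent must record the new commit of the repo nested in it.
            commits.append((repo, f"bump {inner} submodule pointer"))
        inner = repo
    return commits


plan = commits_for_nested_change(
    ["blorp", "openresty", "luajit"],
    changed={"blorp", "openresty", "luajit"},
)
reviews = {repo for repo, _ in plan}

for repo, what in plan:
    print(f"{repo}: {what}")
print(f"{len(plan)} commits across {len(reviews)} separate reviews")
```

Each extra level of nesting adds one more pointer bump and one more review, even when the actual patch lives only in the innermost repository.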

👤 TrianguloY
I don't use submodules, but I do use git repositories inside other git repositories, and let IntelliJ manage them both simultaneously as if they were two different projects. Works fine.

I once enabled it as a submodule to test. It adds nothing, and from that moment any change in the child creates a change in the parent, which for my use case is totally unnecessary (I want both of them to be independent, even if hierarchically one is inside the other).

Submodules are probably a good option for libraries that you rarely touch, so you can update/modify them as with a maven/gradle project. For most other use cases submodules create more problems than advantages.


👤 WanderPanda
What's up with git submodules refusing to clone/update/checkout while all their files show up as deleted? I encounter this quite a lot and the solution seems to be a `git submodule sync --recursive` (or something like that), but I don't get why I run into this in the first place. Probably related to forgetting `--recurse-submodules` when cloning, but what do I know?

👤 lobocinza
Git submodules are cool but can be confusing because people are already used to their language package manager. They also add overhead as changes frequently have to be pulled/pushed downstream/upstream. But in cases where it makes sense to use them, they're a great tool. E.g.: a theme that is reused in 3 sites is in its own repo and is a submodule in each site.

👤 Too
A lot of people here complain about the complex UX. This is a big problem but is something you get used to and can live with.

An even bigger problem is when you start substituting a dependency manager with submodules. Submodules have no way to deal with transitive dependencies or diamond dependencies. What are you going to do when lib A->B->D and A->C->D? Your workspace will now have two duplicate checkouts of D, and any update to D requires committing 3 repos in sequence to update the hashes. If you are really unlucky, there can only be one instance of D running on the system but the checkouts differ.

The correct way to deal with this is to only have one top-level superproject where all repos, even transitive ones, are vendored and pinned. The difficulty is knowing if your repo really is the top level, or if someone else will include it in an even bigger context. Rule of thumb would be that your superproject shouldn't itself contain any code, only the submodules.
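
The A->B->D / A->C->D case can be sketched with a toy dependency graph. This only models which nested checkouts would exist if every repo pinned its own submodules; it assumes the graph has no cycles, and the function names are made up.

```python
def checkout_paths(graph, root):
    """Every nested checkout under root when each repo pins its own submodules."""
    paths = []

    def walk(repo, prefix):
        for dep in graph.get(repo, []):
            path = prefix + [dep]
            paths.append("/".join(path))
            walk(dep, path)

    walk(root, [root])
    return paths


def duplicated_repos(graph, root):
    """Repos that end up checked out more than once, with their counts."""
    counts = {}
    for path in checkout_paths(graph, root):
        leaf = path.rsplit("/", 1)[-1]
        counts[leaf] = counts.get(leaf, 0) + 1
    return {repo: n for repo, n in counts.items() if n > 1}


# The diamond from the comment above: A -> B -> D and A -> C -> D.
DIAMOND = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

print(checkout_paths(DIAMOND, "A"))
print(duplicated_repos(DIAMOND, "A"))
```

Flattening everything into one superproject corresponds to a graph where only the root has children, so no repo can appear twice.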


👤 AceJohnny2
Git submodules also don't interact well with worktrees [1]. Do not try to change the branch of a worktree that has a submodule.

[1] https://git-scm.com/docs/git-worktree


👤 cppforlife
i never found myself struggling with submodules, but at times i found myself just slightly annoyed (especially when having to remove/replace submodules), especially when they are used for simpler use cases.

i actually ended up creating https://carvel.dev/vendir/ for some of the overlapping use cases. aside from not being git specific (for source content or destination), it's entirely transparent to consumers of the repo as they do not need to know how some subset of content is being managed. (i am of course a fan of committing vendored content into repos and ignoring the small price of increasing repo size).


👤 Izkata
They act almost the same as pinned-revision svn externals, which people don't really seem to have a problem with. The biggest difference I can think of is needing a special command to pull in the submodules, where svn pulls its externals automatically.

👤 andix
I always ran into issues when switching branches, merging or rebasing. And then you have to figure out what's going on. If you're not used to working with submodules, that's the moment where you have to learn them. And a lot of people get overwhelmed then.

👤 al2o3cr
A couple things off the top of my head:

* some folks had to deal with a LOT of submodules back in the day; for instance, it wasn't uncommon to have a dozen+ in your "vendor/plugins" directory in a Rails 1.x app. More submodules, more problems

* sometimes submodules get used to "decompose a monolith" into a confusing tangle of interdependent repos. Actually changing stuff winds up requiring coordination across multiple repos (and depending on the org, multiple teams) and velocity drops to zero. Eventually somebody declares "submodules SUCK!!!!!one!!!" and fixes things by re-monolithing...


👤 dathinab
They are unergonomic!

Annoying to set up and keep in sync "correctly" (for a given project, EDIT: especially edits).

Sure, this depends a bit on the other tooling and on what you use them for.

This doesn't mean they are hard or complicated. But the moment the defaults don't do what is needed most of the time, and you can't change the default in the project (as opposed to in user settings), you need to do additional manual steps all the time. Some people will hate that, and many will dislike it.


👤 dekhn
Because git is a version control system that is so in love with its data structure, it can't find time to make the rest of the system coherent or useful.

👤 kazinator
Instead of submodules, you can just fetch whatever repo you want into your repo. Create a tracking branch for it which moves the stuff into a subdirectory, and then merge that to your master.

Moreover, when you do the initial fetch, you can limit the depth. That will save space if you don't care for the full history of that repo.


👤 sto_hristo
They have their uses and misuses. People misuse them a lot, get burned, they blame them, and then they hate them.

If you lack a dependency mgmt system, work mostly solo or in a very small and tightly coordinated team, and just need some githubbed project to make yours work, submodules may be the right tool for you.


👤 zbuf
If submodules were really slick, they'd become the primary way to manage build-time dependencies.

👤 senorsmile
They're very useful. git submodule update --init --recursive will cover 99% of the time. There are weird ways of working with updates to submodules, but for the most part everything just works.

👤 mkl95
Git submodules aren't so bad, but they are a pretty leaky abstraction.

👤 bjt2n3904
We've just transitioned from using submodules at work to subtrees.

I hate them slightly more than submodules. Git describe? Totally broken. Everything else is a tangled mess of junk.

But what armchairhacker says is SPOT ON.


👤 tkuraku
Git submodules aren't perfect, but they can definitely be useful. I use them all the time and for my use case they are a good solution. They do take some getting used to though.

👤 normaltool
I don't think most people have a complete picture of git submodules, but it's easier to just call them bad when everyone around you is vilifying them.

👤 neallindsay
I think Git submodules are good, but they have a very narrow set of use-cases. Sometimes people use submodules when what they really want is a sub-tree merge.

👤 sascha_sl
Submodules aren't bad. But in a world where I have to explain that running revert on the 250 megs of jar files someone committed isn't a fix, and where people often just delete and re-clone entire repos because they don't know what's going on, they incur a heavy support burden on the few people that know how to use them.

You know, the people that already carry everyone else through their job.


👤 dustingetz
mutable pointers in the middle of your immutable lineage. completely broken model

👤 truffdog
Pointers are hard.

👤 jlokier
I would love to hear from people who've been using the "git subtree" command instead. Any good experiences?

---

My colleagues and I lost work a few times when working on a project whose top-level Makefile / build script ran some "git submodule" commands from time to time, when they detected that a submodule appeared to be out of date.

Those commands wiped work in progress, because of submodules' tendency to leave junk around when switching top-level branches, or when the set of submodules changed, or the same with recursion over vendor submodules. That junk caused a few end-users to end up building and running incorrect code, or to get unexpected build errors, so the policy was for the build scripts to wipe it.

In other words, policy was to prioritise the experience for end-users who cloned the repo and occasionally updated or switched branches.

Unfortunately that meant it clobbered work in progress by devs occasionally. If you were careful it wouldn't happen, but if you left a trail of small changes and forgot about them, and did a git pull or such to get on with other work, intending to still find those work in progress changes later, they'd sometimes be gone. Such changes were things like those which needed to edit inside and outside a module (e.g. for API changes), or improvements to a submodule's code that there was no hurry to finish or commit, changes to add diagnostics, spotted improvements to tests, etc.

Often when those changes were wiped, the dev wouldn't notice for a while. Then later, "eh, I thought I had already done this before...".

My solution was to stop using the standard build target, and remember to use "git submodule" myself at appropriate times such as when switching branches and updating. That way I never lost any work, but it was not how I was "supposed" to work.

The team discussed improvements to the auto-update-clean-and-hard-reset commands, to make it check more carefully so it wouldn't run as often. But the problem remained, and that refinement made the build options rather ugly, a kind of "make REALLY_REALLY_DONT_UPDATE=1" sort of thing. Sensible defaults that Just Work™ for everyone were never found.
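
One possible shape for such a check, as a sketch only: `git status --porcelain` prints nothing for a clean working tree, so a build script could refuse to reset a submodule whenever that output is non-empty. The function names and sample paths below are made up, and this is not the check that project actually used.

```python
def pending_changes(porcelain_output):
    """Paths that `git status --porcelain` reports as modified, staged or untracked.

    Each line is two status characters, a space, then the path.
    """
    paths = []
    for line in porcelain_output.splitlines():
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def safe_to_hard_reset(porcelain_output):
    """True only when the working tree has nothing that a reset would destroy."""
    return not pending_changes(porcelain_output)


# Hypothetical output captured inside a submodule with work in progress.
DIRTY = " M src/api.c\n?? notes.txt\n"

print(safe_to_hard_reset(""))
print(pending_changes(DIRTY))
```

This trades the end-user experience for the developer one: a dirty submodule stops the automatic update instead of being cleaned, which is exactly the tension described above.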

I also found submodules annoyingly time-consuming when making many changes that spanned inside and outside modules or across modules. The dance backward and forward to make a PR in the module to something that's compatible with before-and-after, then a PR in what uses it, then a PR to clean up the module, perhaps with multiple iterations needed, with each step having a potentially slow review cycle in between, over GitHub's slow web interface as well. Understandable for stable APIs shared among multiple projects, but pointless busywork (and worse commit logs) for small changes that could be a single PR in a monorepo.

(ps. Please see first line of this comment.)


👤 polski-g
I recommend git subrepo

👤 withinboredom
If you think git submodules are bad/confusing, wait until you see subtrees.