HACKER Q&A
📣 voorloopnul

How to handle 500+ repositories in GitHub?


Searching for ways to handle a large number of repositories, I stumbled on this thread: https://github.community/t/structuring-repositories-or-organisations/817

Apparently GitLab and Bitbucket give you a feature to group repositories into projects, while GitHub lacks anything similar.

How do companies using GitHub handle their repositories when there are hundreds of them?


  👤 sverhagen Accepted Answer ✓
We are currently moving from Bitbucket to GitHub (hurrah!), and since we were "assigned" a single organization in GitHub by our corporate parent, we are also looking at importing a lot of repositories into a single organization. And the inability to further organize these repositories has been a big disappointment for an otherwise good experience with GitHub. Pretty much what this thread is about!

We end up telling our teams to use the different search functions to help in making sense of the madness: search by language (automatically detected by GitHub), team, type (forks or other), topics and names. We have a decent process to make sure the appropriate topics are set for repositories (we manage all non-forks through Terraform), and to make sure that names include a useful prefix. Though I don't think we'll ever stop debating the right prefixes for our repository names.
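
A minimal sketch of what such a convention check could look like, assuming a made-up set of name prefixes and a made-up "team-" topic rule (neither comes from the comment above, and this is not the commenter's Terraform setup):

```python
# Hypothetical conventions for illustration only.
ALLOWED_PREFIXES = ("platform-", "web-", "data-")
REQUIRED_TOPIC_PREFIX = "team-"


def check_repo(name, topics):
    """Return a list of convention violations for one repository."""
    problems = []
    # str.startswith accepts a tuple and matches any of its entries.
    if not name.startswith(ALLOWED_PREFIXES):
        problems.append(f"{name}: name lacks an approved prefix")
    if not any(topic.startswith(REQUIRED_TOPIC_PREFIX) for topic in topics):
        problems.append(f"{name}: no '{REQUIRED_TOPIC_PREFIX}*' topic set")
    return problems
```

Running something like this over every repository in the organization gives a list of names and topics to fix before search becomes unreliable.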

It's a "poor man's" ordeal, that's for sure. Organizations are probably the right way to go, if you can.

We're still in the move, so too soon to tell, but I am also wondering if we aren't worried about a non-issue. Because while Bitbucket has "groups", I don't think I have ever consciously used those groups while working in Bitbucket (nor have we ever set up the Bitbucket groups very well for our organization anyway).


👤 prepend
My org has about 500 repos split across 3 orgs. There’s a “main” org, an OSS org, and then splinters for people who are willing to pay for and manage their own.

For the main org, we have templates that projects typically follow: the readme lists which group and individuals manage the project, we require a tag for the specific center, and we encourage tags for projects.

So it’s sort of possible to search for a tag within the org to see all the projects but it’s janky and confusing.

Some groups create “housekeeping repos”, which are just repos with docs that link out to and describe all their projects. They use a repo instead of a wiki because GitHub wikis are kind of a pain to manage (they need specific permissions, and you can’t just fork and send a PR). So that group uses the housekeeping repo as the link they give out to new team members, etc.

For the OSS org projects, we also have a portal repo that builds a github.io portal site that shows cards for each repo and allows searching and sorting. The OSS org doesn’t use the tagging scheme because the public doesn’t really know or care about our internal org names. We have about 175 projects in our OSS org.

Note, we also run GitLab community edition internally and actually have subgroups and stuff. But since GitLab requires internal network access, GitHub use is growing, as GitHub is in the cloud and doesn’t require VPN. The GitLab license costs are much higher than GitHub's and not really compatible with our dev style. We have lots of non-devs (the ratio is probably 3:1 non-dev:dev), and GitLab makes us license everyone the same, so we can’t pay $1000/year just so a PM can update readmes and project cards.


👤 szszrk
There are GitHub organizations. I checked this year and it let me create one for free.

https://docs.github.com/en/organizations/collaborating-with-...


👤 sfgweilr4f
I'm having trouble with 30 or so repos. I shudder to think what 500 or more is like. Search every time? Kind of not good.

Maybe there is a place for an "index" repo that holds a set of github pages that acts as an index into all the repos and groups them via that page instead of just using search.
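
One way such an index page could be generated, as a rough sketch: the (group, name, url) input shape and the Markdown layout are assumptions made for illustration, and the grouping data would have to come from somewhere you maintain yourself.

```python
from collections import defaultdict


def build_index(repos):
    """Render a Markdown index from (group, name, url) tuples."""
    grouped = defaultdict(list)
    for group, name, url in repos:
        grouped[group].append((name, url))

    lines = ["# Repository index", ""]
    # Sort groups and repositories so the page is stable between runs.
    for group in sorted(grouped):
        lines.append(f"## {group}")
        for name, url in sorted(grouped[group]):
            lines.append(f"- [{name}]({url})")
        lines.append("")
    return "\n".join(lines)
```

The output could be committed to the index repo and served through GitHub Pages, so people browse by group instead of searching every time.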


👤 brarsanmol
I believe you can do this multiple ways. GitHub allows you to create Teams and Projects, and it also allows you to add Topics to repositories.

1) If you don't want to have access control on your repositories, you can simply create a project and link multiple repositories to it.

2) In a similar vein, another approach without access control is adding topics to your repositories and then searching for them in the GitHub search bar using "topic:your-topic".

3) Finally, if you need to have access controls or permissions on your repositories you can create teams and assign that team repositories.
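
For the topic approach in 2), a small sketch of building a shareable search link. The organization and topic names are placeholders, and the URL shape reflects how GitHub's web search appears to accept the `org:` and `topic:` qualifiers, so treat it as an assumption to verify:

```python
from urllib.parse import urlencode


def topic_search_url(org, topic):
    """Build a GitHub web search URL for one topic within one organization."""
    query = f"org:{org} topic:{topic}"
    params = urlencode({"q": query, "type": "repositories"})
    return f"https://github.com/search?{params}"
```

Links like this can be pasted into a readme or wiki so nobody has to remember the qualifier syntax.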

I am not a professional or corporate user, so please forgive me if any of this is not 100% accurate.


👤 ecesena
I started a new project with monorepo, but soon the number grew when I had to share smaller private sub-projects with smaller teams. I wish there was a way to only share a directory or so.

On the monorepo, I learned a couple of interesting things. For example, with npm packages, even if you host multiple packages in a single monorepo you can still track dependents for each individual lib [1]. Well done, GitHub.

[1] https://github.com/.../.../network/dependents


👤 tossaway9000
430+ repos here, corporate setting. It's not great, but we use some custom scripts (validate users, groups, which repos are allowed to be public), strict documentation, and quarterly auditing. We have a lot of strict requirements around branch protections, and the API for branch protections leaves a lot to be desired.

We've found that setting up new repos in a strictly documented manner has been the best way to approach it. We also have some GitHub Actions that run periodically to perform sanity checks across repos.
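
A sanity check of the "which repos are allowed to be public" kind might look roughly like this. It is only a sketch: it assumes each repository is a dict with `name` and `private` keys (as repository objects from the GitHub REST API have), and the allowlist is hypothetical.

```python
def find_unapproved_public(repos, allowed_public):
    """Return sorted names of public repositories missing from the allowlist."""
    return sorted(
        repo["name"]
        for repo in repos
        # Treat a missing "private" flag as private to avoid false alarms.
        if not repo.get("private", True) and repo["name"] not in allowed_public
    )
```

A scheduled job could fail, or open an issue, whenever the returned list is not empty.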

We're a Terraform shop, but we had countless issues with the terraform-github-provider. Maybe it's improved in the last year or so.

Also, GitHub has no "protections" around tagging. This really hurts us, as we want to move to tags and releases for versioning but don't have a way to require multiple approvals before cutting an artifact that can be promoted, so we have to wrap some customization and processes around it.


👤 cik
Unfortunately we ended up giving in and using multiple GitHub organizations. There doesn't seem to be a better answer like repository tagging or labeling. It's not ideal, and has definitely resulted in lost issues, duplicated effort and the like. However - it's the best answer we have.

👤 noufalibrahim
Not exactly your requirement, but I heavily use GitHub Classroom while teaching and it generates a ton of repositories (one per exercise per student). I name them consistently and wrote a few command-line scripts to grade them, delete them, etc.

👤 shoo
Probably need to think through the concrete use cases for "handle".

e.g. if there are requirements related to tracking issues in some uniform way in software projects that have a many:many relationship with source repos, a possible solution is to use some other system for issue tracking and not try to track issues in GitHub.

If there's a requirement to uniformly configure access controls / branch protection etc. across hundreds of git repos, you could use Terraform or roll your own automation using the GitHub API combined with whatever you can enforce at the GitHub org level.
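
As a sketch of the roll-your-own route, the function below builds (but does not send) a branch protection request. The endpoint path and field names follow the GitHub REST API as I understand it, and the specific settings are illustrative assumptions, so check the current API documentation before relying on it:

```python
import json
import urllib.request


def branch_protection_request(owner, repo, branch, token, approvals=2):
    """Build a PUT request that would set branch protection; nothing is sent."""
    payload = {
        "required_status_checks": None,  # no status checks required in this sketch
        "enforce_admins": True,
        "required_pull_request_reviews": {
            "required_approving_review_count": approvals,
        },
        "restrictions": None,  # no push restrictions in this sketch
    }
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Looping over a list of repositories and passing each request to `urllib.request.urlopen` would apply the same settings everywhere, which is the uniformity described above.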


👤 bramblerose
In the process of moving from GitLab to GitHub. We just use naming conventions for this: all repositories are called Company.Group.Product, where Group roughly corresponds to the old GL group.
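
A naming convention like this can be turned back into groups with a few lines of code. This sketch assumes names have exactly the three dot-separated parts described above; anything else lands in a made-up "(ungrouped)" bucket:

```python
from collections import defaultdict


def group_by_convention(repo_names):
    """Group Company.Group.Product names by their Group component."""
    grouped = defaultdict(list)
    for name in repo_names:
        # Split at most twice so dots inside Product are preserved.
        parts = name.split(".", 2)
        group = parts[1] if len(parts) == 3 else "(ungrouped)"
        grouped[group].append(name)
    return dict(grouped)
```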

👤 dbg31415
One feature I REALLY like about ZenHub is that you can build a project board with multiple repositories on it. Easy to tie tickets from the back-end repo to the front-end repo, or whatever you need. Wish GitHub would build out multi-repo projects... or just buy ZenHub. Nice of GitHub to build out their own project boards in recent years, but they've never been as good as some of the third-party tools that are out there.

https://www.zenhub.com/


👤 captn3m0
For managing them, we have a wrapper over terraform-github-provider[0]. There's also ghconf[1].

Unfortunately, no solutions for grouping them.

[0]: https://registry.terraform.io/providers/integrations/github/...

[1]: https://github.com/optile/ghconf


👤 jjice
For those who are in organizations that have nearly 500 repos, why so many? Is it the sheer size of the company, or is it organizing individual internal libraries into different repos (seems like good practice), or something else? I've only ever worked for small companies where we have at most 5 real repos, and I work at a company with a monorepo right now.

Large scale systems are very interesting to me, but I have to say, I don't mind the mono repo.


👤 elboulangero
I use this command-line tool: https://myrepos.branchable.com/

👤 Sevii
I work in an org with a similar setup where there is no name-spacing on git repositories. We end up prefixing the project name with our org name.

ex repo name is: $OrgName + $ProjectName

Honestly, it is not a great solution because the org name has changed a few times and we have repos under 3-4 different naming schemes.


👤 Mennny
I built this command line tool to manage settings and configuration of an org with ~300 repositories: https://github.com/svenmuennich/github-commander

👤 rurban
I currently have 375 repos under my user, plus about 20 more under different orgs. What's the exact problem to solve? Just avoiding paying per user per org?

I even have some crontab entries to update all issues via git bug bridge locally, so that I don't need an internet connection to github.com to work on tickets. Clicks are not easily automated, so I avoid them in my workflows. Same for almost daily rebasing (all feature and bugfix branches are automatically rebased onto master when master moves). This is 90% automatic.

All my repos are public, I pay nothing. I find that better than paying for it and keeping them private.


👤 giantg2
Maybe take a variation of the old COBOL playbook to number things in an extensible way - prefix repo names so that they are organized by it.

👤 markuman123
With Ansible.