HACKER Q&A
📣 breck

How would you scale a site backed by Git to a million contributors?


If you were to build a community site, say something like Wikipedia or Reddit, but powered by Git, how would you shard things?

Imagine the specs would be: 10 million files. 1 million users committing changes.
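
To make the sharding question concrete, here is a minimal sketch of one possible approach: hash each file path to pick one of a fixed number of Git repositories. Everything in it is an assumption for illustration, not an established design: the shard count of 1024, the `shard-NNNN.git` naming scheme, and the helper names `shard_for` and `repo_name` are all hypothetical.

```python
import hashlib

# Hypothetical shard count: 10 million files over 1024 repos
# comes to roughly 9,765 files per repository.
NUM_SHARDS = 1024


def shard_for(path: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a file path to a shard index using a stable hash."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def repo_name(path: str, num_shards: int = NUM_SHARDS) -> str:
    """Return the (hypothetical) repository name that would hold this path."""
    return f"shard-{shard_for(path, num_shards):04d}.git"


if __name__ == "__main__":
    for p in ["pages/Git.md", "pages/Wikipedia.md", "users/breck/profile.md"]:
        print(p, "->", repo_name(p))
```

A plain modulo like this means changing the shard count moves most files to a different repository, and any edit spanning files in different shards can no longer be a single atomic commit. Both are trade-offs a real design would have to address.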

What would be some good resources to read?


  👤 cxr Accepted Answer ✓
This didn't get any traction, so I'll take the opportunity to ask why.

> Wikipedia [...] powered by Git

Why? When I think of what could stand to be fixed with Wikipedia (ignoring the social problems and focusing on the technical), I'd say that revving wikitext by starting with a profile of Markdown and growing it to handle wikis' concerns would be a worthwhile high-priority project, as would aggressively porting MediaWiki to a non-PHP language, but I'm not seeing why replacing their database with Git would be even a medium- or low-priority one. What's the benefit? (I can see plenty of downsides, aside from just the upfront cost.)

Having said that, if you're interested in something Git-like, but not necessarily Git, I'd point to the Dat protocol and the body of work around it (incl. identifying scaling challenges and suggested solutions) that currently exists. Are you familiar with it?