HACKER Q&A
📣 ak39

List of biggest software companies by lines of code


Has anyone compiled a list of the largest software companies by total lines of code? (Or by any other metric.)


  👤 shoo Accepted Answer ✓
Microsoft: In 2017, the Windows git repo contained ~3.5 million files and was about 300GB in size [1]. In 2020, the Office git repo contained ~3 million files [2].

Facebook: In 2011, the main site had around 9.2 million lines of code, "excluding numerous backend services" [3]. In 2014, the Android codebase was 4 million lines of code and the main site, without backend code, was 62 million lines of code [3]. Also in 2014, when Facebook were writing about scaling Mercurial, the main source repo was described as "many times larger than even the Linux kernel, which checked in at 17 million lines of code and 44,000 files in 2013" [4]. Around 2018 (?) Facebook started using Eden, a fork of Mercurial [5].

If we make the working assumption that the average lines of code per file of the 2013 Linux kernel is representative of all repos, that gives 17 million LOC / 44,000 files ≈ 386 lines of code per file.

loc(Microsoft) ≥ loc(Windows, 2017) + loc(Office, 2020) ≈ 386 loc/file × (3.5M + 3M files) ≈ 2.5 billion lines of code
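The back-of-the-envelope estimate above is easy to reproduce. A minimal Python sketch, assuming (as the answer does) that the 2013 Linux kernel's lines-per-file ratio carries over to the Windows and Office repos; the constant and function names are made up for illustration:

```python
# Working assumption from the answer: every repo has the same average
# lines of code per file as the 2013 Linux kernel.
LINUX_2013_LOC = 17_000_000
LINUX_2013_FILES = 44_000

# Approximate file counts quoted in the answer.
WINDOWS_2017_FILES = 3_500_000
OFFICE_2020_FILES = 3_000_000


def loc_per_file(total_loc: int, total_files: int) -> float:
    """Average lines of code per file for a repo with known totals."""
    return total_loc / total_files


def estimate_loc(file_count: int, avg_loc_per_file: float) -> float:
    """Estimate a repo's total lines of code from its file count alone."""
    return file_count * avg_loc_per_file


if __name__ == "__main__":
    avg = loc_per_file(LINUX_2013_LOC, LINUX_2013_FILES)
    total = estimate_loc(WINDOWS_2017_FILES + OFFICE_2020_FILES, avg)
    print(f"{avg:.0f} loc/file")             # 386 loc/file
    print(f"{total / 1e9:.1f} billion loc")  # 2.5 billion loc
```

The result is only as good as the lines-per-file assumption: a repo full of small generated or config files would push the real number down, and one with large source files would push it up.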

The linked blog posts from Microsoft & Facebook about scaling version control systems are fairly interesting!

[1] https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...
[2] https://devblogs.microsoft.com/devops/introducing-scalar/
[3] https://www.quora.com/How-many-lines-of-code-is-Facebook
[4] https://engineering.fb.com/core-data/scaling-mercurial-at-fa...
[5] https://github.com/facebookexperimental/eden


👤 cafard
I have to imagine that IBM leads, given its longevity and all that COBOL out there running financial software. One of Richard Gabriel's essays (admittedly from about 30 years ago) said that, measured in LOC, COBOL has a massive lead over all other languages. And then there's all the code that keeps IBM's operating systems running.

👤 nikivi
If you mean closed-source companies (which, sadly, most are), that data isn't available. For open source, you can query the GitHub API.
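One caveat: GitHub's REST API does not report lines of code. Its per-repo "languages" endpoint returns bytes of code per language, so a query like this measures bytes, and only for public repos. A minimal Python sketch using only the standard library, assuming unauthenticated access (rate-limited to 60 requests/hour, so anything but a small org would need an auth token, which is left out here); the function names are hypothetical:

```python
import json
import urllib.request
from collections import Counter

API = "https://api.github.com"


def get_json(url: str):
    """Fetch a GitHub REST API URL and decode the JSON body (no auth token)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def org_repos(org: str) -> list:
    """List an organization's public repos, requesting pages until one comes back empty."""
    repos, page = [], 1
    while True:
        batch = get_json(f"{API}/orgs/{org}/repos?per_page=100&page={page}")
        if not batch:
            return repos
        repos.extend(batch)
        page += 1


def sum_language_bytes(per_repo: list) -> Counter:
    """Add up per-repo {language: bytes} dicts into org-wide totals."""
    totals = Counter()
    for langs in per_repo:
        totals.update(langs)  # Counter.update adds counts rather than replacing them
    return totals


def org_language_bytes(org: str) -> Counter:
    """Bytes of code per language across an org's public, non-fork repos."""
    per_repo = [get_json(r["languages_url"]) for r in org_repos(org) if not r["fork"]]
    return sum_language_bytes(per_repo)
```

Usage: `org_language_bytes("facebook").most_common(5)` gives the five largest languages by bytes. Converting bytes to lines would need yet another assumption (an average line length), much like the lines-per-file assumption in the accepted answer.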