Even today, a not-insignificant number of projects/tools/etc. are distributed as source, and you have to build them locally ... sometimes taking hours.
Other tools (eg Apache) are distributed both as binaries and as source, and you can run the "out of the box" edition or 'tweak' it a little by compiling it yourself... but again - it takes forever to get up and running (and heaven help you when an update comes out).
What is it about compilation that is so time-consuming?
Fundamentally, it's a [relatively-simple] translation from one language to another - with most compiled languages having been around for a long time
Surely compiling locally shouldn't be so astronomically time-consuming...should it?
Compiler optimizations are time-consuming because they are trying to find solutions to NP-hard problems. Since solving NP-hard problems optimally is intractable, compilers use heuristics, and the more optimization the user asks for, the more time-consuming heuristics are executed. One such problem is register allocation: your CPU has a finite number of registers that have to be shared among an effectively unbounded number of variables, and figuring out a good register allocation is very difficult.
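To make that concrete, here's a toy sketch of the kind of heuristic involved (my own illustration, not code from any real compiler): register allocation is usually modelled as graph colouring, where two variables that are live at the same time 'interfere' and need different registers, and a greedy colouring is fast but not guaranteed to be optimal.

    // Toy greedy graph-colouring register allocator (illustrative only, not
    // from any real compiler). Variables are nodes; an edge means two
    // variables are live at the same time and need different registers.
    #include <cstdio>
    #include <map>
    #include <set>
    #include <string>

    int main() {
        const int kNumRegisters = 3;  // pretend the CPU only has 3 registers

        // Interference graph: a-b, a-c, b-c, c-d
        std::map<std::string, std::set<std::string>> interferes = {
            {"a", {"b", "c"}}, {"b", {"a", "c"}},
            {"c", {"a", "b", "d"}}, {"d", {"c"}}};

        std::map<std::string, int> assigned;  // variable -> register (or -1 = spill)
        for (const auto& [var, neighbours] : interferes) {
            std::set<int> taken;
            for (const auto& n : neighbours)
                if (auto it = assigned.find(n); it != assigned.end())
                    taken.insert(it->second);
            int reg = -1;  // -1 means "spill to memory"
            for (int r = 0; r < kNumRegisters; ++r)
                if (!taken.count(r)) { reg = r; break; }
            assigned[var] = reg;
            if (reg < 0) std::printf("%s -> spill\n", var.c_str());
            else         std::printf("%s -> r%d\n", var.c_str(), reg);
        }
    }

Real allocators (Chaitin-Briggs and friends) do much more work than this greedy pass, which is exactly where the compile time goes.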
It also gets more difficult because modern compilers optimize ever-larger contexts. Rust, C++, and other languages with functional features rely heavily on function inlining, so a function you write that is five lines long may internally be expanded by the compiler to hundreds of lines.
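As a toy illustration (my example, not anything from a real codebase): the five-line function below leans on the standard library, and once the compiler starts inlining it has to pull in and optimize the bodies of everything it calls.

    // Five lines of source, but after inlining the optimizer is looking at the
    // expanded bodies of std::count_if, the lambda, and the vector iterators.
    #include <algorithm>
    #include <vector>

    int count_even(const std::vector<int>& xs) {
        return static_cast<int>(
            std::count_if(xs.begin(), xs.end(),
                          [](int x) { return x % 2 == 0; }));
    }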
AOT compilers that are slow are slow because they spend a lot of time optimizing code that isn't particularly important (but they have no way to know that).
A good way to see this is to take a large-ish Java program that does the same thing as a large-ish C++ or Rust program and compare the compile times. The Java program will compile much faster, but runtime performance will be roughly comparable post-warmup (unless the program is very sensitive to memory locality, as Java doesn't currently have value types). The reason is that the Java -> bytecode compilation is very simple and involves almost no optimization work; the rest of the compilation is done at runtime, but it's all profile-guided, so only the parts of the app that actually benefit from optimization get it.
C++ has the same problem with its template instantiations.
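A tiny made-up example of what that looks like: every distinct type a template is used with forces the compiler to generate, type-check, and optimize a fresh copy of the code.

    // sum<T> is parsed once, but the compiler generates, type-checks, and
    // optimizes a separate copy of it (and of std::vector<T>) for every
    // distinct T used below.
    #include <cstdint>
    #include <string>
    #include <vector>

    template <typename T>
    T sum(const std::vector<T>& xs) {
        T total{};
        for (const auto& x : xs) total += x;
        return total;
    }

    int main() {
        sum(std::vector<int>{1, 2, 3});
        sum(std::vector<double>{1.5, 2.5});
        sum(std::vector<std::int64_t>{10, 20});
        sum(std::vector<std::string>{"a", "b"});  // yet another instantiation
    }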
Another aspect is how much more compilers optimize now. Compared to the "early" days, modern compilers can often beat human programmers at writing optimized assembly in a reasonable time and can do far more complex optimizations. On top of that, you can couple this with link-time and post-link optimizations like LTO and BOLT, and you have even longer build times.
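One hedged example of the kind of transformation modern optimizers look for: at -O2, compilers such as Clang (and often GCC) can recognize the simple accumulation loop below and replace it with a closed-form expression or a vectorized loop. Analyses like that have to run over every function in the build, and the cost adds up.

    // At -O2, mainstream optimizers can often turn this loop into a closed-form
    // expression (roughly n * (n - 1) / 2) or a vectorized loop instead of
    // executing n iterations.
    #include <cstdint>
    #include <cstdio>

    std::uint64_t triangular(std::uint64_t n) {
        std::uint64_t total = 0;
        for (std::uint64_t i = 0; i < n; ++i) total += i;
        return total;
    }

    int main() {
        std::printf("%llu\n",
                    static_cast<unsigned long long>(triangular(1000)));  // 499500
    }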
I think another factor is the increasingly dominant static linking model (Rust/Go/Swift). If I compile my program with the Swift stdlib statically linked, an incremental build takes around 15 seconds, while with a dynamically linked stdlib it takes around 6 seconds. Sure, it is just one benchmark, but statically linking nearly everything certainly has an impact.
You could have the most efficient possible compiler and it would still have to read N source files and write X binary objects back to disk. Those will always be bottlenecks that are sometimes extremely visible in wall-clock time, even on the fastest SSDs, once you also account for OS paging and caching and background processing like anti-virus tools and other security audits.
One of the reasons I stopped working on my own compiler was how slow it became, so I might be able to contribute to answering your question.
Initially, when the compiler was simpler, it was actually much faster. I was able to do some meaningful proof-of-concept demos with it, like compiling a small microkernel and compiling most of its own source code. Of course, the natural next step was to make it able to cross-compile itself and run in the browser, and that's where it became terribly slow. Fixing that required more code to optimize it, and the new code that was added to make it faster in the long term made it much slower in the short term.
To start with, if you think of a simple piece of code like this:
if ( 1 ) { putc('a'); }
This is only a 23-byte program, so why should it be slow to compile? Well, the first stage of parsing this program involves tokenization. In this short program, I count 16 different 'tokens' (including the whitespace). If you want even the simplest data structure to describe one of your 'tokens', one that contains just a single pointer to an offset in the program, then you will need to consume 16 pointers just for the tokens. On a 64-bit machine, you'll have 8-byte pointers, and 16 * 8 = 128 bytes, just for the pointers into the byte array of the program! And we haven't even started talking about the memory overhead of all the other things you'll need to record about these tokens in your token objects.

So, now we already have a memory overhead that is more than 5 times as big as the program, but we also have to build the parse tree, control flow graphs, linker objects, etc., and you also have to pull in a mess of header files, bloated libraries, and so on. If you're wasteful with memory in the compiler, you can easily run out of memory compiling a few megabytes of source code. Being more intelligent with memory management requires copying memory around a lot, which also adds to the latency.
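To put rough numbers on that (illustrative figures for a typical 64-bit build, not measurements from my compiler):

    // Even a bare-bones token costs far more than the source text it points at.
    // On a typical 64-bit build this struct is 24 bytes (pointer + length +
    // kind, padded), so the 16 tokens above already need ~384 bytes, before the
    // parse tree, symbol tables, or any IR exist at all.
    #include <cstddef>
    #include <cstdio>

    enum class TokenKind { Keyword, Punct, Number, Identifier, CharLiteral, Whitespace };

    struct Token {
        const char* start;   // pointer into the source buffer (8 bytes)
        std::size_t length;  // how many bytes the token spans (8 bytes)
        TokenKind kind;      // 4 bytes, padded to 8 for alignment
    };

    int main() {
        std::printf("sizeof(Token) = %zu bytes\n", sizeof(Token));
        std::printf("16 tokens     = %zu bytes\n", 16 * sizeof(Token));
    }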
So, now you need to think about optimizing your memory use and doing 'smarter' things that trade memory usage for CPU. Plus, you're likely to start doing a lot of malloc/free and new/delete on the heap, which is much slower than simple stack allocation (and can drop into the kernel when the allocator has to request or return pages). By the time you implement all this 'optimization', your compiler has become an incredibly complicated and bloated system that requires even more code to optimize all the opportunities for improvement.
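One standard way around that per-object free/delete cost (a generic technique, not necessarily what I did in my compiler) is arena, or bump, allocation: carve tokens and AST nodes out of big blocks and free the whole arena in one go when the translation unit is finished.

    // Minimal bump/arena allocator sketch. Objects are never freed one by one;
    // the whole arena is released at once, trading peak memory for speed.
    #include <cstddef>
    #include <cstdlib>
    #include <new>
    #include <vector>

    class Arena {
     public:
        ~Arena() { for (void* b : blocks_) std::free(b); }

        // Assumes size <= kBlockSize; a real allocator would handle big objects.
        void* allocate(std::size_t size, std::size_t align) {
            std::size_t offset = (used_ + align - 1) & ~(align - 1);
            if (blocks_.empty() || offset + size > kBlockSize) {
                blocks_.push_back(std::malloc(kBlockSize));
                offset = 0;
            }
            used_ = offset + size;
            return static_cast<char*>(blocks_.back()) + offset;
        }

     private:
        static constexpr std::size_t kBlockSize = 1 << 20;  // 1 MiB blocks
        std::vector<void*> blocks_;
        std::size_t used_ = 0;
    };

    struct AstNode { int kind; AstNode* left; AstNode* right; };

    int main() {
        Arena arena;
        // Placement-new a node into the arena; no per-node delete is ever needed.
        auto* n = new (arena.allocate(sizeof(AstNode), alignof(AstNode)))
            AstNode{0, nullptr, nullptr};
        (void)n;
    }   // The Arena destructor frees every block in one pass.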
But sometimes compiling locally is advantageous. E.g. it's possible for a compiler to take advantage of special CPU instructions that are present on your CPU but cannot be guaranteed to be present everywhere, resulting in a faster binary. In some special but relatively rare cases (e.g. numerical computing / simulations) these speed-ups can be significant.
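A hedged example of that point: the loop below is the kind of code GCC and Clang can usually auto-vectorize, and building with -march=native (a real flag on both compilers) lets them use whatever SIMD the local CPU happens to have, at the cost of a binary that may not run on older machines.

    // Built with something like `g++ -O3 -march=native`, this loop is the kind
    // of code compilers can usually vectorize with whatever SIMD the local CPU
    // offers (SSE, AVX2, AVX-512, NEON, ...). The resulting binary may then die
    // with an illegal-instruction fault on an older machine.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int dot(const std::vector<int>& a, const std::vector<int>& b) {
        int total = 0;
        std::size_t n = a.size() < b.size() ? a.size() : b.size();
        for (std::size_t i = 0; i < n; ++i) total += a[i] * b[i];
        return total;
    }

    int main() {
        std::vector<int> a(1024, 3), b(1024, 4);
        std::printf("%d\n", dot(a, b));  // 12288
    }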
But there's heaps of room for improvement. E.g. most compilations have steps that are embarrassingly parallel, so make has an option "-j" that lets you use multiple CPU cores - this significantly speeds up the compilation of numpy. The trouble is that it also increases memory consumption quite a bit, so make can't change its default to use all cores, as that would OOM a lot of existing builds.
From what I have seen, the vast majority (like > 90%) of automated build systems don't take advantage of multi-core compiles, which is why they are dreadfully slow by default. Python's ecosystem is guilty of this, which is why building matplotlib, numpy, scipy, etc. from source is very, very slow.
If you don't need the compiler, you can write machine code in hexadecimal directly, like the old days - that's really "coding" a program. But as a practical matter, we do need it - we want more organized methods of coding with more static checks. And then it's a question of, well, how much? Can we automate even more of it?
And we don't know those answers, but we keep pushing the boundary a little farther as hardware progresses. We also have a gradual creeping up of source code size as more and more features work their way down the stack towards being shared, reused code, so compilers always find ways to be slower. Use an older compiler on newer hardware and it'll feel fast.
Some languages (I suppose language pairs from source to target) compile dramatically faster than others.