How do CPUs handle bad transistors?
I've read that current CPUs have more than a hundred million transistors per square millimeter. I can't imagine that every single one of those works perfectly and will stay working perfectly for the entire lifetime of the CPU.
How do we design CPUs that don't die or stop working properly when one out of a hundred million transistors fails?
That's a great question with a rather interesting answer.
The difference between CPU models may actually come not from different designs, but from manufacturing defects.
Intel, for example, might run a production line only for "tier 1" processors (e.g. the i7), but during manufacturing some of the transistors, for whatever reason, fail to function. During quality testing they can identify which blocks contain the failing transistors and permanently disable them (typically by blowing on-die fuses), so the chip only ever uses the good parts. You then end up with a lower-tier processor (e.g. an i5 or i3). A toy sketch of this binning idea follows the link below.
Lots of really good information here:
https://www.google.com/amp/s/www.techspot.com/amp/article/18...
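To make the binning idea concrete, here's a small Python sketch. The tier names, core counts, and cache cutoffs are invented for illustration; they are not Intel's actual criteria or mechanism.

    # Toy model of binning: every die comes off the same production
    # line; test results decide which product it is sold as. Tier
    # names and cutoffs here are made up for illustration.
    def bin_die(good_cores: int, good_cache_mb: int) -> str:
        if good_cores >= 8 and good_cache_mb >= 16:
            return "i7-class"   # fully working die
        if good_cores >= 6 and good_cache_mb >= 12:
            return "i5-class"   # a defective core or two fused off
        if good_cores >= 4 and good_cache_mb >= 8:
            return "i3-class"
        return "scrap"          # too many defects to sell at all

    print(bin_die(good_cores=8, good_cache_mb=16))  # -> i7-class
    print(bin_die(good_cores=6, good_cache_mb=16))  # -> i5-class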
Modern designs sometimes duplicate functional units so that the final chip can be configured into a working part even if a few of the units don't turn out right. Otherwise, getting all of those transistors working is the big challenge of chip manufacturing. Yes, they all have to work. Yields on new processes are often very low, with only a fraction of the devices made actually working.
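To see why a little duplication helps yield so much, here's a toy calculation. It assumes each functional unit independently contains a defect with some small probability, which is a simplifying assumption (real defects cluster), but it captures the effect:

    # Toy yield model: each unit independently contains a defect
    # with probability p.
    from math import comb

    def yield_no_spares(n_units: int, p: float) -> float:
        # Die works only if every unit is defect-free.
        return (1 - p) ** n_units

    def yield_with_spares(n_units: int, n_spares: int, p: float) -> float:
        # Die works if at most n_spares of the units are defective.
        total = n_units + n_spares
        return sum(comb(total, k) * p**k * (1 - p) ** (total - k)
                   for k in range(n_spares + 1))

    # 100 units, each with a 1% chance of containing a defect:
    print(yield_no_spares(100, 0.01))       # ~0.37 -- most dies fail
    print(yield_with_spares(100, 2, 0.01))  # ~0.92 -- two spares rescue most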
Cache has extra capacity so that bad sections can be mapped out during the testing process. If there's a fault in a core, the entire core is disabled. A fault in the uncore will probably cause the whole chip to be scrapped.
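A rough sketch of what "mapping out" bad cache sections could look like: rows that fail at test time get permanently redirected to spare rows. Real chips record this in on-die fuses; the data structures below are purely illustrative, not any vendor's actual repair mechanism.

    # Sketch of cache row repair: bad rows found during testing are
    # redirected to spare rows via a "fuse map" built once at test time.
    class RepairableCache:
        def __init__(self, n_rows, n_spares, bad_rows):
            if len(bad_rows) > n_spares:
                raise ValueError("too many bad rows -- block unusable")
            spares = iter(range(n_rows, n_rows + n_spares))
            self.remap = {row: next(spares) for row in sorted(bad_rows)}

        def physical_row(self, logical_row):
            # Healthy rows map to themselves; bad rows go to spares.
            return self.remap.get(logical_row, logical_row)

    cache = RepairableCache(n_rows=1024, n_spares=4, bad_rows={17, 400})
    print(cache.physical_row(17))  # 1024 -- redirected to a spare
    print(cache.physical_row(18))  # 18   -- used directly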
Transistors are actually very reliable, especially on a silicon wafer where they're protected from the elements. Unless there's a voltage spike somehow, they're unlikely to go bad.
That's why it's called an IC: an integrated circuit.
They either don't fail at all or they break down outright.
The keyword is "mercurial core", isn't it?