I don't know how to find and fix things like excessive page faults, L1/L2 cache misses, branch mispredictions, context switches, etc. In other words, what you might call "mechanical sympathy."
For those with these skills, how did you learn? How would you recommend someone develop this skillset today?
As with all things, practice is an essential part of improving!
Then there's learning from real achievements, such as the fast inverse square root or the 55GB/s FizzBuzz example: https://codegolf.stackexchange.com/questions/215216/high-thr...
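To give a flavor of what studying one of these looks like, here is a rough Python sketch of the fast inverse square root bit trick. It is only an illustration, not the original code: the well-known version is C operating directly on 32-bit floats, and Python is used here purely to show the bit manipulation (it will not be fast). The name `fast_inv_sqrt` and the use of `struct` to reinterpret the bits are my own choices; the magic constant is the one from the widely circulated Quake III Arena source.

```python
import struct

# Magic constant from the widely circulated Quake III Arena source.
MAGIC = 0x5F3759DF


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive value representable as float32."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    # The bit pattern behaves roughly like a scaled log2(x): shifting right
    # halves it, and subtracting from MAGIC negates it with a tuned offset.
    i = (MAGIC - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer back as a float32: a rough first guess.
    (y,) = struct.unpack("<f", struct.pack("<I", i))
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

Calling `fast_inv_sqrt(4.0)` should land within a fraction of a percent of 0.5. The useful exercise is working out why the shift and subtract produce a good first guess, since that requires understanding how floats are laid out in memory.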