Other performance work tends to have dependencies on a database, networked services, and the like, so it isn't as bite-sized and shareable.
My early descent into programming was a constant battle with performance, getting a 1.79 MHz Atari 8-bit to do interesting things in 1/30 of a second. Maybe retrocomputing, writing software for vintage machines or for new recreations that work like them, would be a good challenge.
Real-world performance work on modern hardware is often a lot duller: find the inner loops, parallelize what you can, organize the data to be cache friendly, then make the loop fast at the instruction level. You don't learn the same lessons from that, because the majority of the codebase is "dark", running too infrequently to show up as a bottleneck.
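To make the "organize the data to be cache friendly" step concrete, here's a minimal sketch in C. The ParticleAoS/ParticlesSoA types and their fields are made up for illustration; the point is that the same summation loop over a struct-of-arrays (SoA) layout streams through dense arrays, so every byte pulled into the cache is actually used, whereas the array-of-structs (AoS) loop drags along fields it never reads.

    /* A hypothetical hot loop over two layouts of the same data:
     * AoS loads 36 bytes per element but uses only 12 of them;
     * SoA loads exactly the bytes the loop needs. */
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000

    /* AoS: position, velocity, and metadata interleaved per element. */
    typedef struct {
        float x, y, z;     /* position: the only fields the loop reads */
        float vx, vy, vz;  /* velocity: loaded into cache but unused   */
        int id, flags, pad;/* metadata: also loaded but unused         */
    } ParticleAoS;

    /* SoA: each field gets its own dense array. */
    typedef struct {
        float *x, *y, *z;
        float *vx, *vy, *vz;
    } ParticlesSoA;

    float sum_aos(const ParticleAoS *p, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += p[i].x + p[i].y + p[i].z; /* 12 of 36 bytes used */
        return s;
    }

    float sum_soa(const ParticlesSoA *p, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += p->x[i] + p->y[i] + p->z[i]; /* every byte used */
        return s;
    }

    int main(void) {
        ParticleAoS *aos = calloc(N, sizeof *aos);
        ParticlesSoA soa = {
            calloc(N, sizeof(float)), calloc(N, sizeof(float)),
            calloc(N, sizeof(float)), calloc(N, sizeof(float)),
            calloc(N, sizeof(float)), calloc(N, sizeof(float)),
        };
        for (size_t i = 0; i < N; i++) {
            aos[i].x = soa.x[i] = (float)i;
            aos[i].y = soa.y[i] = 1.0f;
            aos[i].z = soa.z[i] = 2.0f;
        }
        printf("aos: %f\nsoa: %f\n", sum_aos(aos, N), sum_soa(&soa, N));
        return 0;
    }

The SoA version also hands the compiler contiguous float arrays, which is what auto-vectorization wants, so this one layout change often covers both the cache step and part of the instruction-level step.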
(A supercomputer is a device that turns your compute-bound job into an I/O-bound one; a mainframe turns I/O-bound jobs into compute-bound ones.)