Thank you in advance
Have you looked into ML compilation?
https://github.com/merrymercy/awesome-tensor-compilers
IMO there is low-hanging fruit in the gap between high-performance ML compilers/runtimes and the projects people actually use. If you practice porting the projects you use to these compilers and runtimes, that could give you a massive performance edge.