I am pretty sure it is just OpenBLAS compiled with special options, plus PyTorch built against it, but who knows.
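One way to check that guess rather than take it on faith: PyTorch can dump its build configuration through `torch.__config__.show()`, and the BLAS backend usually shows up in there. A rough sketch follows. The `blas_hints` helper and its keyword list are my own, not part of PyTorch, and the exact wording of the dump varies between builds, so treat the filter as a starting point.

```python
import importlib.util


def blas_hints(config_text):
    """Return the lines of a build-config dump that mention a BLAS-like backend.

    Hypothetical helper: the keyword list is a guess at what is worth looking for.
    """
    keywords = ("blas", "mkl", "lapack")
    return [
        line.strip()
        for line in config_text.splitlines()
        if any(k in line.lower() for k in keywords)
    ]


# Only query PyTorch if it is actually installed in this environment.
if importlib.util.find_spec("torch") is not None:
    import torch

    for line in blas_hints(torch.__config__.show()):
        print(line)
else:
    print("PyTorch is not installed in this environment")
```

If the output mentions OpenBLAS, that supports the "OpenBLAS plus PyTorch" theory; it will not tell you which compile options the vendor used.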
There don't seem to be many affordable options. I see the Altra has ASIMD/NEON but not SVE.
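If you get shell access to a candidate box, you can confirm the ASIMD/SVE situation yourself. On Linux, `/proc/cpuinfo` on aarch64 has a `Features` line listing flags such as `asimd` and `sve`. A minimal sketch, assuming a Linux system; the `simd_features` function is just an illustrative helper:

```python
from pathlib import Path


def simd_features(cpuinfo_text):
    """Report whether asimd (NEON), sve and sve2 appear in a cpuinfo dump."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # aarch64 kernels label the line "Features"; x86 kernels use "flags".
        if key.strip().lower() in ("features", "flags"):
            flags.update(value.split())
    return {name: name in flags for name in ("asimd", "sve", "sve2")}


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print(simd_features(cpuinfo.read_text()))
else:
    print("No /proc/cpuinfo here (not Linux?)")
```

On a chip with ASIMD but no SVE, you would expect `asimd` to come back `True` and both SVE entries `False`.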
If I have a budget of ~3000, is one of their systems probably the best choice? There are several standard form-factor motherboards and even combo kits available at normal outlets.
I'm not _MARRIED_ to ARM, but ~60 W idle and a 200 W TDP when it's busy can be cooled quietly, and performance looks decent.
Personally, I'd use a high-end Apple M3 box or an expensive x64 rig.