Ideally, what I would like is some sort of progression of problems that benefit from applying SIMD. From reading case studies online, it seems like there are lots of little tricks, almost as if there's a different way of thinking about types, values, and the operations on them that you have to know.
Learn how to write algorithms using vectorization only (without ifs and for loops).
Look into OpenCL / CUDA programming models.
Then it will be much easier to learn native SIMD programming for specific ISAs.
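To make the "without ifs" part concrete, here is a minimal sketch in plain C of the mask-and-blend idiom that shows up in most SIMD code. The function names (max_branchy, max_masked) are made up for illustration. The for loop is still there only because C has no whole-array operations; the point is that every iteration is independent and contains no branch, which is roughly what one SIMD lane does.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: one decision per element. */
void max_branchy(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i]) {
            out[i] = a[i];
        } else {
            out[i] = b[i];
        }
    }
}

/* Mask version: the comparison becomes data instead of control flow. */
void max_masked(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* The comparison yields 0 or 1; negating it as unsigned gives
           all-zero bits or all-one bits (0xFFFFFFFF). */
        uint32_t mask = -(uint32_t)(a[i] > b[i]);
        /* Blend: take a where the mask is set, b where it is not. */
        out[i] = (a[i] & mask) | (b[i] & ~mask);
    }
}
```

SIMD comparison instructions typically produce exactly this kind of all-ones / all-zeros mask per lane, so once you are used to writing selections this way the step to real intrinsics is mostly a matter of learning names.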
---
https://en.wikipedia.org/wiki/Array_programming
https://en.wikipedia.org/wiki/Category:Array_programming_lan...
then compare before / after using the profiler
learn about the intrinsics as you work on it
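As a rough sketch of what a "before / after" pair might look like, here is a scalar loop next to a version using x86 SSE intrinsics. The function names (add_scalar, add_simd) are hypothetical, and the example assumes an x86 target; on anything else the #if guard drops back to the plain loop. Which one is actually faster depends on your compiler and flags (the compiler may auto-vectorize the scalar loop), so that is what you would check in the profiler.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Before: one float per iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* After: four floats per iteration where SSE is available. */
void add_simd(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* lane-wise add, then store */
    }
#endif
    /* Scalar tail for the leftover elements (or everything, without SSE). */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Keeping the scalar version around is handy: it doubles as a correctness check for the intrinsics version and as the baseline for the profiler comparison.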