The explanations also differ dramatically from one another.
I feel like it's another infamously difficult-to-explain topic, like "monads".
I am desperately waiting for a 3Blue1Brown video on transformers to hopefully resolve this ambiguity.
I am looking for a visual intuition: something that answers the common questions and ambiguities that arise, and explains the history and why we do things this way.
The best approach I have found so far is Serrano.Academy: https://www.youtube.com/watch?v=UPtG_38Oq8o&pp=ygUUdHJhbnNmb3JtZXIgbmV0d29ya3M%3D. They try to visualize things in 2 dimensions with examples and show the linear transformations.
Karpathy had a unique way of conceptualizing it as a directed graph with a "communication phase", which confused me further.
For such a historic topic, I think we need a better explanation!