Are MPI implementations simply a wrapper on shared memory for intra-node communication?
HPC noob here: Skimming through the Open MPI implementation, it seems to use shared memory (SM) for intra-node communication. Isn't one of the points of MPI to use message queues for such communication, which is what differentiates it from shared-memory-based approaches?
The culture of MPI does not value simplicity of implementation but rather optimizing every possible case, so I would expect MPI to have an optimized path when two processes are on the same node.
No.
MPI is an interface, and implementations are free to realize it in a wide range of ways.
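To make the interface-versus-implementation point concrete, here is a minimal sketch of a message-passing API (send/recv) whose transport happens to be a shared memory buffer. This is not Open MPI's code or design: ShmChannel, the one-slot layout, and the flag byte are all made up for illustration, and it assumes a Unix system where an anonymous mmap is shared with forked children. A real implementation would use proper atomics, memory barriers, and queues rather than a polled flag.

```python
import mmap
import os
import struct
import time

LENGTH = struct.Struct("<I")   # payload length, stored at offset 1
DATA_OFFSET = 1 + LENGTH.size  # byte 0 is the "slot is full" flag


class ShmChannel:
    """Hypothetical one-slot mailbox living in a shared buffer."""

    def __init__(self, buf):
        self.buf = buf
        self.capacity = len(buf) - DATA_OFFSET

    def send(self, payload: bytes) -> None:
        if len(payload) > self.capacity:
            raise ValueError("message too large for the slot")
        # Wait until the receiver has drained the previous message.
        while self.buf[0]:
            time.sleep(0)
        # Write length and payload first, then publish by setting the flag.
        LENGTH.pack_into(self.buf, 1, len(payload))
        self.buf[DATA_OFFSET:DATA_OFFSET + len(payload)] = payload
        self.buf[0] = 1

    def recv(self) -> bytes:
        # Poll until the sender has published a message.
        while not self.buf[0]:
            time.sleep(0)
        (length,) = LENGTH.unpack_from(self.buf, 1)
        data = bytes(self.buf[DATA_OFFSET:DATA_OFFSET + length])
        self.buf[0] = 0  # mark the slot empty again
        return data


if __name__ == "__main__":
    # Anonymous mapping: stays shared between parent and forked child on Unix.
    shared = mmap.mmap(-1, 4096)
    chan = ShmChannel(shared)
    pid = os.fork()
    if pid == 0:
        chan.send(b"hello from the other process")
        os._exit(0)
    print(chan.recv().decode())
    os.waitpid(pid, 0)
```

The caller only ever sees send and recv, so the programming model is still message passing: the two processes never touch each other's data directly, even though the bytes travel through memory they both map. Swapping the buffer for a socket would change the transport without changing the calling code, which is roughly the freedom an MPI implementation has.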