Has anybody explored diffusion models as a basis for LLMs?
If you think of the characters as pixels, then you should be able to apply a similar process, right?
It is, as always, not quite that easy, because what is the discrete equivalent of smooth continuous Gaussian noise for an ASCII character? What does a letter like 'z' jitter to/from?
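One common workaround in the discrete-diffusion literature is to replace Gaussian perturbation with a categorical corruption process: at each forward step, every token is independently resampled from a uniform (or otherwise fixed) distribution over the alphabet with some probability. Here is a toy sketch of that forward process (my own illustration of the general idea, not the method of any particular paper; `ALPHABET` and the schedule are arbitrary choices):

```python
import random

# Restricted alphabet purely for illustration.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def corrupt(text: str, beta: float) -> str:
    """Forward noising step: with probability beta, replace each
    character by one drawn uniformly from ALPHABET; otherwise keep it.
    This is the discrete analogue of 'adding noise' to a pixel."""
    return "".join(
        random.choice(ALPHABET) if random.random() < beta else c
        for c in text
    )

random.seed(0)
x = "the quick brown fox"
# A crude corruption schedule: by beta=1.0 the string is pure noise,
# analogous to the final fully-Gaussian step of image diffusion.
for t, beta in enumerate([0.1, 0.3, 0.6, 1.0]):
    print(t, corrupt(x, beta))
```

The reverse model is then trained to undo these categorical corruptions step by step, which sidesteps the question of what 'z' jitters to: it doesn't jitter continuously at all, it flips.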
But here is a bibliography of some relevant papers on diffusion models for discrete data which you might find useful: https://gwern.net/doc/ai/nn/diffusion/discrete/index