HACKER Q&A
📣 max_

What is the attention/transformers model in plain English?


What's the simplest intuitive explanation for the attention/transformer model without using AI jargon?


  👤 benob Accepted Answer ✓
The model tries to predict what the next word will be given a context, typically a question to be answered or a text to be continued. It does so by looking at how each pair of words from the context contributes to this prediction. That's the first layer; in a second layer, it looks at each pair of outputs from the first layer, effectively assessing the contribution of a pair of pairs. It does so recursively, layer after layer, gathering the contribution of pairs of pairs of pairs... Of course, by reading lots and lots of text and trying to predict what comes next, the model is "trained" to only look at relevant pairs, and to use the context efficiently.
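The "look at every pair of words" idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the answer's exact mechanism: it omits the learned query/key/value projections and the multiple heads a real transformer layer uses, and just scores each pair of word vectors directly.

```python
import numpy as np

def self_attention(X):
    """Single simplified self-attention step.

    X: (n_words, d) matrix, one vector per context word.
    Returns an (n_words, d) matrix where each output row is a
    weighted mix of all the word vectors.
    """
    d = X.shape[1]
    # Every pair of words (i, j) gets a compatibility score --
    # the "contribution of each pair" the answer describes.
    scores = X @ X.T / np.sqrt(d)
    # Softmax turns each row of scores into weights summing to 1,
    # so irrelevant pairs get weight close to 0.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

# Toy example: 3 "words" as random 4-dimensional vectors.
X = np.random.default_rng(0).normal(size=(3, 4))
out = self_attention(X)
print(out.shape)  # (3, 4)
```

Stacking this step gives the "pairs of pairs" effect: the second layer scores pairs of first-layer outputs, each of which already mixes information from every word.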

👤 jstx1
Without the technical language, whatever explanation you get won't be what these models actually do. You need the technical terms to describe it precisely.

👤 abudabi123
Is there a glossary like the one for Unicode?