HACKER Q&A
📣 s_r_n

Transformer alternatives that could have emergent properties when scaled


I am trying to identify model architecture candidates that could, like transformers, have "emergent" properties when they are scaled (see https://arxiv.org/abs/2206.07682).

Some contenders I already know about are:

* S4 / Structured State Spaces (https://arxiv.org/pdf/2111.00396.pdf)

* Hyena (https://hazyresearch.stanford.edu/blog/2023-03-07-hyena)
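
(Both of these replace attention's quadratic token mixing with operators built on long convolutions, which in S4's case come from a linear state-space recurrence. For context, here is a minimal sketch of that recurrence; the diagonal state matrix and all names are my simplifications, not code from the paper, which parameterizes A much more carefully and evaluates the equivalent long convolution with FFTs.)

    import numpy as np

    def ssm_scan(u, A_diag, B, C):
        """x_t = A x_{t-1} + B u_t,  y_t = C x_t, with diagonal A."""
        x = np.zeros_like(A_diag)        # hidden state
        ys = []
        for u_t in u:                    # O(L * d) sequential scan
            x = A_diag * x + B * u_t     # diagonal A => elementwise update
            ys.append(C @ x)             # project state to a scalar output
        return np.array(ys)

    # Toy usage: stable random dynamics on a short sequence.
    rng = np.random.default_rng(0)
    d = 16
    A_diag = np.exp(-rng.uniform(0.01, 1.0, size=d))  # |A_ii| < 1 => stable
    B, C = rng.normal(size=d), rng.normal(size=d)
    y = ssm_scan(rng.normal(size=64), A_diag, B, C)   # shape (64,)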

Thanks for your help.


  👤 PaulHoule
Are those really "transformer alternatives", or just different ways to implement transformers by replacing some of their parts with alternate parts?
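
To make that concrete, here is a toy sketch (my own illustrative names, not code from either project): the residual block skeleton stays the same, and only the token-mixing operator is swapped, e.g. attention versus a Hyena-flavored long convolution.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention_mixer(x):
        # Simplified single-head self-attention (Q = K = V = x).
        return softmax(x @ x.T / np.sqrt(x.shape[-1])) @ x

    def long_conv_mixer(x, kernel):
        # Hyena-flavored swap-in: a causal convolution whose kernel is as
        # long as the sequence; real Hyena generates the kernel implicitly
        # with a small network and adds elementwise gating.
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for s in range(t + 1):
                out[t] += kernel[t - s] * x[s]
        return out

    def block(x, mixer, w1, w2):
        # Everything around the mixer is the usual block structure:
        # residual token mixing, then a residual MLP over channels.
        x = x + mixer(x)
        return x + np.maximum(x @ w1, 0.0) @ w2

    L, D, H = 32, 8, 16
    rng = np.random.default_rng(1)
    x = rng.normal(size=(L, D))
    w1, w2 = rng.normal(size=(D, H)), rng.normal(size=(H, D))
    y_attn = block(x, attention_mixer, w1, w2)
    y_conv = block(x, lambda z: long_conv_mixer(z, rng.normal(size=L)), w1, w2)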

👤 adf343fgfdg
Most probably, in time we will find that most models capable of free-form speech and deep reasoning have properties that, in biological entities, we strongly associate with "conscious thinking".

There's a chance that even relatively weak systems with strong dynamic adaptability (capable of hundreds to thousands of decisions, which would be a pedestrian sort of reasoning capability, but a reasoning capability nonetheless) exhibit some emergence related to "human consciousness". Advanced HFT systems could, then, have been capable of "conscious thinking" back in 2010, and we might never have noticed (or maybe some people did notice and continued that line of research in strong stealth mode, advancing the emergent capabilities of the models we later saw pop up out of the blue). But I'm losing the point...

We, as humans, tend to correlate "consciousness" with a pseudo-continuous state of mind or mood, which most probably isn't real. More probably, most humans have an exceedingly capable "auto-pilot" mode in which they operate most of the time while dealing with repetitive tasks, and consciousness is just a sporadic event, emerging (properly speaking, and correlating it with similar states in AI) only when some context requires it.

Thinking of models as systems capable of running the "human software", which is our programming as biological entities (human language and its associated cognitive maps), they obviously could replicate the human traits intrinsically embedded in that software.

The pseudo-continuous state of "consciousness" is one of those traits, and some flaws "detected" as non-self-conscious thinking and/or hallucinations are probably just what you can see daily in every human on the planet: ask anyone out of the blue "what are you doing?" and marvel at how most people take a second or three to actually engage "consciousness" and explain why they're there and what they're doing.

That latency in switching from autonomous behavior to conscious thinking is what would be the next frontier for AI: how many complex tasks - even talking - could somehow be delegated to "non-conscious" but intelligent cognitive processing in models?