HACKER Q&A
📣 olivierestsage

What do people mean when they say transformers aren't fully understood?


I am not a computer scientist (let alone an ML specialist), just one of many interested observers trying to understand how LLMs work. I am familiar with many of the sources that are recommended for coming to terms with the transformer model, such as "The Illustrated Transformer."

However, in online discussions about transformers -- including on this site -- I frequently see references made to some kind of mysterious, unknown element; basically, the idea that "we don't REALLY know how transformers produce such good results, at least in detail."

My question is: is that claim truly accurate? I am struggling to understand how a technology that is not fully understood, at least by specialists in the ML field, could be harnessed to such great effect. Thank you!


  👤 rvz Accepted Answer ✓
> I frequently see references made to some kind of mysterious, unknown element; basically, the idea that "we don't REALLY know how transformers produce such good results, at least in detail." My question is: is that claim truly accurate?

It is for the same reason that almost no one trusts LLMs to fully pilot a plane with zero humans on board.

These systems earn little trust for such use cases because they fundamentally lack understanding and cannot explain themselves transparently. Even when LLMs attempt to, they can convince the untrained eye with their wrong answers. Not even AI researchers can explain why some of these systems get confused by bad or adversarial input.
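
To make the adversarial-input point concrete, here is a minimal PyTorch-style sketch of the classic fast gradient sign method; the function name, the epsilon value, and the model/loss placeholders are illustrative, not anyone's specific implementation:

    import torch

    def fgsm_attack(model, x, y, loss_fn, epsilon=0.01):
        # Fast Gradient Sign Method: perturb the input in the
        # direction that most increases the loss.
        x = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x), y)
        loss.backward()
        # A tiny step along the gradient sign is often enough to
        # flip the prediction, yet looks unchanged to a human.
        return (x + epsilon * x.grad.sign()).detach()

The perturbed input can look identical to the original to a person, but the model's answer changes, and there is no clean internal "reason" anyone can point to for the flip.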

There have been decades of research into trying to 'understand' these systems, but it still falls short of demonstrating any consistent transparency into black-box systems such as LLMs, which leaves them unaccountable.