HACKER Q&A
📣 amichail

Who had the insight that autocomplete can yield intelligent chatbots?


Anyone know?


  👤 coldtea Accepted Answer ✓
Note that, even though it's described that way in "layman" articles, it's not "autocomplete" as in the feature we use in a word processor, chat app, or IDE. For starters, those are often built (and commonly have been built) with totally different algorithms than the ones behind LLMs like GPT.

That said, an LLM does "complete" or "predict" the next word or continuation phrase, so it can be thought of as belonging to the same basic concept of "how to complete what was given". But so does Vim's crude ctrl-N/P, which just finds tokens starting with the same characters it was given.
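A minimal sketch of the difference in Python, with both functions made up for illustration: Vim just string-matches against the buffer, while an LLM scores its whole vocabulary given the whole context.

    def vim_style_complete(prefix, buffer_tokens):
        # ctrl-N/P: return tokens that literally start with the typed prefix
        return [t for t in buffer_tokens if t.startswith(prefix)]

    def llm_style_complete(context, model):
        # an LLM instead assigns a probability to every token in its
        # vocabulary given the whole context, and we take the most likely
        # (`model` is a hypothetical callable: context -> {token: prob})
        probs = model(context)
        return max(probs, key=probs.get)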


👤 wsgeorge
Read the GPT papers in order. It's interesting how they got from there to here, and there's also a lot of related work that hinted at it.

👤 PaulHoule
I was working for a startup roughly 7 years ago that was hoping to hook up neural networks to graph databases and turn clinical notes into database records. We pulled 70,000 abstracts of case reports from PubMed and trained a character-level LSTM to write fake case reports. I was hoping to go from this to a model that could do information extraction tasks, but the founder was interested in messing around with word embeddings, which at the time I thought was a way to guarantee you'd fail before you got started: critical knowledge would always be encoded in out-of-dictionary words, and if you lost that... you lost.
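The core of a character-level model like that boils down to something like this (a minimal PyTorch sketch for illustration, not the code we actually ran; it's trained with cross-entropy to predict each next character):

    import torch
    import torch.nn as nn

    class CharLSTM(nn.Module):
        # predict the next character given the characters so far
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, x, state=None):
            h, state = self.lstm(self.embed(x), state)
            return self.head(h), state  # logits over the next character

    # training: shift the text by one character and minimize cross-entropy,
    # e.g. for a batch of encoded character ids:
    #   logits, _ = model(batch[:, :-1])
    #   loss = nn.functional.cross_entropy(
    #       logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1))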

Not long after, I was working for another startup that was developing CNN models for classification. We were watching fastText and BERT when they came out, and one thing that impressed me was the subword features, which keep the benefits of words but don't fail catastrophically in the out-of-dictionary case.
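The subword trick is roughly this (a sketch of the fastText-style idea, with a made-up `ngram_vectors` lookup table): a word's vector is the average of its character n-gram vectors, so an unseen word, say a rare drug name, still gets a usable representation from the n-grams it shares with known words.

    import numpy as np

    def char_ngrams(word, n_min=3, n_max=6):
        # fastText-style: pad with boundary markers, take all n-grams
        w = "<" + word + ">"
        return [w[i:i + n]
                for n in range(n_min, n_max + 1)
                for i in range(len(w) - n + 1)]

    def word_vector(word, ngram_vectors, dim=100):
        # average the vectors of the n-grams seen in training; an
        # out-of-dictionary word still hits many known n-grams
        grams = [g for g in char_ngrams(word) if g in ngram_vectors]
        if not grams:
            return np.zeros(dim)
        return np.mean([ngram_vectors[g] for g in grams], axis=0)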

Now the transformer models are pretrained by masking out random words from the input, which is better than "predict the next word" because it is bidirectional; the unidirectional approach causes all sorts of problems. (In text generation, for instance, the LSTM starts out in a very small part of the state space and decides whether the patient had cancer not by starting from a latent state, the way the patient or the person writing the case report did, but based on whichever letters happened to be randomly sampled first.)
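The masking objective itself is simple to state (a simplified sketch; BERT-style pretraining also sometimes substitutes random tokens rather than always using the mask token):

    import torch

    def mask_for_pretraining(input_ids, mask_token_id, p=0.15):
        # hide a random 15% of tokens; the model must reconstruct them
        # using context from BOTH sides, unlike next-word prediction
        labels = input_ids.clone()
        masked = torch.rand(input_ids.shape) < p
        labels[~masked] = -100  # ignored by the cross-entropy loss
        corrupted = input_ids.clone()
        corrupted[masked] = mask_token_id
        return corrupted, labels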

Somebody might fine-tune that kind of model to do information extraction or classification; I've got a pretty good picture of how the pretraining of a transformer works and how to fine-tune it for that.
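With the Hugging Face transformers library, that fine-tuning step looks roughly like this (the model name, example text, and label are placeholders):

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)  # fresh classification head

    batch = tok(["Patient presented with ..."],
                return_tensors="pt", truncation=True)
    out = model(**batch, labels=torch.tensor([0]))
    out.loss.backward()  # gradients reach the head and the pretrained body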

The chatbots are trained with reinforcement learning from human feedback (RLHF) to play a character: to be helpful, and to stay agreeable even when they refuse to write Hitler speeches. That's another stage of training past "autocomplete".
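One concrete ingredient of that stage is the reward model, trained on pairs of completions that humans compared (a sketch of the standard pairwise objective, nothing vendor-specific):

    import torch
    import torch.nn.functional as F

    def reward_model_loss(reward_chosen, reward_rejected):
        # Bradley-Terry pairwise objective: push the reward model to
        # score the completion the human preferred above the rejected
        # one; the chat model is then tuned (e.g. with PPO) to maximize
        # this learned reward
        return -F.logsigmoid(reward_chosen - reward_rejected).mean()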