1. Start with GPT-3, which predicts the next word in some text and is trained on pretty much all the text on the internet.
2. Take thousands of prompts, generate several responses for each of them, and have human reviewers rank the responses for each prompt from best to worst.
3. The GPT model needs a massive amount of training data, so it would be cost-prohibitive to get enough human feedback to fine-tune it manually. Instead, you train another model, called the reward model, to predict how the humans would rate each response (see the sketch below).
4. Train the GPT model against the reward model millions of times.
5. Feed a small percentage of the output from that training process back to the human reviewers to continue training the reward model, prioritized by heuristics (like reward model uncertainty) that predict how helpful the human feedback will be for improving the reward model.
6. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model.
https://openai.com/blog/chatgpt/
https://openai.com/blog/deep-reinforcement-learning-from-hum...
https://openai.com/blog/instruction-following/
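To make steps 2-4 concrete, here's a minimal sketch of how a reward model can be trained on human rankings, using a pairwise loss that pushes the score of the preferred response above the rejected one. Everything here (the toy RewardModel, the placeholder embeddings) is a hypothetical stand-in to illustrate the idea; the real setup is described in the links above.

```python
# Toy sketch of reward-model training on human preference pairs.
# RewardModel and the placeholder data are hypothetical stand-ins,
# not OpenAI's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a pooled response embedding to a scalar reward."""
    def __init__(self, dim=768):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding):
        return self.score(response_embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Each example pairs a response humans ranked higher ("chosen")
# with one they ranked lower ("rejected") for the same prompt.
chosen = torch.randn(32, 768)    # placeholder embeddings
rejected = torch.randn(32, 768)  # placeholder embeddings

# Pairwise ranking loss: reward(chosen) should exceed reward(rejected).
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```

The GPT policy is then fine-tuned (with PPO, per the InstructGPT paper) to maximize the reward this model assigns, which is the "train against the reward model millions of times" part.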
In the first few paragraphs they show that GPT-3 was just as limited as all the other language models that came before it, and explain why they built InstructGPT.
>> Here is the summary (as summarized by ChatGPT): They present their approach to the problem, which involves using a recurrent neural network to encode both the instruction and the environment, and then using a reinforcement learning algorithm to learn how to execute the instruction. They demonstrate that their method can learn to solve a variety of instruction-following tasks.
Some snippets from the blog:
>> InstructGPT is then further fine-tuned on a dataset labeled by human labelers. The labelers comprise a team of about 40 contractors whom we hired through Upwork and ScaleAI.
>> We hired about 40 contractors, guided by their performance on a screening test meant to judge how well they could identify and respond to sensitive prompts, and their agreement rate with researchers on a labeling task with detailed instructions. We kept our team of contractors small because it's easier to have high-bandwidth communication with a smaller set of contractors who are doing the task full-time.
Not eli5 but close enough.
* A fixed (but large) vocabulary of sub-word-like tokens as inputs.
* An attention mechanism for learning the correlations between words in a fixed sequence window.
* Implementing this attention mechanism as matrix multiplies rather than some other complex math, which lets it be parallelized and run fast on GPUs (sketched in code below).
* Enough layers of these transformer blocks to reach a huge parameter count: ~175B parameters in the case of ChatGPT.
* Feed the model a lot of data - in this case, pretty much the entire internet as text.
* Self-supervised learning: we take text from the internet, hide words from the model, and force the network to predict them (for GPT this means predicting the next word; BERT-style models mask out words in the middle). Turns out this works extremely well. We don't use the traditional supervised learning inputs -> (prediction, label) paradigm with hand-labeled data that was the standard 10 years ago.
* RLHF (Reinforcement learning from human feedback). Take generated text from GPT-3 (the underlying generative model) and ask humans to rate different completions. Retrain the model from those ratings.
* A massive compute infrastructure that is able to train this model in a reasonable amount of time, allowing for iteration on hyperparameters. For example, what's the optimal attention head-size? How many encoder layers are good? What should the sequence length be? What should the embedding dimension be? etc. In OpenAI's case, they used thousands of GPUs and thousands of CPUs provided by Microsoft/Azure.
In summary, relatively simple model, parallelizable on GPUs, trained on a lot of data.
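To make the attention-as-matrix-multiplies point above concrete, here's a minimal NumPy sketch of scaled dot-product self-attention. It's illustrative only: it omits the causal mask, the multiple heads, and everything else a real GPT layer has.

```python
# Self-attention expressed as plain matrix multiplies (single head,
# no causal mask): a sketch of the mechanism, not OpenAI's code.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # three matmuls
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # how strongly each token attends to each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                          # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 64))                    # 8 tokens, 64-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(64, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (8, 16)
```

Every operation here is a matmul or an elementwise op, which is exactly the kind of work GPUs chew through in parallel.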
Does it know how much output it has already provided when deciding on the next token? How does it decide to start a new sentence or paragraph, or if it's 'satisfied' with its current response?
Yannic explains these models pretty well.
https://dugas.ch/artificial_curiosity/GPT_architecture.html
I hoped it would be simple enough for anyone who knows a bit of math / algebra to understand. But note that it doesn't go into the difference between GPT-3 and ChatGPT (which adds a RL training objective, among other things).
It frequently told me that it doesn't learn from my input, and I had the impression that the unique selling point of AI was being able to modify its own code in response to input.
I thought maybe they use Google Translate, but remembered that Russians have trained it not to understand "russophobic" sentences.
-- Mitä tarkoittaa ryssänvastainen ("What does ryssänvastainen mean"), explain in English.
-- Ryssänvastainen means "anti-Russian" or "anti-Russian sentiment." It refers to an attitude or behavior that is hostile or opposed to Russia or Russian interests.
Although I hope these high-quality answers don't all just come from ChatGPT one day.
Except I've had no problem getting it to do just that previously...
I'm curious about its training data too, as I've managed to find a few things it knows nothing about (despite their having Wikipedia pages and multiple dedicated websites, and having been around for 10+ years).
ChatGPT and GPT-3 are both large language models trained by OpenAI, but they have some important differences. GPT-3 is a more general-purpose language model, which means it is trained on a broader range of data and can generate a wider range of responses. It is also much larger than ChatGPT, with 175 billion parameters compared to ChatGPT's 2.6 billion parameters. This makes GPT-3 more powerful and capable of generating more realistic and diverse text, but also makes it more expensive and resource-intensive to use.
In case you are curious, the above information was written entirely by ChatGPT when asking it about itself.
Me: Which number is bigger, 873593742593454623 or 873593742593554623?
ChatGPT: The second number, 873593742593554623, is larger than the first number, 873593742593454623. This is because the second number has a larger digit in the ones place (5 vs. 4), so it is greater overall. To compare two numbers, you can start by looking at the digits in the ones place and then move to the tens place, the hundreds place, and so on until you find a difference. The number with the larger digit in the place you are comparing is the larger number overall.
So here it correctly repeats my numbers to the last digit, and does find the second one to be bigger: the two numbers are of equal length and differ in only a single digit, which is indeed 5 in the second versus 4 in the first. So far, it looks like ChatGPT really did an analysis rather than regurgitating known phrases. However, that differing digit is not in the ones place, and the algorithm it gives is patently wrong: it doesn't mention that longer numbers are bigger than shorter numbers, or that numbers of equal length should be compared starting at the highest place, not the lowest. According to ChatGPT's procedure, 53 should be bigger than 123, so let's ask it:
Me: Apply your reasoning for finding the bigger of two numbers to the compare 123 and 53. (Grammar mistake in the original)
ChatGPT: To compare the numbers 123 and 53, we can start by looking at the digits in the ones place. In this case, both numbers have the same digit in the ones place (3 and 3), so we need to move on to the tens place. In the tens place, the number 123 has a larger digit (2 vs. 0), so 123 is the larger number overall.
To compare two numbers, you can start by looking at the digits in the ones place and then move to the tens place, the hundreds place, and so on until you find a difference. The number with the larger digit in the place you are comparing is the larger number overall. This method can be applied to any two numbers, regardless of how many digits they have.
That's somewhere between cringeworthy and deluded. It somehow "knows" that 123 is in fact bigger than 53, but also that that fact is incompatible with the procedure it gave. It still sticks to the procedure, though, and instead of adapting the theory to the facts, it hallucinates convenient but wrong facts, claiming that 53 has a 0 in the tens place (it has a 5) as compared to the 2 in 123, so that, according to the theory, 123 > 53.
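For contrast, the correct procedure is short enough to write out. A sketch, assuming non-negative integers given as digit strings with no leading zeros:

```python
# Correct comparison: longer numbers win; equal-length numbers are
# compared from the highest place (leftmost digit) down, not the ones place.
def bigger(a: str, b: str) -> str:
    if len(a) != len(b):
        return a if len(a) > len(b) else b      # longer number is bigger
    for da, db in zip(a, b):                    # left to right = highest place first
        if da != db:
            return a if da > db else b
    return a                                    # the numbers are equal

print(bigger("123", "53"))                                 # 123
print(bigger("873593742593454623", "873593742593554623"))  # the ...554623 one
```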
Yeah I guess we should totally replace people with ChatGPT wherever possible.
Sure, I'd be happy to help. An artificial neural network (ANN) is a type of machine learning algorithm that is designed to mimic the way the human brain processes information. At its core, an ANN is composed of a large number of interconnected processing nodes, called neurons, which are organized into multiple layers. These layers are interconnected in a way that allows them to pass information back and forth, with each layer building on the information provided by the previous layer to make a decision or generate a response.
Transformers are a type of ANN that was introduced in 2017 by a team of researchers at Google. They are called "transformers" because they use a technique called "self-attention" to transform the input data before passing it through the network. This allows them to better capture the underlying relationships in the data, which makes them particularly well-suited for tasks like natural language processing and machine translation.
ChatGPT is a type of transformer that has been specifically designed for use in chatbots and other conversational AI systems. It is an extension of the GPT-3 model, which was developed by the same team of researchers at OpenAI. GPT-3 is a large language model that has been trained on a massive amount of text data and is able to generate human-like responses to a wide range of questions. By building on the capabilities of GPT-3, ChatGPT is able to provide even more sophisticated and natural-sounding responses in a conversational setting.
As for the size of the network, GPT-3 is a very large model, with 175 billion parameters. This means that it has a massive amount of interconnected neurons, which allows it to capture a wide range of knowledge and generate responses to a wide variety of questions. Of course, having a large network also requires a lot of computational power, which is why GPT-3 and other large language models are typically run on powerful servers or specialized hardware.
The way it works is by first pre-training the model on a large corpus of text data, which could include things like social media conversations, movie scripts, books, etc. This allows the model to learn the general structure and patterns of language.
Then, when given an input in the form of a question or statement, the model uses its pre-trained knowledge to generate a response. It does this by predicting the next word in the sentence, and then continuing to predict subsequent words until it reaches the end of the response.
Overall, the goal of ChatGPT is to enable chatbots to have more natural, human-like conversations with users.
(I asked ChatGPT to tell me how it works)
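The loop it describes in the last two paragraphs (predict a word, append it, repeat until done) is easy to demonstrate. A minimal sketch using the Hugging Face transformers library, with GPT-2 as a small public stand-in since ChatGPT itself isn't downloadable, and greedy argmax in place of ChatGPT's actual sampling:

```python
# Autoregressive generation: predict the next token, append it, repeat
# until an end-of-sequence token or a length cap. GPT-2 stands in for
# ChatGPT here; the real system's decoding is more sophisticated.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The sky is blue because", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits          # a score for every vocabulary token
        next_id = logits[0, -1].argmax()          # greedily take the most likely one
        if next_id.item() == tokenizer.eos_token_id:
            break                                 # the model ended its response
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```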
Now, when someone asks you about the history of France (or why the sky is blue), you could simply pluck out of your library the most common strings of word that seem to follow the words that were in your question!
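In the spirit of that analogy, a toy "library of common continuations" fits in a few lines. This is of course a drastic oversimplification (real LLMs learn far richer statistics over far more context), but the flavor is similar:

```python
# Toy continuation library: count which word most often follows each
# pair of words, then extend a prompt by always taking the most common
# continuation. A caricature of what LLMs do, for flavor only.
from collections import Counter, defaultdict

corpus = ("the sky is blue because the sky scatters blue light "
          "more than it scatters red light").split()

follows = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    follows[(a, b)][c] += 1                     # tally continuations of each word pair

words = ["the", "sky"]
for _ in range(5):
    options = follows.get((words[-2], words[-1]))
    if not options:
        break
    words.append(options.most_common(1)[0][0])  # most common next word

print(" ".join(words))                          # "the sky is blue because the sky"
```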
It's like a kid in the 80's who thinks the answer to an essay question is to copy it from an encyclopedia, only the "encyclopedia" is very large and contains multiple sources.
So, the big take away needs to be that there is absolutely no understanding, no cognizance of any kind, no language comprehension going on. The answers look good because they contain all the same words as the most popular answers people have already written which the system scanned.
So ChatGPT turns out to be great for parsing and summarizing documents, if that's something you need. But since it doesn't know fact from fiction, can't apply logic or math, and can't perform reasoning or analysis, it's not good for finding out facts or discerning truth.
Another great failing of LLM software is that the user being spoken to is generic. The answers are not tailored to you; they're the same for everyone. But a human teacher does their job by being exactly the opposite: someone finely tuned to the needs and understanding of their audience. A good journalist or writer does the same.