1. Start with GPT-3, which predicts the next word in some text and is trained on pretty much all the text on the internet.
2. Take thousands of prompts, generate several responses for each of them, and have human reviewers rank the responses for each prompt from best to worst.
3. The GPT model needs a massive amount of training data, so it would be cost-prohibitive to get enough human feedback to fine-tune it manually. Instead, you train another model, called the reward model, to predict how the humans would rate each response (see the sketch below).
4. Train the GPT model against the reward model millions of times.
5. Feed a small percentage of the output from that training process back to the human reviewers to continue training the reward model, prioritized by heuristics (like reward model uncertainty) that predict how helpful the human feedback will be for improving the reward model.
6. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model.
https://openai.com/blog/chatgpt/
https://openai.com/blog/deep-reinforcement-learning-from-hum...
https://openai.com/blog/instruction-following/
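To make steps 2-4 concrete, here's a minimal sketch of how a reward model can be trained on human rankings, using a pairwise loss that pushes the score of the preferred response above the rejected one. Everything here (the toy RewardModel, the placeholder embeddings) is a hypothetical stand-in to illustrate the idea; the real setup is described in the links above.

```python
# Toy sketch of reward-model training on human preference pairs.
# RewardModel and the placeholder data are hypothetical stand-ins,
# not OpenAI's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a pooled response embedding to a scalar reward."""
    def __init__(self, dim=768):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding):
        return self.score(response_embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Each example pairs a response humans ranked higher ("chosen")
# with one they ranked lower ("rejected") for the same prompt.
chosen = torch.randn(32, 768)    # placeholder embeddings
rejected = torch.randn(32, 768)  # placeholder embeddings

# Pairwise ranking loss: reward(chosen) should exceed reward(rejected).
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```

The GPT policy is then fine-tuned (with PPO, per the InstructGPT paper) to maximize the reward this model assigns, which is the "train against the reward model millions of times" part.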
In the first few paragraphs they show that GPT-3 was just as limited as all the other language models that came before it, and explain why they built InstructGPT.
>> Here is the summary (as summarized by ChatGPT): They present their approach to the problem, which involves using a recurrent neural network to encode both the instruction and the environment, and then using a reinforcement learning algorithm to learn how to execute the instruction. They demonstrate that their method can learn to solve a variety of instruction-following tasks.
Some snippets from the blog:
>> InstructGPT is then further fine-tuned on a dataset labeled by human labelers. The labelers comprise a team of about 40 contractors whom we hired through Upwork and ScaleAI.
>> We hired about 40 contractors, guided by their performance on a screening test meant to judge how well they could identify and respond to sensitive prompts, and their agreement rate with researchers on a labeling task with detailed instructions. We kept our team of contractors small because it's easier to have high-bandwidth communication with a smaller set of contractors who are doing the task full-time.
Not eli5 but close enough.
* A fixed (but large) vocabulary of sub-word-like tokens as inputs.
* An attention mechanism for learning the correlations between words in a fixed sequence window.
* Implementing this attention mechanism as matrix multiplies rather than some other complex math, which lets it be parallelized and run fast on GPUs (sketched in code below).
* Enough layers of these transformer blocks to reach a huge parameter count: ~175B parameters in the case of ChatGPT.
* Feed the model a lot of data - in this case, pretty much the entire internet as text.
* Self-supervised learning: we take text from the internet, hide words from the model, and force the network to predict them (for GPT this means predicting the next word; BERT-style models mask out words in the middle). Turns out this works extremely well. We don't use the traditional supervised learning inputs -> (prediction, label) paradigm with hand-labeled data that was the standard 10 years ago.
* RLHF (Reinforcement learning from human feedback). Take generated text from GPT-3 (the underlying generative model) and ask humans to rate different completions. Retrain the model from those ratings.
* A massive compute infrastructure that is able to train this model in a reasonable amount of time, allowing for iteration on hyperparameters. For example, what's the optimal attention head-size? How many encoder layers are good? What should the sequence length be? What should the embedding dimension be? etc. In OpenAI's case, they used thousands of GPUs and thousands of CPUs provided by Microsoft/Azure.
In summary, relatively simple model, parallelizable on GPUs, trained on a lot of data.
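To make the attention-as-matrix-multiplies point above concrete, here's a minimal NumPy sketch of scaled dot-product self-attention. It's illustrative only: it omits the causal mask, the multiple heads, and everything else a real GPT layer has.

```python
# Self-attention expressed as plain matrix multiplies (single head,
# no causal mask): a sketch of the mechanism, not OpenAI's code.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # three matmuls
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # how strongly each token attends to each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                          # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 64))                    # 8 tokens, 64-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(64, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (8, 16)
```

Every operation here is a matmul or an elementwise op, which is exactly the kind of work GPUs chew through in parallel.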
Does it know how much output it has already provided when deciding on the next token? How does it decide to start a new sentence or paragraph, or if it's 'satisfied' with its current response?
Yannic explains these models pretty well.
https://dugas.ch/artificial_curiosity/GPT_architecture.html
I hoped it would be simple enough for anyone who knows a bit of math / algebra to understand. But note that it doesn't go into the difference between GPT-3 and ChatGPT (which adds a RL training objective, among other things).
It frequently told me that it doesn't learn from my input, and I had the impression that the unique selling point of AI was being able to modify its own code in response to input.
I thought maybe they use Google Translate, but remembered that Russians have trained it not to understand "russophobic" sentences.
-- Mitä tarkoittaa ryssänvastainen ("What does ryssänvastainen mean"), explain in English.
-- Ryssänvastainen means "anti-Russian" or "anti-Russian sentiment." It refers to an attitude or behavior that is hostile or opposed to Russia or Russian interests.
Although I hope these high-quality answers don't all just come from ChatGPT one day.
Except I've had no problem getting it to do just that previously...
I'm curious about its training data too, as I've managed to find a few things it knows nothing about (despite their having Wikipedia pages and multiple dedicated websites, and having been around for 10+ years).
ChatGPT and GPT-3 are both large language models trained by OpenAI, but they have some important differences. GPT-3 is a more general-purpose language model, which means it is trained on a broader range of data and can generate a wider range of responses. It is also much larger than ChatGPT, with 175 billion parameters compared to ChatGPT's 2.6 billion parameters. This makes GPT-3 more powerful and capable of generating more realistic and diverse text, but also makes it more expensive and resource-intensive to use.
In case you are curious, the above information was written entirely by ChatGPT when asking it about itself.
Me: Which number is bigger, 873593742593454623 or 873593742593554623?
ChatGPT: The second number, 873593742593554623, is larger than the first number, 873593742593454623. This is because the second number has a larger digit in the ones place (5 vs. 4), so it is greater overall. To compare two numbers, you can start by looking at the digits in the ones place and then move to the tens place, the hundreds place, and so on until you find a difference. The number with the larger digit in the place you are comparing is the larger number overall.
So here it correctly repeats my numbers to the last digit, and does find the second one to be bigger: the two numbers are of equal length and differ in only a single digit, which is indeed 5 in the second versus 4 in the first. So far, it looks like ChatGPT really did an analysis rather than regurgitating known phrases. However, that differing digit is not in the ones place, and the algorithm it gives is patently wrong: it doesn't mention that longer numbers are bigger than shorter numbers, or that numbers of equal length should be compared starting at the highest place, not the lowest. According to ChatGPT's procedure, 53 should be bigger than 123, so let's ask it:
Me: Apply your reasoning for finding the bigger of two numbers to the compare 123 and 53. (Grammar mistake in the original)
ChatGPT: To compare the numbers 123 and 53, we can start by looking at the digits in the ones place. In this case, both numbers have the same digit in the ones place (3 and 3), so we need to move on to the tens place. In the tens place, the number 123 has a larger digit (2 vs. 0), so 123 is the larger number overall.
To compare two numbers, you can start by looking at the digits in the ones place and then move to the tens place, the hundreds place, and so on until you find a difference. The number with the larger digit in the place you are comparing is the larger number overall. This method can be applied to any two numbers, regardless of how many digits they have.
That's somewhere between cringeworthy and deluded. It somehow "knows" that 123 is in fact bigger than 53, but also that that fact is incompatible with the procedure it gave. It still sticks to the procedure, though, and instead of adapting the theory to the facts, it hallucinates convenient but wrong facts, claiming that 53 has a 0 in the tens place (it has a 5) as compared to the 2 in 123, so that, according to the theory, 123 > 53.
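For contrast, the correct procedure is short enough to write out. A sketch, assuming non-negative integers given as digit strings with no leading zeros:

```python
# Correct comparison: longer numbers win; equal-length numbers are
# compared from the highest place (leftmost digit) down, not the ones place.
def bigger(a: str, b: str) -> str:
    if len(a) != len(b):
        return a if len(a) > len(b) else b      # longer number is bigger
    for da, db in zip(a, b):                    # left to right = highest place first
        if da != db:
            return a if da > db else b
    return a                                    # the numbers are equal

print(bigger("123", "53"))                                 # 123
print(bigger("873593742593454623", "873593742593554623"))  # the ...554623 one
```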
Yeah I guess we should totally replace people with ChatGPT wherever possible.
Sure, I'd be happy to help. An artificial neural network (ANN) is a type of machine learning algorithm that is designed to mimic the way the human brain processes information. At its core, an ANN is composed of a large number of interconnected processing nodes, called neurons, which are organized into multiple layers. These layers are interconnected in a way that allows them to pass information back and forth, with each layer building on the information provided by the previous layer to make a decision or generate a response.
Transformers are a type of ANN that was introduced in 2017 by a team of researchers at Google. They are called "transformers" because they use a technique called "self-attention" to transform the input data before passing it through the network. This allows them to better capture the underlying relationships in the data, which makes them particularly well-suited for tasks like natural language processing and machine translation.
ChatGPT is a type of transformer that has been specifically designed for use in chatbots and other conversational AI systems. It is an extension of the GPT-3 model, which was developed by the same team of researchers at OpenAI. GPT-3 is a large language model that has been trained on a massive amount of text data and is able to generate human-like responses to a wide range of questions. By building on the capabilities of GPT-3, ChatGPT is able to provide even more sophisticated and natural-sounding responses in a conversational setting.
As for the size of the network, GPT-3 is a very large model, with 175 billion parameters. This means that it has a massive amount of interconnected neurons, which allows it to capture a wide range of knowledge and generate responses to a wide variety of questions. Of course, having a large network also requires a lot of computational power, which is why GPT-3 and other large language models are typically run on powerful servers or specialized hardware.
The way it works is by first pre-training the model on a large corpus of text data, which could include things like social media conversations, movie scripts, books, etc. This allows the model to learn the general structure and patterns of language.
Then, when given an input in the form of a question or statement, the model uses its pre-trained knowledge to generate a response. It does this by predicting the next word in the sentence, and then continuing to predict subsequent words until it reaches the end of the response.
Overall, the goal of ChatGPT is to enable chatbots to have more natural, human-like conversations with users.
(I asked ChatGPT to tell me how it works)
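The loop it describes in the last two paragraphs (predict a word, append it, repeat until done) is easy to demonstrate. A minimal sketch using the Hugging Face transformers library, with GPT-2 as a small public stand-in since ChatGPT itself isn't downloadable, and greedy argmax in place of ChatGPT's actual sampling:

```python
# Autoregressive generation: predict the next token, append it, repeat
# until an end-of-sequence token or a length cap. GPT-2 stands in for
# ChatGPT here; the real system's decoding is more sophisticated.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The sky is blue because", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits          # a score for every vocabulary token
        next_id = logits[0, -1].argmax()          # greedily take the most likely one
        if next_id.item() == tokenizer.eos_token_id:
            break                                 # the model ended its response
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```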
Now, when someone asks you about the history of France (or why the sky is blue), you could simply pluck out of your library the most common strings of word that seem to follow the words that were in your question!
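In the spirit of that analogy, a toy "library of common continuations" fits in a few lines. This is of course a drastic oversimplification (real LLMs learn far richer statistics over far more context), but the flavor is similar:

```python
# Toy continuation library: count which word most often follows each
# pair of words, then extend a prompt by always taking the most common
# continuation. A caricature of what LLMs do, for flavor only.
from collections import Counter, defaultdict

corpus = ("the sky is blue because the sky scatters blue light "
          "more than it scatters red light").split()

follows = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    follows[(a, b)][c] += 1                     # tally continuations of each word pair

words = ["the", "sky"]
for _ in range(5):
    options = follows.get((words[-2], words[-1]))
    if not options:
        break
    words.append(options.most_common(1)[0][0])  # most common next word

print(" ".join(words))                          # "the sky is blue because the sky"
```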
It's like a kid in the 80's who thinks the answer to an essay question is to copy it from an encyclopedia, only the "encyclopedia" is very large and contains multiple sources.
So, the big take away needs to be that there is absolutely no understanding, no cognizance of any kind, no language comprehension going on. The answers look good because they contain all the same words as the most popular answers people have already written which the system scanned.
So ChatGPT turns out to be great for parsing and summarizing documents, if that's something you need. But since it doesn't know fact from fiction, can't apply logic or math, and can't perform reasoning or analysis, it's not good for finding out facts or discerning truth.
Another great failing of LLM software is that the user being spoken to is generic. The answers are not tailored to you; they're the same for everyone. But a human teacher does their job by being exactly the opposite: someone finely tuned to the needs and understanding of their audience. A good journalist or writer does the same.