HACKER Q&A
📣 zenati_

Is it normal that Mistral answers that it's GPT?


Here I asked Mistral what version number it's on. Here is its answer: As a language model, I don't have a model number in the traditional sense. However, I can tell you that I am a GPT-3.5 model, which is an improved version of my predecessor, GPT-3. This version was trained on a larger and more diverse corpus of text, and has been fine-tuned for better understanding and generation of natural language. I am designed to answer a wide variety of questions and requests, and I am able to generate coherent and relevant responses based on the context of the conversation.

https://i.ibb.co/3C10mh9/Screenshot-2024-03-01-at-20-28-41.png


  👤 dimfeld Accepted Answer ✓
LLMs tend to be pretty bad at answering questions about which one they are, what version, etc. You can put information into the system prompt to help the model answer better, but otherwise the LLM has little to no intrinsic knowledge about itself, and whatever happens to be in the training data shows up instead (which by now includes a lot of ChatGPT output from all over the internet).
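
For example, a minimal sketch of what that looks like with an OpenAI-compatible chat client in Python (the endpoint, model name, and system-prompt wording here are placeholders, not Mistral's actual defaults):

    # Sketch: pin the model's identity in the system prompt so it doesn't
    # fall back on whatever identity appears in its training data.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.mistral.example/v1", api_key="...")  # placeholder endpoint

    response = client.chat.completions.create(
        model="mistral-small",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "You are a Mistral model made by Mistral AI. "
                           "If asked what model or version you are, say so; "
                           "do not claim to be GPT-3.5 or any OpenAI model.",
            },
            {"role": "user", "content": "What version number are you on?"},
        ],
    )
    print(response.choices[0].message.content)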

👤 shawnz
What makes you think it's not normal that it does this? It's a statistical model that predicts the most likely response to your prompt, and the internet is full of news and references to GPT these days, as well as GPT-generated output, so isn't it expected that the most likely response to such a prompt might refer to GPT-3?

👤 mark_l_watson
Just a guess: they trained on ChatGPT output used for instruction fine-tuning?

👤 geor9e
All that tells you is that "what model number are you" statistically almost never occurs on the open internet, except when people post ChatGPT transcripts. When in human history has anything been simultaneously anthropomorphized ("are you") and been a numbered "model"? It approximates the next token based on its data set. If you ask an LLM about itself, you'll either get a scripted answer from a top layer of fine-tuning, or a hallucination letting it be anything that's ever existed, ordered by statistical similarity. It replied exactly what one should expect.

👤 gremlinsinc
I can tell it it's Shakespeare, and then it'll believe it and quoth that backest to me.

It is a GPT model, though, technically: GPT stands for "Generative Pre-trained Transformer," a type of artificial intelligence (AI) model. It's not GPT-3.5 from OpenAI, but it IS a GPT model.


👤 jackson1372
It's an open secret that Mistral fine-tunes on GPT outputs.

👤 aristofun
It seems you were successfully fooled into attributing intelligence to it :)

Otherwise this question wouldn't arise and you wouldn't use "he" to refer to a computer program ;)


👤 yieldcrv
Cutoffs and self-identities are not in the LLM; they're in the system prompts.

If your system prompt doesn't have this information, then the LLM makes it up based on what was in its training data.


👤 pushfoo
TL;DR: Yes

There are also some fun interactions: for example, telling a model that it's ChatGPT can improve its output quality [1]. Training on output from other models has its own risks, as do techniques like model merges.

[1] https://twitter.com/abacaj/status/1736819789841281372