HACKER Q&A
📣 takinola

Why is GPT4 better than the other major LLMs?


Having used GPT4, PaLM, and Claude, I find it quite clear that GPT4 is an order of magnitude better than these other LLMs. Google, Anthropic, and everyone else are investing oodles of resources and the best talent to catch up, so why are they (seemingly) not able to? The general sentiment is that LLMs have no lasting moat, yet OpenAI seems to (for now) have one in the form of a better product. The big question is why?


  👤 makin Accepted Answer ✓
There is a theory that GPT-4's secret sauce is a combination of commissioned high-quality training data (which they don't really hide) and an undisclosed Mixture-of-Experts implementation [1].

No proof, of course; as the other comments have said, they won't share. The "Open" in OpenAI is still ironic. I've also yet to see another company pull off MoE successfully, which you would expect to see if the theory were true.
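For illustration, the core MoE idea (this is a sketch only, with made-up dimensions and expert counts, not anything known about GPT-4's internals) is that a small gating network routes each token to a few expert feed-forward blocks, so parameter count can grow without a matching growth in per-token compute:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy Mixture-of-Experts layer; all sizes here are invented for the example.
    class ToyMoELayer(nn.Module):
        def __init__(self, d_model=512, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            # Each "expert" is an independent feed-forward block.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            )
            # The gate scores every expert for every token.
            self.gate = nn.Linear(d_model, n_experts)

        def forward(self, x):  # x: (tokens, d_model)
            scores = self.gate(x)                           # (tokens, n_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)  # top-k experts per token
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e  # tokens routed to expert e in slot k
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

If something like this were behind GPT-4, it would buy more effective capacity per unit of inference cost, which fits the "secret sauce" framing.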

[1] https://hkaift.com/the-next-llms-development-mixture-of-expe...


👤 truetraveller
I believe it's because it is heavily massaged, on two fronts:

1) By engineers, who employ tricks on the input, and perhaps the output; the input especially. When you type into ChatGPT, that input may get parsed using non-LLM techniques and/or heuristics, primarily to get the semantics right. Non-LLM techniques can actually be pretty powerful, and the synergy of non-LLM and LLM is incredible. (A toy sketch of what this could look like follows the list.)

2) By human farms, who essentially "upvote" and/or add corrections to ChatGPT results, which get fed back into the system. See OpenAI's Kenyan workers (on paper, I believe, they were hired for "moderation", but nothing stops them from also upvoting/correcting).
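As a toy illustration of point 1 (entirely invented; nobody outside OpenAI knows whether or how they massage inputs, and the heuristics below are made up), non-LLM preprocessing could be as simple as normalizing and rewriting prompts before they ever reach the model:

    import re

    # Invented pre-LLM input "massaging"; these heuristics exist purely to
    # illustrate the idea, not anything OpenAI has disclosed.
    def preprocess_prompt(raw: str) -> str:
        text = raw.strip()
        # Collapse whitespace runs that would waste or confuse tokens.
        text = re.sub(r"\s+", " ", text)
        # Example heuristic: expand a bare "what is X" question into a
        # more explicit instruction before it reaches the model.
        m = re.match(r"(?i)^what is (.+?)\??$", text)
        if m:
            text = f"Explain the concept of {m.group(1)} clearly and concisely."
        return text

    print(preprocess_prompt("what   is  mixture of experts?"))
    # -> Explain the concept of mixture of experts clearly and concisely.

Even trivial rules like these, applied at scale, could account for a chunk of the perceived quality gap.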


👤 kwant_kiddo
I think one clear difference is that they are just so focused on shipping compared to the others, and they gain many of the benefits that come with that.

I think it is reflected in both Greg and Sam that they really want to ship, and this has created a positive feedback loop with the team and the talent they have been able to attract and build up.

Another point might be that AI chatbots are a first-mover's market. Even if Grok turned out to be much better, I would still miss some of the UI features that ChatGPT provides, along with my chat history.

Regarding their fast shipping, I think it is also reflected in their tech stack. I suspect from reading their job posts (I might be very wrong here) that they started by coding everything in Python, together with the tooling/ecosystem that goes along with it (FastAPI/Django etc.), plus maybe a bit of C++/CUDA for training. Then, when they needed to scale, they migrated the more performance-critical areas of the codebase from Python to Rust. They also clearly have a monorepo, as mentioned in [1].
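If the Python-to-Rust speculation is right, the migration pattern would plausibly look like a compiled extension behind a pure-Python fallback; fast_tokenizer below is a hypothetical module name, not a real OpenAI package:

    # Invented illustration of the "Python first, Rust for hot paths" idea;
    # "fast_tokenizer" is a hypothetical compiled extension (e.g. built
    # with PyO3), not anything OpenAI has published.
    try:
        from fast_tokenizer import encode  # hypothetical Rust-backed module
    except ImportError:
        def encode(text: str) -> list[int]:
            # Pure-Python fallback, standing in for the pre-migration code.
            return [ord(c) for c in text]

    print(encode("hello"))  # the call site stays the same either way

The appeal is that call sites never change, so the rewrite can happen one hot path at a time.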

If you look through their careers page, the job description for a developer-productivity software engineer [1] mentions "Our current environment relies heavily on Python, Rust, and C++" and "Are a proficient Python programmer, with experience using Rust in production". I also found an earlier posting that mentioned their backend was written in Python: "Proficiency with some backend language (we use Python)" [2].

[1] https://openai.com/careers/software-engineer-developer-produ...

[2] https://openai.com/careers/software-engineer-leverage-engine...


👤 ilaksh
I don't think it's 10 times better. It is better, but I think it comes down to the size of the model and the training data/techniques. OpenAI seems to have invested a lot in reinforcement learning from human feedback, and I think they also have ways to do automated reinforcement. Also, Google and Anthropic seem to be deliberately holding back their strongest models, either because they are too expensive to serve or out of safety worries.
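For context, the published RLHF recipe (InstructGPT) trains a reward model on human preference pairs with a Bradley-Terry-style loss; here is a minimal sketch of that loss, with no claim that GPT-4's training matches it exactly:

    import torch.nn.functional as F

    # Pairwise preference loss from the published RLHF recipe; reward_model
    # is any network that maps a (prompt, response) encoding to a scalar.
    def preference_loss(reward_model, chosen, rejected):
        r_chosen = reward_model(chosen)      # score for the preferred answer
        r_rejected = reward_model(rejected)  # score for the rejected answer
        # Bradley-Terry: maximize P(chosen beats rejected) = sigmoid(r_c - r_r)
        return -F.logsigmoid(r_chosen - r_rejected).mean()

The "automated reinforcement" part would presumably reuse the trained reward model to score new outputs without humans in the loop.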

And note that new versions of Anthropic's models have just been released or could be released within a few months.


👤 DantesKite
If we knew, the other LLMs would be better.

The correct answer is: Nobody outside of OpenAI technical staff currently knows.


👤 f0e4c2f7
I don't think this is the only factor, but I suspect part of it is that GPT-4 had access to better datasets.

After ChatGPT came out, a lot of the places GPT was assumed to be trained on (Reddit, Twitter) started closing their APIs.

This alone represents a pretty significant moat.


👤 phlakaton
Define "order of magnitude better." Faster? Safer? Bigger buffer? More accurate? More comprehensible? Better-crafted text?

From what I've seen on Claude discussion forums, Claude users generally assert that GPT4 requires a lot more manual handling due to its smaller context window, and that it produces the weird, long-winded answers that are now the stuff of memes. Whatever beef they have with Claude's safety features, Claude's input/output peculiarities make it well suited to the kinds of writing they want it to do.

You should consider whether the LLMs are really comparable in the way you want them to be comparable.


👤 iJohnDoe
This is anecdotal.

I tried LM Studio the other day, which is a cool project, by the way.

I downloaded a dozen different LLMs based on recommendations, reviews, size, loss of accuracy, etc. None of them were able to help with a project I'm currently working on; their responses were all grossly wrong or just plain weird. I have the source data for the project, so I know what's correct and what isn't.

GPT-4 is accurate and very helpful with the project. It’s light years ahead on everything I’ve used it for.


👤 og_kalu
I mean, one thing people miss/forget is that no LLM trained at GPT-4's estimated compute budget has failed to match it; PaLM, Claude, etc. are all estimated to be lower.

If Gemini reaches that budget and still fails to at least match GPT-4, then we can start speculating about a secret sauce.
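For a rough sense of what "compute budget" means here, the common approximation is C ≈ 6·N·D training FLOPs for N parameters and D tokens; the figures below are placeholders (Chinchilla's public 70B/1.4T config and a hypothetical 4x-scaled run), not estimates for GPT-4 or any other closed model:

    # Back-of-the-envelope training compute via the common C ~ 6*N*D rule
    # (N = parameters, D = training tokens). Placeholder numbers only.
    def train_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    chinchilla = train_flops(70e9, 1.4e12)  # Chinchilla's public config
    bigger = train_flops(280e9, 5.6e12)     # hypothetical 4x-scaled run
    print(f"{chinchilla:.2e} vs {bigger:.2e} FLOPs "
          f"({bigger / chinchilla:.0f}x the budget)")

By that yardstick, comparing models only makes sense once their budgets are in the same ballpark, which is the point above.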