No proof, of course; as the other comments have said, they won't share. The OpenAI name is still ironic. I've also still not seen another successful attempt at MoE by any other company, which you would expect to see if it were true.
[1] https://hkaift.com/the-next-llms-development-mixture-of-expe...
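For reference, the core MoE idea is small: a gating network scores a set of expert sub-networks per input, and only the top-k experts actually run, which is where the inference-cost savings come from. A minimal NumPy sketch of that routing step; all names and shapes here are illustrative, not anything claimed about GPT-4:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def moe_forward(x, experts, gate_w, k=2):
        """Route input x to the top-k of n experts.
        x: (d,) input; gate_w: (n, d) gating weights;
        experts: list of n callables, each (d,) -> (d,)."""
        scores = softmax(gate_w @ x)               # gating distribution over experts
        top = np.argsort(scores)[-k:]              # indices of the k highest-scoring experts
        weights = scores[top] / scores[top].sum()  # renormalize over the chosen experts
        # Only the selected experts execute; the others are skipped entirely.
        return sum(w * experts[i](x) for w, i in zip(weights, top))

    d, n = 8, 4
    rng = np.random.default_rng(0)
    experts = [lambda x, W=rng.normal(size=(d, d)): W @ x for _ in range(n)]
    y = moe_forward(rng.normal(size=d), experts, rng.normal(size=(n, d)))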
1) By engineers, who employ tricks on the input, and perhaps the output; the input especially. So when you type into ChatGPT, that input gets parsed using non-LLM techniques and/or heuristics, primarily to get the semantics right. Non-LLM techniques can be pretty powerful on their own, but the synergy of non-LLM and LLM is incredible. (A toy sketch of both points follows this list.)
2) By human farms, who essentially "upvote" and/or add corrections to ChatGPT results, which are fed back into the system. See OpenAI's Kenyan workers (on paper, I believe they were hired for "moderation", but nothing stops them from also upvoting/correcting).
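To make the two points concrete, here is a toy sketch of what such a layer might look like. Everything in it, the rules, the function names, the record shape, is hypothetical and made up for illustration; it is not OpenAI's actual pipeline:

    import re

    def preprocess(user_input: str) -> str:
        """Point 1: a hypothetical non-LLM pass over raw input:
        normalize it and tag obvious intents with cheap heuristics."""
        text = re.sub(r"\s+", " ", user_input).strip()  # normalize whitespace
        if re.match(r"^\d+\s*[-+*/]\s*\d+", text):      # crude arithmetic detector
            text = f"[MATH] {text}"                     # hint for downstream handling
        return text

    def record_feedback(prompt, answer, vote, correction=None):
        """Point 2: a hypothetical feedback record, e.g. upvotes and
        human corrections collected for later fine-tuning or reward
        modeling."""
        return {"prompt": prompt, "answer": answer,
                "vote": vote, "correction": correction}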
I think it is reflected in both Greg and Sam that they really want to ship, and this has created a positive feedback loop within the team, shaping both the talent they have been able to attract and the talent they have built up.
Another point might be that AI chatbots are a first-mover market. Even if Grok turned out to be much better, I would still miss some of the UI features that ChatGPT provides, along with my chat history.
Regarding their fast shipping, I think it is also reflected in their tech stack. From reading their job posts, I suspect (I might be very wrong here) that they started out coding everything in Python along with the usual tooling/ecosystem (FastAPI, Django, etc.), with maybe a bit of C++/CUDA for training. Then, when they needed to scale, they migrated from Python to Rust in the more critical areas of the codebase (a sketch of that pattern follows below). They also clearly use a monorepo, as mentioned in [1].
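If that migration happened, the usual pattern is that the Python side keeps its interface while the hot path moves behind a compiled extension (e.g. a Rust crate built with PyO3/maturin). A hedged sketch of the Python side only; fast_tokenizer and its API are invented for illustration:

    # Hypothetical Python/Rust boundary: the Rust crate would expose
    # the same function signature as the pure-Python version.
    try:
        from fast_tokenizer import tokenize  # imagined Rust extension module
    except ImportError:
        def tokenize(text: str) -> list[str]:
            # Pure-Python fallback kept during the migration, so the
            # rest of the backend never has to change.
            return text.split()

    def handle_request(payload: dict) -> dict:
        tokens = tokenize(payload["text"])
        return {"token_count": len(tokens)}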
If you look through their careers page, the job description for a software engineer in developer productivity [1] mentions "Our current environment relies heavily on Python, Rust, and C++" as well as "Are a proficient Python programmer, with experience using Rust in production". I also found an earlier posting that mentioned their backend was written in Python: "Proficiency with some backend language (we use Python)" [2].
[1] https://openai.com/careers/software-engineer-developer-produ...
[2] https://openai.com/careers/software-engineer-leverage-engine...
And note that there are new versions of models from Anthropic that have just been released or could be released within a few months.
The correct answer is: Nobody outside of OpenAI technical staff currently knows.
After ChatGPT came out, a lot of the places GPT was assumed to have been trained on (Reddit, Twitter) started closing their APIs.
This alone represents a pretty significant moat.
From what I've seen on Claude discussion forums, Claude users generally assert that GPT-4 requires a lot more manual handling due to its smaller context window, and produces weird, long-winded answers that are now the stuff of memes. Whatever beef they have with Claude's safety features, Claude's input/output peculiarities make it well suited to the kinds of writing they want it to do.
You should consider whether the LLMs are really comparable in the way you want them to be comparable.
I tried LM Studio the other day, which is a cool project, by the way.
I downloaded a dozen different LLMs based on recommendations, reviews, size, loss of accuracy, and so on. None of them were able to help with a project I'm currently working on; their responses were all grossly wrong or just plain weird. I have the source data for the project, so I know what's correct and what isn't.
GPT-4 is accurate and very helpful with the project. It’s light years ahead on everything I’ve used it for.
If Gemini fails to at least match GPT-4, then we can start speculating about a secret sauce.