HACKER Q&A
📣 surrTurr

What's the best self hosted/local alternative to GPT-4?


Constant outages and the model seemingly getting nerfed[^1] are driving me insane. Which viable alternatives to GPT-4 exist? Preferably self-hosted (I'm okay with paying for it) and with an API that's compatible with the OpenAI API.

[^1]: https://news.ycombinator.com/item?id=36134249


  👤 wokwokwok Accepted Answer ✓
There is literally no alternative.

You’re stuck with OpenAI, and you’re stuck with whatever rules, limitations, or changes they give you.

There are other models, but specifically if you’re actively using gpt-4 and find gpt-3.5 to be below the quality you require…

Too bad. You’re out of luck.

Wait for better open-source models, wait patiently for someone to release a meaningful competitor, or wait for OpenAI to release a better version.

That’s it. Right now, no one else is letting people have access to models equivalent to GPT-4.


👤 jonathan-adly
I don't know the licensing and all that jazz (even if you self-host for your personal use it shouldn't matter). But this paper[0], released a week ago, claims "99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU" (QLoRA).

A quick test of the huggingface demo gives reasonable results[1]. The actual model behind the space is here[2], and should be self-hostable with reasonable effort.

[0] https://arxiv.org/abs/2305.14314
[1] https://huggingface.co/spaces/uwnlp/guanaco-playground-tgi
[2] https://huggingface.co/timdettmers/guanaco-33b-merged
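
For reference, a minimal sketch of what self-hosting that checkpoint could look like with Hugging Face `transformers`. Assumptions: a recent `transformers` with `bitsandbytes` installed (for the 4-bit loading flag) and roughly 20 GB of VRAM for a 33B model at 4 bits:

```python
# Sketch: loading the merged Guanaco-33B checkpoint for local inference.
# load_in_4bit needs the bitsandbytes integration; device_map="auto"
# spreads layers across available GPUs (and CPU, if VRAM runs out).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "timdettmers/guanaco-33b-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Guanaco expects a Human/Assistant dialogue format.
prompt = "### Human: What is QLoRA, in one sentence?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```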


👤 catwalk_moto
I lay out the whole LLM landscape in this article: https://medium.com/@damngoodtech/vital-gpt-and-llm-understan.... Even if you aren't a business it might help.

And this spreadsheet shows a pretty comprehensive list of LLMs: https://anania.ai/chatgpt-alternatives/

Currently the "best" ones seem to be Llama and Dolly. Dolly can be used commercially, and Llama cannot, so it's best for personal use.

I myself have been trying to get [the huggingface chat ui](https://github.com/huggingface/chat-ui) running on my own system, but it's finicky. Right now I'm focused on getting immediate income so I can't spend too much effort on it.

Overall, no open model gets close to the accuracy of GPT-3 or GPT-4 (though Llama does decently). That said, I can definitely imagine open source matching or even exceeding the capabilities of OpenAI's models within three years or so.


👤 TradingPlaces
As people note, you cannot substitute locally for the Azure GPU cloud that GPT-4 runs on. But I believe that will change, and maybe quickly. After years of explosive exponential growth in model size, all of a sudden, small is beautiful.

The precipitating factor is that running large models for research is very expensive, but that pales in comparison to putting these things into production: expenses climb steeply with model size. Everyone is looking for ways to make the models smaller and run at the edge. I will note that PaLM 2 is smaller than PaLM, the first time I can remember something like that happening, and the smallest version of PaLM 2 can run at the edge. Small is beautiful.


👤 weystrom
https://github.com/oobabooga/text-generation-webui/

Works on all platforms, but runs much better on Linux.

Running this in Docker on my 2080 Ti, I can barely fit 13B 4-bit models into 11 GB of VRAM, but it works fine and produces around 10-15 tokens/second most of the time. It also has an API that you can use with something like LangChain.
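
For example, something like this against the API extension (a sketch; the `/api/v1/generate` route and port 5000 are the extension's defaults as I understand them, so adjust to your setup):

```python
# Sketch: calling text-generation-webui's blocking API (start the webui
# with --api). Route and port are assumed defaults, not guaranteed.
import requests

resp = requests.post(
    "http://localhost:5000/api/v1/generate",
    json={
        "prompt": "Write a haiku about VRAM.",
        "max_new_tokens": 60,
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["results"][0]["text"])
```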

Supports multiple ways to run the models: purely with CUDA (I think AMD support is coming too) or on CPU with llama.cpp (it's also possible to offload part of the model to GPU VRAM, but the performance is still nowhere near CUDA).

Don't expect open-source models to perform as well as ChatGPT, though; they're still pretty limited in comparison. A good place to get the models is TheBloke's page: https://huggingface.co/TheBloke. Tom converts popular LLM builds into multiple formats that you can use with textgen, and he's a pillar of the local LLM community.

I'm still learning how to fine-tune/train LoRAs; it's pretty finicky but promising. I'd like to be able to feed personal data into the model and have it reliably answer questions.

In my opinion, these developments are way more exciting than whatever OpenAI is doing. No way I'm pushing my chat logs into some corp datacenter, but running locally and storing checkpoints safely would achieve my end goal of having the model "impersonate" me on the web.


👤 davepeck
There are no viable self-hostable alternatives to GPT-4, or even to GPT-3.5.

The “best” self-hostable model is a moving target. As of this writing it's probably one of Vicuna 13B, Wizard 30B, or maybe Guanaco 65B. I'd like to say that Guanaco is wildly better than Vicuna, what with its 5x larger size, but… that seems very task-dependent.

As anecdata: my experience is that none of these is as good as even GPT-3.5 for summarization, extraction, sentiment analysis, or assistance with writing code. Figuring out how to run them is painful. The speed at which their unquantized variants run on any hardware I have access to is painful. Sorting through licensing is… also painful.

And again: they’re nowhere close to GPT-4.


👤 amilios
How much GPU memory do you have access to? If you can run it, Guanaco-65B is probably as close as you can get in terms of something publicly available. https://github.com/artidoro/qlora. But as other comments mention, it's still noticeably worse in my experience.
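
For rough sizing, weights-only memory is just parameter count times bytes per parameter. A quick back-of-the-envelope sketch (activations and the KV cache add several GB on top):

```python
# Weights-only VRAM estimate; real usage is higher once activations
# and the KV cache are counted.
def weight_vram_gb(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * bits / 8 / 1024**3

for name, params in [("13B", 13), ("33B", 33), ("65B", 65)]:
    print(f"{name}: ~{weight_vram_gb(params, 4):.0f} GB at 4-bit, "
          f"~{weight_vram_gb(params, 16):.0f} GB at fp16")
```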

👤 DebtDeflation
LLM Leaderboard:

https://chat.lmsys.org/?leaderboard

The short answer is that nothing self hosted can come close to GPT-4. The only thing that comes close period is Anthropic's Claude.


👤 deet
In our experimentation, we've found that it really depends on what you're looking for. That is, you really need to break down evaluation by task. Local models don't yet have the power to just "do it all well" like GPT-4.

There are open source models that are fine tuned for different tasks, and if you're able to pick a specific model for a specific use case you'll get better results.

---

For example, there are models like `mpt-7b-chat`, `GPT4All-13B-snoozy`, or `vicuna` that do okay for chat but are not great at reasoning or code.

Other models, like `mpt-7b-instruct`, are designed for direct instruction following but are worse at chat.

Meanwhile, there are models designed for code completion, like Replit's model and HuggingFace's `starcoder`, that do decently for programming but not other tasks.

---

For UI the easiest way to get a feel for quality of each of the models (or, chat models at least) is probably https://gpt4all.io/.

And as others have mentioned, for providing an API that's compatible with OpenAI, https://github.com/go-skynet/LocalAI seems to be the frontrunner at the moment.
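
As a sketch of what that compatibility buys you: the stock `openai` Python client can be pointed at a running LocalAI instance just by swapping the base URL (the model name below is a placeholder for whatever your instance has loaded):

```python
# Sketch: OpenAI Python client (0.x API) talking to LocalAI instead of OpenAI.
import openai

openai.api_base = "http://localhost:8080/v1"  # LocalAI's OpenAI-compatible endpoint
openai.api_key = "not-needed-for-local"       # the client just requires some value

resp = openai.ChatCompletion.create(
    model="ggml-gpt4all-j",  # placeholder: whichever model LocalAI serves
    messages=[{"role": "user", "content": "Summarize QLoRA in two sentences."}],
)
print(resp.choices[0].message.content)
```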

---

For the project I'm working on (in bio) we're currently struggling with this problem too since we want a nice UI, good performance, and the ability for people to keep their data local.

So at least for the moment, there's no single drop-in replacement for all tasks. But things are changing every week and every day, and I believe that open-source and local can be competitive in the end.


👤 daryl149
For personal use, check out https://github.com/imartinez/privateGPT. It's lightweight and has lots of momentum from the open-source community; there's even an open PR to support Hugging Face LLMs. For business use, here's some shameless self-promotion: https://mirage-studio.io/private_chatgpt. We offer a version that can be hosted on your own GPU cluster.

👤 simonw
The answer to this question changes every week.

For compatibility with the OpenAI API one project to consider is https://github.com/go-skynet/LocalAI

None of the open models are close to GPT-4 yet, but some of the LLaMA derivatives feel similar to GPT3.5.

Licenses are a big question though: if you want something you can use for commercial purposes your options are much more limited.


👤 Gijs4g
> Preferably self-hosted (I'm okay with paying for it)

I'm the founder of Mirage Studio and we created https://www.mirage-studio.io/private_chatgpt. A privacy-first ChatGPT alternative that can be hosted on-premise or on a leading EU cloud provider.


👤 cypress66
Nothing self-hosted is even remotely close to GPT-3.5, let alone GPT-4.

Wizardlm-uncensored-30B is fun to play with.


👤 MacsHeadroom
Guanaco-65B[0] using Basaran[1] for your OpenAI compatible API.

(You can use any ChatGPT front-end which lets you change the OpenAI endpoint URL.)

[0] https://huggingface.co/TheBloke/guanaco-65B-HF (a QLoRA finetune of LLaMA-65B by Tim Dettmers, from the paper here: https://arxiv.org/abs/2305.14314)

[1] https://github.com/hyperonym/basaran
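
From the client side that looks something like this sketch: anything that speaks the OpenAI API can be redirected at Basaran by overriding the base URL (the port and model name are placeholders for your deployment):

```python
# Sketch: OpenAI Python client (0.x) pointed at a Basaran server, which
# exposes an OpenAI-compatible completions endpoint.
import openai

openai.api_base = "http://localhost:80/v1"  # placeholder: wherever Basaran listens
openai.api_key = "dummy"

resp = openai.Completion.create(
    model="timdettmers/guanaco-65b-merged",  # placeholder: the model Basaran loads
    prompt="### Human: Hello!\n### Assistant:",
    max_tokens=64,
)
print(resp.choices[0].text)
```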


👤 zorrobyte
What's the best self-hosted option for ingesting a local codebase and wiki to ask questions of it? Some of the projects linked here have ingest scripts for doc and PDF files, but it'd be cool to ingest a whole git repo and wiki and have a little chat interface to ask questions about the code.
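
Something like this LangChain sketch is the shape I have in mind; every path and model file below is a placeholder, and the local LLM could be swapped for any other:

```python
# Sketch: ingest a git repo into a local vector store and ask questions
# of it with a local llama.cpp model. All paths are placeholders.
from langchain.document_loaders import GitLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import LlamaCpp

docs = GitLoader(repo_path="./my-project", branch="main").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

store = Chroma.from_documents(chunks, HuggingFaceEmbeddings())
qa = RetrievalQA.from_chain_type(
    llm=LlamaCpp(model_path="./models/wizard-30b.ggml.q4_0.bin"),  # placeholder
    retriever=store.as_retriever(),
)
print(qa.run("Where is the request retry logic implemented?"))
```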

👤 sgd99
Not self-hosted/local, but Claude by Anthropic is really good from what I've heard, though the API is not publicly available. It's apparently accessible via Poe (https://poe.com)

As for open models, HuggingFace has a nice leaderboard to see which ones are decent: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...


👤 f0e4c2f7
Nothing open source is quite as good as GPT-4 yet but the community continues to edge closer.

For general use Falcon seems to be the current best:

https://huggingface.co/tiiuae

For code specifically Replit's model seems to be the best:

https://huggingface.co/replit/replit-code-v1-3b


👤 ijk
"Okay with paying for it" gives you a wide range of options.

Most of the open source stuff people are talking about is things like running a quantized 33B-parameter LLaMA model on a 3090. That can be done on consumer hardware, but it isn't quite as good at general-purpose queries as GPT-4. Depending on your use case and your ability to fine-tune it, that might be sufficient for a number of applications, particularly if you've got a very specific task.

However, there are bigger models available (e.g. Falcon 40B, LLaMA 65B) that can be run on server-class machines, if you're willing to spend $15-20K.

Will that get you GPT-4 level inference? Probably not (though it is difficult to quantify); will it get you a high-quality model that can be further fine-tuned on your own data? Yes.

For the smaller models, the fine-tunes for various tasks can be fairly effective; in a few more weeks I expect they'll have continued to improve significantly. There are new capabilities being added every week.

The biggest weakness that's been highlighted in research is that the open source models aren't as good at the wide range of tasks that OpenAI's RLHF has covered; that's partly a data issue and partly a training issue.


👤 CSSer
There is a model that was just released called Falcon-40B that is available for commercial use[0]. It outperforms every other open LLM available today. Buyer beware, however, because the license is custom[1] and has restrictions on "attributable revenues" over $1M/year. I'll leave that for you to interpret as you will.

[0]: https://huggingface.co/tiiuae/falcon-40b-instruct [1]: https://huggingface.co/tiiuae/falcon-40b-instruct/blob/main/...

EDIT: I just realized you seem to be asking for a fully realized, turn-key commercial solution. Yeah, refer to others who say there's no alternative. It's true. Something like this gives you a lot more power and flexibility, but at the cost of a lot more work building the solution as you try to apply it.


👤 captainmuon
I think you have to distinguish between self-hosted models that run on CPU (like LLaMA), on consumer GPUs, or on big GPUs. I find the market currently very confusing.

I'm especially interested since the data center I'm working for is sitting on a bunch of A100s, and I get daily requests from people asking for LLMs tuned to specific use cases who can't or won't use OpenAI for various reasons.


👤 anotheryou
Here you can try Vicuna (and quite a few others) easily: https://chat.lmsys.org/

They also have A/B testing with a leaderboard, where Vicuna wins among the self-hostable ones: https://chat.lmsys.org/?leaderboard


👤 nabakin
I would monitor and research each of these top models to determine which best fits your use case.

https://lmsys.org/blog/2023-05-25-leaderboard/

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...

https://assets-global.website-files.com/61fd4eb76a8d78bc0676...

https://www.mosaicml.com/blog/mpt-7b

Also keep up to date with r/LocalLLaMA where new best open models are posted all the time.


👤 kertoip_1
You can check out this leaderboard to see the current state of LLM alternatives to GPT-4:

https://lmsys.org/blog/2023-05-25-leaderboard/

But unfortunately for now it seems there aren't any viable self-hosted options...


👤 AndroTux
https://gpt4all.io/ works fairly well on my 16 GB M1 Pro MacBook. It's certainly not on a level with ChatGPT, but what is?

It's a simple app download and allows you to select from multiple available models. No hacking required.


👤 samwillis
If you want or need to go CPU-only, then llama.cpp, and the assorted front ends people are building for it, is looking like a good project: https://github.com/ggerganov/llama.cpp
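
If you'd rather drive it from code than the CLI, the llama-cpp-python bindings wrap the same engine. A minimal sketch (the model path is a placeholder for any GGML-quantized checkpoint):

```python
# Sketch: local inference via llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(model_path="./models/vicuna-13b.ggml.q4_0.bin", n_ctx=2048)  # placeholder path
out = llm("Q: Name three uses of a quantized local LLM. A:", max_tokens=128)
print(out["choices"][0]["text"])
```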

👤 boringuser2
I've gone down this rabbit hole and I want to reaffirm what the other commenters are saying: even if you use a massive model and have the compute to back it up at a reasonable pace (you likely don't), it sucks; it can't even hold a candle to GPT-3.5.

👤 Veen
It depends on what you mean by "viable alternatives" and how much money you are prepared to spend on hardware to self-host. As others have mentioned, you can try llama.cpp and LocalAI, but for most ChatGPT-like applications you won't get results anywhere near as good. I've found that using GPT-4 via the OpenAI API is somewhat more reliable than ChatGPT, whether via the Playground or via a local chat interface like https://github.com/mckaywrigley/chatbot-ui

👤 RecycledEle
I often worry about a "The Machine Stops" scenario.

GPT AI actually gives me hope. What if we could store and run an AI on a phone-sized device that is superior to a similarly sized library of books? Could we have a rugged, solar-powered device that would survive the fall of civilization and help us rebuild?

It would certainly have military applications in warfare. Imagine being the 21st-century equivalent of a 1940s US Marine on Guadalcanal who needs to know some survival skills. ChatGPT-on-a-phone would be handy if you could keep the battery charged.


👤 0xbadc0de5
I'll +1 the votes for Guanaco and Vicuna running with the Oobabooga text-generation-webui.

With a 4090, you can get ChatGPT-3.5-level results from Guanaco 33B. Vicuna 13B is a solid performer on more resource-constrained systems.

I'd urge the naysayers who tried the OPT and LLaMA models only to give up to note that the LLM field is moving very quickly: the current set of models is already vastly superior to the LLaMA models from just two months ago. And there is no sign the progress is slowing; in fact, it seems to be accelerating.


👤 vs4vijay
You can find more details here - https://old.reddit.com/r/LocalGPT/

👤 colesantiago
The best self hosted/local alternative to GPT-4 is a (self hosted) GPT-X variant by OpenAI.

No kidding, and I am calling it on the record right here.

OpenAI will release an 'open source' model to try to shore up its moat in the self-hosted/local space.

https://www.theinformation.com/briefings/openai-readies-new-...


👤 ludovicianul
This is a good candidate: https://github.com/imartinez/privateGPT

👤 meroes
This is like an artist getting used to Adobe's products before they're put behind a wall. And borrowing HN's attitude to that, you apparently deserve it.

👤 FieryTransition
You can fine-tune an open-source model for your task and at least achieve better results than using it directly. But they are still not close to the OpenAI models in generality. Huggingface is the place for exploring models; I recently went through a lot of them for my use case, and they are simply not good enough yet.

👤 born-jre
There is so much parallel progress happening left and right, but at the same time the models are not there yet. With things like SparseGPT, and models fine-tuned on data with tool-use ability (not just instruct data), we may get there soon. As long as there is progress I am hopeful. Some sort of inference-optimized hardware would also help.

👤 danpalmer
> Preferably self-hosted (I'm okay with paying for it)

The big models, if even available, need >100GB of graphics memory to run and would likely take minutes to warm up.

The pricing available via OpenAI/GCP/etc. is only that cheap because they can multi-tenant many users on the same hardware. The cost to run one of these systems for private use would be ~$250k per year.
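
A rough sanity check on that figure, assuming an 8x A100 80GB on-demand instance at about $30/hour (an assumed cloud rate, not a quote) running continuously:

```python
# Hypothetical cost check; the hourly rate is an assumption.
hourly_rate = 30.0  # USD/hour for an 8x A100 80GB instance (assumed)
print(f"${hourly_rate * 24 * 365:,.0f} per year")  # -> $262,800
```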


👤 anon291
I admittedly haven't used GPT-4 yet, but I've replaced several uses of GPT-3 with RWKV on the Raven dataset. I can load it onto my RTX 2060 with 12GB of memory (quantized, of course) and use it to whittle down or summarize data for GPT.
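
A hedged sketch of what that setup might look like with the `rwkv` pip package; the checkpoint path, tokenizer file, and quantization strategy string are all assumptions to adjust to your download:

```python
# Sketch: running an RWKV Raven checkpoint via the rwkv package.
# "cuda fp16i8" quantizes weights to int8 on the GPU to fit in ~12 GB.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

model = RWKV(model="./RWKV-4-Raven-7B", strategy="cuda fp16i8")  # path placeholder
pipeline = PIPELINE(model, "20B_tokenizer.json")  # tokenizer file placeholder
print(pipeline.generate("Summarize the following notes:\n...", token_count=120))
```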

👤 MagicMoonlight
OpenAssistant is pretty good. It still has some censorship but nowhere near the levels of commercial models.

It’s actually impressive how good it is considering the limited resources they have.



👤 cl42
Have you tried using GPT-4 via Azure? My understanding is that it's faster and more reliable.

👤 airgapstopgap
There really do not exist any alternatives, self-hosted or not. More importantly, there may never be, given the rising tide of AI-risk and regulation discourse. It seems that soon, training and open-sourcing (or otherwise making accessible) a model of that class will be impossible, even as the cost of producing one falls.

👤 leros
Is anyone using a self hosted thing to assist with parsing?

👤 0xferruccio
Buy a tinybox from tiny corp https://tinygrad.org/

👤 Saruto
Falcon 40B

👤 Marlon1788
openai not so open. should rebrand to closedai

👤 Y_Y
You could hire a human to manually respond to the queries