HACKER Q&A
📣 sandwichukulele

Affordable hardware for running local large language models?


A while back there was a post about running Stable Diffusion on a Raspberry Pi Zero 2 [1], which was slow but incredibly impressive! That sparked this question: what is considered affordable hardware for running large language models locally today? I'm aware there's a lot of work underway to make inference cheap to run at the edge, but I'm curious about the landscape right now, using hardware anyone could buy. I've seen people running models on flagship smartphones, but those are more expensive than a Mac mini and perform worse.

By affordable I mean no more than the cost of a current-gen base model Mac mini ($599), but ideally around the price of a Raspberry Pi 5 ($79), which comes up when searching for budget PCs [2]. Both devices have the same amount of RAM in my case (8GB) but perform very differently, given the importance of memory bandwidth. I mention these two because I've successfully run Llama 3 via Ollama on both, albeit at slower speeds than a full workstation with a commodity GPU, e.g. an RTX 4090, which starts at $1,599. I'm interested in learning what other devices are out there that people consider cheap and use for local LLMs.

[1]: https://news.ycombinator.com/item?id=38646969

[2]: https://www.pcmag.com/picks/the-best-budget-desktop-computers
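
For concreteness, this is roughly the kind of speed check I mean, run against Ollama's local API on either machine (a sketch only: it assumes Ollama's default port, and the model name and prompt are just placeholders):

    # Rough tokens/s check against a local Ollama install.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",   # placeholder; any model you've pulled works
            "prompt": "Explain memory bandwidth in one paragraph.",
            "stream": False,
        },
        timeout=600,
    )
    data = resp.json()

    # Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
    print(f"{data['eval_count']} tokens at "
          f"{data['eval_count'] / (data['eval_duration'] / 1e9):.1f} tok/s")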


  👤 ysleepy Accepted Answer ✓
I simply bought 4x32GB of DDR4 memory (~200 bucks) for a normal desktop mainboard and a high-thread-count CPU.

You can experiment with a lot of models, it's just going to be slow.

With DDR5 you can go even higher, with 48GB modules.

On top of that I got a 3060 12GB, which can be had for 200€ used.

It's a very affordable setup.
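
Roughly what that looks like with llama-cpp-python, as a sketch (the model path, thread count, and layer count are placeholders; n_gpu_layers only matters if you build it with CUDA support for the 3060):

    # Sketch: CPU-heavy inference from plain system RAM, with optional
    # partial offload to a 12GB 3060. All paths and counts are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/some-model.Q4_K_M.gguf",  # any GGUF that fits in RAM
        n_ctx=4096,        # context window
        n_threads=16,      # match your core/thread count
        n_gpu_layers=0,    # raise this if llama-cpp-python was built with CUDA
    )

    out = llm("Q: Why does memory bandwidth matter for inference? A:", max_tokens=128)
    print(out["choices"][0]["text"])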


👤 instagib
The energy and compute cost per unit of performance is not a good ratio given the current state of optimization, and old hardware makes it worse.

Consider making friends with people who have a good desktop or laptop, and ask if you can use it for a little while when visiting, in exchange for making them a meal or coffee.

If you give up on local, using servers instead reduces the cost.

Giving up on performance and tolerating hallucinations as an introduction to LLMs is my only suggestion for budget plus local. A very specific LLM, e.g. for spellcheck or a similarly narrow task, would be possible on limited hardware.

IIRC, there is a publication on 1.3-bit or 1.4-bit quantization that someone has implemented on GitHub.


👤 angoragoats
This might technically be outside your budget, but if you happen to have a PC, I highly recommend the RTX 4060 Ti 16GB ($450, or less if on sale). It can easily handle 13B models and is quite fast. You don’t need a fancy PC to put it in; anything with a spare PCIe slot and a reasonably sized power supply will work.

These cards can easily be found at MSRP because they’re not a great improvement over the 3060/4060 8GB for gaming, but the added memory makes them excellent for AI.
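
As a rough back-of-envelope for why 16GB comfortably fits a 13B model at ~4-bit quantization (approximate numbers; the layer/head shape below is a Llama-2-13B-style assumption):

    # Back-of-envelope: 13B model at ~4-bit quantization vs 16GB of VRAM.
    params = 13e9
    weights_gb = params * 0.5 / 1e9                      # ~4 bits/weight ≈ 6.5 GB

    # KV cache at fp16: 2 (K and V) * layers * heads * head_dim * 2 bytes per token.
    layers, heads, head_dim, ctx = 40, 40, 128, 4096     # Llama-2-13B-ish shape
    kv_cache_gb = 2 * layers * heads * head_dim * 2 * ctx / 1e9   # ≈ 3.4 GB

    overhead_gb = 1.0                                    # scratch buffers, CUDA context (rough)
    print(f"~{weights_gb + kv_cache_gb + overhead_gb:.1f} GB of 16 GB")   # ≈ 11 GB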


👤 roosgit
About a year ago I bought some parts to build a Linux PC for testing LLMs with llama.cpp. I paid less than $200 for: a B550MH motherboard, AMD Ryzen 3 4100, 16GB DDR4, 256GB NVMe SSD. I already had an old PC case with a 350W PSU and a 256MB video card because the PC wouldn’t boot without one.

I looked today on Newegg and similar PC components would cost $220-230.

From a performance perspective, I get about 9 tokens/s from mistral-7b-instruct-v0.2.Q4_K_M.gguf with a 1024 context size. This is with overclocked RAM which added 15-20% more speed.

The Mac Mini is probably faster than this. However, the custom-built PC route gives you the option to add more RAM later on to try bigger models. It also lets you add a decent GPU, something like a used 3060, as another comment suggests.
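
That figure lines up with memory bandwidth being the bottleneck, which is also why overclocking the RAM helps. A rough sanity check (the file size is approximate, and real inference doesn't stream every weight exactly once per token):

    # Why ~9 tok/s is about what dual-channel DDR4 can deliver for a Q4 7B model.
    model_gb = 4.4                         # mistral-7b Q4_K_M GGUF is roughly this size
    tok_per_s = 9

    effective_bw = model_gb * tok_per_s    # ≈ 40 GB/s of weights read per second
    ddr4_3200_dual = 2 * 3200e6 * 8 / 1e9  # ≈ 51 GB/s theoretical peak, dual channel

    print(f"~{effective_bw:.0f} GB/s used vs ~{ddr4_3200_dual:.0f} GB/s peak")
    # Faster (overclocked) RAM raises that ceiling, hence the 15-20% gain.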


👤 pquki4
I think you need to be very clear about what your goal is -- just playing with different "real" hardware, running some small models as experiments, or trying to do semi-serious work with LLMs? How much do you want to spend on hardware and electricity in the long run, and how much are you willing to "lose"? For example, if a setup turns out not to be very useful and is hard to repurpose because you already have too many computers, so you need to either sell it or throw it away, what's your limit?

Depending on your answer, I suspect you might want to use Google Colab Pro/Paperspace/AWS/vast.ai instead of building your own hardware.


👤 lemonlime0x3C33
I have used a Raspberry Pi for running image-classification CNNs; it really depends on the model you are using. Edge/IoT AI is making a lot of progress on running models on resource-constrained devices.

If you have access to the dataset you want, you could train a model yourself that's sized to fit your target hardware. You could also look at FPGA solutions if you are comfortable working with those. Training locally might take some time, but you could use Google Colab to train it.


👤 wokwokwok
You can use a Raspberry Pi with 8GB of RAM to run a quantised 7B model (e.g. https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF), or any cheap stick PC.

For larger models, or GPU accelerated inference, there is no “cheap” solution.

Why do you think everyone is so in love with the 7B models?

It’s not because they’re good. They’re just ok, and it’s expensive to run larger models.
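
To make the Pi option concrete, here's a minimal sketch of fetching a 4-bit quant from that repo with huggingface_hub (the exact filename is an assumption based on the repo's usual naming; pick whichever quant fits comfortably in 8GB):

    # Sketch: fetch a ~4GB 4-bit quant for an 8GB board. The filename follows
    # TheBloke's usual naming; check the repo's file list before running.
    import os
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
        filename="llama-2-7b-chat.Q4_K_M.gguf",
    )
    print(path, f"{os.path.getsize(path) / 1e9:.1f} GB")
    # ~4 GB of weights leaves some headroom for the KV cache and the OS on an 8GB Pi.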


👤 ilaksh
What kind of computer do you currently have? Try phi3. It's amazing and under 4GB. If you want something affordable then the one rule is to stay away from Apple products. Maybe a Ryzen. https://www.geekompc.com/geekom-a7-mini-pc-ryzen-7000/

AMD Ryzen 7 5800H


👤 nubinetwork
I'm happy with a 2950x and a Radeon VII... but that costs more than your example Mac mini.

👤 cjbprime
There are some ARM SBCs with e.g. 32GB RAM and an NPU for under $300, such as the Orange Pi 5 Plus, but I'm guessing refurbished Apple Silicon hardware is the best answer for the price.

👤 p1esk
> what is considered affordable hardware for running large language models locally today?

I’d say “under $20k” is considered affordable, at least if you want to run decent models (>70B) at bearable speeds (>1 t/s). For comparison, a single H100 server is $250k.

Your optimal choice today is a Mac Studio with 192GB of unified memory (~$7k), but it will be too slow to run something like Llama 400B.


👤 espinielli
Maybe llamafile helps: https://justine.lol/oneliners/