I want to build an air-cooled home server to run the larger-parameter models (like Llama 3 70B, which is roughly 40 GB when quantized to 4 bits). It seems like running two 3090s or 4090s is the way to go for this.
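For context, a rough back-of-envelope VRAM check, assuming ~0.5 bytes per weight at 4-bit plus a ballpark allowance for KV cache and runtime overhead (the overhead figure is an assumption, not a measurement):

```python
# Rough VRAM estimate for a 4-bit quantized 70B model (illustrative numbers).
params = 70e9              # Llama 3 70B parameter count
bytes_per_param = 0.5      # ~4 bits per weight after quantization
weights_gb = params * bytes_per_param / 1e9   # ~35 GB of weights
overhead_gb = 5            # assumed KV cache + runtime overhead
total_gb = weights_gb + overhead_gb           # ~40 GB total

vram_per_card_gb = 24      # 3090 / 4090
cards = 2
print(f"Estimated need: ~{total_gb:.0f} GB, available: {cards * vram_per_card_gb} GB")
# Two 24 GB cards leave ~8 GB of headroom; a third card mostly buys longer context.
```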
1) Does Ollama support loading the model across multiple GPUs? (A quick way to check is sketched after these questions.)
2) Anyone have a general parts list that works well that I can copy? I'd prefer to go with 3 GPUs, but I feel like cooling may be an issue.
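On (1): Ollama runs on llama.cpp, which can spread a model's layers across GPUs when one card's VRAM isn't enough. A minimal sketch of how to sanity-check the split on a given box, assuming Ollama is serving on its default port (11434) and `nvidia-smi` is on the PATH:

```python
# Load the model via Ollama's HTTP API, then check whether VRAM is
# actually being used on both GPUs (i.e., the layers got split across cards).
import json
import subprocess
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "llama3:70b", "prompt": "hello", "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req).read()  # forces the model to load into VRAM

# Per-GPU memory usage: with a ~40 GB model on 24 GB cards, you'd expect
# substantial memory.used on both GPUs if the split worked.
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=index,memory.used,memory.total", "--format=csv"],
    capture_output=True, text=True,
).stdout)
```

If only one card fills up while the other sits idle, the model isn't being split and you'd be falling back to CPU offload for the remainder.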
* Unified memory will effectively let you run much larger models without requiring you to add in more GPUs
* GPUs will require loading the model from system memory, which is always going to be slower than macOS with Metal
* Fitting multiple GPUs (especially 4090s) into a case is difficult, and motherboards that can support it are expensive (such as https://www.amazon.com/ASUS-Pro-WS-Motherboard-Server-Grade/...)
The one benefit of the 4090s is that they _should_ in theory be faster, but YMMV.