HACKER Q&A
📣 Ms-J

Local LLMs


I've been wanting to run LLMs locally, and it looks like there is a huge amount of interest from others as well in finally running and creating our own chat-style models.

I came across https://github.com/jmorganca/ollama in a wonderful HN submission a few days ago. I have a MacBook Pro M1 that was top of the line in 2022; the only catch is that it runs Debian, as I use Linux.

Could someone point a beginner like myself in the right direction on how to run, for example, Wizard Vicuna Uncensored locally on Linux? I would very much appreciate it; thanks for reading.


  👤 version_five Accepted Answer ✓
Llama.cpp. You can download one of the quantized models directly from "TheBloke" on HF. I can't 100% vouch for it, because I have no idea how it builds under Linux on Apple Silicon; I'd be very interested to know if there are any issues and how well it uses the processor.

https://github.com/ggerganov/llama.cpp
https://huggingface.co/TheBloke

You should be able to at least run the 7B, and probably the 13B (a 4-bit-quantized 7B is only around 4 GB of weights).

For reference, I can run the 7B just fine on my 2021 Lenovo laptop with 16 GB of RAM (and Ubuntu 20.04).
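I haven't tested this on Debian on an M1, but the rough shape of it should be something like the following. Plain `make` builds for the CPU (NEON gets picked up automatically on ARM; there's no Metal on Linux), and the model repo and filename below are just examples, so check TheBloke's model card for the exact names:

```bash
# build llama.cpp for the CPU
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# grab a 4-bit quantized model from TheBloke on Hugging Face
# (repo and filename are illustrative; see the model card)
wget https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin

# interactive chat on the CPU
./main -m Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin -n 256 --color -i
```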


👤 Ms-J
Thanks, everyone, for replying; I'm sorry I didn't realize there were replies until now. The advice is great. I'll see if I can get some of the models I mentioned running under Linux, and if I'm successful I'll report back with a write-up on how it was done.

👤 Patrick_Devine
Ollama does work on Linux; it's just that we haven't (yet) made it work with GPU backends other than Metal. We'll get there soon, but we're a small team and wanted to make sure everything was working well before adding more platforms.

You can build it yourself with `go build .` if you've cloned the repository.
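Roughly like this (the model name at the end is just an example; the model library lists what's actually available):

```bash
git clone https://github.com/jmorganca/ollama
cd ollama
go build .

# start the server in one terminal...
./ollama serve

# ...then pull and chat with a model in another
./ollama run llama2
```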


👤 brucethemoose2
Koboldcpp (a nice frontend for llama.cpp) is The Way.

You really want to run macOS, though, as it's not very fast without Metal (or Vulkan). Also, you need a relatively high-memory M1 model to run the better Llama variants.
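If you do try it on Linux anyway, the setup is basically llama.cpp with a web UI on top. I'm going from memory here, so treat the filename as a placeholder and check the README if anything differs:

```bash
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
make   # CPU-only build under Linux

# point it at the same quantized model file, then open http://localhost:5001
python koboldcpp.py Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
```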


👤 fsmv
I believe that to get the M1's efficiency for LLMs, you need the Metal API, which I don't think works on Linux. You may have to dual-boot to use the machine for ML.

👤 gorenb
I use only Ubuntu on my computer, and Oobabooga's text-generation-webui really helped. I hope this helps you!
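Getting it up was roughly this (from memory, so double-check the README; `--cpu` keeps it off CUDA, which you won't have on an M1):

```bash
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

# the web UI comes up on http://localhost:7860 by default
python server.py --cpu
```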

👤 smoldesu
There shouldn't be any real roadblocks in your setup. If you can find an inference tool with ARM support, you should be good to go.
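One quick sanity check before grabbing any prebuilt binaries, just so you pick the right builds:

```bash
# confirm the userland is 64-bit ARM
uname -m   # expect "aarch64" on Debian running on an M1
```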