HACKER Q&A
📣 Ms-J

Local LLMs


I've been wanting to run LLMs locally, and it looks like there is a huge amount of interest from others as well in finally running and creating our own chat-style models.

I came across https://github.com/jmorganca/ollama in a wonderful HN submission a few days ago. I have a MacBook Pro M1 that was top of the line in 2022; the only catch is that it runs Debian, as I use Linux.

Could someone point a beginner like myself in the right direction on how to run, for example, Wizard Vicuna Uncensored locally on Linux? I would very much appreciate it; thanks for reading.


  👤 version_five Accepted Answer ✓
Llama.cpp. You can download one of the quantized models directly from "TheBloke" on HF. I can't 100% vouch for it, because I have no idea how it builds under Linux on Apple Silicon; I'd be very interested to know if there are any issues and how well it uses the processor.

https://github.com/ggerganov/llama.cpp
https://huggingface.co/TheBloke

You should be able to at least run the 7B, and probably the 13B (a 4-bit-quantized 7B is only around 4 GB of weights).

For reference, I can run the 7B just fine on my 2021 Lenovo laptop with 16 GB of RAM (and Ubuntu 20.04).
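I haven't tested this on Debian on an M1, but the rough shape of it should be something like the following. Plain `make` builds for the CPU (NEON gets picked up automatically on ARM; there's no Metal on Linux), and the model repo and filename below are just examples, so check TheBloke's model card for the exact names:

```bash
# build llama.cpp for the CPU
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# grab a 4-bit quantized model from TheBloke on Hugging Face
# (repo and filename are illustrative; see the model card)
wget https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin

# interactive chat on the CPU
./main -m Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin -n 256 --color -i
```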


👤 Ms-J
Thanks, everyone, for replying; I'm sorry I didn't realize there were replies until now. The advice is great. I'll see if I can get some of the models I mentioned running under Linux, and if I'm successful I'll report back with a write-up on how it was done.

👤 Patrick_Devine
Ollama does work on Linux; it's just that we haven't (yet) made it work with GPU backends other than Metal. We'll get there soon, but we're a small team and wanted to make sure everything was working well before adding more platforms.

You can build it yourself with `go build .` if you've cloned the repository.
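Roughly like this (the model name at the end is just an example; the model library lists what's actually available):

```bash
git clone https://github.com/jmorganca/ollama
cd ollama
go build .

# start the server in one terminal...
./ollama serve

# ...then pull and chat with a model in another
./ollama run llama2
```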


👤 brucethemoose2
Koboldcpp (a nice frontend for llama.cpp) is The Way.

You really want to run macOS, though, as it's not very fast without Metal (or Vulkan). Also, you need a relatively high-memory M1 model to run the better Llama variants.
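If you do try it on Linux anyway, the setup is basically llama.cpp with a web UI on top. I'm going from memory here, so treat the filename as a placeholder and check the README if anything differs:

```bash
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
make   # CPU-only build under Linux

# point it at the same quantized model file, then open http://localhost:5001
python koboldcpp.py Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
```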


👤 fsmv
I believe that to get the M1's efficiency for LLMs, you need the Metal API, which I don't think works on Linux. You may have to dual-boot to use the machine for ML.

👤 gorenb
I use only Ubuntu on my computer, and Oobabooga's text-generation-webui really helped. I hope this helps you!
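Getting it up was roughly this (from memory, so double-check the README; `--cpu` keeps it off CUDA, which you won't have on an M1):

```bash
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

# the web UI comes up on http://localhost:7860 by default
python server.py --cpu
```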

👤 smoldesu
There shouldn't be any real roadblocks in your setup. If you can find an inference tool with ARM support, you should be good to go.
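One quick sanity check before grabbing any prebuilt binaries, just so you pick the right builds:

```bash
# confirm the userland is 64-bit ARM
uname -m   # expect "aarch64" on Debian running on an M1
```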