HACKER Q&A
📣 j4ckie

Recommendations for Local LLMs in 2024: Private and Offline?


I'm in search of a local LLM that can run completely offline for processing personal documents. Key requirements include privacy (no data leaves my machine) and performance (efficient with large datasets). Any recommendations for open-source or commercial solutions that fit the bill in 2024? Also, what's the current state of local LLMs: are they practical and useful, or still facing significant limitations?


  👤 dvt Accepted Answer ✓
> Are they practical and useful, or still facing significant limitations?

They are. I'm working on a product using a fine-tuned Mistral-7B-Instruct-v0.2 model and it's pretty mind-blowing. It works flawlessly on my RTX 3090 and is serviceable on my M1 MBP as well. I'm building in Rust (using the candle crate), but for personal usage Python is probably the better choice since it's easier to get up and running.
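
For the Python route, a minimal sketch with Hugging Face transformers looks roughly like this (the checkpoint name is the public Mistral release rather than any fine-tune; the dtype and device settings are assumptions that depend on your hardware):

    # Rough sketch: run Mistral-7B-Instruct-v0.2 locally with Hugging Face transformers.
    # Assumes the weights are already downloaded (after that it runs fully offline)
    # and that fp16 fits on your GPU; fall back to CPU or a quantized build otherwise.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-7B-Instruct-v0.2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision fits comfortably on a 3090
        device_map="auto",          # GPU if available, otherwise CPU
    )

    messages = [{"role": "user", "content": "Summarize this document: ..."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Once the weights are cached locally, nothing leaves the machine.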


👤 xyc
We recently added support for local document chat in RecurseChat (https://recurse.chat), including chatting with PDFs and markdown. You can see a demo here: https://twitter.com/chxy/status/1777234458372116865

RAG happens all locally (local embedding model and local vector db).

The app is secured by the Mac App Sandbox, meaning it only has access to files you select in the system dialog or drag and drop in. If you use a local LLM, everything works offline.
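
For anyone curious what a fully local RAG pipeline looks like under the hood, here is a toy sketch (not RecurseChat's actual code; the embedding model and the plain cosine-similarity "vector db" are stand-ins for whatever you prefer):

    # Toy version of a fully local RAG loop: embed document chunks with a local
    # model, retrieve by cosine similarity, then hand the top hits to a local LLM.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs fine on CPU

    chunks = [
        "Invoice #1234 was paid on 2024-01-15.",
        "The apartment lease renews automatically every June.",
        "The laptop warranty expires in March 2025.",
    ]
    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

    query = "When does my lease renew?"
    query_vec = embedder.encode([query], normalize_embeddings=True)[0]

    scores = chunk_vecs @ query_vec  # cosine similarity (vectors are normalized)
    top = [chunks[i] for i in np.argsort(scores)[::-1][:2]]

    prompt = "Answer using only this context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"
    # `prompt` then goes to whatever local LLM you run (Ollama, llama.cpp, etc.).
    print(prompt)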


👤 shreezus
I run local LLMs (Mistral-7B-Instruct-v0.2) using LM Studio (Ollama works well too, I believe) and host a local server on my Mac. I can hit the endpoints the same way you would with OpenAI's chat completions API, and can trigger it inline across my other applications using MindMac.
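
Because the local server speaks the OpenAI chat completions format, the official OpenAI SDK works as-is. A sketch, assuming LM Studio's default port 1234 (Ollama's OpenAI-compatible endpoint is on 11434 instead) and that the model name only needs to match whatever you have loaded:

    # Point the OpenAI client at the local server; the API key is a placeholder
    # because the local server doesn't check it, and nothing leaves the machine.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",  # LM Studio serves whichever model is currently loaded
        messages=[{"role": "user", "content": "Summarize my notes on Q1 expenses."}],
    )
    print(resp.choices[0].message.content)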

👤 theolivenbaum
> I'm in search of a local LLM that can run completely offline for processing personal documents. Key requirements include privacy (no data leaves my machine) and performance (efficient with large datasets). Any recommendations for open-source or commercial solutions that fit the bill in 2024? Also, what's the current state of local LLMs: are they practical and useful, or still facing significant limitations?

We've added support for it in our app if you wanna give it a try: https://curiosity.ai


👤 Havoc
You need suitable hardware (ideally a 3090, 4090 or an Apple M device with a decent amount of mem).

Then set up the software: Ollama for easy mode (but less control), or text-generation-webui for more control.

After that you can just try models. The subreddit /r/localllama has whatever is the flavour of the week. The Mixtral model at around Q3 quantization is probably a good starting point.
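
If you go the Ollama route, it exposes a plain HTTP API on localhost once a model is pulled, so trying things out is a couple of lines. A sketch, assuming `ollama pull mixtral` has already fetched a quantized build that fits your VRAM:

    # Talk to a locally running Ollama instance over its default port 11434.
    # "mixtral" is an example tag; substitute whatever model/quantization you pulled.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mixtral",
            "prompt": "List three key points from these meeting notes: ...",
            "stream": False,
        },
        timeout=600,
    )
    print(resp.json()["response"])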


👤 fcautomation
You can do this with Llama 2. There are multiple ways to compile it unless you use Python. If you aren't familiar with C++, I would just stick with Python and save yourself the time. Buy a big PC that can handle it.
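
The usual Python shortcut here is llama-cpp-python, which wraps llama.cpp for you so no C++ work is needed. A sketch, where the GGUF path is a placeholder for whichever quantized Llama 2 file you download:

    # llama-cpp-python handles the llama.cpp build/bindings; you just load a GGUF file.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder local path
        n_ctx=4096,       # context window
        n_gpu_layers=-1,  # offload all layers to the GPU if one is available
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Extract the due dates from this contract: ..."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])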

👤 runjake
Use Ollama: browse the available models, download some, and try them out. Ollama is a llama.cpp front end.

https://ollama.ai


👤 hollowpython
Are there any which can generate consistent characters (especially faces)?

👤 nickdothutton
My kingdom for a local LLM supporting my trusty Intel MacPro and AMD RX6900 XT!

👤 datascienced
jan.ai