HACKER Q&A
📣 virgildotcodes

How do you use local LLMs productively?


I've been periodically testing the strongest reported models that can fit on my 32GB M1 Max as they come out. I've yet to find one that feels genuinely useful.

My latest attempts were with 4-bit quants of Qwen 3.5, both 9B and 35B.

With both, my very first query (something along the lines of "sup dog" or "how does beer A compare to beer B") led to an endless thinking loop that I eventually had to stop manually.

And yet I keep seeing passing comments about people using local LLMs productively.

Just curious what your strategies are, what the use cases are, and anything I may be missing.


  👤 Cytoplast3528 Accepted Answer ✓
I think only Claude Sonnet/Opus, GPT 5.2+, and Minimax M2.5 are useful, and they are all nearly impossible to self-host, unfortunately.

👤 andsoitis
Lots of conversation on this topic yesterday: https://news.ycombinator.com/item?id=47363754

👤 CamperBob2
Qwen 3.5 was plagued by some premature quant releases and unclear/incomplete guidelines for the sampling parameters. Especially if you are having looping problems, make sure you are using the very latest model files, executables, and recommended params.

If all the stars are aligned, Qwen 3.5 will not exhibit outright looping, although it will still burn more thinking tokens than some other models. There are ways to tone down the overthinking or disable it entirely, though, and the models are still quite capable when configured that way.
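For anyone hitting the looping issue, here's a minimal sketch of what "recommended params" looks like in practice with llama.cpp's `llama-cli`. The sampling values shown are Qwen3's published thinking-mode defaults (temp 0.6, top-p 0.95, top-k 20, min-p 0); whether they carry over unchanged to a newer Qwen release is an assumption, so check the model card for your specific files.

```shell
# Sketch: run a Qwen-family reasoning model under llama.cpp with the
# sampling parameters Qwen3's model card recommends for thinking mode.
# Model filename is illustrative; use your actual GGUF.
llama-cli -m qwen-35b-q4_k_m.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0 \
  -p "how does beer A compare to beer B"

# Qwen3 also supports a soft switch to suppress thinking per prompt
# (assuming newer releases keep it):
#   -p "how does beer A compare to beer B /no_think"
```

Greedy decoding (temp 0) is explicitly discouraged for these models and is a common cause of the repetition loops described above.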