My latest attempts were with 4-bit quants of Qwen 3.5, both the 9B and the 35B.
In both cases, my very first query, something along the lines of "sup dog" or "how does beer A compare to beer B", led to an endless loop of thinking that I eventually had to stop manually.
And yet I keep seeing passing comments about people being productive with local LLMs.
Just curious what your strategies are, what the use cases are, and anything I may be missing.
If all the stars are aligned, Qwen 3.5 will not exhibit outright looping, although it will still burn more thinking tokens than some other models. There are ways to tone down the overthinking or disable it entirely, though, and the models are still quite capable when configured that way.
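For what it's worth, here's a minimal sketch of the "disable it entirely" route, assuming the release you're running follows Qwen3-style conventions (an `enable_thinking` chat-template flag and a `/no_think` soft switch); check the model card for your exact checkpoint, since the name below is just a stand-in.

```python
# Minimal sketch of turning thinking off, assuming Qwen3-style
# conventions; the checkpoint name is a stand-in, not the quant
# from the post above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # substitute your local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "How does a lager compare to an IPA?"}]

# enable_thinking=False renders the template without a <think> block,
# so the model answers directly instead of reasoning first.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Sampling values here are Qwen3's published non-thinking defaults.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

If you'd rather keep thinking on but tame the loops: the Qwen3 model cards warn specifically against greedy decoding in thinking mode because it can produce exactly the endless repetition you saw, and recommend temperature=0.6, top_p=0.95, top_k=20 there. You can also append `/no_think` to an individual message to skip reasoning for just that turn. Whether those same switches carry over to the release you tried is an assumption on my part.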