Is the next big thing locally running coding agents?

Question

There's extreme price escalation on the part of Anthropic, with token spend now approaching levels that have many an enterprise scratching their heads.
At the same time, judging by open-source advances (e.g. Qwen 3.6 27B), hosting a smart enough local LLM on 16GB VRAM (or equivalent) is increasingly becoming a reality. Lastly, I see most coding as being of intermediate difficulty, not beyond.
Seems to me it's only a matter of time before people shift to free Claude Code type experiences, powered by local LLMs.
What do you think?

giwook · Accepted Answer

This seems like an obvious progression imo, though I think it's very much subject to change. Open weight models will become better, and memory prices will return to normal in a couple of years (hopefully).
That being said, I think an unpredictable variable here is how the companies building frontier models respond to what should be a noticeable inflection point in consumers turning towards locally hosted open weight models.
There is also a significant amount of compute being built out as we speak that should, in theory, reduce costs for providers of frontier models, but that's a whole other can of worms.
Despite all of the very impressive open weight models that are available to us today, Anthropic and OpenAI remain steps ahead of the competition. Most of the biggest and brightest minds in AI are working at frontier labs. It's not hard to foresee these labs maintaining their edge given the amount of expertise and brainpower they've assembled.
Assuming frontier models continue to maintain their edge, even if only on a subset of tasks (e.g. reasoning, judgment, planning), I see a convergence towards a hybrid workflow where frontier and local models are each used for specific tasks: for example, Claude for reasoning, planning, and judgment, with intelligent routing to cheap/free models tuned for certain tasks.
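
To make the hybrid idea concrete, here is a minimal sketch of what such routing could look like. Everything in it is an assumption for illustration: the task labels, the `route_task` function, and the "frontier"/"local" backend names are hypothetical, and a real router would likely classify tasks with a model rather than a fixed set.

```python
from dataclasses import dataclass

# Hypothetical task categories, taken from the examples in the answer above.
FRONTIER_TASKS = {"reasoning", "planning", "judgment"}


@dataclass(frozen=True)
class Route:
    backend: str  # "frontier" or "local" (illustrative names only)
    reason: str


def route_task(task_type: str, local_available: bool = True) -> Route:
    """Pick a backend for a task. Illustrative policy, not a real product's logic."""
    if task_type in FRONTIER_TASKS:
        return Route("frontier", f"{task_type} is assumed to need a frontier model")
    if local_available:
        return Route("local", f"{task_type} is assumed to be fine on a local model")
    # No local model running, so fall back to the paid API.
    return Route("frontier", "no local model available, falling back")


if __name__ == "__main__":
    for task in ("planning", "rename_variable", "write_unit_test"):
        print(task, "->", route_task(task).backend)
```

The point of the fallback branch is that the cheap path is an optimization, not a requirement: anything the local model can't take still gets handled.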

jonahbenton · Answer

There are many markets. Qwen 3.6 27B at a high enough quant is good enough for many use cases. But enterprise-consumed tokens come with legal/data protection agreements. Enterprises have only just gotten comfortable with BYOD; there is no BYOD-equivalent set of practices and protections for local LLMs (BYOLLM). So some enterprises are getting back into on-prem GPU capacity.
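
For a sense of why the quant level matters so much here, a rough back-of-the-envelope sketch of weight memory for a 27B-parameter model follows. The helper name `weight_memory_gb` is made up for this example, and the numbers cover weights only at nominal bit widths: KV cache, activations, and runtime overhead come on top, and real quant formats typically store some extra scale metadata.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    # params (in billions) * bits per weight / 8 bits per byte = gigabytes
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"27B at {bits}-bit: ~{weight_memory_gb(27, bits):.1f} GB")
    # 27B at 16-bit: ~54.0 GB
    # 27B at 8-bit: ~27.0 GB
    # 27B at 4-bit: ~13.5 GB
```

So on these assumptions, the weights alone only drop under the 16GB figure from the question at roughly 4-bit, which leaves little headroom for context.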

damnitbuilds · Answer

I got Qwen 3.6 running locally on 12GB VRAM.

It went:

  AI: "I see you are building a Django project. How can I help?"

  Me: "When I click on the Reload button, it does not set the reload option correctly. Fix this"

     <10 minutes>

  AI: "I see you are building a Django project. How can I help?"

Needs more tweaking of the context window, I think.

Seriously, I agree that this is the future, when OpenAI et al. have gone bust.