If you've recently used AI tools for professional coding work, tell us about it.
What tools did you use? What worked well and why? What challenges did you hit, and how (if at all) did you solve them?
Please share enough context (stack, project type, team size, experience level) for others to learn from your experience.
The goal is to build a grounded picture of where AI-assisted development actually stands in March 2026, without the hot air.
The suggestions are correct about 40% of the time, so I'm actually surprised when they're right, rather than becoming reliant on them. It saves me maybe 10 minutes a day.
another teammate added a length check to an input field, and his request was merged near instantly, even though it had zero unit testing. this team is incredibly cooked in the long term, i just need to ensure that i survive the short term somehow.
The manager & a senior dev told me on my first day, "Don't try to write code yourself, you should be using AI". I was encouraged to use spec-driven development and frameworks like superpowers, gsd, etc.
I'm definitely moving faster using AI in this way, but I legitimately have no idea what the fuck I am doing. I'm making PRs I don't know shit about, I don't understand how it works because there is an emphasis on speed, so instead of ramping up in languages / technologies I've never used, I'm just shipping a ton of code I didn't write and have no real way to vet like someone who has been working with it regularly and actually has mastered it.
This time last year, I was still using AI, but using it as a pair programming utility, where I got help learning things I don't know, probing topics / concepts I need exposure to, and reasoning through problems that arose.
I can't control the direction of how these tools are going to evolve & be used, but I would love if someone could explain to me how I can continue to grow if this actually is the future of development. Because while I am faster, the hope seems to be AI / Agents / LLMs will only ever get better and I will never need to have an original thought or use critical thinking.
I have just about 4 years of professional experience. I had about 10 - 12 months at the start of my career where I used Google to learn things before LLMs became the sole focus.
I wake up every day with existential dread of what the future looks like.
Where it consistently fails: anything involving the interaction between systems. If a bug spans a queue producer and its consumer, or the fix requires understanding how a frontend state change propagates through API calls to a cache invalidation - the model gives you a confident answer that addresses one layer and quietly ignores the rest. You end up debugging its fix instead of the original issue.
My stack: Claude Code (Opus) for investigation and bug triage in a ~60k LOC codebase, Cursor for greenfield work. Dropped autocomplete entirely after a month - it interrupted my thinking more than it helped.
At work, the devs up the chain now do everything with AI – not just coding – then task me with cleaning it up. It is painful and time consuming, the code base is a mess. In one case I had to merge a feature from one team into the main code base, but the feature was AI coded so it did not obey the API design of the main project. It also included a ton of stuff you don’t need in the first pass - a ton of error checking and hand-rolled parsing, etc, that I had to spend over a week unrolling so that I could trim it down and redesign it to work in the main codebase. It was a slog, and it also made me look bad because it took me forever compared to the team who originally churned it out almost instantly. AI tools are not good at this kind of design deconflicting task, so while it’s easy to get the initial concept out the gate almost instantly, you can’t just magically fit it into the bigger codebase without facing the technical debt you’ve generated.
In my personal projects, I get to experience a bit of the fun I think others are having. You can very quickly build out new features, explore new ideas, etc. You have to be thoughtful about the design because the codebase can get messy and hard to build on. Often I design the APIs and then have Claude critique them and implement them.
I think the future is bleak for people in my spot professionally – not junior, but also not leading the team. I think the middle will be hollowed out and replaced with principals who set direction, coordinate, and execute. A privileged few will be hired and developed to become leaders eventually (or strike gold with their own projects), but everyone in between is in trouble.
The biggest challenge right now is keeping up with the review workload. For low stakes projects (small single-purpose HTML+JS tools for example) I'm comfortable not reviewing the code, but if it's software I plan to have other people use I'm not willing to take that risk. I have a stack of neat prototypes and maybe-production-quality features that I can't ship yet because I've not done that review work.
I mainly work as an individual or with one other person - I'm not working as part of a larger team.
I also use it as a final check on all my manually written code before sending it for code review.
With all that said, I have this weird feeling that my ability to quickly understand and write code is no longer noticeable, nor necessary.
Everyone now ships tons of code and even if I do the same without any LLM, the default perception will be that it has been generated.
I am not depressed about it yet, but it will surely take a while to embrace the new reality in its entirety
Professionally, I have had almost no luck with it, outside of summarizing design docs or literally just finding something in the code that a simple search might not find. I am yet to successfully prompt it and get a working commit.
Non-professionally, it's amazing how well it does on a small greenfield task.
A couple "win" examples: add in-text links to every term in this paragraph that appears elsewhere on the page, plus corresponding anchors in the relevant page parts. Or, replace any static text on this page with any corresponding dynamic elements from this reference URL.
Lose examples: constant. Edit format glitches (not matching searched text; even the venerable Opus 4.6 constantly screws this up), unnecessary intermediate variables, ridiculously over-cautious exception-handling, failing to see opportunities to isolate repeated code into a function, or to utilize an existing function that exactly implements said N lines of code, etc.
For throw away code, I might let the agent do some stuff. For example, we needed to test timing on DNS name resolution on a large number of systems to try and track down if that was causing our intermittent failures. I let an agent write that and was able to get results faster than if I did it myself, and I ultimately didn’t have to care about the how… I just needed something to show to the network team to prove it was their problem.
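For anyone curious what a throwaway check like that can look like, here is a minimal sketch, not the script the agent actually wrote. It assumes Python's standard resolver call (socket.getaddrinfo) is a fair stand-in for how the affected systems look names up; the function names time_resolution and summarize are made up for illustration, and OS-level caching will affect the numbers.

```python
import socket
import statistics
import time


def time_resolution(hostname, attempts=5, resolver=socket.getaddrinfo):
    """Return lookup durations in milliseconds; None marks a failed lookup."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolver(hostname, None)
            timings.append((time.perf_counter() - start) * 1000.0)
        except OSError:  # socket.gaierror is a subclass of OSError
            timings.append(None)
    return timings


def summarize(timings):
    """Summarize one host's timings: failure count, median, and slowest lookup."""
    ok = [t for t in timings if t is not None]
    return {
        "failures": len(timings) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

Running summarize(time_resolution(name)) for each hostname on each machine and comparing the failure counts and slowest lookups is the kind of output you could hand to a network team.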
For larger projects that need to plug in to the legacy code base, which I’ll need to maintain for years, I still prefer to do things myself, using AI here and there as previously mentioned to help with little things. It can also help finding bugs more quickly (no more spending hours looking for a comma).
I had an agent refactor something I was making for a larger project. It did it, and it worked, but it didn’t write it in a way that made sense to my brain. I think others on my team would have had trouble supporting it too. It took something relatively simple and added so many layers to it that it was hard to keep all the context in my head to make simple edits or explain to someone else how it worked. I might borrow some of the ideas it had, but will ultimately write my own solution that I think will be easier for other people to read and maintain.
Borrowing some of these ideas and doing it myself also allows me to continue to learn and grow, so I have more tools in my tool belt. With the DNS thing that was totally vibe coded, there were some new things in there I hadn’t done before. While the code made sense when I skimmed through it, I didn’t learn anything from that effort. I couldn’t do anything it did again without asking AI to do it again. Long-term, I think this would be a problem.
Other people on my team have been using AI to write their docs. This has been awful. Usually they don’t write anything at all, but at least then I know they didn’t write anything. The AI docs are simply wrong, 100% hallucinations. I have to waste time checking the doc against the code to figure that out and then go to the person that did it to make them fix it. Sometimes no doc is better than a bad doc.
What works:
-Just pasting the error and asking what's going on here.
-"How do I X in Y considering Z?"
-Single-use scripts.
-Tab (most of the time), although that doesn't seem to be Claude.
What doesn't:
-Asking it to actually code. It's not going to do the whole thing, and even if it does, it will take shortcuts, occasionally removing legitimate parts of the application.
-Tests. Obvious cases it can handle, but once you reach a certain threshold of coverage, it starts producing nonsense.
Overall, it's amazing at pattern matching, but doesn't actually understand what it's doing. I had a coworker like this - same vibe.
To be clear, this is not vibecoding. I have a strong sense of the architecture I want, and explicitly keep Claude on the desired path much like I would a junior programmer. I also insist on sensible unit and E2E test coverage with every incremental commit.
I will say that after several months of this the signalling between UI components is getting a bit spaghettilike, but that would’ve happened anyway, and I bet Claude will be good at restructuring it when I get around to that.
I also work in a giant Rails monolith with 15 years of accumulated cruft. In that area, I don’t write a whole lot, but CC Opus 4.6 is fantastic for reading the code. Like, ask “what are all the ways you can authenticate an API endpoint?” and it churns away for 5 minutes and writes a nice summary of all four that it found, what uses them, where they’re implemented, etc.
"Implement JWT token verification and role checking in Spring Boot. Secure some endpoints with Oauth2, some with API key, some public."
C# and Java are so old, whatever solutions you find are 5 years out of date. Having an agent implement and verify the foundation is the perfect fit. There's no design, just ever-changing framework magic. I'd do the same "Google and debug" cycle, but 10 times slower.
I'm learning all the time and it's fun, exasperating, tremendously empowering and very definitely a new world.
I find it the most exciting time for me as a builder, I can just get more things done.
Professionally, I'm dreading for our future, but I'm sure it will be better than I fear, worse than I hope.
From a toolset, I use the usual, Cursor (Super expensive if you go with Opus 4.6 max, but their computer use is game changing, although soon will become a commodity), Claude code (pro max plan) - is my new favorite. Trying out codex, and even copilot as it's practically free if you have enterprise GitHub. I'm going to probably move to Claude Code, I'm paying way too much for Cursor, and I don't really need tab completion anymore... once Claude Code will have a decent computer use environment, I'll probably cancel my Cursor account. Or... I'll just use my own with OpenClaw, but I'm not going to give it any work / personal access, only access to stuff that is publicly available (e.g. run sanity as a regular user). Playing with skills, subagents, agent teams, etc... it's all just markdowns and json files all the way down...
About our professional future:
I'm not going to start learning to be a plumber / electrician / A/C repair etc, and I am not going to recommend my children to do so either, but I am not sure I will push them to learn Computer Science, unless they really want to do Computer Science.
What excites me the most right now is my experiments with OpenClaw / NanoClaw, I'm just having a blast.
tl;dr most exciting yet terrifying times of my life.
I just got started using Claude very recently. I have not been in the loop on how much better it got. Now it's obvious that no one will write code by hand. I genuinely fear for my ability to make a living as soon as 2 years from now, if not sooner. I figure the only way is to enter the red queen race and ship some good products. This is the positive I see. If I put 30h/week into something, I have productivity of 3 people. If it's a weekend project at 10h/week, I have now what used to be that full week of productivity. The economics of developing products solo have vastly changed for the better.
One of the things that has helped the most is all the documentation I wrote inside the repository before I started using AI. It was intended for consumption by other engineers, but I think Cursor has consumed it more than any human. I've even managed to make improvements not by having AI update it, but asking AI "What unanswered questions do you have based on reading the documentation?" It has helped me fill in gaps and add clarity.
Another thing I've gotten a ton of value with is having it author diagrams. I've had it create diagrams with both the mermaid syntax and AWSDAC (Diagram-as-Code). I've always found crafting diagrams a painstaking process. I have it make a first pass by analyzing my code + configuration, then make corrections and adjustments by explaining the changes I want.
In my own PRs, I have been in the habit of posting my Cursor Plan document and Transcript so that others can learn from it. I've also encouraged other team members to do the same.
I feel bad for any teams that are being mandated to use a certain amount of AI. It seems to me that the only way to make it work is by having teams experiment with it and figure out how to best use it given their product and the team's capacity. AI is like a pair of Wile-E-Coyote rocket skates. It'll get you somewhere fast, but unless you've cleared the road of debris and pointed in exactly the right direction, you're going to careen off a cliff or into a wall.
This fellow is one of the few mature software engineers I have ever met who is rigorously and consistently productive in a very challenging mature code base year in and year out. or WAS .. yes this is from coughgooglecough in California
I have to very much be in the loop and constantly guiding it with clarifying questions but it has made running multiple projects in parallel much easier and has handled many tedious tasks.
Mostly using Gemini Flash 3 at a FAANG.
It works really well (using Claude Code and Opus 4.6 primarily). Incremental changes tend to be well done and mostly one-shotted provided I use plan mode first, and larger changes are achievable by careful planning with split phases.
We have skills that map to different team roles, and 5 different skills used for code review. This usually gets you 90% there before opening a PR.
Adopting the tool made me more ambitious, in the sense that it lets me try approaches I would normally discard because of gaps in my knowledge and expertise. This doesn't mean blindly offloading work, but rather isolating parts where I can confidently assess risk, and then proceed with radically different implementations guided by metrics. For example, we needed to have a way to extract redlines from PDF documents, and in a couple of days went from a prototype with embedded Python to an embedded Rust version with a robust test oracle against hundreds of documents.
I don't have multiple agents running at the same time working on different worktrees, as I find that distracting. When the agent is implementing I usually still think about the problem at hand and consider other angles that end up in subsequent revisions.
Other things I've tried which work well: share an Obsidian note with the agent, and collaboratively iterate on it while working on a bug investigation.
I still write a percentage of code by hand when I need to clearly visualise the implementation in my head (e.g. if I'm working on some algo improvement), or if the agent loses its way halfway through because they're just spitballing ideas without much grounding (rare occurrence).
I find Elixir very well suited for AI-assisted development because it's a relatively small language with strong idioms.
Professionally I hardly use the tools for coding, since I’m in an architecture role and mostly write design docs and do reviews. And I write the occasional prototype.
I have started building tools to integrate copilot (Opus) better with $CORP. This way I can ask it questions across confluence and github.
Leveraging Claude for a project feels very addictive to me. I have to make a conscious effort to stop and I end up working on multiple projects at the same time.
Stack: Go, Python. Team size: 8. Experience: mixed.
I'm using a code review agent which sometimes catches a critical bug humans miss, so that is very useful.
Using it to get to know a code base is also very useful. Questions like 'which functions touch this table' or 'describe the flow of this API endpoint' are usually answered correctly. This is a huge time saver when I need to work on a code base I'm less familiar with.
For coding, agents are fine for simple straightforward tasks, but I find the tools are very myopic: they prefer very local changes (adding new helper functions all over the place, even when such helpers already exist)
For harder problems I find agents get stuck in loops, and coming up with the right prompts and guardrails can be slower than just writing the code.
I also hate how slow and unpredictable the agents can be. At times it feels like gambling. Will the agents actually fix my tests, or fuck up the code base? Who knows, let's check in 5 minutes.
IMO the worst thing is that juniors can now come up with large change sets, that seem good at a glance but then turn out to be fundamentally flawed, and it takes tons of time to review
People having easy access to LLMs makes this job much harder. LLMs can create what looks at the surface like expert-written code, but suffers from below-the-surface issues that will reveal themselves as intermittent issues or subtle bugs after being deployed.
Inexperienced devs create huge commits full of such code, and then expect me to waste an entire day searching for such issues, which is miserable.
If the models don't improve significantly in the future, I expect that most high-stakes software teams will fire all the inexperienced devs and have super-experienced engineers work with the bots directly.
It’s a lot of fun for exploring ideas. I’ve built things very fast that I would not have done at all otherwise. I have rewritten a huge chunk of semi-outdated docs into something useful with a couple of prompts in a day. Claude does all the annoying "dependency update breaks the build" kinds of things. And the reviews are extremely useful and a perfect combination with human review as they catch things extremely well that humans are bad at catching.
But in the production codebase changes must be made with much more consideration. Claude tends to perform well on some tasks, but for others I end up wasting time because I just don’t know up front how the feature must look, so I cannot write a spec at the level of precision that Claude needs, and changing code manually is more efficient for this kind of discovery for me than dealing with large chunks of constantly changing code.
And then there’s the fact that Claude produces things that work and do the thing described in the prompt extremely well, but they are always also wrong in some way. When I let AI build a large chunk of code and actually go through the code, there’s always a mess somewhere that AI review doesn’t see because it looks completely plausible but contains some horrible security issue or complete inconsistency with the rest of the codebase or, you know, that custom yaml parser nobody asked for and that you don’t want your day job to depend on.
I run a small lab that does large data analytics and web products for a couple large clients. I have 5 developers who I manage directly, I write a lot of code myself and I interact directly with my clients. I have been a web developer for long enough to have written code in coldfusion, php, asp, asp.net, rails, node and javascript through microsoft frontpage exports, to jquery,to backbone, angular and react and in a lot of different frameworks. I feel this breadth of watching the internet develop in stages has given me a decent if imperfect understanding of many of the tradeoffs that can be made in developing for the web.
My work lately is on an analytics / cms / data management / gis platform that is used by a couple of our clients, and that we had been developing for a couple of years before any AI was used on it at all. It's a React front end built on react-router-7 that can be SPA or SSR and a Node API server.
I had tried AI coding a couple times over the past few years both for small toy projects and on my work and it felt to me less productive than writing code by hand until this January when I tried Claude Code with Opus 4.5. Since then I have written very few features by hand although I am often actively writing parts of them, or debugging by hand.
I am maybe in a slightly unique place in that part of my job is coming up with tasks for other developers and making sure their code integrates back. I've been doing this for 10 years plus, and personally my success rate with getting someone to write a new feature that will get use is maybe a bit over 50%, and that is maybe generous? Figuring out what to do next in a project that will create value for users is the hard part of my job whether I am delegating to developers or to an AI, and that hasn't changed.
That being said I can move through things significantly faster and more consistently using AI, and get them out to clients for testing to see if they are going to work. It's also been great for tasks which I know my developers will groan if I assign to them. In the last couple months I've been able to
- create a new version of our server that is free from the years of cruft of the monorepo API we use across all our projects.
- implement SQLite compatibility for the server (in addition to the original Postgres support).
- implement local-first sync from scratch for the project.
- test out a large number of optimization strategies, not all of which worked out, but which would have taken me so much longer and been so much more onerous that the cost-benefit ratio of trying them would not have been worth it.
- tons of small features I would have assigned to someone else but are now less effort to just have the AI implement.
I think the biggest plus though is the amount of documentation that has accrued in our repo since starting to use these tools. I find AI is pretty great at summarizing different sections of the code and with a little bit of conversation I can get it more or less exactly how I want it. This has been hugely useful to me on a number of occasions and something I would have always liked to have been doing, but on a small team that is always under pressure to create results for our clients it's something that didn't cross the immediate threshold of the cost-benefit ratio.
In my own use of AI, I keep the bottleneck at my own understanding of the code; it's important to me that I maintain a thorough understanding of the codebase. I could possibly go faster by giving it a longer leash, but that trade-off doesn't seem wise to me at this point, first because I'm already moving so much faster than I was very recently and secondly because it doesn't seem very far from the next bottleneck, which is deciding what is the next useful thing to implement. For the most part, I find the AI has me moving in the right direction almost all the time, but I think this is partly because I am already practiced in communicating to programmers what to implement next and I have a deep understanding of the code base, but also because I spend more than half of the time using AI adding context, plans and documentation to the repo.
I have encouraged my team to use these tools but I am not forcing it down anyone's throat, although it's interesting to give people tasks that I am confident I could finish much quicker and much more to my personal taste than by assigning them. The reactions from my team are pretty mixed: one of the strongest contributors doesn't find a lot of gains from it, one has found similar productivity gains to myself, and others are very against it and hate it.
I think one of the things it will change for me is that I can no longer just create the stories for everyone. Learning how to choose what to work on is going to be the most important skill in my opinion, so over the next couple months I am going to be shifting so everyone on my team has direct client interactions, and I am going to try to shift away from writing stories to having meetings where I help them decide on their own what to work on. Still, part of the reason that I can afford to do this is because I can now get as much or more work done than I was able to with my whole team at this time last year.
That's a big difference in one way, and I am optimistic that the platform I am working on will be a lot better and able to compete with large legacy platforms that it wouldn't have been able to compete with in the past, but still it just tightens the loop of trying new things and getting feedback and the hardest part of the business is still communication with clients and building relationships that create value.
Last year I was working on implementing a pretty big feature in our codebase. It required a lot of focus to get the business logic right, and at the same time you had to be very creative to make this feasible to run without hogging too many resources.
When I was nearly done and working on catching bugs, team members grew tired of waiting and started taking my code from x weeks ago (I have no idea why), feeding it to Claude or whatever, and then came back with a solution. So instead of me finishing my code I had to go through their version of my code.
Each one of the proposals had one or more business requirements wrong and several huge bugs. Not one was any closer to a solution than mine was.
I would have appreciated any contribution to my code, but thinking that it would be so easy to just take my code and finish it by asking Claude was rather insulting.
They simplify discrete tasks. Feature additions, bug fixes, augmenting functionality.
They are incapable of creating good quality (easily expandable etc) architecture or overall design, but that's OK. I write the structs, module layout etc, and let it work on one thing at a time. In the past few days, I've had it:
- Add a ribbon/cartoon mesh creator
- Fix a logical vs physical pixel error on devices where they were different for positioning text, and setting window size
- Fix a bug with selecting things with the mouse under specific conditions
Overall, great tool! But I think a lot of people are lying about its capabilities.
Treat it like an intern, give it feedback, have it build skills, review every session, make it do unit tests. Red green refactor. Spend time up front reviewing the plan. Clearly communicate your intent and outcomes you want. If you say "do x" it has to guess what you want. If you say "I want this behaviour and this behaviour, 100% branch unit tested, adhering to contributing guidelines and best practices, etc" it will take a few minutes longer, but the quality increases significantly.
I uninstalled vscode and built my own dashboard instead that organizes my work. I get instant notifications and have a PR kick off a highly opinionated review utilizing the Claude Code team features.
If you aren't doing this level of work by now, you will be automated soon. Software engineering is a mostly solved problem at this point, you need to embed your best practices in your agent and keep an eye on it and refine it over time.
Important things I've figured out along the way:
1. Enable the agent to debug and iterate. Whatever you'd do to test and verify after you write your first pass at an implementation, figure out a way for an agent to do it too. For example: every API call is instrumented with OpenTelemetry, and the agent has a local collector to query.
2. Make scripts or skills to increase the reliability of fallible multi-step processes that need to be repeated often. For example: getting an oauth token to call some api with the appropriate user scopes for the task.
3. Continually revise your AGENTS.md. I'll often end a coding session by asking the agent whether there's anything from this session that should be captured there. That adds more than it removes, so every few days I'll compact it by having an agent reword the important stuff for conciseness and get rid of anything obvious from implementation.
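To make point 2 concrete, here is a rough sketch of the kind of helper I mean, not my actual script. It assumes you already have some function that performs the real token request; fetch_token here is a hypothetical callable that takes a list of scopes and returns (token, ttl_seconds). The helper only caches the result per scope set and refreshes it shortly before expiry, so the agent can call it repeatedly without redoing the multi-step flow.

```python
import time


class TokenCache:
    """Cache a bearer token per scope set and refresh it shortly before expiry."""

    def __init__(self, fetch_token, clock=time.monotonic, skew_seconds=30):
        # fetch_token is a hypothetical callable: list of scopes -> (token, ttl_seconds)
        self._fetch_token = fetch_token
        self._clock = clock
        self._skew = skew_seconds
        self._cache = {}  # frozenset of scopes -> (token, expiry time)

    def get(self, scopes):
        """Return a cached token for these scopes, fetching a new one if needed."""
        key = frozenset(scopes)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now < cached[1] - self._skew:
            return cached[0]
        token, ttl = self._fetch_token(sorted(key))
        self._cache[key] = (token, now + ttl)
        return token
```

Injecting the clock and the fetch function keeps the helper testable without touching a real identity provider, which also gives the agent something it can verify on its own.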
I have a lot of worry that I will end up having to eventually trudge through AI generated nightmares since the major projects at work are implemented in Java and Typescript.
I have very little confidence in the models' abilities to generate good code in these or most languages without a lot of oversight, and even less confidence in many people I see who are happy to hand over all control to them.
In my personal projects, however, I have been able to get what feels like a huge amount of work done very quickly. I just treat the model as an abstracted keyboard-- telling it what to write, or more importantly, what to rewrite and build out, for me, while I revise the design plans or test things myself. It feels like a proper force multiplier.
The main benefit is actually parallelizing the process of creating the code, NOT coming up with any ideas about how the code should be made or really any ideas at all. I instruct them like a real micro-manager giving very specific and narrow tasks all the time.
This year I grudgingly bit the bullet and began using AI tools, and to my dismay they've been a pretty big boon for me, in this case. Not just for code generation - they're really good at probing the monolith and answering questions I have about how it works. Before I'd spend days poring over code before starting work to figure out the right way to build something or where to break in, pinging people over in India or eastern Europe with questions and hoping they reply to me overnight. AI's totally replaced that, and it works shockingly well.
When I do fall back on it for code generation, it's mostly just to mitigate the tedium of writing boilerplate. The code it produces tends to be pretty poor - both in terms of style and robustness - and I'll usually need to take at least a couple of passes over it to get it up to snuff. I do find this faster than writing everything out by hand in the end, but not by a lot.
For my personal projects I don't find it adds much, but I do enjoy rubber ducking with ChatGPT.
In my day job I’m currently a PM/operations director at a small company. We don’t have programmers. I have used AI to build about 12 internal tools in the past year. They’re not very big, but provide huge productivity gains. And although I do not fully understand the codebase, I know what is where. Three of these tools I’m now recreating based on our usage and learnings.
I have learned a ton about all kinds of development concepts in a ridiculously short timeframe.
When it comes to personal projects I'm feeling extremely unmotivated. Things feel more in reach and I've probably built ten times the number of throwaway projects in the past year than I have in previous years. Yet I feel no inspiration to see those projects through to the end. I feel no connection to them because I didn't build them. I have a feeling of 'what's the point' publishing these projects when the same code is only a few prompts away for someone else too. And publishing them under my name only cheapens the rest of my work which I put real cognitive effort into.
I think I want to focus more on developing knowledge and skills moving forward. Whatever I can produce with an LLM in a few hours is not actually valuable unless I'm providing some special insight, and I think I'm coming to terms with that at the moment.
- a web-based app for a F500 client for a workflow they’ve been trying to build for 2 years; won the contract
- built an iPad app for same client for their sales teams to use
- built the engineering agent platform that I’m going to raise funding for
- a side project to do rough cuts of family travel videos (https://usefirstcut.com, soft launch video: https://x.com/xitijpatel/status/2026025051573686429)
I see a lot of people in this thread struggling with AI coding at work. I think my platform is going to save you. The existing tools don’t work anymore, we need to think differently. That said, the old engineering principles still work; heck, they work even better now.
So now a lot of different parts of the company are trying to replicate their workflow. The process is showing what works: you need to have AI-first documentation (a readme with one line for each file to help manage context) and to develop skills and steering docs for your codebase, code style, etc. And it mostly works!
For me personally, it has drastically increased productivity. I can pick up something from our infinitely huge backlog, provide some context and let the agent go ham on fixing it while I do whatever other stuff is assigned to me.
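As a rough illustration of the "readme with one line for each file" idea, here is a minimal Python sketch. It is not the poster's actual tooling: the `summary:` tag convention, the `build_index` and `file_summary` names, and the list of file suffixes are all assumptions made up for this example.

```python
from pathlib import Path

# Hypothetical convention (not from the post): each source file carries a
# "summary: ..." note somewhere in its first few lines.
SUMMARY_TAG = "summary:"
PLACEHOLDER = "(no summary yet)"


def file_summary(path: Path, max_lines: int = 5) -> str:
    """Return the text after SUMMARY_TAG in the first max_lines lines, if any."""
    try:
        with path.open(encoding="utf-8", errors="replace") as fh:
            for _, line in zip(range(max_lines), fh):
                _, found, rest = line.partition(SUMMARY_TAG)
                if found and rest.strip():
                    return rest.strip()
    except OSError:
        pass  # unreadable file: fall through to the placeholder
    return PLACEHOLDER


def build_index(root: Path, suffixes=(".py", ".sql", ".ts")) -> str:
    """Build one markdown bullet per matching file under root."""
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            lines.append(f"- `{rel}`: {file_summary(path)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print an index for the current directory; paste it into a README.
    print(build_index(Path(".")))
```

Files without a summary show up with a placeholder, which doubles as a to-do list for whoever maintains the docs.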
I am a data engineer maintaining a big data Spark cluster as well as a dozen Postgres instances - all self hosted.
I must confess it has made me extremely productive if we measure in terms of writing code. I don't even do a lot of special AGENTS.md/CLAUDE.md shenanigans, I just prompt CC, work on a plan, and then manually review the changes as it implements it.
Needless to say this process only works well because: A) I understand my code base. B) I have a mental structure of how I want to implement it.
Hence it is easy to keep the model and me in sync about what's happening.
For other aspects of my job I occasionally run questions by GPT/Gemini as a brainstorming partner, but it seems a lot less reliable. I only use it as a sounding board. It does not seem to make me any more effective at my job than simply reading documents or browsing GitHub issues/Stack Overflow myself.
- Think about requirement
- Spend 0-360 minutes looking through the code
- Start writing code
- Realize I didn't think about it quite enough and fix the design
- Finish writing code
- Write unit tests
- Submit MR
- Fix MR feedback
Until recently no LLM was able to properly disrupt that, but the release of Opus 4.5 changed things.
Now my workflow is:
- Throw as much context into Opus as possible about what I want in plan mode
- Spend 0-60 minutes refining the plan
- Have Opus do the implementation
- Review all the code and nitpick small things
- Submit MR
- Implement MR feedback
I'm building out large multi-repo features in a 60 repo microservice system for my day job. The AI is very good at exploring all the repos and creating plans that cut across them to build the new feature or service. I've built out legacy features and also completely new web systems, and also done refactoring. Most things I make involve 6-8 repos. Everything goes through code review and QA. The code being created is not slop: it is high quality and passes reviews as such. Any pushback I get goes back into the docs, and next time round those mistakes aren't made.
I did a demo of how I work in AI to the dev team at Math Academy, who were complete skeptics before the call; 2 hours later they were converts.
My prompts tend to be in the pattern of "I am looking to implement [...]"
These days I'm on Claude Code, and I do that first part in Plan mode, though even a few months ago on earlier, not-as-performant models and tools, I was still finding value with this approach. It's just getting better, as the company is investing in shared skills/tools/plugins/whatever the current terminology is that are specific to various use cases within the code base.
I haven't been writing so much code directly, but I do still very much feel that this is my code. My sessions are very interactive -- I ask the agent to explain decisions, question its plans, review the produced code and often revise it. I find it frees me up to spend more time thinking through and applying higher level architecture instead of spending frustrating hours hunting down more basic "how does this work" information.
I think it might have been an article by Simon Willison that made the case for there being a way to use AI tooling to make you smarter, or to make you dumber. Pointing, shooting and blindly accepting output makes you dumber -- it places more distance between you and your code base. Using AI tools to automate away a lot of the toil gives you energy and time to dive deeper into your code base and develop a stronger mental model of how it works -- it makes you smarter. I keep in mind that at the end of the day, it's my name on the PR, regardless of how much Claude directly created or edited the files.
I find the most use from it as a search engine the same way I’d google “x problem stackoverflow”.
When I was first tasked with evaluating it for programming assistance, I thought it was a good “rubber duck” - but my opinion has since changed. I found that if I documented my goals and steps, using it as a rubber duck tended to lead me away from my goals rather than refine them.
Outside of my role they can be a bit more useful and generally impressive when it comes to prompting small proof of concept applications or tools.
My general take on the current state of LLMs for programming in my role is that they are like having a junior engineer that does not learn and has a severe memory disorder.
I have also done the agentic thing and built a full CLI tool via back-and-forth engagement with Claude and that worked great - I didn't write a single line of code. Because the CLI tool was calling an API, I could ask Claude to run the requests it was generating and adjust based on the result - errors, bad requests etc, and it would fairly rapidly fix and coalesce on a working solution.
After I was done though, I reckon that if instead of this I had just done the work myself I would have had a much smaller, more reliable project. Less error handling, no unit tests, no documentation sure, but it would have worked and worked better - I wouldn't need to iterate off the API responses because I would have started with a better contract-based approach. But all of that would have been hard, would have required more 'slow thinking'. So... I didn't really draw a clean conclusion from the effort.
Continuing to experiment, not giving up on anything yet.
Some senior people who were in the AI pilot, have been using this for a while, and are very into it claimed that it can open PRs autonomously with minimal input or supervision (with a ton of MD files and skills in repos with clear architecture standards). I haven't been able to replicate this yet.
I'm objectively happy to have access to this tool, it feels like a cheat code sometimes. I can research things in the codebase so fast, or update tests and glue code so quickly that my life is objectively better. If the change is small or a simple bugfix it can truly do it autonomously quicker than me. It does make me lazier though, sometimes it's just easier to fire up claude than to focus and do it by myself.
I'm careful not to overuse it, mostly so I don't reach the monthly cap and can "keep it" if something urgent or complex comes my way. Also I still like to do things by hand just because I still want to learn and maintain my skills. I feel that I'm not learning anything by using claude, that's a real thing.
In the end I feel it's a powerful tool that is here to stay and I would be upset if I didn't have access to it anymore, it's very good. I recently subscribed to it and use it in my free time just because it's a very fun technology to play with. But it's a tool. I'm paid because I take responsibility that my work will be delivered on time, working, tested, with code on par with the org quality standards. Whether I do it by hand or with claude is irrelevant. If I can do it faster it will likely mean I will receive more work to do. Somebody still has to operate Claude and it's not going to be non-technical people for sure.
I genuinely think that if anyone still believes today that this technology is only hype or a slop machine, they are in denial or haven't tried to use a recent frontier model with the correct setup (mostly giving the agent a way to autonomously validate its changes).
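To make "a way to autonomously validate its changes" a bit more concrete, here is a minimal Python sketch of a single check script an agent could be told to run after every edit. It is only one possible setup, not the poster's: the `run_checks` and `report` names are made up for this example, and the pytest command in the usage part is an assumption about the project.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each (name, argv) check and return (name, passed, output) tuples."""
    results = []
    for name, argv in checks:
        proc = subprocess.run(argv, capture_output=True, text=True)
        output = (proc.stdout + proc.stderr).strip()
        results.append((name, proc.returncode == 0, output))
    return results


def report(results):
    """Summarize results as one PASS/FAIL line per check."""
    return "\n".join(f"{'PASS' if ok else 'FAIL'} {name}" for name, ok, _ in results)


if __name__ == "__main__":
    # Assumed project setup: tests run with pytest. Swap in your own commands.
    checks = [("tests", [sys.executable, "-m", "pytest", "-q"])]
    results = run_checks(checks)
    print(report(results))
    # A non-zero exit code tells the agent (or CI) that something failed.
    sys.exit(0 if all(ok for _, ok, _ in results) else 1)
```

The point of a single entry point like this is that the agent gets an unambiguous pass/fail signal plus the raw output to iterate on, instead of guessing whether its change worked.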
- create unit tests and benchmark tests that required lots of boilerplate and fixtures
- add CI / CD to a few projects that I didn't have motivation to
- freshen up old projects to modern standards (testing, CI / CD, update deps, migrations/deprecations)
- add monitoring / alerting to 2 projects that I had been neglecting. One was a custom DNS config uptime monitor.
- automated backup tools along with scripts for verifying recovery procedure.
- moderate migrations for deprecated APIs and refactors within cli and REST API services
- auditing GCP project resources for billing and security breaches
- frontend, backend and offline tiers for cloud storage management app
https://burakku.com/blog/tired-of-ai-coders/
I think the addendum to that is that I've since left.