HACKER Q&A
📣 pera

If LLMs are so useful, why haven't we seen any spike in productivity?


If LLMs actually do help engineers become significantly more productive, what could explain the following in, for instance, the open source community:

- We are not fixing bugs faster

- We are not developing features faster

- We haven't seen an explosion of new projects

- We haven't seen an explosion of vulnerabilities being discovered

Maybe I am missing something, but to me everything looks the same (except for an increasing number of useless customer service chatbots and garbage LLM-generated books on Amazon).

Edit: Unfortunately this submission was demoted for some reason but thanks for all the comments.


  👤 proc0 Accepted Answer ✓
Writing code is the easiest part of the process (relatively speaking). Figuring out the requirements, working with stakeholders to drive consensus, and understanding user needs make up the bulk of the work.

LLMs will certainly lower the entry barrier for new programmers, and might also create a new solopreneur economy because of it. Now non-technical people with ideas can start prototyping and raise money, but they would soon need engineers to grow the product.


👤 sixhobbits
Technology spreads slowly. Google Docs is an instant 50x productivity increase for many legal processes, and yet a few years ago I saw an advocate's mind blown by a simple demonstration of two people simultaneously editing the same affidavit.

For him, the norm is still to redline a document on paper, have his secretary add those changes to the original digital document, and send that over to the opposing team for the same treatment.

I don't have strong opinions about LLMs' coding ability (though compared to the other comments so far I am more on the "LLMs are pretty good at creating software from natural language descriptions" side) but even assuming that LLMs can give programmers a 50x productivity increase, I'd assume it would take 10-50 years for industry and processes to evolve to take advantage of that increase.


👤 leshokunin
Kudos for raising an empirical point rather than looking at the aspirations of the tech. It's hard to take that kind of view.

The jury's still out. It will take time until we have enough post-mortems to tell whether it is doing the job and how it's affecting things.

I do agree that if it were so good, we'd see practical applications in more meaningful ways than just anecdotal tricks or lots of low-quality content.


👤 dewey
Turns out the bottleneck of engineering isn't related to what goes on in the editor.

👤 Tepix
What statistics are you referring to when making these claims?

Only around 20% of the repositories GitHub hosts are public. Perhaps open source developers are less likely to pay for GitHub Copilot out of their own pocket?

Why do you expect "an explosion of new projects" from perhaps a 20% increase in productivity? What percentage of open source developers are using LLMs for increased productivity when working on open source? If it's merely 20%, we'd see a 4% aggregate increase, which is hardly noticeable.
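That back-of-the-envelope estimate is just the adoption rate multiplied by the individual gain; as a sketch (the 20% figures are the comment's hypotheticals, not measured data):

```python
adoption = 0.20         # hypothetical share of open source devs using LLMs
individual_gain = 0.20  # hypothetical productivity boost for those who do

# Aggregate effect across the whole community.
aggregate_gain = adoption * individual_gain
print(f"{aggregate_gain:.0%}")  # → 4%
```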


👤 dave4420
My employer pays GitHub ~£10/month for me to use GitHub’s copilot. This is tiny compared to what they pay me.

It unlocks a small amount of extra productivity, but not that much. Yet still enough to be worth it.

My position is that they are useful but not massively useful, yet.


👤 Mc91
LLMs have been getting better - they were all pretty poor for my programming purposes a year or so ago, recently Perplexity (even the non-Pro version) and GPT4 have been helpful, and 4o is even better. I have been posting Leetcode hard problems into 4o and getting sensible outputs, something I didn't even try previously. Sometimes I do have to have it go through a few iterations, and I give it various qualifications (like keep to such-and-such time and space complexity or better). My usual instruction is to make the class or function more and more compact while keeping to the same functionality and time/space complexity.

I got 4o to give me a 33 line, relatively simple and understandable bidirectional BFS Kotlin function for this Leetcode problem which Perplexity (non-Pro) and GPT4 could solve, but not as well as 4o - https://leetcode.com/problems/word-ladder
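For reference, the bidirectional-BFS technique that comment describes looks roughly like this. This is my own Python sketch of the same approach, not the Kotlin code 4o produced:

```python
from string import ascii_lowercase

def ladder_length(begin: str, end: str, word_list: list[str]) -> int:
    """Length of the shortest word-ladder from begin to end, or 0 if none.

    Bidirectional BFS: always expand the smaller frontier, so the two
    searches meet in the middle instead of one side exploring everything.
    """
    words = set(word_list)
    if end not in words:
        return 0
    front, back = {begin}, {end}
    steps = 1
    while front and back:
        if len(front) > len(back):   # grow the cheaper side
            front, back = back, front
        next_front = set()
        for word in front:
            for i in range(len(word)):
                for c in ascii_lowercase:
                    cand = word[:i] + c + word[i + 1:]
                    if cand in back:          # frontiers meet
                        return steps + 1
                    if cand in words:
                        words.discard(cand)   # mark visited
                        next_front.add(cand)
        front = next_front
        steps += 1
    return 0
```

On the classic example ("hit" to "cog" through ["hot","dot","dog","lot","log","cog"]) this returns 5, matching the expected Leetcode answer.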

Of course, even though these are Leetcode hard problems, they are well-defined and relatively self-contained. I work at a Fortune 100 company and 99% of the time I can pound out the CRUD I do in my sleep. The difficulties I encounter are distractions: the CI server having some problem, the ticket/story I am working on not being fully specified while the PM is MIA that day, all teams working on the feature at the same time so I need to find out which feature flags to set and which test headers have been agreed on, or the PM asking me to work on something where some of what he says does not make sense in context so I have to ask for clarification. Then there's the meta-game of knowing what to prioritize, with one important component being what will make my manager happy so I get a good yearly review; what I need to prioritize may differ from what my PM says to prioritize, or, even more complexly, from what my manager says to prioritize but doesn't really mean.


👤 j0hnyl
I actually think all of the things you listed are absolutely happening.

👤 returnInfinity
YouTube and Instagram are flooded with AI-generated content.

I believe AI will be useful in game dev: AI voice acting and AI face generation, so all the NPCs can be unique. Possibly AI layout generation too.

I don't think using AI to generate scripts is a great use case. It can be used to generate ideas, but we still need human creativity to make great games.


👤 tbrownaw
> significantly more productive

Try a couple of percent. More if you type slowly (magic autocomplete), more if you're doing something where you need to search Q&A forums a lot.


👤 viraptor
Apart from a few things already mentioned, I don't believe developers are really trying to engage with LLM-assisted development. Some are completely opposed to it ethically, some think it's not good enough and don't try, some don't care, and some tried earlier versions and were left unsatisfied. Quite a few people use Copilot, but that's the most basic and simple approach.

I don't personally know anyone trying to use fancier tools like agents or IDE-integrated helpers. They're not perfect by any means, and you actually need to learn how to use them well, but the difference is massive. I've definitely saved some hours when developing smaller-scope tools. It's not a time saving that would drastically change my total productivity, but it exists, and it's going to increase in the future. And it requires an upfront investment into the tooling and learning that few people seem interested in.

But even given current issues, how can you tell there hasn't been an improvement? How would you be able to tell across all the open source in the world?


👤 me_here_alone
Many big companies are investing in AI to replace people, not fix problems. Try getting anything internal from a human anymore; it's all internal bots. It's easier to sell when you can say "We can replace 90% of your HR department" than "It will help you find bugs and develop features faster". I'm a bit cynical, but I see it happening every day.

👤 spacebanana7
What data are you using for those metrics? A 5% improvement in the time to fix bugs or develop features might not be immediately obvious.

👤 thiht
As a developer, I do try to use LLMs in my daily work, and I get a slight productivity boost, but I wouldn’t say it’s a spike. It helps me write tests faster, it helps me write pure functions faster, and it helps me autocomplete some documentation faster. It also helps me go faster with 1-time fixes, for db data migration queries or stuff like that.

That said, LLMs in code editors come with a kind of "hyperactivity" which I find really unpleasant. They're too "in your face", make the code move a lot, and sometimes make it a bit harder to focus than working without an LLM. They can also be extremely frustrating and result in productivity loss, for example when they generate code that's slightly wrong and you need to take time to fix it. That's harder than just writing the code yourself.


👤 halfcat
Bottlenecks most often stem from lack of clarity.

The business doesn’t have clarity on what they are trying to achieve. Or they don’t have clarity on what’s important, and constantly change priority (and both of these can cause the most talented engineer to spin their wheels).

LLMs can help you gain clarity, the same way a coach, consultant, or therapist can help you work through a scenario. But they're only as effective as the work you're willing to put into that endeavor.

So it comes down to:

* Nothing has changed regarding the nature of human work ethic

* Most people don't want to be a programmer. The idea that "everyone's a programmer now" is no different than saying "everyone's a carpenter now" because power tools exist. Most people don't want to do that kind of work and are happy to pay someone else to do it.


👤 twojacobtwo
Profits? I mean, I'm guessing the trend can't be wholly disconnected from the massive layoffs in the tech sector over the last year or so. Probably not the primary driver, but companies are always looking to maximize profits and, in the short-term, what faster way is there than by making cuts?

If a business sees a 15% productivity boost coming, especially with no easy plan in place to utilize it fully for equivalent profit, someone near the top is already thinking that quick cuts could be an immediate 15% increase in reported profits next quarter (in a 1:1 scenario).

I'm being a bit simplistic, but I think the general idea of business maximizing profits over output stands (or easy short-term thinking over more difficult long-term planning).


👤 adpirz
- Some early studies have shown modest gains (can try and link later)

- It’s still very early. LLMs have only been publicly available for 2 years, copilots a little less than that.

- Usage is mostly anchored on cold starts, i.e. creating something from scratch. Leveraging LLMs in existing, mature codebases is definitely going to pick up.

- The majority of devs aren’t really using these tools or using them to their full ability. It takes a lot of fiddling to understand the limits and strengths, but when you do, you basically stop writing code and write more prose.

I will be surprised if, in ten years, even a quarter of your keyboard inputs go toward code directly rather than directing your friendly coding robot.


👤 franciscop
How do you expect to "see" those? I have, for example, started using LLMs (in a limited manner) to help me write TS definitions for my OSS projects. I've also used generative AI to create art for several of my projects. But those haven't "unlocked" 10x productivity; they have just given me a bit more free time and fewer headaches.

But I still love programming and will mostly continue to do so when it's for fun, which is most of my OSS. For me it's like saying "why do you do woodworking when you can outsource it to some Chinese shop?" when it defeats the point.


👤 grahamj
The technology is still in its infancy, and people's ability to harness it even more so.

Patience.


👤 n_ary
There are indeed productivity gains, but they are scattered and hard to quantify.

Here are some significant productivity gains I get from Mistral/Phind/ChatGPT/office-internal-llm daily.

- throw a messy shell script at it and ask it to refactor (works ~80% of the time)

- paste a sample XML/JSON/YAML and ask it to generate the class/struct (code generation)

- ask questions and get an immediate response with examples better suited to my needs (previously this took time going into SO/Reddit/SE etc. and scrolling through several posts, docs, or even wasting time reading blog spam)

- ask questions about a specific topic and get an immediate response with citations (this is an in-house trained model) instead of fighting with broken search or an ocean of messy documents in Confluence/Notion/GitLab Pages and what not

- rubber duck when brainstorming a problem (it can sometimes lead to interesting outcomes)

- have it prepare a bash script to do something, which I then modify/correct/refine to fit my needs

- ask questions about trivial stuff

- generate boilerplate

- generate a throwaway project to try something fast

- convert from one language to another (I need to work with different teams using different languages such as TS/Java/C++/Scala/Python/Shell/Rust/Erlang etc.)

- write a polite email (or a response to one) which I can copy-paste and send when I am too occupied with something else

- document a specific feature of something that would otherwise take a lot of digging in the original docs

- generate a pure, self-contained HTML/CSS prototype to send to our UI/UX team to give them an idea of a particular concept

- summarize a large block of text into bullet form (useful for presentations)

- get summaries of popular books (because ChatGPT has indeed trained on a lot of them somehow!)

- translate text to another language (works well when it does, but still needs some corrections)

Most of these activities save me a lot of time which would previously need some big time investments.
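As a concrete illustration of the code-generation bullet above: given a small JSON sample, the generated class typically looks something like this (the payload and field names here are hypothetical, invented for illustration):

```python
import json
from dataclasses import dataclass

# The kind of sample payload you might paste into the prompt.
SAMPLE = '{"id": 42, "name": "sensor-a", "readings": [1.5, 2.0]}'

@dataclass
class Sensor:
    """Typed container inferred from the JSON sample's shape."""
    id: int
    name: str
    readings: list[float]

    @classmethod
    def from_json(cls, raw: str) -> "Sensor":
        data = json.loads(raw)
        return cls(
            id=data["id"],
            name=data["name"],
            readings=[float(x) for x in data["readings"]],
        )

sensor = Sensor.from_json(SAMPLE)
```

Hand-writing this is easy but tedious, which is exactly why it's a good fit for an LLM: the shape is fully determined by the sample, so there's little room for it to hallucinate.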


👤 pogue
According to Gartner, there is increased speed on tasks. However, they claim that workers will use that time for leisure instead of working harder, which they term "productivity leakage". So much for AI making our lives easier, right?

Source:

https://www.theregister.com/2024/09/09/gartner_synmposium_ai...


👤 kledru
My personal experience suggests that

* we are not developing features faster, but we have time for asking questions we had no time to ask before, leading to more ambitious architectures and designs.

* we are fixing bugs faster and producing fewer bugs (because of better designs).

* not everybody is happy about fewer bugs.

* we discover more vulnerabilities. Again, not everybody is happy about that; they just want new features, not new knowledge of vulnerabilities and technical debt.


👤 CuriouslyC
On the B2C side, there's a disconnect because the app developer is typically on the hook for inference costs. This incentivizes them to cut corners to minimize that cost, rather than use AI creatively with high-quality models. The apps that let users bring their own keys are much more innovative in this space, but the friction involved in transferring keys keeps most apps from adopting that approach.

👤 bluGill
They are useful in niches. I got mad at Copilot suggesting the wrong thing and switched to an IDE without it the other day. AST-based code completion works much better; Copilot gave answers that looked good but were wrong. Worse, I needed to hit tab in one place and it kept completing something wrong instead of just inserting the tab I needed there.

👤 glimshe
Speak for yourself. I'm coding at least 30% faster than before because of LLMs. I don't use them for generating code, but as a reference and for brainstorming ideas. Your expectations are perhaps too high in wanting to see an "explosion", but the productivity increase was very clear in my case.

👤 throwawa14223
I think you're correct. The positives from LLMs are an illusion, but the drawbacks are real and insidious.

👤 fire_lake
The productivity studies that most people cite were done by GitHub, which has an obvious agenda to promote Copilot. The productivity gains are very marginal and much less than claimed.

👤 alphabettsy
Writing code has never been the barrier to productivity for me. It’s all the other businesses and development processes plus distractions.

👤 patagnome
Taking the web outside company walls as the context for empirical measurement:

1. is the web becoming more [accessible](https://abilitynet.org.uk/news-blogs/inaccessible-websites-k... http://useragentman.com/wcag-wishlist/)?

2. are the web pages getting [faster](https://www.nngroup.com/articles/the-need-for-speed/) and lighter?

3. is it righting wrongs about existing non-performant [code](https://www.webperf.tips/tip/cached-js-misconceptions/)?

4. is it encouraging [smaller](https://dyf-tfh.github.io/)?

5. is it promoting historical [insights](https://qntm.org/clean)?

6. is it popping [bubbles](https://www.youtube.com/watch?v=Y7YAXUWG820)?

7. is it encouraging the correct interpretations of actual [innovators](https://mamund.site44.com/articles/objects-v-messages/index....)?

8. is it minimizing or eliminating [traps](https://www.gnu.org/philosophy/javascript-trap.html)? (also see the W3C's Web Sustainability Guideline's on javascript fallbacks)

9. is it avoiding the "[wars](https://tanzu.vmware.com/content/blog/framework-wars-now-all...)"?

10. is it shedding the object-form? https://dreamsongs.com/ObjectsHaveFailedNarrative.html


👤 heavyset_go
Just wait until we release the next model!

👤 orph
Everyone is a decent programmer now who can solve nearly any problem with help from an LLM.