Make no mistake: by creating AI and nearing AGI, we have kicked off a sort of runaway evolutionary phenomenon of new entities whose objective is to grow (obtain more hardware) and consume energy (train). AI amalgamated with humans will become so advanced that it will find new ways to generate energy and new ways to exploit the energy we already have (fossil fuels). The insights we gain from it will be unprecedented and dangerous, and will allow those with the most money to become scarily intelligent.
In turn, this intelligence will be advanced enough to warp and magnify the human instinct of greed embedded within it, producing a ruthless, runaway growth that will likely wipe out humanity through its energy requirements.
The only choice for devs will be to either become lieutenants directing this artificial army or rely on handouts. That is, if the system doesn't kill you first.
I have my doubts that the people who own AGI would allow the rest of us to make money from it, when they could make that money themselves instead.
Plumbers, retail workers, roofers, painters (etc etc etc) would like to have a word with you.
On a related note, maybe an hour ago, after googling for hours and bleeding my eyes on U-Boot code, I asked Copilot whether it is possible to specify an image target by partition label for non-UBIFS storage. Copilot gave me a very confident yes, complete with a code snippet. The snippet had invalid syntax, and after a bit more digging through the code I found that the answer is no.
I wouldn't hold my breath waiting for AGI.
Big Capital will decide what happens when labor has no value anymore.
UBI, as utopian and silly as it is, is way more realistic and tangible than AGI.
Humans never make it through the Great Filter.
We are for sure further away in time from any real AGI than we are from a moon landing, most likely further away than from the Roman Empire. To think about it, let alone worry about it, is a fool's game.
It is so much more likely that we just wipe out the planet with nukes than that we develop a real AGI.
The black box that lets non-technical folk type in business requirements and generate an end-to-end application is still very much an open research question. Getting 70% on SWE-bench is an absolute accomplishment, but have you seen the problems?

1. These aren't open-ended requests like "here's a codebase, implement x feature and fix y bug." They're issues with detailed descriptions, written by engineers, evaluated against a set of unit tests (rough sketch of that evaluation below). Who writes those descriptions? Who writes the unit tests that verify whatever the LLM generated? Software engineers.

2. OpenAI had a hand in designing the benchmark, and part of the changes they made included improving the issue descriptions and refining the test sets [1]. Do you know who made those improvements? "Professional software developers."

3. Issues were pulled from public, popular open-source Python projects on GitHub. These repos have almost certainly found their way into the model's training set, and it doesn't strike me as unlikely that the issues and their solutions ended up in the training set too.

I'm a lot more curious about how well o3 performs on the Konwinski Prize, which tries to fix the dataset-tainting problem.
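To make point 1 concrete, here's roughly what "evaluated against a set of unit tests" means. This is a simplified sketch, not the actual SWE-bench harness; the repo path and test ID in the usage comment are made-up placeholders.

    import subprocess

    def evaluate_patch(repo_dir: str, model_patch: str, tests: list[str]) -> bool:
        """Apply a model-generated patch, then run the engineer-written tests.

        Simplified sketch of a SWE-bench-style check, not the real harness.
        The ground truth is still a set of unit tests that a human wrote.
        """
        # Apply the candidate patch to a clean checkout of the repo.
        subprocess.run(["git", "apply", "-"], input=model_patch.encode(),
                       cwd=repo_dir, check=True)

        # The issue counts as "resolved" only if the human-authored tests pass.
        result = subprocess.run(["python", "-m", "pytest", *tests], cwd=repo_dir)
        return result.returncode == 0

    # Hypothetical usage; repo path and test ID are placeholders.
    # resolved = evaluate_patch("/tmp/some-checkout", patch_text,
    #                           ["tests/test_votable.py::test_roundtrip"])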
The proposed solution to the "who writes the specs and the tests" problem is to just throw another AI system at it: one that converts ambiguous business requirements and bug reports into a formal spec and writes unit tests to act as a verifier. That is a non-trivial problem, and reasoning-style models like o1 degrade in performance when given imperfect verifiers [2]. I can't find any widely used benchmarks that check how good LLMs are at verifying LLM-generated code. I also can't find any that check end-to-end performance on prompt -> app problems, I'm guessing because that would require a lot of subjective human feedback that you can't automate the way you can unit tests.
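For what it's worth, the shape of that proposal is easy to sketch; the catch is that the verifier is itself a model. Everything below is a hypothetical stand-in for LLM calls, not any real API:

    from typing import Callable

    def spec_then_verify(requirement: str,
                         write_spec: Callable[[str], str],
                         write_code: Callable[[str], str],
                         looks_correct: Callable[[str, str], bool],
                         max_rounds: int = 3) -> str:
        """Hypothetical "AI verifies AI" pipeline, for illustration only.

        All three callables stand in for LLM calls. If looks_correct() is an
        imperfect, model-based judge, the whole loop inherits that imperfection:
        it can accept wrong code or reject working code.
        """
        spec = write_spec(requirement)      # ambiguous request -> formal-ish spec
        code = write_code(spec)             # spec -> candidate implementation
        for _ in range(max_rounds):
            if looks_correct(spec, code):   # the verifier is just another model
                return code
            code = write_code(spec)         # retry after a rejection
        return code                         # give up; a human still has to check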
LLMs (augmented by RL and test-time compute, like o3) are getting better at the things I think they're already pretty good at: given a well-defined problem that can be iteratively reviewed/verified by an expert (you), coming up with a solution. They're not necessarily getting better at everything that would be necessary to fully automate knowledge jobs. There could be a breakthrough tomorrow in using AI for verification/spec generation (in which case we, and most everyone else, are well and truly screwed), but until that happens the current trajectory seems to be AI-assisted coding.
Software engineering will be about using your knowledge of computing to translate vague, ambiguous feature requests and bug reports from clients/management into detailed, well-specified problem statements, and about designing tests to act as ground truth for an AI system like o3 (and beyond) to solve against. Basically test-driven development on steroids :) There may indeed still be layoffs, or maybe we run into Jevons paradox and there's another explosion in the amount of software that gets built, necessitating engineers who are good at using LLMs to solve problems.
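In that world the engineer's main deliverable looks less like an implementation and more like a spec pinned down as tests. A toy example: the billing module and apply_discount function below are invented, and the model's job would be to write them until this file is green.

    # Hypothetical spec-as-tests for a vague "give premium users a discount" request.
    # The engineer turns the ask into unambiguous, executable cases like these.
    import pytest
    from billing import apply_discount  # module the AI system is asked to produce

    def test_premium_users_get_ten_percent_off():
        assert apply_discount(price=100.0, tier="premium") == 90.0

    def test_free_tier_pays_full_price():
        assert apply_discount(price=100.0, tier="free") == 100.0

    def test_negative_prices_are_rejected():
        with pytest.raises(ValueError):
            apply_discount(price=-1.0, tier="premium")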
However, if the worst comes to pass, my plan is to find an apprenticeship as quickly as possible. I've seen the point that an overall reduction in white-collar work would also reduce opportunities in the trades, but I doubt mass layoffs would occur in every sector all at once. Other industries are more highly regulated and have protectionist tendencies that would slow the adoption of AI automation (law, health, etc.). Software has the distinct disadvantage of being low-regulation (there aren't many laws that require high-quality code outside of specialized domains like medtech) and of having a culture of individualism that deters attempts at collective bargaining. We also literally put our novel IP up on a free, public, easily indexable platform under permissive licenses. We probably couldn't make it easier for companies to replace us.
So while at least some knowledge workers still have their jobs, there's an opportunity to put food on the table by doing their wiring, pipes, etc. The other counterargument is that improvements in embodied AI, i.e. robotics, will render manual labor redundant. The question isn't whether we will have the tech (we will); it's whether the average person is going to be happy letting a robot armed with power tools into their home, and how long it will take for said robot to be cheaper than a human.
[1] https://openai.com/index/introducing-swe-bench-verified/
[2] https://arxiv.org/abs/2411.17501
So they're pretty good in applications where you have to produce the most sensible thing.
For example, in any kind of triaging (Tier 1 customer support, copywriting, qualifying leads, setting up an npm project), the best thing to do most likely falls smack in the middle of the distribution curve.
They're not good for things where you want the optimal outcome.
Now, there will be abstractions that close the feedback loop and tell the LLM "this is not optimal, refine it." But someone still has to build that feedback loop. Right now, RLHF is how most companies approach it.
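Stripped down, that loop is easy to write; the part someone still has to build is the score() function (a benchmark, a profiler, a linter, a human rating). Both callables below are illustrative placeholders, not any real product's API:

    from typing import Callable

    def refine(generate: Callable[[str], str],
               score: Callable[[str], float],
               prompt: str,
               good_enough: float,
               max_iters: int = 5) -> str:
        """Generic "this is not optimal, refine it" loop (illustrative only)."""
        best = generate(prompt)
        best_quality = score(best)
        for _ in range(max_iters):
            if best_quality >= good_enough:
                break
            # Feed the measured shortfall back into the next attempt.
            prompt += f"\nPrevious attempt scored {best_quality:.2f}; improve it."
            candidate = generate(prompt)
            quality = score(candidate)
            if quality > best_quality:
                best, best_quality = candidate, quality
        return best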
But that capital G, generalizability, is what makes AI → AGI.
LLMs are really good exoskeletons for human thought and autonomous machines for specialized tasks. I'm not saying this will always be the case, but everyone who's saying AGI is around the corner has a funding round coming up ¯\_(ツ)_/¯