I just don't understand what other people are seeing. I've mainly used Claude and ChatGPT, and I got a free trial for premium, but it's just underwhelming. Their only use for me so far has been as a search engine, but they're a search engine that's wrong 20% of the time, so even that use is questionable.
Now we have better knowledge of prompting, as people have learnt what to say, and the models themselves are better. They make use of memory from other conversations, they have skills written by humans (or even by themselves) on how to do things, they have access to the internet for live info and to project files to check info, and they have built-in 'thinking' to challenge their own assumptions and loop on an output until it's refined.
You're right that output is always off still, but a lot of people have reached a point where it's only 'off' by an amount that is less than the effort required to do the task themselves, and considerably so.
My example today is prompting Claude to do a technical audit of a new client site.
It has skills for UX and SEO audits. Connects to an SEO tool. Pulls client info from OneDrive. Outputs to Word from a template for our agency. I even had it drive a remote pagespeed testing tool in Chrome because they don't have an MCP server currently.
Doing that report myself is 3.5-7 hours depending on what's found. Claude did it in 0.5 hours. Now I'm sorting out the oddities and anything that feels 'off'. I know and understand the full content of the report and can get on with actioning the recommendations or prioritising them for others. I've got maybe 1 hour of review and writing to do. It's not a 10x improvement but I'm happy with it.
Although, whilst Claude did its bit I was doing other work, so perhaps the multiplier is higher than I give it credit for.
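For anyone who wants the arithmetic spelled out, here's a rough sketch using only the numbers above. The "attention only" view is my assumption that the 0.5 hours Claude spent running cost me nothing because I was on other work, and the 1 hour of review is an estimate, so treat the results as ballpark.

```python
# Hours quoted in the post; the review figure is an estimate ("maybe 1 hour").
manual_low, manual_high = 3.5, 7.0   # doing the report by hand
claude_run = 0.5                     # Claude working on the audit
review = 1.0                         # my review and writing afterwards


def speedup(manual_hours, assisted_hours):
    """How many times faster the assisted route is than the manual one."""
    return manual_hours / assisted_hours


# Elapsed view: count Claude's run time plus my review time.
elapsed = claude_run + review
# Attention-only view (assumption): Claude's run cost me nothing,
# because I was doing other work while it ran.
attention = review

for label, cost in (("elapsed", elapsed), ("attention only", attention)):
    low = speedup(manual_low, cost)
    high = speedup(manual_high, cost)
    print(f"{label}: {low:.1f}x to {high:.1f}x")
```

So somewhere between roughly 2.3x and 4.7x on the clock, and 3.5x to 7x if you only count the time I actually had to pay attention. Still not 10x, but better than the first number suggests.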