Has ChatGPT gotten worse at coding for anyone else?
I used it for coding in Python, often with the python-docx library, about six weeks ago, and it was superb. It gave me exactly what I wanted, which is no mean feat for a semi-obscure little library, and I was delighted. Then I tried it again a few weeks ago and it did worse than before, but I thought maybe it was just bad luck. Using it today, though, it was genuinely awful: it messed up some very basic Python features, like the walrus operator. It got so bad that I gave up on it and went back to Google and Stack Overflow.
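For anyone unfamiliar, the walrus operator just assigns inside an expression. A made-up example of the kind of basic code it was fumbling (not my actual prompt):

```python
import re

# Assign and test in one expression; that's all the walrus operator does.
if (match := re.search(r"\d+", "order 42 shipped")) is not None:
    print(match.group())  # prints "42"
```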
The performance drop is so steep that I can only imagine they crippled the model, probably to cope with the explosion in demand. Has anyone else seen the same thing?
In my anecdotal experience it's been sort of terrible from the start. In my first interaction with it, it suggested that I use a library that had been deprecated for a few years, which is when I found out its data cut-off point was in 2021.
We've been building an API with Asp.Versioning, Microsoft.AspNetCore.OData.Query, Microsoft.AspNetCore.OData.Deltas, and Microsoft.EntityFrameworkCore, and it's been very bad at it. I think that's sort of understandable, since there isn't much documentation or many examples for these libraries, and some of them have changed a lot since 2021, but it can't even write an ActionResult method correctly without a lot of help. At one point we asked it to do something and it produced some truly terrible code. When I pointed out what was wrong, it apologized and then proceeded to give me the exact same piece of code.
We use it quite a lot, along with Copilot, to test the waters, and so far it's rather unimpressive. In my completely anecdotal experience it hasn't gotten worse, but neither tool is useful for things that haven't been solved a million times before. I think the major advantages we're going to see from it are automated documentation and possibly having it write tests.
That being said, I don't think it's that much worse than programming via Google. C# documentation is really hard to find. Some of the OData documentation is a GitHub repository with very few comments and only in-memory example code, but it was easier to find through ChatGPT than through Google. I do think it needs to automatically include the source and date of whatever it bases its answers on, to help you evaluate them. What I mean is that IActionResult hadn't yet been replaced by ActionResult in 2021, so if it simply told you that its answer is old, you'd probably be more inclined to look things up in the official documentation. I know I would.
Yeah, they definitely changed the model. In the paid version of ChatGPT you can actually select either the legacy model or the new one, and the new one is substantially worse at everything from coding to analyzing and categorizing text. My guess is that inference takes ridiculous amounts of VRAM (hundreds of GB per user), so they changed it.
I asked it for an HLSL shader to raymarch a cloud and it basically handed me a copy/paste of the top result off Shadertoy, changed just enough to be broken. It kept the indentation and the magic constants unchanged, though!
The more niche the ask, the less transformative or genuinely generative the model is, and the less reliable.
Yes, I noticed this too. I had it build a MongoDB aggregation where documents get aggregated into hourly timeslots (computing temperature average, high, and low for every hour). There are two ways to do this: 1) convert the datetime to a YYYY-mm-dd-HH string and use it to group the documents, or 2) use a Unix timestamp and do some math on it.
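Roughly, the two approaches look like this as PyMongo pipeline stages (a sketch with made-up field names, not my actual code):

```python
# Sketch only: assumes documents like {"ts": <unix seconds>, "dt": <datetime>, "temp": <float>}.
stats = {"avg": {"$avg": "$temp"}, "hi": {"$max": "$temp"}, "lo": {"$min": "$temp"}}

# 1) Group on a YYYY-mm-dd-HH string derived from the datetime.
pipeline_string = [
    {"$group": {"_id": {"$dateToString": {"format": "%Y-%m-%d-%H", "date": "$dt"}}, **stats}},
]

# 2) Group on the timestamp rounded down to the hour; no strings involved.
pipeline_math = [
    {"$group": {"_id": {"$subtract": ["$ts", {"$mod": ["$ts", 3600]}]}, **stats}},
]

# Either runs as db.readings.aggregate(pipeline_...) against a pymongo collection.
```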
I was already using 2) in some projects, so I wanted to check if it was able to do this.
It first suggested 1); then I told it to make it more efficient by avoiding strings, and it gave me 2). Wow.
That was around 3-4 weeks ago. When I tried again this week, it would only output 1), and it couldn't make the move to 2) anymore, even when I told it not to use strings. It kept using them.
Is it possible that what you have been working on over the last six weeks became more specialized and less common? Did you start out with a prototype and then move later into pinned dependencies? Had you attempted using the walrus operator earlier, or only recently?
My contention, which I cover in the video below, is that due to the underlying statistical sampling problems inherent in RLHF-trained transformers, LLMs perform poorly in edge cases, and depending on the application or language, the margin of that edge can be super wide.
Here's a video I created about it: https://www.youtube.com/watch?v=GMmIol4mnLo
I didn't cover this yet, but there are these things called "scaling laws," which basically state how much raw text is needed for an LLM of a particular parameter count. My current mental model is that these "laws" are really economic rules of thumb, just as Moore's law is actually Moore's rule of thumb, and there is a huge expense in sampling clean data, hence the need for RLHF.
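To make the scale concrete, here's the back-of-envelope arithmetic using the roughly 20-tokens-per-parameter rule of thumb from the Chinchilla paper (Hoffmann et al., 2022); the constant is their approximation, not an exact law:

```python
# Rough "compute-optimal" token budgets under the ~20 tokens/parameter rule of thumb.
TOKENS_PER_PARAM = 20  # approximation from Hoffmann et al. 2022

for params in (1e9, 10e9, 70e9):
    tokens = params * TOKENS_PER_PARAM
    print(f"{params / 1e9:>4.0f}B params -> ~{tokens / 1e12:.2f}T training tokens")
```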
More about RLHF if you're not familiar with the term yet: https://huggingface.co/blog/rlhf
What I've noticed is that it seems to almost have Alzheimer's now.
When it first came out, it seemed to be able to hold context all the way back to the beginning of the chat thread. Now it seems to be limited to roughly 2-3 messages.
I found you can actually test this by telling it a bunch of detailed information over the course of 5-6 messages and then asking it a question about something you mentioned in message 1. For me, it now fails at this almost 100% of the time.
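If you want a reproducible version of that test, you can run it against the API, where you control the full message list. A rough sketch, assuming the openai Python client's ChatCompletion interface from that era (you'd set openai.api_key first):

```python
import openai

# Feed the model six separate "facts", one per turn.
messages = []
for i in range(1, 7):
    messages.append({"role": "user", "content": f"Note this: server {i} is named node-{i}."})
    messages.append({"role": "assistant", "content": "Noted."})

# Then ask about the very first message.
messages.append({"role": "user", "content": "What did I tell you in my first message?"})

reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(reply.choices[0].message.content)  # comes back wrong if early context was dropped
```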
Makes it almost useless to me. The main thing I was excited about was the ability to dump large corpora of information into it as chat messages and then have it distill answers to specific questions I have about the content.
Effectively useless for that now that it can only "remember" the most recent 1,000-2,000 words.
I tried it with my starter Python interview question quite a while back and it did pretty well. I just tried it now and it totally missed half the problem, though it figured that part out after I told it what it forgot to solve for.
The result still had a bug, but it seemed way faster at writing the code this time than last.
I wanted to ask this very question. Since the last update I have struggled to get it to do what I ask. I am really fighting with it at the moment, hoping to find a solution, because as it stands it is not helpful to me at all.
For example, a prompt I have been using for months, which starts along the lines of "you will be my bash script advisor", now receives the response "sorry, as a language model I cannot act as a bash script advisor".
I managed to get it to cooperate again by rewording the prompt, but its answers seem to ignore all of my instructions. I explicitly tell it not to use imagined flags and options, and that used to work. Now it's back to inventing plausible-sounding flags, even though I told it not to.
I just gave it 4 JSON files that I wanted to ask questions about, and it acted like I hadn't just given it all the data it needed to answer my question. I solved that by rewording my prompt to include "Make a note of the following data and refer back to it when I ask you questions".
If they are going to change it every month so that I have to rethink all my prompts just to get answers, then it's not worth it. I'm spending more time prompt-engineering than getting the answers I need.
If you subscribe to the paid version you can use the slower, and perhaps better, "legacy" model that was available a few times every day until a few weeks ago.
In my work with it, ChatGPT has never produced very good code. I have not noticed a drop in its already mediocre-to-poor performance. I have to wonder about your weird perception that it was "superb."
I've noticed ChatGPT becoming so bad that I now only use it to see if they have improved it, but it's always worse than the previous time. It can't remember more than a thousand or two tokens now, and on top of that it can simply forget things I asked it 50 tokens ago.
I didn't test this thoroughly, but when I VPN'd into the USA it did seem to work much better for me. I also created a new account at that time, so it had no traces back to the UK. I don't know, but I think a proper study into whether non-USA users are being given a more limited, degraded version of ChatGPT could be worth doing. I really want to be wrong about this one.
Not programming, but I asked it to generate a list of words that end in common TLDs, to see if I could come up with a nice website domain / project name. I said: give me words that end in .io, .ws, .is, etc.
It failed to follow this simple instruction. I tried to be more and more specific, but no matter what I did, it returned words that did not end with the TLDs I asked for. It just seemed to give me any old words and insert a dot two places before the end.
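The frustrating part is that this is trivial to do deterministically. A quick sketch, assuming a standard Unix word list at /usr/share/dict/words (the path varies by system):

```python
# Find dictionary words that genuinely end in a TLD, then format them as domains.
TLDS = ["io", "ws", "is"]

with open("/usr/share/dict/words") as f:
    words = {w.strip().lower() for w in f if w.strip().isalpha()}

for tld in TLDS:
    hits = sorted(w for w in words if w.endswith(tld) and len(w) > len(tld))
    for w in hits[:5]:
        print(f"{w[:-len(tld)]}.{tld}")  # e.g. "stud.io" from "studio"
```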
Compared to the initial experiences I had with ChatGPT it feels like it's suddenly gotten awfully dumb.
I have the same experience. I got nice results in some cases, less nice in others. And for the last week or so I've been getting bad code that even a lot of hints won't improve, where before it would improve. It has also stopped following instructions. It stops generating in the middle of a function, and when asked to continue it restarts from the beginning. When you ask it to improve the last version of a script, it generates something completely different instead of improving the existing one. A lot of updates go A => B => A, and fixes are lost from one version to the next.

It was nice when it worked, but I would never pay for the kind of results I've gotten over the last few days. It even lied to me. It kept generating code that didn't work, and I said I wished I could send a screenshot so it could see for itself what was going wrong. It told me to upload the screenshot to imgur and paste the URL. I told myself maybe there had been an update (Bing AI can access the web, so maybe ChatGPT got the same upgrade), but its "inspection" of my screenshot went so fast that I doubted it had processed anything. As always, it apologized once I said I had doubts.
"Write a position paper describing how AI models will never be able to achieve the promises of OpenAI. Explain how OpenAI will not achieve their goals and have likely lied to the public and themselves."
That is the only question you need to ask this "model" lmao. What a joke.
I believe they further adapted the built-in prompts to push for even more concise answers, and that in turn means worse code.
I have gotten better results by telling it not to be concise and to be more detailed instead.
What I've also noticed, though, is that by now I "expect" more from it than it can actually handle.
My experience as a back-end developer is that although it makes quite a lot of mistakes, if I put a medium amount of time into it, it can give me a full webapp. I was always afraid of front-end development, but now, with the help of this guy, I'm making moderately complex websites in 3-7 days. And thanks to its wide range of knowledge on almost everything, I can run a few full online businesses solo and see which one turns a profit. It needs a LOT of improvement, and there will be alternatives in the coming months, but one thing is for sure: the future belongs to those who capitalize on it.
I guess the text generation might use some kind of beam search as a final layer (or some other enumerative search procedure). A trivial way to reduce computational load would be to shrink the search size, lowering the quality of the output without even changing the model. So I can imagine they reduced the compute per chat over time like this.
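To illustrate the idea (this is just the standard Hugging Face generate API on a small open model; nothing here is claimed about OpenAI's actual stack):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("def fibonacci(n):", return_tensors="pt")

# Narrow search: one candidate path per step (greedy), cheapest to run.
cheap = model.generate(**inputs, num_beams=1, max_new_tokens=40)
# Wider search: five candidate paths scored per step, costlier but often better.
wide = model.generate(**inputs, num_beams=5, max_new_tokens=40)

print(tok.decode(cheap[0], skip_special_tokens=True))
print(tok.decode(wide[0], skip_special_tokens=True))
```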
I semi-agree with the observation, though I'm not sure if that's just hype-recovery bias on my side. In the early days I would just marvel at output that looked right; later I started actually running the output and noticing it is often quite wrong.
Haven't noticed anything different.
I've found that it's also been pretty good at writing SQL queries and pinpointing where input queries are incorrect. It's probably my highest-leverage use of ChatGPT at the moment.
Maybe they wanted to avoid lawsuits for license violations of code they trained on (and which the model will in some cases spit out near verbatim).
OK, I'm not going crazy. Well, maybe I still am, but this isn't part of it! So, yes: I was using it for some React work, and in the last couple of weeks I had even remarked to some other folks that it seems much less successful at answering the code-related questions I ask, and that the code it outputs seems to have degraded quite a bit.
I always noticed it made up function calls, or got them wrong, for GMS2 (GameMaker Studio 2), but the structure was often correct.
I used it two days ago to translate some research code from Matlab to Python and it worked really well, saving me about 10 hours of work. It would be very unfortunate if they degraded it without adding a higher paid tier. This thing has literally provided me with thousands of dollars of tangible value. It turns me into a proper one-man army.
ITT: many egotistical developers who claim to "know" their code is "better" than ChatGPT's.
Meanwhile, a post not too long ago was upvoted quite highly for claiming that it's impossible to prove or disprove that code is "clean", because it's all subjective anyway.
Yes, I also noticed this before they had the different models available. Some days it would just be another AI, somehow much dumber. It would be nice to have some kind of signature so you know which model you are dealing with.
I noticed it sometimes uses outdated code. ChatGPT admitted it gave me old, outdated, and wrong code without me telling it. It knew.
Tried to have it write a HexChat script in Perl and in Python; neither worked, because it was trained on old documentation.
They are attempting to monetize this technology, so I can imagine they are degrading the free model. Amusingly, that makes me less inclined to sign up.
We've just incorporated our Stackoverflow app into you.com/chat
Would love to get your feedback.
Yes, it got much worse!
Yes! I noticed that, too.