Anyone else having the same experience?
GPT-4, in my experience, has been much more consistent with its results. It also doesn't seem to lose the context as quickly, and it deals with more niche libraries a lot better.
Due to the lack of consistency, I've found it difficult to use 3.5.
I haven't tried it with LangChain-type stuff, so maybe it's different there?
As a side note - I've been noticing the models' performance change a lot between "updates", and they mostly seem to be getting worse at following instructions :/
In fact, today I tried the Observation/Thought/Action/Execution pattern found in LangChain agents, and GPT-3.5 did it perfectly, but GPT-4 stopped after formulating the observation.