Lately, I've been getting a lot of feedback from my team that it's working less and less well, to the point where it refuses to respond with JSON and can't handle tasks that require repeating the same operation on a list of inputs provided in the prompt.
I'm wondering whether this is due to a general decline in quality, or to some mechanism that degrades response quality or capability when it receives many near-identical prompts.
FWIW, we'd happily pay for the compute if that were an option.
Is it becoming less and less reliable for you, too?