How do you evaluate prompts?

Question

We've been developing prompts for writing agents with OpenAI. After working on it for the last months, we have a huge amount of prompts lying everywhere. We don't have a good workflow around keeping tracking of prompts, scoring them, revisiting them later on, tacking their output and degradation to give the right output over time.How do people here do this? What is your workflow around this?

gbertb · Accepted Answer

look into https://promptlayer.com/ https://www.helicone.ai/ https://wandb.ai/site/prompts