How do you monitor LLM response quality?

Question

For any of you that have systems with LLM interactions in production: how do you continuously monitor the quality of LLM outputs? Do you use another LLM to evaluate whether the response was hallucinated, and grade it across a set of metrics?

constantinum · Accepted Answer

Quoting Unstract's documentation on LLMEVAL to reduce hallucination: "it is always a good idea to choose an LLM from a completely different provenance or origin as the evaluator LLM as compared to the LLM that is used to structure documents." https://docs.unstract.com/unstract_platform/setup_accounts/s...
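
To make the idea concrete, here is a minimal Python sketch of picking an evaluator from a different provider and asking it for a yes/no grounding verdict. This is not Unstract's implementation: the `Model` wrapper, `pick_evaluator`, `is_grounded`, and the judge prompt are all hypothetical names made up for illustration, and `call` stands in for whatever client function you already use to send a prompt and get text back.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    # Hypothetical wrapper: `provider` is just a label you assign,
    # `call` is a stand-in for your real client (prompt -> completion text).
    provider: str
    name: str
    call: Callable[[str], str]


def pick_evaluator(generator: Model, candidates: list[Model]) -> Model:
    """Return the first candidate whose provider differs from the generator's."""
    for candidate in candidates:
        if candidate.provider != generator.provider:
            return candidate
    raise ValueError("no evaluator from a different provider available")


# Assumed prompt wording; tune it for your own data and metrics.
JUDGE_PROMPT = (
    "Source:\n{source}\n\nAnswer:\n{answer}\n\n"
    "Is every claim in the answer supported by the source? Reply YES or NO."
)


def is_grounded(source: str, answer: str, evaluator: Model) -> bool:
    """Ask the evaluator model for a YES/NO verdict and parse it leniently."""
    verdict = evaluator.call(JUDGE_PROMPT.format(source=source, answer=answer))
    return verdict.strip().upper().startswith("YES")
```

In production you would probably log each verdict alongside the request ID and sample a fraction of traffic rather than judging every response, but that depends on your volume and budget.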

spenrose · Answer

What We've Learned From A Year of Building with LLMs: https://applied-llms.org
