HACKER Q&A
📣 kundan_s__r

How are you preventing LLM hallucinations in production systems?


Hi HN,

For those running LLMs in real production environments (especially agentic or tool-using systems): what’s actually worked for you to prevent confident but incorrect outputs?

Prompt engineering and basic filters help, but we’ve still seen cases where responses look fluent, structured, and reasonable — yet violate business rules, domain boundaries, or downstream assumptions.

I’m curious:

Do you rely on strict schemas or typed outputs?

Secondary validation models or rule engines?

Human-in-the-loop for certain classes of actions?

Hard constraints before execution (e.g., allow/deny lists)? (I've sketched roughly what I mean by this and the typed-output point just below.)
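
To make the typed-output and allow-list points concrete, here's roughly the shape I mean as a Python sketch. The model names, actions, and limits are made up for illustration, not our actual system:

    # Illustrative only: typed output parsing plus a hard allow-list before execution.
    from pydantic import BaseModel, ValidationError

    ALLOWED_ACTIONS = {"refund", "escalate", "noop"}   # hard allow-list

    class ProposedAction(BaseModel):
        action: str
        order_id: str
        amount_cents: int

    def validate_llm_output(raw: str) -> ProposedAction | None:
        """Parse the model's JSON and reject anything outside the allow-list."""
        try:
            proposed = ProposedAction.model_validate_json(raw)   # pydantic v2
        except ValidationError:
            return None                 # malformed or wrongly typed -> refuse to act
        if proposed.action not in ALLOWED_ACTIONS:
            return None                 # fluent but out-of-bounds -> refuse to act
        if not (0 < proposed.amount_cents <= 50_000):
            return None                 # example business-rule bound
        return proposed

The idea is that anything failing the parse or the checks is dropped before it reaches an execution path, no matter how plausible the surrounding prose looks.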

What approaches failed for you, and what held up under scale and real user behavior?

Interested in practical lessons and post-mortems rather than theory.


  👤 al_borland Accepted Answer ✓
I’ve just been ignoring my boss every time he says something about how we should leverage AI. What we’re building doesn’t need it and can’t tolerate hallucinations. They just want to be able to brag up the chain that AI is being used, which is the wrong reason to use it.

If I was forced to use it, I’d probably be writing pretty extensive guardrails (outside of the AI) to make sure it isn’t going off the rails and the results make sense. I’m doing that anyway with all user input, so I guess I’d be treating all LLM generated text as user input and assuming it’s unreliable.
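
Concretely, "LLM output is just user input" means the model's text goes through the exact same gate as a form submission. A made-up example of what that gate might look like (field names and rules are illustrative, not from anything I've shipped):

    import json
    import re

    MAX_NOTE_LEN = 500
    ORDER_ID_RE = re.compile(r"ORD-\d{6}")

    def validate_order_update(fields: dict) -> dict:
        """Same checks whether `fields` came from a web form or from an LLM."""
        errors = {}
        if not ORDER_ID_RE.fullmatch(str(fields.get("order_id", ""))):
            errors["order_id"] = "bad format"
        note = str(fields.get("note", ""))
        if len(note) > MAX_NOTE_LEN:
            errors["note"] = "too long"
        status = fields.get("status")
        if status not in {"open", "on_hold", "closed"}:
            errors["status"] = "unknown status"
        if errors:
            raise ValueError(f"rejected update: {errors}")
        return {"order_id": fields["order_id"], "note": note, "status": status}

    # The LLM's structured output goes through the exact same gate:
    # validate_order_update(json.loads(llm_response_text))

If the model hallucinates an order ID or a status that doesn't exist, it dies at the same validation layer that already catches bad human input.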


👤 stephenr
I've found that I can use a very similar approach to the one I've used when handling the risks associated with blockchain, cryptocurrencies, "web scale" infrastructure, and of course the chupacabra.

👤 Agent_Builder
We ran into this while building GTWY.ai. What reduced hallucinations for us wasn’t more prompting or verification layers, but narrowing what the agent was allowed to do at each step. When inputs, tools, and outputs were explicit, the model stopped confidently inventing things. Fewer degrees of freedom beat smarter models.
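
For a rough idea of what "narrowing" meant in practice (heavily simplified, with made-up names rather than our real step/tool definitions):

    # Simplified sketch: each step declares exactly which tools it may call and
    # which argument types those tools accept; everything else is rejected.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass(frozen=True)
    class Tool:
        name: str
        arg_types: dict             # arg name -> allowed Python type
        fn: Callable[..., str]

    def lookup_order(order_id: str) -> str:
        return f"order {order_id}: shipped"    # stand-in for a real lookup

    TOOLS = {"lookup_order": Tool("lookup_order", {"order_id": str}, lookup_order)}

    @dataclass(frozen=True)
    class Step:
        prompt: str
        allowed_tools: frozenset    # the model can only pick from these

    def run_tool_call(step: Step, tool_name: str, args: dict) -> str:
        """Refuse any call the current step didn't explicitly allow."""
        if tool_name not in step.allowed_tools:
            raise PermissionError(f"{tool_name} is not allowed in this step")
        tool = TOOLS[tool_name]
        for key, value in args.items():
            if key not in tool.arg_types:
                raise ValueError(f"unexpected argument: {key}")
            if not isinstance(value, tool.arg_types[key]):
                raise TypeError(f"wrong type for {key}")
        return tool.fn(**args)

    triage = Step(prompt="Look up the order the user mentioned.",
                  allowed_tools=frozenset({"lookup_order"}))

When a proposed call doesn't fit the step's declared surface, it simply doesn't execute, which is where most of the "confident but wrong" actions used to slip through.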