HACKER Q&A
📣 Agent_Builder

How are you preventing LLMs from hallucinating in real workflows?


I recently tried building a small agent for coaching centers.

The idea was simple: a teacher uploads a syllabus or notes, and the agent generates a test paper from that material. The hard requirement was reliability. No invented questions, no drifting outside the syllabus.

Instead of trying to “fix” hallucinations with better prompts, I constrained the agent’s job very narrowly.

I defined:

a fixed knowledge base (only the uploaded syllabus)

explicit tools the agent was allowed to use

a structured output format for the test paper

a hardness distribution (for example 30% easy, 50% medium, 20% hard)

Once those constraints were in place, the behavior changed a lot. The agent stopped being creative in the wrong places and consistently produced usable test papers. The quality improvement came from reducing freedom, not from changing models.
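
To make that concrete, here's a rough sketch of the kind of spec and post-generation check I mean. Everything below is illustrative only (plain Python, not GTWY.ai's API); the point is that the schema and the validator enforce grounding and the hardness mix, rather than the prompt.

    from dataclasses import dataclass, field

    # Hypothetical constraint spec for the test-paper agent.
    # Names are illustrative only, not GTWY.ai's API.
    @dataclass
    class TestPaperSpec:
        syllabus_chunks: list[str]          # the only material the agent may cite
        allowed_tools: tuple[str, ...] = ("search_syllabus", "format_question")
        hardness_mix: dict[str, float] = field(
            default_factory=lambda: {"easy": 0.30, "medium": 0.50, "hard": 0.20}
        )

    @dataclass
    class Question:
        text: str
        difficulty: str      # must be a key of hardness_mix
        source_chunk: int    # index into syllabus_chunks, so every question is grounded

    def validate_paper(spec: TestPaperSpec, questions: list[Question],
                       tolerance: float = 0.10) -> list[str]:
        """Return a list of violations; an empty list means the paper is acceptable."""
        errors = []
        for i, q in enumerate(questions):
            if not (0 <= q.source_chunk < len(spec.syllabus_chunks)):
                errors.append(f"Q{i + 1}: does not cite an uploaded syllabus chunk")
            if q.difficulty not in spec.hardness_mix:
                errors.append(f"Q{i + 1}: unknown difficulty {q.difficulty!r}")
        total = len(questions) or 1
        for level, target in spec.hardness_mix.items():
            actual = sum(q.difficulty == level for q in questions) / total
            if abs(actual - target) > tolerance:
                errors.append(f"{level}: got {actual:.0%}, wanted roughly {target:.0%}")
        return errors

Any model can do the generation step; the validator rejects a paper that cites nothing from the upload or misses the target mix, and the agent simply retries.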

I built this using GTWY.ai, mainly because it let me wire together a knowledge base, step-level tool permissions, and model choice without writing much glue code. But the interesting part for me wasn't the platform; it was the pattern.

It made me wonder:

Are others seeing similar results by narrowing agent scope instead of adding verification layers?

Do constraints scale better than smarter models for production use cases?

For education or other regulated domains, is this how people are actually shipping agents?

Curious what's working for others in real deployments.


  👤 philwyshbone Accepted Answer ✓
One effective way to reduce hallucinations in LLMs is to implement an iterative feedback loop with domain experts. By having teachers review the generated test papers and provide corrections or adjustments, you can refine the model's outputs and reinforce accuracy.

We ran into this ourselves when developing an AI-assisted tool for educational content. Initially, we found that the model would sometimes produce irrelevant questions, so we brought in educators to help fine-tune the prompts and outputs, which significantly improved the results.
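
In rough Python, that kind of loop might look something like this (the function names are placeholders, not our actual implementation):

    # Sketch of a human-in-the-loop review cycle for generated test papers.
    def review_loop(spec, generate_paper, request_teacher_review, max_rounds=3):
        """Generate a paper, collect teacher corrections, and regenerate with them attached."""
        corrections = []                               # accumulated teacher feedback
        paper = None
        for _ in range(max_rounds):
            paper = generate_paper(spec, corrections)  # corrections ride along as extra guidance
            feedback = request_teacher_review(paper)   # e.g. flagged questions plus notes
            if not feedback:                           # empty feedback means the teacher accepted it
                break
            corrections.extend(feedback)
        return paper, corrections

The useful part is that corrections persist across rounds, so reviewers aren't re-flagging the same problems every time.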

We ended up building Wyshbone around this approach: it keeps generated questions aligned with the provided materials and integrates reviewer feedback directly into the workflow.