HACKER Q&A
📣 aadarshkumaredu

What are the biggest limitations of agentic AI in real-world workflows?


There’s a lot of momentum around agentic AI systems that can plan and execute multi-step workflows autonomously.

For teams that have tried deploying these in production environments, where do they actually break down?

Is it reliability over long action chains, tool integration issues, cost unpredictability, state management, latency, observability, or something else entirely?

I’m especially interested in failure modes that only became obvious after moving beyond controlled demos into real usage.


  👤 buschleague Accepted Answer ✓
State management. The agents lose track of what they already did, re-implement things, or contradict decisions from 20 minutes ago. You need external state that survives compaction because the agent can't be trusted to maintain its own.
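A minimal sketch of what that external state can look like: an append-only decision log that the orchestrator (not the agent) owns and re-injects into every prompt. All names here are hypothetical, not anyone's actual API:

    import json
    from datetime import datetime, timezone
    from pathlib import Path

    LOG_PATH = Path("decisions.jsonl")  # lives outside the context window, so compaction can't erase it

    def record_decision(step: str, decision: str) -> None:
        # the orchestrator writes this, not the agent, so decisions can't be silently "forgotten"
        entry = {"ts": datetime.now(timezone.utc).isoformat(), "step": step, "decision": decision}
        with LOG_PATH.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    def decisions_summary() -> str:
        # re-injected into every prompt so step 50 still sees what step 3 decided
        if not LOG_PATH.exists():
            return "No prior decisions."
        entries = [json.loads(line) for line in LOG_PATH.read_text().splitlines()]
        return "\n".join(f"- [{e['step']}] {e['decision']}" for e in entries)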

Constraint adherence degrades over long chains. You can put rules in system prompts, but agents follow them for the first few steps, then gradually drift. Instructions are suggestions. The longer the chain, the more they're ignored.
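The drift shows up at the tool-call layer, so one mitigation is checking every proposed action against hard rules before executing it, instead of trusting the prompt to hold. A sketch with made-up rule names and a stub dispatcher:

    FORBIDDEN_PREFIXES = ("/etc/", "prod/")  # hypothetical hard rules; the prompt states them too, but this enforces them

    def dispatch(name: str, args: dict) -> str:
        raise NotImplementedError  # stand-in for your real tool registry

    def execute_tool_call(name: str, args: dict) -> str:
        # every action passes through this choke point, so step 40 is checked as strictly as step 1
        if name == "write_file" and args.get("path", "").startswith(FORBIDDEN_PREFIXES):
            return f"BLOCKED: writes to {args['path']} are not allowed"  # fed back to the agent as a tool result
        return dispatch(name, args)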

Cost unpredictability is real but solvable: hard per-task token budgets and retry caps turn a runaway loop into a failed task instead of a surprise bill.
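Something like this, where the number is made up and call_model is a stand-in for your model client:

    MAX_TOKENS_PER_TASK = 200_000  # made-up cap; size it to your margin, not your average run

    class BudgetExceeded(RuntimeError):
        pass

    def call_model(prompt: str) -> tuple[str, int]:
        raise NotImplementedError  # stand-in for your model client; returns (text, tokens_used)

    def run_step(prompt: str, spent: int) -> tuple[str, int]:
        # fail the task deliberately instead of letting a retry loop burn unbounded tokens
        if spent >= MAX_TOKENS_PER_TASK:
            raise BudgetExceeded(f"spent {spent} tokens, cap is {MAX_TOKENS_PER_TASK}")
        text, used = call_model(prompt)
        return text, spent + used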

Ultimately, the systems need external enforcement rather than internal instruction. Markdown rules, or jinja templates etc., that the agent can read (and ignore) don't work at production scale. We ended up solving this by building Python enforcement gates that block task completion until acceptance criteria are verified, tests pass, and architecture limits are met. The core learning being that agents can't bypass what they don't control.