I've handled this before for BI teams — read replica with a generous `max_standby_streaming_delay` and `hot_standby_feedback` on, accepting the occasional bloat on the primary. Worked fine. But the AI/ML ask feels different in ways I can't fully articulate yet, which is part of why I'm asking.
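For concreteness, the replica setup that pattern relies on is just two settings on the standby's `postgresql.conf` (the values here are illustrative; tune the delay to your lag tolerance):

```ini
# On the read replica, not the primary.
# Let long analytical queries run before replay cancels them:
max_standby_streaming_delay = 10min
# Tell the primary not to vacuum away rows the replica still needs
# (this is the source of the "occasional bloat on the primary"):
hot_standby_feedback = on
```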
A few things I'm trying to calibrate:
- Where does the agent actually connect? Primary with RLS, read replica, warehouse (Snowflake/BigQuery/Redshift), lakehouse (Iceberg/Delta on S3), or something else?
- If you're not doing this — is it compliance, cost fear, bad experiences (runaway queries, PII in prompts), or something else?
- And the one I'm most curious about: does this actually feel different from giving BI tools DB access, or is it the same problem wearing new clothes?
Not looking for product recommendations. Trying to get a real sense from people who've actually faced this challenge.
The practical answer I've seen hold up: push column-level redaction in before the agent layer, not after it. A logical replica with PII columns replaced by NULL or a stable keyed hash gives the agent the same query surface, and auditing lives at the connection pooler (one row per session), not in the app. The AI team gets its data; you get a hard boundary that doesn't rely on prompt engineering.
The harder question is ownership. In a startup where the ML lead, the infra person, and the security person are often the same tired CTO at 10pm, the right answer depends on who gets paged when a hallucinated query wakes up the primary. Usually the answer is nobody, which is the real problem behind the technical one.