Why can't Codex/Claude compile the app and test that the changes worked?

Question

Right now, both Codex and Claude make changes based on your request, but it's still you who needs to test those changes. Why can't Codex and Claude test them too? For example, when making a change to a website, why can't Claude or Codex compile the changes, open a browser, and test the changes to confirm that they worked? The ability to verify seems hugely important for autonomy, yet neither is doing it.

delaminator · Accepted Answer

What do you mean? Even Claude for Web can do the full compile-edit cycle. It runs Debian, apt installs stuff on demand, and can run anything its Debian has. I built a Chrome extension so Claude can get the full rendered DOM via a socket for web stuff. Why do you think it can't? Are you perhaps using Claude Chatbot? Claude Code will run anything you instruct it to. I have to tell it to stop more than to start!

aurareturn · Answer

It seems like Cursor can do this?

theblazehen · Answer

It can. For Android, I have it dump screenshots and uiautomator XML dumps, and for web I use the Playwright MCP. I find that a critical step in agentic development is to close the loop for the LLM so it can get direct feedback without needing you to handle it manually.
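
To make "closing the loop" concrete, here is a minimal sketch of the idea in Python, using only the standard library. It is not how Claude Code, Codex, or the Playwright MCP work internally; `run_check` and `feedback_loop` are hypothetical helper names, and the two checks are stand-in commands where a real build and test command would go. The point is only that the agent gets the command's exit status and output back as text it can act on.

```python
import subprocess
import sys


def run_check(cmd, timeout=120):
    """Run one verification command and return a summary the agent can read."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"ok": False, "output": f"timed out after {timeout}s"}
    # Keep only the tail of the output so the feedback stays small.
    output = (proc.stdout + proc.stderr)[-2000:]
    return {"ok": proc.returncode == 0, "output": output}


def feedback_loop(checks):
    """Run each named check in order and stop at the first failure."""
    for name, cmd in checks:
        result = run_check(cmd)
        if not result["ok"]:
            return f"FAILED {name}:\n{result['output']}"
    return "all checks passed"


if __name__ == "__main__":
    # Hypothetical stand-ins: replace with the project's real build/test commands.
    checks = [
        ("build", [sys.executable, "-c", "print('build ok')"]),
        ("tests", [sys.executable, "-c", "import sys; sys.exit('1 test failed')"]),
    ]
    print(feedback_loop(checks))
```

The string returned by `feedback_loop` is what would be handed back to the model after each edit, so a failing build or test shows up as direct feedback instead of something you have to relay by hand.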