The word observe above wasn't an accident - in addition to building your infrastructure via IaC, you should consider observability to be an MVP feature. Logging, metrics and tracing are essential when you're trying to debug anomalous behavior. Using an APM (New Relic, Datadog, etc.) is probably a time-saving move, as there are pre-defined dashboards for many types of workloads.
Having IaC provides another benefit that can't be overstated - it enables test instances of your application to be created and destroyed easily, which means that for a few tens of dollars, you can run load tests on infrastructure that exactly matches production. There are even tools that can record network traffic on your production system and replay these requests on your test network. Make sure you've got realistic data to prepopulate the system when it's created ... and for your wallet's sake, don't forget to automate the destruction of these resources when the test has completed.
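For the "automate the destruction" part, here's a rough sketch in Go of how I'd wrap a load test so teardown can't be forgotten. It's only an illustration under a few assumptions: the names (Runner, WithEphemeralEnv, printOnly) and the LOADTEST_TF_DIR environment variable are made up for this example, and the load test itself is a placeholder. The only real things it leans on are the terraform CLI's init, apply and destroy commands with the -chdir, -input=false and -auto-approve options.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Runner executes one terraform invocation. It's a function type so the
// lifecycle below can be exercised with a fake instead of the real CLI.
type Runner func(args ...string) error

// terraformCLI returns a Runner that shells out to the terraform binary
// against the configuration in dir.
func terraformCLI(dir string) Runner {
	return func(args ...string) error {
		cmd := exec.Command("terraform", append([]string{"-chdir=" + dir}, args...)...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		return cmd.Run()
	}
}

// printOnly is a dry-run Runner: it prints the command instead of running it.
func printOnly(args ...string) error {
	fmt.Println("would run: terraform", strings.Join(args, " "))
	return nil
}

// WithEphemeralEnv creates the environment, runs loadTest against it, and
// always attempts a destroy afterwards, even if apply or the test failed.
func WithEphemeralEnv(run Runner, loadTest func() error) (err error) {
	if ierr := run("init", "-input=false"); ierr != nil {
		return fmt.Errorf("init: %w", ierr)
	}
	// Registered before apply so partially created resources get cleaned up too.
	defer func() {
		if derr := run("destroy", "-auto-approve"); derr != nil {
			err = errors.Join(err, fmt.Errorf("destroy: %w", derr))
		}
	}()
	if aerr := run("apply", "-input=false", "-auto-approve"); aerr != nil {
		return fmt.Errorf("apply: %w", aerr)
	}
	if terr := loadTest(); terr != nil {
		return fmt.Errorf("load test: %w", terr)
	}
	return nil
}

func main() {
	// Dry run by default; set the (hypothetical) LOADTEST_TF_DIR variable to
	// a directory of Terraform config to run against real infrastructure.
	var run Runner = printOnly
	if dir := os.Getenv("LOADTEST_TF_DIR"); dir != "" {
		run = terraformCLI(dir)
	}
	err := WithEphemeralEnv(run, func() error {
		// Placeholder: seed realistic data and start the load test here.
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Run as-is, it just prints the three commands it would execute. The deferred destroy is the whole point: a failed apply or a failed load test still triggers teardown, and a failed destroy is reported alongside the original error rather than hiding it.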
Chaos testing (Netflix's Simian Army, etc.) is a great way to find edge cases without waiting for them to show up on your production system. If you're focused on resilience, this is a great way to exercise failure states. I like to think of it as fuzz testing for systems.
If you define your infra as Terraform, which I would generally encourage, you can use terratest[0] as a test framework for the infrastructure itself.
I've come to really prefer managing infrastructure as a GitHub repo full of Terraform this way.
The best way to perform a test depends on the type of test.