We have a small site hosted on Linode. It's a single app box and a single DB box. Thanks to Datadog I noticed some occasional (random) slow requests. Digging in revealed that the slow responses came from the (MySQL) database, and I eventually worked out that they correlated exactly with disk latency spikes.
I then ran `iostat` in a loop to catch the slowdowns when they happened, and I could see `w_await` times of up to 2 seconds! After I contacted support, they moved a few of our noisiest neighbours away, which has reduced the issue a lot.
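For anyone wanting to catch the same kind of spike, here is a rough sketch of that loop in Python. It is not the `iostat` command I actually ran: it reads the Linux `/proc/diskstats` counters directly and divides the change in milliseconds spent writing by the change in writes completed, which as far as I know is the same ratio `iostat` reports as `w_await`. The device name `sda`, the 100 ms threshold, and the function names are all assumptions for illustration.

```python
import time


def parse_diskstats(text):
    """Map device name -> (writes completed, ms spent writing)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        # Layout: major minor name, then the read counters, then the write counters.
        stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats


def write_await_ms(before, after):
    """Average ms per completed write between two samples, or None if no writes."""
    writes = after[0] - before[0]
    if writes <= 0:
        return None
    return (after[1] - before[1]) / writes


def watch(device="sda", interval=1.0, threshold_ms=100.0):
    """Print a line whenever write latency on `device` exceeds the threshold."""
    def sample():
        with open("/proc/diskstats") as f:
            return parse_diskstats(f.read())[device]

    prev = sample()
    while True:
        time.sleep(interval)
        cur = sample()
        await_ms = write_await_ms(prev, cur)
        if await_ms is not None and await_ms >= threshold_ms:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} {device} w_await={await_ms:.0f} ms")
        prev = cur
```

Calling `watch("sda")` on the DB box (with the device name swapped for whatever the data volume really is) runs until interrupted and only prints during spikes, which makes it easy to line the timestamps up against the slow requests.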
My question is: is this just normal in cloud/shared infrastructure? Would moving to AWS (or similar) help at all? Maybe I should just forget about the handful (10-30) of slow requests per week that are impacted by this (the app server handles maybe 800K requests a week, so it's a tiny percentage). I just find it annoying that some people get 5-10 second requests when they ought to take 200-300 ms.
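To put a number on "tiny percentage", here is a quick back-of-the-envelope calculation using the figures above (10-30 slow requests out of roughly 800K a week). The function name is just for illustration.

```python
def slow_fraction(slow_requests, total_requests):
    """Fraction of requests that were slow."""
    return slow_requests / total_requests


TOTAL_PER_WEEK = 800_000  # approximate weekly request count from the post

for slow in (10, 30):
    frac = slow_fraction(slow, TOTAL_PER_WEEK)
    print(f"{slow} slow requests: {frac:.5%} of traffic, about 1 in {round(1 / frac):,}")
```

That works out to roughly 0.001% to 0.004% of requests, which is rarer than 1 in 10,000, so even a p99.99 latency graph would not show them.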
Any insights most welcome!
Dedicated hardware helps, and over-provisioned RAM helps, but sometimes disks (including SSDs) are busy doing something else when your request arrives, and servicing it takes longer.
The best you can do to get latency down is spend a lot more money on RAM and pray.