We do have most of the basics covered:
- rate limits based on IP (configured at nginx level)
- rate limits for authenticated users to prevent abuse (configured at app level)
- WAF by AWS Shield (before requests hit the Load Balancer)
- auto scaling system is in place too
However, we are seeing attacks ranging from 500k to 2M req/min and lasting a couple of hours. After a while, autoscaling doesn't help much. The WAF works in most cases, but sometimes it blocks only about 4-10% of the DDoS requests, which is not effective enough.
How can I prepare for the next wave like this?
Don't autoscale to fix the situation unless you're providing a service that has to stay available. There are trivial, cheap ways to open lots of connections, and scaling up to absorb them will burn your money.
Also, if the traffic is not very sophisticated (e.g. requests to your front page), you can make sure the page is completely cacheable and served from CloudFront / Cloudflare / some other CDN rather than from your servers.
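To make "completely cacheable" concrete, here is a minimal sketch of an origin handler that marks the front page as cacheable by shared caches. The TTL values, the `cache_headers` helper and the `FrontPage` class are illustrative assumptions, not anything from the original setup, and the CDN still has to be configured to honor origin cache headers.

```python
from http.server import BaseHTTPRequestHandler

# Assumed values: tune to how stale the front page is allowed to be.
BROWSER_TTL = 60   # seconds a browser may reuse the page
CDN_TTL = 300      # seconds a shared cache (the CDN) may reuse it


def cache_headers(browser_ttl=BROWSER_TTL, cdn_ttl=CDN_TTL):
    """Headers that let a CDN answer repeat requests without hitting origin."""
    return {
        "Cache-Control": f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}",
    }


class FrontPage(BaseHTTPRequestHandler):
    """Hypothetical origin handler: same body for everyone, no cookies."""

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in cache_headers().items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

The point is that the response must not vary per visitor (no `Set-Cookie`, no per-user content); otherwise the CDN has to forward every request to origin and the flood lands on your servers anyway.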
If the traffic gets to nginx or your app, you’ve already lost. You need to block the traffic farther out. AWS Shield is where you should focus; that’s what it’s for. If it’s only blocking a small percentage of the unwanted traffic, consult an expert about Shield configuration.
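One way to block farther out on AWS is a rate-based rule in the WAF web ACL that sits in front of the load balancer. The sketch below only builds the rule as a plain dict in the shape AWS WAFv2 expects in a web ACL's `Rules` list; the rule name, the limit of 2000 and the helper function are assumptions for illustration, and whether per-IP limiting helps depends on how distributed the attack is.

```python
import json


def rate_based_block_rule(name, limit, priority=0):
    """Build a WAFv2-style rule that blocks an IP once it exceeds `limit`.

    By default WAFv2 counts requests per IP over a trailing five-minute
    window. `limit` here is an assumed threshold; derive it from real
    traffic before using it.
    """
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                "AggregateKeyType": "IP",
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


# Hypothetical rule: block any single IP above 2000 requests per window.
rule = rate_based_block_rule("per-ip-flood", 2000)
print(json.dumps(rule, indent=2))
```

It can be worth running a rule like this with a count action first and checking the sampled requests, so that legitimate users behind shared IPs are not blocked by mistake.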
Cloudflare is another widely-used option.
There was a great write-up about it recently at https://usefathom.com/blog/ddos-attack