How are you monitoring SQS dead letter queues in production?

Question

Asking because I've spent the last month talking to engineering teams about this. Nobody does it the same way. Hand-rolled CloudWatch alarms, Lambda pollers, cron jobs. Some teams just don't bother.

Most are also watching the wrong metric. NumberOfMessagesSent won't fire when SQS automatically moves failed messages to the DLQ. You want ApproximateNumberOfMessagesVisible. Queue depth alone still misses slow drains too. Messages can age out silently before anyone notices, especially if your DLQ retention period is shorter than your source queue's.

What's actually working out there?

mikece · Accepted Answer

The best way to monitor the DLQ is to set up a CloudWatch alarm on the ApproximateNumberOfMessagesVisible metric. Have the alarm trigger when the message count exceeds 0, and pair it with an SNS notification that alerts developers via email, Slack, Teams, PagerDuty, or whatever alerting channel you prefer.
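
For anyone who wants to script this rather than click through the console, here is a rough sketch using boto3's `put_metric_alarm`. The helper names (`dlq_alarm_params`, `create_dlq_alarm`), the alarm name format, the queue name, and the topic ARN are all made up for illustration. The 60-second period, `Maximum` statistic, and `notBreaching` missing-data setting are my assumptions, not something the answer prescribes, so tune them to your setup.

```python
def dlq_alarm_params(dlq_name, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm goes into ALARM state when the DLQ holds at least one
    visible message. Period, statistic, and missing-data handling are
    assumptions, not recommendations from the answer above.
    """
    return {
        "AlarmName": f"{dlq_name}-not-empty",  # hypothetical naming scheme
        "AlarmDescription": f"Messages are waiting in {dlq_name}",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 1,  # alarm on the first breaching datapoint
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: a period with no datapoints is not treated as a failure.
        "TreatMissingData": "notBreaching",
        # The SNS topic fans out to email, Slack, PagerDuty, and so on.
        "AlarmActions": [sns_topic_arn],
    }


def create_dlq_alarm(cloudwatch, dlq_name, sns_topic_arn):
    """Create or update the alarm using a boto3 CloudWatch client."""
    params = dlq_alarm_params(dlq_name, sns_topic_arn)
    cloudwatch.put_metric_alarm(**params)
    return params


# Usage (needs boto3 and AWS credentials; names below are placeholders):
#   import boto3
#   create_dlq_alarm(
#       boto3.client("cloudwatch"),
#       "orders-dlq",
#       "arn:aws:sns:us-east-1:123456789012:dlq-alerts",
#   )
```

The client is passed in rather than created inside the function so the same code can be exercised with a stub in tests. `put_metric_alarm` overwrites an existing alarm with the same name, so re-running it should be safe.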