HACKER Q&A
📣 misiti3780

What is the best solution to monitor your infrastructure in 2022?


I have a large AWS deployment: Django-based, using Celery, Scrapy, DRF, Redis, ECS, and Postgres. What software are other engineers using to monitor this kind of setup?

I was looking to ask questions like:

- Number of requests per minute/hour/day
- Number of celery workers working
- Size of celery queue
- Keys in redis
- postgres stats
- etc.

Basically, I want a macro view of what is going on. I don't want to build my own solution from scratch because I do not have the time.


  👤 castillar76 Accepted Answer ✓
I tend to divide "monitoring" into two pieces: alerting ("tell me when XYZ is down or failed") and metrics ("tell me what's going on right now so I can look for patterns and bottlenecks"). Those are closely tied together, but they're not quite the same thing: there are plenty of products that are really good at one but not good at all at the other.
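
To make the split concrete, here's a minimal sketch in plain Python. It isn't tied to any product; `Series`, `should_alert`, and the queue-length numbers are all made up for illustration. The metrics side just keeps samples around so you can look at them later, while the alerting side reduces recent samples to a yes/no answer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Series:
    """Metrics side: keep every sample so patterns can be inspected later."""
    samples: List[Tuple[float, float]] = field(default_factory=list)  # (timestamp, value)

    def record(self, timestamp: float, value: float) -> None:
        self.samples.append((timestamp, value))


def should_alert(series: Series, threshold: float, for_samples: int) -> bool:
    """Alerting side: fire only if the last `for_samples` values all exceed `threshold`."""
    if for_samples < 1:
        raise ValueError("for_samples must be at least 1")
    recent = series.samples[-for_samples:]
    return len(recent) == for_samples and all(value > threshold for _, value in recent)


if __name__ == "__main__":
    queue_length = Series()  # hypothetical: Celery queue length sampled once a minute
    for minute, value in enumerate([10, 600, 700]):
        queue_length.record(minute * 60.0, value)
    print(should_alert(queue_length, threshold=500, for_samples=2))
```

Requiring several consecutive samples over the threshold is one common way to avoid paging on a single spike; the right window depends on your workload.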

Looks like you're mostly after the latter, in which case Prometheus + Grafana would be the free-to-play option I'd reach for first. The new Grafana Alerts looks interesting for covering the "alerting" piece. CloudWatch is the logical choice for AWS stuff and it does both pretty well (although I don't know whether you need a separate service like PagerDuty to really handle alerts), but it's not free (especially as you do more with it), and a managed provider (Datadog, e.g.) becomes more attractive as you put more and more money into CloudWatch.
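
For the custom numbers in the question (queue size, key counts), Prometheus scrapes a plain-text page of metric lines, so a small exporter is often enough. Below is a rough, standard-library-only sketch of rendering gauges in that text format. The function name `render_gauges` and the metric names are hypothetical, and the lambdas stand in for real queries you would write yourself (for instance, a length check on the Celery queue in Redis). In practice you'd more likely use an existing exporter or client library than hand-roll this.

```python
from typing import Callable, Dict


def render_gauges(collectors: Dict[str, Callable[[], float]]) -> str:
    """Render one gauge per collector in the Prometheus text exposition format."""
    lines = []
    for name, collect in sorted(collectors.items()):
        lines.append(f"# TYPE {name} gauge")   # metadata line declaring the metric type
        lines.append(f"{name} {collect()}")    # sample line: metric name, then value
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric names with fake values; swap in real lookups.
    fake_collectors = {
        "celery_queue_length": lambda: 42,
        "redis_db_keys": lambda: 1234,
    }
    print(render_gauges(fake_collectors), end="")
```

Serve that text over HTTP at a path Prometheus is configured to scrape, and Grafana can then chart the resulting series.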


👤 runjake
What's your budget and staffing? My solution would entirely depend on that.

👤 voxic11
Since you are on AWS, CloudWatch/ServiceLens is an obvious choice: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori...

Datadog is also pretty good at this but can cost quite a lot.


👤 shyn3
For databases, I love DPA. For the rest, New Relic is great on pricing. I prefer it to Datadog because NRQL is awesome. Datadog has an events API, which is also cool.

👤 AishwaryaVenkat
Atatus provides dashboards and alerting for monitoring your servers, software and more in order to understand how healthy your systems are.

👤 sidcool
Prometheus and Grafana. Datadog. Managed services.

👤 altdataseller
Depends on your budget. If you have a shoestring budget, then I recommend Honeycomb or Sentry if you don’t want to build your own solution.

👤 Redsquare
Datadog

👤 dsgrillo
datadog