HACKER Q&A
📣 t-writescode

Prometheus vs. StatsD / Telegraf


I see a lot of people talk about Prometheus on here and speak about it as though it's the only metrics-gathering solution. It really does seem to have become Hacker News's poster child for metrics gathering.

I've used both Prometheus and the Telegraf / StatsD solutions, and for a very long time I've disliked everything from the standard "bugs"[0] in Prometheus to its entire pull-based design philosophy, as opposed to the push methodology of Telegraf and similar tools.

What is the collective's general stance on Prometheus vs. Telegraf, and why do people here tend to end up preferring one over the other?

[0] For example, Prometheus clients tend not to expose a counter until it has been incremented, so if you have an error counter, the sudden appearance of that counter is how you find out about an error. Its 'increase' is 0, though, because it went from not existing to a value of 1. Citation: https://github.com/prometheus/prometheus/issues/1673

No, it's not technically a "bug"; it's how Prometheus is designed. But it speaks to how the tool gets used, and the work-arounds are unsatisfactory, in my opinion.
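For concreteness, the usual work-around is to touch every labelled counter at start-up so the series already exists at 0 before the first error. A rough sketch with the Python client (metric and label names are made up):

    from prometheus_client import Counter

    # A labelled counter: the child time series for a given label set
    # does not exist until .labels(...) has been called at least once.
    ERRORS = Counter("myapp_errors_total", "Errors seen by the app", ["kind"])

    # Work-around: pre-create each label combination at start-up so the
    # series sits at 0 and rate()/increase() can see the later 0 -> 1 step.
    for kind in ("db", "http", "timeout"):
        ERRORS.labels(kind=kind)

It works, but having to enumerate every label combination up front is exactly the kind of work-around I find unsatisfactory.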


  👤 margor Accepted Answer ✓
While I somewhat understand Prometheus's idea that pull is easier to scale than push, I've had bad luck with it.

First of all, Prometheus doesn't really consider monitoring long-running jobs any way other than pull (which didn't make sense to me). There is the Pushgateway [0], but the client libraries seem to consider it only for short-lived jobs where you can send the metrics once at the end [1]. I couldn't find a trivial way to "push" from a long-lived job.
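For reference, the push-at-the-end pattern the client library documents looks roughly like this (the gateway address and job name are placeholders); it clearly assumes a batch job that finishes, not a daemon:

    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

    registry = CollectorRegistry()
    duration = Gauge("batch_duration_seconds",
                     "Duration of the last batch run",
                     registry=registry)

    with duration.time():
        run_batch()  # stand-in for the actual work

    # One push when the job finishes; a long-lived process would have to
    # call this on a timer itself, which is the awkward part.
    push_to_gateway("pushgateway.example:9091", job="nightly_batch",
                    registry=registry)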

Second, when using it with Django, for example, you have to be careful with how you handle the multiprocessing that uWSGI/gunicorn does (see [2]); it has bitten me at least once.
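In case it saves someone the same debugging: the multiprocess setup that eventually worked for me with gunicorn boils down to an environment variable plus a special collector. A rough sketch (the directory path and view name are placeholders, and older client versions spell the variable prometheus_multiproc_dir in lowercase):

    # Must be set before the gunicorn workers start:
    #   PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus_multiproc
    from django.http import HttpResponse
    from prometheus_client import (CollectorRegistry, CONTENT_TYPE_LATEST,
                                   generate_latest, multiprocess)

    def metrics_view(request):
        # Merge the per-worker metric files into one registry, otherwise
        # each worker would expose only its own partial counters.
        registry = CollectorRegistry()
        multiprocess.MultiProcessCollector(registry)
        return HttpResponse(generate_latest(registry),
                            content_type=CONTENT_TYPE_LATEST)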

Compare that to the push model, where I can just push metrics to statsd_exporter [3] directly and be done with it. But support for StatsD is lacking, both in terms of frameworks (everyone seems to be migrating to native clients...) and functionality (you have to do labeling basically by hand [4]).
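To illustrate the "manual labeling": on the StatsD path you emit plain dotted names and then have to teach statsd_exporter to split them back into labels via a glob mapping. A rough sketch with the Python statsd package (metric names and the mapping rule are made up):

    import statsd

    # statsd_exporter listens for StatsD traffic on UDP 9125 by default
    # and re-exposes everything to Prometheus on :9102.
    client = statsd.StatsClient("localhost", 9125)

    # There are no native labels: anything you want as a Prometheus label
    # has to be encoded in the dotted name and extracted on the exporter
    # side with a mapping rule roughly like
    #   match: "myapp.*.errors"  ->  name: errors_total, labels: service=$1
    client.incr("myapp.checkout.errors")
    client.timing("myapp.checkout.request_ms", 42)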

To sum up: Prometheus is really great when it works, but as soon as you try to go off-track (intentionally or not, see Django [2]) you find yourself in an undiscovered and immature landscape.

[0] https://github.com/prometheus/pushgateway

[1] https://github.com/prometheus/client_python#exporting-to-a-p...

[2] https://github.com/korfuri/django-prometheus/blob/master/doc...

[3] https://github.com/prometheus/statsd_exporter

[4] https://github.com/prometheus/statsd_exporter#glob-matching


👤 killtimeatwork
Not having much to add, but I just wanted to say I'm also annoyed by not being able to track the 0 -> 1 increase in Prometheus (with functions like "rate")... Does anyone have a workaround for that?