1. https://github.com/giltene/wrk2
But I am curious, what does HN use? Any tips?
The simple view can be used by load balancers to determine a better distribution of load and keep new traffic off nodes that are about to be in trouble until their queues clear out. The more complex views can be used to help DevOps people find bottlenecks and do capacity planning.
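As a rough sketch of how a load balancer might consume the simple view: the field names (`queue_depth`, `queue_limit`) and the 80% cutoff below are assumptions for illustration, not something any particular load balancer exposes.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    # Hypothetical fields a node's "simple view" might report.
    name: str
    queue_depth: int
    queue_limit: int


def eligible_nodes(statuses, threshold=0.8):
    """Return nodes whose queue is below `threshold` of its limit, least loaded first."""
    ok = [
        s for s in statuses
        if s.queue_limit > 0 and s.queue_depth / s.queue_limit < threshold
    ]
    return sorted(ok, key=lambda s: s.queue_depth / s.queue_limit)
```

Nodes over the cutoff simply drop out of the list until their queues clear, which is the "keep new traffic off" behavior described above.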
It would be up to your developers to create either a simple HTML interface for this, or an API, whichever fits your workflow needs. The data could then be ingested by tools like Prometheus and graphed in something like Grafana.
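For the API flavor, a minimal sketch using only the Python standard library might look like the following. The `view` query parameter, the metric names, and the hard-coded `METRICS` dict are all made up for illustration; a real app would read these values from its own queues and pools.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical metrics; a real service would sample these live.
METRICS = {"queue_depth": 12, "queue_limit": 100, "db_pool_in_use": 7, "db_pool_size": 20}


def build_status(view, metrics):
    """Build the simple view, or the detailed one when view == 'detail'."""
    healthy = metrics["queue_depth"] < 0.8 * metrics["queue_limit"]
    simple = {"status": "ok" if healthy else "busy"}
    if view == "detail":
        return {**simple, **metrics}
    return simple


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /status?view=detail (the parameter name is an assumption)
        query = parse_qs(urlparse(self.path).query)
        view = query.get("view", ["simple"])[0]
        body = json.dumps(build_status(view, METRICS)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the status server; blocks until interrupted."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```

Calling `serve()` starts it on port 8080. The simple view stays tiny so a load balancer can poll it cheaply, while the detailed view is what you would point your graphing tools at.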
Once all those metrics and load tuning are in place, simple tools like "ab" (ApacheBench) could be used to hit the monitoring page with parameters that trigger a bigger load. Use your graphing tools to see which parts of the stack get into trouble first. This methodology is also potentially useful to rinse and repeat from Dev to QA to Staging to Production to see if "one of these things is not like the other", and if so, the team must play the song.
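If you would rather not eyeball the graphs, here is a small sketch of the "which part gets into trouble first" check. It assumes you have already collected timestamped utilization samples (0.0 to 1.0) per component during the load run; the component names and the 0.9 cutoff are placeholders.

```python
def first_to_saturate(samples, threshold=0.9):
    """Return (timestamp, [components]) for the first sample where anything
    reaches `threshold` utilization, or None if nothing ever does.

    `samples` is a time-ordered list of (timestamp, {component: utilization}).
    """
    for timestamp, readings in samples:
        over = sorted(name for name, util in readings.items() if util >= threshold)
        if over:
            return timestamp, over
    return None


# Made-up sample data, purely to show the shape of the input.
run = [
    (0, {"cpu": 0.30, "db_pool": 0.50}),
    (10, {"cpu": 0.60, "db_pool": 0.95}),
    (20, {"cpu": 0.95, "db_pool": 1.00}),
]
print(first_to_saturate(run))
```

Running the same check against each environment's samples gives you something concrete to compare when one of them behaves differently from the others.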