If so, what tools do folks use to generate load? How do you visualize and analyze results? What currently sucks?
The closer the load is to a real-world workload the better. If I'm building a log ingestion cluster or a metrics ingestion system, I'll try loading it with real logs or metrics. If I'm building some sort of high scale web app, I'll try generating traffic patterns that emulate what I think users will do.
Of course, you almost certainly will miss out on something. The real world will always find something you didn't think of. The point, though, is to find the things that will be your bottleneck and fix them, so you reduce the number of unknowns as much as possible.
Usually I just want to see how far I can get before the system breaks, and I want to see what breaks. This will tell me how much headroom I have and whether I feel comfortable with the amount of runway I'll have with the system.
For generating load, it depends on the system I'm building. Usually it means writing a script that generates load close to what I expect in the real world.
Then again, you could also use something like Apache JMeter (https://jmeter.apache.org/), Gatling (https://gatling.io/open-source/), or any other solution out there, whichever is better suited to the on-prem/cloud use case.
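To make that concrete, here's roughly what one of those hand-rolled scripts tends to look like; this is a minimal Python sketch with asyncio and aiohttp, and the endpoint, payload shape, and worker counts are all placeholders for whatever your real workload is:

```python
# Minimal hand-rolled load script (sketch). Assumes aiohttp is installed;
# the endpoint and payload are stand-ins for your real traffic.
import asyncio
import aiohttp

TARGET = "https://example.internal/ingest"  # hypothetical endpoint
CONCURRENCY = 50                            # parallel workers
REQUESTS_PER_WORKER = 200

async def worker(session: aiohttp.ClientSession, statuses: list) -> None:
    for _ in range(REQUESTS_PER_WORKER):
        # Shape the payload like the real traffic you expect, not random bytes.
        payload = {"level": "info", "msg": "synthetic log line"}
        async with session.post(TARGET, json=payload) as resp:
            statuses.append(resp.status)

async def main() -> None:
    statuses: list[int] = []
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(worker(session, statuses) for _ in range(CONCURRENCY)))
    errors = sum(1 for s in statuses if s >= 500)
    print(f"{len(statuses)} requests, {errors} server errors")

if __name__ == "__main__":
    asyncio.run(main())
```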
That said, when time was short and I couldn't figure out how to test WebSocket connections and which resources the test should load, I cooked up a container image with Selenium (https://www.selenium.dev/) driving Firefox/Chrome as a fully automated browser, for 1:1 behavior with how real users interact with the site.
That was a horrible decision from a memory usage point of view, but an excellent one from time-saving and data quality perspectives, because the behavior was just like having 100-1000 users clicking through the site.
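For anyone curious, a stripped-down sketch of that browser-as-load-generator idea in Python; it assumes the selenium package and a headless Firefox inside the container, and the URL and selectors are placeholders:

```python
# Browser-based load generation (sketch). Each worker drives a real headless
# Firefox, so it loads JS/CSS/WebSockets exactly like a user's browser would.
from multiprocessing import Pool

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

SITE = "https://staging.example.com"  # hypothetical target

def simulate_user(_: int) -> None:
    opts = Options()
    opts.add_argument("-headless")
    driver = webdriver.Firefox(options=opts)
    try:
        driver.get(SITE)  # pulls in every resource, opens WebSockets, etc.
        driver.find_element(By.LINK_TEXT, "Dashboard").click()       # placeholder flow
        driver.find_element(By.ID, "search").send_keys("something\n")
    finally:
        driver.quit()

if __name__ == "__main__":
    # Every process is a full browser, which is exactly where the memory
    # usage pain mentioned above comes from.
    with Pool(processes=20) as pool:
        pool.map(simulate_user, range(100))
```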
Apart from that, you probably want something to aggregate the performance data of the app, be it something like Apache Skywalking (https://skywalking.apache.org/) or even Sentry (https://sentry.io/welcome/). Then you can ramp up the tests slowly over time in terms of how many parallel instances are generating load and see how the app reacts: memory usage, CPU load, how many DB queries are issued, etc.
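The ramp-up itself doesn't need to be fancy; here's a rough Python sketch, assuming you already have a "one user" routine like the scripts above (step sizes and hold times are made up):

```python
# Gradual ramp-up (sketch): hold each concurrency level long enough for the
# APM graphs (Skywalking, Sentry, ...) to settle before stepping up.
import time
from concurrent.futures import ThreadPoolExecutor

def load_one_user() -> None:
    ...  # one user's worth of requests against the system under test

RAMP = [10, 25, 50, 100, 200]  # parallel users per step (made up)
STEP_SECONDS = 120             # how long to hold each level

for level in RAMP:
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=level) as pool:
        while time.monotonic() - started < STEP_SECONDS:
            futures = [pool.submit(load_one_user) for _ in range(level)]
            for f in futures:
                f.result()
    print(f"held {level} parallel users for {STEP_SECONDS}s; check the dashboards")
```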
What sucks right now is that load testing tools are not DRY. Re-implementing page-walking logic in a separate tool can delay a release by a full sprint at most orgs.
That's why I wrote a DRY load testing tool that lets you reuse your existing implementations, like Page Objects (Playwright, Selenium, Cypress), custom code, or Postman requests, to run full-fledged load tests at scale. It's not a half-baked export; it runs your actual code from within your own EC2 account.
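Roughly, the "reuse your Page Objects" idea looks like this toy sketch (not the actual tool; LoginPage and its fluent methods are hypothetical stand-ins for whatever your functional suite already defines):

```python
# Toy illustration of the DRY idea: the same Page Object the functional
# tests use becomes the body of a load-test worker. LoginPage and its
# methods are hypothetical.
from concurrent.futures import ThreadPoolExecutor

from selenium import webdriver
from tests.pages.login_page import LoginPage  # existing Page Object, unchanged

def one_session(_: int) -> None:
    driver = webdriver.Chrome()
    try:
        LoginPage(driver).open().log_in("load-user", "hunter2").go_to_reports()
    finally:
        driver.quit()

if __name__ == "__main__":
    # 500 simulated user sessions, 50 at a time.
    with ThreadPoolExecutor(max_workers=50) as pool:
        list(pool.map(one_session, range(500)))
```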
It's a pretty cool setup because there's virtually no maintenance on the k6 side: all you have to do is extend the Postman collection when necessary, commit that, and you can manually trigger the load test CI job with the up-to-date collection.
https://loader.io/ has visualization included, but the setup usually takes me more time, e.g. generating 10,000 random queries (for a search engine). They have a limit (3 or 5 megabytes, I can't remember) on the maximum payload size, and I regularly hit it. They host exclusively on AWS US East, which added a tiny bit of latency when testing services in Europe.
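The query generation part is quick to script; a Python sketch of what I mean (the word-list path and query shape are assumptions, and the output still has to stay under their payload cap):

```python
# Generate ~10,000 random search queries as a payload file for loader.io
# (or any other tool). /usr/share/dict/words is an assumed word list.
import json
import random

with open("/usr/share/dict/words") as f:
    words = [w.strip() for w in f if w.strip()]

queries = [
    " ".join(random.sample(words, k=random.randint(1, 3)))
    for _ in range(10_000)
]

with open("queries.json", "w") as out:
    json.dump(queries, out)

print(f"wrote {len(queries)} queries; watch the payload size limit")
```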
Generating the load is the difficult part for me. Things that speak HTTP/HTTPS are generally trivial, but some APIs are more difficult than others.
I'm trying to wrap my head around simulating many Docker clients hitting a registry that's acting as a pull-through cache. Ideally with varying sets of images/tags to truly exercise it.
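One way I can imagine approximating that without running real Docker daemons is to hit the registry's HTTP API directly, since a pull starts with a manifest request; the registry host, image names, and tags below are placeholders, and a real pull also fetches blobs (and possibly an auth token), which this sketch skips:

```python
# Sketch: emulate many clients pulling varied images/tags from a
# pull-through cache by requesting manifests via the registry V2 API.
# Host, images, and tags are placeholders; blob downloads and token auth
# are deliberately left out, and some combinations will simply 404.
import random
import requests

REGISTRY = "http://registry-cache.internal:5000"  # hypothetical mirror
IMAGES = ["library/alpine", "library/nginx", "library/redis"]
TAGS = ["latest", "3.19", "1.25", "7.2"]

ACCEPT = ", ".join([
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
])

for _ in range(1000):
    image, tag = random.choice(IMAGES), random.choice(TAGS)
    r = requests.get(
        f"{REGISTRY}/v2/{image}/manifests/{tag}",
        headers={"Accept": ACCEPT},
        timeout=10,
    )
    print(image, tag, r.status_code)
```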
I don't really mind visualization. Most of the metrics I care about are those I can observe with common utilities like strace, iostat, and vmstat.
I mainly just want to be able to put varying levels of heat on the thing.
It's not perfect; I wouldn't trust it to see potential upstream bottlenecks that would be triggered by a big bang launch, but we normally roll things out incrementally and it's more than adequate for staying ahead of organic growth.
It has decent Lua hooks to customize behavior, but I use it in the dumbest way possible: hammering a server at a fixed rate with the same payload over and over.
I run it by hand after a big change to the server to make sure nothing obviously regressed. I used to run it nightly in a Jenkins job, but 99% of the time no one looked at the results. It was nice for catching when assumptions about how much load a single node could handle no longer held.
In the past I used BlazeMeter SaaS, which rocks.