I tried a vLLM workload and a ResNet training workload. The H100 consistently outperforms the A100 by about 45% to 80% (roughly 1.45x to 1.8x), but it isn’t that much faster…
Which workloads would see the biggest speedup? I’m really not seeing anything like 3x+ on vLLM or simple training workloads.