HACKER Q&A
📣 idomi

Are you saving on GPU inference costs at your company?


I’m currently trying to solve a problem we're having: GPUs are expensive! I've been thinking about ways to cut inference costs at my company and wanted to hear your perspective.

Has anyone implemented something similar? How did it go? How much time did it save, and what was the cost improvement? I recently found this tool in the AWS samples: https://github.com/aws-samples/scalable-hw-agnostic-inference

I'm wondering if anyone has used or tried it, or other approaches?


  👤 ricktdotorg Accepted Answer ✓
i've used GCP Cloud Run with GPUs to build an on-demand, auto-scaling livestream/HLS video translation --> subtitle generation pipeline with great success.

[edit: sorry, not inference, but a great cost-saver]
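for anyone curious, the saving in a setup like this comes mostly from scale-to-zero: Cloud Run keeps zero GPU instances running while idle and spins them up per request. a minimal sketch of such a deployment (service name, image path, and region are placeholders i made up; GPU flags sat in the beta track for a while, so you may need `gcloud beta run deploy` depending on rollout):

```shell
# Sketch: a GPU-backed Cloud Run service that scales to zero when idle.
# Service name, image, and region below are hypothetical placeholders.
gcloud run deploy subtitle-worker \
  --image=us-docker.pkg.dev/my-project/inference/subtitler:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --min-instances=0 \
  --max-instances=5 \
  --concurrency=1 \
  --no-cpu-throttling
```

`--min-instances=0` is what makes this cheap for bursty workloads: you pay per-second only while requests are being served, at the cost of cold starts (container boot + model load) on the first request after idle.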