How can startups running ML models reduce their compute costs?

Question

Machine learning stuff seems generally expensive to run in production, especially at scale (how many GPU machines do you need up and running to serve 100 concurrent users without major delays?). It seems especially hard for B2C/freemium-style companies to offset those costs.If, for example, you're a freemium web app where each user request takes 100ms of inference time, you may have trouble turning a profit. Even something as basic as a translation tool (&agrave; la Google Translate) seems tricky for a startup to run at a profit. Larger companies presumably have the flexibility to treat some of these things as loss leaders, whereas a startup with ML as its core product likely needs to be more conscious about cost.What are some ways that you've seen companies cut down on compute costs for machine learning services?

sidlls · Accepted Answer

Batch as much as possible. For companies that absolutely must have actual ML the temptation to make everything real-time/on-demand is high but sometimes (often) unnecessary. Take a serious, hard look at where that line is. It&rsquo;s like companies that need simple rules or analytics reaching for ML: an unnecessary expense in terms of development time and hard cash.

consultutah · Answer

If you can batch things up and don&rsquo;t need real-time returns using spot instances is a great way to lower costs.