I've looked at Replicate and Together.ai. Both offer some of the best tools in this space, but hosting is expensive: Together costs about $1.40/hr to host a 7B model, and Replicate is more expensive.
Ideally, I'd be charged only for active time, not idle time. (Replicate already does this, but your fine-tuned model has to be based on one of a limited set of base models.)
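To put rough numbers on the idle-time problem, here's a quick back-of-the-envelope sketch. The 1.4/hr rate is the Together figure above; the 30-day month, the 2 active hours per day, and the idea that an active-only plan would bill at the same hourly rate are all assumptions made up for illustration, not real pricing from any provider.

```python
# Rough monthly cost comparison: always-on hosting vs. paying only for active time.
# Assumptions (hypothetical, not provider pricing): 30-day month, 2 active hours/day,
# and the same hourly rate under both billing models.

HOURS_PER_MONTH = 24 * 30  # assumed 30-day month


def always_on_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost when the endpoint is billed for every hour, idle or not."""
    return hourly_rate * hours


def active_only_cost(hourly_rate: float, active_hours: float) -> float:
    """Cost when only the hours actually spent serving requests are billed."""
    return hourly_rate * active_hours


rate = 1.4                   # per hour, the 7B figure quoted above
active_hours = 2 * 30        # hypothetical: 2 hours of real traffic per day

always_on = always_on_cost(rate)
active_only = active_only_cost(rate, active_hours)

print(f"always-on:   {always_on:.2f} per month")
print(f"active-only: {active_only:.2f} per month")
print(f"idle share of the always-on bill: {1 - active_only / always_on:.0%}")
```

With low or bursty traffic, most of an always-on bill is idle time, which is why scale-to-zero matters so much here. In practice, pay-per-use rates are often higher per hour and cold starts add latency, so plug in real numbers before deciding.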
Any recommendations?
Roll your own k8s? Predibase?