I saw a couple YC launches like Hegel AI.
I'm personally interested in deployments in small teams or teams with a lot of freedom to pick and choose their own tooling.
Basically you can get a Docker container that will publish an Open AI API compatible end point. You can then choose the model that sits behind that API.
As deployment will be in Kuberenetes we will clusters with GPU resources to maxz out performance but we're not there yet.