import modal


def download_model():
    from transformers import pipeline

    # Downloading the weights here bakes them into the image,
    # so containers don't have to re-fetch the model on start.
    pipeline("fill-mask", model="bert-base-uncased")


CACHE_PATH = "/root/model_cache"  # model location in image
ENV = modal.Secret({"TRANSFORMERS_CACHE": CACHE_PATH})

image = (
    modal.Image.debian_slim()
    .pip_install("torch", "transformers")
    .run_function(download_model, secret=ENV)
)

stub = modal.Stub(name="hn-demo", image=image)


class Model:
    def __enter__(self):
        from transformers import pipeline

        # Runs once per container, so the pipeline is reused across calls.
        self.model = pipeline("fill-mask", model="bert-base-uncased", device=0)

    @stub.function(
        gpu="a10g",
        secret=ENV,
    )
    def handler(self, prompt: str):
        return self.model(prompt)


if __name__ == "__main__":
    with stub.run():
        prompt = "Hello World! I am a [MASK] machine learning model."
        print(Model().handler.call(prompt)[0]["sequence"])
Running `python hn_demo.py` prints "Hello World! I am a simple machine learning model." You can check out the available GPUs at https://modal.com/docs/reference/modal.gpu.
There's also a bunch of easy-to-run examples in our docs :) https://modal.com/docs/guide/ex/stable_diffusion_cli
I'm at erik@banana.dev if you want any help with it :)
The first two are more customizable than the last. SageMaker is the cheapest.
I'm assuming you know what you need for a GPU. If you're unsure, consider running inference on a CPU first to see how long it takes and whether that would be workable (rough sketch below).
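A minimal timing sketch, assuming a Hugging Face transformers workload (swap in your own model and input):

import time
from transformers import pipeline

# device=-1 forces CPU; time one inference to gauge feasibility
pipe = pipeline("fill-mask", model="bert-base-uncased", device=-1)
start = time.perf_counter()
pipe("Hello World! I am a [MASK] machine learning model.")
print(f"CPU inference took {time.perf_counter() - start:.2f}s")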
Then just compare price and reliability for a GPU machine across the different cloud providers. OVH is cheap, but the only thing worse than their reliability is their customer service. Various niche players offering V100s used to pop up that were pretty cheap. AWS is more expensive and more reliable, though they may still have availability problems. Paperspace looks pretty good. Etc.
You can give us a shot at https://truefoundry.com. We are a general-purpose ML deployment platform that works on top of your existing Kubernetes clusters (AWS EKS, GCP GKE, or Azure AKS), abstracting away the complexity of dealing with cloud providers and Kubernetes. We support:
- Services for ML web apps and APIs
- Jobs for ML training jobs
- Model Registry for storing models
- Model Servers for no-code model deployments
(Our platform can be partially or completely self-hosted for privacy and compliance.)
Adding one or more GPUs (V100, T4, A10, A100, etc) is simply one extra line https://docs.truefoundry.com/docs/gpus#adding-gpu-to-service...
Examples:
- Stable Diffusion with Gradio: https://github.com/truefoundry/truefoundry-examples/tree/mai...
- GPT-J 6B fp16 with FastAPI: https://github.com/truefoundry/truefoundry-examples/tree/mai...
For non-serverless, some to check out are these (though likely all overkill if you just need a single GPU):
- vast.ai
- Lambda Labs
Wrapped the thing in a Flask app so I can expose APIs as I build them out (rough sketch after the link below).
[0] https://cloudmarketplace.oracle.com/marketplace/en_US/adf.ta...
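A minimal sketch of that kind of Flask wrapper; the predict() helper here is a hypothetical stand-in for whatever model actually gets loaded:

from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(prompt: str) -> str:
    # hypothetical stand-in for the real model call
    return prompt.upper()

@app.route("/predict", methods=["POST"])
def handle():
    prompt = request.get_json()["prompt"]
    return jsonify({"result": predict(prompt)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)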
Disclaimer: I am the CTO ;)
Why use us?
- Competitive prices (billed by the minute; you only pay when you actually run an instance)
- High reliability (professional DCs, hardware customized to suit requirements)
- Good connectivity (traffic is free too, no ingress/egress fees)
- High security (full VMs with dedicated GPUs and proper separation of customers, instead of shared hosts with Docker)
- Free storage
- A great support team
- Green energy (no greenwashing by carbon offsetting; we use energy sources that are renewable and carbon-free at the source: geothermal/hydro)
I could go on... We'd love it if you just tried our services; after sign-up there are free credits available for risk-free testing.
I use a mix of both for my side project: https://trainengine.ai
There's one I won't share that is now defunct, but it let you use any diffusers-compatible project on Hugging Face, which was such a cool feature. I wish someone (cheap) would implement this!
edit: just looked at banana.dev in this thread; their templates look closest to the Hugging Face integration, though I don't think they have webhooks.
They'll take a FastAPI setup too and just put it online to be used on demand. This is exactly what you're looking for.
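For reference, a minimal sketch of the kind of FastAPI setup such services take; the model call is a hypothetical placeholder:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Prompt(BaseModel):
    text: str

@app.post("/predict")
def predict(prompt: Prompt):
    # hypothetical placeholder for a real model inference
    return {"result": prompt.text[::-1]}

Running it locally with `uvicorn app:app` (assuming the file is app.py) is a quick way to sanity-check it before handing it off.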