How to deploy GPU models for live video?

Question

Hi all -- I have some models (which require GPUs) that I've trained to run on my live webcam feed, and I'm trying to figure out how to deploy them to the cloud so other people can try them out. Has anyone had any success with this?

On one hand, I see tools like BentoML, TorchServe, Triton, etc., which wrap your model in an HTTP API. On the other hand, I see SaaS services like LiveKit, Twilio, the Zoom SDK, etc., which let you stream your webcam, but I don't see anything that lets you run your GPU models on the video. The closest I could find are LiveKit workers (which have a specific SDK and seem to require a pretty deep understanding of deployment to deploy and hook up) and NVIDIA Maxine (which is highly inflexible, is in C++, and seems to want you to "put C++ with your models here").

I was wondering if anyone had any recommendations?
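For reference, here is roughly what I mean by the "wrap your model in an HTTP API" approach when applied to video: one request per encoded frame. This is only a minimal sketch using Python's standard library, not the actual API of BentoML, TorchServe, or Triton. `run_model`, `start_server`, and `post_frame` are hypothetical names made up for illustration, and `run_model` is a stand-in for real GPU inference.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_model(frame_bytes: bytes) -> dict:
    """Hypothetical placeholder for GPU inference on one encoded frame."""
    return {"num_bytes": len(frame_bytes)}


class FrameHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly one frame from the request body.
        length = int(self.headers.get("Content-Length", 0))
        frame_bytes = self.rfile.read(length)
        body = json.dumps(run_model(frame_bytes)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), FrameHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_frame(url: str, frame_bytes: bytes) -> dict:
    """Send one encoded frame and return the model's JSON result."""
    request = urllib.request.Request(
        url,
        data=frame_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    # Bypass any system proxy so local requests go straight to the server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())
```

A client would call `start_server()` once on the GPU machine and then `post_frame(url, jpeg_bytes)` for every webcam frame. Each frame pays a full request round trip, which is why this pattern feels like a poor fit for live video compared with a streaming transport.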

urhku3hjkdskfjs · Accepted Answer

I'm curious about this too. I've seen others discuss combining NVIDIA DeepStream with Triton for processing live video, but only as tangents in threads about other topics. Does anyone have actual hands-on experience here?