On one hand, I see tools like BentoML, TorchServe, Triton, etc., which wrap your model in an HTTP API. On the other hand, I see SaaS services like LiveKit, Twilio, the Zoom SDK, etc., which let you stream your webcam. But I don't see anything that lets you run your GPU models on that video stream. The closest I could find are LiveKit workers and NVIDIA Maxine. LiveKit workers have a specific SDK and seem to require a pretty deep understanding of deployment to deploy and hook up. Maxine is highly inflexible, is in C++, and seems to want you to "put C++ with your models here".
I was wondering if anyone had any recommendations?