I built a Network Video Recorder in pure Python that is fully configurable through the GUI, with no containers, databases, or any of the other things modern FOSS apps use to make setup unnecessarily hard.
I didn't actually do anything all that interesting with the AI, though. The only real innovation is the algorithms I use to avoid running the AI most of the time.
I only run object detection when there is motion, and I detect motion without decoding anything other than the keyframes. That creates a fairly convincing illusion of detecting objects in real time, without needing too much CPU.
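To make the gating idea concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the `Frame` type (a flat list of grayscale pixels standing in for a decoded keyframe), `changed_fraction`, `MotionGate`, and both thresholds are hypothetical names and values picked for illustration, and the simple pixel-difference test is just one way motion could be measured.

```python
from typing import Callable, Optional, Sequence

# Assumption: a decoded keyframe is a flat sequence of grayscale pixels (0-255).
Frame = Sequence[int]


def changed_fraction(prev: Frame, cur: Frame, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose brightness changed by more than pixel_delta."""
    if len(prev) != len(cur) or not cur:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, cur) if abs(a - b) > pixel_delta)
    return changed / len(cur)


class MotionGate:
    """Calls the expensive detector only for keyframes that show motion."""

    def __init__(self, detector: Callable[[Frame], list], min_fraction: float = 0.02):
        self.detector = detector          # hypothetical object detector callback
        self.min_fraction = min_fraction  # assumed threshold, would need tuning
        self.prev: Optional[Frame] = None
        self.detector_calls = 0

    def on_keyframe(self, frame: Frame) -> list:
        prev, self.prev = self.prev, frame
        if prev is None or changed_fraction(prev, frame) < self.min_fraction:
            return []  # first keyframe or no motion: skip the detector
        self.detector_calls += 1
        return self.detector(frame)


# Usage: three keyframes, only the last one differs from its predecessor.
gate = MotionGate(detector=lambda frame: ["person"])
still = [10] * 100
moved = [10] * 90 + [200] * 10
results = [gate.on_keyframe(f) for f in (still, still, moved)]
```

Because only keyframes ever reach `on_keyframe`, the cost of the motion check scales with the keyframe interval rather than the full frame rate, and the detector runs only on the subset of those keyframes that changed.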
I also have some pre-buffering in RAM, so I can capture stuff just before an object is detected, to compensate for the slower response.
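The pre-buffering could look roughly like the sketch below, assuming packets arrive as `(timestamp, data)` pairs. The `PreRollBuffer` class, the `Packet` type, and the window length are hypothetical and only illustrate the idea of holding the last few seconds of encoded video in RAM.

```python
from collections import deque
from typing import Deque, List, Tuple

# Assumption: a packet is (timestamp in seconds, encoded video bytes).
Packet = Tuple[float, bytes]


class PreRollBuffer:
    """Keeps roughly the last `seconds` of encoded packets in RAM."""

    def __init__(self, seconds: float = 5.0):
        self.seconds = seconds
        self.packets: Deque[Packet] = deque()

    def push(self, timestamp: float, data: bytes) -> None:
        self.packets.append((timestamp, data))
        # Drop packets that have fallen out of the pre-roll window.
        while self.packets and timestamp - self.packets[0][0] > self.seconds:
            self.packets.popleft()

    def drain(self) -> List[Packet]:
        """Return buffered packets, oldest first, and empty the buffer."""
        out = list(self.packets)
        self.packets.clear()
        return out


# Usage: feed ten one-second packets into a three-second window,
# then drain it as if an object had just been detected.
buf = PreRollBuffer(seconds=3.0)
for t in range(10):
    buf.push(float(t), b"packet-%d" % t)
pre_roll = buf.drain()
```

When a detection fires, the drained packets would be written to the recording ahead of the live stream. A real recorder would also need the saved clip to start on a keyframe so it can be decoded, which this sketch ignores.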
Actually creating new useful models is beyond both my skill and available hardware, but I think I did pretty well at making optimal use of existing tech.