The idea is to measure the distance from said people to certain projected images and to change the images as those distances change.
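To make that concrete, here is a rough sketch of the distance logic, assuming some tracker already delivers floor positions in metres. The image names, their coordinates, and the `near`/`far` thresholds are all made up for illustration.

```python
import math

# Hypothetical layout: floor coordinates (in metres) of each projected image.
IMAGE_POSITIONS = {"image_a": (0.0, 0.0), "image_b": (4.0, 0.0)}


def nearest_distances(people, images=IMAGE_POSITIONS):
    """For each image, return the distance to the closest person.

    `people` is a list of (x, y) floor positions. An image with nobody
    around gets math.inf.
    """
    result = {}
    for name, (ix, iy) in images.items():
        result[name] = min(
            (math.hypot(px - ix, py - iy) for px, py in people),
            default=math.inf,
        )
    return result


def intensity(distance, near=0.5, far=3.0):
    """Map a distance to a 0..1 value: 1 at `near` or closer, 0 at `far` or beyond."""
    if distance <= near:
        return 1.0
    if distance >= far:
        return 0.0
    return (far - distance) / (far - near)
```

Each frame, `nearest_distances` would be fed the current positions and `intensity` would drive whatever property of the image should react (opacity, scale, and so on). A linear ramp is just the simplest choice; any monotonic mapping would do.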
Now I have some experience with laser-based Time-of-Flight sensors, but those have too narrow a field of view and wouldn't be able to cover the whole area (it would be a U-shaped space). I had a look at OpenCV, YOLOv8, and ByteTrack. That worked okay, but produced many false positives and a lot of jitter on my test footage, which was recorded from above. Two people standing next to each other were classified as a single elephant, bear, and horse, all within the span of a single second. I had a look at training my own model, but I won't be able to film footage in the space where this will take place.
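Two cheap post-processing steps might reduce the noise before any retraining: discard every detection that is not labelled as a person, and smooth each track's position over time. The sketch below assumes detections arrive as `(class_id, confidence, cx, cy)` tuples, which is a hypothetical format, and that the model uses the COCO label set, where class 0 is "person". It will not recover people the detector missed or merged; it only drops the spurious animal labels and damps the jitter.

```python
PERSON_CLASS_ID = 0  # "person" in the COCO label set (assumed to match the weights in use)


def keep_people(detections, min_conf=0.4):
    """Keep only confident person detections.

    `detections` is an iterable of (class_id, confidence, cx, cy) tuples.
    Returns a list of (cx, cy) centre points.
    """
    return [
        (cx, cy)
        for class_id, conf, cx, cy in detections
        if class_id == PERSON_CLASS_ID and conf >= min_conf
    ]


class SmoothedTrack:
    """Exponential moving average over one tracked person's position."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # higher = follows new measurements faster, smooths less
        self.pos = None

    def update(self, x, y):
        if self.pos is None:
            # First measurement: nothing to blend with yet.
            self.pos = (x, y)
        else:
            a = self.alpha
            self.pos = (
                a * x + (1 - a) * self.pos[0],
                a * y + (1 - a) * self.pos[1],
            )
        return self.pos
```

One `SmoothedTrack` per tracker ID would be kept alive for as long as that ID exists. A small `alpha` gives steadier positions at the cost of lag, which matters if the projected images should react quickly.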
Is computer vision even the right option? If it is, does anybody know of good models for detecting people from a top-down view?
https://arxiv.org/search/?query=%22pose+estimation%22&search...
See also
https://github.com/pliablepixels/zmeventnotification
for a rather mature system that adds person and object detection to a security camera system.
Alternatively, could you place sensors on the floor under carpets to measure the weight of human bodies?
Or use OpenCV or something like it, but with infrared cameras mounted overhead. I'm not sure how hats would affect that...
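For what it's worth, the overhead infrared idea boils down to "threshold the frame, then group warm pixels into blobs". Here is a dependency-free sketch of that, assuming a low-resolution thermal frame given as a 2D list of temperatures in degrees Celsius. The threshold and minimum blob size are guesses that would need tuning on real footage.

```python
from collections import deque


def warm_blobs(frame, threshold=30.0, min_pixels=2):
    """Return (row, col) centroids of 4-connected regions at or above `threshold`.

    `frame` is a non-empty 2D list of temperatures. Regions smaller than
    `min_pixels` are treated as noise and dropped.
    """
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] < threshold:
                continue
            # Breadth-first flood fill from this warm pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and not seen[ny][nx]
                        and frame[ny][nx] >= threshold
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_pixels:
                n = len(pixels)
                centroids.append(
                    (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)
                )
    return centroids
```

With a real camera, OpenCV's `cv2.threshold` and `cv2.connectedComponentsWithStats` do the same job much faster. Two people standing shoulder to shoulder will still merge into one blob with this approach, so it may need a size check or a split step on top.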