Is there a way to automatically place 3D objects in a scene? That is, to automatically find the ground plane and somehow calibrate it using reference objects.
Is this scalable? Say I have thousands of different videos, so any manual step is infeasible. Is there a way to use the video itself for this calibration problem?
There's a lot of research on this, mostly based on deep vision models / transformers (algo).
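For the ground-plane step specifically, here is a minimal sketch of a classical, non-learned approach: RANSAC plane fitting with NumPy. This is not one of the deep-model methods mentioned above. It assumes you already have a rough 3D point cloud per video (for example from structure-from-motion or a monocular depth estimate), and the function names `fit_ground_plane` and `snap_to_plane`, the threshold and the iteration count are all illustrative choices, not taken from any particular library.

```python
import numpy as np


def fit_ground_plane(points, n_iters=200, threshold=0.05, seed=0):
    """Fit a plane n.x + d = 0 to an (N, 3) array using RANSAC.

    Assumes the ground is the dominant planar surface in the cloud.
    Returns (unit normal, offset d, boolean inlier mask).
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers, best_count = None, -1

    for _ in range(n_iters):
        # Three random points define a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        length = np.linalg.norm(normal)
        if length < 1e-9:  # nearly collinear sample, skip it
            continue
        normal /= length
        d = -normal.dot(p0)

        # Count points within `threshold` of the candidate plane.
        inliers = np.abs(points @ normal + d) < threshold
        count = int(inliers.sum())
        if count > best_count:
            best_count, best_inliers = count, inliers

    if best_inliers is None:
        raise ValueError("no non-degenerate sample found")

    # Refine with a least-squares fit (SVD) over all inliers.
    inlier_pts = points[best_inliers]
    centroid = inlier_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(inlier_pts - centroid)
    normal = vt[-1]  # direction of least variance
    d = -normal.dot(centroid)
    return normal, d, best_inliers


def snap_to_plane(position, normal, d):
    """Project a 3D position onto the plane so an object rests on it."""
    position = np.asarray(position, dtype=float)
    return position - (position.dot(normal) + d) * normal
```

Since nothing here needs manual input, it can be run per video in a batch. Note that a single-camera reconstruction is only defined up to scale, so metric placement still needs some reference of known size, as the question suggests.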