HACKER Q&A
📣 reolbox

How to use AI/ML to cut all dead time from table tennis video matches


Hi all,

I am a table tennis player and I like to record my matches for learning purposes. The problem is I always have to manually cut away the boring stuff (picking up balls, timeouts, towel breaks, ...). Are there any open source libraries I could use to do this with AI/ML? What is the best way to start? I am an experienced web programmer looking to build this as a hobby. I tried to Google but haven't found a solid library yet. Thanks for your recommendations!


  👤 Jugurtha Accepted Answer ✓
Thought experiment: play a video and close your eyes. You will know when the ball's in play by ear.

I think the best bet would be sound event detection. What's associated with players actually playing? The sounds of ball/racket and ball/table impacts.

From the whole video, you cut out the segments where there are no impacts, keeping a few seconds of padding after the last impact (end of play) and before the first one (serve).

A crude approach/proof of concept would be simple peak detection on the audio, which could go a long way in my opinion.
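That crude peak-detection idea could look something like this sketch: find loud transients in the audio track, treat nearby ones as a single rally, and pad each segment. The threshold, gap, and padding values here are guesses you'd need to tune against real recordings.

```python
import numpy as np
from scipy.signal import find_peaks

def play_segments(samples, rate, pad_s=2.0, gap_s=5.0):
    """Group loud transients (ball impacts) into play segments.

    samples: mono audio as a float array in [-1, 1]; rate: sample rate in Hz.
    pad_s: padding kept around each segment; gap_s: silence that ends a rally.
    Returns a list of (start_s, end_s) tuples to keep.
    """
    envelope = np.abs(samples)
    # Impacts are short, loud spikes well above the noise floor;
    # height and distance are tuning knobs, not known-good values.
    peaks, _ = find_peaks(envelope, height=0.5, distance=int(0.05 * rate))
    times = peaks / rate

    segments = []
    for t in times:
        if segments and t - segments[-1][1] <= gap_s:
            segments[-1][1] = t          # extend the current rally
        else:
            segments.append([t, t])      # start a new rally

    # Add padding around each segment, clamped to the clip bounds.
    total = len(samples) / rate
    return [(max(0.0, a - pad_s), min(total, b + pad_s)) for a, b in segments]
```

The resulting (start, end) pairs could then be fed to ffmpeg or a video library to cut the clips.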


👤 lovelearning
Perhaps start with a video activity recognition model like X-CLIP [1].

It generates activity labels or free-text descriptions for segments of your video.

Then use text-matching rules or, if the activity labels are diverse, semantic matching to decide which activities to retain, and remove the corresponding fragments from the video.

Try a model demo first with one of your videos to see what it outputs [2]. Another demo is available at [3].

[1]: https://github.com/microsoft/VideoX/tree/master/X-CLIP

[2]: https://huggingface.co/spaces/fcakyon/zero-shot-video-classi...

[3]: https://huggingface.co/microsoft/xclip-base-patch32
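The text-matching step above could be as simple as a keyword filter over per-segment labels. A minimal sketch, assuming the classifier emits one label per segment; the keywords and example labels below are hypothetical and would depend on the prompt set you give the model:

```python
# Keep video segments whose predicted activity label suggests active play.
# These keywords are illustrative assumptions, not X-CLIP output vocabulary.
KEEP_KEYWORDS = {"playing", "rally", "serving", "table tennis"}

def select_segments(labeled_segments):
    """labeled_segments: list of ((start_s, end_s), label) pairs.

    Returns the (start_s, end_s) tuples whose label matches a keyword.
    """
    kept = []
    for (start, end), label in labeled_segments:
        text = label.lower()
        if any(keyword in text for keyword in KEEP_KEYWORDS):
            kept.append((start, end))
    return kept
```

For diverse labels, the keyword check could be swapped for cosine similarity between sentence embeddings of the label and a reference phrase like "playing table tennis".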


👤 is_true
I would try to detect the table and then try to detect the ball moving on the table.

Then crop the video to a couple of seconds before the first detection and after the last one.


👤 jvln
After you solve it, please share :)