HACKER Q&A
📣 sss111

Computer Vision Project Ideas?


We're a team of 5 senior undergrads taking a grad level Machine Learning + Computer Vision class. If you have ever had an idea but never had time to implement it, let us know here!


  👤 ArtWomb Accepted Answer ✓
It seems like bar code scanning is a 1980s technology. The auto checkout kiosk could have product recognition, pricing & checkout built in ;)

State of the art in CV remains video prediction: given N frames of patch input, generate the next frame.
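To make the task concrete: the input is a short sequence of frames and the output is one more frame. A trivial baseline that any learned model should beat is to assume each pixel keeps changing at its last rate. Below is a sketch over grayscale frames stored as nested lists with values in [0, 1]. `predict_next_frame` is an illustrative name, and constant-velocity extrapolation is only a baseline, not a state-of-the-art method.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame, pixel values in [0, 1]


def predict_next_frame(frames: List[Frame]) -> Frame:
    """Naive baseline: linearly extrapolate each pixel from the last two frames.

    Falls back to copying the last frame when only one frame is given.
    """
    if not frames:
        raise ValueError("need at least one frame")
    last = frames[-1]
    if len(frames) == 1:
        return [row[:] for row in last]
    prev = frames[-2]
    # next = last + (last - prev), clamped to the valid pixel range
    return [
        [min(1.0, max(0.0, 2 * l - p)) for l, p in zip(last_row, prev_row)]
        for last_row, prev_row in zip(last, prev)
    ]
```

A learned model would be scored against this baseline with the same per-pixel error metric on held-out clips.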

If you are into space exploration, there are a lot of cool datasets, like the "Spot the GEO" challenge:

https://aiforspace.github.io/2021/

And if you get access to NVIDIA GPUs in the cluster, there's plenty of envelope-pushing stuff you can do with Omniverse: AI for rendering, light transport, physics simulations, etc.

http://cs348i.stanford.edu/


👤 BadCookie
Occasionally, a child will die due to accidentally being left on a bus that overheats. This feels like a problem that could be solved with a camera or two on each bus plus some computer vision logic. This might almost be too “easy” of a project from a technical standpoint … but it is a real problem.

Examples:

https://english.kyodonews.net/news/2021/07/6d6979264de4-japa...

https://www.latimes.com/local/lanow/la-me-ln-settlement-auti...
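The “logic” half could be as simple as this sketch. It assumes some off-the-shelf person detector (not shown, hypothetical here) already gives a per-frame person count, plus an engine on/off signal from the bus. `should_alert`, the 30-frame window and the 20-hit threshold are made-up illustrative values. Requiring detections in most of a window of frames, rather than a single hit, is one way to keep a flickering detector from raising false alarms.

```python
from collections import deque
from typing import Iterable, Tuple


def should_alert(frames: Iterable[Tuple[bool, int]], window: int = 30, min_hits: int = 20) -> bool:
    """Return True if a person is seen in at least `min_hits` of the last
    `window` frames captured while the engine is off.

    Each item is (engine_on, person_count); person_count is assumed to come
    from a person detector that is not part of this sketch.
    """
    recent = deque(maxlen=window)
    for engine_on, person_count in frames:
        if engine_on:
            # Only the period after the bus is parked matters; start over.
            recent.clear()
            continue
        recent.append(person_count > 0)
        if len(recent) == window and sum(recent) >= min_hits:
            return True
    return False
```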


👤 ShamelessC
What sort of compute do you get access to? There's a lot of cool stuff you could do depending on whether you have decent GPUs and how much time you're allowed to experiment on them. Experimentation is fairly fundamental in practice.

There are a lot of cool pretraining tasks in vision/multimodal. These are largely techniques introduced or refined by OpenAI and reimplemented as open-source PyTorch codebases, with varying degrees of success:

- Finetune your own CLIP https://github.com/mlfoundations/open_clip

- Train a (much smaller) DALLE https://github.com/lucidrains/DALLE-pytorch

- Train your own guided diffusion https://colab.research.google.com/drive/1javQRTkALBWLFWnx1K4... (pretty tough, may only be feasible on domain-specific data)

- Train a variational autoencoder (VAE)

- "VQGAN" from Heidelberg https://github.com/CompVis/taming-transformers

- "Discrete VAE", used as the backbone for OpenAI's DALL-E, reimplemented here (and other places) https://github.com/lucidrains/DALLE-pytorch

- "VQVAE2" https://github.com/tgisaturday/dalle-lightning
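For the VAE item, the two pieces that usually trip people up are the reparameterization trick and the KL term of the loss. Here's a plain-Python sketch for a diagonal Gaussian latent; in practice you'd write the same two lines over PyTorch tensors and add a reconstruction loss on top. The function names are illustrative, not taken from any of the linked repos.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way keeps it differentiable with respect to
    mu and log_var (the reparameterization trick).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian, summed over dims:
    0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
```

The total training loss is then reconstruction error plus this KL term, which pulls the encoder's posterior toward the standard normal prior.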


👤 Jefro118
Identifying the elements within a GUI image (e.g. this is a button, that's an input field, etc.). I want this myself for a tool I'm building to turn Figma designs into code, but it's also useful for things like automated testing. There are a bunch of papers on this already but no good public version that I can find. Companies like UiPath probably already have a sophisticated version of this internally. If you could do this and turn it into an API, I think it would be quite valuable.
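Framed as object detection over UI classes, a model's output would be a list of (label, box) pairs, and a common way to score it is intersection over union (IoU) against hand-labeled boxes, with 0.5 as the usual PASCAL VOC-style cutoff. A minimal sketch, where labels such as `"button"` are hypothetical:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred_label: str, pred_box: Box, true_label: str, true_box: Box,
               threshold: float = 0.5) -> bool:
    """Count a detection as correct if the class matches and IoU >= threshold."""
    return pred_label == true_label and iou(pred_box, true_box) >= threshold
```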

👤 no_time
Predict the outcome of a roulette spin in real time.

Write a fully machine-vision-based aimbot for CSGO. Perhaps you could feed the mouse and keyboard input into the tracking algorithm to improve accuracy. You need to intercept the mouse input anyway to tamper with the game state.

Predict a coin flip in real time.

Write a program that retroactively looks for a certain cat in security cam footage (I miss my cat). This is the one I actually attempted a while ago, using the dumbest method known to man: since it was an orange cat on mostly grey/green footage, I just defined a color range from dark brownish orange to light brownish orange and parsed each frame of the recording. It didn't work that well without defining a lot of threshold rules.
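The color-range approach described above looks roughly like this. It's a sketch assuming frames have already been decoded into rows of (R, G, B) tuples (OpenCV, for example, decodes to BGR, so you'd swap channels first; decoding isn't shown). The hue, saturation and brightness cutoffs and the 1% pixel fraction are guesses that would need tuning, which is exactly where all those threshold rules come from. Working in HSV rather than raw RGB makes "brownish orange" roughly one hue band across lighting changes.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def is_orange(pixel: Pixel, hue_range=(0.03, 0.12), min_sat=0.4, min_val=0.25) -> bool:
    """True if the pixel falls in an (assumed) orange/brown band in HSV space.

    colorsys expresses hue in [0, 1), so 0.03-0.12 is roughly 11-43 degrees.
    """
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val


def frame_has_cat(frame: List[List[Pixel]], min_fraction: float = 0.01) -> bool:
    """Flag a frame if at least `min_fraction` of its pixels look orange."""
    pixels = [p for row in frame for p in row]
    hits = sum(is_orange(p) for p in pixels)
    return bool(pixels) and hits / len(pixels) >= min_fraction
```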

There are quite a few deterministic carnival/arcade games you could cheat at with a bit of machine vision magic :^) Stacker comes to mind, for example.


👤 the_only_law
I’m not super familiar with the domain or whether it’s trivial, but many years ago I had a budding idea for a stupid AR game: you would view the world around you through a phone’s camera, and the game would detect human faces in real time, drawing over them and turning them into “enemies”.

There's also the idea, which many have shared, of using CV to detect insects (say, a cockroach) and then attacking them with some sort of weapon. Everyone loves lasers, but a laser strong enough to kill an insect like that seems like it would introduce significant risk of collateral damage, so I wonder if a jet of household pesticide could be used instead. I wondered a while back if those little Hexbug toys could be used for development.


👤 ragebol
Take your favorite board or card game, recognize the game state and suggest an optimal move.

I created a bot for the card game Set years ago using classic computer vision. I should revisit it when I get my OAK-D Lite camera.
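For Set, the move-suggesting part is the easy half once the vision side has turned each card into its four attributes. Assuming each attribute is encoded as 0, 1 or 2 (a hypothetical encoding), three cards form a set exactly when every attribute sums to a multiple of 3, so a brute-force search over the table is enough:

```python
from itertools import combinations
from typing import List, Tuple

Card = Tuple[int, int, int, int]  # (number, color, shading, shape), each 0, 1 or 2


def is_set(a: Card, b: Card, c: Card) -> bool:
    # Each attribute must be all equal or all different across the three cards;
    # for values in {0, 1, 2} that is equivalent to the sum being divisible by 3.
    return all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c))


def find_sets(table: List[Card]) -> List[Tuple[Card, Card, Card]]:
    """Return every valid set among the cards currently on the table."""
    return [trio for trio in combinations(table, 3) if is_set(*trio)]
```

With 12 cards on the table that is only 220 triples, so nothing cleverer is needed.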


👤 jobigoud
I have one that should be relatively easy: get popular magazines in PDF format and train a network to predict if a page is a full page ad or actual content. Then rebuild the PDF without the ads.

👤 judohacker
I want to point my phone at a stack of poker chips and get a total of each chip color, e.g. 132 blue chips, 54 red chips, etc. Bonus points if I can then tell the app the value of each color and get the total.

I play cash games every week and it takes the host forever to count people's chips when they want to cash out.
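The bonus part is simple arithmetic once the vision side has produced per-color counts. A sketch, assuming the counts come from some chip-counting model (hypothetical, not shown) and values are given in cents to avoid floating-point rounding:

```python
from typing import Dict


def chip_total_cents(counts: Dict[str, int], values_cents: Dict[str, int]) -> int:
    """Total value of a chip stack given per-color counts and per-color values."""
    missing = counts.keys() - values_cents.keys()
    if missing:
        raise KeyError(f"no value set for: {sorted(missing)}")
    return sum(n * values_cents[color] for color, n in counts.items())
```

With 132 blue chips at 25 cents and 54 red chips at $1, this gives 8700 cents, i.e. $87.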


👤 kordlessagain
It would be great to have a model extract images from a screenshot of a webpage, then save them as their own images along with the locations they came from on the page. I haven’t been able to find this solution, although I’ve been able to do it with a color-flattened palette approach using OpenCV.

👤 high_byte
I'd like to convert FaceMesh (TensorFlow) output to blend shapes (aka morphs), like iPhone's LiveLink app but without an iPhone. I have some solutions and workarounds, but none are good enough.

👤 Raed667
I have been going to the gym for the last couple of months. I would love to see an app that analyses my movements (dumbbell, barbell, kettlebell, etc.) and tells me how I can correct them.
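Most form checks reduce to joint angles computed from pose keypoints. Assuming a pose estimator (MediaPipe Pose or OpenPose, for example) gives 2D (x, y) keypoints for, say, shoulder, elbow and wrist, the angle at the middle joint can be computed like this; `joint_angle` is an illustrative name:

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) image coordinates of a keypoint


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at keypoint b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must be distinct")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against rounding pushing the cosine slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

An app could then compare the angles at the top and bottom of each rep against target ranges for the exercise. Those ranges would have to come from coaching guidance, and 2D angles depend on where the camera is placed.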

👤 hellohntoday
Please can you post a way for me to get in touch.

👤 fragmede
Read video from my dash cam and tell me if my car will fit into a parallel parking space before I drive by it.

Read video from my dash cam and classify the vehicles around me as police cars or not.