Can I use ML to extract individual images from a webpage screenshot?

Question

proteuskor · Accepted Answer

Yes, a simple approach would be to use a convolutional network for multiple object detection. You would need a training set that transfers to real pages. In your case I think the best option is a training set generator: generate a bunch of webpages with known image locations and sizes. Coursera has a video and links to references on efficient multiple object detection; look at the Andrew Ng courses. I suspect this would be an easy application.

You would probably want to use PyTorch or TensorFlow to specify the model, and a GPU (or a GPU instance if you are on the cloud) to speed up training enough that you can iterate on hyperparameters quickly. Selenium may help with the training set generator, if you can use it to extract the image coordinates relative to the browser window and then take a screenshot of that window. You may need 100k or so randomly generated pages with multiple rectangles each to reach very good accuracy, but that number is very hand-wavy.

In other words, do your best to generate a uniform distribution of web pages, render them with a real browser, and extract the image rectangle coordinates somehow to produce the labeled data set. Then feed that into a convolutional network (or R-CNN), with the error metric used for backpropagation being a function of the R-CNN's output objects vs. the training set labels. This should converge on correct results, and you should plot the error metric over training to see when it levels off. If the quality is sufficient at that point you are done; if not, you may have to change the network structure and/or the hyperparameters.
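To make the data-generation idea above a bit more concrete, here is a minimal sketch of the labeling side, using only the Python standard library. Everything in it is an assumption for illustration: the names `make_page` and `iou`, the 1280x800 page size, the size ranges, and the `placeholder{i}.png` file names are all hypothetical. It places images with absolute positioning so that the label rectangles are known before rendering; actually rendering the page and taking the screenshot (for example with Selenium) is left out. The `iou` helper is intersection over union, a common way to compare a predicted box with a label when evaluating a detector; it is not necessarily the error metric the answer has in mind.

```python
import random


def make_page(n_images, width=1280, height=800, seed=None):
    """Return (html, boxes) for a page with n_images absolutely positioned images.

    boxes is a list of (x, y, w, h) tuples in CSS pixels, measured from the
    top-left corner of the page. Images are allowed to overlap.
    """
    rng = random.Random(seed)
    boxes, tags = [], []
    for i in range(n_images):
        w = rng.randint(40, 400)
        h = rng.randint(40, 300)
        x = rng.randint(0, width - w)
        y = rng.randint(0, height - h)
        boxes.append((x, y, w, h))
        # Hypothetical image file names; any real images could be used here.
        tags.append(
            f'<img id="img{i}" src="placeholder{i}.png" '
            f'style="position:absolute;left:{x}px;top:{y}px;'
            f'width:{w}px;height:{h}px">'
        )
    html = (
        '<!DOCTYPE html><html><body style="margin:0">'
        + "".join(tags)
        + "</body></html>"
    )
    return html, boxes


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    html, boxes = make_page(3, seed=0)
    print(boxes)                                  # label rectangles for this page
    print(iou((0, 0, 10, 10), (5, 0, 10, 10)))    # half-overlapping boxes
```

Passing a fixed `seed` makes a page reproducible, which helps when a screenshot and its labels need to be regenerated together. Real pages have flowing layouts, text, and backgrounds, so a generator like this would only be a starting point for the "uniform distribution of web pages" mentioned above.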

tgflynn · Answer

Probably, but it would be a lot of work. Why do you need to work with screenshots instead of live sites?

gerardnico · Answer

Yes. It's called edge detection in computer vision.
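As a rough illustration of what edge detection means here, the sketch below marks pixels whose brightness jumps relative to the left or upper neighbor and then takes the bounding rectangle of those marks. It is a toy under strong assumptions: a grayscale image given as a list of lists, a single dark rectangle on a light background that does not touch the right or bottom border, and a hand-picked threshold. The function names `edge_mask` and `edge_bounds` are hypothetical, and a real screenshot would need a proper computer vision library and more work to separate images from other rectangular page elements.

```python
def edge_mask(img, threshold=50):
    """Mark pixels whose intensity jumps relative to the left or upper neighbor."""
    rows, cols = len(img), len(img[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = c > 0 and abs(img[r][c] - img[r][c - 1]) > threshold
            up = r > 0 and abs(img[r][c] - img[r - 1][c]) > threshold
            mask[r][c] = left or up
    return mask


def edge_bounds(mask):
    """Return (x0, y0, x1, y1) spanning all marked pixels, or None if none."""
    xs = [c for row in mask for c, on in enumerate(row) if on]
    ys = [r for r, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # Toy "screenshot": white 10x8 background with a black block
    # covering columns 3..6 and rows 2..4.
    img = [[255] * 10 for _ in range(8)]
    for r in range(2, 5):
        for c in range(3, 7):
            img[r][c] = 0
    print(edge_bounds(edge_mask(img)))
```

Because each jump is marked on the pixel after the change, the returned `x1` and `y1` fall one pixel past the block, so `x1 - x0` and `y1 - y0` give its width and height in this toy case.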

Jugurtha · Answer

What are you trying to do?