HACKER Q&A
📣 woshiange

What are the best AI/services for product categorization?


I have a bunch of product names and images that I scraped from different e-commerce platforms. What would be the best algorithm/service to map them against a category from the Google Product Taxonomy (https://www.google.com/basepages/producttype/taxonomy.en-US.txt)?


  👤 PaulHoule Accepted Answer ✓
That is a lot of categories!

If someone was going to train an algorithm to classify items they would need labeled training data for all of the categories. Maybe 1000 labeled items per category.

If you don’t have the labeled training data you WILL fail. If nobody has succeeded at this, it’s because nobody did the work. You can succeed where others failed IF you do the work. If you do not want to do the work, a pro tip is to let somebody else tilt at windmills and fail.

If you look closer you will find other maddening details. One of those categories is ‘yacht’, and there also has to be one for ‘snacks’. The yacht is big, expensive and rare, and the snacks are small, cheap and common. You might find a million snacks for sale and maybe 100 yachts! You can attain 99.99% accuracy on yachts vs snacks if the algo thinks it is all snacks, and a person who looks at a sample of 100 might think it works perfectly.
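
To make the yachts vs snacks arithmetic concrete, here is a minimal sketch in plain Python. The counts are the ones from the paragraph above. The "classifier" is a stand-in that always answers snacks, and per-class recall is my own choice of a metric that exposes the problem, not something the question asked for.

```python
# Counts from the example: a million snacks, maybe 100 yachts.
n_snacks = 1_000_000
n_yachts = 100

# A "classifier" that ignores its input and always answers "snacks":
# every snack is counted as correct, every yacht as wrong.
correct = n_snacks
total = n_snacks + n_yachts
accuracy = correct / total

# Recall per class: the fraction of that class labeled correctly.
snack_recall = n_snacks / n_snacks
yacht_recall = 0 / n_yachts

# Macro-averaged recall weights each class equally, so the failure shows.
macro_recall = (snack_recall + yacht_recall) / 2

print(f"accuracy:     {accuracy:.2%}")
print(f"yacht recall: {yacht_recall:.0%}")
print(f"macro recall: {macro_recall:.0%}")
```

Overall accuracy looks superb while yacht recall is zero, so it is worth reporting a per-class number for every category rather than one global figure.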

You are up against that, times thousands of categories.

Don’t despair, instead look at simplifying the problem by limiting the scope; also you might get lucky and find data that is already labeled or find a cheap way to bootstrap labels.
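
One possible way to limit the scope, sketched below, is to cut every taxonomy path down to its first level or two so there are far fewer classes to label and train. This assumes the taxonomy file lists one full path per line with levels separated by " > " and a "#" comment line at the top. The function names and the tiny inline sample are mine, for illustration only; load the real file for actual work.

```python
# A small sample in the assumed format of the taxonomy file.
SAMPLE_TAXONOMY = """\
# Google_Product_Taxonomy_Version: (sample)
Food, Beverages & Tobacco
Food, Beverages & Tobacco > Food Items
Food, Beverages & Tobacco > Food Items > Snack Foods
Vehicles & Parts
Vehicles & Parts > Vehicles
Vehicles & Parts > Vehicles > Watercraft
Vehicles & Parts > Vehicles > Watercraft > Yachts
"""


def truncate_path(path: str, depth: int) -> str:
    """Keep only the first `depth` levels of a ' > '-separated path."""
    levels = [level.strip() for level in path.split(">")]
    return " > ".join(levels[:depth])


def coarse_labels(taxonomy_text: str, depth: int) -> list[str]:
    """Return the distinct categories left after truncating to `depth`."""
    labels = set()
    for line in taxonomy_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the version comment
        labels.add(truncate_path(line, depth))
    return sorted(labels)


if __name__ == "__main__":
    for depth in (1, 2):
        labels = coarse_labels(SAMPLE_TAXONOMY, depth)
        print(f"depth {depth}: {len(labels)} classes")
        for label in labels:
            print("  ", label)
```

Once a coarse classifier works, each top-level branch can get its own finer model, and only for the branches you actually care about.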