ML architecture for text pattern detection?

Question

I'd like to train an inference model that takes a bunch of structured text (HTML) as input and outputs the relevant text. The goal is to build a pipeline that, given a manufacturer's technical product spec page, outputs the relevant data. Each manufacturer's website (about 50 of them) is structured differently, but the data is generally in an HTML table, sometimes laid out in rows, sometimes in columns. Does anyone have links to papers or anything else I can read to get started? Or is this even a thing that exists?
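Since the data usually sits in an HTML table in one of two layouts, a rule-based baseline might already cover part of the 50 sites and give you labelled examples to compare a model against. Below is a minimal sketch using only Python's standard `html.parser`. The class and function names (`TableGrid`, `spec_pairs`, `extract_specs`) and the orientation heuristic are hypothetical illustrations, not an established method, and the sample HTML is made up. It assumes cells and rows are explicitly closed, tables are not nested, and `colspan`/`rowspan` can be ignored.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell strings.

    Simplifications (assumptions): cells and rows are explicitly closed,
    tables are not nested, and colspan/rowspan are ignored.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>: a list of rows
        self._row = None   # cell strings of the <tr> being read
        self._cell = None  # text fragments of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments with spaces, then collapse runs of whitespace.
            self._row.append(" ".join(" ".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def spec_pairs(grid):
    """Map one table grid to {label: value}, guessing its orientation (hypothetical heuristic)."""
    if grid and all(len(row) == 2 for row in grid):
        # Row layout: each row is [label, value].
        return {label: value for label, value in grid}
    if len(grid) == 2 and len(grid[0]) == len(grid[1]):
        # Column layout: a row of labels above a row of values.
        return dict(zip(grid[0], grid[1]))
    return {}  # unknown layout: leave it for a per-site rule or a model


def extract_specs(html):
    """Parse every table in the page and merge their label/value pairs."""
    parser = TableGrid()
    parser.feed(html)
    parser.close()
    specs = {}
    for grid in parser.tables:
        specs.update(spec_pairs(grid))
    return specs


# Made-up sample pages, one per layout.
rows = ("<table><tr><th>Weight</th><td>1.2 kg</td></tr>"
        "<tr><th>Voltage</th><td>12 V</td></tr></table>")
cols = ("<table><tr><th>Weight</th><th>Voltage</th><th>Color</th></tr>"
        "<tr><td>1.2 kg</td><td>12 V</td><td>Black</td></tr></table>")
print(extract_specs(rows))  # {'Weight': '1.2 kg', 'Voltage': '12 V'}
print(extract_specs(cols))  # {'Weight': '1.2 kg', 'Voltage': '12 V', 'Color': 'Black'}
```

Tables that fall through to the empty-dict case are the ones where a learned model (or a per-site rule) would actually earn its keep, so logging them per manufacturer may be a cheap way to see how big the problem really is.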

ac2u · Accepted Answer

Is buying an option instead of building? You could try some free credits with AWS Textract to see if it fits the bill (it specifically supports table extraction).
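If you try Textract, note that as far as I know it works on images and PDFs rather than raw HTML, so the spec pages would need to be rendered (screenshot or print-to-PDF) first. A call along the lines of `boto3.client("textract").analyze_document(Document={"Bytes": image_bytes}, FeatureTypes=["TABLES"])` returns a flat list of `Blocks` linked by `CHILD` relationships. The sketch below rebuilds each `TABLE` block into a grid of cell text. The function name `textract_tables` is hypothetical, it reads only the `Id`, `BlockType`, `RowIndex`, `ColumnIndex`, `Text` and `Relationships` fields, and the sample response is hand-made, so check the field names against the current AnalyzeDocument reference before relying on it.

```python
def textract_tables(response):
    """Rebuild each TABLE in an AnalyzeDocument-style response as rows of cell text."""
    by_id = {block["Id"]: block for block in response["Blocks"]}

    def children(block, block_type):
        # Follow CHILD relationships and keep blocks of the requested type.
        found = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                found += [by_id[i] for i in rel["Ids"]
                          if by_id[i]["BlockType"] == block_type]
        return found

    tables = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        cells = children(block, "CELL")
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        grid = [[""] * n_cols for _ in range(n_rows)]
        for cell in cells:
            # RowIndex and ColumnIndex are 1-based.
            words = [w["Text"] for w in children(cell, "WORD")]
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(grid)
    return tables


# Hand-made sample: one table with a single row of two cells.
sample = {"Blocks": [
    {"Id": "t", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Weight"},
    {"Id": "w2", "BlockType": "WORD", "Text": "1.2"},
    {"Id": "w3", "BlockType": "WORD", "Text": "kg"},
]}
print(textract_tables(sample))  # [[['Weight', '1.2 kg']]]
```

Parsing the response offline like this also lets you save a few real responses and iterate on the post-processing without spending credits on every run.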