Is anyone using AI to help with parsing data?
I'm interested in crawling a variety of websites and extracting data from them, and I'm curious whether AI could be useful in the extraction step.
Some preliminary exploration shows that ChatGPT is pretty good at it. I'm not keen on paying lots of money for ChatGPT or being reliant on a third party. Has anyone tried doing parsing with a local LLM?
I've used it to come up with Beautiful Soup query selectors a couple of times with pasted-in HTML, but it's pretty bad at generalized solutions (i.e., it usually just selects by a cell ID).
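To illustrate the difference between an ID-based selector and a more general one, here is a minimal sketch. The table markup, class names, and IDs are hypothetical stand-ins for a scraped page; the point is only that a selector keyed to the page's structure matches every row, while an ID-based one matches a single cell on a single page.

```python
from bs4 import BeautifulSoup

# Hypothetical page snippet: a product table whose cells carry per-page IDs.
HTML = """
<table class="products">
  <tr><td id="cell-17" class="name">Widget</td><td class="price">$3.50</td></tr>
  <tr><td id="cell-42" class="name">Gadget</td><td class="price">$7.25</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Brittle: an ID-based selector matches exactly one cell on this one page.
brittle = [td.get_text(strip=True) for td in soup.select("#cell-17")]

# More general: select each row by table class, then pick cells by their role,
# so every row matches and name/price stay paired within the same row.
rows = [
    (
        tr.select_one("td.name").get_text(strip=True),
        tr.select_one("td.price").get_text(strip=True),
    )
    for tr in soup.select("table.products tr")
]

print(brittle)  # ['Widget']
print(rows)     # [('Widget', '$3.50'), ('Gadget', '$7.25')]
```

Whether the structural selector actually carries over to other sites depends on how consistent their markup is; it is only more likely to survive than an ID that was generated for one page.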
I've had luck extracting/parsing documentation with ChatGPT and the Link Reader plugin.
Are you feeding ChatGPT the raw HTML?