HACKER Q&A
📣 rco8786

How to host an AI model that processes PDFs into structured data?


I'm hoping there is a guide or tutorial out there for someone like me.

I'm an experienced software engineer, mostly backend. I have a general understanding of how AI works, and have used all the various tools (ChatGPT, Copilot, etc).

Where I'm totally ignorant is in how to self host my own models. In particular, I am looking to self host a model that can read PDFs and parse out structured data.

Any good starting points?


  👤 jonahbenton Accepted Answer ✓
Jan is a good starting point. Desktop app that will guide you through downloading models you can run locally. Then

https://jan.ai/docs/tools/retrieval

A similar tool, local desktop app, is LMStudio. Same deal.

https://lmstudio.ai/docs/basics/rag



👤 thundergolfer
Modal.com can support this. A couple existing PDF extraction startups use it. You could have a custom endpoint up in minutes. It’s serverless so will be low cost for your level of traffic.

👤 BOOSTERHIDROGEN
Docling to parse pdf into markdown.