HACKER Q&A
📣 Carlphilippe

RAG – Do you need PDFs as Markdown with integrated tables and images?




  👤 Carlphilippe Accepted Answer ✓
Hi, I'm Carl. I created a PDF parsing solution that I’m now considering turning into a SaaS.

Existing tools didn’t meet my needs for:
- Structured data (title, paragraph, checklist)
- JSON and Markdown with integrated tables and images
- Tables and images as separate files
- Reasonable costs (many providers charge $1K-$25K upfront or a high per-page price)
- Reliable results across all PDFs

My goal is to make the data LLM-ready, so that a model can easily display the relevant images and tables alongside the text they belong to (e.g., text 1 + image 1, text 2 + image 2).
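To illustrate the kind of pairing I mean, here is a minimal Python sketch. It is not the tool's actual output format: the `Block` class, the `kind` values, the `group_with_media` helper, and the file names are all assumptions made up for this example.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical block kinds; the real parser's schema may differ.
MEDIA_KINDS = {"image", "table"}


@dataclass
class Block:
    kind: str     # e.g. "title", "paragraph", "checklist", "image", "table"
    content: str  # text for textual blocks, file path for image/table blocks


def group_with_media(blocks):
    """Attach each image/table block to the nearest preceding text block."""
    groups = []
    for block in blocks:
        if block.kind in MEDIA_KINDS:
            if not groups:
                # Media that appears before any text gets its own group.
                groups.append({"text": None, "media": []})
            groups[-1]["media"].append(asdict(block))
        else:
            groups.append({"text": asdict(block), "media": []})
    return groups


# Example document in reading order (illustrative data only).
blocks = [
    Block("paragraph", "text 1"),
    Block("image", "image_1.png"),
    Block("paragraph", "text 2"),
    Block("image", "image_2.png"),
]
groups = group_with_media(blocks)
print(json.dumps(groups, indent=2))
```

Each group keeps a piece of text together with the media that follows it, so a retrieved chunk already carries the image or table it refers to.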

Integrating image links in Markdown with hidden titles and descriptions improves LLM responses:
- Images follow the relevant text (great for tutorials and more)
- LLMs include relevant images more easily than when the images are stored separately
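One way to do this, sketched in Python under some assumptions: I am treating "hidden titles and descriptions" as the alt text and the optional title of a standard Markdown image link, neither of which is shown inline when the page is rendered. The `image_link` and `to_markdown` helpers, the dictionary keys, and the sample paths are hypothetical and only meant for illustration.

```python
def image_link(path, description, title):
    """Build a Markdown image link with the description as alt text
    and the title in the optional quoted title slot."""
    # Escape characters that would end the alt text or the title early.
    safe_alt = description.replace("[", "\\[").replace("]", "\\]")
    safe_title = title.replace('"', '\\"')
    return f'![{safe_alt}]({path} "{safe_title}")'


def to_markdown(sections):
    """Emit each text passage followed directly by its images."""
    parts = []
    for section in sections:
        parts.append(section["text"])
        for img in section.get("images", []):
            parts.append(
                image_link(img["path"], img["description"], img["title"])
            )
    return "\n\n".join(parts)


# Illustrative input only; a real parser would produce this from the PDF.
sections = [
    {
        "text": "Step 1: Connect the sensor to the board.",
        "images": [
            {
                "path": "images/fig1.png",
                "description": "Wiring diagram",
                "title": "Step 1",
            }
        ],
    },
    {"text": "Step 2: Power on the device.", "images": []},
]
markdown = to_markdown(sections)
print(markdown)
```

Because the link sits right after its text and carries a description, the model sees both the placement and the meaning of the image in the same chunk.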

If you also need to turn your documents into JSON or Markdown with images and tables for RAG, you can subscribe here: https://mailchi.mp/5ae256ad323d/pdf-converter