HACKER Q&A
📣 alfarez

What's a good library/command line tool to extract tables from PDFs?




  👤 UglyToad Accepted Answer ✓
There are probably newer AI-powered tools, but Tabula is the main library I know of: https://github.com/tabulapdf/tabula-java
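Since the question asks for a command line tool, here is a rough sketch of driving the tabula-java CLI from Python. It assumes Java is installed and that you have downloaded a tabula-java jar yourself. The jar path, the file names, and both helper functions are hypothetical names made up for illustration. The flags (`--pages`, `--format`, `--outfile`, `--lattice`) are the ones listed in the tabula-java README, but check `--help` on your version.

```python
import subprocess
from pathlib import Path


def build_tabula_command(jar_path, pdf_path, out_path, pages="all", lattice=False):
    """Build the argument list for one tabula-java run (hypothetical helper)."""
    cmd = [
        "java", "-jar", str(jar_path),
        "--pages", pages,            # e.g. "1-3,5" or "all"; the CLI default is page 1
        "--format", "CSV",           # tabula-java also accepts TSV and JSON
        "--outfile", str(out_path),
    ]
    if lattice:
        # Lattice mode suits tables with ruling lines drawn between cells.
        cmd.append("--lattice")
    cmd.append(str(pdf_path))        # the input PDF goes last
    return cmd


def extract_tables(jar_path, pdf_path, out_path, **options):
    """Run tabula-java and return the path of the CSV it wrote."""
    cmd = build_tabula_command(jar_path, pdf_path, out_path, **options)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if Java exits non-zero
    return Path(out_path)
```

Something like `extract_tables("tabula.jar", "report.pdf", "tables.csv", lattice=True)` would then write the extracted tables to `tables.csv`. Building the command separately from running it makes it easy to print and inspect the exact invocation before pointing it at a large batch of files.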

👤 andrewio
You can use a PDF parser tool to extract data from PDF tables. I'm building parsio.io; we use pre-trained AI-powered parsers to parse PDF tables: https://parsio.io/table-extraction/. Another example is Tabula (free).

👤 phiv
There is also tabulizer, an R package that wraps Tabula: https://docs.ropensci.org/tabulizer/

👤 phiv
I have not tried it, but this has been in my bookmarks for a while: https://github.com/camelot-dev/excalibur