HACKER Q&A
📣 aidangrimshaw

What are the best tools for extracting tax data from a W-2 form?


I'm working on an open-source tax filing web app at https://ustaxes.org/ (source at https://github.com/thegrims/UsTaxes).

Any ideas on best practices for extracting tax data from a W-2 form? I've looked at Microsoft Form Recognizer and AWS Textract, but I haven't been able to get good results so far. (Caveat: I haven't tried either with custom training data.)


  👤 tgflynn Accepted Answer ✓
Is it still the case that W-2s are usually only provided on paper? If employers would just e-mail a (non-scanned) PDF, you could extract the data easily without having to deal with OCR.
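
To make the "no OCR" route concrete, here's a rough sketch of what parsing could look like once you already have the PDF's text layer as a string (for example from pypdf's `PdfReader(path).pages[0].extract_text()`). The label wording, the `FIELD_PATTERNS` table, the `parse_w2_text` helper, and the sample text are all assumptions for illustration; real W-2 PDFs vary by payroll provider, so the patterns would need tuning per layout.

```python
import re
from decimal import Decimal

# Hypothetical label patterns. Real W-2 PDFs word and order these
# differently depending on the payroll provider.
FIELD_PATTERNS = {
    "wages": r"Wages, tips, other compensation",
    "federal_withholding": r"Federal income tax withheld",
    "ss_wages": r"Social security wages",
    "medicare_wages": r"Medicare wages and tips",
}

# A dollar amount such as 52,340.18 or $6120.00.
AMOUNT = r"\$?\s*([0-9][0-9,]*\.[0-9]{2})"


def parse_w2_text(text):
    """Map each known label to the first amount that directly follows it.

    Labels that are missing, or that are not directly followed by an
    amount, are left out of the result instead of being guessed.
    """
    result = {}
    for name, label in FIELD_PATTERNS.items():
        # Allow only non-digit filler (spaces, newlines) between the
        # label and its amount, so we never grab another box's number.
        match = re.search(label + r"[^0-9$]*" + AMOUNT, text, flags=re.IGNORECASE)
        if match:
            result[name] = Decimal(match.group(1).replace(",", ""))
    return result


# Made-up text standing in for an extracted PDF text layer.
sample = """\
1 Wages, tips, other compensation  52,340.18
2 Federal income tax withheld  6,120.00
3 Social security wages  52,340.18
5 Medicare wages and tips  52,340.18
"""

fields = parse_w2_text(sample)
```

Amounts are kept as `Decimal` rather than `float` so cents don't pick up rounding error. If a layout puts all the labels on one line and the amounts on the next, this sketch returns nothing for those fields rather than a wrong value, which is probably the safer failure mode for tax data.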