I have 150 GB of email that I want to work with from an NLP perspective.
I want to run several different kinds of classification on the emails.
The first fun problem I ran into is how to identify the signature in an email.
I manually picked around 500 email signatures and trained a model to recognize them, but the model performs horribly.
How would you approach this?