HACKER Q&A
📣 k4ch0w

What do you use to clean text data for ML/DL?


  👤 ktpsns Accepted Answer ✓
I guess you mean converting CSV files into another format so that they can be loaded by some ML code?

Vim. It can easily handle large, multi-gigabyte text files.

For batch processing: head, tail, awk, grep -- the good old command-line gems. They are still hard to beat for speed.
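
To make that concrete, here is a rough sketch of how those tools might be chained for a simple CSV cleanup. Everything specific in it is an assumption for illustration: the `clean_rows` function name, the three-column `id,text,label` layout, and the rule "drop rows with a missing text field". It also splits on plain commas, so it would not cope with quoted fields that contain commas.

```shell
#!/bin/sh
# Hypothetical helper: reads CSV on stdin, writes cleaned rows to stdout.
# Assumes a header line and exactly 3 comma-separated columns (id,text,label).
clean_rows() {
    # tail -n +2 skips the header; awk keeps rows that have exactly
    # 3 fields and a non-empty second (text) field.
    tail -n +2 | awk -F',' 'NF == 3 && $2 != ""'
}

# Made-up sample data, written to a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/data.csv" <<'EOF'
id,text,label
1,hello world,pos
2,,neg
3,bad row
4,good stuff,pos
EOF

# Peek at the first lines without opening the whole file.
head -n 3 "$workdir/data.csv"

# Clean the file in a single streaming pass.
clean_rows < "$workdir/data.csv" > "$workdir/clean.csv"

# Count the surviving rows labelled "pos".
grep -c ',pos$' "$workdir/clean.csv"
```

Since every stage streams line by line, the same pipeline should work on files far larger than memory. For real CSV with quoting, a proper CSV parser is probably the safer choice.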

If you mean "clean" in the sense of some standardization (thinking of natural language recognition), I can hardly imagine there is a single tool that covers all use cases...