I have about 100GB of Log4j-formatted logs to search for a particular needle in the haystack, and I'm looking for a decent way to process those files locally without breaking out Spark on EMR, etc.
I recall a few blog posts on this subject, but my search-fu is letting me down. Does this ring a bell for anyone?
Thanks in advance :)
However, if you're just after a specific needle... why not grep? With 100GB on a local machine, you'll be IO-limited unless you pre-process or filter the files first. So instead of heavy tools, why not start with the basics?
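
If plain grep turns out to be awkward (for example, some of the rotated files are gzipped), the same filter-first idea can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: the name `find_needle`, the `*.log*` glob, and the idea that the needle fits on a single log line are all hypothetical, not something from your setup. It streams each file line by line, so memory use stays flat and disk read speed remains the bottleneck.

```python
import gzip
import re
from pathlib import Path
from typing import Iterator, Tuple


def open_log(path: Path):
    """Open a plain or gzip-compressed log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def find_needle(
    log_dir: str, pattern: str, glob: str = "*.log*"
) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for each line matching pattern.

    Assumes the needle sits on a single line; multi-line events such as
    stack traces would need extra handling.
    """
    needle = re.compile(pattern)
    for path in sorted(Path(log_dir).glob(glob)):
        if not path.is_file():
            continue
        with open_log(path) as handle:
            # Iterating over the handle reads one line at a time,
            # so the whole file is never loaded into memory.
            for line_number, line in enumerate(handle, start=1):
                if needle.search(line):
                    yield path.name, line_number, line.rstrip("\n")
```

You could then loop over something like `find_needle("/path/to/logs", r"OrderId=12345")` (both arguments are placeholders) and print each hit. Once the matches are down to a few megabytes, any heavier analysis becomes cheap.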