I have a bunch of text files and HTML pages that I'd like to dump into something and then be able to search over, maybe even to find relationships (common terms, phrases, etc.) between the various docs. I've heard of things like Hadoop, but that seems to be overkill for the amount of data I have. I'd also like to keep things as low-cost as possible since this is just for personal use. I've looked at a few of the cloud providers but honestly I'm not sure what I'm looking for, so I find myself walking away more confused than when I started.
This seems like an easy problem, but for whatever reason I'm getting wrapped around the axle on it.
I'd start with Solr or Elasticsearch and a simple indexing script (a home-rolled Python script is fine). If you do want it in the cloud, Elasticsearch running on a cloud VM with an attached EBS volume would be a fast way to get work done.
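The indexing side really can be that small. Here's a sketch, assuming a local Elasticsearch instance, the official `elasticsearch` Python client, and `beautifulsoup4` for stripping HTML; the index name `mydocs` and the `docs/` folder are placeholders:

```python
# Walk a folder of .txt/.html files and index each one into Elasticsearch.
# Assumes ES is running on the default local port and you've
# pip-installed "elasticsearch" and "beautifulsoup4".
import os
from elasticsearch import Elasticsearch
from bs4 import BeautifulSoup

es = Elasticsearch("http://localhost:9200")

def extract_text(path):
    """Read a file; strip tags if it's HTML, otherwise return it raw."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        raw = f.read()
    if path.endswith((".html", ".htm")):
        return BeautifulSoup(raw, "html.parser").get_text(" ", strip=True)
    return raw

for root, _, files in os.walk("docs/"):  # placeholder folder
    for name in files:
        if not name.endswith((".txt", ".html", ".htm")):
            continue
        path = os.path.join(root, name)
        es.index(index="mydocs", document={
            "filename": name,
            "path": path,
            "content": extract_text(path),
        })
```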
Then you can use the Solr admin UI or something like a Jupyter notebook for iterative querying.
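From a notebook, queries are just dicts, and Elasticsearch's built-in `significant_text` aggregation is one way to get at the "common terms between docs" part of your question. A sketch against the same placeholder index as above:

```python
# Full-text match query plus a significant_text aggregation, which surfaces
# terms that are unusually common in the matching docs. Works fine at
# personal-use data sizes; for big indexes you'd normally wrap it in a
# sampler aggregation.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="mydocs",
    query={"match": {"content": "search terms here"}},  # placeholder query
    aggs={"related_terms": {"significant_text": {"field": "content"}}},
)

for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["filename"])

for bucket in resp["aggregations"]["related_terms"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```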
I'm not an expert on index tuning, but at this scale you might even be able to dump it all into Postgres, using its JSON types for the metadata and its built-in full-text search for the querying.
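If you go that route, something like this sketch could work (assumes Postgres 12+ for the generated column and the `psycopg2` driver; the database, table, and column names are all made up):

```python
# One table: a JSONB column for metadata, a text column for the body, and a
# generated tsvector column with a GIN index for full-text search.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=docs")  # placeholder database
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id      serial PRIMARY KEY,
        meta    jsonb,
        content text,
        tsv     tsvector GENERATED ALWAYS AS
                (to_tsvector('english', content)) STORED
    );
    CREATE INDEX IF NOT EXISTS documents_tsv_idx
        ON documents USING gin (tsv);
""")

cur.execute(
    "INSERT INTO documents (meta, content) VALUES (%s, %s)",
    (Json({"filename": "notes.txt"}), "some document text about search engines"),
)

cur.execute(
    "SELECT meta->>'filename' FROM documents "
    "WHERE tsv @@ plainto_tsquery('english', %s)",
    ("search engines",),
)
print(cur.fetchall())
conn.commit()
```

That gets you search and structured metadata in one box, though you'd be giving up the relevance scoring and term aggregations that Solr/Elasticsearch hand you for free.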
Best of luck!