HACKER Q&A
📣 boredemployee

Cheapest and simplest way to set up a data analytics stack?


I am developing a small project that needs to pull data from various sources, but the size probably won't exceed 20 gigabytes in 12 months.

My question is: should I go for a cloud service or could I avoid using GCP/AWS/Azure, etc., and set up a virtual machine with open-source software only (which software do you recommend)?

Tutorials, blogs, etc. on the matter are welcome!


  👤 RobinL Accepted Answer ✓
For straightforward analytics queries, 20 GB is well within what a single well-specced machine can process. I would recommend storing the data on disk in Parquet format and taking a look at DuckDB or Polars for processing it.

Obviously this advice depends a bit on the exact type of data, but as generic advice it's probably the simplest place to start.
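
To make that concrete, here is a rough sketch of the Parquet-plus-DuckDB setup. It assumes the `duckdb` Python package is installed (`pip install duckdb`). The `data/` directory, the `events.parquet` file, the `source` and `amount` columns, and both function names are made up for illustration, so swap in whatever your sources actually produce.

```python
from pathlib import Path


def totals_sql(parquet_glob: str) -> str:
    """Build an aggregate query over every Parquet file matching the glob."""
    return (
        "SELECT source, count(*) AS n_rows, sum(amount) AS total "
        f"FROM read_parquet('{parquet_glob}') "
        "GROUP BY source ORDER BY source"
    )


def run_example(data_dir: Path) -> list:
    """Write a tiny sample Parquet file, then aggregate it with DuckDB."""
    import duckdb  # third-party dependency, imported lazily

    data_dir.mkdir(parents=True, exist_ok=True)
    sample_file = (data_dir / "events.parquet").as_posix()

    # In-memory database: no server to install or administer.
    con = duckdb.connect()

    # Hypothetical sample rows, written out so the query has something to read.
    con.execute(
        "COPY (SELECT * FROM (VALUES ('api', 10.0), ('api', 5.0), ('csv', 2.5)) "
        "AS t(source, amount)) "
        f"TO '{sample_file}' (FORMAT PARQUET)"
    )

    # The glob picks up every Parquet file in the directory, so new files
    # dropped there by an ingestion job are included on the next run.
    return con.execute(totals_sql((data_dir / "*.parquet").as_posix())).fetchall()
```

Calling `run_example(Path("data"))` should return one row per source with its row count and total. The point of the design is that the "stack" is just a directory of files plus a library, which is about as cheap as it gets on a single VM.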


👤 zX41ZdbW
The easiest and best way is to use ClickHouse.

It installs as a single binary, reads every data format, interacts with Postgres, MySQL, and even SQL Server, and has the most efficient and versatile SQL engine.

At this moment in time, nothing beats ClickHouse.


👤 XCSme
Does the project "pull" data, or do you need to send data somewhere?