HACKER Q&A
📣 hoerzu

Big Data 10x tips and tricks


What are your recommendations for scalable storage of append-only data? What are your favorite frameworks for memory mapping, like Vaex or Polars? What is hot right now, like DuckDB?


  👤 noud Accepted Answer ✓
Unpopular tip: 9 times out of 10 you can store your project's data in a simple CSV file and load all of it into memory (with pandas, for example). Don't waste time building scalable data storage when you don't need it.
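
To make that concrete, here is a rough sketch of append-only storage in a single CSV file. It uses only the Python standard library so it runs anywhere; the column names in `FIELDS`, the file name, and the helpers `append_rows` and `load_all` are made up for illustration.

```python
import csv
import os
import tempfile

FIELDS = ["ts", "sensor", "value"]  # hypothetical columns, adjust to your data


def append_rows(path, rows):
    """Append dict rows to a CSV file, writing the header only once."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerows(rows)


def load_all(path):
    """Read the whole file into a list of dicts (everything in memory)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "events.csv")
    append_rows(path, [{"ts": 1, "sensor": "a", "value": 0.5}])
    append_rows(path, [{"ts": 2, "sensor": "b", "value": 1.5}])
    print(load_all(path))
```

Values come back as strings here. With pandas installed, `pd.read_csv(path)` would load the same file into a DataFrame with inferred types.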

👤 sjducb
- How much data will you process in your first year (in terabytes)?

- How big is the average data unit?

- How are you going to analyse and process this data? (What kinds of questions will you ask it?)
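
For the first two questions, a back-of-the-envelope estimate is often enough. A minimal sketch, assuming a steady ingest rate; the function name `yearly_terabytes` and the example numbers are hypothetical, not from any real workload:

```python
def yearly_terabytes(records_per_second, avg_record_bytes):
    """Estimate raw data volume for one year, in decimal terabytes (1e12 bytes)."""
    seconds_per_year = 365 * 24 * 60 * 60
    total_bytes = records_per_second * avg_record_bytes * seconds_per_year
    return total_bytes / 1e12


if __name__ == "__main__":
    # Hypothetical workload: 1,000 records per second at 200 bytes each.
    print(f"{yearly_terabytes(1000, 200):.2f} TB per year")
```

This ignores compression, indexes, and replication, so treat the result as an order-of-magnitude figure only.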


👤 adammarples
Delta Lake on Parquet files works very well. BigQuery works well. Snowflake works well.