How do I do this properly so that it is performant and scales without breaking? Are there any gotchas?
There are so many ways to import -- which is the fastest?
Is this a definite case for partitioning?
Should I create adaptive radix tree indices? The docs say, "ART indexes must currently be able to fit in-memory. Avoid creating ART indexes if the index does not fit in memory."
What else am I missing here? Can DuckDB even handle databases of this size?
Any guidance would be greatly appreciated!
What does your 'serious analytics' entail? Using the full-text search extension's macros (stemming, match_bm25, ...), running regexes, computing aggregates? Are you doing highly selective lookups on a few columns that you'd like to index? What would be your partitioning key?
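For reference, a rough sketch of what the FTS route looks like from Python. The table and column names (`docs`, `id`, `body`) and the database file are made up for illustration:

```python
import duckdb

# Connect to a (hypothetical) on-disk database containing a docs(id, body) table.
con = duckdb.connect("corpus.duckdb")

# Load the full-text search extension and build an index over docs.body.
con.execute("INSTALL fts")
con.execute("LOAD fts")
con.execute("PRAGMA create_fts_index('docs', 'id', 'body')")

# Rank rows with the match_bm25 macro the extension generates in the
# fts_main_docs schema; unmatched rows get a NULL score.
rows = con.execute("""
    SELECT id, score
    FROM (
        SELECT id, fts_main_docs.match_bm25(id, 'example query') AS score
        FROM docs
    ) ranked
    WHERE score IS NOT NULL
    ORDER BY score DESC
    LIMIT 10
""").fetchall()
print(rows)
```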
> There are so many ways to import -- which is the fastest?
Loading from Parquet is great if you already have Parquet files... but for your use case, CSV import is the best bet. It is also very fast (>1 GB/s on uncompressed CSVs) and works fine as long as the CSVs are reasonably well-formatted.
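A minimal sketch of that import from Python, assuming your files match a glob like `data/*.csv` and you want them in a table called `events` (both names made up here):

```python
import duckdb

# Connect to a (hypothetical) on-disk database file.
con = duckdb.connect("warehouse.duckdb")

# One statement loads every file matched by the glob; the CSV reader
# sniffs the schema and reads the files in parallel.
# (On older DuckDB releases, use read_csv_auto for the auto-detection.)
con.execute("""
    CREATE TABLE events AS
    SELECT * FROM read_csv('data/*.csv')
""")

print(con.execute("SELECT count(*) FROM events").fetchone())
```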
(co-founder and head of produck, feel free to reach out)