HACKER Q&A
📣 robertbolton

Parsing JSON data in data warehouses


We are still in the early phase of choosing our cloud data warehouse vendor. Currently we have lots of json files in S3, that we would like to load and flatten. From a feature set and product maturity, Snowflake makes the most sense for us.

But then I saw there are other more niche vendors, like Firebolt - https://www.firebolt.io/resources/semi-structured-analytics or Starburst (Presto) - https://starburstdata.github.io/videos/2020-07-29-advanced-presto-sql.html that support a wider array of functions to parse json data. More precisely, partially flattening, and the use of lambda expressions for more advanced array manipulation, rather than only flattening and aggregating.

Have you had any experience with any of the above vendors (Firebolt or Starburst) and is it worth checking them up or based on your experience the out of the box functions that vendors like Snowflake - https://docs.snowflake.com/en/sql-reference/functions-semistructured.html or Bigquery - https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions offer should be more than enough? Have you run in any limitations with them? I see Snowflake has a 16mb limit on a variant column.


  👤 BenSherman121 Accepted Answer ✓
Given the vendors mentioned above, you may find this article useful about in-memory data warehouses/query engines and their underlying technology. Good luck!

https://underthejargon.com/the-fastest-sql-databases-on-the-...