Two examples:
A 400px-wide line chart with time on the X axis. At the absolute maximum you can fit 400 values, one per pixel column, and likely fewer. Your dataset could be gigabytes of time-series data, but you can bin it into time buckets and return the mean of each bucket.
A map of the world based on OpenStreetMap data. If you're showing things at the continent scale, you can filter out 99+% of the map features server-side and return only the relevant ones (major highways and country borders, for example). OSM is a 100GB dataset, but maps that use this technique can be fetched and rendered in milliseconds.
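To make the line-chart example concrete, here is a minimal sketch of the binning step. In practice you would likely do this server-side (for instance as a GROUP BY in the database); TypeScript is used here only for illustration, and the `Point` type and `binByTime` function are hypothetical names, not part of any library.

```typescript
// Hypothetical types and names, for illustration only.
interface Point {
  t: number; // timestamp, e.g. epoch milliseconds
  v: number; // measured value
}

// Downsample to at most `maxBuckets` points by averaging each time bucket.
// Points outside [start, end) are ignored.
function binByTime(points: Point[], start: number, end: number, maxBuckets: number): Point[] {
  const width = (end - start) / maxBuckets;
  const sums = new Array<number>(maxBuckets).fill(0);
  const counts = new Array<number>(maxBuckets).fill(0);

  for (const p of points) {
    if (p.t < start || p.t >= end) continue; // outside the visible range
    const i = Math.min(maxBuckets - 1, Math.floor((p.t - start) / width));
    sums[i] += p.v;
    counts[i] += 1;
  }

  const out: Point[] = [];
  for (let i = 0; i < maxBuckets; i++) {
    if (counts[i] === 0) continue; // skip empty buckets
    // Report the bucket at its midpoint with the mean of its values.
    out.push({ t: start + (i + 0.5) * width, v: sums[i] / counts[i] });
  }
  return out;
}
```

For a 400px chart you would call this with `maxBuckets` set to 400 or less, so the response size depends on the chart width rather than on the size of the dataset.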
Bonus points for aggregating to a fixed grid, which allows you to aggressively cache the results.
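One way to read the fixed-grid idea, sketched under my own assumptions: snap every requested range outward to multiples of a fixed cell size, so slightly different viewports produce the same request and hit the same cache entry. `snapToGrid` and `cacheKey` are hypothetical helpers, and the key format is arbitrary.

```typescript
// Hypothetical helpers, for illustration only.
interface Range {
  start: number;
  end: number;
}

// Expand a range outward to the nearest multiples of `cell`.
function snapToGrid(range: Range, cell: number): Range {
  return {
    start: Math.floor(range.start / cell) * cell,
    end: Math.ceil(range.end / cell) * cell,
  };
}

// Requests that snap to the same grid cells share one cache key.
function cacheKey(range: Range, cell: number): string {
  const snapped = snapToGrid(range, cell);
  return `${cell}:${snapped.start}:${snapped.end}`;
}
```

The trade-off is that you fetch a little more than the viewport needs, in exchange for far fewer distinct queries.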
If you're committed to handling the raw data on the client side, there's not much you can do other than write optimized web workers in wasm (effectively reinventing the aggregates and indexing provided by a database) and set realistic limits.
1. High memory usage: When loading large datasets, memory usage can become quite high. (A 50MB CSV uses 700MB of memory in RATH.)
2. Slow computation tasks: Group-by, filter, bin, or even Cube operations can be slow and sometimes block the main thread.
3. Slow chart rendering: Chart rendering can also be slow and sometimes block the main thread. (Currently using Vega-Lite.)
I have implemented some solutions to address these issues:
1. For high memory usage, I am storing large raw data in IndexedDB and reading it as needed. This reduces memory usage, but it can still consume a lot of memory when the data is loaded into the main thread.
2. To improve the performance of computation tasks, I am using web workers for some data computations (such as group-by, bin, transform, and filters). I am also testing DuckDB-Wasm, but I lack knowledge of its best practices.
3. For slow chart rendering, I have tried using OffscreenCanvas to render the chart in a web worker. However, this approach creates a static canvas without any interactive features (such as tooltips, zooming, or callbacks for data selections). I am looking for ways to make a chart rendered in a web worker interactive.
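As a rough illustration of how items 1 and 2 can fit together (this is a sketch, not RATH's actual code): a group-by can be computed incrementally, one chunk at a time, so only the running totals stay in memory while the raw rows are read in batches and then discarded. The `Row` type and `GroupMean` class are hypothetical names; reading chunks from IndexedDB and running this inside a worker are assumed and not shown.

```typescript
// Hypothetical row shape and class, for illustration only.
type Row = Record<string, string | number>;

class GroupMean {
  private keyField: string;
  private valueField: string;
  private acc = new Map<string, { sum: number; count: number }>();

  constructor(keyField: string, valueField: string) {
    this.keyField = keyField;
    this.valueField = valueField;
  }

  // Fold one chunk into the running totals; the chunk can be dropped afterwards.
  addChunk(rows: Row[]): void {
    for (const row of rows) {
      const value = Number(row[this.valueField]);
      if (Number.isNaN(value)) continue; // skip missing or non-numeric values
      const key = String(row[this.keyField]);
      const entry = this.acc.get(key) ?? { sum: 0, count: 0 };
      entry.sum += value;
      entry.count += 1;
      this.acc.set(key, entry);
    }
  }

  // Mean per group, computed from the accumulated sums and counts.
  result(): Map<string, number> {
    const out = new Map<string, number>();
    this.acc.forEach((entry, key) => out.set(key, entry.sum / entry.count));
    return out;
  }
}
```

Memory then scales with the number of distinct groups rather than the number of rows, which is the same property a database aggregate gives you.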
Any suggestions or experiences shared would be greatly appreciated.
RATH GitHub repo: https://github.com/Kanaries/Rath
RATH basic background: RATH is more than an open-source alternative to data analysis and visualization tools such as Tableau. It automates your exploratory data analysis workflow with an augmented analytics engine that discovers patterns, insights, and causal relationships, and presents them with auto-generated multi-dimensional data visualizations.