One approach is to convert the structure to JSON (or some binary format), serialize it, and save it to a file. But if you are dealing with terabytes of data, or just a large amount of data, your program then needs to load the entire thing back into RAM first.
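Just to make the limitation concrete, here is a minimal sketch of that first approach (the file name and the toy structure are made up):

    import json

    # Toy stand-in for the real structure (a nested dict acting as a tree).
    tree = {"feature": 0, "threshold": 0.5,
            "left": {"leaf": True, "value": 1},
            "right": {"leaf": True, "value": 0}}

    # Serialize the whole thing to a single JSON file.
    with open("tree.json", "w") as f:
        json.dump(tree, f)

    # To use it again you must parse the entire file back into memory,
    # which is exactly what breaks down at TB scale.
    with open("tree.json") as f:
        tree = json.load(f)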
So a second approach, I think, could be something like regular databases: on-disk. Are there on-disk data structures? Can you make any data structure exist on disk?
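As one small illustration of the on-disk idea, Python's standard-library shelve module gives you a dict-like object whose entries live in a file and are fetched per key rather than all at once (a sketch of the concept, not a recommendation for TB-scale data):

    import shelve

    # Build a persistent, dict-like structure on disk.
    with shelve.open("nodes.db") as db:
        for i in range(1000):
            db[str(i)] = {"feature": i % 10, "threshold": i / 1000.0}

    # Later (possibly in another process), look up individual entries
    # without loading the whole file into memory.
    with shelve.open("nodes.db") as db:
        node = db["42"]
        print(node)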
Suppose I have a huge decision tree, terabytes of data. How can I use it without loading the entire thing into memory?
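One standard trick (this is a sketch of the general technique, with a made-up record layout) is to store the nodes as fixed-size records in a file and seek to each child, so a single prediction only reads the handful of nodes on the root-to-leaf path:

    import struct

    # Hypothetical fixed-size node record: feature index, threshold,
    # left-child file offset, right-child file offset.
    # left == right == -1 marks a leaf whose "threshold" field holds the prediction.
    NODE = struct.Struct("<i d q q")

    def predict(path, x):
        """Follow one root-to-leaf path, reading only the nodes it touches."""
        with open(path, "rb") as f:
            offset = 0                      # root record sits at the start of the file
            while True:
                f.seek(offset)
                feature, threshold, left, right = NODE.unpack(f.read(NODE.size))
                if left == -1:
                    return threshold        # leaf: threshold field reused as the value
                offset = left if x[feature] <= threshold else right

    # Tiny example file: a root splitting on feature 0 at 0.5, with two leaves.
    size = NODE.size
    with open("tree.bin", "wb") as f:
        f.write(NODE.pack(0, 0.5, size, 2 * size))   # root
        f.write(NODE.pack(0, 1.0, -1, -1))           # left leaf  -> 1.0
        f.write(NODE.pack(0, 0.0, -1, -1))           # right leaf -> 0.0

    print(predict("tree.bin", [0.3]))   # 1.0
    print(predict("tree.bin", [0.9]))   # 0.0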
Terabytes of data for a decision tree? I've heard of deep learning models with billions of parameters, but I find it surprising that you would need anywhere near that much storage for a decision tree.
EDIT: Moreover, if you do need that much storage, you are almost certainly going to need multiple machines to process that data reasonably quickly, so you would need a way to partition the dataset across nodes rather than storing everything in a single file. The company probably best known for developing that kind of architecture is Google, and I think there is a fair amount of information available online about their architecture. That might be a good place to get started.
EDIT2: And how would you create such a large decision tree in the first place? Are you sure that's the best type of model for your problem? It seems to me it would likely massively overfit just about any dataset you trained it on.
Now that I think of it, I did use ObjectStore[0] at one time. It did what it claimed to do, but performance wasn't that great.
Nothing wrong with having a multi-GB HashMap or something that you have to load into RAM at startup. It just depends on what you're doing. It's certainly cheaper than recomputing that HashMap.
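For what it's worth, a minimal sketch of that pattern in Python (cache the expensive result once, then just load it at startup; the file name and build function are made up):

    import os
    import pickle

    CACHE = "lookup.pkl"

    def expensive_build():
        # Stand-in for whatever computation produced the map in the first place.
        return {i: i * i for i in range(1_000_000)}

    # At startup: load the precomputed map if it exists, otherwise build and save it.
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            lookup = pickle.load(f)
    else:
        lookup = expensive_build()
        with open(CACHE, "wb") as f:
            pickle.dump(lookup, f)

    print(lookup[12345])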