The primary function of this database would be to act as a sort of "global cache." When a member computer is about to perform a computation, it would first check this database. If the computation has already been done, the computer would simply fetch the pre-computed result instead of redoing the computation. The underlying goal is to save on compute resources globally.
N.B. this does not necessarily mean we precompute anything, but we do store everything we have computed so far. The hit rate on the cache might be very low for a while, but one would think it'd eventually go up. The way we're going about this now (throwing more GPUs at the problem) just seems awfully wasteful to me.
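A minimal sketch of the lookup I have in mind, assuming JSON-serializable inputs and using a plain dict (`global_cache`) as a stand-in for whatever shared store would actually back this:

    import hashlib
    import json

    # Hypothetical shared store; in a real system this would be a
    # networked service, not an in-process dict.
    global_cache = {}

    def cache_key(func_name, args):
        # Content-address the computation: hash the function's identity
        # together with a canonical encoding of its inputs.
        payload = json.dumps({"fn": func_name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def cached_compute(func, *args):
        key = cache_key(func.__name__, args)
        if key in global_cache:      # hit: fetch the pre-computed result
            return global_cache[key]
        result = func(*args)         # miss: do the work and publish it
        global_cache[key] = result
        return result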
Has anyone thought about/done any research on this?
This idea glosses over the engineering complexity of "searching" a cache that would grow to include every computation ever performed.
The reason it's not feasible is the same reason computers can't just have a huge L1 cache instead of a hard drive: there are physical limits on how quickly a large store can be searched and read from, and lookup latency grows with the size of the store. Past a certain point, just performing the computation is often quicker.
However… your suggestion would be a good fit for functional programming. Pure functions always return the same result for the same inputs, so caching the results of CPU-intensive pure functions makes a lot of sense… which is what [1] Bazel's remote cache does. But most software is not written in terms of pure functions…
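For pure functions this is just memoization taken global; Python's standard library already does the single-process version:

    from functools import lru_cache

    @lru_cache(maxsize=None)   # safe only because fib is pure:
    def fib(n):                # the same input always gives the same output
        return n if n < 2 else fib(n - 1) + fib(n - 2)

Bazel's remote cache is the cross-machine version of the same idea: build actions are keyed on hashes of their inputs, so any machine can reuse a result another machine already produced.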
Also, another interesting question comes to mind: what if quantum computing could let us do "branch prediction" of computations at an incredible scale?
One immediate problem is that you have to map an effectively infinite space of inputs and outputs into a compact signature before you can even store the computation meaningfully. And in most cases the input is more complex than the output, which makes "searching" for a solution almost pointless right out of the gate.
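To make that concrete: even building the cache key requires a full pass over the input, so for any computation that is itself roughly linear in its input, the lookup costs as much as just doing the work. A toy illustration (not anyone's actual scheme):

    import hashlib

    def the_computation(data: bytes) -> int:
        return sum(data)           # one O(n) pass over the input

    def lookup_key(data: bytes) -> str:
        # Hashing is also an O(n) pass over the same input, so the
        # "cheap" cache probe is no cheaper than the computation itself.
        return hashlib.sha256(data).hexdigest()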