Create a tree data structure that mirrors the file system and hash each file in place in the tree. Then create a new tree where, for each parent node, you sort the children's hashes and hash them together.
Any duplicate subtrees will then end up with identical hashes.
I actually wouldn't be surprised if just straight:
find $ROOT_OF_SEARCH -type f -print0 | xargs -0 shasum | sort > file_hashes
awk '{print $1}' file_hashes | sort | uniq -c | sort -n > hash_freq
vim -O file_hashes hash_freq
was good enough.

> subtrees that are older versions of newer trees etc
This is more complicated. How would you concretely define the relationship between older and newer versions?