HACKER Q&A
📣 neverrroot

How to encrypt cloud-stored Git repos that contain sensitive data?


Given Git repositories that contain sensitive data, how would you store them in the cloud so that they are secured (e.g. encrypted)?

By secure I mean: the cloud provider (or an attacker who could gain access to the repositories through the cloud provider) can’t see the names and contents of the files. Commit messages, branches and author names could be visible, but it would be better if they could also be stored securely. I understand that a web interface would be unusable in this setup; we don't need one anyway.

Are there other source code management systems (with cloud hosting offers) that have better support for storing such sensitive data securely, and can do it in an easier / less cumbersome manner than Git does?

I understand that this is just one piece in the security chain, and I would appreciate it if we could keep this on-topic instead of making a broader security discussion out of it. Thank you!


  👤 imuli Accepted Answer ✓
This is currently vaporware, but I am working on the underpinnings for it.

The basic idea is something of a mixture of git, Tahoe-LAFS, and eventually-consistent database replication (but that's a bit out of scope of your question here).

- Chunks of data are convergently encrypted (using a hash or MAC of the plaintext to derive the key) and indexed by the hash of their ciphertext.

- Files, trees, and commits are just specially formatted chunks of data that hold encryption keys and hashes for other chunks of data. Very similar to git.

- Branch equivalents are slightly different: each has a unique signing key. Branch objects are indexed both by the verifying key and by a hash of their ciphertext. They hold a hash for a commit chunk and a list of hashes of the parent branch objects.

(The list of parent branch objects is to provide for a much more distributed sense of a branch than git has, and they are first class objects in such a system rather than just being locally held pointers. Partially this is due to version control being only one of the target uses for this system, but I also like the distributed branch model a little better.)
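To make the first two bullets concrete, here is a rough sketch of convergent encryption with ciphertext-addressed chunks. Nothing here comes from an actual implementation: the function names, the dict-backed `store`, the JSON tree format, and the `secret` parameter are all made up for illustration. The keystream is a toy stand-in (SHA-256 in counter mode) so the example runs with only the standard library; a real system would use a vetted cipher such as AES or ChaCha20.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not a vetted cipher."""
    blocks = [
        hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def put_chunk(store: dict, plaintext: bytes, secret: bytes) -> tuple[str, str]:
    """Encrypt a chunk convergently and index it by the hash of its ciphertext."""
    # The key is a MAC of the plaintext, so equal plaintexts give equal ciphertexts.
    key = hmac.new(secret, plaintext, hashlib.sha256).digest()
    ciphertext = xor_with_keystream(plaintext, key)
    chunk_id = hashlib.sha256(ciphertext).hexdigest()
    store[chunk_id] = ciphertext  # the storage side only ever sees id -> ciphertext
    return chunk_id, key.hex()


def get_chunk(store: dict, chunk_id: str, key_hex: str, secret: bytes) -> bytes:
    """Fetch a chunk, check it against its index, decrypt it, and check the key."""
    ciphertext = store[chunk_id]
    if hashlib.sha256(ciphertext).hexdigest() != chunk_id:
        raise ValueError("ciphertext does not match its index")
    key = bytes.fromhex(key_hex)
    plaintext = xor_with_keystream(ciphertext, key)
    expected = hmac.new(secret, plaintext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, key):
        raise ValueError("key does not match the decrypted content")
    return plaintext


def put_tree(store: dict, entries: dict[str, bytes], secret: bytes) -> tuple[str, str]:
    """A tree is just another chunk holding names, chunk ids, and keys (assumed JSON format)."""
    listing = {name: put_chunk(store, data, secret) for name, data in entries.items()}
    encoded = json.dumps(listing, sort_keys=True).encode()
    return put_chunk(store, encoded, secret)


store: dict = {}
secret = b"hypothetical-repo-secret"
tree_id, tree_key = put_tree(store, {"README": b"hello\n", "copy.txt": b"hello\n"}, secret)
listing = json.loads(get_chunk(store, tree_id, tree_key, secret))
```

Since file names and keys live inside the encrypted tree chunk, whoever hosts `store` sees only opaque ids and ciphertext, and the two identical files above end up as a single stored chunk. Mixing a secret into the MAC is one way to keep outsiders from confirming guesses about a chunk's content; with a plain hash, anyone who can guess the plaintext can check whether it is stored.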

Anyway, this is probably more theoretical than you're looking for :)