- The “CI” is a program that checks whether the code is okay (e.g., runs a test suite)
- Anyone can run it and get the result
- A central server/node can receive the output and the SHA1 of the commit it ran on, and quickly check whether the CI passed
- Cheating the procedure should take some effort, but it doesn't have to be bulletproof (it's meant to be used “among friends”)
- No proof-of-work or something else that burns up electricity for questionable gain
It seems that a simple plaintext CI script is not sufficient, since whatever node runs the script can just change it to `return true`. But if you can either encrypt the program or obfuscate it enough (to defeat a decompiler), then you might be able to implement something like this:
if they pass: calculate the output of the CI based on the stable success-output of the tests + a secret + the SHA1
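In other words, the blob would compute something like an HMAC over the commit hash and the stable test output, keyed by the embedded secret. A minimal sketch, assuming the secret really can be hidden inside the blob (all names here are made up):

```python
import hashlib
import hmac

# Assumption: this secret is baked into the obfuscated/encrypted CI blob.
SECRET = b"hidden-inside-the-blob"

def attestation_token(commit_sha1: str, success_output: bytes) -> str:
    """Derive the CI result token from secret + SHA1 + stable success output."""
    msg = commit_sha1.encode() + b"\x00" + success_output
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def server_verify(commit_sha1: str, expected_output: bytes, token: str) -> bool:
    """Central node recomputes the token and compares in constant time."""
    return hmac.compare_digest(attestation_token(commit_sha1, expected_output), token)
```

A node can only produce a valid token if the tests actually emitted their stable success output, but anyone who extracts the secret from the blob can mint tokens without running anything, which is exactly the problem the next point raises.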
That means the CI program distributed to the nodes needs to be a literally undecipherable blob that can, in principle, do anything to the node's computer, which is another reason why this can only work “among friends”.
There are some theoretical foundations for this line of work: https://en.wikipedia.org/wiki/Verifiable_computing
You either trust the client to do the requested work, or you don't. If you can't trust the client to use the appropriate CI script version, then you can't trust anything the client responds with.
An example of this is how hacked DVD/BluRay drives work, where they fib to the driver about what byte is in which memory address.
You can issue the work to multiple clients, and treat it like a node failure. If you have a quorum of executions then you can trust that result. However, that's similar to the disallowed proof-of-work requirement.
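As a sketch, such a quorum check could be as simple as this (the result-digest format is made up):

```python
from collections import Counter

def quorum_result(results: list[str], quorum: int) -> str | None:
    """Accept a CI result only if at least `quorum` independent nodes agree.

    `results` holds opaque result digests reported by different nodes for
    the same commit; returns the agreed digest, or None if no quorum.
    """
    if not results:
        return None
    digest, count = Counter(results).most_common(1)[0]
    return digest if count >= quorum else None

# e.g. three nodes ran the job, require two matching answers:
assert quorum_result(["pass:abc123", "pass:abc123", "fail:def456"], 2) == "pass:abc123"
```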
No matter what, the client has to trust that the server isn't out to get them.
ofc this only works on smallish teams that push / pull very regularly and have good communication around who is deploying when.
it used to be a shout over a desk partition.
we go faster than you and break far, far less than you
In this setup the nix daemon on the master can delegate build tasks to other build machines, let's call them agents. These agents have a feature set, arch set and a speed factor to steer task distribution. The feature sets define if, e.g., a machine can compile builds in a sandbox VM. The arch set is for cross compilation support.
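A sketch of that dispatch rule (the fields mirror the feature set / arch set / speed factor above; the rest is invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    archs: set[str]            # e.g. {"x86_64-linux", "aarch64-linux"}
    features: set[str]         # e.g. {"kvm", "big-parallel"}
    speed_factor: float = 1.0  # relative throughput weight

def pick_agent(agents: list[Agent], arch: str, required: set[str]) -> Agent | None:
    """Pick the fastest agent that supports the build's arch and required features."""
    eligible = [a for a in agents if arch in a.archs and required <= a.features]
    return max(eligible, key=lambda a: a.speed_factor, default=None)
```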
You could schedule code checks as nix checks; e.g., linting and validation could be done by exporting the output as a derivation.
There's also the problem of incentive. Why would anyone want to run your builds? So that you'll run theirs? In that case, why not just run your own builds and let them run theirs, and save the overhead?
Also, any obfuscation can be reversed, and any compiled code can be decompiled.
Basically what you're describing is reproducible builds. The crypto stuff won't work: homomorphic encryption isn't practical for this, and trusted compute platforms can't be relied on either.
If it's for work, look into cryptographic signing (probably PKI), and as long as the signing keys are only issued to trusted agents, you can treat any result (plus Git hash) with a valid signature as a valid result.
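A sketch of that signing flow, using Ed25519 via the `cryptography` package (the message layout is an assumption):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Trusted agent: the key pair is issued out of band (the PKI part).
agent_key = Ed25519PrivateKey.generate()
trusted_pubkey = agent_key.public_key()  # registered with the server

# Agent signs (git hash, result) after running the CI job.
git_hash = b"deadbeef"                   # placeholder commit hash
message = git_hash + b"|tests:pass"      # assumed "<hash>|<result>" layout
signature = agent_key.sign(message)

# Server: any result with a valid signature from a trusted key is accepted.
try:
    trusted_pubkey.verify(signature, message)
    print("result accepted")
except InvalidSignature:
    print("result rejected")
```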
In retrospect, it's remarkable that it's been running as long as it has with little major incident.
This is already possible and being done by some. You of course need to trust the node, because it could swap out a tool on you, but there are various ways to push the attestation further down the stack, into the hardware and the CPU.
Compare to this actual working solution: I run GitLab runners on spot instances (or at home for personal projects). The runners check in with the server, get a job, process it, and send the results. A CI job can have multiple parts, and a server can handle multiple client requests simultaneously.
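For comparison, the runner loop is conceptually just this (a toy sketch; these endpoints and payloads are invented, not GitLab's actual runner API):

```python
import subprocess
import time

import requests  # assumption: a plain HTTP job API

SERVER = "https://ci.example.com"  # hypothetical coordinator

def run_forever() -> None:
    """Check in with the server, get a job, process it, send the results."""
    while True:
        job = requests.get(f"{SERVER}/jobs/next", timeout=30).json()
        if not job:
            time.sleep(10)  # nothing to do; back off
            continue
        proc = subprocess.run(job["command"], shell=True, capture_output=True)
        requests.post(
            f"{SERVER}/jobs/{job['id']}/result",
            json={"exit_code": proc.returncode,
                  "output": proc.stdout.decode(errors="replace")},
            timeout=30,
        )
```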
CI is a resource-intensive process, so you want beefy machines, but they shouldn't run when not needed (at night if you're in one time zone, when unloaded if you're distributed). Enter autoscaling groups and spot instances: 4x the speed for 10% of the cost of always-on, “on demand”-priced instances. Or, for personal projects: speedy when needed, idle when not.
Compare this solution to what you’re working on. You’ll see you’re doing a lot of extra work for no additional value.
It turned out all right, but I dropped it for a few reasons. Overall, it was not worth the effort and complexity. First, making it fast and safe was a PITA compared to just...running a few commands in a normal CI environment. Second, for manifest-style hashes, you have to either exclude the .git folder (which means no diffing/affected commands) or try to clean it up somehow (which obviously can be done, but if you do that in a Docker layer it screws up your caching, etc.). Third, it generated an ungodly amount of local cached data. There are a few other reasons I dropped it, but they're generally related to safety/optimization/caching/git.