HACKER Q&A
📣 avgcorrection

Is “Distributed CI” Possible?


Is it possible to have distributed CI in the following sense:

- The “CI” is a program that checks if the code is okay (like runs a test suite)

- Anyone can run it and get the result

- A central server/node can receive the output and the SHA1 (of the code it was run on) and quickly check if the CI passed

- It takes some effort to cheat the procedure—it doesn't have to be bulletproof (it's meant to be used “among friends”)

- No proof-of-work or anything else that burns up electricity for questionable gain

It seems that a simple plaintext CI script is not sufficient, since whatever node runs the script can just change it to `return true`. But if you can either encrypt the program or obfuscate it enough (against a decompiler), then you might be able to implement something like this:

    
    if the tests pass:
        calculate the CI output from the stable success-output
        of the tests + a secret + the SHA1
That means that the CI program distributed to the nodes needs to literally be an undecipherable blob of a program that in principle can do anything to the node computer—another reason why this can only work “among friends”.
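
For illustration, a minimal sketch of that scheme, assuming the “stable success-output” is a fixed byte string the central server also knows and that the shared secret survives inside the obfuscated blob (all names here are made up):

    import hashlib
    import hmac

    # Hypothetical secret baked into the obfuscated CI blob; the central server
    # holds the same secret so it can recompute the token.
    SECRET = b"baked-into-the-blob"

    def ci_token(success_output: bytes, sha1: str) -> str:
        """What a node reports after the test suite passes."""
        return hmac.new(SECRET, success_output + sha1.encode(), hashlib.sha256).hexdigest()

    def server_check(reported: str, success_output: bytes, sha1: str) -> bool:
        """The central server recomputes the token and compares in constant time."""
        return hmac.compare_digest(reported, ci_token(success_output, sha1))

Anyone who strips the obfuscation recovers SECRET and can mint tokens without running the tests, which is exactly the “among friends” caveat above.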


  👤 cobbal Accepted Answer ✓
Not practically that I know of. The definition of "not bulletproof" is very vague; why can't the friends just agree not to "return true"?

There are some theoretical foundations in this line of work: https://en.wikipedia.org/wiki/Verifiable_computing


👤 slipperlobster
Is this just reinventing the concept of Jenkins distributed executors?


👤 er0k
I think nix's hydra and binary caching would accomplish this

https://nixos.wiki/wiki/Hydra

https://nixos.wiki/wiki/Binary_Cache


👤 jpollock
If it's “among friends”, isn't "change the script" out of scope?

You either trust the client to do the requested work, or you don't. If you can't trust the client to use the appropriate CI script version, then you can't trust anything the client responds with.

An example of this is how hacked DVD/BluRay drives work, where they fib to the driver about what byte is in which memory address.

You can issue the work to multiple clients, and treat it like a node failure. If you have a quorum of executions then you can trust that result. However, that's similar to the proof-of-work approach you've disallowed.
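
A rough sketch of that quorum check, assuming each client reports a short result string for the commit (say, a hash of the CI output); names and the threshold are illustrative:

    from collections import Counter

    def quorum_result(reports, quorum):
        """reports maps client name -> reported result string for one commit.
        Returns the result if at least `quorum` clients agree, otherwise None:
        treat it like a node failure and re-issue the job."""
        if not reports:
            return None
        result, votes = Counter(reports.values()).most_common(1)[0]
        return result if votes >= quorum else None

    # quorum_result({"alice": "pass", "bob": "pass", "carol": "fail"}, 2) -> "pass"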

No matter what, the client has to trust that the server isn't out to get them.


👤 MrTortoise
I dev on main. In teams, we build in environments that are very prod-like, and we have tests that verify the deployed code on startup, no matter the location. We build, test, and deploy from local machines (and it gets tested again, because of the startup tests). This is very reliable, or else our tests would regularly fail. Because of this, we push to main after our deploy has worked.

ofc this only works on smallish teams that push / pull very regularly and have good communication around who is deploying when.

it used to be a shout over a desk partition.

we go faster than you and break far far less than you


👤 lijok
Which hedge fund put you up to this?

👤 cookiengineer
Do you know about mainframer? [1] It's a tool made for remote builds via ssh/scp. I am using it a lot for my AI training projects because I can continue to work on the code locally without my CPU and GPU going up in smoke.

[1] https://github.com/buildfoundation/mainframer


👤 c0balt
This can probably be achieved with distributed builds on NixOS, with a master that has all other machines configured as build servers.

In this setup the nix daemon on the master can delegate build tasks to other build machines, let's call them agents. These agents have a feature set, arch set and a speed factor to steer task distribution. The feature sets define if, e.g., a machine can compile builds in a sandbox VM. The arch set is for cross compilation support.

You could schedule code checks as nix checks, e.g., linting and validation could be done by exporting the output as a derivation.


👤 wholesomepotato
Sure, it could be done, but it's going to be slower and less reliable than a centralized solution, and you'll need to run it redundantly so you can be sure no one is malicious.

There's also the problem of incentive. Why would anyone want to run your builds? So you can run theirs? In that case, why not just run your own and let them run theirs, and save the overhead?


👤 pshirshov
If your "script" is Turing-complete and the remote nodes can't be trusted, there is no solution due to the halting problem. You can't verify correctness of an arbitrary function output w/o evaluating it step by step.

Also any obfuscation can be removed and any compiled code can be decompiled.


👤 traverseda
Sure, make sure your build is reproducible, then use something like act (an open-source GitHub Actions runner) to build the executable, and submit the hash when you're done.

Basically what you're describing is reproducible builds. The crypto stuff won't work because homomorphic encryption doesn't work, and neither do trusted compute platforms.
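
A tiny sketch of the hash-submission step, assuming the reproducible build drops a single artifact (the path is made up):

    import hashlib
    from pathlib import Path

    def build_hash(artifact: str) -> str:
        """Digest of the build output; with a reproducible build, every honest
        builder gets the same value for the same commit."""
        return hashlib.sha256(Path(artifact).read_bytes()).hexdigest()

    # Each friend submits (commit_sha, build_hash("dist/app")); the server only
    # has to compare digests from independent builders.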


👤 drewcoo
If it's "among friends," then there is no reason to talk about trust.

If it's for work, look into cryptographic signing (probably PKI); as long as the signing keys are only issued to trusted agents, you can treat any result (plus Git hash) that carries a valid signature as a valid result.
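
One possible sketch of that, using the third-party cryptography package for Ed25519 signatures; key handling and the message format here are illustrative, not a full PKI:

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    agent_key = Ed25519PrivateKey.generate()   # issued to a trusted CI agent
    server_pubkey = agent_key.public_key()     # the server keeps the public half

    def sign_result(git_sha: str, passed: bool) -> bytes:
        """Agent side: sign the commit hash plus the CI verdict."""
        return agent_key.sign(f"{git_sha}:{passed}".encode())

    def verify_result(git_sha: str, passed: bool, signature: bytes) -> bool:
        """Server side: accept only results with a valid signature."""
        try:
            server_pubkey.verify(signature, f"{git_sha}:{passed}".encode())
            return True
        except InvalidSignature:
            return False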


👤 jasonjayr
This is like Perl's CPAN Testers system: modules uploaded to CPAN have their unit tests run on a variety of systems and configurations operated by volunteers, which report back to the module author any failures and trace output.

In retrospect, it's remarkable that it's been running as long as it has with little major incident.


👤 verdverm
You should check out projects like sigstore.dev, attestation, and in general "software supply chain security"

This is already possible and being done by some. You of course need to trust the node, because it could swap out the tool on you, but there are various ways to push the attestation further down the stack, into the hardware and CPU.


👤 RecycledEle
CI = Continuous Integration

👤 more_corn
Sounds like yet another attempt to shoehorn your chosen solution (blockchain?) into a problem better solved without it.

Compare to this actual working solution: I run GitLab runners on spot instances (or at home for personal projects). The runners check in with the server, get a job, process it, and send the results. A CI job can have multiple parts. A server can be handling multiple client requests simultaneously.

CI is a resource-intensive process, so you want beefy machines, but they shouldn’t run when not needed (at night if you’re in one time zone, when unloaded if you’re distributed). Enter autoscaling groups and spot instances: 4x the speed for 10% of the cost of always-on, “on demand”-priced instances. Or, for personal projects, speedy when needed, idle when not.

Compare this solution to what you’re working on. You’ll see you’re doing a lot of extra work for no additional value.


👤 basicallybones
I believe so, but I do not think it is worth the effort and complexity for a lean team. I just recently half-built something like this that used Docker build layers to run lint/test/build/etc. Each build layer was tagged by some sort of manifest hash to guarantee that, if that particular layer was retrieved from cache, the inputs (the codebase files) were identical. The idea was to use fast developer machines to do the work and basically just check the hashes in CI (or skip the separate CI environment entirely).
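
For a rough sense of the manifest-hash idea (a sketch under assumptions, not the actual implementation; paths are illustrative):

    import hashlib
    from pathlib import Path

    def manifest_hash(repo_root: str) -> str:
        """Combined digest of file paths + contents, skipping .git, used to tag a
        lint/test/build layer; a cache hit on the tag means identical inputs."""
        digest = hashlib.sha256()
        root = Path(repo_root)
        for path in sorted(root.rglob("*")):
            if ".git" in path.parts or not path.is_file():
                continue
            digest.update(str(path.relative_to(root)).encode())
            digest.update(path.read_bytes())
        return digest.hexdigest()

    # CI (or the central record) only checks whether an image tagged with
    # manifest_hash(".") already exists instead of re-running the step.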

It turned out all right, but I dropped it for a few reasons. Overall, it was not worth the effort and complexity. First, making it fast and safe was a PITA compared to just...running a few commands in a normal CI environment. Second, for manifest-style hashes, you have to either exclude the .git folder (which means no diffing/affected commands) or try to clean it up somehow (which obviously can be done, but if you do that in a Docker layer it screws up your caching, etc.). Third, it generated an ungodly amount of local cached data. There are a few other reasons I dropped it, but they're generally related to safety/optimization/caching/git.