The main issue I'm running into is this: how do I synchronize my code across all the machines on my grid? I want to write code on one machine and have it automatically become available on every grid machine. I have looked at using a distributed file system (NFS), but its latency is too high for storing application code.
One workaround I was considering was keeping the code on the local file system of the main machine, automatically rsyncing any changes to NFS, and having all grid machines read from NFS (the lag may be tolerable when running grid jobs). However, this feels brittle and hacky.
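A minimal sketch of that auto-rsync workaround, assuming a simple polling loop (rather than a file-system notification API) and rsync installed on the main machine. It compares modification-time snapshots of the source tree and re-runs rsync whenever anything changes. It also shows where the fragility comes from: a grid job reading from NFS in the middle of a sync can see a mix of old and new files.

```python
import os
import subprocess
import time


def snapshot(root):
    """Map every file under root to its last-modified time."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # file vanished between listing and stat
    return state


def watch_and_sync(src, dest, interval=2.0):
    """Poll src and rsync it to dest whenever a file is added, removed, or modified."""
    previous = snapshot(src)
    while True:
        time.sleep(interval)
        current = snapshot(src)
        if current != previous:
            # Trailing slash on src: copy the directory's contents into dest.
            # --delete removes files from dest that no longer exist in src.
            subprocess.run(
                ["rsync", "-a", "--delete", src.rstrip("/") + "/", dest],
                check=True,
            )
            previous = current
```

Usage, with hypothetical paths: `watch_and_sync("/home/me/project", "/mnt/nfs/project")`.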
Another option would be to rsync the code between the local file systems of all machines. This also feels somewhat brittle and hacky, especially as new boxes are added to the grid when compute demand increases.
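For comparison, a sketch of the rsync-to-every-machine option, assuming passwordless SSH from the main machine to each worker and a host list you maintain yourself (the host names and paths in the usage line are hypothetical). Keeping that list current as boxes are added is exactly the part that gets brittle.

```python
import subprocess


def rsync_commands(src, hosts, dest):
    """Build one rsync-over-SSH command per grid host."""
    src = src.rstrip("/") + "/"  # sync the directory's contents, not the directory itself
    return [["rsync", "-az", "--delete", src, f"{host}:{dest}"] for host in hosts]


def sync_to_grid(src, hosts, dest):
    """Run every command and return the hosts whose sync failed."""
    failed = []
    for host, cmd in zip(hosts, rsync_commands(src, hosts, dest)):
        if subprocess.run(cmd).returncode != 0:
            failed.append(host)
    return failed
```

Usage: `sync_to_grid("/home/me/project", ["worker1", "worker2"], "/opt/project")`. Any host that was offline during a sync stays on stale code until the next successful run.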
What are some easy ways to make the code available to the entire grid?
For context, I'm building this on AWS, and most of the projects are Python code, each with its own virtualenv.
Push your local code to a git repository hosted in AWS and set up a git hook that triggers your workflow. Each worker then clones a specific version of the code:
git clone --depth 1 --branch <branch-or-tag> <repo-url>
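A sketch of what each worker might run, assuming the repository has a `requirements.txt` at its root and the workers run Linux (POSIX virtualenv layout); the URL, tag, and paths in the usage line are hypothetical. It performs the same shallow `git clone --depth 1 --branch` and then builds the project's own virtualenv inside the checkout with the standard-library `venv` module.

```python
import os
import subprocess
import venv


def clone_command(repo_url, ref, dest):
    """Shallow clone of a single branch or tag into dest."""
    return ["git", "clone", "--depth", "1", "--branch", ref, repo_url, dest]


def deploy(repo_url, ref, dest):
    """Clone ref into dest and create the project's virtualenv inside it."""
    subprocess.run(clone_command(repo_url, ref, dest), check=True)
    env_dir = os.path.join(dest, ".venv")
    venv.create(env_dir, with_pip=True)
    requirements = os.path.join(dest, "requirements.txt")  # assumed file name
    if os.path.exists(requirements):
        python = os.path.join(env_dir, "bin", "python")  # POSIX layout
        subprocess.run([python, "-m", "pip", "install", "-r", requirements], check=True)
    return env_dir
```

Usage: `deploy("https://git.example.com/proj.git", "v1.2.0", "/opt/proj/v1.2.0")`. Cloning each version into its own directory means a new release never overwrites files that an already-running job is reading.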
Pushed git commits are atomic, which saves you from the inconsistent code states you can get with plain file system syncing.
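For the git hook step, a sketch of a server-side `post-receive` hook, assuming a self-hosted bare repository where you can install hooks (some hosted git services do not allow server-side hooks). `trigger_workflow` is a hypothetical placeholder for whatever starts your grid jobs. Git passes the hook one `<old-sha> <new-sha> <ref-name>` line per updated ref on standard input.

```python
#!/usr/bin/env python3
"""Sketch of a server-side post-receive hook (hooks/post-receive in the bare repo)."""
import sys


def parse_updates(lines):
    """Parse '<old-sha> <new-sha> <ref-name>' lines into (ref, new_sha) pairs.

    Deleted refs (new sha is all zeros) are skipped.
    """
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        _old, new, ref = parts
        if set(new) == {"0"}:
            continue  # ref was deleted; nothing to deploy
        updates.append((ref, new))
    return updates


def trigger_workflow(ref, sha):
    """Hypothetical placeholder: start grid jobs for this exact commit."""
    print(f"deploy {ref} at {sha}", file=sys.stderr)


def main(stream):
    for ref, sha in parse_updates(stream):
        trigger_workflow(ref, sha)
```

Installed as `hooks/post-receive` and marked executable, the script would end with a call to `main(sys.stdin)`. Forwarding the exact commit SHA lets you confirm that every worker ended up on the same commit.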
Is NFS unacceptable for the build phase or for the runtime phase?