ML and Monorepos

Question

Hello HN. Is there anyone here who uses monorepos for their ML projects, at work or otherwise? What's your opinion on the matter? What tools and patterns do you use? What challenges did you encounter? What seemed like a good idea in the beginning but turned into a mess?

For some context: for every ML project, I keep my training, pre-processing, experiment, and serving code together, which makes it a kind of (very rudimentary) monorepo. But when it comes to testing and building parts of the project, it's kind of a mess. I tried using some git log magic to keep things cleaner when running tests, builds, and re-trainings, but I wonder how others are doing it.
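For illustration, the "git log magic" could look roughly like the sketch below: list the files changed relative to a base branch, then map each path to the part of the project it belongs to, so only those tests, builds, or re-trainings run. The directory names (`preprocessing`, `training`, `experiments`, `serving`, `common`) and the `origin/main` base ref are assumptions made for the example, not details of the actual repo.

```python
import subprocess
from pathlib import PurePosixPath

# Hypothetical top-level directories, one per part of the ML project.
COMPONENTS = {"preprocessing", "training", "experiments", "serving"}
# Hypothetical shared-code directory; a change here affects every component.
SHARED = "common"


def changed_paths(base="origin/main"):
    """Return files changed between the merge base with `base` and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def affected_components(paths):
    """Map changed file paths to the components that need a test/build run."""
    affected = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        if not parts:
            continue
        top = parts[0]
        if top == SHARED:
            # Shared code changed: be conservative and rebuild everything.
            return set(COMPONENTS)
        if top in COMPONENTS:
            affected.add(top)
    return affected


if __name__ == "__main__":
    for component in sorted(affected_components(changed_paths())):
        print(component)
```

A CI job could loop over the printed names and run only the matching test suites. Files outside the known directories (a top-level README, say) trigger nothing in this sketch, which may or may not be what you want.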

bfeynman · Accepted Answer

If you're deploying XGBoost and small classifiers, or even larger models through a high-level interface such as Hugging Face, I would say a monorepo makes sense.

With considerably large custom models that involve experimentation and R&D, dev and prod are far more separate, foremost because they have many disjoint dependencies and there is no need to clog up builds with all of them.