How do you like to version datasets for production ML systems?

Question

At my previous company (an industrial AI vision platform), we had an ad hoc mechanism, built internally several years ago, for continuously versioning datasets for user applications. I am curious how folks are versioning datasets in practice now. Platforms like Hugging Face and Weights & Biases seem to provide good abstractions for dataset versioning. Any feedback on their suitability for production systems, or on other systems, patterns, or best practices that you have found to work well?

kristenkehrer · Accepted Answer

I use CometML (because I work there), but creating data artifacts and versioning my data have made life so much easier, especially when I put a project down for a while and then try to pick it back up.
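To make "versioning my data" a bit more concrete, here is a minimal, stdlib-only sketch of one common pattern: content-addressed versioning. Each file is hashed, then the sorted manifest of hashes is hashed again, so any change to the data yields a new version ID, while unchanged data always maps to the same ID. This is not CometML's, Hugging Face's, or Weights & Biases' API. `dataset_version` is a hypothetical helper, and truncating the ID to 12 hex characters is an arbitrary choice for readability.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def dataset_version(root: str) -> dict:
    """Build a manifest of per-file SHA-256 hashes and derive one version ID.

    Hypothetical helper for illustration; not part of CometML or any library.
    """
    root_path = Path(root)
    files = {}
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Key by relative POSIX path so the ID is independent of where the data lives.
        files[path.relative_to(root_path).as_posix()] = digest
    # Hash a canonical (sorted-key) JSON form of the manifest for a stable dataset ID.
    canonical = json.dumps(files, sort_keys=True).encode("utf-8")
    version_id = hashlib.sha256(canonical).hexdigest()[:12]
    return {"version": version_id, "files": files}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "labels.csv").write_text("img,label\na.png,ok\n")
        v1 = dataset_version(tmp)
        Path(tmp, "labels.csv").write_text("img,label\na.png,defect\n")
        v2 = dataset_version(tmp)
        # A relabeled sample changes the content, so it produces a new version ID.
        print(v1["version"] != v2["version"])
```

In practice you would store each manifest alongside the model run that consumed it, which is roughly the "pick it back up later" benefit described above; hosted tools add storage, lineage, and UI on top of this basic idea.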