1. Why is there so little "unbiased" info about deploying/serving ML models in production? (I mean except the official docs of frameworks like e.g. TensorFlow, which obviously suggest their mothership's own services/solutions.)
2. Do you hand code microservices around your TF or Pytorch (or sklearn / homebrewed / "shallow" learning) models?
3. Do you use TensorFlow Serving? (If so, is this working fine for you with Pytorch models too?)
4. Is using Go infra like e.g. the Cortex framework common? (I keep reading about it, I love the idea and I'd love to use a static language here, just not Java, but I've talked with noooone who's actually used it.)
5. And going beyond the basics: is there any good established recipe for deploying and scaling models with dynamic re-training (e.g. the user app exposes something like a "retrain with params X + Y + Z" API action, callable in response to user actions, so the user controls training too; see the sketch after these questions) that doesn't break horribly with more than tens of users?
P.S. Links to any collections of "established best practices" or "playbooks" would be awesome!
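To make question 5 concrete, here's a minimal sketch of the kind of endpoint I have in mind. Everything here is hypothetical: the route, the parameter names, and the in-process queue (which would really be something like Celery/Redis) are illustrative assumptions, not any existing framework's API.

```python
# Hypothetical sketch of the "retrain with params X + Y + Z" action from question 5.
# Route, parameter names, and the queue-backed worker are illustrative assumptions.
from queue import Queue
from threading import Thread
from uuid import uuid4

from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = Queue()  # in practice this would be a proper job queue (Celery/Redis/etc.)

def worker():
    while True:
        job_id, params = jobs.get()
        # a real train_model(params) would run training and persist the new artifact
        print(f"retraining job {job_id} with {params}")
        jobs.task_done()

Thread(target=worker, daemon=True).start()

@app.route("/models/<model_id>/retrain", methods=["POST"])
def retrain(model_id):
    params = request.get_json()          # e.g. {"x": 1.0, "y": 0.5, "z": "rbf"}
    job_id = str(uuid4())
    jobs.put((job_id, params))           # enqueue instead of blocking the request
    return jsonify({"model_id": model_id, "job_id": job_id, "status": "queued"}), 202
```

The point is that retraining is queued per request rather than run synchronously, which is where I'd expect things to start breaking once you get past tens of users.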
2. Hand-coding to begin with is fine, but as you start to scale the number of production models and actually productionize them at scale, it becomes infeasible and leads to plenty of maintenance issues. There are a few model-infrastructure tools that help with this, but again, many are homegrown because the market is still new. Algorithmia and Seldon are pretty good starting points.
3. We rarely use the serving options provided, as the challenge is integrating them with the rest of engineering. Service monitoring gets handled by different teams.
4. Depends on the industry and use case. Again, integration and maintenance come into play. Go/Cortex might make sense, but a lot of companies leverage Spark, so Scala/Java could be the choice for production models.
5. We’re working on creating this recipe for enterprises. I believe Seldon (open source) might contain this capability. The challenge as you pointed out is ensuring things don’t break!
All of our core contributors + a good number of users are in there, and we're all happy to chat.
We've been doing consulting for more than six years, and we're building a platform precisely to solve the problems we've encountered and that you're writing about. We've learned some things that we're encoding in the platform, which may be useful in case you want to build your own. We started doing this because we hit a ceiling on the projects we could take on, and we were under stress. We're a tiny, tiny team.
The problems are in the interfaces between different roles, with each role having a stack with a gazillion tools and a different "language" they speak and universe they live in. Stitching together people's interactions, the workflow, the business problems, and the fragmented tooling is problematic. The inflexibility of said tooling and frameworks, which you addressed, also meant we couldn't use them or other platforms. This is why we are working hard to build a coherent, integrated experience, while still trying to build abstractions that let us substitute tools and treat them as simple components, rather than being tied to any one of them.
For now, it allows you to create a notebook from several images with most libraries pre-installed. The infra it's deployed on offers Tesla K80 GPUs, which you can use. You can of course install additional libraries.
This solves the problem of setting up the environment, CUDA, the Docker engine, runtime versions, and the usual yak shaving. We're only using JupyterHub and JupyterLab for Python notebooks for now, as that is what our colleagues use, but we plan to support more.
It also solves the "it works on my machine" problem of running a colleague's notebook.
You can click a button to publish an AppBook and share it with a domain expert right away to play with. It is automatically parametrized for you, so you don't have to fiddle with widgets; form fields are generated for the parameters automatically. Parameters and metrics are tracked behind the scenes without you doing anything, and the models are saved to object storage. Again, one role we target is the ML practitioner who does not necessarily remember to do these things, so we do it for them.
Here's a video from a very early version: https://app.box.com/s/mwsw79g3d5b974o625f1mw979cc4znf0
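As a rough illustration of what "form fields for parameters" means, here's a hand-rolled sketch with ipywidgets; this is just an analogy for the experience, not how the platform actually implements it, and the function and parameter values are made up.

```python
# Hypothetical illustration of turning a training function's parameters into form fields.
# ipywidgets is a real library, but the platform's own mechanism may differ entirely.
from ipywidgets import interact

def train(learning_rate=0.01, epochs=10, optimizer="adam"):
    # stand-in for the real training loop inside the notebook
    print(f"training with lr={learning_rate}, epochs={epochs}, optimizer={optimizer}")

# interact() inspects the arguments and renders a slider, an integer field, and a dropdown,
# which is roughly the experience a domain expert gets from a published AppBook.
interact(train, learning_rate=(0.001, 0.1, 0.001), epochs=(1, 100), optimizer=["adam", "sgd"])
```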
We're using MLFlow for that, but plan to support GuildAI and Cortex. We think hard about making things loosely coupled and configurable, so you get to pick the stack and easily integrate the platform with your existing stack.
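To give a sense of what the behind-the-scenes tracking amounts to, here's a minimal MLflow sketch of logging parameters, metrics, and a model. The experiment name, parameters, and model are made up; the platform performs the equivalent of this for you automatically.

```python
# Minimal MLflow tracking sketch; experiment name, params, and model are illustrative only.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

mlflow.set_experiment("appbook-demo")            # hypothetical experiment name
with mlflow.start_run():
    params = {"C": 0.5, "max_iter": 200}
    model = LogisticRegression(**params).fit(X, y)

    mlflow.log_params(params)                    # parameters tracked behind the scenes
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")     # artifact lands in the configured object storage
```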
The AppBook is super useful in that you can publish it and then use it to train the model, or share it with a domain expert so they can play with different parameters. One of the problems we've seen is that some features are considered unimportant by an ML practitioner but are critical to domain experts.
Tightening that feedback loop from notebook to domain expert is what makes the one-click AppBook important: it saves you from scheduling meetings and figuring out how to "show" the domain expert the work, while still allowing them to interact with it.
You can also deploy the models you choose with one click; it gives you an endpoint and generates a tutorial on how to hit that endpoint and invoke the model with curl or Python requests. You can generate a token and invoke the model from other places or services.
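For a sense of what the generated tutorial covers, here's a sketch of invoking such an endpoint with Python requests. The URL, header name, and payload shape are assumptions; the real values come from the tutorial generated for your deployment.

```python
# Hypothetical invocation of a deployed model endpoint; URL, header, and payload
# are illustrative assumptions -- the generated tutorial provides the real values.
import requests

ENDPOINT = "https://models.example.com/v1/churn-model/predict"   # made-up endpoint
TOKEN = "paste-the-generated-token-here"

payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}                  # made-up input schema

resp = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())                                               # e.g. {"predictions": [...]}
```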
This self-service feature is important because it allows an ML practitioner to "deploy" their own model without asking a colleague, who might be busy with other things, to do it for them. Self-service is super important throughout.
Right now, we're focusing on fixing bugs, improving tests, and adding monitoring before going back to feature development. Some features we were working on are more flexible and scalable model deployment strategies, monitoring, collaboration, retraining, data streams, and building the SDK.