MLOps for Continuous Integration, Delivery, and Training of ML Models (ML4Devs, Issue 6)
Integrate Early and Iterate Often for Successful ML in Production
Continuing the theme of “Integrate Early and Iterate Often” from the previous issue, the obvious question is how to do it well. We touched upon the ML pipeline briefly. In this issue, let’s examine how the best in the business do it.
Google has probably been running ML models at large scale longer than anyone else, and it has published its best practices for automating machine learning pipelines. They confirm that only a small fraction of a real-world ML system is composed of ML code. You have probably seen this diagram:
The automated pipeline needs to be built for:
Continuous Integration: Test not just the code, but also validate data, data schemas, and models.
Continuous Delivery: Deploy not just a single ML prediction service, but an ML training pipeline that automatically deploys the ML prediction service when desired.
Continuous Training: New and unique to ML, this automatically retrains and serves ML models.
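To make the Continuous Integration item more concrete, here is a minimal sketch of what data and model tests could look like next to ordinary unit tests. It is not taken from the Google article: the schema, the column names, the function names, and the 0.8 accuracy threshold are all hypothetical and chosen only for illustration.

```python
from typing import Any

# Hypothetical schema for illustration: column name -> expected Python type.
EXPECTED_SCHEMA: dict[str, type] = {"age": int, "income": float, "label": int}


def validate_schema(rows: list[dict[str, Any]], schema: dict[str, type]) -> list[str]:
    """Return a list of human-readable schema violations (empty if none)."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        # Type-check only the columns that are actually present.
        for col, expected in schema.items():
            if col in row and not isinstance(row[col], expected):
                errors.append(f"row {i}: {col} should be {expected.__name__}")
    return errors


def model_passes_gate(accuracy: float, min_accuracy: float = 0.8) -> bool:
    """Model validation: fail the build if quality falls below a threshold."""
    return accuracy >= min_accuracy
```

In a CI job, checks like these would run on a sample of the training data and on the evaluation metrics of a freshly trained model, and the build would fail if either check reports a problem.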
That article also defines MLOps maturity levels:
Level 0: Manual process: Train and deploy ML models manually.
Level 1: ML pipeline automation: An automated pipeline for continuous training and continuous delivery of the ML prediction service. However, the ML pipeline itself is deployed manually.
Level 2: CI/CD pipeline automation: Automated ML pipeline deployment.
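The step from Level 0 to Level 1 is essentially replacing a human decision ("should we retrain and redeploy?") with an automated one. A minimal sketch of that decision logic follows, assuming retraining is triggered by new data volume or a drop in live quality. All names and thresholds here are hypothetical, and a real pipeline would delegate this to an orchestration tool rather than plain functions.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str
    accuracy: float  # evaluation metric on a held-out set


def should_retrain(new_rows: int, live_accuracy: float,
                   min_new_rows: int = 1000, min_accuracy: float = 0.8) -> bool:
    """Trigger retraining when enough new data arrived or live quality dropped."""
    return new_rows >= min_new_rows or live_accuracy < min_accuracy


def select_for_serving(production: ModelVersion, candidate: ModelVersion) -> ModelVersion:
    """Promote the candidate only if it is at least as good as production."""
    return candidate if candidate.accuracy >= production.accuracy else production
```

For example, `should_retrain(1500, 0.9)` triggers a retraining run because enough new rows have arrived, and `select_for_serving` keeps the current production model whenever the retrained candidate scores worse.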
A conceptual pipeline alone does not suffice. We need tools to implement it. There has been an explosion of MLOps tools. I am listing here only a few prominent alternatives:
Tools are evolving very rapidly. TFX is more comprehensive but also more complex. MLflow and Metaflow are quite mature and not as complex as TFX. It is also common to combine multiple tools with Airflow and Kubeflow.
My apologies for so many links in this issue. I hope they will serve as a reference that you can return to later.
It is normal to feel overwhelmed in the maze of tools. Just as you can start with descriptive analytics instead of the most complex ML, you do not need to start at Level 2 of MLOps maturity. Starting at Level 0 and slowly progressing to higher levels is a good plan.
Being aware of the MLOps maturity levels will help you craft your path. You can skip all the links above, but I highly recommend this one article: “MLOps: Continuous delivery and automation pipelines in machine learning.” It is quite easy to understand and gives a broad overview of MLOps pipelines. There is also an interesting video series discussing this article in the MLOps Community.
ML4Devs is a weekly newsletter for software developers with the aim:
To curate and create resources for practitioners to design, develop, deploy, and maintain ML applications at scale to drive measurable positive business impact.