A reader pointed out that I paid only passing attention to the ML life cycle in the “Machine Learning vs. Traditional Software” issue.
Iterative software development and the DevOps CI/CD infinite loop are now common practice. I prefer the flexibility to adjust to circumstances over an elaborate process that must be followed rigidly. That’s why a simple ML project life cycle graph felt sufficient to me:
It is understandable that some might consider it oversimplified, or even useless, since everything connects to everything else. So I decided to research the processes used to develop data products and share them with you.
Knowledge Discovery in Databases (KDD) Process
Extracting insights from data predates Big Data. The KDD process (“Knowledge Discovery and Data Mining: Towards a Unifying Framework” by Fayyad et al., 1996) defines a framework for discovering knowledge in databases, in which data mining is one step. The KDD process has five stages:
Selection
Preprocessing
Transformation
Data Mining
Interpretation / Evaluation
Modern data pipelines follow pretty much the same steps.
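To make the five KDD stages concrete, here is a minimal Python sketch that chains them as plain functions over a toy list of records. Everything in it is a hypothetical illustration, not taken from Fayyad et al.: the field names (`spend`, `visits`), the sample data, and the deliberately trivial “pattern” (a mean threshold) that stands in for a real data mining algorithm. It also runs the stages once, front to back, whereas the paper describes KDD as interactive and iterative.

```python
from statistics import mean

# Toy "database": a list of dicts. Field names and values are made up.
SAMPLE_DB = [
    {"id": 1, "spend": 120.0, "visits": 4, "zip": "10001"},
    {"id": 2, "spend": 30.0, "visits": 3, "zip": "10002"},
    {"id": 3, "spend": None, "visits": 2, "zip": "10003"},
    {"id": 4, "spend": 90.0, "visits": 0, "zip": "10004"},
    {"id": 5, "spend": 60.0, "visits": 2, "zip": "10005"},
]


def select(db, fields):
    # Selection: keep only the fields relevant to the question being asked.
    return [{f: row.get(f) for f in fields} for row in db]


def preprocess(rows):
    # Preprocessing: drop incomplete records and ones that would break later steps.
    return [r for r in rows
            if all(v is not None for v in r.values()) and r["visits"] > 0]


def transform(rows):
    # Transformation: derive a feature (spend per visit) from the raw fields.
    return [{**r, "spend_per_visit": r["spend"] / r["visits"]} for r in rows]


def mine(rows):
    # Data mining: a deliberately trivial "pattern", the mean spend per visit.
    return {"threshold": mean(r["spend_per_visit"] for r in rows)}


def interpret(pattern, rows):
    # Interpretation / evaluation: how many records does the pattern flag?
    flagged = [r for r in rows if r["spend_per_visit"] > pattern["threshold"]]
    return {"threshold": pattern["threshold"],
            "flagged": len(flagged), "total": len(rows)}


def kdd(db):
    # The five stages chained once, front to back (no loops back to earlier stages).
    rows = transform(preprocess(select(db, ["spend", "visits"])))
    return interpret(mine(rows), rows)


if __name__ == "__main__":
    print(kdd(SAMPLE_DB))
```

With the sample records, preprocessing drops two rows (one with a missing `spend`, one with zero `visits`), and two of the remaining three lie above the mean spend per visit. In a real pipeline each function would be a pipeline step, and what you learn during interpretation would often send you back to an earlier stage.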
CRoss-Industry Standard Process for Data Mining (CRISP-DM)
CRISP-DM connects data mining to business goals and deployment. It breaks the data mining process into six major phases:
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
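The six CRISP-DM phases (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment) do not form a straight line: the standard CRISP-DM diagram draws arrows back from Data Understanding to Business Understanding, from Modeling to Data Preparation, and from Evaluation to Business Understanding, all inside an outer cycle. The Python sketch below encodes those arrows as a transition table and checks a project history against it. The names `TRANSITIONS` and `invalid_steps` are made up for illustration, and treating the outer cycle as a single Deployment → Business Understanding edge is a simplifying assumption.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


P = Phase

# Allowed moves, following the arrows of the usual CRISP-DM reference diagram.
# Assumption: Deployment -> Business Understanding stands in for the outer cycle.
TRANSITIONS = {
    P.BUSINESS_UNDERSTANDING: {P.DATA_UNDERSTANDING},
    P.DATA_UNDERSTANDING: {P.BUSINESS_UNDERSTANDING, P.DATA_PREPARATION},
    P.DATA_PREPARATION: {P.MODELING},
    P.MODELING: {P.DATA_PREPARATION, P.EVALUATION},
    P.EVALUATION: {P.BUSINESS_UNDERSTANDING, P.DEPLOYMENT},
    P.DEPLOYMENT: {P.BUSINESS_UNDERSTANDING},
}


def invalid_steps(history):
    """Return (from, to) phase names for moves the diagram has no arrow for."""
    return [(a.value, b.value)
            for a, b in zip(history, history[1:])
            if b not in TRANSITIONS[a]]


if __name__ == "__main__":
    # A project that loops between preparation and modeling before shipping.
    project = [P.BUSINESS_UNDERSTANDING, P.DATA_UNDERSTANDING,
               P.DATA_PREPARATION, P.MODELING, P.DATA_PREPARATION,
               P.MODELING, P.EVALUATION, P.DEPLOYMENT]
    print(invalid_steps(project))                     # []
    print(invalid_steps([P.MODELING, P.DEPLOYMENT]))  # [('Modeling', 'Deployment')]
```

The second check flags shipping a model straight from Modeling, since CRISP-DM routes every model through Evaluation first.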
Team Data Science Process Life Cycle
Microsoft’s Team Data Science Process (TDSP) Life Cycle defines four stages:
Business Understanding
Data Acquisition and Understanding
Modeling
Deployment
It is envisioned as a waterfall model ending with Customer Acceptance, but it doesn’t take much imagination to extend it into an iterative process.
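One way to make TDSP iterative, sketched below in Python under my own assumptions, is to treat Customer Acceptance as a gate: if the customer does not accept the result, the team starts another pass from Business Understanding. The `customer_accepts` callback and the `max_iterations` cap are hypothetical; TDSP itself does not prescribe them.

```python
# TDSP stage names; the loop-back rule below is an assumption, not part of TDSP.
STAGES = [
    "Business Understanding",
    "Data Acquisition and Understanding",
    "Modeling",
    "Deployment",
    "Customer Acceptance",
]


def run_tdsp(customer_accepts, max_iterations=5):
    """Walk the stages in order and start over whenever the customer rejects.

    customer_accepts is a hypothetical callback that receives the iteration
    number and returns True when the deliverable is accepted.
    Returns the (iteration, stage) log and the accepting iteration, or None.
    """
    log = []
    for iteration in range(1, max_iterations + 1):
        for stage in STAGES:
            log.append((iteration, stage))
        if customer_accepts(iteration):
            return log, iteration
    return log, None


if __name__ == "__main__":
    # Pretend the customer signs off on the second pass.
    log, accepted_at = run_tdsp(lambda i: i >= 2)
    print(accepted_at, len(log))  # 2 10
```

In the example the customer accepts on the second pass, so the log holds two full passes of five stages each.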
The Iterative-Incremental MLOps Process has three broad phases:
Designing the ML-powered application
ML Experimentation and Development
ML Operations
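These phases feed into each other rather than running once. The Python sketch below shows one possible routing between them, under assumed rules: design hands off to experimentation, experimentation hands off to operations once a model is ready, detected drift in production sends the work back to experimentation, and a change in requirements reopens design from any phase. The signal names (`requirements_changed`, `model_ready`, `drift_detected`) are hypothetical stand-ins for whatever triggers your project actually uses.

```python
from enum import Enum


class MLOpsPhase(Enum):
    DESIGN = "Designing the ML-powered application"
    EXPERIMENTATION = "ML Experimentation and Development"
    OPERATIONS = "ML Operations"


def next_phase(current, *, requirements_changed=False,
               model_ready=False, drift_detected=False):
    """Choose the next phase. The three signals are hypothetical triggers."""
    if requirements_changed:
        # Assumed rule: a change in business requirements reopens design.
        return MLOpsPhase.DESIGN
    if current is MLOpsPhase.DESIGN:
        return MLOpsPhase.EXPERIMENTATION
    if current is MLOpsPhase.EXPERIMENTATION:
        # Stay in experimentation until a model is good enough to ship.
        return MLOpsPhase.OPERATIONS if model_ready else MLOpsPhase.EXPERIMENTATION
    # In operations: detected drift sends the work back for retraining.
    return MLOpsPhase.EXPERIMENTATION if drift_detected else MLOpsPhase.OPERATIONS


if __name__ == "__main__":
    phase = MLOpsPhase.DESIGN
    events = [{}, {"model_ready": True}, {}, {"drift_detected": True}]
    for signals in events:
        phase = next_phase(phase, **signals)
        print(phase.value)
```

Keeping the routing in one small function makes the feedback loops explicit and easy to change as circumstances change, instead of freezing them into a rigid process.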
There is another popular MLOps loop:
As you can see, the most critical part of the loop (ML) is left as an exercise for the reader, which is not very helpful.
Machine Learning Loop
The Machine Learning Loop is an interesting way of superimposing the code loop and the data loop. The following illustration is self-explanatory.
I hope this overview helps you pick and customize a process to suit your ML project’s needs.
You may want to take another look at the previous issue “MLOps for Continuous Integration, Delivery, and Training of ML Models” and examine how that maps to these processes.
ML4Devs is a weekly newsletter for software developers, with the following aim:
To curate and create resources for practitioners to design, develop, deploy, and maintain ML applications at scale to drive measurable positive business impact.