Machine Learning Life Cycle (ML4Devs, Issue 7)

An overview of MLOps processes.

A reader pointed out that I paid only passing attention to the ML life cycle in the Machine Learning vs. Traditional Software issue.

Iterative software development and the DevOps CI/CD infinite loop are now common sense. I prefer the flexibility to adjust to circumstances over an elaborate process structure that must be rigidly followed. That’s why a simple ML project life cycle graph felt sufficient to me:

It is understandable that some might consider it oversimplified or even useless, as everything connects to everything else. So I decided to research the processes used to develop data products, and share them with you.

Knowledge Discovery in Databases (KDD) Process

Extracting insights from data predates Big Data. The KDD process (Knowledge Discovery and Data Mining: Towards a Unifying Framework by Fayyad et al., 1996) defines a framework for data mining in databases. It has five stages:

  • Selection

  • Pre-processing

  • Transformation

  • Data Mining

  • Interpretation / Evaluation

Modern data pipelines follow much the same steps.
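To make the five stages concrete, here is a minimal sketch that chains them as plain Python functions. Everything in it is an assumption made for illustration: the toy rows, the function names, and the choice of a z-score outlier check as the "data mining" step are not taken from the KDD paper.

```python
from statistics import mean, stdev

# Hypothetical toy "database"; None marks a missing value.
RAW_ROWS = [
    {"id": 1, "spend": 100.0, "country": "IN"},
    {"id": 2, "spend": 110.0, "country": "US"},
    {"id": 3, "spend": 90.0, "country": "IN"},
    {"id": 4, "spend": None, "country": "DE"},
    {"id": 5, "spend": 105.0, "country": "US"},
    {"id": 6, "spend": 500.0, "country": "IN"},
    {"id": 7, "spend": 95.0, "country": "DE"},
]


def select(rows, columns):
    """Selection: keep only the columns relevant to the question."""
    return [{c: row[c] for c in columns} for row in rows]


def preprocess(rows):
    """Pre-processing: drop rows that contain missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def transform(rows, column):
    """Transformation: add a z-score version of the given column."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), stdev(values)
    return [{**row, f"{column}_z": (row[column] - mu) / sigma} for row in rows]


def mine(rows, column, threshold=1.5):
    """Data mining: flag rows whose absolute z-score exceeds the threshold."""
    return [row for row in rows if abs(row[f"{column}_z"]) > threshold]


def interpret(patterns):
    """Interpretation / evaluation: summarize the patterns for a human."""
    ids = sorted(row["id"] for row in patterns)
    return f"{len(ids)} unusual spender(s): {ids}"


def run_kdd(rows):
    """Run the five stages in order, each feeding the next."""
    selected = select(rows, ["id", "spend"])
    cleaned = preprocess(selected)
    transformed = transform(cleaned, "spend")
    patterns = mine(transformed, "spend")
    return interpret(patterns)


print(run_kdd(RAW_ROWS))  # 1 unusual spender(s): [6]
```

Each stage takes the previous stage's output and nothing else, which is what lets a real pipeline swap or rerun a single stage.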

CRoss-Industry Standard Process for Data Mining (CRISP-DM)

CRISP-DM connects data mining to business and deployment. It breaks the data mining process into six major phases:

  • Business Understanding

  • Data Understanding

  • Data Preparation

  • Modeling

  • Evaluation

  • Deployment
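The six phases are not strictly sequential. As a rough sketch, they can be modeled as a small graph of allowed transitions. The transitions below follow the arrows as they are commonly drawn in the CRISP-DM reference diagram, which is an assumption added here and not something stated above. The names `Phase`, `ALLOWED`, and `is_valid_path` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Assumed transitions, including the backward arrows of the usual diagram.
ALLOWED = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed transition."""
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:]))


# A project that loops once between data preparation and modeling.
project = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(is_valid_path(project))  # True
```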

Team Data Science Process Life Cycle

Microsoft’s Team Data Science Process (TDSP) Life Cycle defines four stages:

  • Business Understanding

  • Data Acquisition and Understanding

  • Modeling

  • Deployment

It is envisioned as a waterfall model ending with Customer Acceptance, but it doesn’t require much imagination to extend it to be iterative.
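One way to picture that extension is to wrap the linear stages in a loop that repeats until the customer accepts the result. The sketch below is only an illustration: `run_life_cycle`, the `accepted` callback, and the iteration cap are hypothetical names and choices, not part of TDSP.

```python
from typing import Callable, List

STAGES = [
    "Business Understanding",
    "Data Acquisition and Understanding",
    "Modeling",
    "Deployment",
]


def run_life_cycle(accepted: Callable[[int], bool], max_iterations: int = 5) -> List[str]:
    """Run all stages in order, repeating the pass until the customer accepts.

    `accepted` is called with the 1-based iteration number after each pass.
    """
    log = []
    for iteration in range(1, max_iterations + 1):
        for stage in STAGES:
            log.append(f"iteration {iteration}: {stage}")
        if accepted(iteration):
            log.append(f"iteration {iteration}: Customer Acceptance")
            break
    return log


# Pretend the customer accepts after the second pass.
for entry in run_life_cycle(lambda iteration: iteration >= 2):
    print(entry)
```

A real project would rarely restart from the first stage on every pass. The loop only shows that acceptance becomes a gate that can send work back, not a final step.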

MLOps Loop

The Iterative-Incremental MLOps process has three broad phases:

  • Designing the ML-powered application

  • ML Experimentation and Development

  • ML Operations

There is another popular MLOps loop:

As you can see, the most critical part of the loop (ML) is left as an exercise for the reader, which is not very helpful.

Machine Learning Loop

The Machine Learning Loop is an interesting way of superimposing the code loop and the data loop. The following illustration is self-explanatory.

Conclusion

I hope this overview helps you pick and customize a process to suit your ML project's needs.

You may want to take another look at the previous issue “MLOps for Continuous Integration, Delivery, and Training of ML Models” and examine how that maps to these processes.


ML4Devs is a weekly newsletter for software developers with the aim:

To curate and create resources for practitioners to design, develop, deploy, and maintain ML applications at scale to drive measurable positive business impact.

Each issue discusses a topic from a developer’s viewpoint. Please connect on Twitter or LinkedIn, and send your feedback, experiences, and suggestions.