6 sustainability measures of MLops and how to tackle them

The adoption of artificial intelligence (AI) continues to grow. According to a McKinsey survey, 56% of companies are now using AI in at least one function, up from 50% in 2020. A PwC survey found that the pandemic has accelerated AI adoption, with 86% of companies saying that AI is becoming a mainstream technology in their company.

In recent years, significant advances in open-source AI, such as the groundbreaking TensorFlow framework, have opened up AI to a wider audience and made the technology more accessible. The relatively frictionless use of the new technology has led to greatly accelerated adoption and an explosion of new applications. Tesla Autopilot, Amazon Alexa and other well-known use cases have sparked our imaginations as well as controversy, but AI is finding applications in almost every aspect of our world.

The parts that make up the AI puzzle

Historically, machine learning (ML) – the path to AI – was reserved for academics and specialists with the mathematical skills needed to develop complex algorithms and models. Today, the data scientists working on these projects need both that knowledge and the right tools to effectively put their machine learning models into production for consumption at scale – often a hugely complicated task involving advanced infrastructure and multiple steps in the ML workflow.

Another key component is model lifecycle management (MLM), which manages the complex AI pipeline and helps ensure results. However, the proprietary MLM systems of the past were expensive and often lagged far behind the latest advances in AI technology.

Effectively filling that gap in operational capacity is critical to the long-term success of AI programs, because training models that provide good predictions is only a small part of the overall challenge. Building ML systems that add value to an organization takes more than that. Rather than the typical fire-and-forget pattern of traditional software, an effective strategy requires regular iteration cycles with continuous monitoring, care and improvement.

Enter MLops (machine learning operations), which allows data science, engineering and IT operations teams to collaborate to take ML models into production, manage them at scale and continuously monitor their performance.

The main challenges for AI in production

MLops typically focuses on addressing six key challenges of deploying AI applications to production. These are: repeatability, availability, maintainability, quality, scalability and consistency.

Furthermore, MLops can help simplify AI consumption so that applications can leverage machine learning models for inference (i.e., to make predictions based on data) in a scalable, maintainable way. After all, this capability is the primary value that AI initiatives should deliver. To dive deeper:

Repeatability: The process that ensures the ML model can be executed successfully in a repeatable manner.

Availability: The ML model is deployed in such a way that it is sufficiently available to provide inference services to consuming applications, at an appropriate level of service.

Maintainability: The processes that ensure the ML model remains maintainable in the long term; for example, when retraining of the model becomes necessary.

Quality: The ML model is continuously monitored to ensure it delivers predictions of acceptable quality.

Scalability: The scalability of inference services, as well as of the people and processes needed to retrain the ML model as required.

Consistency: A consistent approach to ML is essential to ensure success in the other measures mentioned above.

We can see MLops as a natural extension of agile devops applied to AI and ML. Typically, MLops covers the most important aspects of the machine learning lifecycle: data preprocessing (capturing, analyzing and preparing data – making sure the data is appropriately tailored for the model to be trained on), model development, model training and validation, and finally, deployment.

The following six proven MLops techniques can measurably improve the effectiveness of AI initiatives, in terms of time-to-market, results and long-term sustainability.

1. ML Pipelines

ML pipelines typically consist of multiple steps, often orchestrated in a directed acyclic graph (DAG) that coordinates the flow of training data, as well as the generation and delivery of trained ML models.

The steps within an ML pipeline can be complex. A data retrieval step on its own might require multiple subtasks to collect datasets, perform checks and apply transformations. For example, data may need to be extracted from different source systems – perhaps data marts in a corporate data warehouse, web scraping, geospatial stores and APIs. The extracted data may then have to undergo quality and integrity checks using sampling techniques, and may need to be modified in various ways – such as omitting data points that are not needed, or aggregations such as summarizing or windowing other data points.
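The coordination described above can be sketched as a small DAG of named tasks. This is a minimal, orchestrator-agnostic illustration; the task names and their bodies are invented placeholders, and a real pipeline on Kubeflow or Airflow would use that platform's own API instead.

```python
# Minimal sketch of an ML pipeline expressed as a directed acyclic graph
# (DAG). Task names and bodies are illustrative placeholders; a real
# orchestrator (Kubeflow Pipelines, Airflow) provides its own constructs.

from graphlib import TopologicalSorter  # standard library, Python 3.9+

def extract():   return "raw data"
def validate():  return "checked data"
def transform(): return "features"
def train():     return "model"

TASKS = {"extract": extract, "validate": validate,
         "transform": transform, "train": train}

# Edges: each task maps to the set of tasks it depends on.
DAG = {
    "extract":   set(),
    "validate":  {"extract"},
    "transform": {"validate"},
    "train":     {"transform"},
}

def run_pipeline(dag, tasks):
    """Execute tasks in an order that respects the DAG's dependencies."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        results[name] = tasks[name]()
    return results

results = run_pipeline(DAG, TASKS)
print(results["train"])  # model
```

The same dependency structure scales to the fan-out and fan-in patterns common in real pipelines, where several data sources feed one transformation step.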

Converting the data into a format that can be used to train the ML model — a process called feature engineering — can benefit from additional tuning steps.

Training and test models often require a grid search to find optimal hyperparameters, running multiple experiments in parallel until the best set of hyperparameters is identified.
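A grid search of this kind can be sketched in a few lines: evaluate every hyperparameter combination and keep the best. The `evaluate` function below is a toy stand-in for a real train-and-validate run; in practice a tool such as scikit-learn's GridSearchCV would also handle cross-validation and parallel trials.

```python
# Hand-rolled grid search sketch: try every hyperparameter combination and
# keep the best. The evaluate() function is a toy stand-in for a real
# train-and-validate run, with a known optimum at (0.1, 4).

from itertools import product

def evaluate(learning_rate, depth):
    """Stand-in validation score; higher is better."""
    return -(learning_rate - 0.1) ** 2 - (depth - 4) ** 2

grid = {
    "learning_rate": [0.01, 0.1, 0.5],
    "depth": [2, 4, 8],
}

best_score, best_params = float("-inf"), None
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params)  # {'learning_rate': 0.1, 'depth': 4}
```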

Storing models requires an effective approach to versioning and a way to capture associated metadata and metrics about the model.
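The versioning-plus-metadata pattern can be illustrated with a toy in-memory registry. The class and field names here are assumptions for illustration only; production registries such as MLflow's Model Registry persist this information durably and add stage transitions and access control.

```python
# Toy in-memory model registry sketch: each stored model gets an
# auto-incrementing version plus arbitrary metadata (metrics, data hashes).
# Names are illustrative; real registries (e.g. MLflow) persist durably.

from datetime import datetime, timezone

class ModelRegistry:
    def __init__(self):
        self._models = {}  # model name -> list of version records

    def register(self, name, artifact, **metadata):
        versions = self._models.setdefault(name, [])
        record = {
            "version": len(versions) + 1,
            "artifact": artifact,
            "registered_at": datetime.now(timezone.utc).isoformat(),
            "metadata": metadata,
        }
        versions.append(record)
        return record["version"]

    def latest(self, name):
        return self._models[name][-1]

registry = ModelRegistry()
registry.register("churn-model", artifact=b"...", accuracy=0.91)
v = registry.register("churn-model", artifact=b"...", accuracy=0.94)
print(v, registry.latest("churn-model")["metadata"]["accuracy"])  # 2 0.94
```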

MLops platforms such as Kubeflow, an open-source machine learning toolkit running on Kubernetes, translate the complex steps that make up a data science workflow into tasks performed in Docker containers on Kubernetes, providing a cloud-native, yet cross-platform, interface for the component steps of ML pipelines.

2. Inference Services

After the appropriate trained and validated model has been selected, the model must be deployed in a production environment where live data is available to make predictions.

And there’s good news: the model-as-a-service architecture has made this aspect of ML significantly easier. This approach separates the application from the model via an API, further simplifying processes such as model versioning, redeployment, and reuse.
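The API boundary at the heart of model-as-a-service can be sketched with nothing but the standard library. The `/v1/predict` route, payload shape and trivial "model" below are assumptions made for illustration; a production deployment would sit behind a serving layer such as KServe or Seldon Core rather than a bare HTTP server.

```python
# Sketch of a model-as-a-service inference endpoint using only the Python
# standard library. The /v1/predict route, payload shape and toy model are
# assumptions; real deployments use a serving layer such as KServe.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a loaded model: a trivial linear scorer."""
    return sum(features)

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/predict":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

# To serve: HTTPServer(("", 8080), InferenceHandler).serve_forever()
```

Because the application only knows the HTTP contract, the model behind `predict` can be retrained and redeployed without touching any consumer.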

There are a number of open-source technologies available that can wrap an ML model and expose inference APIs – for example, KServe and Seldon Core, open-source platforms for deploying ML models on Kubernetes.

3. Continuous Deployment

It is critical to be able to retrain and re-deploy ML models in an automated manner when a significant model deviation is detected.

Within the cloud native world, KNative provides a powerful open source platform for building serverless applications and can be used to activate MLops pipelines running on Kubeflow or any other open source task scheduler such as Apache Airflow.

4. Blue-Green Deployments

With solutions like Seldon Core, it can be useful to create an ML deployment with two predictors – for example, allocating 90% of the traffic to the existing (“champion”) predictor and 10% to the new (“challenger”) predictor. The MLops team can then (ideally automatically) observe the quality of the predictions. Once proven, the deployment can be updated to move all traffic to the new predictor. Conversely, if the new predictor is found to perform worse than the existing predictor, 100% of the traffic can be reverted to the old predictor.
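The 90/10 split can be made concrete with a small traffic router. Seldon Core configures traffic weights declaratively; the hash-based routing below is an assumption chosen to make the split deterministic per request ID, so a given caller consistently hits the same predictor.

```python
# Deterministic 90/10 champion/challenger traffic splitter sketch. Seldon
# Core configures this declaratively; hashing the request ID is an assumed
# technique that keeps routing stable and reproducible per request.

import hashlib

def champion(features):   return "champion"
def challenger(features): return "challenger"

def route(request_id, challenger_share=0.10):
    """Hash the request ID into [0, 1) and send the smallest share of
    traffic to the challenger predictor."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return challenger if bucket < challenger_share else champion

counts = {"champion": 0, "challenger": 0}
for i in range(10_000):
    predictor = route(f"req-{i}")
    counts[predictor(None)] += 1
print(counts)  # roughly 9,000 champion / 1,000 challenger
```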

5. Automatic Drift Detection

As production data changes over time, model performance may deviate from baseline due to significant variations in the new data versus the data used in training and validating the model. This can significantly damage the prediction quality.

Drift detectors such as Seldon Alibi Detect can be used to automatically assess a model’s performance over time and trigger a model retraining and automatic redeployment process.
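A minimal version of such a check compares incoming production data against the training baseline. The z-test on the sample mean below is a simplification I've chosen for illustration; dedicated detectors like Alibi Detect offer far richer statistical tests (Kolmogorov-Smirnov, MMD and others) over full feature distributions.

```python
# Minimal drift check sketch: flag drift when the mean of a live window is
# an implausible draw from the training baseline (z-test on the sample
# mean). Purpose-built detectors such as Alibi Detect go much further.

import math
import statistics

def mean_drift(reference, live, z_threshold=3.0):
    """Return True when the live window's mean deviates from the
    reference mean by more than z_threshold standard errors."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    sem = ref_std / math.sqrt(len(live))  # standard error of the mean
    z = (statistics.fmean(live) - ref_mean) / sem
    return abs(z) > z_threshold

baseline = [0.0, 0.2, -0.1, 0.1, -0.2, 0.05, -0.05, 0.15, -0.15, 0.0]
stable   = [0.1, -0.1, 0.0, 0.05, -0.05]   # looks like the baseline
shifted  = [1.0, 1.2, 0.9, 1.1, 1.05]      # clearly drifted

print(mean_drift(baseline, stable), mean_drift(baseline, shifted))
# False True
```

When the check fires, the same signal can trigger the retraining pipeline from section 3, closing the loop.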

6. Feature Stores

Feature stores are databases optimized for ML. They allow data scientists and data engineers to reuse and collaborate on datasets prepared for machine learning, called “features”. Feature preparation can be a lot of work, and sharing access to prepared feature datasets across data science teams can significantly speed up time to market while improving the overall quality and consistency of machine learning models. FEAST is one such open-source feature store that describes itself as “the fastest way to operationalize analytic data for model training and online inference.”
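The core idea can be sketched as a lookup table of features keyed by entity ID, retrieved as a vector for training or online inference. The class, entity and feature names below are invented for illustration; Feast implements this pattern with real offline and online stores behind it.

```python
# Toy feature store sketch: features keyed by entity ID, retrieved in a
# stable order as a vector for training or online inference. All names are
# illustrative; Feast backs this pattern with offline/online storage.

class FeatureStore:
    def __init__(self):
        self._table = {}  # entity_id -> {feature_name: value}

    def ingest(self, entity_id, **features):
        """Add or update features for one entity."""
        self._table.setdefault(entity_id, {}).update(features)

    def get_features(self, entity_id, names):
        """Return the requested features, in the requested order."""
        row = self._table[entity_id]
        return [row[name] for name in names]

store = FeatureStore()
store.ingest("user-42", avg_order_value=37.5, orders_last_30d=4)
store.ingest("user-42", days_since_signup=210)  # merged with prior features

print(store.get_features("user-42",
                         ["avg_order_value", "orders_last_30d"]))
# [37.5, 4]
```

Because every team reads the same named features, the training pipeline and the inference service stay consistent by construction.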

By embracing the MLops paradigm for their data lab and approaching AI with the six sustainability measures in mind – repeatability, availability, maintainability, quality, scalability and consistency – organizations and departments can measurably improve data team productivity, ensure the long-term success of AI projects and continue to effectively maintain their competitive advantage.

Rob Gibbon is Product Manager for Data Platform and MLops at Canonical – the publishers of Ubuntu
