Monitoring has a long history in IT, with multiple companies and open source projects delivering robust tools that keep the pulse of your IT infrastructure so your systems stay running strong. 

But how do you monitor AI/ML models in production? 

You might think it’s easy to just plug those models into the same tools you use to monitor web applications, but this one-size-fits-all approach doesn’t work for AI. There are some serious differences when it comes to keeping your models and data healthy in MLOps systems.

To help you understand those differences, we’re running the MLOps Day 2: Monitor, Observe Explain event, sponsored by NVIDIA, at the AI Infrastructure Alliance (AIIA). It’s a half-day summit of terrific talks with 14 speakers from some of the top startups in AI/ML monitoring, security, observability, and explainability.

To give you a preview of the event, let’s talk about some of the major differences between traditional IT monitoring and AI/ML monitoring.

The terms monitoring and observability are often used interchangeably by marketing teams, which creates a lot of needless confusion. While there is a lot of overlap, there are some major differences too. Explainability sometimes gets lumped in as well, but it is best treated as a separate discipline that stands on its own.

At the AIIA, we use the term AI Supervision to describe all the ways to watch MLOps systems from end to end.

But what makes these systems so different? To start with, you often need a completely different backend engine powering AI/ML Supervision. Most traditional IT monitoring tools are geared to tracking the current state of a system. Generally, we only care whether the web server is up or down right now, and we don’t much care about its history. There are a few exceptions. We might look at the history if the server keeps bouncing, to trace whether a nighttime code push or a backup was causing it to go down. But in general, history matters little in web apps.

The exact opposite is true when it comes to models. When we’re tracking the accuracy of inference predictions, we need that entire history. We want to know every decision the model made so we can plot its accuracy, or its declining accuracy, over time.
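To make the point about history concrete, here is a minimal sketch of tracking accuracy over a log of past predictions. Everything in it is an assumption made for illustration: the `PredictionRecord` type, the `rolling_accuracy` helper, and the toy labels are hypothetical, not part of any particular supervision product.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PredictionRecord:
    """One logged inference (hypothetical record layout)."""
    timestamp: int   # any increasing time value
    predicted: str   # what the model said
    actual: str      # what turned out to be true


def rolling_accuracy(history, window):
    """Return one accuracy value per record, over the last `window` records."""
    recent = deque(maxlen=window)  # 1 for a correct prediction, 0 otherwise
    accuracies = []
    for record in history:
        recent.append(1 if record.predicted == record.actual else 0)
        accuracies.append(sum(recent) / len(recent))
    return accuracies


# Toy history: the model is right at first, then starts to miss.
outcomes = [
    ("cat", "cat"), ("dog", "dog"), ("cat", "cat"), ("dog", "dog"),
    ("cat", "dog"), ("dog", "cat"), ("cat", "cat"), ("dog", "cat"),
]
history = [
    PredictionRecord(t, predicted, actual)
    for t, (predicted, actual) in enumerate(outcomes)
]

trend = rolling_accuracy(history, window=4)
print(trend)
```

Without the stored history there would be nothing to compute the trend from, which is why a backend that only keeps the current state is not enough here.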

AI Supervision also focuses not just on the models but on the entire pipeline, including data and data quality. Data is secondary in a traditional web application. If a coder is writing a login script, the coder writes the logic and the system may call out to the data only once, to get the username and password hash. But in AI/ML systems data is primary. The system learns its own logic through training, and that makes data not just important but essential in AI/ML. AI Supervision goes beyond just production monitoring to check the data as it flows into the system, taking corrective actions and sending alerts before a model ever starts to train.

There are three kinds of supervision of AI/ML models and data:

  1. Monitoring
  2. Observability
  3. Explainability

Typically monitoring tries to answer the questions of what and when.  Is a web server up or down?  When did it come up and when did it go down?  Monitoring capabilities track most closely with traditional IT monitoring and include uptime, performance and those kinds of key stats. 

Observability tries to give teams context on how and why. For example, the model went down because the last deployment created instabilities that caused it to crash after ten minutes. It also helps answer questions like: why is the model’s prediction performance degrading?

Explainability is a suite of algorithms that help humans understand why a model made a decision after the fact or what a model is focusing on when it makes decisions.
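One widely used technique of this kind is permutation importance: shuffle one input feature at a time and see how much the model’s accuracy drops. The sketch below is only an illustration under stated assumptions. The `toy_model` is a hypothetical stand-in for a trained model, and the data is made up so that one feature matters and the other does not.

```python
import random


def toy_model(row):
    """Hypothetical stand-in for a trained model: it only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


def accuracy(model, rows, labels):
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def permutation_importance(model, rows, labels, n_features, repeats=20, seed=0):
    """Average drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    baseline = accuracy(model, rows, labels)
    importances = []
    for feature in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with only this one column shuffled.
            shuffled_rows = [
                row[:feature] + [value] + row[feature + 1:]
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(model, shuffled_rows, labels))
        importances.append(sum(drops) / repeats)
    return importances


rows = [[0.9, 3.0], [0.8, 1.0], [0.7, 4.0], [0.6, 2.0],
        [0.4, 3.5], [0.3, 1.5], [0.2, 4.5], [0.1, 2.5]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

Shuffling the feature the model ignores leaves accuracy untouched, while shuffling the one it relies on hurts it. That gap is a rough picture of what the model is focusing on.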

In general, an easy way to understand the differences is that monitoring tracks failures, outages, uptime, and performance; observability looks to understand the system in both a healthy and unhealthy state; and explainability answers questions about specific predictions or inferences or the model’s focus as a whole.

Lastly, there are two areas that AI/ML supervision platforms tend to focus on:

  1. Model supervision
  2. Data quality

When it comes to model supervision, explainability tends to focus almost exclusively on models and their performance after training or once they reach production. Both monitoring and observability, however, need to exist for the ML system as a whole, not just for the model. A bug introduced in the data pipeline may cause a model to fail, and the team needs to be able to trace the problem back to where it started in order to fix it, which means looking at the whole pipeline.

Data quality is a subset of supervision. It focuses on testing and evaluating data as it comes into the system. It looks to identify problems such as missing data, out-of-range values, type mismatches and more.
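As a rough sketch of what such checks can look like, the snippet below validates incoming records against a small schema. The `check_record` function, the `SCHEMA` layout, and the field names are all hypothetical and chosen only for illustration. Real data quality tools cover far more cases.

```python
def check_record(record, schema):
    """Return a list of problems found in one incoming record.

    `schema` maps a field name to (expected_type, min_value, max_value).
    Use None for min or max when a field has no range constraint.
    """
    problems = []
    for field, (expected_type, low, high) in schema.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not isinstance(value, expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif (low is not None and value < low) or (high is not None and value > high):
            problems.append(f"{field}: {value} out of range")
    return problems


# Hypothetical schema for a small tabular dataset.
SCHEMA = {
    "age": (int, 0, 120),
    "income": (float, 0.0, None),
    "country": (str, None, None),
}

incoming = [
    {"age": 34, "income": 52000.0, "country": "DE"},   # clean
    {"age": 250, "income": 41000.0, "country": "US"},  # out-of-range age
    {"age": "41", "income": 38000.0},                  # wrong type, missing field
]

report = [check_record(record, SCHEMA) for record in incoming]
for problems in report:
    print(problems)
```

Running checks like these before training means a bad batch can raise an alert or be quarantined instead of quietly shaping the model.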

You should also be careful not to confuse AI/ML model and data monitoring with a traditional monitoring engine that uses AI on its own backend to augment itself. That’s often called AI Ops and it’s not what we are talking about here. Many application providers use ML to make predictive recommendations to their customers about potential failures of IT systems, using techniques like anomaly detection to alert systems administrators to potential problems before they happen. Anomaly detection signals when data is suddenly outside its normal range, such as when electrical usage suddenly shoots up or down after staying steady for weeks or months.
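For readers who have not seen anomaly detection in practice, here is one very simple way to flag a sudden jump, assuming a z-score against a trailing window of readings. The `find_anomalies` function, its default `window` and `threshold` values, and the usage numbers are hypothetical. Production systems typically use more robust methods.

```python
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indexes of readings that sit more than `threshold` standard
    deviations from the mean of the `window` readings just before them."""
    anomalies = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all counts as a deviation.
            is_anomaly = readings[i] != mu
        else:
            is_anomaly = abs(readings[i] - mu) / sigma > threshold
        if is_anomaly:
            anomalies.append(i)
    return anomalies


# Made-up electrical usage: steady around 10 kWh, with one sudden spike.
usage_kwh = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 25.0, 10.0, 9.9]
spikes = find_anomalies(usage_kwh)
print(spikes)
```

Only the reading at index 7 is flagged. Note that the spike then sits inside the baseline for the next few readings, which is one reason simple z-scores are usually a starting point and not the whole solution.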

Hopefully that gives you a clear and concise picture of the world of AI/ML monitoring and how it’s different from traditional IT monitoring.  These new AI Supervision tools take many pages from the classic tools in the space but they bring their own twists that take into account the many unique wrinkles of industrialized AI production.  

Putting a model in production is only the first step. After that, the real challenge begins.

Come check out the MLOps Day 2: Monitor, Observe Explain event to learn how to establish top data quality, ensure those data and model pipelines are flowing smoothly, and keep those models accurate and responding swiftly.

You can also read more about MLOps and the AI development lifecycle on the NVIDIA blog.