● During a UI migration, AlerTiger detected a feature distribution shift across nine model
aspects, which had caused missing features and a performance drop. After retraining with the
new schema, we saw improved business metrics.
● During the launch of a new model, AlerTiger detected features with uniform values across all
instances, which caused inconsistent performance. Once the issue was resolved, the deployed
model delivered the expected business gains.
AlerTiger: Deep Learning for AI Model Health Monitoring at LinkedIn
Zhentao Xu, Ruoying Wang, Girish Balaji, Manas Bundele, Xiaofei Liu, Leo Liu, Tie Wang
{zhexu, ruowang, gbalaji, mbundele, xiaoliu, leoliu, tiewang}@linkedin.com
The health of AI models is crucial for the
business success of data-driven
companies. Unique challenges for AI
model health monitoring and anomaly
detection include:
● lack of unified health definition
● anomaly label sparsity
● difficulty in generalization
● lack of explainability
● Constructed an end-to-end data-driven
solution for monitoring AI models.
● Combined quantile loss with RMSE to
estimate a non-parametric distribution
● Demonstrated broad generalizability
and implemented this solution at scale,
delivering high performance.
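A minimal sketch of how a quantile (pinball) loss can be combined with RMSE for forecasting. The quantile levels, the `weight` parameter, and all function names are illustrative assumptions, not AlerTiger's actual implementation (the poster also refers to this combination as quantile loss + MSE).

```python
import math

def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for one quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is penalized by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error of the point forecast."""
    squared = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def mixed_loss(y_true, point_pred, quantile_preds, weight=1.0):
    """Sum of pinball losses over all quantile heads plus weighted RMSE.

    quantile_preds maps a quantile level (assumed, e.g. 0.05 and 0.95)
    to the list of predictions for that level.
    """
    q_loss = sum(pinball_loss(y_true, preds, q)
                 for q, preds in quantile_preds.items())
    return q_loss + weight * rmse(y_true, point_pred)

# Hypothetical data: lower band, point forecast, upper band.
y = [10.0, 12.0, 11.0]
loss = mixed_loss(y, [10.5, 11.5, 11.0],
                  {0.05: [9.0, 10.0, 9.5], 0.95: [12.0, 13.5, 12.5]})
print(loss)
```

Predicting several quantiles this way yields a band around the forecast without assuming a parametric (for example Gaussian) error distribution.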
● STEP-1 Health Statistics Generation: The serving pipeline emits data via Kafka; an offline Spark
job calculates statistics for input features, output scores, and auxiliary metadata.
● STEP-4 Alerting and Visualization: Model health data is compiled into a comprehensive
report and emailed to AI model owners.
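A minimal sketch of the kind of per-feature statistics STEP-1 might produce. It uses plain Python rather than Spark, and the statistic names (`missing_rate`, `distinct`, and so on) are assumptions chosen for illustration, not the system's actual schema.

```python
from statistics import mean, pstdev

def feature_health_stats(values):
    """Summarize one feature column; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(values)
    return {
        "count": n,
        "missing_rate": (n - len(present)) / n if n else 0.0,
        "mean": mean(present) if present else None,
        "std": pstdev(present) if present else None,
        # A single distinct value means the feature is uniform.
        "distinct": len(set(present)),
    }

# Hypothetical feature column with one missing value.
stats = feature_health_stats([1.0, 1.0, None, 1.0])
print(stats)
```

Tracked as daily time series, a jump in `missing_rate` or a drop of `distinct` to 1 would correspond to the missing-feature and uniform-value issues described in the production case study.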
MLOps Model Health Dataset
METHODOLOGY
CONTRIBUTION
INTRODUCTION
PRODUCTION CASE STUDY
EVALUATION
LINK
Algorithm   Precision   Recall   F1 Score   Time
AlerTiger   0.47        1.00     0.64       0 h 35 mins
ARIMA       0.45        0.94     0.61       1 h 28 mins
DeepAR      0.44        0.88     0.59       8 h 49 mins
Prophet     0.44        1.00     0.61       14 h 0 mins
SARIMAX     0.43        0.94     0.59       1 h 30 mins
AR          0.43        0.94     0.59       1 h 30 mins
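The F1 scores in the table follow from the precision and recall columns as the harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them from the reported values; the function name `f1_score` is just a local helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the evaluation table.
reported = {
    "AlerTiger": (0.47, 1.00, 0.64),
    "ARIMA": (0.45, 0.94, 0.61),
    "DeepAR": (0.44, 0.88, 0.59),
    "Prophet": (0.44, 1.00, 0.61),
    "SARIMAX": (0.43, 0.94, 0.59),
    "AR": (0.43, 0.94, 0.59),
}

recomputed = {name: round(f1_score(p, r), 2)
              for name, (p, r, _) in reported.items()}
for name, (_, _, f1) in reported.items():
    print(f"{name}: reported {f1:.2f}, recomputed {recomputed[name]:.2f}")
```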
Effect of Time Series and Anomaly on Performance
We proposed AlerTiger, a deep-learning-
based MLOps model monitoring system:
● defines three categories of health
statistics and alerts on deviation.
● employs a two-stage training process
for sparse labels.
● trains one model with various patterns,
reducing onboarding costs.
● provides a holistic report with cross-
dimension interaction.
AlerTiger has been used on most of
LinkedIn's AI models for over a year. It
has identified and helped resolve issues,
which led to improved business metrics.
SOLUTION
Irregularity score for better
boundary prediction
● STEP-2 Two-stage Anomaly Detection
(Forecasting + Classification)
● STEP-3 Post Processing
(Filtering + Grouping)
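The poster does not spell out the filtering and grouping rules of STEP-3. The sketch below is one plausible reading under stated assumptions: alerts below a score threshold are dropped and the remainder are grouped per model, so that each owner receives a single report. The alert fields (`model`, `feature`, `score`) and the `min_score` threshold are hypothetical.

```python
from collections import defaultdict

def post_process(alerts, min_score=0.5):
    """Drop low-score alerts, then group the rest by model name."""
    grouped = defaultdict(list)
    for alert in alerts:
        if alert["score"] >= min_score:  # filtering
            grouped[alert["model"]].append(alert)  # grouping
    return dict(grouped)

# Hypothetical classifier outputs for two models.
alerts = [
    {"model": "feed_ranker", "feature": "f1", "score": 0.9},
    {"model": "feed_ranker", "feature": "f2", "score": 0.2},
    {"model": "job_recs", "feature": "f7", "score": 0.7},
]
report = post_process(alerts)
print(report)
```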
Mixed quantile loss + MSE
loss for forecasting
Cross-entropy loss for
classification