AlerTiger: Deep Learning for AI Model Health Monitoring at LinkedIn
Zhentao Xu, Ruoying Wang, Girish Balaji, Manas Bundele, Xiaofei Liu, Leo Liu, Tie Wang
{zhexu, ruowang, gbalaji, mbundele, xiaoliu, leoliu, tiewang}@linkedin.com

INTRODUCTION
The health of AI models is crucial for the business success of data-driven companies. AI model health monitoring and anomaly detection face several unique challenges:
● lack of a unified definition of model health
● sparse anomaly labels
● difficulty in generalization
● lack of explainability

We proposed AlerTiger, a deep-learning-based MLOps model monitoring system that:
● defines three categories of health statistics and alerts on deviations from them
● employs a two-stage training process to cope with sparse labels
● trains one model across varied patterns, reducing onboarding costs
● provides a holistic report that captures cross-dimension interactions

CONTRIBUTION
● Constructed an end-to-end, data-driven solution for monitoring AI models.
● Combined quantile loss with RMSE to estimate a non-parametric distribution.
● Demonstrated broad generalizability and deployed the solution at scale with high performance.

METHODOLOGY
● STEP-1 Health Statistics Generation: the serving pipeline emits data via Kafka, and an offline Spark job calculates statistics for input features, output scores, and auxiliary metadata.
● STEP-2 Two-stage Anomaly Detection (Forecasting + Classification)
  ○ Forecasting: mixed quantile loss + MSE loss
  ○ Classification: cross-entropy loss
  ○ Irregularity score for better boundary prediction
● STEP-3 Post Processing (Filtering + Grouping)
● STEP-4 Alerting and Visualization: model health data is compiled into a comprehensive report and emailed to AI model owners.

EVALUATION
MLOps Model Health Dataset

Algorithm   Precision   Recall   F1 Score   Time
AlerTiger   0.47        1.00     0.64       0 h 35 min
ARIMA       0.45        0.94     0.61       1 h 28 min
DeepAR      0.44        0.88     0.59       8 h 49 min
Prophet     0.44        1.00     0.61       14 h 0 min
SARIMAX     0.43        0.94     0.59       1 h 30 min
AR          0.43        0.94     0.59       1 h 30 min

[Figure: Effect of Time Series and Anomaly on Performance]

PRODUCTION CASE STUDY
AlerTiger has been used on most of LinkedIn's AI models for over a year. It has identified issues and helped resolve them, which led to improved business metrics.
● During a UI migration, AlerTiger detected a feature distribution shift across nine model aspects, along with missing features and a performance drop. After the model was retrained with the new schema, business metrics improved.
● When a new model was launched, AlerTiger detected features that had the same value across all instances, which caused inconsistent performance. Once the issue was resolved, the model was deployed successfully and delivered the expected business gains.
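STEP-1 reduces raw serving data to per-window health statistics. The sketch below shows the idea for numeric columns in plain Python. It stands in for the offline Spark job that consumes Kafka data in production. The particular statistics (null rate, mean, standard deviation, distinct count) and the function names `feature_health_stats` and `window_health_stats` are illustrative assumptions, not AlerTiger's actual statistic set.

```python
import math
from typing import Dict, Optional, Sequence


def feature_health_stats(values: Sequence[Optional[float]]) -> dict:
    """Summarize one numeric column (feature, score, or metadata) for one window.

    Hypothetical statistic set chosen for illustration only.
    """
    total = len(values)
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present) if present else math.nan
    # Population variance over the non-null values.
    var = sum((v - mean) ** 2 for v in present) / len(present) if present else math.nan
    return {
        "count": total,
        "null_rate": (total - len(present)) / total if total else math.nan,
        "mean": mean,
        "std": math.sqrt(var),  # math.sqrt(nan) is nan, so empty columns stay nan
        "distinct": len(set(present)),
    }


def window_health_stats(columns: Dict[str, Sequence[Optional[float]]]) -> Dict[str, dict]:
    """Apply feature_health_stats to every column emitted in one time window."""
    return {name: feature_health_stats(vals) for name, vals in columns.items()}
```

Tracking one such dictionary per window turns every statistic into a time series. This is what the forecasting stage of STEP-2 then models.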
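The forecasting stage of STEP-2 trains with a mix of quantile loss and a squared-error loss. The sketch below is a minimal version under stated assumptions. It assumes the forecaster emits one prediction per quantile level plus a point forecast, and that the two terms are combined with a hypothetical `mse_weight`. AlerTiger's actual quantile levels and weighting are not given here, and this sketch uses the MSE form. The quantile (pinball) loss is minimized at the q-th quantile of the target. Each quantile head therefore learns a different point of the distribution without assuming a parametric form.

```python
from typing import Dict, Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = y - p
        # Under-prediction (r > 0) costs q * r; over-prediction costs (1 - q) * (-r).
        total += max(q * r, (q - 1.0) * r)
    return total / len(y_true)


def mse_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error of the point forecast."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mixed_forecast_loss(
    y_true: Sequence[float],
    quantile_preds: Dict[float, Sequence[float]],
    point_pred: Sequence[float],
    mse_weight: float = 1.0,  # hypothetical weighting between the two terms
) -> float:
    """Average pinball loss over all quantile heads plus a weighted MSE term."""
    q_term = sum(
        pinball_loss(y_true, preds, q) for q, preds in quantile_preds.items()
    ) / len(quantile_preds)
    return q_term + mse_weight * mse_loss(y_true, point_pred)
```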
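The poster lists an irregularity score for better boundary prediction but does not define it. The sketch below is purely an illustrative assumption. It scores how far an observed health statistic falls outside a forecast band, for example the band between a low and a high predicted quantile. The distance is normalized by the band's width, so a downstream classifier sees a scale-free signal instead of a raw residual. The function name and formula are hypothetical.

```python
def irregularity_score(actual: float, lower: float, upper: float, eps: float = 1e-9) -> float:
    """Hypothetical irregularity score: distance of `actual` outside the
    forecast band [lower, upper], measured in band widths (0.0 inside the band)."""
    width = max(upper - lower, eps)  # guard against a zero-width band
    if actual < lower:
        return (lower - actual) / width
    if actual > upper:
        return (actual - upper) / width
    return 0.0
```

Normalizing by the band width makes a deviation of the same absolute size score higher when the forecaster is confident (narrow band) than when it is uncertain (wide band).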
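The classification stage of STEP-2 is trained with cross-entropy loss on the sparse anomaly labels. It uses signals derived from the already-trained forecaster. As a stand-in for AlerTiger's deep classifier, the sketch below fits a one-feature logistic regression by gradient descent on binary cross-entropy. The input feature (a per-point deviation score from stage one), the learning rate, and the epoch count are hypothetical. The point is only the two-stage shape: stage two learns from a handful of labels on top of signals that stage one learned without labels.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean cross-entropy between 0/1 anomaly labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def train_logistic(
    xs: Sequence[float], labels: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[float, float]:
    """Fit p = sigmoid(w * x + b) by full-batch gradient descent on binary cross-entropy.

    Stand-in for the deep classifier; x is a hypothetical stage-one deviation score.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * x + b) - y  # derivative of BCE with respect to the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b
```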
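STEP-3 post-processes raw detections by filtering and grouping. The sketch below is a minimal illustration under stated assumptions. It uses a hypothetical `Alert` record and drops alerts whose anomaly probability falls below a hypothetical `threshold`. It then groups the remaining alerts by model, so that an owner receives one consolidated finding per model instead of one alert per statistic. AlerTiger's actual filtering rules and grouping keys may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    model: str    # hypothetical model identifier
    metric: str   # the feature, score, or metadata statistic that deviated
    score: float  # anomaly probability from the classification stage


def filter_alerts(alerts: List[Alert], threshold: float = 0.5) -> List[Alert]:
    """Keep only alerts at or above a hypothetical probability threshold."""
    return [a for a in alerts if a.score >= threshold]


def group_alerts(alerts: List[Alert]) -> Dict[str, List[str]]:
    """Group surviving alerts by model, listing the deviating metrics in sorted order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for a in alerts:
        grouped[a.model].append(a.metric)
    return {model: sorted(metrics) for model, metrics in grouped.items()}
```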
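STEP-4 compiles model health data into a report for model owners. The sketch below renders grouped findings (model name to list of deviating metrics) as a plain-text report body. The layout, the `build_report` name, and its `title` parameter are hypothetical. Visualization and email delivery are left out.

```python
from typing import Dict, List


def build_report(grouped: Dict[str, List[str]], title: str = "Model health report") -> str:
    """Render grouped anomalies as a plain-text report body (hypothetical layout)."""
    lines = [title]
    for model in sorted(grouped):  # deterministic order for readers and tests
        metrics = grouped[model]
        lines.append(f"{model}: {len(metrics)} anomalous statistic(s): {', '.join(metrics)}")
    return "\n".join(lines)
```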
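The F1 score in the evaluation table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The snippet below recomputes F1 from the reported precision and recall of each algorithm as a consistency check. The numbers are copied from the MLOps Model Health Dataset table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) from the MLOps Model Health Dataset table.
REPORTED = {
    "AlerTiger": (0.47, 1.00, 0.64),
    "ARIMA": (0.45, 0.94, 0.61),
    "DeepAR": (0.44, 0.88, 0.59),
    "Prophet": (0.44, 1.00, 0.61),
    "SARIMAX": (0.43, 0.94, 0.59),
    "AR": (0.43, 0.94, 0.59),
}

for name, (p, r, reported_f1) in REPORTED.items():
    recomputed = round(f1_score(p, r), 2)
    print(f"{name:10s} recomputed F1 = {recomputed:.2f} (reported {reported_f1:.2f})")
```

The recomputed values match the F1 column to two decimals. AlerTiger's lead comes from perfect recall combined with the highest precision.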
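In the second case study, some features had the same value for every instance. AlerTiger surfaced this through its learned detection pipeline. The rule-based check below is only a simple illustration of that symptom, not AlerTiger's method. It flags any feature whose non-null values are all identical, or that is entirely missing. The function name `constant_features` is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def constant_features(columns: Dict[str, Sequence[Optional[float]]]) -> List[str]:
    """Return the sorted names of features whose non-null values are all identical
    (a column with no non-null values is flagged as well)."""
    flagged = []
    for name, values in columns.items():
        distinct = {v for v in values if v is not None}
        if len(distinct) <= 1:
            flagged.append(name)
    return sorted(flagged)
```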