of › Recommendation › Internal library development › Internal application development Personal › Living with Java sparrows Junki Ishikawa Machine Learning Development Team
OPERATE MONITOR MLOps at ML Dept. Common pipeline ? › Prototyping tools › Internal experiment manager › Workflow Engines › CI/CD tools › Internal Libraries › Shared feature vectors
operations Increasing monitoring costs › Each project has different monitoring methods and alerts. › Sometimes cheap, sometimes poor. › As the number of ML products increases, the cost of monitoring has steadily grown. Outages due to lack of monitoring › There are many causes of outages (e.g. missing data, changes in model outputs, etc.). › It is nearly impossible to manually monitor every product.
Effective metrics depend on the task, data, model and so on… Data drift / Concept drift › Statistics of input data › Statistics of target variables Model degradation / replacement › Statistics of predictions › Ground-truth evaluation › Training / Validation metrics The Lupus library helps aggregate these metrics.
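To make the metric groups above concrete, here is a minimal sketch of collecting statistics of inputs, targets and predictions, plus one ground-truth evaluation metric. It is not the Lupus library API: `summarize`, `collect_metrics` and the `group.name` key scheme are hypothetical names chosen for illustration, and mean absolute error is just one example of an evaluation metric.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Return simple summary statistics for one numeric series."""
    return {
        "count": float(len(values)),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "min": float(min(values)),
        "max": float(max(values)),
    }


def collect_metrics(
    inputs: Sequence[float],
    targets: Sequence[float],
    predictions: Sequence[float],
) -> Dict[str, float]:
    """Aggregate per-group statistics into one flat metrics dict.

    Keys look like "input.mean" or "prediction.stdev" (hypothetical scheme).
    """
    metrics: Dict[str, float] = {}
    groups = (("input", inputs), ("target", targets), ("prediction", predictions))
    for group, values in groups:
        for name, value in summarize(values).items():
            metrics[f"{group}.{name}"] = value
    # Ground-truth evaluation: mean absolute error between predictions and targets.
    metrics["evaluation.mae"] = statistics.fmean(
        abs(p - t) for p, t in zip(predictions, targets)
    )
    return metrics
```

Computing such a dict once per day and per product gives time series that can be compared across days to spot data drift or model degradation.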
Anomalies in the context of MLOps have more complex conditions than in DevOps. Basic rules › A metric exceeds a threshold. › A metric deviates significantly from the average of recent days. Complex rules › A metric deviates significantly from its periodic pattern. › The trend of a metric changes.
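The rules above can be sketched in a few lines. This is an illustrative sketch, not how Lupus implements its detection: the function names, the `n_sigmas` parameter and the choice of "more than N standard deviations from the mean" as the meaning of "deviates significantly" are all assumptions. The trend-change rule is left out because it needs a proper change-point method.

```python
import statistics
from typing import Sequence


def exceeds_threshold(value: float, threshold: float) -> bool:
    """Basic rule: the metric exceeds a fixed threshold."""
    return value > threshold


def _is_outlier(reference: Sequence[float], value: float, n_sigmas: float) -> bool:
    """True if value is more than n_sigmas standard deviations from the mean."""
    mean = statistics.fmean(reference)
    stdev = statistics.pstdev(reference)
    if stdev == 0.0:
        # A constant reference: any different value counts as a deviation.
        return value != mean
    return abs(value - mean) > n_sigmas * stdev


def deviates_from_recent(
    history: Sequence[float], value: float, n_sigmas: float = 3.0
) -> bool:
    """Basic rule: the metric deviates from the average of recent days."""
    return _is_outlier(history, value, n_sigmas)


def deviates_from_period(
    history: Sequence[float], value: float, period: int = 7, n_sigmas: float = 3.0
) -> bool:
    """Complex rule: compare only with past values at the same phase.

    With period=7 and daily data, a Monday is compared with earlier Mondays.
    """
    same_phase = list(history[-period::-period])
    if len(same_phase) < 2:
        return False  # not enough full periods to judge
    return _is_outlier(same_phase, value, n_sigmas)
```

For a metric with a strong weekly cycle, the periodic rule can flag a value that looks normal against the overall average but is unusual for that day of the week.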
Web UI for metrics visualization › Metrics charts with anomaly information. › An explorer for easily finding a desired chart. › User-customizable dashboards for daily observation. Why self-made? › … specific use cases. Major OSS do not fit our needs despite their complexity. › Lupus has niche requirements, such as showing anomalies and narrowing down by metric group. › LINE takes user privacy seriously, so Lupus has strict and complicated authentication requirements.
Easy monitoring › … than before. Avoiding outages › Lupus helps us find outages by detecting issues that we had not noticed before. Discover insights › We found changes in the accuracy of our products that we had not known about. › This motivated us to improve the products. Reliable monitoring code › We moved from self-made notebooks to a reliable, reviewed codebase. Fast access, shareable UI › We can access collected metrics very quickly with the Lupus Web UI. › We can also easily share them with project members.
Monitoring on MLOps › MLOps requires additional monitoring metrics related to data and ML models. Our challenges in MLOps monitoring › … ML products in a short development time. › Along with this, the cost of monitoring has kept growing. Our solution › We have developed our own monitoring system for MLOps, called Lupus. › Lupus provides 3 components that help us collect metrics, alert on them, and visualize them efficiently.