
Lupus - A Monitoring System for Accelerating MLOps



LINE DEVDAY 2021

November 10, 2021

Transcript

  1. None
  2. Target audience › People who are managing ML products. › People who belong to an ML team that is expected to keep growing. › Anyone who is interested in MLOps.
  3. Agenda › What’s MLOps monitoring? › MLOps at ML Dept.

    › Our challenges in MLOps monitoring › Lupus: our monitoring infrastructure
  4. Self introduction Junki Ishikawa, Machine Learning Development Team. Joined in April 2021 as a new graduate. In charge of › Recommendation › Internal library development › Internal application development. Personal › Living with Java sparrows.
  5. Agenda › What’s MLOps monitoring? › MLOps at ML Dept.

    › Our challenges in MLOps monitoring › Lupus: our monitoring infrastructure
  6. What’s MLOps? ML + DevOps OPS DEV ML DESIGN ANALYZE

    EVALUATE CODE PLAN BUILD RELEASE OPERATE MONITOR
  12. What’s MLOps monitoring? What to monitor DevOps › Resource usage

    (CPU, Memory, Storage, …) › Disk I/O › Network Traffic › Heartbeats › Business KPIs › DevOps KPIs (MTTR, …) › etc…
  13. What’s MLOps monitoring? What to monitor DevOps MLOps › Data

    statistics › Input data changes (data drift) › Input - target pattern changes (concept drift) › Model performance › Prediction accuracy › Diversity of recommendations › Fairness + › Resource usage (CPU, Memory, Storage, …) › Disk I/O › Network Traffic › Heartbeats › Business KPIs › DevOps KPIs (MTTR, …) › etc…
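To make the data-drift item above concrete, here is a minimal sketch of one common drift statistic, the Population Stability Index (PSI), computed with NumPy between a reference batch and a current batch of a numeric feature. It only illustrates the kind of data statistic an MLOps monitor tracks; it is not how Lupus computes drift.

    import numpy as np

    def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
        """Population Stability Index between two samples of one numeric feature."""
        # Bin edges come from the reference distribution (deciles by default).
        edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
        # Clip the current batch into the reference range so every value falls in a bin.
        current = np.clip(current, edges[0], edges[-1])
        ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
        cur_frac = np.histogram(current, bins=edges)[0] / len(current)
        # A small floor avoids log(0) when a bin is empty.
        ref_frac = np.clip(ref_frac, 1e-6, None)
        cur_frac = np.clip(cur_frac, 1e-6, None)
        return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

    # Example: a PSI above roughly 0.2 is commonly treated as meaningful drift.
    rng = np.random.default_rng(0)
    print(psi(rng.normal(0, 1, 10_000), rng.normal(0.3, 1.2, 10_000)))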
  15. What’s MLOps monitoring? Other differences DevOps MLOps › Data statistics

    › Input data changes (data drift) › Input - target pattern changes (concept drift) › Model performance › Prediction accuracy › Diversity of recommendations › Fairness › Resource usage (CPU, Memory, Storage, …) › Disk I/O › Network Traffic › Heartbeats › Business KPIs › DevOps KPIs (MTTR, …) › etc…
  16. What’s MLOps monitoring? Other differences DevOps MLOps › Data statistics

    › Input data changes (data drift) › Input - target pattern changes (concept drift) › Model performance › Prediction accuracy › Diversity of recommendations › Fairness [Automation] › Resource usage (CPU, Memory, Storage, …) › Disk I/O › Network Traffic › Heartbeats › Business KPIs › DevOps KPIs (MTTR, …) › etc…
  17. What’s MLOps monitoring? Other differences DevOps MLOps › Data statistics

    › Input data changes (data drift) › Input - target pattern changes (concept drift) › Model performance › Prediction accuracy › Diversity of recommendations › Fairness [Automation / Interval] › Resource usage (CPU, Memory, Storage, …) › Disk I/O › Network Traffic › Heartbeats › Business KPIs › DevOps KPIs (MTTR, …) › etc…
  18. What’s MLOps monitoring? Other differences DevOps MLOps › Data statistics

    › Input data changes (data drift) › Input - target pattern changes (concept drift) › Model performance › Prediction accuracy › Diversity of recommendations › Fairness [Automation / Interval / Alert logic] › Resource usage (CPU, Memory, Storage, …) › Disk I/O › Network Traffic › Heartbeats › Business KPIs › DevOps KPIs (MTTR, …) › etc…
  19. Agenda › What’s MLOps monitoring? › MLOps at ML Dept.

    › Our challenges in MLOps monitoring › Lupus: our monitoring infrastructure
  20. MLOps at ML Dept. Products

  24. MLOps at ML Dept. Scale › Organizations: 20+ › Tables from external organizations: 500+ › ML Products: 100+
  25. MLOps at ML Dept. Scale [Chart: number of logics to select contents on SmartCH, May 2019 to September 2021, growing steadily toward roughly 160 (Logic 1, Logic 2, …, Logic n)]
  26. MLOps at ML Dept. Facilities

  27. MLOps at ML Dept. Facilities › Kubernetes › IU (LINE’s

    Hadoop cluster) Infrastructure
  28. MLOps at ML Dept. Facilities › Kubernetes › IU (LINE’s

    Hadoop cluster) Infrastructure › Jutopia (LINE’s Jupyter server) Prototyping environment
  29. MLOps at ML Dept. Facilities › Kubernetes › IU (LINE’s

    Hadoop cluster) Infrastructure › Argo Workflows › Azkaban › Airflow Workflow engines › Jutopia (LINE’s Jupyter server) Prototyping environment
  30. MLOps at ML Dept. Facilities › Kubernetes › IU (LINE’s

    Hadoop cluster) Infrastructure › Argo Workflows › Azkaban › Airflow Workflow engines › ArgoCD › Drone CI CI / CD tools › Jutopia (LINE’s Jupyter server) Prototyping environment
  31. MLOps at ML Dept. Facilities › User sparse/dense features ›

    Item metadata features Shared feature vectors › Kubernetes › IU (LINE’s Hadoop cluster) Infrastructure › Argo Workflows › Azkaban › Airflow Workflow engines › ArgoCD › Drone CI CI / CD tools › Jutopia (LINE’s Jupyter server) Prototyping environment
  32. MLOps at ML Dept. Facilities › Distributed training & inference

    › Model collections › Recommendation automation › I/O manager › etc… Internal libraries › User sparse/dense features › Item metadata features Shared feature vectors › Kubernetes › IU (LINE’s Hadoop cluster) Infrastructure › Argo Workflows › Azkaban › Airflow Workflow engines › ArgoCD › Drone CI CI / CD tools › Jutopia (LINE’s Jupyter server) Prototyping environment
  33. MLOps at ML Dept. Facilities › Distributed training & inference

    › Model collections › Recommendation automation › I/O manager › etc… Internal libraries › User sparse/dense features › Item metadata features Shared feature vectors › A/B test manager › A/B test monitoring system › Recommendation demo generator Internal experiment manager › Kubernetes › IU (LINE’s Hadoop cluster) Infrastructure › Argo Workflows › Azkaban › Airflow Workflow engines › ArgoCD › Drone CI CI / CD tools › Jutopia (LINE’s Jupyter server) Prototyping environment
  34. OPS DEV ML DESIGN ANALYZE EVALUATE CODE PLAN BUILD RELEASE

    OPERATE MONITOR MLOps at ML Dept. Common pipeline › Prototyping tools › Internal experiment manager › Workflow Engines › CI/CD tools › Internal Libraries › Shared feature vectors
  35. OPS DEV ML DESIGN ANALYZE EVALUATE CODE PLAN BUILD RELEASE

    OPERATE MONITOR MLOps at ML Dept. Common pipeline ? › Prototyping tools › Internal experiment manager › Workflow Engines › CI/CD tools › Internal Libraries › Shared feature vectors
  36. Agenda › What’s MLOps monitoring? › MLOps at ML Dept.

    › Our challenges in MLOps monitoring › Lupus: our monitoring infrastructure
  37. Our challenges in MLOps monitoring Monitoring issues › As the

    number of ML products increases, the cost of monitoring has steadily grown. Increasing monitoring costs
  38. Our challenges in MLOps monitoring Monitoring issues Disjointed, project-dependent monitoring

    operations Increasing monitoring costs › Each project has different monitoring methods and alerts. › Some are built cheaply, others are simply poor. › As the number of ML products increases, the cost of monitoring has steadily grown.
  39. Our challenges in MLOps monitoring Monitoring issues Disjointed, project-dependent monitoring

    operations Outages due to lack of monitoring Increasing monitoring costs › Each project has different monitoring methods and alerts. › Some are built cheaply, others are simply poor. › As the number of ML products increases, the cost of monitoring has steadily grown. › There are many causes of outages (e.g. missing data, changes in model outputs, etc.). › It is nearly impossible to manually monitor every product.
  40. Our challenges in MLOps monitoring Actual outages we experienced before › Manual monitoring: caused by handcrafted monitoring code in Jupyter notebooks; impact: low-quality metrics, poor alerting, unreviewed code. › Data missing: caused by cluster outages and delays; impact: low-quality or empty predictions. › Model update: caused by a model architecture update and smoothing; impact: a significant drift in the prediction distribution, found only two weeks later.
  43. Our challenges in MLOps monitoring What we need

  44. Our challenges in MLOps monitoring What we need Collection

  45. Our challenges in MLOps monitoring What we need Collection Metrics

    aggregation tools Reliable metrics store
  46. Our challenges in MLOps monitoring What we need Detection Collection

    Metrics aggregation tools Reliable metrics store
  47. Our challenges in MLOps monitoring What we need Detection Collection

    Metrics aggregation tools Reliable metrics store Flexible anomaly detector Alerting system
  48. Our challenges in MLOps monitoring What we need Detection Visualization

    Collection Metrics aggregation tools Reliable metrics store Flexible anomaly detector Alerting system
  49. Our challenges in MLOps monitoring What we need Detection Visualization

    Collection User-friendly GUI app Metrics aggregation tools Reliable metrics store Flexible anomaly detector Alerting system
  50. Agenda › What’s MLOps monitoring? › MLOps at ML Dept.

    › Our challenges in MLOps monitoring › Lupus: our monitoring infrastructure
  51. Lupus Common monitoring infrastructure for MLOps

  52. Lupus Concept › Easy to collect (for engineers) › Easy to detect (for operators) › Easy to visualize (for project members)
  53. Lupus Components › Lupus server: metric management and anomaly detection APIs › Lupus SPA: web app for metrics and anomaly visualization › Lupus library: metrics aggregation tools and API client
  54. Lupus Ecosystem


  61. Metrics collection Lupus

  62. Case: Metrics collection Which kind of metrics should we monitor?

    Effective metrics depend on the task, data, model and so on. Data drift / concept drift › Statistics of input data › Statistics of target variables. Model degradation / replacement › Statistics of predictions › Ground-truth evaluation › Training / validation metrics. The Lupus library helps aggregate these metrics.
  63. Case: Metrics collection Library support Example input table (Region | Age | Device | Rating | Interests):
      JP | 23 | iOS     | [5.0, 4.0] | [a, b, c]
      JP | 42 | Mac     | [2.0]      | [e, g]
      JP | 64 | Android | [4.5, 3.5] | [x, y, a]
      US | 27 | iOS     | [4.0, 4.0] | [v, t, v]
      US | 38 | Android | [3.0]      | [y]
      …  | …  | …       | …          | …
    Supported aggregations, grouped per Region: Avg@k, Sum, 95 percentile, Min, count per entity, unique entity count@k.
  64. Case: Metrics collection Library support Without the library, the aggregation is hand-written PySpark like this:
    import pyspark.sql.functions as F
    from pyspark.sql import Row

    stats = []

    # age
    age_stats = df.groupby("region").agg(
        F.avg("age").alias("avg"), F.max("age").alias("max"), F.min("age").alias("min"))
    for row in age_stats.toLocalIterator():
        stats.append({"col": "age", "region": row.region, "metric": "avg", "value": row["avg"]})
        stats.append({"col": "age", "region": row.region, "metric": "max", "value": row["max"]})
        stats.append({"col": "age", "region": row.region, "metric": "min", "value": row["min"]})

    # device
    device_counts = df.groupby("region", "device").agg(F.count("device").alias("count"))
    device_unique = device_counts.groupby("region").agg(F.count("count").alias("unique"))
    for row in device_counts.toLocalIterator():
        stats.append({"col": "device", "region": row.region, "metric": "count", "value": row["count"], "device": row.device})
    for row in device_unique.toLocalIterator():
        stats.append({"col": "device", "region": row.region, "metric": "unique", "value": row["unique"]})

    # ratings
    def truncate(df, col, k):
        def _(row):
            dic = row.asDict()
            dic[col] = dic[col][:k]
            return Row(**dic)
        return df.rdd.map(_).toDF()

    ratings_stats_all = (
        df.select("region", F.explode("ratings").alias("ratings"))
        .groupby("region")
        .agg(F.avg("ratings").alias("avg"), F.max("ratings").alias("max"), F.min("ratings").alias("min"))
    )
    for row in ratings_stats_all.toLocalIterator():
        stats.append({"col": "ratings", "region": row.region, "metric": "avg", "value": row["avg"]})
        stats.append({"col": "ratings", "region": row.region, "metric": "max", "value": row["max"]})
        stats.append({"col": "ratings", "region": row.region, "metric": "min", "value": row["min"]})

    ratings_stats_top5 = (
        truncate(df, "ratings", 5)
        .select("region", F.explode("ratings").alias("ratings"))
        .groupby("region")
        .agg(F.avg("ratings").alias("avg"), F.max("ratings").alias("max"), F.min("ratings").alias("min"))
    )
    for row in ratings_stats_top5.toLocalIterator():
        stats.append({"col": "ratings", "region": row.region, "metric": "avg@5", "value": row["avg"]})
        stats.append({"col": "ratings", "region": row.region, "metric": "max@5", "value": row["max"]})
        stats.append({"col": "ratings", "region": row.region, "metric": "min@5", "value": row["min"]})

    # interests
    interests_count_all = (
        df.select("region", F.explode("interests").alias("interests"))
        .groupby("region", "interests").agg(F.count("interests").alias("count"))
    )
    interests_unique_all = interests_count_all.groupby("region").agg(F.count("count").alias("unique"))
    for row in interests_count_all.toLocalIterator():
        stats.append({"col": "interests", "region": row.region, "metric": "count", "value": row["count"], "interests": row.interests})
    for row in interests_unique_all.toLocalIterator():
        stats.append({"col": "interests", "region": row.region, "metric": "unique", "value": row["unique"]})

    interests_count_top5 = (
        truncate(df, "interests", 5)
        .select("region", F.explode("interests").alias("interests"))
        .groupby("region", "interests").agg(F.count("interests").alias("count"))
    )
    interests_unique_top5 = interests_count_top5.groupby("region").agg(F.count("count").alias("unique"))
    for row in interests_count_top5.toLocalIterator():
        stats.append({"col": "interests", "region": row.region, "metric": "count@5", "value": row["count"], "interests": row.interests})
    for row in interests_unique_top5.toLocalIterator():
        stats.append({"col": "interests", "region": row.region, "metric": "unique@5", "value": row["unique"]})
  65. Case: Metrics collection Library support With the Lupus library, the same aggregation collapses to:
    from lupus.processor.spark import DistributionProcessor

    processor = DistributionProcessor(
        df,
        group_columns=["region"],
        column_metrics={
            "age": ["avg", "p25", "p50", "p75"],
            "device": ["count", "unique"],
            "ratings": ["avg", "avg@5", "min", "max"],
            "interests": ["count", "unique", "unique@3"],
        },
    )
    metrics = processor.get_metrics()
  66. Case: Metrics collection Library support Example classification log (pred | gt):
      A | A
      B | C
      C | C
      A | B
      B | b
      … | …
    Supported metrics: Label count, F1-score, Recall, Accuracy.
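As a hedged illustration of what aggregating these classification metrics amounts to (not the Lupus library API), the label counts and scores above can be computed from a prediction/ground-truth log with scikit-learn:

    from collections import Counter
    from sklearn.metrics import accuracy_score, f1_score, recall_score

    # Toy prediction / ground-truth log, standing in for one day's model outputs.
    pred = ["A", "B", "C", "A", "B"]
    gt   = ["A", "C", "C", "B", "B"]

    metrics = {
        "label_count": dict(Counter(pred)),                  # how often each label was predicted
        "accuracy": accuracy_score(gt, pred),
        "recall_macro": recall_score(gt, pred, average="macro"),
        "f1_macro": f1_score(gt, pred, average="macro"),
    }
    print(metrics)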
  67. Case: Metrics collection Library support Example recommendation log (pred | gt):
      [A, C, B] | [A, B]
      [A, C, B] | [A]
      [C, D, E] | [C, D]
      [A, D, C] | [C]
      [D, E, B] | [A, D]
      …         | …
    Supported metrics: Unique, nDCG@k, Recall, Entropy@k.
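A minimal sketch of two of the ranking metrics listed above, nDCG@k and Entropy@k, in plain Python with NumPy; again this only illustrates the computation, not the Lupus implementation:

    import numpy as np
    from collections import Counter

    def ndcg_at_k(pred, gt, k):
        """Binary-relevance nDCG@k for one recommendation list."""
        gains = [1.0 if item in gt else 0.0 for item in pred[:k]]
        dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
        ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(gt), k)))
        return dcg / ideal if ideal > 0 else 0.0

    def entropy_at_k(preds, k):
        """Shannon entropy of the items shown in the top-k across all lists (diversity)."""
        counts = Counter(item for pred in preds for item in pred[:k])
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return float(-(p * np.log2(p)).sum())

    preds = [["A", "C", "B"], ["A", "C", "B"], ["C", "D", "E"]]
    gts   = [["A", "B"], ["A"], ["C", "D"]]
    print(np.mean([ndcg_at_k(p, set(g), 3) for p, g in zip(preds, gts)]))
    print(entropy_at_k(preds, 3))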
  68. Case: Metrics collection Library support Training loss, validation loss and extra metrics are collected from MLflow.
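As a hedged sketch of how such metrics can be read out of MLflow using the standard MLflow tracking API; the tracking URI, run ID and metric names below are placeholders, and this is not the Lupus library itself:

    from mlflow.tracking import MlflowClient

    client = MlflowClient(tracking_uri="http://mlflow.example.internal")  # placeholder URI

    run_id = "abc123"  # placeholder run ID
    run = client.get_run(run_id)

    # Latest value of every logged metric (e.g. train_loss, val_loss, extra metrics).
    latest = run.data.metrics

    # Full history of one metric, step by step, e.g. for plotting a loss curve.
    val_loss_history = [(m.step, m.value) for m in client.get_metric_history(run_id, "val_loss")]

    print(latest)
    print(val_loss_history[:5])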
  69. Case: Metrics collection Overview

  70. Case: Metrics collection Overview 1. Aggregated metrics

  71. Case: Metrics collection Overview 2. Push them to Lupus server

  72. Case: Metrics collection Overview 3. Metrics are uploaded to S3-compatible storage
  73. Case: Metrics collection Overview 4. Submit the collection job to

    queue
  74. Case: Metrics collection Overview 5. Workflow saves metrics to Hive

    and Elasticsearch
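Pulling the five steps together, here is a hedged sketch of what pushing aggregated metrics to the Lupus server could look like from a pipeline job. The endpoint, request fields and project name are hypothetical placeholders for illustration; the deck does not show the actual client API beyond the DistributionProcessor example above.

    import datetime
    import requests  # assuming a plain HTTP API; the real client is internal

    LUPUS_API = "https://lupus.example.internal/api/v1/metrics"  # hypothetical endpoint

    def push_metrics(project: str, metrics: list) -> None:
        """Send one batch of aggregated metrics; the server stores them and queues a collection job."""
        payload = {
            "project": project,                                   # placeholder project name
            "collected_at": datetime.datetime.utcnow().isoformat(),
            "metrics": metrics,  # e.g. the dicts produced by DistributionProcessor.get_metrics()
        }
        resp = requests.post(LUPUS_API, json=payload, timeout=30)
        resp.raise_for_status()

    push_metrics("example-recommendation", [
        {"col": "age", "region": "JP", "metric": "avg", "value": 41.2},
    ])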
  75. Anomaly detection Lupus

  76. Case: Anomaly detection Which kind of alert do we need?

    Anomalies in the context of MLOps have more complex conditions than in DevOps. Basic rules › If a metric exceeds a threshold › If a metric deviates significantly from the average of recent days Complex rules › If a metric deviates significantly from its periodic pattern. › If the trend of a metric changes.
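A hedged sketch of the second basic rule above (deviation from the average of recent days) as a simple rolling z-score check with pandas; this illustrates the rule type only, not the detector Lupus ships:

    import pandas as pd

    def window_rule(series: pd.Series, window: int = 14, z_threshold: float = 3.0) -> pd.Series:
        """Flag points that deviate strongly from the trailing `window`-day mean."""
        rolling = series.shift(1).rolling(window, min_periods=window)  # exclude the current day
        z = (series - rolling.mean()) / rolling.std()
        return z.abs() > z_threshold

    daily_avg_rating = pd.Series(
        [4.1, 4.0, 4.2, 4.1, 4.0, 4.1, 4.2, 4.1, 4.0, 4.1, 4.2, 4.1, 4.0, 4.1, 2.3],
        index=pd.date_range("2021-10-01", periods=15),
    )
    print(window_rule(daily_avg_rating)[daily_avg_rating.index[-1]])  # True: the last day is anomalous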
  77. Case: Anomaly detection Available anomaly detection methods Thresholding Time-series prediction

    by Prophet Window-based Rules Twitter’s AnomalyDetection package
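A hedged sketch of the Prophet-based option listed above: fit a forecast on the metric history and flag days whose observed value falls outside the predicted interval. It assumes the `prophet` package and a daily metric as toy data; the exact configuration used inside Lupus is not shown in the deck.

    import pandas as pd
    from prophet import Prophet  # pip install prophet (formerly fbprophet)

    # history: one row per day with the metric value, in Prophet's expected ds/y columns.
    history = pd.DataFrame({
        "ds": pd.date_range("2021-01-01", periods=120),
        "y": [100 + (i % 7) * 3 for i in range(120)],  # toy series with weekly seasonality
    })

    model = Prophet(interval_width=0.99, weekly_seasonality=True)
    model.fit(history)

    forecast = model.predict(history[["ds"]])
    merged = history.merge(forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds")
    anomalies = merged[(merged["y"] < merged["yhat_lower"]) | (merged["y"] > merged["yhat_upper"])]
    print(anomalies[["ds", "y"]])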
  78. Case: Anomaly detection Overview

  79. Case: Anomaly detection Overview 1. Request detection

  80. Case: Anomaly detection Overview 2. Detection job is queued

  81. Case: Anomaly detection Overview 3. Workflow reads metrics from Hive

    and performs detection
  82. Case: Anomaly detection Overview 4. Save anomalies to Hive and

    Elasticsearch
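As with metrics collection, a hedged sketch of what requesting a detection job (step 1 above) might look like; the endpoint and request fields are hypothetical placeholders, since the deck describes only the flow, not the API schema.

    import requests  # assuming a plain HTTP API, as in the collection sketch above

    LUPUS_API = "https://lupus.example.internal/api/v1"  # hypothetical endpoint

    detection_request = {
        "project": "example-recommendation",                       # placeholder project name
        "metric": {"col": "ratings", "metric": "avg@5", "group": {"region": "JP"}},
        "method": "prophet",                                       # or "threshold", "window", "twitter_ad"
        "params": {"interval_width": 0.99},
        "lookback_days": 90,
    }

    # The server queues the job; a workflow later reads the metric history from Hive,
    # runs the detector, and writes any anomalies back to Hive and Elasticsearch.
    resp = requests.post(f"{LUPUS_API}/detections", json=detection_request, timeout=30)
    resp.raise_for_status()
    print(resp.status_code)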
  83. Visualization Lupus

  84. Case: Visualization Overview

  85. Case: Visualization Features and motivation › We have simple but

    specific use cases; major OSS tools do not fit our needs despite their complexity. › Lupus has niche requirements such as showing anomalies and narrowing down by metric groups. › LINE takes user privacy seriously, so Lupus has strict and complicated authentication requirements. Why self-made? Web UI for metrics visualization › Metric charts with anomaly information. › An explorer to easily discover a desired chart. › User-customizable dashboards for daily observations.
  86. Top Entrypoint to dashboards and the data source explorer

  89. Discover Chart listing for discovering a desired metric chart

  93. Metric chart Detail page to show a series of metrics

    with anomaly points
  96. Anomalies Detailed anomaly information shown by clicking a metric series

  98. Dashboard Customizable dashboard to display favorite charts

  99. Impacts Easy monitoring › It became much easier to collect daily metrics than before. Avoiding outages › Lupus helps find outages by detecting issues that we hadn’t noticed before. Reliable monitoring code › We could move from self-made notebooks to a reliable, reviewed codebase. Fast access, shareable UI › We can access collected metrics very quickly with the Lupus Web UI, and easily share them with project members. Discover insights › We found changes in the accuracy of our products that we hadn’t been aware of, which motivated us to improve them.
  105. Summary Our challenges in MLOps monitoring › Thanks to the efforts described above, the ML Dept. can now release ML products with short development times. › Along with this, the cost of monitoring has kept growing. Monitoring on MLOps › MLOps requires additional monitoring metrics related to data and ML models. Our solution › We have developed an original monitoring system for MLOps, called Lupus. › Lupus provides three components that help us collect, detect anomalies in, and visualize metrics efficiently.
  106. Reference Introducing MLOps (O'Reilly Media, Inc.) › Mark Treveil, Nicolas

    Omont, Clément Stenac, Kenji Lefevre, Du Phan, Joachim Zentici, Adrien Lavoillotte, Makoto Miyazaki and Lynn Heidmann Practical MLOps (O'Reilly Media, Inc.) › Noah Gift and Alfredo Deza MLOps: Continuous delivery and automation pipelines in machine learning (Google Cloud) › https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning Evidently AI blog: machine learning monitoring series (Evidently AI) › https://evidentlyai.com/blog#!/tfeeds/393523502011/c/machine%20learning%20monitoring%20series
  107. Thank you