Lupus - A Monitoring System for Accelerating MLOps

LINE DEVDAY 2021

November 10, 2021

Transcript

  1. Lupus - A Monitoring System for Accelerating MLOps

  2. Target audience
    › People who are managing ML products.
    › People who belong to an ML team that is expected to grow much more.
    › Anyone who is interested in MLOps.


  3. Agenda
    › What’s MLOps monitoring?
    › MLOps at ML Dept.
    › Our challenges in MLOps monitoring
    › Lupus: our monitoring infrastructure


  4. Self introduction
    Junki Ishikawa
    Machine Learning Development Team
    Joined in April 2021 as a new graduate
    In charge of
    › Recommendation
    › Internal library development
    › Internal application development
    Personal
    › Living with Java sparrows


  5. Agenda
    › What’s MLOps monitoring?
    › MLOps at ML Dept.
    › Our challenges in MLOps monitoring
    › Lupus: our monitoring infrastructure


  6. What’s MLOps?
    ML + DevOps
    [Diagram: the MLOps loop combining Dev (plan, code, build), Ops (release, operate, monitor), and ML (design, analyze, evaluate)]


  12. What’s MLOps monitoring?
    What to monitor
    DevOps
    › Resource usage (CPU, Memory, Storage, …)
    › Disk I/O
    › Network Traffic
    › Heartbeats
    › Business KPIs
    › DevOps KPIs (MTTR, …)
    › etc…


  13. What’s MLOps monitoring?
    What to monitor
    DevOps
    › Resource usage (CPU, Memory, Storage, …)
    › Disk I/O
    › Network Traffic
    › Heartbeats
    › Business KPIs
    › DevOps KPIs (MTTR, …)
    › etc…
    + MLOps
    › Data statistics
    › Input data changes (data drift)
    › Input - target pattern changes (concept drift)
    › Model performance
    › Prediction accuracy
    › Diversity of recommendations
    › Fairness


  18. What’s MLOps monitoring?
    Other differences
    DevOps
    › Resource usage (CPU, Memory, Storage, …)
    › Disk I/O
    › Network Traffic
    › Heartbeats
    › Business KPIs
    › DevOps KPIs (MTTR, …)
    › etc…
    MLOps
    › Data statistics
    › Input data changes (data drift; a rough sketch of one such check follows below)
    › Input - target pattern changes (concept drift)
    › Model performance
    › Prediction accuracy
    › Diversity of recommendations
    › Fairness
    Beyond the metrics themselves, MLOps monitoring also differs in automation, interval, and alert logic.
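    As a rough illustration of the data-drift item above (not something shown in the deck), a population stability index between a reference sample and the current sample of one feature could be computed along these lines; the column name, bin count, and threshold are assumptions:

    import numpy as np

    def population_stability_index(reference, current, bins=10):
        """Compare two 1-D samples of the same feature; a larger PSI means larger drift."""
        # Bin edges are taken from the reference distribution.
        edges = np.histogram_bin_edges(reference, bins=bins)
        ref_hist, _ = np.histogram(reference, bins=edges)
        cur_hist, _ = np.histogram(current, bins=edges)
        # Convert counts to proportions, avoiding zeros before taking the log.
        ref_pct = np.clip(ref_hist / ref_hist.sum(), 1e-6, None)
        cur_pct = np.clip(cur_hist / cur_hist.sum(), 1e-6, None)
        return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

    # Hypothetical usage: last week's "age" values vs. today's.
    # psi = population_stability_index(last_week["age"], today["age"])
    # A common rule of thumb treats PSI > 0.2 as significant drift.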


  19. Agenda
    › What’s MLOps monitoring?
    › MLOps at ML Dept.
    › Our challenges in MLOps monitoring
    › Lupus: our monitoring infrastructure


  20. MLOps at ML Dept.
    Products


  24. MLOps at ML Dept.
    Scale
    › 20+ organizations
    › 500+ tables from external organizations
    › 100+ ML products


  25. MLOps at ML Dept.
    Scale
    [Chart: the number of logics used to select contents on SmartCH (Logic 1, Logic 2, …, Logic n), plotted from May 2019 to September 2021 and increasing over time]

  33. MLOps at ML Dept.
    Facilities
    Infrastructure
    › Kubernetes
    › IU (LINE’s Hadoop cluster)
    Prototyping environment
    › Jutopia (LINE’s Jupyter server)
    Workflow engines
    › Argo Workflows
    › Azkaban
    › Airflow
    CI / CD tools
    › ArgoCD
    › Drone CI
    Shared feature vectors
    › User sparse/dense features
    › Item metadata features
    Internal libraries
    › Distributed training & inference
    › Model collections
    › Recommendation automation
    › I/O manager
    › etc…
    Internal experiment manager
    › A/B test manager
    › A/B test monitoring system
    › Recommendation demo generator


  34. MLOps at ML Dept.
    Common pipeline
    [Diagram: the MLOps loop (plan, code, build, release, operate, monitor, design, analyze, evaluate) annotated with the facilities above: prototyping tools, internal experiment manager, workflow engines, CI/CD tools, internal libraries, shared feature vectors]

  35. MLOps at ML Dept.
    Common pipeline
    [The same diagram with a question mark: nothing yet covers the monitoring stage]


  36. Agenda
    › What’s MLOps monitoring?
    › MLOps at ML Dept.
    › Our challenges in MLOps monitoring
    › Lupus: our monitoring infrastructure


  39. Our challenges in MLOps monitoring
    Monitoring issues
    Increasing monitoring costs
    › As the number of ML products increases, the cost of monitoring has steadily grown.
    Disjointed, project-dependent monitoring operations
    › Each project has its own monitoring methods and alerts; some are solid, others are cheap and poor.
    Outages due to lack of monitoring
    › Outages have many causes (e.g. missing data, changes in model outputs).
    › It is nearly impossible to manually monitor every product.

  40. Our challenges in MLOps monitoring
    Actual outages we have experienced
    Data missing
    › Cause: cluster outage, delays
    › Impact: low-quality or empty predictions
    Model update
    › Cause: model architecture update, smoothing
    › Impact: significant drift in the prediction distribution, found out only 2 weeks later
    Manual monitoring
    › Cause: handcrafted monitoring code in a Jupyter notebook
    › Impact: cheap metrics, poor alerting, unreviewed code


  49. Our challenges in MLOps monitoring
    What we need
    Collection
    › Metrics aggregation tools
    › Reliable metrics store
    Detection
    › Flexible anomaly detector
    › Alerting system
    Visualization
    › User-friendly GUI app

  50. Agenda
    › What’s MLOps monitoring?
    › MLOps at ML Dept.
    › Our challenges in MLOps monitoring
    › Lupus: our monitoring infrastructure


  51. Lupus
    Common monitoring infrastructure for MLOps


  52. Lupus
    Concept
    › Easy to collect (for engineers)
    › Easy to detect (for operators)
    › Easy to visualize (for project members)


  53. Lupus
    Components
    › Lupus server: metric management and anomaly detection APIs
    › Lupus SPA: web app for visualizing metrics and anomalies
    › Lupus library: metrics aggregation tools and API client


  54. Lupus
    Ecosystem


  61. Metrics collection
    Lupus


  62. Case: Metrics collection
    Which kind of metrics should we monitor?
    Effective metrics depend on the task, the data, the model, and so on.
    Data drift / concept drift
    › Statistics of input data
    › Statistics of target variables
    Model degradation / replacement
    › Statistics of predictions
    › Ground-truth evaluation
    › Training / validation metrics
    The Lupus library helps aggregate these metrics.


  63. Case: Metrics collection
    Library support
    [Figure: an example user table with columns Region, Age, Device, Rating, Interests (e.g. JP / 23 / iOS / [5.0, 4.0] / [a, b, c]), annotated with the per-region aggregations the library can produce: sum, min, 95th percentile, count per entity, unique entity count, and so on]

  64. Case: Metrics collection
    Library support
    import pyspark.sql.functions as F
    from pyspark.sql import Row

    stats = []

    # age
    age_stats = df.groupby("region").agg(
        F.avg("age").alias("avg"), F.max("age").alias("max"), F.min("age").alias("min"))
    for row in age_stats.toLocalIterator():
        stats.append({"col": "age", "region": row.region, "metric": "avg", "value": row["avg"]})
        stats.append({"col": "age", "region": row.region, "metric": "max", "value": row["max"]})
        stats.append({"col": "age", "region": row.region, "metric": "min", "value": row["min"]})

    # device
    device_counts = df.groupby("region", "device").agg(F.count("device").alias("count"))
    device_unique = device_counts.groupby("region").agg(F.count("count").alias("unique"))
    for row in device_counts.toLocalIterator():
        stats.append({"col": "device", "region": row.region, "metric": "count", "value": row["count"], "device": row.device})
    for row in device_unique.toLocalIterator():
        stats.append({"col": "device", "region": row.region, "metric": "unique", "value": row["unique"]})

    # ratings
    def truncate(df, col, k):
        def _(row):
            dic = row.asDict()
            dic[col] = dic[col][:k]
            return Row(**dic)
        return df.rdd.map(_).toDF()

    ratings_stats_all = (
        df.select("region", F.explode("ratings").alias("ratings"))
        .groupby("region").agg(F.avg("ratings").alias("avg"), F.max("ratings").alias("max"), F.min("ratings").alias("min"))
    )
    for row in ratings_stats_all.toLocalIterator():
        stats.append({"col": "ratings", "region": row.region, "metric": "avg", "value": row["avg"]})
        stats.append({"col": "ratings", "region": row.region, "metric": "max", "value": row["max"]})
        stats.append({"col": "ratings", "region": row.region, "metric": "min", "value": row["min"]})

    ratings_stats_top5 = (
        truncate(df, "ratings", 5)
        .select("region", F.explode("ratings").alias("ratings"))
        .groupby("region").agg(F.avg("ratings").alias("avg"), F.max("ratings").alias("max"), F.min("ratings").alias("min"))
    )
    for row in ratings_stats_top5.toLocalIterator():
        stats.append({"col": "ratings", "region": row.region, "metric": "avg@5", "value": row["avg"]})
        stats.append({"col": "ratings", "region": row.region, "metric": "max@5", "value": row["max"]})
        stats.append({"col": "ratings", "region": row.region, "metric": "min@5", "value": row["min"]})

    # interests
    interests_count_all = (
        df.select("region", F.explode("interests").alias("interests"))
        .groupby("region", "interests").agg(F.count("interests").alias("count"))
    )
    interests_unique_all = interests_count_all.groupby("region").agg(F.count("count").alias("unique"))
    for row in interests_count_all.toLocalIterator():
        stats.append({"col": "interests", "region": row.region, "metric": "count", "value": row["count"], "interests": row.interests})
    for row in interests_unique_all.toLocalIterator():
        stats.append({"col": "interests", "region": row.region, "metric": "unique", "value": row["unique"]})

    interests_count_top5 = (
        truncate(df, "interests", 5)
        .select("region", F.explode("interests").alias("interests"))
        .groupby("region", "interests").agg(F.count("interests").alias("count"))
    )
    interests_unique_top5 = interests_count_top5.groupby("region").agg(F.count("count").alias("unique"))
    for row in interests_count_top5.toLocalIterator():
        stats.append({"col": "interests", "region": row.region, "metric": "count@5", "value": row["count"], "interests": row.interests})
    for row in interests_unique_top5.toLocalIterator():
        stats.append({"col": "interests", "region": row.region, "metric": "unique@5", "value": row["unique"]})


  65. Case: Metrics collection
    Library support
    The same handwritten aggregation code as on the previous slide, shown side by side with the Lupus library equivalent:
    from lupus.processor.spark import DistributionProcessor

    processor = DistributionProcessor(
        df,
        group_columns=["region"],
        column_metrics={
            "age": ["avg", "p25", "p50", "p75"],
            "device": ["count", "unique"],
            "ratings": ["avg", "avg@5", "min", "max"],
            "interests": ["count", "unique", "unique@5"],
        },
    )
    metrics = processor.get_metrics()


  66. Case: Metrics collection
    Library support
    [Figure: a table of predicted vs. ground-truth labels (A/A, B/C, C/C, A/B, B/B, …), from which the library computes label counts, accuracy, recall, and F1-score]


  67. Case: Metrics collection
    Library support
    [Figure: a table of predicted item lists vs. ground-truth item lists (e.g. [A, C, B] vs. [A, B]), from which the library computes ranking metrics such as unique-item counts and top-k recall; see the sketch below]
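    Not from the deck: for the list-prediction case above, a top-k recall could be computed with plain Python roughly as follows (the exact metric names used by the library are not shown in the slides):

    def recall_at_k(pred_items, true_items, k=3):
        """Fraction of ground-truth items that appear in the top-k predictions."""
        if not true_items:
            return None  # undefined when there is no ground truth
        top_k = set(pred_items[:k])
        return len(top_k & set(true_items)) / len(true_items)

    # Rows taken from the example table above:
    rows = [(["A", "C", "B"], ["A", "B"]), (["A", "D", "C"], ["C"]), (["D", "E", "B"], ["A", "D"])]
    scores = [recall_at_k(pred, gt) for pred, gt in rows]  # [1.0, 1.0, 0.5]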


  68. Case: Metrics collection
    Library support
    [Figure: training loss, validation loss, and extra metrics are collected from MLflow]
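    Not part of the original slides: pulling training and validation metrics out of MLflow for monitoring could look roughly like this; the run ID and metric names are placeholders:

    from mlflow.tracking import MlflowClient

    client = MlflowClient()          # uses the configured MLflow tracking server
    run_id = "<run-id>"              # placeholder

    metrics = []
    for name in ["train_loss", "val_loss"]:   # assumed metric names
        for m in client.get_metric_history(run_id, name):
            metrics.append({"metric": name, "step": m.step, "value": m.value})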


  69. Case: Metrics collection
    Overview
    1. Aggregate metrics
    2. Push them to the Lupus server (a hypothetical client-side sketch follows below)
    3. Metrics are uploaded to S3-compatible storage
    4. The collection job is submitted to a queue
    5. A workflow saves the metrics to Hive and Elasticsearch
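    The client side of steps 1-2 is not shown in the deck. As a purely hypothetical sketch (the endpoint and payload shape are invented for illustration and are not the actual Lupus API), pushing aggregated metrics could look like:

    import requests  # the real Lupus library wraps this in its API client

    LUPUS_API = "https://lupus.example.internal/api/v1/metrics"   # invented endpoint

    payload = {
        "project": "my-ml-product",      # invented identifiers
        "date": "2021-11-10",
        "metrics": metrics,              # e.g. the output of processor.get_metrics()
    }
    resp = requests.post(LUPUS_API, json=payload, timeout=30)
    resp.raise_for_status()              # the server then stores the payload and queues a collection job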


  75. Anomaly detection
    Lupus


  76. Case: Anomaly detection
    Which kind of alert do we need?
    Anomalies in the MLOps context have more complex conditions than in DevOps.
    Basic rules
    › A metric exceeds a threshold.
    › A metric deviates significantly from the average of recent days.
    Complex rules
    › A metric deviates significantly from its periodic pattern.
    › The trend of a metric changes.
    A rough sketch of the basic rules follows below.
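    A minimal sketch of the two basic rules above (not the actual Lupus detectors; the window size and thresholds are placeholders):

    import numpy as np

    def basic_anomaly(history, value, threshold=None, n_sigma=3.0, window=7):
        """history: recent daily values (oldest first); value: today's metric."""
        # Rule 1: the metric exceeds a fixed threshold.
        if threshold is not None and value > threshold:
            return True
        # Rule 2: the metric deviates strongly from the average of recent days.
        recent = np.asarray(history[-window:], dtype=float)
        mean, std = recent.mean(), recent.std()
        return bool(std > 0 and abs(value - mean) > n_sigma * std)

    # e.g. basic_anomaly([0.81, 0.80, 0.82, 0.79, 0.80, 0.81, 0.80], 0.55)  -> True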


  77. Case: Anomaly detection
    Available anomaly detection methods
    › Thresholding
    › Time-series prediction by Prophet
    › Window-based rules
    › Twitter’s AnomalyDetection package
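    For the Prophet-based method, the general pattern (a sketch of the common approach, not Lupus's exact implementation) is to fit on the metric's history and flag points that fall outside the predicted interval; the column names follow Prophet's ds/y convention:

    import pandas as pd
    from prophet import Prophet

    def prophet_anomalies(series: pd.DataFrame) -> pd.DataFrame:
        """series: a DataFrame with columns ds (date) and y (metric value)."""
        model = Prophet(interval_width=0.99)
        model.fit(series)
        forecast = model.predict(series[["ds"]])
        merged = series.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
        # A point is anomalous if it falls outside the 99% prediction interval.
        merged["anomaly"] = (merged["y"] < merged["yhat_lower"]) | (merged["y"] > merged["yhat_upper"])
        return merged[merged["anomaly"]]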


  78. Case: Anomaly detection
    Overview
    1. Request detection
    2. The detection job is queued
    3. A workflow reads metrics from Hive and performs detection
    4. Anomalies are saved to Hive and Elasticsearch


  83. Visualization
    Lupus


  84. Case: Visualization
    Overview


  85. Case: Visualization
    Features and motivation
    Web UI for metrics visualization
    › Metrics charts with anomaly information.
    › An explorer to easily find a desired chart.
    › User-customizable dashboards for daily observation.
    Why self-made?
    › Our use cases are simple but specific; major OSS tools do not fit our needs despite their complexity.
    › Lupus has niche requirements, such as displaying anomalies and narrowing results down by metric group.
    › LINE takes user privacy seriously, and Lupus has strict and complicated authentication requirements.


  86. Top
    Entrypoint to dashboards and the data source explorer


  89. Discover
    Chart listing for discovering a desired metric chart


  93. Metric chart
    Detail page to show a series of metrics with anomaly points


  96. Anomalies
    Detail view of anomaly information, shown by clicking a metric series


  98. Dashboard
    Customizable dashboard to display favorite charts


  99. Impacts
    Easy monitoring
    › It became much easier to collect daily metrics than before.
    Reliable monitoring code
    › We moved from hand-made notebook code to a reliable, reviewed codebase.
    Avoiding outages
    › Lupus helps us avoid outages by detecting problems we hadn't noticed before.
    Fast access, shareable UI
    › We can access collected metrics very quickly through the Lupus Web UI and easily share them with project members.
    Discover insights
    › We found changes in the accuracy of our products that we hadn't been aware of, which motivated us to improve them.


  105. Summary
    Monitoring in MLOps
    › MLOps requires additional monitoring metrics related to data and ML models.
    Our challenges in MLOps monitoring
    › Thanks to the efforts above, the ML Dept. can now release ML products with a short development time.
    › Along with this, the cost of monitoring has kept growing.
    Our solution
    › We developed our own monitoring system for MLOps, called Lupus.
    › Lupus provides three components that help us collect metrics, detect anomalies and alert on them, and visualize them efficiently.


  106. Reference
    Introducing MLOps (O'Reilly Media, Inc.)
    › Mark Treveil, Nicolas Omont, Clément Stenac, Kenji Lefevre, Du Phan, Joachim Zentici, Adrien Lavoillotte, Makoto Miyazaki, and Lynn Heidmann
    Practical MLOps (O'Reilly Media, Inc.)
    › Noah Gift and Alfredo Deza
    MLOps: Continuous delivery and automation pipelines in machine learning (Google Cloud)
    › https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
    Evidently AI blog: machine learning monitoring series (Evidently AI)
    › https://evidentlyai.com/blog#!/tfeeds/393523502011/c/machine%20learning%20monitoring%20series


  107. Thank you
