
[2022] Invisible Interfaces - Considerations for Abstracting Complexities of a Real-time ML Platform

If you are a data scientist or a platform engineer, you can probably relate to the pains of keeping up with the explosive growth of Data/ML technologies and tooling. With many overlapping options and a steep learning curve for each, it's increasingly challenging for data science teams. Many platform teams have started thinking about building an abstracted ML platform layer to support generalized ML use cases. But there are many complexities involved, especially as the underlying real-time data is shifting into the mainstream.

In this talk, we’ll discuss why ML platforms can benefit from a simple and “invisible” abstraction. We’ll offer some evidence on why you should consider leveraging streaming technologies even if your use cases are not real-time yet. We’ll share learnings (combining both ML and Infra perspectives) about some of the hard complexities involved in building such simple abstractions, the design principles behind them, and some counterintuitive decisions you may come across along the way.

By the end of the talk, I hope data scientists can walk away with some tips on how to evaluate ML platforms, and platform engineers with a few architectural and design tricks.

Zhenzhong Xu

October 31, 2022

Transcript

  1. Invisible Interfaces: Considerations for Abstracting Complexities of a Real-time ML Platform. Zhenzhong Xu (@zhenzhongxu), Current 22, October 2022
  2. About Zhenzhong Xu • Building real-time ML platform @ Claypot • Real-time Data Infrastructure @ Netflix • Cloud infra @ Microsoft
  3. "There's been an explosion of ML use cases that don't make sense if they aren't in real time. More and more people are doing ML in production, and most cases have to be streamed." Ali Ghodsi, Databricks CEO. Example use cases: Fraud prevention, Personalization, Customer support, Dynamic pricing, Trending products, Risk assessment, Robotics, Ads, ETA, Network analysis, Sentiment analysis, Object detection …
  4. Where the real-time ML platform sits: between Data Science (Exploration & Research, Model Architecture & Tuning, Model Analysis & Selection, Business Decision Optimization) and Data Infrastructure (Ingestion & Transport, Security & Governance, Multi-tenancy Isolation, Data Sources, Storage, Query & Compute, Workflow Orchestration, Analytics / Visualization).
  5. Platform components: Model Serving, Model Training, Model Monitoring, Model Evaluation, Feature Materialization, Label Materialization, Data Monitoring. Data and model flows connect these components with the product ecosystem and the analytics ecosystem.
  6. Combine your data and model loops: why you need both to be fast

     Data Loop | Model Loop | Challenge/Value
     Slow | Slow | Low freshness, low quality. Out-of-date models; predictions and training run on stale data; model drift results in low model accuracy.
     Slow | Fast | Low freshness, low quality. Model training is bottlenecked by the availability of fresh data. Prediction latency is high, or predictions are made with stale data.
     Fast | Slow | High freshness, low quality. Fresh data is available for predictions, training, and observability, but slow model iteration results in out-of-date models and lower accuracy.
     Fast | Fast | High freshness, high quality. You want your ML ecosystem to be here.
  7. Online Customer Service Use Case Example • Suggest diagnostic runbook • Proactive in-the-moment remediation action • Fraud prevention vs. detection
  8. Data scientists: define model features • average transaction amount from the past 14 days • request channel encoding • text embedding similarity score
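The three features on this slide can be sketched in plain Python. This is a hypothetical illustration, not the platform's API; the field names (`amount`, `ts`) and the channel vocabulary are assumptions.

```python
from datetime import timedelta
import math

def avg_transaction_amount(transactions, now, days=14):
    """Average transaction amount over the trailing window."""
    cutoff = now - timedelta(days=days)
    recent = [t["amount"] for t in transactions if t["ts"] >= cutoff]
    return sum(recent) / len(recent) if recent else 0.0

def encode_channel(channel, vocabulary=("web", "mobile", "email", "phone")):
    """One-hot encode the request channel."""
    return [1.0 if channel == v else 0.0 for v in vocabulary]

def cosine_similarity(a, b):
    """Similarity score between two text embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0
```

Note that the first feature depends on a time window, the second only on the request itself, and the third on a precomputed embedding: exactly the spread of freshness requirements the later slides are about.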
  9. ML platforms: what's preventing ubiquity? What's the appropriate level of complexity the ML platform should expose?
  10. Pattern 1: Offline batch prediction • Flow: a batch job (e.g. Airflow + Spark) reads from the DWH (Snowflake / BigQuery / S3), generates predictions, and writes them back for BI • Use cases: churn prediction, user LTV, risk planning, etc.
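Pattern 1 reduces to a scheduled read-score-write job. A minimal sketch, with hypothetical stand-ins for the warehouse reader, the model, and the prediction writer:

```python
def batch_predict(read_rows, model, write_predictions):
    """One scheduled run (e.g. triggered by Airflow): score every row in one pass."""
    rows = read_rows()                       # e.g. SELECT * FROM feature_table
    preds = [(row["user_id"], model(row)) for row in rows]
    write_predictions(preds)                 # e.g. INSERT INTO predictions
    return len(preds)
```

Because everything is precomputed, predictions are only as fresh as the job schedule, which is exactly the staleness tradeoff the later slides quantify.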
  11. Pattern 2: Online prediction with batch features • Batch features: computed offline, e.g. product embeddings • Use cases: recsys • Flow: a batch job generates features from the DWH, writing them to offline storage and to an online KV store for low-latency access; the prediction service serves app prediction requests using the joined batch features.
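In this pattern the prediction service does no feature computation at request time: just a KV lookup plus inference. A sketch, where an in-memory dict stands in for a real KV store such as Redis:

```python
class PredictionService:
    """Online prediction over features precomputed by an offline batch job."""

    def __init__(self, model, kv_store):
        self.model = model
        self.kv = kv_store          # key -> precomputed batch feature vector

    def predict(self, key):
        features = self.kv.get(key)
        if features is None:
            return None             # cold start: batch job hasn't covered this key
        return self.model(features)
```

The `None` branch is worth noting: batch features silently go missing or stale for new entities, one of the producer/consumer tensions raised later in the talk.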
  12. Pattern 3: Online prediction with on-demand features • On-demand features: queried from transactional stores (e.g. Postgres, Cassandra) at request time, e.g. # orders in the last 30 mins • Use cases: recsys • Flow: a batch job still generates batch features from the DWH, writing them to offline storage and to an online KV store; at request time the prediction service joins those batch features with features queried live from the transactional store to serve app prediction requests.
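The on-demand feature here is not precomputed at all; it is derived from the transactional store when the request arrives. A sketch of the "# orders in the last 30 mins" example, with a list of rows standing in for the real store and field names assumed:

```python
def orders_last_30_mins(order_rows, user_id, now, window_secs=30 * 60):
    """On-demand feature: count of this user's orders in the trailing 30 minutes."""
    return sum(
        1 for r in order_rows
        if r["user_id"] == user_id and now - window_secs < r["ts"] <= now
    )
```

Freshness is perfect, but the scan (or the equivalent SQL query) now sits on the request path, which is where the latency-versus-staleness tradeoff of the later slides comes from.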
  13. Pattern 4: Online prediction with streaming features • Online features: computed online, e.g. distance between two locations, count/percentile over the last 30 mins • Flow: events flow over a real-time transport (logs) into stream feature extraction; a feature service combines the streaming features with batch features (generated by a batch job from the DWH and written to offline storage and an online KV store) for the prediction service to answer app prediction requests.
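A streaming feature such as "count over the last 30 mins" is maintained incrementally as events arrive, rather than queried on demand. A real stream processor (e.g. Flink) manages this window state for you; this hand-rolled sketch only illustrates the idea:

```python
from collections import deque

class SlidingCount:
    """Incrementally maintained count over a trailing time window."""

    def __init__(self, window_secs=30 * 60):
        self.window = window_secs
        self.events = deque()       # event timestamps, oldest first

    def add(self, ts):
        """Called once per incoming event, off the request path."""
        self.events.append(ts)

    def value(self, now):
        """Count of events in (now - window, now]; expired events are evicted."""
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)
```

Reads are now cheap lookups against precomputed state, so this pattern buys both freshness and low request latency, at the cost of running streaming infrastructure.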
  14. Combining offline and online data: transaction behavior over the last 6 months spans both systems along the time axis; the stream retains roughly the last 7 days (T-7 days to now), while the DWH holds T-1 day back to T-6 months.
  15. Combining offline and online data, continued: because the stream covers T-7 days to now and the DWH covers T-1 day to T-6 months, computing a 6-month feature requires stitching the two sources together; this is the backfilling challenge.
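The offline/online split on these slides can be sketched as a backfill plan that assigns each time range to a source. The boundary values (7 days of stream retention, a 6-month total window) are taken from the slide; everything else is a hypothetical illustration:

```python
from datetime import timedelta

def plan_backfill(now, total=timedelta(days=180), recent=timedelta(days=7)):
    """Return (source, start, end) ranges covering the full feature window."""
    return [
        ("warehouse", now - total, now - recent),   # at-rest history from the DWH
        ("stream", now - recent, now),              # in-motion tail from the log
    ]
```

The hard part the following slides address is not producing this plan but guaranteeing that the two transformations (batch and streaming) compute the same feature semantics across the seam.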
  16. Backfill in Lambda Architecture: the data source feeds in-motion compute into online storage (online query, for serving) and at-rest compute into offline storage (offline query, for training); backfill is a mixed query spanning both paths.
  17. Backfill in Lambda Architecture (continued): data source, in-motion compute, at-rest compute, online storage, offline storage, online query (serving), mixed query (backfill), offline query (training).
  18. Backfill in Kappa Architecture: the data source feeds in-motion compute, which can backfill from the historical log; streaming and batch transformations produce materialized views that serve both online queries (serving) and offline queries (training).
  19. Backfill in Kappa Architecture (continued): data source, in-motion compute (backfill from historical log), streaming and batch transformations, materialized views, online query (serving), offline query (training).
  20. Unified Backfill: in-motion compute backfills intelligently from dual sources (the stream and DWH-backed logs); streaming and batch transformations produce materialized views for online queries (serving) and offline queries (training), under a layer of orchestration & governance.
  21. Abstracted Unified Backfill: the same unified backfill (intelligent backfill from dual sources, DWH-backed logs, streaming and batch transformations into materialized views, orchestration & governance), hidden behind the platform abstraction.
  22. Data scientists: build model features • Should I declare features in SQL or Python? • How do I join existing intent classification results to my new feature? • What confidence can I get before checking in my code?
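One answer to the "SQL or Python?" question is a single declarative interface: the user states what the feature is, and the platform decides where and how to compute it (batch, stream, or backfill). A hypothetical sketch; the class and field names are invented for illustration, not any real platform's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Feature:
    """A declarative feature definition the platform can compile either way."""
    name: str
    expr: str                  # a SQL expression or a reference to Python logic
    source: str                # logical table/stream; the platform resolves it
    window: Optional[str] = None

# The 14-day average from the earlier slide, declared rather than implemented:
avg_txn_14d = Feature(
    name="avg_transaction_amount_14d",
    expr="AVG(amount)",
    source="transactions",
    window="14d",
)
```

Because the definition carries intent rather than an execution plan, the same declaration can back both the batch and streaming transformations, which is what makes the unified backfill abstraction possible.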
  23. Does the ML platform speak the same language as its users? ML platforms: what's preventing them from being easy and responsive?
  24. Does the ML platform speak the same language as its users? Questions for ML platforms: • Can users express or declare what they need to control in a single coherent interface? • Can the platform understand the intent and drive the underlying system? • Can user and platform communicate interactively, in a timely fashion? • Can the user understand their options and tradeoffs without reading a 300-page manual? • How much integration effort is needed to plug a model into existing data streams?
  25. Online Prediction: Latency vs. Staleness. Request latency spans feature retrieval, feature computation, prediction computation, and prediction retrieval over the raw data.

     Feature type | Staleness | Latency
     RT Feature | No staleness* | Low (10s of ms to ~1 sec)
     NRT Feature | > secs | Lower (10s-100s of ms)
     Batch Feature | > hours | Lower (10s-100s of ms)

     Footnote: *computation takes time, and latency includes that computation time; feature performance depends on the source technology and shared traffic pattern.
  26. What about tradeoffs? • Three dimensions: correctness, low cost, low latency • You can choose two, and have to be flexible on the third: 1. Fast & Correct, 2. Cheap & Correct, 3. Fast & Cheap • Clean abstractions are needed for full freedom. Reference: Open Problems in Stream Processing: A Call To Action, Tyler Akidau (2019)
  27. Data scientists: deploy model features • Should I duplicate the feature results in a different table? • Which team do I need to inform about the change? • Do I need to worry about training/prediction skew?
  28. ML platforms: what doesn't just work? What symptoms indicate that your platform is not trusted? • My freedom and your responsibility • Producer and consumer tension • Users are forced to choose between basic requirements
  29. You are part of the endeavor to make real-time data useful: the invisible interface • Ubiquitous • Easy and responsive • Just works! https://zhenzhongxu.com/ [email protected]