
[2022] Invisible Interfaces - Considerations for Abstracting Complexities of a Real-time ML Platform

If you are a data scientist or a platform engineer, you can probably relate to the pains of keeping up with the explosive growth of Data/ML technologies and tooling. With many overlapping options and a steep learning curve for each, it's increasingly challenging for data science teams. Many platform teams have started thinking about building an abstracted ML platform layer to support generalized ML use cases. But there are many complexities involved, especially as the underlying real-time data is shifting into the mainstream.

In this talk, we’ll discuss why ML platforms can benefit from a simple and “invisible” abstraction. We’ll offer some evidence on why you should consider leveraging streaming technologies even if your use cases are not real-time yet. We’ll share learnings (combining both ML and Infra perspectives) about some of the hard complexities involved in building such simple abstractions, the design principles behind them, and some counterintuitive decisions you may come across along the way.

By the end of the talk, I hope data scientists can walk away with some tips on how to evaluate ML platforms, and platform engineers with a few architectural and design tricks.

Zhenzhong Xu

October 31, 2022

Transcript

  1. Invisible Interfaces: Considerations for Abstracting Complexities of a Real-time ML Platform. Zhenzhong Xu (@zhenzhongxu), Current 22, October 2022
  2. About Zhenzhong Xu • Building real-time ML platform @ Claypot • Real-time Data Infrastructure @ Netflix • Cloud infra @ Microsoft
  3. "There's been an explosion of ML use cases that don't make sense if they aren't in real time. More and more people are doing ML in production, and most cases have to be streamed." Ali Ghodsi, Databricks CEO. Example use cases: Fraud prevention, Personalization, Customer support, Dynamic pricing, Trending products, Risk assessment, Robotics, Ads, ETA, Network analysis, Sentiment analysis, Object detection …
  4. Where the real-time ML platform sits: between Data Science (Exploration & Research, Model Architecture & Tuning, Model Analysis & Selection, Business Decision Optimization) and Data Infrastructure (Ingestion & Transport, Security & Governance, Multi-tenancy Isolation, Data Sources, Storage, Query & Compute, Workflow Orchestration, Analytics / Visualization).
  5. Platform components: Model Serving, Model Training, Model Monitoring, Model Evaluation, Feature Materialization, Label Materialization, Data Monitoring. Data and model flows connect these components with the product ecosystem and the analytics ecosystem.
  6. Combine your data and model loops: why you need both to be fast

     Data Loop | Model Loop | Challenge/Value
     Slow | Slow | Low freshness, low quality. Out-of-date models; predictions and training run on stale data; model drift results in low model accuracy.
     Slow | Fast | Low freshness, low quality. Model training is bottlenecked by the availability of fresh data. Prediction latency is high, or predictions are made with stale data.
     Fast | Slow | High freshness, low quality. Fresh data is available for predictions, training, and observability, but slow model iteration results in out-of-date models and lower accuracy.
     Fast | Fast | High freshness, high quality. You want your ML ecosystem to be here.
  7. Online Customer Service Use Case Example • Suggest diagnostic runbook • Proactive in-the-moment remediation action • Fraud prevention vs. detection
  8. Data scientists: define model features • average transaction amount from the past 14 days • request channel encoding • text embedding similarity score
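The three features on this slide can be sketched in plain Python. This is a hypothetical illustration, not the platform's API; the field names (`amount`, `ts`) and the channel vocabulary are assumptions.

```python
from datetime import timedelta
import math

def avg_transaction_amount(transactions, now, days=14):
    """Average transaction amount over the trailing window."""
    cutoff = now - timedelta(days=days)
    recent = [t["amount"] for t in transactions if t["ts"] >= cutoff]
    return sum(recent) / len(recent) if recent else 0.0

def encode_channel(channel, vocabulary=("web", "mobile", "email", "phone")):
    """One-hot encode the request channel."""
    return [1.0 if channel == v else 0.0 for v in vocabulary]

def cosine_similarity(a, b):
    """Similarity score between two text embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0
```

Note that the first feature depends on a time window, the second only on the request itself, and the third on a precomputed embedding: exactly the spread of freshness requirements the later slides are about.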
  9. ML platforms: what's preventing ubiquity? What's the appropriate level of complexity the ML platform should expose?
  10. Pattern 1: Offline batch prediction • Flow: a batch job (e.g. Airflow + Spark) reads from the DWH (Snowflake / BigQuery / S3), generates predictions, and writes them back for BI • Use cases: churn prediction, user LTV, risk planning, etc.
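Pattern 1 reduces to a scheduled read-score-write job. A minimal sketch, with hypothetical stand-ins for the warehouse reader, the model, and the prediction writer:

```python
def batch_predict(read_rows, model, write_predictions):
    """One scheduled run (e.g. triggered by Airflow): score every row in one pass."""
    rows = read_rows()                       # e.g. SELECT * FROM feature_table
    preds = [(row["user_id"], model(row)) for row in rows]
    write_predictions(preds)                 # e.g. INSERT INTO predictions
    return len(preds)
```

Because everything is precomputed, predictions are only as fresh as the job schedule, which is exactly the staleness tradeoff the later slides quantify.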
  11. Pattern 2: Online prediction with batch features • Batch features: computed offline, e.g. product embeddings • Use cases: recsys • Flow: a batch job generates features from the DWH, writing them to offline storage and to an online KV store for low-latency access; the prediction service serves app prediction requests using the joined batch features.
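In this pattern the prediction service does no feature computation at request time: just a KV lookup plus inference. A sketch, where an in-memory dict stands in for a real KV store such as Redis:

```python
class PredictionService:
    """Online prediction over features precomputed by an offline batch job."""

    def __init__(self, model, kv_store):
        self.model = model
        self.kv = kv_store          # key -> precomputed batch feature vector

    def predict(self, key):
        features = self.kv.get(key)
        if features is None:
            return None             # cold start: batch job hasn't covered this key
        return self.model(features)
```

The `None` branch is worth noting: batch features silently go missing or stale for new entities, one of the producer/consumer tensions raised later in the talk.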
  12. Pattern 3: Online prediction with on-demand features • On-demand features: queried from transactional stores (e.g. Postgres, Cassandra) at request time, e.g. # orders in the last 30 mins • Use cases: recsys • Flow: a batch job still generates batch features from the DWH, writing them to offline storage and to an online KV store; at request time the prediction service joins those batch features with features queried live from the transactional store to serve app prediction requests.
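The on-demand feature here is not precomputed at all; it is derived from the transactional store when the request arrives. A sketch of the "# orders in the last 30 mins" example, with a list of rows standing in for the real store and field names assumed:

```python
def orders_last_30_mins(order_rows, user_id, now, window_secs=30 * 60):
    """On-demand feature: count of this user's orders in the trailing 30 minutes."""
    return sum(
        1 for r in order_rows
        if r["user_id"] == user_id and now - window_secs < r["ts"] <= now
    )
```

Freshness is perfect, but the scan (or the equivalent SQL query) now sits on the request path, which is where the latency-versus-staleness tradeoff of the later slides comes from.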
  13. Pattern 4: Online prediction with streaming features • Online features: computed online, e.g. distance between two locations, count/percentile over the last 30 mins • Flow: events flow over a real-time transport (logs) into stream feature extraction; a feature service combines the streaming features with batch features (generated by a batch job from the DWH and written to offline storage and an online KV store) for the prediction service to answer app prediction requests.
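A streaming feature such as "count over the last 30 mins" is maintained incrementally as events arrive, rather than queried on demand. A real stream processor (e.g. Flink) manages this window state for you; this hand-rolled sketch only illustrates the idea:

```python
from collections import deque

class SlidingCount:
    """Incrementally maintained count over a trailing time window."""

    def __init__(self, window_secs=30 * 60):
        self.window = window_secs
        self.events = deque()       # event timestamps, oldest first

    def add(self, ts):
        """Called once per incoming event, off the request path."""
        self.events.append(ts)

    def value(self, now):
        """Count of events in (now - window, now]; expired events are evicted."""
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)
```

Reads are now cheap lookups against precomputed state, so this pattern buys both freshness and low request latency, at the cost of running streaming infrastructure.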
  14. Combining offline and online data: transaction behavior over the last 6 months spans both systems along the time axis; the stream retains roughly the last 7 days (T-7 days to now), while the DWH holds T-1 day back to T-6 months.
  15. Combining offline and online data, continued: because the stream covers T-7 days to now and the DWH covers T-1 day to T-6 months, computing a 6-month feature requires stitching the two sources together; this is the backfilling challenge.
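The offline/online split on these slides can be sketched as a backfill plan that assigns each time range to a source. The boundary values (7 days of stream retention, a 6-month total window) are taken from the slide; everything else is a hypothetical illustration:

```python
from datetime import timedelta

def plan_backfill(now, total=timedelta(days=180), recent=timedelta(days=7)):
    """Return (source, start, end) ranges covering the full feature window."""
    return [
        ("warehouse", now - total, now - recent),   # at-rest history from the DWH
        ("stream", now - recent, now),              # in-motion tail from the log
    ]
```

The hard part the following slides address is not producing this plan but guaranteeing that the two transformations (batch and streaming) compute the same feature semantics across the seam.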
  16. Backfill in Lambda Architecture: the data source feeds in-motion compute into online storage (online query, for serving) and at-rest compute into offline storage (offline query, for training); backfill is a mixed query spanning both paths.
  17. Backfill in Lambda Architecture (continued): data source, in-motion compute, at-rest compute, online storage, offline storage, online query (serving), mixed query (backfill), offline query (training).
  18. Backfill in Kappa Architecture: the data source feeds in-motion compute, which can backfill from the historical log; streaming and batch transformations produce materialized views that serve both online queries (serving) and offline queries (training).
  19. Backfill in Kappa Architecture (continued): data source, in-motion compute (backfill from historical log), streaming and batch transformations, materialized views, online query (serving), offline query (training).
  20. Unified Backfill: in-motion compute backfills intelligently from dual sources (the stream and DWH-backed logs); streaming and batch transformations produce materialized views for online queries (serving) and offline queries (training), under a layer of orchestration & governance.
  21. Abstracted Unified Backfill: the same unified backfill (intelligent backfill from dual sources, DWH-backed logs, streaming and batch transformations into materialized views, orchestration & governance), hidden behind the platform abstraction.
  22. Data scientists: build model features • Should I declare features in SQL or Python? • How do I join existing intent classification results to my new feature? • What confidence can I get before checking in my code?
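One answer to the "SQL or Python?" question is a single declarative interface: the user states what the feature is, and the platform decides where and how to compute it (batch, stream, or backfill). A hypothetical sketch; the class and field names are invented for illustration, not any real platform's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Feature:
    """A declarative feature definition the platform can compile either way."""
    name: str
    expr: str                  # a SQL expression or a reference to Python logic
    source: str                # logical table/stream; the platform resolves it
    window: Optional[str] = None

# The 14-day average from the earlier slide, declared rather than implemented:
avg_txn_14d = Feature(
    name="avg_transaction_amount_14d",
    expr="AVG(amount)",
    source="transactions",
    window="14d",
)
```

Because the definition carries intent rather than an execution plan, the same declaration can back both the batch and streaming transformations, which is what makes the unified backfill abstraction possible.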
  23. Does the ML platform speak the same language as its users? ML platforms: what's preventing them from being easy and responsive?
  24. Does the ML platform speak the same language as its users? Questions for ML platforms: • Can users express or declare what they need to control in a single coherent interface? • Can the platform understand the intent and drive the underlying system? • Can user and platform communicate interactively, in a timely fashion? • Can the user understand their options and tradeoffs without reading a 300-page manual? • How much integration effort is needed to plug a model into existing data streams?
  25. Online Prediction: Latency vs. Staleness. Request latency spans feature retrieval, feature computation, prediction computation, and prediction retrieval over the raw data.

     Feature type | Staleness | Latency
     RT Feature | No staleness* | Low (10s of ms to ~1 sec)
     NRT Feature | > secs | Lower (10s-100s of ms)
     Batch Feature | > hours | Lower (10s-100s of ms)

     Footnote: *computation takes time, and latency includes that computation time; feature performance depends on the source technology and shared traffic pattern.
  26. What about tradeoffs? • Three dimensions: correctness, low cost, low latency • You can choose two, and have to be flexible on the third: 1. Fast & Correct, 2. Cheap & Correct, 3. Fast & Cheap • Clean abstractions are needed for full freedom. Reference: Open Problems in Stream Processing: A Call To Action, Tyler Akidau (2019)
  27. Data scientists: deploy model features • Should I duplicate the feature results in a different table? • Which team do I need to inform about the change? • Do I need to worry about training/prediction skew?
  28. ML platforms: what doesn't just work? What symptoms indicate that your platform is not trusted? • My freedom and your responsibility • Producer and consumer tension • Users are forced to choose between basic requirements
  29. You are part of the endeavor to make real-time data useful: the invisible interface • Ubiquitous • Easy and responsive • Just works! https://zhenzhongxu.com/ [email protected]