

AI Observability ツ - Examples with Spark, Pandas, and Scikit-Learn

Presentation to the Data Engineering Melbourne meetup: https://www.meetup.com/fr-FR/Data-Engineering-Melbourne/events/277038718/

In this talk, Andy Petrella will do 50% live coding (Scala, Python) and 50% highlights on best practices to help data teams keep calm and let AI go to production.

Andy is an entrepreneur with a Mathematics and Distributed Data background, focused on unleashing unexploited business potential by leveraging new technologies in machine learning, artificial intelligence, and cognitive systems.

In the data community, Andy is known as an early evangelist of Apache Spark (2011-), the Spark Notebook creator (2013-), a public speaker at various events (Spark Summit, Strata, Big Data Spain), and an O'Reilly author (Distributed Data Science, Data Lineage Essentials, Data Governance, and Machine Learning Model Monitoring).

Andy is the CEO of Kensu, bringing the Data Intelligence Management (DIM) Platform for data-driven companies to leverage AI sustainably, combining AI Observability with Data Usage Catalog.

We’ll take the opportunity to introduce concepts like datastrophes, data as a product, and federated governance.

In the end, you’ll have a grasp of the relationship between monitoring data and AI, data mesh, and advanced data discovery.

Andy Petrella

April 01, 2021

Transcript

  1. © Kensu, inc. 2021 AI Observability ツ Examples with Spark,
    Pandas, and Scikit-Learn

    AGENDA
    1. “Who’s that dude” (73’)
    2. Introduction to `Datastrophes` (10’)
    3. Solution: needs and methods (15’)
    4. Showtime: implementation examples (15’)
  2. © Kensu, inc. 2021 Introduction to `Datastrophes`

    Like any project, a data project needs to limit its scope. To do so, many assumptions are necessary. Those assumptions are made both by the business (about the market) and by the engineering team (about the system), which leads, inevitably, to Datastrophes.

    Catastrophe = denouement of a drama.
    Datastrophe = catastrophe with data.
    Datastrophe = denouement of a DAMA (*). (*) DAta MAnagement
  3. © Kensu, inc. 2021 Datastrophes ⇢? AI Winter 🥶🥶🥶

    “The AI winter was a result of such hype, due to over-inflated promises by developers, unnaturally high expectations from end-users, and extensive promotion in the media.”
    https://www.actuaries.digital/2018/09/05/history-of-ai-winters/
  4. © Kensu, inc. 2021 Datastrophes ⇢? AI Winter 🥶🥶🥶

    Why? The POC syndrome: ROI drops over time. It is not about garbage in, garbage out, or data quality, but about trusting what’s going on with data in production.

    “15% of documentation overhead to ensure compliance and Data Catalog usefulness” -- project manager
    “Data is not available on time in production” -- data ops
    “Data suppliers changed schema or semantics (business definitions), impacting business rule accuracy” -- data engineer
    “The data is different than 6 months ago; all predictions are wrong” -- data scientist

    “Datastrophes”:
    1. Data is hard to find and to use in production
    2. Cost of maintenance reduces team capabilities
    3. Impact assessments are ineffective and incomplete
    4. Data variations are uncontrolled and unknown
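The fourth datastrophe, uncontrolled data variations, can be made concrete with a small pandas sketch. This is an illustrative check, not Kensu's implementation: the function name and the drift tolerance are assumptions. It compares a production batch against a reference batch for the two failures quoted above, a silent schema change and a distribution shift.

```python
import pandas as pd

def check_batch(reference: pd.DataFrame, batch: pd.DataFrame,
                drift_tolerance: float = 0.5) -> list:
    """Return a list of detected issues; empty means the batch looks consistent."""
    issues = []
    # Schema check: a supplier renaming or dropping columns.
    missing = set(reference.columns) - set(batch.columns)
    if missing:
        issues.append(f"schema: missing columns {sorted(missing)}")
    # Distribution check: "the data is different than 6 months ago".
    numeric = reference.select_dtypes("number").columns.intersection(batch.columns)
    for col in numeric:
        ref_mean, ref_std = reference[col].mean(), reference[col].std()
        if ref_std and abs(batch[col].mean() - ref_mean) > drift_tolerance * ref_std:
            issues.append(f"drift: '{col}' mean moved beyond tolerance")
    return issues

reference = pd.DataFrame({"amount": [10.0, 11.0, 9.0, 10.5], "country": ["BE"] * 4})
ok_batch = pd.DataFrame({"amount": [10.2, 9.8, 10.1, 10.6], "country": ["BE"] * 4})
bad_batch = pd.DataFrame({"price": [100.0, 120.0]})  # renamed column
```

In production the same checks would run on every batch and feed an alerting channel instead of returning a list.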
  5. © Kensu, inc. 2021 AI Observability: Wave-Particle Duality

    A. Einstein: “It seems as though we must use sometimes the one theory and sometimes the other, while at times we may use either. We are faced with a new kind of difficulty. We have two contradictory pictures of reality; separately neither of them fully explains the phenomena of light, but together they do.”
  6. © Kensu, inc. 2021 AI Observability

    A Machine Learning model can be seen as:
    - Data: it is a bunch of doubles resulting from the training process on the observations (i.e. the known world).
    - Application: it is used as a function (e.g. to predict).

    Moreover, the behavior of the application part depends on the observations used in the training phase, while our control resides in the hyperparameters we provide (or find) during training. It is as if Java, Scala, Python, R, Go, SQL, etc. code changed automatically with its context, and what it becomes is unknown.
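The duality above can be shown in a few lines with scikit-learn (a minimal sketch with made-up numbers): the same fitted model is data, the doubles produced by training, and an application, the function those doubles define.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Observations: the "known world" the model is trained on.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

model = LinearRegression().fit(X, y)

# The "data" view: a bunch of doubles resulting from training.
learned = (model.coef_, model.intercept_)

# The "application" view: a function used to predict, whose behavior is
# entirely determined by those doubles. Change the observations, and the
# "code" of this function changes with them.
prediction = model.predict(np.array([[5.0]]))
```

Retraining on different observations silently rewrites `learned`, and with it the behavior of `predict`, which is exactly why the model's inputs need to be observed.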
  7. © Kensu, inc. 2021 AI Observability

    I can hear that it is raining cats and dogs. I see a poor person outside walking the street. However, I don’t have to help, as the umbrella already does the job.
    Note: in this case, it is raining Schrödinger’s cats.
  8. © Kensu, inc. 2021 AI Observability

    As with Schrödinger’s cat, an AI system can be considered, after a certain amount of time, good and bad simultaneously, unless an observer looks into it and identifies its real state. However, especially with AI, the question is: what do we have to observe? In other words, which outputs shall we use to infer the internal state?
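One possible answer, sketched here with pandas under assumed facet names: observe lightweight summaries ("facets") of the data flowing through the system at each run, so an observer can later infer the internal state without opening the box.

```python
import pandas as pd

def observe(df: pd.DataFrame) -> dict:
    """Collect lightweight, comparable observations about a dataset."""
    return {
        "rows": len(df),
        # Completeness facet: share of missing values per column.
        "null_rate": {c: float(df[c].isna().mean()) for c in df.columns},
        # Distribution facet: mean of each numeric column (NaNs skipped).
        "numeric_mean": {c: float(df[c].mean())
                         for c in df.select_dtypes("number").columns},
    }

facets = observe(pd.DataFrame({"score": [0.2, 0.9, None],
                               "label": ["a", "b", "a"]}))
```

Comparing such facets across runs is what turns "the system might be good or bad" into an observable state.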
  9. © Kensu, inc. 2021 (some) Links with Data Mesh

    • Responsibility: the domain becomes responsible for the data it exposes → the consumer shares the responsibility by exposing its usages and constraints.
    • Data as a Product: linked to responsibility, SLAs (SLOs) have to be defined and communicated. More importantly, their failures must be detected, or anticipated.
    • Federated Governance: as data products are shared and promoted, (analytical) applications mostly cross several domain boundaries.
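The "data as a product" point can be sketched as code: a hypothetical freshness SLO that the producing domain checks (and a consumer can subscribe to) so a failure is detected before it propagates. The function name and threshold are illustrative, not a Kensu API.

```python
from datetime import datetime, timedelta, timezone

def freshness_slo_ok(last_updated: datetime, max_age: timedelta) -> bool:
    """True if the data product still meets its freshness SLO."""
    return datetime.now(timezone.utc) - last_updated <= max_age

# A product updated 10 minutes ago vs. one stale for 2 days,
# both checked against a 1-hour freshness objective.
recent = datetime.now(timezone.utc) - timedelta(minutes=10)
stale = datetime.now(timezone.utc) - timedelta(days=2)
```

The same pattern extends to other SLOs (completeness, schema stability), each one a small predicate evaluated on observed facets rather than on the data itself.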
  10. © Kensu, inc. 2021 Let’s jump in Jupyter to see HOW!
    Examples in Spark, Pandas, and Scikit-Learn
  11. © Kensu, inc. 2021 Stay calm and let your code/tools speak while they run

    At least 3 strategies have been used successfully (running in prod 😁):
    • Catch events or use the APIs of high-end tools (e.g. Tableau): lineage, for example, is more and more commonly implemented nowadays.
    • Wrap your preferred libraries with auto-logging capabilities: Spark, Pandas, dplyr, Spring, and so on can be beefed up with internal log reporting.
    • Use the OpenTracing philosophy to capture facets from your data usage that can be reconsolidated later on (trace reporter).
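The second strategy can be sketched as a toy wrapper around `pandas.read_csv` (the decorator and the in-memory log are assumptions; a real agent would ship these events to a collector): every read also reports what was read, which is the beginning of lineage.

```python
import functools
import io
import pandas as pd

LINEAGE_LOG = []  # stand-in for a real event sink

def with_logging(read_fn):
    """Wrap a pandas reader so each call also emits a lineage event."""
    @functools.wraps(read_fn)
    def wrapper(source, *args, **kwargs):
        df = read_fn(source, *args, **kwargs)
        LINEAGE_LOG.append({
            "op": read_fn.__name__,
            "source": getattr(source, "name", str(source)),
            "rows": len(df),
            "columns": list(df.columns),
        })
        return df
    return wrapper

read_csv = with_logging(pd.read_csv)

# Existing notebook code keeps its shape; only the reader is swapped.
df = read_csv(io.StringIO("a,b\n1,2\n3,4"))
```

The same idea scales from one function to whole libraries, which is what the auto-logging agents mentioned above do under the hood.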
  12. © Kensu, inc. 2021 Link with Data Intelligence Management (DIM)

    Data Management, or more specifically Data Governance, is often thought of as a Data Catalog (metadata repository, glossary, workflow management, …). DM by essence focuses on the data itself, for example to allow one to find a dataset based on its metadata, e.g. “where is the customers data?” AI Observability allows an organization to also capture the purposes of data usages through the lens of the applications, such that a usage-based catalog allows finding datasets based on purpose, e.g. “how can I predict my churn?”
  13. THANKS! Ping me on @nooostab or LinkedIn. Check out Kensu DIM on https://kensu.io
    🎺 📣 O’Reilly training (4/28): ML Monitoring in Python