Pitfalls in ML in production: Greatest Hits

Flávio Clésio, MSc. - Pipeline Data Engineering Academy
flavioclesio at gmail dot com - fclesio - @flavioclesio

About me
• Staff Data/Machine Learning Engineer
• MSc. Production Engineering (Computational Intelligence)
• Some of my thoughts: flavioclesio.com
• Some conference talks (Strata Hadoop World, Spark Summit, PAPIS.io, The Developers Conference, and others)

Disclaimer #1
All views expressed here are based on my personal, empirical experience in the industry in recent years. Do not take any part of them as hard science, a best-practices playbook for socio-technological systems, cautionary tales, BroScience, or any kind of science at all. The idea is only to share some experiences from a practitioner standpoint.

Disclaimer #2
All views expressed in this presentation are my own. They have not been reviewed or approved by my current, past, or future employers. I do not speak on behalf of any company.

Agenda
• Moonshot ML Projects
• Lack of experimentation protocols
• The late 90’s nostalgic ex-dev, now in upper mgt
• Lack (or poor) SWE practices
• Be heavily invested in tools instead of ecosystems
• Canonical Models (God Model)
• “Lab Mentality”
• Project not related with any business result
• Thinking Algos before Manual + Rules + Heuristics
• Product Management clueless about ML projects
• Final Remarks

Intro - Why Greatest Hits?
DS/ML/DE projects aren’t easy, and in the ocean of resources around the interwebs there are tons of materials. Unfortunately, there are also a lot of untold, hidden truths and corporate agendas, all mixed with hype, and together they bring people into an uncanny valley that can lead to failed ML projects. This presentation talks about some of the greatest of those pitfalls... Source: ML In Production

Moonshot ML Projects
Characteristics:
- Top-down (most of the time after someone attends an O’Reilly conference keynote)
- Tries to solve a super hard problem like any FAANG would, while having fewer employees than those companies have engineers (Source)
- Starts as a big headline in a corporate email and ends up as a death march with shattered teams (people quitting or being fired)
What to do:
- Intellectual stamina is everything
- Practice the fine art of “Boss Management” and tell the truth
- If in doubt, consider another place to work where you can bring real value

Lack of experimentation protocols
Characteristics:
- Deploy and forget
- Lack of success measures
- Decisions based on little data or, worse, on “gut” feeling
- Vanity metrics or bad proxies used as indicators of success
What to do:
- Build your ideas around different experimentation protocols (e.g. Concierge Tests, Alpha, Shadow Mode, Early Adopters program, etc.)
- Education is the key to changing the culture
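As an illustration of one protocol from the list above, here is a minimal sketch of Shadow Mode, assuming models are plain callables. The names `serve_with_shadow`, `agreement_rate`, `live_rule`, and `candidate` are hypothetical and only for illustration: the live model answers the user, while the candidate runs silently and its predictions are logged for later comparison.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("shadow_mode")

Features = Dict[str, float]
Model = Callable[[Features], int]


def serve_with_shadow(
    features: Features,
    live_model: Model,
    shadow_model: Model,
    shadow_log: List[dict],
) -> int:
    """Return the live prediction; record the shadow prediction for later analysis."""
    live_pred = live_model(features)
    try:
        shadow_pred = shadow_model(features)
        shadow_log.append({"features": features, "live": live_pred, "shadow": shadow_pred})
    except Exception:
        # A failing shadow model must never affect the user-facing answer.
        logger.exception("shadow model failed")
    return live_pred


def agreement_rate(shadow_log: List[dict]) -> float:
    """Share of logged requests where live and shadow predictions match."""
    if not shadow_log:
        return 0.0
    agree = sum(1 for row in shadow_log if row["live"] == row["shadow"])
    return agree / len(shadow_log)


# Hypothetical stand-ins for a production model and a candidate model.
def live_rule(features: Features) -> int:
    return int(features["amount"] > 100)


def candidate(features: Features) -> int:
    return int(features["amount"] > 80)
```

The agreement rate is only a starting point; in practice the logged pairs would be joined with real outcomes before deciding whether the candidate should replace the live model.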

The late 90’s nostalgic ex-dev, now in upper mgt
Characteristics:
- Someone who had some technical background in a remote past and climbed the corporate ladder to a management position (not to be confused with ageism)
- Most of the time confuses practical tips with patronization
- In reality, a person with good intentions who wants to collaborate in technical terms but has some knowledge debt
What to do:
- Patiently explain the limitations of some outdated approaches
- Persuasion over conflict (you still need the job)
- Be open-minded and put their ideas to the test, and if they don’t work, test yours

Lack (or poor) SWE practices
Characteristics:
- eXtreme Go Horse Methodology (XGH)
- No testing, no logging, no CI, no CD, no git, and data on a local machine
- All the work lives in “Untitled38.ipynb”
What to do:
- Explain that notebooks are great for prototyping, but not suitable for production if you’re not Netflix
- Pairing + Mobbing with Data Scientists
- Explain to your boss the overhead of not having these practices
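A minimal sketch of what moving logic out of a notebook can look like, assuming a hypothetical feature transformation: a plain function with input validation, logging, and a test that any CI job could run. The names `log_scale` and `test_log_scale` are illustrative only.

```python
import logging
import math
from typing import Iterable, List

logger = logging.getLogger("features")


def log_scale(values: Iterable[float]) -> List[float]:
    """Apply log1p to non-negative values, failing loudly on bad input."""
    out = []
    for v in values:
        if v < 0:
            raise ValueError(f"negative value not allowed: {v}")
        out.append(math.log1p(v))
    logger.info("scaled %d values", len(out))
    return out


def test_log_scale() -> None:
    """A tiny test that documents the expected behaviour."""
    assert log_scale([0.0]) == [0.0]
    assert log_scale([]) == []
    try:
        log_scale([-1.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for negative input")
```

Once the function lives in a module under version control, the notebook can import it, and the same code path is exercised in prototyping, testing, and production.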

Be heavily invested in tools instead of ecosystems
Characteristics:
- Entire pipeline relying on a single “canonical data cloud vendor”...
- or language...
- or single tool (a.k.a. a one-way ticket to eternity)
What to do:
- The best tool is not just the one that does the job, but the one that does it more efficiently
- If you can, choose ecosystem over “language efficiency” (e.g. convenience, RDD)
- If you’re not a FAANG (Tesla) and don’t have strict requirements (e.g. low latency in the inference phase, embedded ML, etc.), pick the tool or language with the biggest ecosystem around the problem you’re trying to solve

Canonical Models (God Model)
Characteristics:
- No reproducibility and/or explainability
- Overly complex model artifact with convoluted data pre-processing
- Model training that takes forever, wasting more CO₂ than a Boeing 747-400
What to do:
- Explain the power of ensembles
- Sometimes it’s better to pay an extra engineering price in ensembling for the sake of simplicity than to be heavily invested in a single model artifact
- Hierarchical Classification with Local Classifiers might help in some cases
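To make the last suggestion concrete, here is a minimal sketch of hierarchical classification with one local classifier per parent node. The toy keyword classifiers and category names are hypothetical stand-ins; in a real system each of them would be a small trained model.

```python
from typing import Callable, Dict, Optional, Tuple

Classifier = Callable[[str], str]


def parent_clf(text: str) -> str:
    """Toy top-level classifier (hypothetical): picks a coarse category."""
    words = text.lower()
    return "electronics" if ("phone" in words or "laptop" in words) else "clothing"


def electronics_clf(text: str) -> str:
    """Toy local classifier used only for items routed to 'electronics'."""
    return "phone" if "phone" in text.lower() else "laptop"


def clothing_clf(text: str) -> str:
    """Toy local classifier used only for items routed to 'clothing'."""
    return "shoes" if "shoe" in text.lower() else "shirt"


LOCAL_CLASSIFIERS: Dict[str, Classifier] = {
    "electronics": electronics_clf,
    "clothing": clothing_clf,
}


def classify(
    text: str,
    top: Classifier = parent_clf,
    local: Dict[str, Classifier] = LOCAL_CLASSIFIERS,
) -> Tuple[str, Optional[str]]:
    """Route the item through the parent, then through that parent's local classifier."""
    coarse = top(text)
    local_clf = local.get(coarse)
    # If no local classifier exists for this parent, return only the coarse label.
    fine = local_clf(text) if local_clf is not None else None
    return coarse, fine
```

Each local classifier can be retrained, tested, and explained on its own, which is the simplicity argument made above against a single all-knowing artifact.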

“Lab Mentality”
Characteristics:
- Judging project success through the lens of an academic standpoint instead of the users + business standpoint
- Trapped in the “fantastic world of Offline Evaluation”
- Fake, excessive rigor mixed with clueless stakeholders (a.k.a. flooding stakeholders with technicalities)
What to do:
- Explain that the corporate world is a blend of business pragmatism and the innovation mentality from academia
- Unless you work in some FAANG research lab or as a Research Engineer, be aware that you have to deliver something
- Offline experimentation still has great value but is not a panacea

Project not related with any business result
Characteristics:
- No revenue
- No relation with any process optimization
- No relation with a single business KPI
- No deadline
- No product or user benefit
What to do:
- Unless you’re in some research lab or working as a research engineer, stay away from those projects
- Be eager to bring those initiatives closer to the business to generate value in a positive way
- In extreme cases, consider changing teams, because once management discovers who is slacking or distant from the business it will axe the redundant people

Thinking Algos before Manual + Rules + Heuristics
Characteristics:
- A complete lack of a production-wise approach
- A naive impression that a single model artifact trained with code forked from Kaggle will encapsulate all the dynamics represented in the data and all the business complexity
- Stale models in Jupyter notebooks being presented as “AI”
What to do:
- A starting pipeline with heuristics, rules + deployment from day zero not only gives faster feedback but also generates momentum
- It’s easier to convince upper management with concrete results from a running product than with PowerPoint slides
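A minimal sketch of the rules-first idea, assuming a hypothetical transaction-review use case. The thresholds and the names `Transaction`, `rule_based_decision`, and `decide` are made up for illustration. The point is that the rule baseline and a future model share one interface, so the pipeline can ship on day zero and the model can be plugged in later.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transaction:
    amount: float
    account_age_days: int


Decision = str  # "approve" or "review"
Model = Callable[[Transaction], Decision]


def rule_based_decision(tx: Transaction) -> Decision:
    """Hand-written heuristics (illustrative thresholds, not real business rules)."""
    if tx.amount > 10_000:
        return "review"
    if tx.account_age_days < 7 and tx.amount > 1_000:
        return "review"
    return "approve"


def decide(tx: Transaction, model: Optional[Model] = None) -> Decision:
    """Use the model when one is deployed; otherwise fall back to the rules."""
    if model is not None:
        return model(tx)
    return rule_based_decision(tx)
```

The rule baseline also gives any later model a concrete number to beat, which keeps the discussion with management anchored in results from the running product.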

Product Management clueless about ML projects
Characteristics:
- Ceremonies over running code
- Tons of PoCs, a few MVPs, and no system
- Not embracing the uncertainty related to this kind of project
- Projects starting with no feasibility analysis
- No consideration of data quality
- Projects trapped in the “sunk cost” fallacy, not embracing the fact that ML projects fail
What to do:
- Be patient and explain the specificities of the field, because Project Management as a discipline is at least 10 years behind modern ML/AI projects
- If possible, use a shotgun approach and experiment. Most ML products are not about being right all the time, but about being right just once
- There is a lot of low-hanging fruit out there. Start small there and scale according to the results

Final Remarks
- A model in production getting results with real users/processes beats PowerPoint presentations and cheap buzzword rhetoric
- It’s a new discipline for everyone: Project Managers, Data Scientists, ML Engineers. Be humble and accept the fact that sometimes people do not have all the answers
- Embrace failure from day zero
