Machine Learning - the only expertise you need

Modern Data Science practice: Machine Learning - the only expertise you need
Big Data Expo 2022, Utrecht
Alexey Chaplygin, Chief Technology & Product Officer @ expondo GmbH

Alexey Chaplygin, Chief Technology & Product Officer @ expondo GmbH
• Chief Information Officer @ Reface AI
• Data Science Manager @ PVH Europe
• Software Developer / Data Science / Machine Learning Engineer @ Booking.com, Vrije Universiteit Amsterdam, ASML, SAP AG and others

• 120+ mln EUR revenue
• 400+ exponDOers
• HQ in Berlin
• Offices in Warsaw, Zielona Góra (PL), Shanghai and Hong Kong
• Very remote friendly!
Key facts:
• Procurement from 400+ partners in China, Vietnam, India and EU
• Product QA control
• Logistics to own warehouse in Poland
• Own digital production and marketing
• Sales via own web platform and marketplaces
• Own customer care and product aftercare

Company Values

Modern Data Science Practice from scratch

Data Science vs Machine Learning
Findings:
• Neural Networks perform worse on small datasets
• Neural Networks are smoothing functions
• Neural Networks are affected more by noisy inputs
Conclusion: Stick to XGBoost and Random Forest
Find good Machine Learning experts!

Data Science vs Machine Learning
• Neural Networks perform worse on small datasets
Medium-size business: 10,000,000 EUR revenue at 100 EUR per customer gives 100,000 sales points per year. The number of impressions, touch-points and events generated by each customer is 100 times bigger.
• Neural Networks are smoothing functions
• Neural Networks are affected more by noisy inputs
If you don't know how to cook them and follow only the bookish approach.
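The back-of-the-envelope numbers on this slide can be reproduced in a few lines. This is only a sketch of the arithmetic; the variable names are illustrative, and "100 EUR per customer" is read here as the average value of one sale.

```python
# Figures taken from the slide; variable names are illustrative.
annual_revenue_eur = 10_000_000
revenue_per_sale_eur = 100      # assumption: "per customer" means per sale
events_per_sale = 100           # "100 times bigger", as stated on the slide

# Number of sales data points generated per year.
sales_points = annual_revenue_eur // revenue_per_sale_eur

# Impressions, touch-points and events generated around those sales.
events = sales_points * events_per_sale

print(f"sales points per year: {sales_points:,}")
print(f"events per year:       {events:,}")
```

Under these assumptions a medium-size business already produces data at the scale of millions of events per year.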

Practical Experiments
Experiment #1 – fit the known function: using gradient descent, find coefficients a, b, c, d
Experiment #2 – find the unknown function: from a random set X consisting of 256 points in [0,1], knowing f(X), find a function g(x) such that g(X) = f(X)
Experiment #3 – find the unknown space of functions: from random sets X consisting of n random points in [0,1], where n is between 1 and 256, knowing f(X), find coefficients a, b, c, d of the function describing those points

Experiment #1 – fit known
Experiment #1 – fit the known function: using gradient descent, find coefficients a, b, c, d
To make it work:
1. Adam -> RMSProp
2. BatchSize -> 1
3. LearningRate -> gradually from 1 to .01
Error Space: [figure]
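The three adjustments listed for Experiment #1 can be sketched in plain NumPy. The transcript does not preserve the function being fitted, so the cubic a*x^3 + b*x^2 + c*x + d below is an assumption, as are the geometric learning-rate schedule, the step count, the sampling of x from [0,1] and every function name. This is a minimal illustration, not the speaker's code.

```python
import numpy as np

def target(x, coeffs):
    """Hypothetical known function: a*x^3 + b*x^2 + c*x + d (an assumption)."""
    a, b, c, d = coeffs
    return a * x**3 + b * x**2 + c * x + d

def fit_rmsprop(true_coeffs, steps=5000, lr_start=1.0, lr_end=0.01,
                decay=0.9, eps=1e-8, seed=0):
    """Fit a, b, c, d with RMSProp, batch size 1 and a decaying learning rate."""
    rng = np.random.default_rng(seed)
    coeffs = np.zeros(4)       # initial guess for a, b, c, d
    mean_sq = np.zeros(4)      # RMSProp running average of squared gradients
    # Assumed schedule: geometric decay from lr_start down to lr_end.
    for lr in np.geomspace(lr_start, lr_end, steps):
        x = rng.random()       # batch size 1: a single sample per update
        error = target(x, coeffs) - target(x, true_coeffs)
        basis = np.array([x**3, x**2, x, 1.0])
        grad = 2.0 * error * basis              # gradient of the squared error
        mean_sq = decay * mean_sq + (1.0 - decay) * grad**2
        coeffs -= lr * grad / (np.sqrt(mean_sq) + eps)
    return coeffs

def mse(coeffs, true_coeffs, n=1000):
    """Mean squared error between fitted and true function on a grid in [0,1]."""
    x = np.linspace(0.0, 1.0, n)
    return float(np.mean((target(x, coeffs) - target(x, true_coeffs))**2))
```

Calling `fit_rmsprop(np.array([2.0, -1.0, 0.5, 3.0]))` and comparing `mse` before and after gives a quick check of whether the schedule helps. Because x, x^2 and x^3 are strongly correlated on [0,1], the fitted curve can match well even when individual coefficients are still off.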

Experiment #2 – fit unknown
Experiment #2 – find the unknown function: from a random set X consisting of 256 points in [0,1], knowing f(X), find a function g(x) such that g(X) = f(X)
To make it work:
1. Adam -> RMSProp
2. BatchSize -> 1
3. LearningRate -> .001
Only interpolation!
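A small network trained with the settings from Experiment #2 might look like the sketch below. The slide names neither the function nor the architecture, so the sine target, the single tanh hidden layer of 16 units, the epoch count and all names are assumptions made for illustration only.

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the unknown function (an assumption).
    return np.sin(2.0 * np.pi * x)

def init_params(hidden=16, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 1.0, hidden),   # input -> hidden weights
        "b1": np.zeros(hidden),               # hidden biases
        "w2": rng.normal(0.0, 0.1, hidden),   # hidden -> output weights
        "b2": np.zeros(1),                    # output bias
    }

def predict(params, x):
    x = np.atleast_1d(x)
    hidden = np.tanh(np.outer(x, params["w1"]) + params["b1"])
    return hidden @ params["w2"] + params["b2"][0]

def train(params, xs, ys, epochs=20, lr=0.001, decay=0.9, eps=1e-8, seed=0):
    """RMSProp with batch size 1 and a constant learning rate of 0.001."""
    rng = np.random.default_rng(seed)
    cache = {k: np.zeros_like(v) for k, v in params.items()}
    for _ in range(epochs):
        for i in rng.permutation(len(xs)):    # one sample per update
            x, y = xs[i], ys[i]
            h = np.tanh(params["w1"] * x + params["b1"])
            err = h @ params["w2"] + params["b2"][0] - y
            # Backpropagate the squared error through the tanh layer.
            dpre = 2.0 * err * params["w2"] * (1.0 - h**2)
            grads = {"w1": dpre * x, "b1": dpre,
                     "w2": 2.0 * err * h, "b2": np.array([2.0 * err])}
            for k in params:
                cache[k] = decay * cache[k] + (1.0 - decay) * grads[k]**2
                params[k] -= lr * grads[k] / (np.sqrt(cache[k]) + eps)
    return params
```

To probe the "only interpolation" remark, one could train on 256 random points from [0,1] and then compare `predict` against `f` both inside that interval and outside it, for example on [1, 1.5].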

Experiment #3 – fit them all
Experiment #3 – find the unknown space of functions: from random sets X consisting of n random points in [0,1], where n is between 1 and 256, knowing f(X), find coefficients a, b, c, d of the function describing those points
To make it work – classic setup!
Raw Data -> Feature Engineering -> Extracted Features -> Regression Model
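The classic pipeline named on this slide can be illustrated with a deliberately simple stand-in. The sketch assumes the function is a cubic a*x^3 + b*x^2 + c*x + d (the transcript does not preserve it), uses power moments as the hand-made features, and lets an ordinary least-squares solve play the role of the regression model. All names are hypothetical, and the speaker's actual features and model are not known.

```python
import numpy as np

def extract_features(x, y):
    """Feature engineering: map a variable-length set of points to fixed-length vectors."""
    # Power moments mean(x^k) for k = 0..6 and cross moments mean(x^k * y) for k = 0..3.
    x_moments = np.array([np.mean(x**k) for k in range(7)])
    xy_moments = np.array([np.mean(x**k * y) for k in range(4)])
    return x_moments, xy_moments

def regress_coefficients(x_moments, xy_moments):
    """Regression model: solve the normal equations built from the features."""
    # gram[j, k] = mean(x^(j + k)); the unknowns are ordered d, c, b, a.
    gram = np.array([[x_moments[j + k] for k in range(4)] for j in range(4)])
    solution, *_ = np.linalg.lstsq(gram, xy_moments, rcond=None)
    d, c, b, a = solution
    return np.array([a, b, c, d])

def fit_set(x, y):
    """Raw data -> extracted features -> regression model, for one point set."""
    return regress_coefficients(*extract_features(x, y))
```

The point of the design is that the feature vector has the same length whether a set holds 4 points or 256, so one model serves every set. With fewer than four distinct points the system is underdetermined, and `lstsq` then returns a minimum-norm solution rather than the true coefficients.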

Dynamic Pricing
The goal: for each product, each sales channel and each country, find a function (price-demand elasticity) that depends on price and [all other data available] and whose output is sales density.
• Product Master: vector
• Product Image: matrix
• Sales History: sequence of n vectors
• Marketing: constant
Classic stack
• Feature Engineering: 2 FTE, SQL/Python (pandas)
• Modelling: 1 FTE, Data Science
• Deployment: 1 FTE, Python Engineering
• Total: 4 FTE, 3 disciplines
Machine Learning stack
• Modelling: 2 FTE, Machine Learning Research
• Deployment: 1 FTE, Machine Learning Engineering
• Total: 3 FTE, 1.5 disciplines

Data Science vs Machine Learning
Data Science: Prepare the raw data sources; from each data source, manually extract a vector of features with the same key to join; build a Data Science Model using the features as input; deploy the model.
Machine Learning: Prepare the raw data sources; build a Machine Learning Model, which automatically extracts vectors of features in its shallow layers and maps them onto the target space in its deep layers; deploy the model.
Machine Learning is Data Science with automated feature engineering!

Why Machine Learning as a core practice?
Pros:
• Narrow stack
• Shared knowledge, lower bus-factor risk
• Machine Learning specialists can usually do Data Science, but not the opposite
• Machine Learning specialists are better coders than Data Scientists
• Industry invests a lot in GPUs, TPUs, mobile "TensorCores" and other hardware accelerators for Machine Learning
Cons:
• Knowledge is scarce, both in management and execution
• Seniority is required to keep the same speed and quality of development and model interpretability

THANK YOU FOR YOUR TIME!
