"Haute Couture" and "Prêt-à-Porter" Data Science
Christophe Bourguignat
April 15, 2016
Technology
Talk given at Telecom ParisTech in April 2016
Transcript
Christophe Bourguignat
zelros.com / [email protected] / @zelrosHQ
Agenda
- Models interpretation
- Models production
- A short history of Kaggle
MODELS INTERPRETATION
WHY?
Model opacity is a major cause of rejection by users.
Unfortunately, the most powerful predictive models are usually the least interpretable.
FEATURE IMPORTANCE
AEROSOLVE (Airbnb)
Prior = general belief, held before looking at the data.
Inform the model of our prior beliefs by adding them to a text configuration file during training.
Scikit Learn: March 2014, April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXAMPLE ON BOSTON DATASET
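The idea behind the treeinterpreter code linked above is that a tree prediction can be written as a bias (the mean at the root) plus one contribution per feature, collected along the decision path. The sketch below illustrates that decomposition on a hand-built toy tree. It does not use the treeinterpreter library or the Boston data; the tree, the feature names `rooms` and `crime`, and all numbers are made up for illustration.

```python
# Toy regression tree as nested dicts. Every node stores the mean target
# value of the training samples that reach it. All numbers are invented.
TREE = {
    "value": 22.0, "feature": "rooms", "threshold": 6.5,
    "left": {
        "value": 18.0, "feature": "crime", "threshold": 5.0,
        "left": {"value": 20.0},
        "right": {"value": 13.0},
    },
    "right": {"value": 30.0},
}


def decompose(tree, sample):
    """Return (prediction, bias, contributions) for one sample.

    The bias is the root mean. Each split on the path adds the change in
    node mean (child minus parent) to the feature used by that split.
    """
    bias = tree["value"]
    contributions = {}
    node = tree
    while "feature" in node:  # leaves have no "feature" key
        feature = node["feature"]
        go_left = sample[feature] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return node["value"], bias, contributions


prediction, bias, contributions = decompose(TREE, {"rooms": 6.0, "crime": 7.2})
print(prediction, bias, contributions)
```

By construction, the bias plus the sum of the contributions equals the prediction, which is what makes a single prediction explainable feature by feature. For a forest, the same quantities would be averaged over the trees.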
Prediction Intervals for Random Forests
http://blog.datadive.net/prediction-intervals-for-random-forests/
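One common way to get a prediction interval from a random forest is to look at the spread of the individual trees' predictions for a sample and take percentiles of that distribution. The sketch below assumes this percentile approach. The per-tree predictions are a made-up list; with a real forest they would come from each fitted tree, and the helper names `percentile` and `prediction_interval` are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def prediction_interval(per_tree_predictions, coverage=90):
    """Return (low, high) bounds covering `coverage` percent of the trees."""
    tail = (100 - coverage) / 2.0
    low = percentile(per_tree_predictions, tail)
    high = percentile(per_tree_predictions, 100 - tail)
    return low, high


# Invented predictions of 11 trees for one sample.
per_tree = [19.5, 21.0, 20.2, 24.8, 22.1, 18.9, 23.4, 20.7, 21.9, 22.6, 20.0]
low, high = prediction_interval(per_tree, coverage=80)
print(low, high)
```

A wide interval signals that the trees disagree on this sample, which is useful information to show next to the point prediction.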
PRODUCTION
TRADITIONAL B.I. DEPARTMENT: DATA ANALYSTS, ETL ENGINEERS, DBAs
“INFINITE LOOP OF SADNESS”: DATA SCIENTISTS, IT / DATA ENGINEERS, SOFTWARE ENGINEERS, BUSINESS
http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT
- Underutilized features
- Undeclared consumers
- Pipeline jungles: preparing data in an ML-friendly format
http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS
- Unseen category
- Non-reproducible feature engineering workflow (PMML)
- Leakage in database fields (churn)
- Monitoring
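The "unseen category" failure happens when a categorical value appears at prediction time that was never present in the training data, and the encoder crashes or silently mis-encodes it. A minimal sketch of one defensive option follows: reserve an index for unknown values. The class name `SafeCategoryEncoder` and the browser values are hypothetical, not taken from the talk.

```python
class SafeCategoryEncoder:
    """Integer-encode categories, mapping unseen values to a reserved index."""

    UNKNOWN = 0  # index reserved for values not seen during fit

    def fit(self, values):
        # Known categories get indices 1..n in sorted order.
        categories = sorted(set(values))
        self.mapping = {value: i for i, value in enumerate(categories, start=1)}
        return self

    def transform(self, values):
        # dict.get falls back to UNKNOWN instead of raising KeyError.
        return [self.mapping.get(value, self.UNKNOWN) for value in values]


encoder = SafeCategoryEncoder().fit(["chrome", "firefox", "safari", "chrome"])
encoded = encoder.transform(["firefox", "brave"])  # "brave" was never seen
print(encoded)
```

Counting how often the reserved index is hit in production is also a cheap monitoring signal: a rising rate suggests the live data has drifted away from the training data.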
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
Refinements:
- hashing function
- adaptive learning rate (different flavours)
- Vowpal Wabbit
- Dropout
- PyPy
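The first two refinements, a hashing function and an adaptive learning rate, can be sketched together as an online logistic regression in plain Python (which is also the kind of code that benefits from PyPy). This is an illustration under stated assumptions, not the code shown in the talk: the bucket count, the base rate `ALPHA`, the per-feature decay rule (one of several possible flavours) and the tiny click stream are all made up.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 18  # size of the hashed weight vector (assumed)
ALPHA = 0.1            # base learning rate (assumed)

weights = [0.0] * NUM_BUCKETS
update_counts = [0.0] * NUM_BUCKETS  # how often each weight was updated


def hash_features(row):
    """Hashing trick: map each 'field=value' string to a weight index."""
    return [
        zlib.crc32(f"{field}={value}".encode("utf-8")) % NUM_BUCKETS
        for field, value in row.items()
    ]


def predict(indices):
    """Logistic prediction; the score is clipped to avoid overflow in exp."""
    score = sum(weights[i] for i in indices)
    score = max(min(score, 35.0), -35.0)
    return 1.0 / (1.0 + math.exp(-score))


def update(indices, probability, label):
    """One SGD step with a per-feature learning rate that decays with use."""
    for i in indices:
        step = ALPHA / (math.sqrt(update_counts[i]) + 1.0)
        weights[i] -= (probability - label) * step
        update_counts[i] += 1.0


# Invented stream: one pattern always clicks, the other never does.
stream = [
    ({"site": "news", "device": "mobile"}, 1),
    ({"site": "shop", "device": "desktop"}, 0),
] * 50

for row, label in stream:
    indices = hash_features(row)
    update(indices, predict(indices), label)

p_click = predict(hash_features({"site": "news", "device": "mobile"}))
p_no_click = predict(hash_features({"site": "shop", "device": "desktop"}))
print(p_click, p_no_click)
```

Hashing keeps memory fixed no matter how many distinct categorical values appear, and the model sees each row only once, so the data never has to fit in memory. `zlib.crc32` is used instead of the built-in `hash` because string hashing in Python is randomized per process.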
QUESTIONS?
zelros.com / [email protected] / @zelrosHQ