Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
"Haute Couture" and "Prêt-à-Porter" Data Science
Search
Christophe Bourguignat
April 15, 2016
Technology
0
470
"Haute Couture" and "Prêt-à-Porter" Data Science
Talk given @ Telecom ParisTech on April 2016
Christophe Bourguignat
April 15, 2016
Tweet
Share
More Decks by Christophe Bourguignat
See All by Christophe Bourguignat
Adding Neurons to your Assistants
kriss
1
360
Software Engineers, the New Data Scientists
kriss
1
140
Machine Learning for Chief Future Officers
kriss
1
140
Whitening The Blackbox : Why And How To Explain Machine Learning Predictions ?
kriss
1
1.2k
Building a Data Science Team
kriss
2
410
Lean Machine Learning
kriss
5
770
Kaggle Criteo Challenge and Online Learning
kriss
1
290
The #FrenchData landscape
kriss
0
490
Other Decks in Technology
See All in Technology
実装で解き明かす並行処理の歴史
zozotech
PRO
1
550
「Verify with Wallet API」を アプリに導入するために
hinakko
1
250
オープンソースでどこまでできる?フォーマル検証チャレンジ
msyksphinz
0
100
ACA でMAGI システムを社内で展開しようとした話
mappie_kochi
1
290
多野優介
tanoyusuke
1
470
SwiftUIのGeometryReaderとScrollViewを基礎から応用まで学び直す:設計と活用事例
fumiyasac0921
0
150
Azure SynapseからAzure Databricksへ 移行してわかった新時代のコスト問題!?
databricksjapan
0
150
Why Governance Matters: The Key to Reducing Risk Without Slowing Down
sarahjwells
0
110
AI駆動開発を推進するためにサービス開発チームで 取り組んでいること
noayaoshiro
0
220
職種別ミートアップで社内から盛り上げる アウトプット文化の醸成と関係強化/ #DevRelKaigi
nishiuma
2
140
SoccerNet GSRの紹介と技術応用:選手視点映像を提供するサッカー作戦盤ツール
mixi_engineers
PRO
1
190
o11yで育てる、強い内製開発組織
_awache
3
120
Featured
See All Featured
What’s in a name? Adding method to the madness
productmarketing
PRO
23
3.7k
GraphQLとの向き合い方2022年版
quramy
49
14k
Embracing the Ebb and Flow
colly
88
4.8k
Optimizing for Happiness
mojombo
379
70k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.7k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.7k
Building Adaptive Systems
keathley
43
2.8k
Rebuilding a faster, lazier Slack
samanthasiow
84
9.2k
Music & Morning Musume
bryan
46
6.8k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.5k
Transcript
Christophe Bourguignat zelros.com /
[email protected]
/ @zelrosHQ
None
Agenda Models interpretation Models production A short history of Kaggle
MODELS INTERPRETATION
WHY ? Models opacity is a major reject cause by
users Unfortunately, predictive models that are the most powerful are usually the least interpretable
None
None
None
FEATURE IMPORTANCE
None
None
None
AEROSOLVE (AirBnb) Prior = general belief, before looking at the
data Inform the model of our prior beliefs by adding them to a text configuration file during training
None
None
None
Scikit Learn
Scikit Learn March 2014
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXEMPLE ON BOSTON DATASET
None
http://blog.datadive.net/prediction-intervals-for-random-forests/ Prediction Intervals for Random Forests
None
None
PRODUCTION
None
None
TRADITIONAL B.I. DEPARTMENT DATA ANALYSTS ETL ENGINEER DBAs
“INFINITE LOOP OF SADNESS” DATA SCIENTISTS IT / DATA ENGINEERS
SOFTWARE ENGINEERS BUSINESS http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT Underutilized features Undeclared consumers Pipeline Jungles
- preparing data in a ML-friendly format http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS Unseen category Unreproductible feat eng workflow (PMML) Leakage
in DataBase fields (churn) Monitoring
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
None
None
None
None
None
None
None
Refinements : - hashing function - adaptive learning rate (different
flavours) - Vowpal Wabbit - Dropout - PyPy
None
None
None
None
None
None
None
None
QUESTIONS ? zelros.com /
[email protected]
/ @zelrosHQ