Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
"Haute Couture" and "Prêt-à-Porter" Data Science
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Christophe Bourguignat
April 15, 2016
Technology
480
0
Share
"Haute Couture" and "Prêt-à-Porter" Data Science
Talk given @ Telecom ParisTech on April 2016
Christophe Bourguignat
April 15, 2016
More Decks by Christophe Bourguignat
See All by Christophe Bourguignat
Adding Neurons to your Assistants
kriss
1
370
Software Engineers, the New Data Scientists
kriss
1
150
Machine Learning for Chief Future Officers
kriss
1
140
Whitening The Blackbox : Why And How To Explain Machine Learning Predictions ?
kriss
1
1.2k
Building a Data Science Team
kriss
2
420
Lean Machine Learning
kriss
5
790
Kaggle Criteo Challenge and Online Learning
kriss
1
300
The #FrenchData landscape
kriss
0
500
Other Decks in Technology
See All in Technology
Contract One Engineering Unit 紹介資料
sansan33
PRO
0
16k
マルチエージェント × ハーネスエンジニアリング × GitLab Duo Agent Platformで実現する「AIエージェントに仕事をさせる時代へ。」 / 20260421 GitLab Duo Agent Platform
n11sh1
0
140
The Journey of Box Building
tagomoris
4
310
CDK Insightsで見る、AIによるCDKコード静的解析(+AI解析)
k_adachi_01
2
180
明日からドヤれる!超マニアックなAWSセキュリティTips10連発 / 10 Ultra-Niche AWS Security Tips
yuj1osm
0
530
ARIA Notifyについて
ryokatsuse
1
120
ぼくがかんがえたさいきょうのあうとぷっと
yama3133
0
180
AIペネトレーションテスト・ セキュリティ検証「AgenticSec」ご紹介資料
laysakura
0
3.9k
AIを共同作業者にして書籍を執筆する方法 / How to Write a Book with AI as a Co-Creator
ama_ch
2
120
[OpsJAWS 40]リリースしたら終わり、じゃなかった。セキュリティ空白期間をAWS Security Agentで埋める
sh_fk2
3
220
AI駆動1on1〜AIに自分を育ててもらう〜
yoshiakiyasuda
0
120
扱える不確実性を増やしていく - スタートアップEMが考える「任せ方」
kadoppe
0
260
Featured
See All Featured
BBQ
matthewcrist
89
10k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
199
73k
Winning Ecommerce Organic Search in an AI Era - #searchnstuff2025
aleyda
1
2k
Leveraging Curiosity to Care for An Aging Population
cassininazir
1
220
Are puppies a ranking factor?
jonoalderson
1
3.3k
Code Reviewing Like a Champion
maltzj
528
40k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Collaborative Software Design: How to facilitate domain modelling decisions
baasie
0
200
Information Architects: The Missing Link in Design Systems
soysaucechin
0
880
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.8k
How to Ace a Technical Interview
jacobian
281
24k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
23k
Transcript
Christophe Bourguignat zelros.com /
[email protected]
/ @zelrosHQ
None
Agenda Models interpretation Models production A short history of Kaggle
MODELS INTERPRETATION
WHY ? Models opacity is a major reject cause by
users Unfortunately, predictive models that are the most powerful are usually the least interpretable
None
None
None
FEATURE IMPORTANCE
None
None
None
AEROSOLVE (AirBnb) Prior = general belief, before looking at the
data Inform the model of our prior beliefs by adding them to a text configuration file during training
None
None
None
Scikit Learn
Scikit Learn March 2014
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXEMPLE ON BOSTON DATASET
None
http://blog.datadive.net/prediction-intervals-for-random-forests/ Prediction Intervals for Random Forests
None
None
PRODUCTION
None
None
TRADITIONAL B.I. DEPARTMENT DATA ANALYSTS ETL ENGINEER DBAs
“INFINITE LOOP OF SADNESS” DATA SCIENTISTS IT / DATA ENGINEERS
SOFTWARE ENGINEERS BUSINESS http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT Underutilized features Undeclared consumers Pipeline Jungles
- preparing data in a ML-friendly format http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS Unseen category Unreproductible feat eng workflow (PMML) Leakage
in DataBase fields (churn) Monitoring
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
None
None
None
None
None
None
None
Refinements : - hashing function - adaptive learning rate (different
flavours) - Vowpal Wabbit - Dropout - PyPy
None
None
None
None
None
None
None
None
QUESTIONS ? zelros.com /
[email protected]
/ @zelrosHQ