Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
"Haute Couture" and "Prêt-à-Porter" Data Science
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Christophe Bourguignat
April 15, 2016
Technology
0
470
"Haute Couture" and "Prêt-à-Porter" Data Science
Talk given @ Telecom ParisTech on April 2016
Christophe Bourguignat
April 15, 2016
Tweet
Share
More Decks by Christophe Bourguignat
See All by Christophe Bourguignat
Adding Neurons to your Assistants
kriss
1
360
Software Engineers, the New Data Scientists
kriss
1
140
Machine Learning for Chief Future Officers
kriss
1
140
Whitening The Blackbox : Why And How To Explain Machine Learning Predictions ?
kriss
1
1.2k
Building a Data Science Team
kriss
2
410
Lean Machine Learning
kriss
5
780
Kaggle Criteo Challenge and Online Learning
kriss
1
290
The #FrenchData landscape
kriss
0
490
Other Decks in Technology
See All in Technology
AI駆動開発を事業のコアに置く
tasukuonizawa
1
330
Bedrock PolicyでAmazon Bedrock Guardrails利用を強制してみた
yuu551
0
260
Why Organizations Fail: ノーベル経済学賞「国家はなぜ衰退するのか」から考えるアジャイル組織論
kawaguti
PRO
1
150
予期せぬコストの急増を障害のように扱う――「コスト版ポストモーテム」の導入とその後の改善
muziyoshiz
1
2k
AWS Network Firewall Proxyを触ってみた
nagisa53
1
240
StrandsとNeptuneを使ってナレッジグラフを構築する
yakumo
1
120
SchooでVue.js/Nuxtを技術選定している理由
yamanoku
3
160
私たち準委任PdEは2つのプロダクトに挑戦する ~ソフトウェア、開発支援という”二重”のプロダクトエンジニアリングの実践~ / 20260212 Naoki Takahashi
shift_evolve
PRO
1
110
Claude_CodeでSEOを最適化する_AI_Ops_Community_Vol.2__マーケティングx_AIはここまで進化した.pdf
riku_423
2
610
登壇駆動学習のすすめ — CfPのネタの見つけ方と書くときに意識していること
bicstone
3
120
会社紹介資料 / Sansan Company Profile
sansan33
PRO
15
400k
Amazon S3 Vectorsを使って資格勉強用AIエージェントを構築してみた
usanchuu
3
450
Featured
See All Featured
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.1k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.2k
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.3k
GraphQLとの向き合い方2022年版
quramy
50
14k
Ethics towards AI in product and experience design
skipperchong
2
200
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
Making Projects Easy
brettharned
120
6.6k
Faster Mobile Websites
deanohume
310
31k
How to Get Subject Matter Experts Bought In and Actively Contributing to SEO & PR Initiatives.
livdayseo
0
67
Are puppies a ranking factor?
jonoalderson
1
2.7k
Redefining SEO in the New Era of Traffic Generation
szymonslowik
1
220
Game over? The fight for quality and originality in the time of robots
wayneb77
1
120
Transcript
Christophe Bourguignat zelros.com /
[email protected]
/ @zelrosHQ
None
Agenda Models interpretation Models production A short history of Kaggle
MODELS INTERPRETATION
WHY ? Models opacity is a major reject cause by
users Unfortunately, predictive models that are the most powerful are usually the least interpretable
None
None
None
FEATURE IMPORTANCE
None
None
None
AEROSOLVE (AirBnb) Prior = general belief, before looking at the
data Inform the model of our prior beliefs by adding them to a text configuration file during training
None
None
None
Scikit Learn
Scikit Learn March 2014
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXEMPLE ON BOSTON DATASET
None
http://blog.datadive.net/prediction-intervals-for-random-forests/ Prediction Intervals for Random Forests
None
None
PRODUCTION
None
None
TRADITIONAL B.I. DEPARTMENT DATA ANALYSTS ETL ENGINEER DBAs
“INFINITE LOOP OF SADNESS” DATA SCIENTISTS IT / DATA ENGINEERS
SOFTWARE ENGINEERS BUSINESS http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT Underutilized features Undeclared consumers Pipeline Jungles
- preparing data in a ML-friendly format http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS Unseen category Unreproductible feat eng workflow (PMML) Leakage
in DataBase fields (churn) Monitoring
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
None
None
None
None
None
None
None
Refinements : - hashing function - adaptive learning rate (different
flavours) - Vowpal Wabbit - Dropout - PyPy
None
None
None
None
None
None
None
None
QUESTIONS ? zelros.com /
[email protected]
/ @zelrosHQ