"Haute Couture" and "Prêt-à-Porter" Data Science
Christophe Bourguignat
April 15, 2016
Technology
Talk given at Telecom ParisTech in April 2016
Transcript
Christophe Bourguignat
zelros.com / @zelrosHQ
Agenda
- Models interpretation
- Models production
- A short history of Kaggle
MODELS INTERPRETATION
WHY?
Model opacity is a major cause of rejection by users.
Unfortunately, the most powerful predictive models are usually the least interpretable.
FEATURE IMPORTANCE
AEROSOLVE (Airbnb)
Prior = general belief, held before looking at the data.
Inform the model of our prior beliefs by adding them to a text configuration file during training.
Scikit Learn March 2014 April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXAMPLE ON BOSTON DATASET
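The slide image for this example is not part of the transcript. As a rough illustration of the kind of decomposition a tree interpreter produces (prediction = bias + sum of per-feature contributions), here is a minimal sketch on a hand-built regression tree. The tree, its node values and the feature names `rooms` and `lstat` are assumptions made up for illustration; this is not the treeinterpreter code and it is not fitted on the Boston data.

```python
# Toy regression tree (hypothetical values, NOT fitted on the Boston data).
# Each node stores the mean target of the training samples that reach it.
TREE = {
    "value": 22.0, "feature": "rooms", "threshold": 6.5,
    "left": {
        "value": 19.0, "feature": "lstat", "threshold": 15.0,
        "left": {"value": 21.0},
        "right": {"value": 15.0},
    },
    "right": {"value": 30.0},
}


def explain(tree, sample):
    """Return (prediction, bias, contributions) for one sample.

    bias is the mean at the root; every split on the decision path credits
    the change in node mean to the feature tested at that split.
    """
    bias = tree["value"]
    contributions = {}
    node = tree
    while "feature" in node:  # leaves have no "feature" key
        feature = node["feature"]
        if sample[feature] <= node["threshold"]:
            child = node["left"]
        else:
            child = node["right"]
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return node["value"], bias, contributions


prediction, bias, contributions = explain(TREE, {"rooms": 5.8, "lstat": 20.0})
print(prediction, bias, contributions)
```

For a forest, the same decomposition can be averaged over all trees, which keeps the identity prediction = bias + sum of contributions.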
Prediction Intervals for Random Forests
http://blog.datadive.net/prediction-intervals-for-random-forests/
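One common way to get such an interval, sketched here under assumptions, is to collect the prediction of every individual tree for a sample and take percentiles of that distribution. The per-tree predictions below are made-up numbers, and the helper names are hypothetical; with a real forest they would come from each fitted tree.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def prediction_interval(tree_predictions, coverage=90.0):
    """Return (low, high) bounds covering `coverage` percent of the trees."""
    ordered = sorted(tree_predictions)
    tail = (100.0 - coverage) / 2.0
    return percentile(ordered, tail), percentile(ordered, 100.0 - tail)


# Hypothetical predictions of 11 trees for a single sample.
per_tree = [18.0, 20.0, 21.0, 21.5, 22.0, 22.5, 23.0, 24.0, 25.0, 30.0, 19.0]
low, high = prediction_interval(per_tree, coverage=80.0)
print(low, high)
```

A wide interval signals that the trees disagree on this sample, which is useful to surface next to the point prediction.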
PRODUCTION
TRADITIONAL B.I. DEPARTMENT: DATA ANALYSTS, ETL ENGINEERS, DBAs
“INFINITE LOOP OF SADNESS”: DATA SCIENTISTS, IT / DATA ENGINEERS, SOFTWARE ENGINEERS, BUSINESS
http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT
- Underutilized features
- Undeclared consumers
- Pipeline jungles - preparing data in an ML-friendly format
http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS
- Unseen category
- Non-reproducible feature engineering workflow (PMML)
- Leakage in database fields (churn)
- Monitoring
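The "unseen category" failure can be illustrated with a small sketch: an encoder fitted on training data that maps any category it has never seen to a reserved index instead of raising an error in production. The class name `SafeCategoryEncoder` and the city values are hypothetical and not taken from the talk.

```python
class SafeCategoryEncoder:
    """Integer-encode categories; unseen values map to a reserved index."""

    UNKNOWN = 0  # index reserved for categories absent from the training data

    def fit(self, values):
        # Known categories get indices 1..n, in sorted order for determinism.
        self.index_ = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
        return self

    def transform(self, values):
        # dict.get falls back to UNKNOWN rather than raising KeyError.
        return [self.index_.get(v, self.UNKNOWN) for v in values]


encoder = SafeCategoryEncoder().fit(["paris", "lyon", "paris", "nice"])
encoded = encoder.transform(["lyon", "marseille"])  # "marseille" was never seen
print(encoded)
```

Counting how often the reserved index is hit is also a cheap monitoring signal that production data has drifted away from the training data.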
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
Refinements:
- hashing function
- adaptive learning rate (different flavours)
- Vowpal Wabbit
- Dropout
- PyPy
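The first two refinements can be sketched together as an online logistic regression in pure Python, which is the kind of code that also runs under PyPy. This is a minimal sketch, not the code shown in the talk: the bucket count, the base learning rate, the `zlib.crc32` hash and the `alpha / (sqrt(n) + 1)` schedule are all assumptions, the last being only one of the possible flavours of adaptive learning rate.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 18  # size of the hashed feature space (assumed)
ALPHA = 0.1            # base learning rate (assumed)


def hash_features(row):
    """Hashing trick: map each "field=value" pair to a bucket index."""
    return [zlib.crc32(f"{field}={value}".encode("utf-8")) % NUM_BUCKETS
            for field, value in row.items()]


class OnlineLogisticRegression:
    def __init__(self):
        self.weights = [0.0] * NUM_BUCKETS
        self.counts = [0.0] * NUM_BUCKETS  # number of updates seen per bucket

    def predict(self, indices):
        score = sum(self.weights[i] for i in indices)
        score = max(min(score, 35.0), -35.0)  # keep exp() within range
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, indices, label):
        error = self.predict(indices) - label
        for i in indices:
            # Adaptive rate: frequently updated buckets take smaller steps.
            self.weights[i] -= ALPHA / (math.sqrt(self.counts[i]) + 1.0) * error
            self.counts[i] += 1.0


# Tiny made-up stream: one row that is always clicked, one that never is.
model = OnlineLogisticRegression()
clicked = hash_features({"site": "news", "device": "mobile"})
ignored = hash_features({"site": "shop", "device": "desktop"})
for _ in range(50):
    model.update(clicked, 1.0)
    model.update(ignored, 0.0)
print(model.predict(clicked), model.predict(ignored))
```

Hashing keeps memory fixed regardless of how many distinct categories appear in the stream, at the cost of occasional collisions between features.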
QUESTIONS?
zelros.com / @zelrosHQ