Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
"Haute Couture" and "Prêt-à-Porter" Data Science
Search
Christophe Bourguignat
April 15, 2016
Technology
0
470
"Haute Couture" and "Prêt-à-Porter" Data Science
Talk given @ Telecom ParisTech on April 2016
Christophe Bourguignat
April 15, 2016
Tweet
Share
More Decks by Christophe Bourguignat
See All by Christophe Bourguignat
Adding Neurons to your Assistants
kriss
1
360
Software Engineers, the New Data Scientists
kriss
1
140
Machine Learning for Chief Future Officers
kriss
1
140
Whitening The Blackbox : Why And How To Explain Machine Learning Predictions ?
kriss
1
1.2k
Building a Data Science Team
kriss
2
410
Lean Machine Learning
kriss
5
780
Kaggle Criteo Challenge and Online Learning
kriss
1
290
The #FrenchData landscape
kriss
0
490
Other Decks in Technology
See All in Technology
入社1ヶ月でデータパイプライン講座を作った話
waiwai2111
1
120
Azure SQL Databaseでベクター検索を活用しよう
nakasho
0
120
開発メンバーが語るFindy Conferenceの裏側とこれから
sontixyou
2
340
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
3.8k
Lambda Durable FunctionsでStep Functionsの代わりはできるのかを試してみた
smt7174
2
150
サイボウズ 開発本部採用ピッチ / Cybozu Engineer Recruit
cybozuinsideout
PRO
10
72k
オープンウェイトのLLMリランカーを契約書で評価する / searchtechjp
sansan_randd
3
350
それぞれのペースでやっていく Bet AI / Bet AI at Your Own Pace
yuyatakeyama
1
670
AI時代のPMに求められるのは 「Ops」と「Enablement」
shimotaroo
1
410
Regional_NAT_Gatewayについて_basicとの違い_試した内容スケールアウト_インについて_IPv6_dual_networkでの使い分けなど.pdf
cloudevcode
1
170
AIとともに歩む情報セキュリティ / Information Security with AI
kanny
4
2.7k
書籍執筆での生成AIの活用
sat
PRO
1
230
Featured
See All Featured
Reflections from 52 weeks, 52 projects
jeffersonlam
356
21k
New Earth Scene 8
popppiees
1
1.4k
Utilizing Notion as your number one productivity tool
mfonobong
2
200
Stewardship and Sustainability of Urban and Community Forests
pwiseman
0
110
How to Talk to Developers About Accessibility
jct
2
110
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
64
Impact Scores and Hybrid Strategies: The future of link building
tamaranovitovic
0
190
GitHub's CSS Performance
jonrohan
1032
470k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Everyday Curiosity
cassininazir
0
120
Large-scale JavaScript Application Architecture
addyosmani
515
110k
Transcript
Christophe Bourguignat zelros.com /
[email protected]
/ @zelrosHQ
None
Agenda Models interpretation Models production A short history of Kaggle
MODELS INTERPRETATION
WHY ? Models opacity is a major reject cause by
users Unfortunately, predictive models that are the most powerful are usually the least interpretable
None
None
None
FEATURE IMPORTANCE
None
None
None
AEROSOLVE (AirBnb) Prior = general belief, before looking at the
data Inform the model of our prior beliefs by adding them to a text configuration file during training
None
None
None
Scikit Learn
Scikit Learn March 2014
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXEMPLE ON BOSTON DATASET
None
http://blog.datadive.net/prediction-intervals-for-random-forests/ Prediction Intervals for Random Forests
None
None
PRODUCTION
None
None
TRADITIONAL B.I. DEPARTMENT DATA ANALYSTS ETL ENGINEER DBAs
“INFINITE LOOP OF SADNESS” DATA SCIENTISTS IT / DATA ENGINEERS
SOFTWARE ENGINEERS BUSINESS http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT Underutilized features Undeclared consumers Pipeline Jungles
- preparing data in a ML-friendly format http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS Unseen category Unreproductible feat eng workflow (PMML) Leakage
in DataBase fields (churn) Monitoring
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
None
None
None
None
None
None
None
Refinements : - hashing function - adaptive learning rate (different
flavours) - Vowpal Wabbit - Dropout - PyPy
None
None
None
None
None
None
None
None
QUESTIONS ? zelros.com /
[email protected]
/ @zelrosHQ