Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
"Haute Couture" and "Prêt-à-Porter" Data Science
Search
Christophe Bourguignat
April 15, 2016
Technology
480
0
Share
"Haute Couture" and "Prêt-à-Porter" Data Science
Talk given @ Telecom ParisTech on April 2016
Christophe Bourguignat
April 15, 2016
More Decks by Christophe Bourguignat
See All by Christophe Bourguignat
Adding Neurons to your Assistants
kriss
1
370
Software Engineers, the New Data Scientists
kriss
1
150
Machine Learning for Chief Future Officers
kriss
1
140
Whitening The Blackbox : Why And How To Explain Machine Learning Predictions ?
kriss
1
1.2k
Building a Data Science Team
kriss
2
420
Lean Machine Learning
kriss
5
790
Kaggle Criteo Challenge and Online Learning
kriss
1
300
The #FrenchData landscape
kriss
0
500
Other Decks in Technology
See All in Technology
CloudSec JP #005 後締め ~ソフトウェアサプライチェーン攻撃から開発者のシークレットを守る~
lhazy
0
220
Standards et agents IA : un tour d’horizon de MCP, A2A, ADK et plus encore
glaforge
0
110
[最強DB講義]推薦システム | 基礎編
recsyslab
PRO
1
150
明日からドヤれる!超マニアックなAWSセキュリティTips10連発 / 10 Ultra-Niche AWS Security Tips
yuj1osm
0
530
クラウドネイティブな開発 ~ 認知負荷に立ち向かうためのコンテナ活用
literalice
0
100
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
6
74k
新メンバーのために、シニアエンジニアが環境を作る時代
puku0x
0
1.1k
え!?初参加で 300冊以上 も頒布!? これは大成功!そのはずなのに わいの財布は 赤字 の件
hellohazime
0
150
名刺メーカーDevグループ 紹介資料
sansan33
PRO
0
1.1k
AWS DevOps Agentはチームメイトになれるのか?/ Can AWS DevOps Agent become a teammate
kinunori
6
670
研究開発部メンバーの働き⽅ / Sansan R&D Profile
sansan33
PRO
4
23k
マルチエージェント × ハーネスエンジニアリング × GitLab Duo Agent Platformで実現する「AIエージェントに仕事をさせる時代へ。」 / 20260421 GitLab Duo Agent Platform
n11sh1
0
140
Featured
See All Featured
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
10k
How to build a perfect <img>
jonoalderson
1
5.4k
Evolving SEO for Evolving Search Engines
ryanjones
0
180
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
250
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
8k
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
1k
Highjacked: Video Game Concept Design
rkendrick25
PRO
1
340
The Illustrated Guide to Node.js - THAT Conference 2024
reverentgeek
1
330
So, you think you're a good person
axbom
PRO
2
2k
The Curious Case for Waylosing
cassininazir
0
300
4 Signs Your Business is Dying
shpigford
187
22k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
23k
Transcript
Christophe Bourguignat zelros.com /
[email protected]
/ @zelrosHQ
None
Agenda Models interpretation Models production A short history of Kaggle
MODELS INTERPRETATION
WHY ? Models opacity is a major reject cause by
users Unfortunately, predictive models that are the most powerful are usually the least interpretable
None
None
None
FEATURE IMPORTANCE
None
None
None
AEROSOLVE (AirBnb) Prior = general belief, before looking at the
data Inform the model of our prior beliefs by adding them to a text configuration file during training
None
None
None
Scikit Learn
Scikit Learn March 2014
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXEMPLE ON BOSTON DATASET
None
http://blog.datadive.net/prediction-intervals-for-random-forests/ Prediction Intervals for Random Forests
None
None
PRODUCTION
None
None
TRADITIONAL B.I. DEPARTMENT DATA ANALYSTS ETL ENGINEER DBAs
“INFINITE LOOP OF SADNESS” DATA SCIENTISTS IT / DATA ENGINEERS
SOFTWARE ENGINEERS BUSINESS http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT Underutilized features Undeclared consumers Pipeline Jungles
- preparing data in a ML-friendly format http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS Unseen category Unreproductible feat eng workflow (PMML) Leakage
in DataBase fields (churn) Monitoring
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
None
None
None
None
None
None
None
Refinements : - hashing function - adaptive learning rate (different
flavours) - Vowpal Wabbit - Dropout - PyPy
None
None
None
None
None
None
None
None
QUESTIONS ? zelros.com /
[email protected]
/ @zelrosHQ