Better than Deep Learning:
Gradient Boosting Machines (GBM)
Szilárd Pafka, PhD
Chief Scientist, Epoch
LA Data Science Meetup, Santa Monica
July 2018
Slide 3
Disclaimer:
I am not representing my employer (Epoch) in this talk
I can neither confirm nor deny whether Epoch is using any of the methods, tools,
results, etc. mentioned in this talk
structured/tabular data: GBM (or RF)
very small data: LR
very large sparse data: LR with SGD (+L1/L2)
images/videos, speech: DL
it depends / try them all / hyperparam tuning / ensembles
feature engineering / other goals e.g. interpretability
the title of this talk was misguided
but so is almost every recent use of the term AI
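
For readers who have not seen how a GBM works, the idea can be sketched in a few lines. This is a minimal illustration only, assuming squared-error loss, a single numeric feature, and depth-1 trees (stumps). The function names (`fit_stump`, `fit_gbm`, `predict_gbm`, `mse`) and the toy data are hypothetical and not from the talk; real GBM libraries add many refinements (deeper trees, subsampling, regularization, other losses).

```python
# Minimal gradient boosting sketch for regression with squared-error loss.
# Assumptions (not from the talk): one numeric feature, depth-1 trees (stumps),
# toy data. Real GBM libraries are far more elaborate.

def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_gbm(x, y, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for pi, xi in zip(preds, x)]
    return base, learning_rate, stumps


def predict_gbm(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def mse(model, x, y):
    return sum((predict_gbm(model, xi) - yi) ** 2
               for xi, yi in zip(x, y)) / len(y)


# Toy data: a piecewise-constant target that a single stump cannot fit.
x = [i / 10 for i in range(40)]
y = [0.0 if xi < 1 else 2.0 if xi < 3 else 1.0 for xi in x]

model_small = fit_gbm(x, y, n_trees=5)
model_large = fit_gbm(x, y, n_trees=200)
print("MSE with 5 trees:  ", mse(model_small, x, y))
print("MSE with 200 trees:", mse(model_large, x, y))
```

The number of trees and the learning rate are exactly the kind of hyperparameters the "hyperparam tuning" bullet refers to: more trees with a small learning rate fit the training data more closely, which is why they are normally chosen on held-out data rather than training error.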
Slides 31-34
[Figures] Source: Hastie et al., The Elements of Statistical Learning (ESL), 2nd ed.
Slide 36
I usually use other people’s code [...] I can find open source code for
what I want to do, and my time is much better spent doing research
and feature engineering -- Owen Zhang