Slide 1

Slide 1 text

Machine Learning in Production
Szilárd Pafka, PhD
Chief Scientist, Epoch
LA Data Science/Machine Learning Meetup
May 2017

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Disclaimer: I am not representing my employer (Epoch) in this talk. I can neither confirm nor deny whether Epoch is using any of the methods, tools, results, etc. mentioned in this talk.

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

[Figure: Gradient Boosting Machines. Axes: data size [M] vs. training time [s]. Annotation: 10x]

Slide 11

Slide 11 text

...

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Machine Learning as a Service

[...] The bottom line on why it doesn’t work: the people that know what they’re doing just use open source, and the people that don’t will not get anything to work, ever, even with APIs. [...]

Amazon, Google, and Microsoft are all trying to sell a MLaaS layer as a component of their cloud strategy. [...]

The problem here is a very practical matter; the MLaaS solutions have no customer segment -- they serve neither the competent nor the incompetent customer segment.

The competent segment: you need machine learning people to build real production machine learning models, because it is hard to train and debug these things properly, and it requires a mix of understanding both theory and practice. These machine learning people tend to just use the same open source tools that the MLaaS services offer. So this knocks out the competent customer segment. [...]

The incompetent segment: [...]

http://www.bradfordcross.com/blog/2017/3/3/five-ai-startup-predictions-for-2017

Slide 14

Slide 14 text

- Kaggle: data already pre-processed (focus on ML)
- sometimes less domain knowledge (or deliberately hidden)
- focus on some measure of accuracy only; 0.0001 increases count as "relevant"
- ensembles create overly complex models
- never deployed in production
- data leakage: hard to fix on Kaggle even if discovered
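
The point about 0.0001 accuracy increases can be illustrated with a rough back-of-the-envelope check. The sketch below is not from the talk: the function names, the 0.90 accuracy, and the 100,000-example test set are hypothetical values chosen for illustration. It uses the binomial standard error of an accuracy estimate as a crude noise yardstick; a paired comparison of two models on the same test set would be more precise.

```python
import math


def accuracy_standard_error(accuracy: float, n_test: int) -> float:
    """Binomial standard error of an accuracy measured on n_test independent examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)


def is_gain_above_noise(gain: float, accuracy: float, n_test: int, z: float = 1.96) -> bool:
    """Return True if the gain exceeds about z standard errors of the accuracy estimate."""
    return gain > z * accuracy_standard_error(accuracy, n_test)


# Hypothetical numbers (assumptions, not taken from the slides).
se = accuracy_standard_error(0.90, 100_000)
print(f"standard error: {se:.5f}")                  # standard error: 0.00095
print(is_gain_above_noise(0.0001, 0.90, 100_000))   # False
```

Under these assumed numbers the sampling noise is roughly ten times larger than a 0.0001 gain, so such a gain would be hard to distinguish from chance in a production setting.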

Slide 15

Slide 15 text

No content