Machine Learning Software in Practice: Quo Vadis? - Invited Talk, KDD Conference, Applied Data Science Track - August 2017, Halifax, Canada

Szilárd Pafka, PhD
Chief Scientist, Epoch

Disclaimer: I am not representing my employer (Epoch) in this talk. I can neither confirm nor deny whether Epoch is using any of the methods, tools, results etc. mentioned in this talk.

ML Tools Mismatch:
- What practitioners wish for
- What they truly need
- What’s available
- What’s advertised
- What developers/researchers focus on

This talk is mostly in the context of (binary) classification

Warning: This talk is a series of rants/observations with the aim to provoke/encourage thinking and constructive discussions about topics of impact on our industry.

Rantometer:

Our tools are optimized for what use cases?

Is building this the best allocation of our developer resources?

Efficiency for users during usage?

Big Data

Machine Learning Tools Speed, Memory, Accuracy

“I usually use other people’s code [...] I can find open source code for what I want to do, and my time is much better spent doing research and feature engineering” -- Owen Zhang
http://blog.kaggle.com/2015/06/22/profiling-top-kagglers-owen-zhang-currently-1-in-the-world/

binary classification, 10M records, numeric & categorical features, non-sparse

http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf
http://lowrank.net/nikos/pubs/empirical.pdf

- R packages 30%
- Python scikit-learn 40%
- Vowpal Wabbit 8%
- H2O 10%
- xgboost 8%
- Spark MLlib 6%
- a few others

n = 10K, 100K, 1M, 10M, 100M
Training time, RAM usage, AUC, CPU % by core
read data, pre-process, score test data
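A benchmark loop over training-set sizes might look roughly like the sketch below. It is not the code behind the talk's results: the synthetic data, the stand-in "model" and all function names are hypothetical, and only training time and AUC are recorded (RAM and per-core CPU usage are left out).

```python
import random
import time


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formula; ties get the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def make_data(n, seed=0):
    """Hypothetical synthetic data: one numeric feature, noisy binary label."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if x + rng.gauss(0.0, 1.0) > 0 else 0 for x in xs]
    return xs, ys


def train_stand_in_model(xs, ys):
    """Placeholder for a real fit: the score is simply the feature value."""
    return lambda x: x


def benchmark(sizes, n_test=2000):
    """Time training at each size n and score a fixed held-out test set."""
    test_x, test_y = make_data(n_test, seed=999)
    results = []
    for n in sizes:
        xs, ys = make_data(n, seed=n)
        start = time.perf_counter()
        model = train_stand_in_model(xs, ys)
        train_seconds = time.perf_counter() - start
        test_auc = auc(test_y, [model(x) for x in test_x])
        results.append({"n": n, "train_seconds": train_seconds, "auc": test_auc})
    return results


if __name__ == "__main__":
    for row in benchmark([10_000, 100_000]):
        print(row)
```

Swapping `train_stand_in_model` for a call into a real library is where the tools would actually differ; the loop and the metric stay the same.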

http://datascience.la/benchmarking-random-forest-implementations/#comment-53599

Best linear: 71.1

learn_rate = 0.1, max_depth = 6, n_trees = 300
learn_rate = 0.01, max_depth = 16, n_trees = 1000
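To get a feel for how different these two settings are in model size, one can compute a rough upper bound on the total number of leaves. This is only a back-of-the-envelope sketch: it assumes binary trees limited solely by `max_depth`, and `max_leaves` is a hypothetical helper, not part of any library.

```python
def max_leaves(max_depth, n_trees):
    """Upper bound on total leaves, assuming binary trees limited only by depth."""
    return (2 ** max_depth) * n_trees


# The two settings shown on the slide.
configs = [
    {"learn_rate": 0.1, "max_depth": 6, "n_trees": 300},
    {"learn_rate": 0.01, "max_depth": 16, "n_trees": 1000},
]

for cfg in configs:
    bound = max_leaves(cfg["max_depth"], cfg["n_trees"])
    print(cfg, "-> at most", bound, "leaves")
```

Under that assumption the bound is 19,200 leaves for the first setting and 65,536,000 for the second; real trees are usually much smaller because of other stopping criteria.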

Deep Learning, AI, Oh my...

Source: Andrew Ng

Distributed ML

Multicore ML

1M: CPU cache effects

(lightgbm 10M)

16 cores vs 1: 16 cores:

Aggregation: 100M rows, 1M groups (time [s])
Join: 100M rows x 1M rows (time [s])

Benchmarks

Wishlist:
- more datasets (10-100, structure, size)
- automation: upgrading tools, re-running ($$)
- more algos, more tools (OS/commercial?)
- (even) more tuning of parameters
- BaaS? crowdsourcing (data, tools/tuning)?
- other ML problems (recsys, NLP…)

So far we discussed performance + (some) system architecture, but for training only

APIs (and GUIs)

Cloud (MLaaS)

“people that know what they’re doing just use open source [...] the same open source tools that the MLaaS services offer” - Bradford Cross

Real-Time Scoring

R/Python:
- Slow(er)
- Encoding of categ. variables
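One way the categorical-encoding issue shows up at scoring time: the mapping from levels to columns learned in training has to be stored and reapplied, one record at a time, including for levels never seen before. The sketch below is a minimal illustration under that assumption; `CategoricalEncoder` and the feature names are hypothetical, not taken from the talk or from any library.

```python
import json


class CategoricalEncoder:
    """Minimal one-hot encoder whose level mapping is fixed at training time."""

    def __init__(self):
        self.levels = {}  # feature name -> sorted list of levels seen in training

    def fit(self, rows, features):
        for f in features:
            self.levels[f] = sorted({row[f] for row in rows})
        return self

    def transform_one(self, row):
        """Encode a single record; unseen or missing levels become all zeros."""
        vec = []
        for f, levels in self.levels.items():
            vec.extend(1.0 if row.get(f) == lvl else 0.0 for lvl in levels)
        return vec

    def to_json(self):
        """Serialize the mapping so the scoring service can reload it."""
        return json.dumps(self.levels)

    @classmethod
    def from_json(cls, text):
        enc = cls()
        enc.levels = json.loads(text)
        return enc


if __name__ == "__main__":
    train = [{"carrier": "AA", "dow": "Mon"}, {"carrier": "UA", "dow": "Tue"}]
    enc = CategoricalEncoder().fit(train, ["carrier", "dow"])
    scorer = CategoricalEncoder.from_json(enc.to_json())
    print(scorer.transform_one({"carrier": "ZZ", "dow": "Tue"}))
```

Encoding unseen levels as all zeros is just one possible policy; rejecting the record or mapping to an explicit "other" column are alternatives.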

Kaggle

- already pre-processed data
- less domain knowledge (or deliberately hidden)
- AUC 0.0001 increases "relevant"
- no business metric
- no actual deployment
- models too complex
- no online evaluation
- no monitoring
- data leakage

Tuning & AutoML

Ben Recht, Kevin Jamieson: http://www.argmin.net/2016/06/20/hypertuning/
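For reference, plain random search, a common baseline in hyperparameter tuning, fits in a few lines. The sketch below is generic and does not reproduce the linked post; the search space, the toy objective (a stand-in for a validation metric) and all names are hypothetical.

```python
import math
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: sampler(rng) for name, sampler in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space over GBM-style parameters.
space = {
    "learn_rate": lambda rng: 10 ** rng.uniform(-2, -1),  # log-uniform in [0.01, 0.1]
    "max_depth": lambda rng: rng.randint(2, 16),
    "n_trees": lambda rng: rng.choice([100, 300, 1000]),
}


def toy_objective(params):
    """Made-up score that peaks (at 0) for max_depth = 10 and learn_rate = 0.03."""
    depth_term = ((params["max_depth"] - 10) / 10.0) ** 2
    rate_term = (math.log10(params["learn_rate"]) - math.log10(0.03)) ** 2
    return -(depth_term + rate_term)


if __name__ == "__main__":
    print(random_search(toy_objective, space, n_trials=50))
```

In practice `toy_objective` would be replaced by training a model with the sampled parameters and returning its validation AUC, which is where nearly all of the cost lies.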

Model Understanding, Accountability

Evaluation Metrics

More... “Will We Ever Get Over the Mess?” KDD panel next
