Machine Learning in Production with R or Python - Budapest Data Forum - June 2017
szilard
June 08, 2017
Transcript
Machine Learning in Production
Szilárd Pafka, PhD, Chief Scientist, Epoch
Budapest Data Forum, June 2017
Machine Learning in Production with R
Machine Learning in Production with R or Python
Machine Learning in Production with R or maybe Python
Disclaimer: I am not representing my employer (Epoch) in this talk. I can neither confirm nor deny whether Epoch is using any of the methods, tools, results, etc. mentioned in this talk.
http://datascience.la/meetup-machine-learning-in-production-with-szilard-pafka/
Aggregation: 100M rows, 1M groups; time [s]
Join: 100M rows x 1M rows; time [s]
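To make the two benchmarked operations concrete, here is a minimal sketch in plain Python of a group-by aggregation and a key join, each timed with a wall-clock timer. It is not the benchmark code behind the slide. The table layout, the function names (`make_tables`, `aggregate_sum`, `hash_join`, `timed`) and the scaled-down sizes are assumptions made for illustration only.

```python
import random
import time
from collections import defaultdict


def make_tables(n_rows, n_groups, seed=0):
    """Build a fact table of (key, value) rows and a lookup table with one row per key."""
    rng = random.Random(seed)
    facts = [(rng.randrange(n_groups), rng.random()) for _ in range(n_rows)]
    lookup = {key: key * 10 for key in range(n_groups)}  # hypothetical attribute per key
    return facts, lookup


def aggregate_sum(facts):
    """Group by key and sum the values (the 'aggregation' operation)."""
    totals = defaultdict(float)
    for key, value in facts:
        totals[key] += value
    return dict(totals)


def hash_join(facts, lookup):
    """Inner join of the fact table with the lookup table on key (the 'join' operation)."""
    return [(key, value, lookup[key]) for key, value in facts if key in lookup]


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Scaled down from the slide's 100M rows / 1M groups so it finishes quickly.
    facts, lookup = make_tables(n_rows=200_000, n_groups=2_000)
    totals, t_agg = timed(aggregate_sum, facts)
    joined, t_join = timed(hash_join, facts, lookup)
    print(f"aggregation: {len(totals)} groups, {t_agg:.2f} s")
    print(f"join: {len(joined)} rows, {t_join:.2f} s")
```

Pure Python is used here only to keep the sketch dependency-free; timings from it say nothing about the tools compared in the talk.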
Binary classification, 10M records; numeric & categorical features, non-sparse
http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf
http://lowrank.net/nikos/pubs/empirical.pdf
EC2
n = 10K, 100K, 1M, 10M, 100M
Training time, RAM usage, AUC, CPU % by core
Read data, pre-process, score test data
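The metrics listed above can be collected with a small harness. The sketch below is an assumption-laden illustration, not the benchmark's actual code: `auc` computes the area under the ROC curve with the rank-sum formula, `measure` records wall-clock time and peak Python-heap memory via `tracemalloc` (which does not see memory allocated inside native libraries), and `train_mean_scorer` is a hypothetical toy stand-in for a real learner.

```python
import random
import time
import tracemalloc


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula, with tie handling."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores, then assign the average rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both classes present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def measure(train_fn, *args):
    """Run train_fn once; return (model, wall-clock seconds, peak Python-heap MiB)."""
    tracemalloc.start()
    start = time.perf_counter()
    model = train_fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return model, elapsed, peak / 2**20


def train_mean_scorer(xs, ys):
    """Toy stand-in for a real learner: returns the (positive, negative) class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10_000, 100_000):  # the slide goes up to 100M
        ys = [rng.randrange(2) for _ in range(n)]
        xs = [y + rng.gauss(0, 1) for y in ys]
        _, secs, mib = measure(train_mean_scorer, xs, ys)
        print(f"n={n}: train {secs:.3f} s, peak {mib:.1f} MiB, AUC {auc(ys, xs):.3f}")
```

CPU utilisation per core is left out because the standard library has no portable call for it; a system monitor is the usual way to observe that.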
10x
http://datascience.la/benchmarking-random-forest-implementations/#comment-53599
Best linear: 71.1
learn_rate = 0.1, max_depth = 6, n_trees = 300
learn_rate = 0.01, max_depth = 16, n_trees = 1000
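The two hyperparameter settings above could be captured in code roughly as follows. This is only a sketch: the slide does not say which GBM library the names `learn_rate`, `max_depth` and `n_trees` belong to, so the `GBMSettings` class, the labels `SHALLOW_FAST` and `DEEP_SLOW`, and the translation to scikit-learn's `GradientBoostingClassifier` argument names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GBMSettings:
    """Hyperparameters as named on the slide."""
    learn_rate: float
    max_depth: int
    n_trees: int

    def to_sklearn_kwargs(self):
        """Translate to scikit-learn's GradientBoostingClassifier argument names."""
        return {
            "learning_rate": self.learn_rate,
            "max_depth": self.max_depth,
            "n_estimators": self.n_trees,
        }


# The two settings listed on the slide; the constant names are hypothetical labels.
SHALLOW_FAST = GBMSettings(learn_rate=0.1, max_depth=6, n_trees=300)
DEEP_SLOW = GBMSettings(learn_rate=0.01, max_depth=16, n_trees=1000)

if __name__ == "__main__":
    for name, cfg in (("shallow/fast", SHALLOW_FAST), ("deep/slow", DEEP_SLOW)):
        print(name, cfg.to_sklearn_kwargs())
```

If scikit-learn is installed, `GradientBoostingClassifier(**SHALLOW_FAST.to_sklearn_kwargs())` would build an estimator with the first setting; other libraries use different argument names.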
...