Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
PyData Meetup Group Presentation
Search
Jason Rudy
May 29, 2013
Programming
2
800
PyData Meetup Group Presentation
Presentation on py-earth to the San Francisco PyData Meetup group on 2013-05-29.
Jason Rudy
May 29, 2013
Tweet
Share
Other Decks in Programming
See All in Programming
AWS Lambda functions with C# 用の Dev Container Template を作ってみた件
mappie_kochi
0
230
Rubyでつくるパケットキャプチャツール
ydah
1
650
昭和の職場からアジャイルの世界へ
kumagoro95
1
160
traP の部内 ISUCON とそれを支えるポータル / PISCON Portal
ikura_hamu
0
240
社内フレームワークとその依存性解決 / in-house framework and its dependency management
vvakame
1
520
Multi Step Form, Decentralized Autonomous Organization
pumpkiinbell
1
220
さいきょうのレイヤードアーキテクチャについて考えてみた
yahiru
3
650
時計仕掛けのCompose
mkeeda
1
250
Lookerは可視化だけじゃない。UIコンポーネントもあるんだ!
ymd65536
1
150
富山発の個人開発サービスで日本中の学校の業務を改善した話
krpk1900
4
340
カンファレンス動画鑑賞会のススメ / Osaka.swift #1
hironytic
0
210
[Fin-JAWS 第38回 ~re:Invent 2024 金融re:Cap~]FaultInjectionServiceアップデート@pre:Invent2024
shintaro_fukatsu
0
390
Featured
See All Featured
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
232
17k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
7.1k
Navigating Team Friction
lara
183
15k
How to Ace a Technical Interview
jacobian
276
23k
Art, The Web, and Tiny UX
lynnandtonic
298
20k
Rebuilding a faster, lazier Slack
samanthasiow
79
8.8k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
160
15k
Rails Girls Zürich Keynote
gr2m
94
13k
StorybookのUI Testing Handbookを読んだ
zakiyama
28
5.5k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
47
5.2k
Raft: Consensus for Rubyists
vanstee
137
6.8k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Transcript
MARS in Python or A Tale of Two Planets 1
Outline • Motivating use case • MARS algorithm • Py-earth
• Examples 2
3
4
M A R S ultivariate daptive egression plines 5
Not MARS •MARSplines •MARegressionSplines •ARES •earth 6
7
HbA1c Age Gender Etc. Cost X X X X X
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 8
Constraints •Non-monotone relationships among variables •Interactions among predictors •Simple model
9
10
11
Illustration by Yi-Ke Peng 12
13
Python R My Brain Raw data processing Object-relational mapping Feature
extraction Plotting Bootstrapping Normalization Multivariate Adaptive Regression Splines 14
15
16
Regression: The search for f(x) yj = f ( x1j,
. . . , xnj) + ✏j 17
Linear Regression ˆ f ( x ) = a0 +
P X i=1 aixi 18
Multivariate Adaptive Regression Splines ˆ f ( x ) =
a0 + M X m=1 am Km Y k=1 ⇥ skm xv(k,m) tkm ⇤ + 19
Hinge Functions CDify h ( x t ) = [
x t ]+ = ( x t, x > t 0 , x t 20
Multivariate Adaptive Regression Splines ˆ f ( x ) =
a0 + M X m=1 am Km Y k=1 ⇥ skm xv(k,m) tkm ⇤ + 21
y = 1 2h (1 x ) + 1 2
h ( x 1) Multivariate Adaptive Regression Splines 22
Multivariate Adaptive Regression Splines y = h ( x 1)
h ( x 1) + h (1 x ) h (1 x ) 23
Multivariate Adaptive Regression Splines y = 2 + 0 .
1h ( x 1) + h (1 x ) + 3h ( x 1) h (4 x ) 24
Multivariate Example z = h ( 3 x ) +
h ( 3 x ) h (5 y ) 25
Multivariate Adaptive Regression Splines ˆ f ( x ) =
a0 + M X m=1 am Km Y k=1 ⇥ skm xv(k,m) tkm ⇤ + 26
Forward Pass Pruning Pass 27
Forward Pass • while True: • best_err = Infinity •
for each term, predictor, knot candidate: • err = get_squared_error(term, predictor, knot) • if err < best_err: • best_err = err • best_term, best_pred, best_knot = term, predictor, knot • add term pair for best_term, best_pred, best_knot • check stopping conditions 28
Forward Pass 1 Start Iteration 1 Iteration 2 h( x
t ) h( t x ) h( x t ) ⇥ h ( x s ) h( x t ) ⇥ h ( s x ) 29
Forward Pass • while True: • best_err = Infinity •
for each term, predictor, knot candidate: • err = get_squared_error(term, predictor, knot) • if err < best_err: • best_err = err • best_term, best_pred, best_knot = term, predictor, knot • add term pair for best_term, best_pred, best_knot • check stopping conditions 30
O N2P3 31
Forward Pass 1 Start Iteration 1 Iteration 2 h( x
t ) h( t x ) h( x t ) ⇥ h ( x s ) h( x t ) ⇥ h ( s x ) 32
Generalized Cross Validation GCV = 1 N PN i=1 [yi
ˆ yi]2 1 N2 (N Q d (Q 1))2 33
Pruning Pass • for i in range(num_terms): • best_score =
Infinity • for term in terms: • score = GCV(model \ term) • if score < best_score: • best_score = score • term_to_drop = term • remove term_to_drop from model • models[i] = model.copy() • scores[i] = score • selected_model = models[argmin(scores)] 34
Pruning Pass 1 h( x t ) h( t x
) h( x t ) ⇥ h ( x s ) h( x t ) ⇥ h ( s x ) 35
Final Model [yi ˆ yi]2 d(Q 1))2 y = a0
+ a1 h ( t x ) + a2 h ( x t ) h ( x s ) 36
37
Implementation Goals •Compatible with numpy ecosystem •Fast and reliable •Easy
to maintain 38
39
40
>git clone git://github.com/jcrudy/py-earth.git >cd py-earth >sudo python setup.py install Installation
41
Important Earth Methods •fit(X,y) •transform(X) •predict(X) 42
Simple Example 43
Simple Example 44
45
With Pandas 46
With Patsy 47
Classification 48
Classification 49
50
Future Plans •Documentation •Integrate into scikit-learn •Multiple responses •Sample weights
51
Summary • MARS is a simple but flexible regression method
• py-earth is MARS for Python data stack • Try it! 52
py-earth A far better thing than I have ever done
• https://github.com/jcrudy/py-earth 53