PyData Meetup Group Presentation
Jason Rudy
May 29, 2013
Presentation on py-earth to the San Francisco PyData Meetup group on 2013-05-29.
Transcript
MARS in Python or A Tale of Two Planets 1
Outline
• Motivating use case
• MARS algorithm
• Py-earth
• Examples 2
3
4
MARS: Multivariate Adaptive Regression Splines 5
Not MARS
• MARSplines
• MARegressionSplines
• ARES
• earth 6
7
Data table with columns HbA1c, Age, Gender, Etc., and Cost (every cell shown as X) 8
Constraints
• Non-monotone relationships among variables
• Interactions among predictors
• Simple model
9
10
11
Illustration by Yi-Ke Peng 12
13
Diagram labels: Python; R; My Brain; Raw data processing; Object-relational mapping; Feature extraction; Plotting; Bootstrapping; Normalization; Multivariate Adaptive Regression Splines 14
15
16
Regression: The search for f(x)
$y_j = f(x_{1j}, \ldots, x_{nj}) + \epsilon_j$ 17
Linear Regression
$\hat{f}(x) = a_0 + \sum_{i=1}^{P} a_i x_i$ 18
Multivariate Adaptive Regression Splines
$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+$ 19
Hinge Functions
$h(x - t) = [x - t]_+ = \begin{cases} x - t, & x > t \\ 0, & x \le t \end{cases}$ 20
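A minimal Python sketch of the hinge function defined on this slide. The name `hinge` and the sample knot `t = 1.0` are illustrative assumptions, not taken from py-earth.

```python
def hinge(z):
    """Return [z]_+ : z when z is positive, otherwise 0."""
    return z if z > 0 else 0.0


t = 1.0                      # knot location (illustrative value)
xs = (0.0, 1.0, 2.5)

right = [hinge(x - t) for x in xs]   # h(x - t): nonzero only to the right of the knot
left = [hinge(t - x) for x in xs]    # h(t - x): nonzero only to the left of the knot
```

The two lists show the mirrored pair that MARS adds at each knot: one function is active on each side of `t`.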
Multivariate Adaptive Regression Splines
$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+$ 21
Multivariate Adaptive Regression Splines
$y = 1 - 2h(1 - x) + \frac{1}{2} h(x - 1)$ 22
Multivariate Adaptive Regression Splines
$y = h(x - 1)\,h(x - 1) + h(1 - x)\,h(1 - x)$ 23
Multivariate Adaptive Regression Splines
$y = 2 + 0.1\,h(x - 1) + h(1 - x) + 3\,h(x - 1)\,h(4 - x)$ 24
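As a rough illustration of how a model of this form is evaluated, the sketch below encodes each term as a coefficient times a product of hinge factors, with a sign and a knot per factor (mirroring $s_{km}$ and $t_{km}$). It assumes the slide's example reads y = 2 + 0.1 h(x - 1) + h(1 - x) + 3 h(x - 1) h(4 - x); `mars_predict` and the term encoding are hypothetical names, not py-earth's API.

```python
def hinge(z):
    """Hinge function [z]_+ = max(z, 0)."""
    return max(z, 0.0)


def mars_predict(x, intercept, terms):
    """Evaluate a single-predictor MARS model at x.

    terms is a list of (coefficient, factors); each factor is a
    (sign, knot) pair contributing h(sign * (x - knot)) to the product.
    """
    total = intercept
    for coef, factors in terms:
        product = 1.0
        for sign, knot in factors:
            product *= hinge(sign * (x - knot))
        total += coef * product
    return total


# Assumed reading of the slide: y = 2 + 0.1 h(x-1) + h(1-x) + 3 h(x-1) h(4-x)
terms = [
    (0.1, [(+1, 1.0)]),               # 0.1 * h(x - 1)
    (1.0, [(-1, 1.0)]),               # h(1 - x)
    (3.0, [(+1, 1.0), (-1, 4.0)]),    # 3 * h(x - 1) * h(4 - x)
]

values = [mars_predict(x, 2.0, terms) for x in (0.0, 1.0, 2.0, 5.0)]
```

The product term is nonzero only for 1 < x < 4, which is how a product of two hinges on the same variable produces a local bump.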
Multivariate Example
$z = h(-3 - x) + h(-3 - x)\,h(5 - y)$ 25
Multivariate Adaptive Regression Splines
$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+$ 26
Forward Pass Pruning Pass 27
Forward Pass
• while True:
  • best_err = Infinity
  • for each term, predictor, knot candidate:
    • err = get_squared_error(term, predictor, knot)
    • if err < best_err:
      • best_err = err
      • best_term, best_pred, best_knot = term, predictor, knot
  • add term pair for best_term, best_pred, best_knot
  • check stopping conditions 28
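The pseudocode above can be sketched as runnable Python. This is a naive, pure-Python illustration under stated assumptions, not py-earth's implementation: `squared_error` refits by Gram-Schmidt for every candidate, the only stopping condition is a term budget (`max_terms`), and every observed value of a predictor is tried as a knot. All function names are hypothetical.

```python
def hinge(z):
    """Hinge function [z]_+ = max(z, 0)."""
    return max(z, 0.0)


def squared_error(columns, y):
    """Residual sum of squares after least-squares fitting y on the columns.

    Uses Gram-Schmidt orthogonalization; linearly dependent columns are skipped.
    """
    resid = list(y)
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:
            q = [a / norm for a in v]
            basis.append(q)
            dot = sum(a * b for a, b in zip(resid, q))
            resid = [a - dot * b for a, b in zip(resid, q)]
    return sum(r * r for r in resid)


def forward_pass(X, y, max_terms=3):
    """Naive forward pass. X is a list of rows; returns (picks, terms, error)."""
    n, num_predictors = len(y), len(X[0])
    terms = [[1.0] * n]          # start with the constant term
    picks = []                   # (parent term index, predictor, knot) per iteration
    best_err = squared_error(terms, y)
    while len(terms) + 2 <= max_terms:   # assumed stopping condition: term budget
        best_err, best = float("inf"), None
        for ti, term in enumerate(terms):
            for p in range(num_predictors):
                for knot in sorted({row[p] for row in X}):
                    pos = [b * hinge(row[p] - knot) for b, row in zip(term, X)]
                    neg = [b * hinge(knot - row[p]) for b, row in zip(term, X)]
                    err = squared_error(terms + [pos, neg], y)
                    if err < best_err:
                        best_err, best = err, (ti, p, knot, pos, neg)
        ti, p, knot, pos, neg = best
        terms += [pos, neg]      # add the mirrored pair of hinge terms
        picks.append((ti, p, knot))
    return picks, terms, best_err


X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [2.0, 1.0, 0.0, 1.0, 2.0]    # y = |x - 2| = h(x - 2) + h(2 - x)
picks, terms, err = forward_pass(X, y)
```

On this toy data a single iteration places the knot at x = 2, where the hinge pair reproduces y. Refitting from scratch for every candidate is what makes the naive version expensive.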
Forward Pass
Start: $1$. Iteration 1: $h(x - t)$, $h(t - x)$. Iteration 2: $h(x - t) \times h(x - s)$, $h(x - t) \times h(s - x)$ 29
Forward Pass
• while True:
  • best_err = Infinity
  • for each term, predictor, knot candidate:
    • err = get_squared_error(term, predictor, knot)
    • if err < best_err:
      • best_err = err
      • best_term, best_pred, best_knot = term, predictor, knot
  • add term pair for best_term, best_pred, best_knot
  • check stopping conditions 30
$O(N^2 P^3)$ 31
Forward Pass
Start: $1$. Iteration 1: $h(x - t)$, $h(t - x)$. Iteration 2: $h(x - t) \times h(x - s)$, $h(x - t) \times h(s - x)$ 32
Generalized Cross Validation
$\mathrm{GCV} = \dfrac{\frac{1}{N} \sum_{i=1}^{N} [y_i - \hat{y}_i]^2}{\frac{1}{N^2} \left( N - Q - d(Q - 1) \right)^2}$ 33
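A small sketch of the GCV formula on this slide. The slide does not define Q and d; the code assumes Q is the number of terms in the model and d is a per-term penalty, and the argument names `num_terms` and `penalty` are illustrative.

```python
def gcv(y, y_hat, num_terms, penalty):
    """Generalized cross validation score, following the slide's formula.

    Assumes num_terms plays the role of Q and penalty the role of d.
    The denominator must be nonzero: N - Q - d * (Q - 1) != 0.
    """
    n = len(y)
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    shrink = (n - num_terms - penalty * (num_terms - 1)) ** 2 / n ** 2
    return mse / shrink


y = [float(i) for i in range(1, 11)]
y_hat = [v + 1.0 for v in y]          # every prediction off by one, so MSE = 1
score = gcv(y, y_hat, num_terms=3, penalty=2.0)
```

With N = 10, Q = 3 and d = 2, the denominator is (10 - 3 - 4)^2 / 100 = 0.09, so a mean squared error of 1 is inflated to about 11.1. Models with more terms are penalized more heavily for the same training error.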
Pruning Pass
• for i in range(num_terms):
  • best_score = Infinity
  • for term in terms:
    • score = GCV(model \ term)
    • if score < best_score:
      • best_score = score
      • term_to_drop = term
  • remove term_to_drop from model
  • models[i] = model.copy()
  • scores[i] = best_score
• selected_model = models[argmin(scores)] 34
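The pruning loop above can be sketched in plain Python as follows. This is an illustration under assumptions, not py-earth's code: terms are given as precomputed columns, the constant term (index 0) is never dropped, the unpruned model is also kept as a candidate, and Q and d in the GCV are read as the number of terms and a penalty. All names are hypothetical.

```python
def squared_error(columns, y):
    """Residual sum of squares of the least-squares fit (Gram-Schmidt based)."""
    resid = list(y)
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:              # skip linearly dependent columns
            q = [a / norm for a in v]
            basis.append(q)
            dot = sum(a * b for a, b in zip(resid, q))
            resid = [a - dot * b for a, b in zip(resid, q)]
    return sum(r * r for r in resid)


def gcv(columns, y, penalty):
    """GCV with Q = number of terms and d = penalty (assumed meanings)."""
    n, q = len(y), len(columns)
    mse = squared_error(columns, y) / n
    return mse / ((n - q - penalty * (q - 1)) ** 2 / n ** 2)


def pruning_pass(columns, y, penalty=3.0):
    """Greedily drop terms; return the term indices of the best-GCV model."""
    model = list(range(len(columns)))
    candidates = [(gcv([columns[i] for i in model], y, penalty), list(model))]
    while len(model) > 1:
        best_score, term_to_drop = float("inf"), None
        for term in model[1:]:       # index 0 is the constant term; keep it
            trial = [columns[i] for i in model if i != term]
            score = gcv(trial, y, penalty)
            if score < best_score:
                best_score, term_to_drop = score, term
        model.remove(term_to_drop)
        candidates.append((best_score, list(model)))
    return min(candidates, key=lambda c: c[0])[1]


xs = [float(i) for i in range(8)]
noise = [0.1, -0.1] * 4              # small deterministic noise
y = [1.0 + 2.0 * max(x - 3.0, 0.0) + e for x, e in zip(xs, noise)]
columns = [
    [1.0] * 8,                          # constant term
    [max(x - 3.0, 0.0) for x in xs],    # h(x - 3)
    [max(3.0 - x, 0.0) for x in xs],    # h(3 - x), not needed to explain y
]
selected = pruning_pass(columns, y, penalty=1.0)
```

Here the data depend only on h(x - 3), so the extra term h(3 - x) barely reduces the training error and the GCV penalty removes it.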
Pruning Pass
Terms: $1$; $h(x - t)$; $h(t - x)$; $h(x - t) \times h(x - s)$; $h(x - t) \times h(s - x)$ 35
Final Model
$y = a_0 + a_1 h(t - x) + a_2 h(x - t)\,h(x - s)$ 36
37
Implementation Goals
• Compatible with numpy ecosystem
• Fast and reliable
• Easy to maintain 38
39
40
Installation
>git clone git://github.com/jcrudy/py-earth.git
>cd py-earth
>sudo python setup.py install
41
Important Earth Methods
• fit(X,y)
• transform(X)
• predict(X) 42
Simple Example 43
Simple Example 44
45
With Pandas 46
With Patsy 47
Classification 48
Classification 49
50
Future Plans
• Documentation
• Integrate into scikit-learn
• Multiple responses
• Sample weights
51
Summary
• MARS is a simple but flexible regression method
• py-earth is MARS for the Python data stack
• Try it! 52
py-earth
A far better thing than I have ever done
• https://github.com/jcrudy/py-earth 53