PyData Meetup Group Presentation

Presentation on py-earth to the San Francisco PyData Meetup group on 2013-05-29.

Jason Rudy

May 29, 2013

Transcript

  1. 3

  2. 4

  3. 7

  4. Data table with columns HbA1c, Age, Gender, Etc., and Cost (every cell shown as an X placeholder)

  5. 10

  6. 11

  7. 13

  8. Python, R, My Brain

    Tasks shown: raw data processing, object-relational mapping, feature extraction, plotting, bootstrapping, normalization, Multivariate Adaptive Regression Splines
  9. 15

  10. 16

  11. Multivariate Adaptive Regression Splines

    $$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+$$
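
As a rough illustration of how the formula above is evaluated, here is a minimal Python sketch. The function name `mars_predict` and the tuple layout for terms are assumptions made for this example; they are not taken from py-earth. Each term is a coefficient `a_m` with a list of factors `(s, v, t)`: a sign, a predictor index, and a knot.

```python
def hinge(z):
    """Positive part [z]_+: z when z > 0, otherwise 0."""
    return z if z > 0 else 0.0


def mars_predict(x, a0, terms):
    """Evaluate a0 + sum_m a_m * prod_k [s_km * (x[v] - t_km)]_+.

    x     : sequence of predictor values
    a0    : intercept
    terms : list of (a_m, [(s, v, t), ...]) tuples (hypothetical layout)
    """
    total = a0
    for coef, factors in terms:
        product = 1.0
        for s, v, t in factors:
            product *= hinge(s * (x[v] - t))
        total += coef * product
    return total


# Illustrative model with one single-hinge term and one interaction term.
model_terms = [
    (3.0, [(+1, 0, 1.0)]),                # 3 * [x0 - 1]_+
    (0.5, [(+1, 0, 1.0), (-1, 1, 4.0)]),  # 0.5 * [x0 - 1]_+ * [4 - x1]_+
]
prediction = mars_predict((3.0, 2.0), a0=2.0, terms=model_terms)
```

With `x = (3.0, 2.0)` the two products are 2 and 4, so the prediction is 2 + 3·2 + 0.5·4.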
  12. Hinge Functions

    $$h(x - t) = [x - t]_+ = \begin{cases} x - t, & x > t \\ 0, & x \le t \end{cases}$$
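
A hinge function and its mirror image can be sketched in a few lines of Python. The knot value and sample points below are arbitrary choices for illustration.

```python
def hinge(z):
    """Positive part [z]_+: z when z > 0, otherwise 0."""
    return z if z > 0 else 0.0


t = 1.0  # illustrative knot
xs = [-1.0, 0.0, 1.0, 2.0, 3.0]

right = [hinge(x - t) for x in xs]  # nonzero only to the right of the knot
left = [hinge(t - x) for x in xs]   # nonzero only to the left of the knot
```

Taken together, `hinge(x - t)` and `hinge(t - x)` form the mirrored pair that MARS adds at each knot.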
  13. Multivariate Adaptive Regression Splines

    $$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+$$
  14. Multivariate Adaptive Regression Splines

    $$y = 1 - 2h(1 - x) + \frac{1}{2} h(x - 1)$$
  15. Multivariate Adaptive Regression Splines

    $$y = 2 + 0.1\,h(x - 1) + h(1 - x) + 3\,h(x - 1)\,h(4 - x)$$
  16. Multivariate Example

    $$z = h(3 - x) + h(3 - x)\,h(5 - y)$$
  17. Multivariate Adaptive Regression Splines

    $$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+$$
  18. Forward Pass

    • while True:
    •     best_err = Infinity
    •     for each term, predictor, knot candidate:
    •         err = get_squared_error(term, predictor, knot)
    •         if err < best_err:
    •             best_err = err
    •             best_term, best_pred, best_knot = term, predictor, knot
    •     add term pair for best_term, best_pred, best_knot
    •     check stopping conditions
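
The search inside the loop can be made concrete with a deliberately reduced sketch: a single predictor, a single iteration, and the intercept as the only parent term. This is an illustration under those assumptions, not py-earth's implementation. The helper names (`solve`, `squared_error`, `best_first_knot`) are invented here, and the least-squares fit uses plain normal equations for brevity.

```python
def hinge(z):
    return z if z > 0 else 0.0


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def squared_error(columns, y):
    """Fit y on the basis columns by least squares; return the residual sum of squares."""
    p = len(columns)
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(p)]
            for i in range(p)]
    rhs = [sum(a * b for a, b in zip(columns[i], y)) for i in range(p)]
    coef = solve(gram, rhs)
    fitted = [sum(coef[k] * columns[k][i] for k in range(p)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_first_knot(x, y):
    """One forward-pass iteration: pick the knot whose hinge pair fits best."""
    intercept = [1.0] * len(x)
    best_err, best_knot = float("inf"), None
    # Interior data points only, so neither hinge column is all zeros.
    for t in sorted(set(x))[1:-1]:
        pair = [[hinge(xi - t) for xi in x], [hinge(t - xi) for xi in x]]
        err = squared_error([intercept] + pair, y)
        if err < best_err:
            best_err, best_knot = err, t
    return best_knot, best_err


# Synthetic data with a single kink at x = 1.
x = [i * 0.5 for i in range(-4, 9)]
y = [2.0 + 3.0 * hinge(xi - 1.0) for xi in x]
knot, err = best_first_knot(x, y)
```

On this synthetic data the search recovers the knot at 1 with essentially zero residual. A full forward pass would repeat the search over every existing term and predictor until a stopping condition is met.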
  19. Forward Pass

    Start: $1$
    Iteration 1: $h(x - t)$, $h(t - x)$
    Iteration 2: $h(x - t) \times h(x - s)$, $h(x - t) \times h(s - x)$
  20. Forward Pass

    • while True:
    •     best_err = Infinity
    •     for each term, predictor, knot candidate:
    •         err = get_squared_error(term, predictor, knot)
    •         if err < best_err:
    •             best_err = err
    •             best_term, best_pred, best_knot = term, predictor, knot
    •     add term pair for best_term, best_pred, best_knot
    •     check stopping conditions
  21. Forward Pass

    Start: $1$
    Iteration 1: $h(x - t)$, $h(t - x)$
    Iteration 2: $h(x - t) \times h(x - s)$, $h(x - t) \times h(s - x)$
  22. Generalized Cross Validation

    $$\mathrm{GCV} = \frac{\frac{1}{N} \sum_{i=1}^{N} \left[ y_i - \hat{y}_i \right]^2}{\frac{1}{N^2} \left( N - Q - d(Q - 1) \right)^2}$$
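
The GCV score on this slide can be computed directly. The sketch below assumes that `Q` is the number of terms in the model and `d` is a penalty parameter, since the slide does not define either symbol. The function and argument names are illustrative.

```python
def gcv(y, y_hat, q, d):
    """GCV = mean squared error / ((N - Q - d*(Q - 1))^2 / N^2).

    q : number of terms in the model (assumed meaning of Q)
    d : penalty parameter (assumed meaning of d)
    """
    n = len(y)
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n
    denominator = (n - q - d * (q - 1)) ** 2 / n ** 2
    return mse / denominator


# Ten observations, each missed by exactly 1, so the MSE is 1.
y = [float(i) for i in range(1, 11)]
y_hat = [yi + 1.0 for yi in y]
score = gcv(y, y_hat, q=3, d=2)
```

With N = 10, Q = 3 and d = 2, the denominator is (10 - 3 - 4)² / 10² = 0.09. The score is therefore about eleven times the raw mean squared error, which shows how extra terms are penalized.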
  23. Pruning Pass

    • for i in range(num_terms):
    •     best_score = Infinity
    •     for term in terms:
    •         score = GCV(model \ term)
    •         if score < best_score:
    •             best_score = score
    •             term_to_drop = term
    •     remove term_to_drop from model
    •     models[i] = model.copy()
    •     scores[i] = best_score
    • selected_model = models[argmin(scores)]
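
The pruning loop can be sketched as runnable Python with the scoring function passed in as a parameter. The `toy_score` function below is a made-up stand-in for GCV so that the example runs without fitting anything. Real implementations typically also keep the intercept out of the pruning candidates, which this sketch does not do.

```python
def pruning_pass(terms, score_fn):
    """Greedy backward elimination: drop one term at a time, keep the best-scoring model."""
    model = list(terms)
    models = [list(model)]
    scores = [score_fn(model)]
    while len(model) > 1:
        best_score, term_to_drop = float("inf"), None
        for term in model:
            candidate = [t for t in model if t != term]
            score = score_fn(candidate)
            if score < best_score:
                best_score, term_to_drop = score, term
        model.remove(term_to_drop)
        models.append(list(model))
        scores.append(best_score)
    return models[scores.index(min(scores))]


# Hypothetical scoring rule: a large cost for each missing "useful" term,
# plus a cost of one per term kept in the model.
USEFUL = {"1", "h(x-t)"}


def toy_score(model):
    return 10 * len(USEFUL - set(model)) + len(model)


all_terms = ["1", "h(x-t)", "h(t-x)", "h(x-t)h(x-s)"]
selected = pruning_pass(all_terms, toy_score)
```

Here the scores along the pruning sequence are 4, 3, 2, and 11, so the two-term model is selected.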
  24. Pruning Pass

    Terms: $1$, $h(x - t)$, $h(t - x)$, $h(x - t) \times h(x - s)$, $h(x - t) \times h(s - x)$
  25. Final Model

    $$y = a_0 + a_1 h(t - x) + a_2 h(x - t)\,h(x - s)$$
  26. 37

  27. 39

  28. 40

  29. 45

  30. 50

  31. Summary

    • MARS is a simple but flexible regression method
    • py-earth is MARS for the Python data stack
    • Try it!
  32. py-earth

    A far better thing than I have ever done
    • https://github.com/jcrudy/py-earth