Py-Earth: Multivariate Adaptive Regression Splines in Python

Mehdi
October 25, 2015

Transcript

  1. Py-Earth: Multivariate Adaptive Regression Splines (MARS) in Python

    Mehdi Cherti (Appstat, LAL/CNRS), supervised by Balázs Kégl (LAL/CNRS) and Alexandre Gramfort (CNRS LTCI). October 26, 2015.
  3. Introduction

    MARS is a regression technique for high-dimensional data. It was first introduced by Jerome H. Friedman in 1991, and it is non-linear and non-parametric. Py-earth is an implementation of MARS in Python, created by Jason Rudy. The goal is to bring Py-earth into scikit-learn.
  8. Setup

    Multivariate regression with multiple outputs: the data are pairs $(X_i, Y_i)$, where $i$ indexes the $i$-th example. $X_i$ is a vector describing example $i$, and $Y_i$ is a real-valued vector of the true outputs of example $i$. We want to find a model $f$ that predicts $Y$ from $X$ with low generalization mean squared error (MSE).
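
As a rough illustration of the error measure named above, the sketch below computes a mean squared error for vector-valued outputs. Averaging over both examples and output dimensions is an assumption (other conventions sum over outputs), and the function name and toy numbers are made up for the example. To estimate the generalization error, the inputs would be true and predicted outputs on held-out examples.

```python
def mean_squared_error(Y_true, Y_pred):
    """Average squared error over all examples and all output dimensions."""
    total, count = 0.0, 0
    for y_true, y_pred in zip(Y_true, Y_pred):
        for t, p in zip(y_true, y_pred):
            total += (t - p) ** 2
            count += 1
    return total / count


# Two examples with two outputs each (toy values, not from the talk).
Y_true = [[1.0, 2.0], [3.0, 4.0]]
Y_pred = [[1.0, 1.0], [2.0, 4.0]]
mse = mean_squared_error(Y_true, Y_pred)
print(mse)
```
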
  12. How does MARS work?

    Basic building block: hinge functions, $y = \max(x - k, 0)$ or $y = \max(k - x, 0)$.
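
The two hinge functions can be written directly in plain Python. This is only a minimal sketch: the function names and the knot value `k = 3.0` are chosen for illustration and are not taken from Py-earth.

```python
def hinge_pos(x, k):
    """Right hinge: max(x - k, 0). Zero up to the knot k, linear after it."""
    return max(x - k, 0.0)


def hinge_neg(x, k):
    """Left (mirrored) hinge: max(k - x, 0). Linear before the knot, zero after."""
    return max(k - x, 0.0)


knot = 3.0  # hypothetical knot location
xs = [1.0, 3.0, 4.5]
pos = [hinge_pos(x, knot) for x in xs]
neg = [hinge_neg(x, knot) for x in xs]
print(pos, neg)
```

At most one of the two mirrored hinges is non-zero for any given `x`, which is what lets the model change slope at the knot.
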
  13. How does MARS work?

    The algorithm adaptively constructs a set of basis functions $B_k(X)$. Each basis function is a product of hinge functions, for instance $B_k(X) = \max(X_1 - 5, 0)\,\max(X_2 + 4, 0)$. The model is a linear combination of those basis functions: $Y = \sum_{k=1}^{K} \alpha_k B_k(X)$. The specificity of MARS comes from how the basis functions are created and added to the model.
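
A small sketch of this structure, assuming a model with two basis functions: the product of hinges given as the example above, and a single hinge added for illustration. The coefficient values and function names are made up; in MARS the coefficients would be fitted by least squares.

```python
def hinge(x, knot, sign=1):
    """Hinge function: max(sign * (x - knot), 0); sign=-1 gives max(knot - x, 0)."""
    return max(sign * (x - knot), 0.0)


def basis_product(X):
    """B(X) = max(X1 - 5, 0) * max(X2 + 4, 0); X[0] is X1 and X[1] is X2."""
    return hinge(X[0], 5.0) * hinge(X[1], -4.0)


def basis_single(X):
    """B(X) = max(3 - X1, 0), a second basis function chosen for illustration."""
    return hinge(X[0], 3.0, sign=-1)


def mars_predict(X, basis_functions, alphas):
    """Linear combination sum_k alpha_k * B_k(X)."""
    return sum(a * B(X) for a, B in zip(alphas, basis_functions))


basis_functions = [basis_product, basis_single]
alphas = [0.5, 2.0]  # hypothetical coefficients

print(mars_predict([7.0, -1.0], basis_functions, alphas))
print(mars_predict([1.0, 0.0], basis_functions, alphas))
```
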
  19. How does MARS work?

    Two steps: the forward pass and the pruning pass. In the forward pass, we over-generate a set of basis functions. In the pruning pass, we prune unnecessary basis functions.
  22. How does MARS work?

    The forward pass:

    $$y = \alpha_0 + \alpha_1 \max(x_1 - 3, 0) + \alpha_2 \max(3 - x_1, 0) + \alpha_3 \max(x_1 - 3, 0)\max(x_2 - 7, 0) + \alpha_4 \max(x_1 - 3, 0)\max(7 - x_2, 0) + \alpha_5 \max(x_2 - 12, 0) + \alpha_6 \max(12 - x_2, 0) \tag{1}$$
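
One way to read equation (1) is as a list of terms, each a product of hinge factors, where terms are added in mirrored pairs and a pair may multiply an existing term (here $\max(x_1 - 3, 0)$). The sketch below encodes the seven terms that way and evaluates the model. The data layout, the coefficient values, and the input point are assumptions for illustration; this is not how Py-earth stores its basis functions.

```python
# Each term is a list of hinge factors (variable_index, knot, sign).
# sign=+1 means max(x - knot, 0); sign=-1 means max(knot - x, 0).
# The empty list is the constant term that multiplies alpha_0.
TERMS = [
    [],                            # 1
    [(0, 3.0, +1)],                # max(x1 - 3, 0)
    [(0, 3.0, -1)],                # max(3 - x1, 0)
    [(0, 3.0, +1), (1, 7.0, +1)],  # max(x1 - 3, 0) * max(x2 - 7, 0)
    [(0, 3.0, +1), (1, 7.0, -1)],  # max(x1 - 3, 0) * max(7 - x2, 0)
    [(1, 12.0, +1)],               # max(x2 - 12, 0)
    [(1, 12.0, -1)],               # max(12 - x2, 0)
]


def evaluate_term(term, x):
    """Product of the hinge factors of one term at the point x."""
    value = 1.0
    for var, knot, sign in term:
        value *= max(sign * (x[var] - knot), 0.0)
    return value


def predict(x, alphas):
    """Equation (1): sum of alpha_k times the k-th term."""
    return sum(a * evaluate_term(t, x) for a, t in zip(alphas, TERMS))


alphas = [1.0, 2.0, -1.0, 0.5, 0.25, 3.0, -2.0]  # hypothetical coefficients
x = [5.0, 10.0]                                  # x1 = 5, x2 = 10
features = [evaluate_term(t, x) for t in TERMS]
print(features, predict(x, alphas))
```
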
  23. What has been done?

    GitHub repo: https://github.com/jcrudy/py-earth

        git clone https://github.com/jcrudy/py-earth
        cd py-earth
        python setup.py install

    The state of the code: Py-earth already supported many features, and the important parts were there. However, it was not ready to be merged into scikit-learn, and it did not support multiple outputs.
  25. What has been done?

    Improve code quality: clean the code (PEP 8) and adapt it to the scikit-learn coding guidelines; enhance the documentation; add more unit tests.
  28. What has been done?

    New features: support for multiple outputs; support for output weights; support for estimating variable importance; an implementation of FastMARS (Jerome H. Friedman, 1993).
  32. Example: multiple outputs

    20 inputs and 3 outputs; only 2 of the inputs are informative, the rest are noise.
  33. Example: multiple outputs

    The graph of basis functions (a crop of it) looks like this: (figure not reproduced in the transcript)
  34. Example: variable importance

    $y = \sin(\pi x_0 x_1) + 20 (x_2 - 0.5)^2 + 10 x_3 + 5 x_4 + 5 \cdot N(0, 1)$

    The code in the previous slide gives: (output not reproduced in the transcript)
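
The code the slide refers to is not part of the transcript. As a stand-in, the sketch below only generates data from the formula above, using the standard library. The number of samples, the number of input columns (10, so that columns 5 to 9 are pure noise inputs), the uniform distribution of the inputs on [0, 1), and the function names are all assumptions. A variable-importance estimate would be expected to rank `x0` to `x4` above the unused columns.

```python
import math
import random


def signal(x):
    """Noise-free part: sin(pi*x0*x1) + 20*(x2 - 0.5)^2 + 10*x3 + 5*x4."""
    return (math.sin(math.pi * x[0] * x[1])
            + 20.0 * (x[2] - 0.5) ** 2
            + 10.0 * x[3]
            + 5.0 * x[4])


def make_dataset(n_samples=200, n_features=10, seed=0):
    """Draw inputs uniformly on [0, 1) and add 5 * N(0, 1) noise to the signal."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    y = [signal(x) + 5.0 * rng.gauss(0.0, 1.0) for x in X]
    return X, y


X, y = make_dataset()
print(len(X), len(X[0]), len(y))
```
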
  35. Future

    Close current issues and keep working on code quality in order to merge Py-earth into scikit-learn. Some features are still missing; new features to add: dealing with missing values and supporting categorical variables.