
Nx-Powered Decision Trees

Andrés Alejos
September 08, 2023


Decision Trees are an extremely popular class of supervised Machine Learning techniques, beginner-friendly thanks to their intuitive, digestible design. They have also stood the test of time as one of the preeminent techniques for learning from structured, tabular data. This type of data, common in spreadsheets and relational databases, remains the gold standard in business and enterprise environments, and this class of techniques has now made its way to Elixir.

This talk introduces EXGBoost + Mockingjay, a Gradient Boosted Decision Tree library paired with a compilation library that converts trained Decision Trees into Nx tensor operations. We will discuss how both libraries work and walk through examples of using them. Next, we will look at serving trained Decision Trees in a scalable production environment using Nx’s Serving capability and a Phoenix web app. Finally, we will look at the future of Machine Learning in Elixir and how bringing this next class of machine learning techniques to the language benefits the ecosystem.


Transcript

  1. Disclaimer: The views expressed in this presentation are those of the speaker and do not represent the views of the United States Government, the DoD, or ARCYBER.
  2. Agenda: Intro to Decision Trees · Training & Prediction · Compiling a Decision Tree · Overfitting & Ensemble Trees · EXGBoost + Mockingjay · Livebook Demo
  3. Guess Who? • Each player chooses a character whom the opposing player must guess • Take turns asking Yes / No questions • First to correctly guess the opponent’s character wins • Strategy? • Ask narrow questions (“Does your character’s name start with a ‘K’?”) • High risk / high reward • Ask broad questions • Consistent • Binary search guarantees a win in log₂(#Characters) guesses
  4. Information Theory • The mathematical study of the quantification, storage, and communication of information • Maximize the information gained from asking a question • Ask questions that minimize the uncertainty in the system • Used throughout Decision Tree learning to build the “best” decision tree
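As a minimal sketch, the entropy term can be computed over a list of class labels in plain Elixir (module and function names here are illustrative, not from any library):

```elixir
defmodule Entropy do
  # Shannon entropy of a list of class labels, in bits:
  # H(S) = -sum over classes of p_i * log2(p_i).
  def entropy(labels) do
    total = length(labels)

    labels
    |> Enum.frequencies()
    |> Enum.map(fn {_class, count} ->
      p = count / total
      -p * :math.log2(p)
    end)
    |> Enum.sum()
  end
end

# A 50/50 split is maximally uncertain (1 bit); a pure set has 0 bits.
Entropy.entropy([:yes, :yes, :no, :no])  #=> 1.0
Entropy.entropy([:yes, :yes, :yes])      #=> 0.0
```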
  5. Decision Trees • Use a classical tree structure to perform prediction tasks • Classification & Regression • Predict the value of a target variable by learning simple decision rules inferred from the data features • A tree can be seen as a piecewise constant approximation
  6. The dataset attributes (the Iris dataset):

     Attribute Name | Role    | Type
     sepal length   | Feature | Continuous
     sepal width    | Feature | Continuous
     petal length   | Feature | Continuous
     petal width    | Feature | Continuous
     class          | Target  | Categorical

     How might you go about separating these classes?
  7. [Figure: a decision tree fit to the 150-sample Iris dataset (classes setosa, versicolor, virginica). The root splits on petal length (cm) ≤ 2.45, isolating all 50 setosa samples; deeper splits on petal width, petal length, and sepal length separate versicolor from virginica. Each node shows its Gini impurity, sample count, and class distribution.]
  8. Agenda: Intro to Decision Trees · Training & Prediction · Compiling a Decision Tree · Overfitting & Ensemble Trees · EXGBoost + Mockingjay · Livebook Demo
  9. Decision Tree Training • There are many ways to build a tree, but the focus is on the Selection Criterion, which decides which attribute to split on at each level • Common regression criteria: Mean Squared Error, Mean Absolute Error • Common selection criteria for classification tasks:

     $\mathrm{InformationGain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v)$

     $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$

     *Gini Gain is the same concept, but swaps Entropy for the Gini score
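Building on the `entropy/1` helper above, information gain for a candidate attribute follows the formula directly (again an illustrative sketch, not a library API):

```elixir
defmodule Gain do
  # InformationGain(S, A) = Entropy(S) -
  #   sum over values v of A of (|S_v| / |S|) * Entropy(S_v)
  # `examples` is a list of maps, `attribute` and `target` are keys.
  def information_gain(examples, attribute, target) do
    labels = Enum.map(examples, & &1[target])
    total = length(examples)

    weighted_child_entropy =
      examples
      |> Enum.group_by(& &1[attribute])
      |> Enum.map(fn {_value, subset} ->
        length(subset) / total *
          Entropy.entropy(Enum.map(subset, & &1[target]))
      end)
      |> Enum.sum()

    Entropy.entropy(labels) - weighted_child_entropy
  end
end
```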
  10. Decision Tree Training • Recursively build subtrees, finding the best attribute according to your selection criterion • Stop building when: • No training samples left • All remaining examples belong to the same class • No features left on which to split
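These stopping rules translate almost directly into a recursive build. A simplified, ID3-style sketch for categorical features (names are illustrative; the no-samples-left case is elided since grouping never produces empty subsets here):

```elixir
defmodule Build do
  # Recursively grow a tree: pick the attribute with the highest
  # information gain, split the examples on it, and recurse.
  def build(examples, attributes, target) do
    labels = examples |> Enum.map(& &1[target]) |> Enum.uniq()

    cond do
      # All remaining examples belong to the same class -> leaf
      length(labels) == 1 -> {:leaf, hd(labels)}
      # No features left to split on -> leaf with the majority class
      attributes == [] -> {:leaf, majority(examples, target)}
      true ->
        best =
          Enum.max_by(attributes, &Gain.information_gain(examples, &1, target))

        children =
          examples
          |> Enum.group_by(& &1[best])
          |> Map.new(fn {value, subset} ->
            {value, build(subset, attributes -- [best], target)}
          end)

        {:node, best, children}
    end
  end

  defp majority(examples, target) do
    examples
    |> Enum.frequencies_by(& &1[target])
    |> Enum.max_by(fn {_class, count} -> count end)
    |> elem(0)
  end
end
```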
  11–25. Decision Tree Prediction (animated walkthrough) • Two example inputs are shown: (F1…F5) = (0.1, 4.6, 1.9, 0.8, 3.5) and (0.2, 1.5, 2.1, 0.4, 6.0) • Each input is routed through the tree one comparison at a time until it reaches a leaf: the first is predicted class C2, the second class C1
  26–31. Decision Tree Prediction • What problems do you notice from this process? • Sequential in nature: we must traverse the whole path to find the class • Not conducive to parallel operations on GPU / TPU • The model structure is “optimized” for training • Do we need the same data structures for inference?
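The sequential nature of the walk is easy to see in code: a minimal traversal over the `{:node, ...}` / `{:leaf, ...}` structure sketched earlier must finish each comparison before the next can start:

```elixir
defmodule Predict do
  # Walking the tree is inherently sequential: each comparison must
  # complete before we know which subtree to descend into next.
  def predict({:leaf, class}, _features), do: class

  def predict({:node, attribute, children}, features) do
    predict(children[features[attribute]], features)
  end
end
```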
  32. Agenda: Intro to Decision Trees · Training & Prediction · Compiling a Decision Tree · Overfitting & Ensemble Trees · EXGBoost + Mockingjay · Livebook Demo
  33–37. Compiling a Decision Tree • Given a trained decision tree, we can convert the series of decisions into matrix operations • Allows for batching of predictions • Can be parallelized on GPU • Example: the input (F1…F5) = (0.1, 4.6, 1.9, 0.8, 3.5) is again predicted as class C2, now purely via matrix operations • https://scnakandala.github.io/papers/TR_2020_Hummingbird.pdf
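Hummingbird’s GEMM strategy can be sketched with Nx. For a toy two-node tree (node 0 tests f0 < 2.5, node 1 tests f1 < 1.75), matrix A selects the feature each internal node tests, B holds the thresholds, C and D encode which comparisons must hold on the path to each leaf, and E maps leaves to class scores. The matrices below are hand-built for illustration; this is a sketch of the technique, not Mockingjay’s output:

```elixir
# Toy tree: node 0 tests f0 < 2.5 (true -> class 0);
# node 1 tests f1 < 1.75 (true -> class 1, false -> class 2).
x = Nx.tensor([[1.4, 0.2], [4.7, 1.4], [5.9, 2.2]])  # batch of 3 inputs

a = Nx.tensor([[1, 0], [0, 1]])   # feature tested by each internal node
b = Nx.tensor([2.5, 1.75])        # threshold at each internal node
c = Nx.tensor([[1, -1, -1],       # +1 / -1: leaf lies in the left / right
               [0, 1, -1]])       # subtree of the node; 0: not on the path
d = Nx.tensor([1, 1, 0])          # number of "true" comparisons per leaf
e = Nx.eye(3)                     # leaf -> class scores

t = Nx.less(Nx.dot(x, a), b)             # evaluate all nodes for all inputs
leaves = Nx.equal(Nx.dot(t, c), d)       # one-hot: the leaf each input reaches
Nx.dot(leaves, e) |> Nx.argmax(axis: 1)  #=> [0, 1, 2]
```

Note that the whole batch is classified in three tensor operations, with no per-input branching: exactly the shape of computation GPUs and TPUs are good at.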
  38. Agenda: Intro to Decision Trees · Training & Prediction · Compiling a Decision Tree · Overfitting & Ensemble Trees · EXGBoost + Mockingjay · Livebook Demo
  39. Avoiding Overfitting • Pruning: the trees we’ve talked about thus far are “full” trees • Pre-Pruning: use a heuristic to determine whether a subtree is worth building • Post-Pruning: build the full tree and retroactively prune uninformative splits • Ensemble Methods: train many “weak” learners (trees) and aggregate their predictions • Random Forest • Gradient-Boosted Decision Trees
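The aggregation step is itself just a tensor operation. A hedged sketch of averaging per-class scores across weak learners with Nx, where `tree_fns` is assumed to be a list of compiled, arity-1 prediction functions like the one above:

```elixir
defmodule Ensemble do
  # Average per-class scores across weak learners, then take the argmax.
  # `tree_fns` is a list of arity-1 prediction functions over Nx tensors.
  def predict(tree_fns, x) do
    tree_fns
    |> Enum.map(fn predict_fn -> predict_fn.(x) end)
    |> Nx.stack()
    |> Nx.mean(axes: [0])
    |> Nx.argmax(axis: 1)
  end
end
```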
  40. Agenda: Intro to Decision Trees · Training & Prediction · Compiling a Decision Tree · Overfitting & Ensemble Trees · EXGBoost + Mockingjay · Livebook Demo
  41–48. XGBoost • “XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework.”
  49–51. EXGBoost + Mockingjay • EXGBoost: Elixir bindings to the XGBoost C API using Native Implemented Functions (NIFs) • Mockingjay: Decision Tree compilation • Distributed Decision Tree inference serving
  52. EXGBoost • GitHub: https://github.com/acalejos/exgboost • Documentation: https://hexdocs.pm/exgboost/EXGBoost.html • Current Version: v0.3.1 • Notebook examples available at https://github.com/acalejos/exgboost/tree/main/notebooks • Inputs & Outputs == Nx.Tensor • The Booster model is an opaque reference to a NIF struct
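Basic usage follows the tensor-in, tensor-out contract noted above. A minimal sketch; the hyperparameter names mirror XGBoost’s and are assumptions here, so check the hexdocs for the exact option list:

```elixir
# x: {n_samples, n_features} tensor of features, y: {n_samples} labels.
x = Nx.tensor([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]])
y = Nx.tensor([0, 1])

# Train a small boosted ensemble. The option names below are assumed
# to mirror XGBoost's hyperparameters; the returned Booster is an
# opaque reference to the underlying NIF-managed model.
model = EXGBoost.train(x, y, num_boost_rounds: 10, max_depth: 3)

# Predict returns an Nx tensor of scores.
EXGBoost.predict(model, x)
```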
  53. Mockingjay • GitHub: https://github.com/acalejos/mockingjay • Documentation: https://hexdocs.pm/mockingjay/Mockingjay.html • Current Version: v0.1.0 • Based on Microsoft’s Hummingbird library • Top-level API consists of a single `convert` function • Exposes a DecisionTree Protocol for extensible inputs
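Putting the pieces together: `convert` compiles the trained booster into an Nx prediction function, which can then sit behind an `Nx.Serving` for batched inference (e.g. from a Phoenix app). A sketch, assuming `convert/1` returns an arity-1 function over tensors:

```elixir
# Assumed: Mockingjay.convert/1 returns an arity-1 prediction function
# built from Nx tensor operations.
predict_fn = Mockingjay.convert(model)

# Wrap it in an Nx.Serving so predictions are batched and can be
# supervised / distributed (e.g. started in a Phoenix app's tree).
serving = Nx.Serving.new(fn opts -> Nx.Defn.jit(predict_fn, opts) end)

batch = Nx.Batch.concatenate([x])
Nx.Serving.run(serving, batch)
```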
  54. Agenda: Intro to Decision Trees · Training & Prediction · Compiling a Decision Tree · Overfitting & Ensemble Trees · EXGBoost + Mockingjay · Livebook Demo