
Interpretable ML & Financial Machine Learning (I)

A slide deck that helps you understand what you're doing on machine learning projects

Yu-Chen, Den
November 09, 2024

  1. Interpretable ML & Financial Machine Learning (I)
     Author | Yu-Chen (Abner) Den
     Date | Oct 13th, 2024
     ©2024 Yu-Chen Den, SinoPac Holdings | National Taiwan University
  2. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
  3. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
  4. Preface
     • Objectives
       ◦ Understand what happens during training and how the model works
       ◦ Learn how to tune hyperparameters in a principled way
       ◦ Explore how to make models better
     • Coverage
       ◦ Some code implementations of machine learning models and feature engineering methods
       ◦ A little about the engineering side of ML projects: as a DS / MLE / Quant at a top-tier firm, you'll need these skills and tools to work efficiently
       ◦ Epiphanies about ML from my observations of life
       ◦ Models not covered: SVM, Naive Bayes, linear regression, logistic regression
       ◦ Course materials: https://github.com/AbnerTeng/better-ml
  5. Preface
     • Why does interpretability matter?
       ◦ Better decision making
       ◦ Debugging & improvement
       ◦ Reliable model output
  6. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
  7. How do decision trees make decisions?
     • When it comes to decision trees, we probably think of a tree-shaped figure
     • Of course, everyone knows that decision trees use a sequence of decisions to do regression or classification, but how?
  8. Information Gain - Entropy & Gini Index
     • In probability and information theory, the Entropy / Gini Index represents the level of uncertainty of a random variable
     • What does this mean?
       ◦ Assume we have label vectors
         ▪ v = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
         ▪ w = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
       ◦ Entropy(v) = 0.881, Entropy(w) = 1.0
       ◦ Gini(v) = 0.42, Gini(w) = 0.5
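Not in the original deck: a minimal Python sketch that reproduces the numbers above from the standard definitions $H = -\sum_i p_i \log_2 p_i$ and $G = 1 - \sum_i p_i^2$.

```python
import numpy as np

def entropy(labels) -> float:
    """Shannon entropy (bits): H = -sum(p * log2(p))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels) -> float:
    """Gini impurity: G = 1 - sum(p^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

v = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
w = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(entropy(v), entropy(w))  # 0.881..., 1.0
print(gini(v), gini(w))        # 0.42, 0.5
```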
  9. Information Gain - Entropy & Gini Index
     • Based on the Entropy / Gini Index, we can calculate the information gain, which represents the reduction in Entropy / Gini Index after splitting on a specific threshold:
       $IG(S, C) = H(S) - \sum_{v \in \mathrm{values}(C)} \frac{|S_v|}{|S|} H(S_v)$
       where $\mathrm{values}(C)$ are the distinct values of a specific column $C$ and $S_v$ is the subset of $S$ with $C = v$
     • At each split, decision trees compare every feature against the label and choose the column that maximizes the information gain
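A small follow-up sketch (reusing the entropy helper from the previous block) of computing the information gain of a candidate categorical split:

```python
def information_gain(labels, column) -> float:
    """IG(S, C) = H(S) - sum over distinct v of |S_v|/|S| * H(S_v)."""
    labels, column = np.asarray(labels), np.asarray(column)
    weighted_child_entropy = sum(
        (column == val).mean() * entropy(labels[column == val])
        for val in np.unique(column)
    )
    return entropy(labels) - weighted_child_entropy

# A perfectly separating column recovers all of H(S); an uninformative one gains nothing:
labels = [0, 0, 1, 1]
print(information_gain(labels, ["a", "a", "b", "b"]))  # 1.0
print(information_gain(labels, ["a", "b", "a", "b"]))  # 0.0
```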
 10. We are family - Different kinds of DTs
     • ID3
       ◦ Uses entropy-based information gain as the splitting criterion
       ◦ Handles only categorical data, and only classification tasks
       ◦ No pruning
     • C4.5
       ◦ Uses the gain ratio, a modification of information gain (see the formula below)
       ◦ Handles categorical and numerical data, but still only classification tasks
     • CART (the DT we use now)
       ◦ Uses the Gini Index for classification and MSE for regression
       ◦ Constructs strictly binary trees
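For reference, C4.5's gain ratio normalizes information gain by the split's own intrinsic entropy, penalizing columns with many distinct values (standard textbook form, not from the original slide):

```latex
\mathrm{GainRatio}(S, C) = \frac{IG(S, C)}{\mathrm{SplitInfo}(S, C)},
\qquad
\mathrm{SplitInfo}(S, C) = -\sum_{v \in \mathrm{values}(C)} \frac{|S_v|}{|S|}\,\log_2\frac{|S_v|}{|S|}
```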
 11. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
 12. Bootstrap AGGregatING (Bagging)
     • The simplest ensemble method
     • Classification
       ◦ Output the mode of the classes predicted by every weak learner
     • Regression
       ◦ Output the mean of the predictions from every weak learner
     • Bagging improves the model by lowering the variance of the individual base learners
     Fig: https://anasbrital98.github.io/blog/2021/Random-Forest/ (Random Forest)
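A tiny numpy sketch of the two aggregation rules; the prediction matrices are hypothetical, one row per weak learner:

```python
import numpy as np

clf_preds = np.array([[0, 1, 1],    # weak learner 1
                      [0, 1, 0],    # weak learner 2
                      [1, 1, 0]])   # weak learner 3
reg_preds = np.array([[1.0, 2.0],
                      [1.2, 1.8],
                      [0.8, 2.2]])

# Classification: mode (majority vote) of each column
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, clf_preds)
# Regression: mean of each column
average = reg_preds.mean(axis=0)
print(majority)  # [0 1 0]
print(average)   # [1. 2.]
```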
 13. Where's the bootstrap?
     • Each decision tree doesn't get the whole dataset as input!
       ◦ Instead, each tree is trained on rows randomly sampled from the whole dataset with replacement (Random Forest additionally samples a random subset of features at each split)
       ◦ Sampling with replacement is the simplest form of the bootstrap
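A minimal sketch of drawing one bootstrap sample; X and y are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))     # hypothetical feature matrix
y = rng.integers(0, 2, size=100)  # hypothetical labels

# Bootstrap: sample n row indices *with replacement*
idx = rng.choice(len(X), size=len(X), replace=True)
X_boot, y_boot = X[idx], y[idx]
# Each bootstrap sample contains ~63.2% of the unique rows on average;
# the remainder are the "out-of-bag" rows
```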
 14. Where's the bootstrap? (Cont'd)
     • As the number of bootstrap samples increases, the variance of the bagged trees is averaged out
     • The only remaining concern is highly correlated trees among the bootstrapped trees
       ◦ Highly correlated trees mean high covariance between trees, which means we cannot efficiently lower the total variance of the forest (see the variance formula in Appendix (I))
 15. Parameters of Random Forest
     • n_estimators
       ◦ Number of trees in the forest
     • criterion
       ◦ The splitting criterion of each tree
     • max_depth
       ◦ The maximum depth of each tree
     Random Forest scikit-learn page
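A minimal scikit-learn sketch wiring up these three parameters (the values are placeholders, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = RandomForestClassifier(
    n_estimators=200,   # number of trees in the forest
    criterion="gini",   # splitting criterion of each tree
    max_depth=5,        # maximum depth of each tree
    random_state=0,
).fit(X, y)
```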
 16. How to tune Random Forest during evaluation?
     • When we encounter overfitting during training (that is, test / validation loss starts to go up), we may want to set a smaller n_estimators or max_depth
     • When underfitting, the model isn't complex enough to learn from the dataset; just try larger values for the above parameters
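One hedged way to see this in practice: sweep max_depth on synthetic data and watch the validation loss (a sketch; the values are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for depth in [2, 4, 8, 16, None]:
    rf = RandomForestClassifier(n_estimators=200, max_depth=depth,
                                random_state=0).fit(X_tr, y_tr)
    # Validation loss rising while training loss keeps falling signals overfitting
    print(depth, round(log_loss(y_val, rf.predict_proba(X_val)), 4))
```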
 17. Boosting
     • Unlike bagging, boosting sequentially trains weak classifiers (decision trees) to build the ensemble
     • The boosting mechanism emphasizes samples misclassified in prior iterations: it up-weights samples that were incorrectly classified, allowing the model to learn from its mistakes and improve iteratively
     fig: https://www.geeksforgeeks.org/boosting-in-machine-learning-boosting-and-adaboost/
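For concreteness, AdaBoost (the classic instance of this reweighting idea, with labels and predictions in {-1, +1}) updates sample weights as follows, where $\varepsilon_m$ is the weighted error rate of the m-th weak learner:

```latex
\alpha_m = \tfrac{1}{2}\ln\frac{1-\varepsilon_m}{\varepsilon_m},
\qquad
w_i \leftarrow w_i \, e^{-\alpha_m y_i h_m(x_i)}
```

Misclassified samples ($y_i h_m(x_i) = -1$) have their weights multiplied by $e^{\alpha_m} > 1$, so the next learner focuses on them.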
 18. Gradient Boosting
     • Gradient descent + the boosting mechanism
     • A simplified version of the equations and theory of gradient boosting:
       ◦ Assume we are at stage $m$
       ◦ In order to improve the existing model $F_{m-1}(x)$, we add a new weak learner $h_m(x)$
       ◦ So the whole equation will be $F_m(x) = F_{m-1}(x) + h_m(x)$
       ◦ We can see that $h_m(x) = y - F_{m-1}(x)$, and that is the so-called residual
       ◦ Then, instead of only minimizing the loss function directly, we follow the concept of gradient descent: for squared loss $L = \frac{1}{2}(y - F(x))^2$, the negative gradient $-\partial L / \partial F(x) = y - F(x)$ is exactly the residual, so the negative gradient is what the new weak learner fits
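A toy sketch of the fit-the-residual loop under squared loss, using scikit-learn regression stumps (the data, learning rate, and stage count are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

lr, n_stages = 0.1, 100
F = np.full_like(y, y.mean())  # F_0: constant initial model
learners = []
for m in range(n_stages):
    residual = y - F                                  # negative gradient of 1/2 (y - F)^2
    h = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    F = F + lr * h.predict(X)                         # F_m = F_{m-1} + lr * h_m
    learners.append(h)
```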
 19. How to choose between XGBoost & LightGBM?
     • Tree growth strategy
       ◦ LightGBM uses a leaf-wise growth strategy (DFS), where the model expands the most promising leaf instead of growing level by level
         ▪ This approach makes the tree deeper and decreases the bias (higher accuracy)
         ▪ However, low bias usually comes with high variance; that is, LightGBM is more likely to face overfitting problems than XGBoost
       ◦ XGBoost uses a level-wise growth strategy (BFS); that is, all nodes at a given depth are fully expanded before growing deeper
         ▪ A more general model, and XGBoost also provides various regularization methods to avoid overfitting
         ▪ Since it prevents overfitting better than LightGBM, XGBoost is also easier to hyperparameter-tune
     XGBoost: A Scalable Tree Boosting System
     LightGBM: A Highly Efficient Gradient Boosting Decision Tree
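In code, the two strategies surface as different capacity knobs; a hedged sketch (the parameter values are illustrative only):

```python
import lightgbm as lgb
import xgboost as xgb

# LightGBM grows leaf-wise: num_leaves is the main capacity control
lgb_model = lgb.LGBMClassifier(num_leaves=31, n_estimators=200)

# XGBoost grows level-wise by default: max_depth is the main capacity control
# (grow_policy="lossguide" switches it to a leaf-wise policy similar to LightGBM's)
xgb_model = xgb.XGBClassifier(max_depth=6, n_estimators=200, grow_policy="depthwise")
```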
 20. But… does the growth strategy really matter?
     • The figure shows that GBDT-related models still outperform many tabular neural networks
     • So just use them!
     fig: https://arxiv.org/abs/2305.18446
 21. Supplementary - Model configuration file
     • XGBoost, LightGBM, and other ensemble methods have a lot of hyperparameters to tune, but you won't want to open the training Python script every time you tune the model
     • We can use a .json / .yaml configuration file to manage the models' hyperparameters
 22. Supplementary - Model configuration file (Cont'd)
     • To load those configuration files:
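The slide's code isn't preserved in this transcript; a minimal sketch of what the loading could look like (file names are hypothetical, and yaml is the PyYAML package):

```python
import json
from pathlib import Path

import yaml  # pip install pyyaml

def load_config(path: str) -> dict:
    """Load model hyperparameters from a .json or .yaml file."""
    p = Path(path)
    with p.open() as f:
        if p.suffix in {".yaml", ".yml"}:
            return yaml.safe_load(f)
        return json.load(f)

# cfg = load_config("config/xgb.yaml")   # e.g. {"max_depth": 6, "n_estimators": 200}
# model = xgb.XGBClassifier(**cfg)
```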
 23. Supplementary - Best practice for coding multiple sklearn-like model structures
     • Every sklearn-like model class exposes the same structure (fit / predict)
     • The same code gets rewritten every time you want to train a different model
       ◦ So why not write it once and inherit the backbone? (see the sketch after the next slide)
 24. Base Class Inheritance
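The slide's code isn't preserved in this transcript; a minimal sketch of the backbone-plus-inheritance pattern the previous slide motivates (all class and method names are assumptions):

```python
from abc import ABC, abstractmethod

import lightgbm as lgb
import xgboost as xgb

class BaseModel(ABC):
    """Backbone shared by every sklearn-like model wrapper."""

    def __init__(self, **params):
        self.model = self.build(**params)

    @abstractmethod
    def build(self, **params):
        """Return the underlying sklearn-like estimator."""

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)

class XGBModel(BaseModel):
    def build(self, **params):
        return xgb.XGBClassifier(**params)

class LGBModel(BaseModel):
    def build(self, **params):
        return lgb.LGBMClassifier(**params)

# model = XGBModel(max_depth=6).fit(X_train, y_train)   # X_train / y_train assumed
```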
 25. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
 26. Portfolio construction
     • As mentioned, the machine learning model is usually only one part of a finance / trading task, not the whole process
     • One of the most popular research directions is to form a long-short portfolio with stocks chosen by machine learning models
       ◦ Price / volume as input
         ▪ The prediction can be relative return, Sharpe ratio, etc.
       ◦ Text as input
         ▪ The prediction can be sentiment score, market confidence, etc.
     • The key to success is how you define your prediction target (this is where a sense of financial markets / asset pricing comes in)
 27. Portfolio construction (Cont'd)
     • Go long if the prediction for a stock exceeds a specific threshold, and short if the opposite holds
     • The proportion of long / short portfolio value can be a hyperparameter
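A toy sketch of threshold-based long-short weights; the tickers, predictions, and thresholds are all hypothetical:

```python
import pandas as pd

preds = pd.Series({"AAA": 0.04, "BBB": -0.03, "CCC": 0.01, "DDD": -0.02})

upper, lower = 0.02, -0.02            # thresholds as hyperparameters
longs = preds.index[preds > upper]
shorts = preds.index[preds < lower]

weights = pd.Series(0.0, index=preds.index)
weights[longs] = 1.0 / len(longs)     # equal-weight long leg
weights[shorts] = -1.0 / len(shorts)  # equal-weight short leg
print(weights)  # AAA: 1.0, BBB: -1.0, others: 0.0
```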
 28. Portfolio construction (Cont'd)
     • Of course, you don't need tabular / text data as input; images can also be the input for forming a portfolio
       ◦ Jiang, J., Kelly, B., & Xiu, D. (2023) first used a CNN-like model with k-bar (candlestick) images as input, treating the prediction as binary classification to form a long-short portfolio
       ◦ Obaid, K., & Pukthuanthong, K. (2022) introduced a daily market sentiment index based on news photos (photo pessimism)
       ◦ Den, Y., & Vincent, K. (2024) extended the idea from Jiang, Kelly, & Xiu (2023), treating the prediction as multi-class classification, which gives a relationship between stocks' returns
 29. Factor investing
     • We have tons of firm characteristics & macroeconomic characteristics every month, so how do we choose which to use as feature input to the regression / classification models?
     • Calculate the feature importance of each characteristic with respect to the prediction to eliminate biased and useless characteristics
     Characteristic data (at publication): Empirical Asset Pricing via Machine Learning
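A hedged sketch of the importance-based screening step; the characteristics, their names, and the return-generating process are all made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
chars = pd.DataFrame(rng.normal(size=(1000, 4)),
                     columns=["mom12m", "bm", "size", "beta"])
ret = 0.5 * chars["mom12m"] - 0.3 * chars["bm"] + rng.normal(scale=0.5, size=1000)

gbr = GradientBoostingRegressor().fit(chars, ret)
importance = (pd.Series(gbr.feature_importances_, index=chars.columns)
                .sort_values(ascending=False))
print(importance)  # low-importance characteristics are candidates to drop
```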
 30. Appendix (I)
     • Variance of bagged trees
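The derivation itself isn't preserved in this transcript; the standard result it most likely showed: for $B$ bagged trees, each with variance $\sigma^2$ and pairwise correlation $\rho$,

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
= \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

As $B$ grows, the second term vanishes but the first does not, which is why the highly correlated trees discussed on slide 14 cap the variance reduction bagging can deliver.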