
Interpretable ML & Financial Machine Learning (I)

A slide deck that helps you understand what you're doing on machine learning projects

Yu-Chen, Den
November 09, 2024

  1. Interpretable ML & Financial Machine Learning (I)
     Author | Yu-Chen (Abner) Den
     Date | Oct 13th, 2024
     ©2024 Yu-Chen Den, SinoPac Holdings | National Taiwan University
  2. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
  3. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
  4. Preface
     • Objectives
       ◦ Understand what happens during training and how the model works
       ◦ Learn how to tune hyperparameters in a principled way
       ◦ Explore how to make models better
     • Coverage
       ◦ Some code implementations of machine learning models and feature engineering methods
       ◦ A little about the engineering side of ML projects: as a DS / MLE / Quant at a top-tier firm, you'll need these skills and tools to work efficiently
       ◦ Epiphanies about ML from my observations of life
       ◦ Models not covered: SVM, Naive Bayes, linear regression, logistic regression
       ◦ Course materials: https://github.com/AbnerTeng/better-ml
  5. Preface
     • Why does interpretability matter?
       ◦ Better decision making
       ◦ Debugging & improvement
       ◦ Reliable model output
  6. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
  7. How do decision trees make decisions?
     • When it comes to decision trees, we probably think of a tree-shaped figure
     • Of course, everyone knows that decision trees use a sequence of decisions to do regression or classification, but how?
  8. Information Gain - Entropy & Gini Index
     • In probability and information theory, the Entropy / Gini Index represents the level of uncertainty of a random variable
     • What does this mean?
       ◦ Assume we have label vectors
         ▪ v = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
         ▪ w = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
       ◦ Entropy(v) = 0.881, Entropy(w) = 1.0
       ◦ Gini(v) = 0.42, Gini(w) = 0.5
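Not in the original deck: a minimal Python sketch that reproduces the numbers above from the standard definitions $H = -\sum_i p_i \log_2 p_i$ and $G = 1 - \sum_i p_i^2$.

```python
import numpy as np

def entropy(labels) -> float:
    """Shannon entropy (bits): H = -sum(p * log2(p))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels) -> float:
    """Gini impurity: G = 1 - sum(p^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

v = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
w = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(entropy(v), entropy(w))  # 0.881..., 1.0
print(gini(v), gini(w))        # 0.42, 0.5
```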
  9. Information Gain - Entropy & Gini Index
     • Based on the Entropy / Gini Index, we can calculate the information gain, which represents the reduction in Entropy / Gini Index after splitting on a specific threshold:
       $IG(S, C) = H(S) - \sum_{v \in \mathrm{values}(C)} \frac{|S_v|}{|S|} H(S_v)$
       where $\mathrm{values}(C)$ are the distinct values of a specific column $C$ and $S_v$ is the subset of $S$ with $C = v$
     • At each split, decision trees compare every feature against the label and choose the column that maximizes the information gain
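A small follow-up sketch (reusing the entropy helper from the previous block) of computing the information gain of a candidate categorical split:

```python
def information_gain(labels, column) -> float:
    """IG(S, C) = H(S) - sum over distinct v of |S_v|/|S| * H(S_v)."""
    labels, column = np.asarray(labels), np.asarray(column)
    weighted_child_entropy = sum(
        (column == val).mean() * entropy(labels[column == val])
        for val in np.unique(column)
    )
    return entropy(labels) - weighted_child_entropy

# A perfectly separating column recovers all of H(S); an uninformative one gains nothing:
labels = [0, 0, 1, 1]
print(information_gain(labels, ["a", "a", "b", "b"]))  # 1.0
print(information_gain(labels, ["a", "b", "a", "b"]))  # 0.0
```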
 10. We are family - Different kinds of DTs
     • ID3
       ◦ Uses entropy-based information gain as the splitting criterion
       ◦ Handles only categorical data, and only classification tasks
       ◦ No pruning
     • C4.5
       ◦ Uses the gain ratio, a modification of information gain (see the formula below)
       ◦ Handles categorical and numerical data, but still only classification tasks
     • CART (the DT we use now)
       ◦ Uses the Gini Index for classification and MSE for regression
       ◦ Constructs strictly binary trees
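For reference, C4.5's gain ratio normalizes information gain by the split's own intrinsic entropy, penalizing columns with many distinct values (standard textbook form, not from the original slide):

```latex
\mathrm{GainRatio}(S, C) = \frac{IG(S, C)}{\mathrm{SplitInfo}(S, C)},
\qquad
\mathrm{SplitInfo}(S, C) = -\sum_{v \in \mathrm{values}(C)} \frac{|S_v|}{|S|}\,\log_2\frac{|S_v|}{|S|}
```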
 11. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
 12. Bootstrap AGGregatING (Bagging)
     • The simplest ensemble method
     • Classification
       ◦ Output the mode of the classes predicted by every weak learner
     • Regression
       ◦ Output the mean of the predictions from every weak learner
     • Bagging improves the model by lowering the variance of the individual base learners
     Fig: https://anasbrital98.github.io/blog/2021/Random-Forest/ (Random Forest)
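A tiny numpy sketch of the two aggregation rules; the prediction matrices are hypothetical, one row per weak learner:

```python
import numpy as np

clf_preds = np.array([[0, 1, 1],    # weak learner 1
                      [0, 1, 0],    # weak learner 2
                      [1, 1, 0]])   # weak learner 3
reg_preds = np.array([[1.0, 2.0],
                      [1.2, 1.8],
                      [0.8, 2.2]])

# Classification: mode (majority vote) of each column
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, clf_preds)
# Regression: mean of each column
average = reg_preds.mean(axis=0)
print(majority)  # [0 1 0]
print(average)   # [1. 2.]
```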
 13. Where's the bootstrap?
     • Each decision tree doesn't get the whole dataset as input!
       ◦ Instead, each tree is trained on rows randomly sampled from the whole dataset with replacement (Random Forest additionally samples a random subset of features at each split)
       ◦ Sampling with replacement is the simplest form of the bootstrap
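A minimal sketch of drawing one bootstrap sample; X and y are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))     # hypothetical feature matrix
y = rng.integers(0, 2, size=100)  # hypothetical labels

# Bootstrap: sample n row indices *with replacement*
idx = rng.choice(len(X), size=len(X), replace=True)
X_boot, y_boot = X[idx], y[idx]
# Each bootstrap sample contains ~63.2% of the unique rows on average;
# the remainder are the "out-of-bag" rows
```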
 14. Where's the bootstrap? (Cont'd)
     • As the number of bootstrap samples increases, the variance of the bagged trees is averaged out
     • The only remaining concern is highly correlated trees among the bootstrapped trees
       ◦ Highly correlated trees mean high covariance between trees, which means we cannot efficiently lower the total variance of the forest (see the variance formula in Appendix (I))
 15. Parameters of Random Forest
     • n_estimators
       ◦ Number of trees in the forest
     • criterion
       ◦ The splitting criterion of each tree
     • max_depth
       ◦ The maximum depth of each tree
     Random Forest scikit-learn page
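A minimal scikit-learn sketch wiring up these three parameters (the values are placeholders, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = RandomForestClassifier(
    n_estimators=200,   # number of trees in the forest
    criterion="gini",   # splitting criterion of each tree
    max_depth=5,        # maximum depth of each tree
    random_state=0,
).fit(X, y)
```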
 16. How to tune Random Forest during evaluation?
     • When we encounter overfitting during training (that is, test / validation loss starts to go up), we may want to set a smaller n_estimators or max_depth
     • When underfitting, the model isn't complex enough to learn from the dataset; just try larger values for the above parameters
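One hedged way to see this in practice: sweep max_depth on synthetic data and watch the validation loss (a sketch; the values are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for depth in [2, 4, 8, 16, None]:
    rf = RandomForestClassifier(n_estimators=200, max_depth=depth,
                                random_state=0).fit(X_tr, y_tr)
    # Validation loss rising while training loss keeps falling signals overfitting
    print(depth, round(log_loss(y_val, rf.predict_proba(X_val)), 4))
```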
 17. Boosting
     • Unlike bagging, boosting sequentially trains weak classifiers (decision trees) to build the ensemble
     • The boosting mechanism emphasizes samples misclassified in prior iterations: it up-weights samples that were incorrectly classified, allowing the model to learn from its mistakes and improve iteratively
     fig: https://www.geeksforgeeks.org/boosting-in-machine-learning-boosting-and-adaboost/
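For concreteness, AdaBoost (the classic instance of this reweighting idea, with labels and predictions in {-1, +1}) updates sample weights as follows, where $\varepsilon_m$ is the weighted error rate of the m-th weak learner:

```latex
\alpha_m = \tfrac{1}{2}\ln\frac{1-\varepsilon_m}{\varepsilon_m},
\qquad
w_i \leftarrow w_i \, e^{-\alpha_m y_i h_m(x_i)}
```

Misclassified samples ($y_i h_m(x_i) = -1$) have their weights multiplied by $e^{\alpha_m} > 1$, so the next learner focuses on them.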
 18. Gradient Boosting
     • Gradient descent + the boosting mechanism
     • A simplified version of the equations and theory of gradient boosting:
       ◦ Assume we are at stage $m$
       ◦ In order to improve the existing model $F_{m-1}(x)$, we add a new weak learner $h_m(x)$
       ◦ So the whole equation will be $F_m(x) = F_{m-1}(x) + h_m(x)$
       ◦ We can see that $h_m(x) = y - F_{m-1}(x)$, and that is the so-called residual
       ◦ Then, instead of only minimizing the loss function directly, we follow the concept of gradient descent: for squared loss $L = \frac{1}{2}(y - F(x))^2$, the negative gradient $-\partial L / \partial F(x) = y - F(x)$ is exactly the residual, so the negative gradient is what the new weak learner fits
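A toy sketch of the fit-the-residual loop under squared loss, using scikit-learn regression stumps (the data, learning rate, and stage count are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

lr, n_stages = 0.1, 100
F = np.full_like(y, y.mean())  # F_0: constant initial model
learners = []
for m in range(n_stages):
    residual = y - F                                  # negative gradient of 1/2 (y - F)^2
    h = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    F = F + lr * h.predict(X)                         # F_m = F_{m-1} + lr * h_m
    learners.append(h)
```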
 19. How to choose between XGBoost & LightGBM?
     • Tree growth strategy
       ◦ LightGBM uses a leaf-wise growth strategy (DFS), where the model expands the most promising leaf instead of growing level by level
         ▪ This approach makes the tree deeper and decreases the bias (higher accuracy)
         ▪ However, low bias usually comes with high variance; that is, LightGBM is more likely to face overfitting problems than XGBoost
       ◦ XGBoost uses a level-wise growth strategy (BFS); that is, all nodes at a given depth are fully expanded before growing deeper
         ▪ A more general model, and XGBoost also provides various regularization methods to avoid overfitting
         ▪ Since it prevents overfitting better than LightGBM, XGBoost is also easier to hyperparameter-tune
     XGBoost: A Scalable Tree Boosting System
     LightGBM: A Highly Efficient Gradient Boosting Decision Tree
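In code, the two strategies surface as different capacity knobs; a hedged sketch (the parameter values are illustrative only):

```python
import lightgbm as lgb
import xgboost as xgb

# LightGBM grows leaf-wise: num_leaves is the main capacity control
lgb_model = lgb.LGBMClassifier(num_leaves=31, n_estimators=200)

# XGBoost grows level-wise by default: max_depth is the main capacity control
# (grow_policy="lossguide" switches it to a leaf-wise policy similar to LightGBM's)
xgb_model = xgb.XGBClassifier(max_depth=6, n_estimators=200, grow_policy="depthwise")
```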
 20. But… does the growth strategy really matter?
     • The figure shows that GBDT-related models still outperform many tabular neural networks
     • So just use them!
     fig: https://arxiv.org/abs/2305.18446
 21. Supplementary - Model configuration file
     • XGBoost, LightGBM, and other ensemble methods have a lot of hyperparameters to tune, but you won't want to open the training Python script every time you tune the model
     • We can use a .json / .yaml configuration file to manage the models' hyperparameters
 22. Supplementary - Model configuration file (Cont'd)
     • To load those configuration files:
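The slide's code isn't preserved in this transcript; a minimal sketch of what the loading could look like (file names are hypothetical, and yaml is the PyYAML package):

```python
import json
from pathlib import Path

import yaml  # pip install pyyaml

def load_config(path: str) -> dict:
    """Load model hyperparameters from a .json or .yaml file."""
    p = Path(path)
    with p.open() as f:
        if p.suffix in {".yaml", ".yml"}:
            return yaml.safe_load(f)
        return json.load(f)

# cfg = load_config("config/xgb.yaml")   # e.g. {"max_depth": 6, "n_estimators": 200}
# model = xgb.XGBClassifier(**cfg)
```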
 23. Supplementary - Best practice for coding multiple sklearn-like model structures
     • Every sklearn-like model class exposes the same structure (fit / predict)
     • The same code gets rewritten every time you want to train a different model
       ◦ So why not write it once and inherit the backbone? (see the sketch after the next slide)
 24. Base Class Inheritance
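The slide's code isn't preserved in this transcript; a minimal sketch of the backbone-plus-inheritance pattern the previous slide motivates (all class and method names are assumptions):

```python
from abc import ABC, abstractmethod

import lightgbm as lgb
import xgboost as xgb

class BaseModel(ABC):
    """Backbone shared by every sklearn-like model wrapper."""

    def __init__(self, **params):
        self.model = self.build(**params)

    @abstractmethod
    def build(self, **params):
        """Return the underlying sklearn-like estimator."""

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)

class XGBModel(BaseModel):
    def build(self, **params):
        return xgb.XGBClassifier(**params)

class LGBModel(BaseModel):
    def build(self, **params):
        return lgb.LGBMClassifier(**params)

# model = XGBModel(max_depth=6).fit(X_train, y_train)   # X_train / y_train assumed
```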
 25. Outline
     • Preface
     • How do decision trees make decisions?
     • Ensemble methods
     • ML in Finance / Trading (I)
 26. Portfolio construction
     • As mentioned, the machine learning model is usually only one part of a finance / trading task, not the whole process
     • One of the most popular research directions is to form a long-short portfolio with stocks chosen by machine learning models
       ◦ Price / volume as input
         ▪ The prediction can be relative return, Sharpe ratio, etc.
       ◦ Text as input
         ▪ The prediction can be sentiment score, market confidence, etc.
     • The key to success is how you define your prediction target (this is where a sense of financial markets / asset pricing comes in)
 27. Portfolio construction (Cont'd)
     • Go long if the prediction for a stock exceeds a specific threshold, and short if the opposite holds
     • The proportion of long / short portfolio value can be a hyperparameter
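A toy sketch of threshold-based long-short weights; the tickers, predictions, and thresholds are all hypothetical:

```python
import pandas as pd

preds = pd.Series({"AAA": 0.04, "BBB": -0.03, "CCC": 0.01, "DDD": -0.02})

upper, lower = 0.02, -0.02            # thresholds as hyperparameters
longs = preds.index[preds > upper]
shorts = preds.index[preds < lower]

weights = pd.Series(0.0, index=preds.index)
weights[longs] = 1.0 / len(longs)     # equal-weight long leg
weights[shorts] = -1.0 / len(shorts)  # equal-weight short leg
print(weights)  # AAA: 1.0, BBB: -1.0, others: 0.0
```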
 28. Portfolio construction (Cont'd)
     • Of course, you don't need tabular / text data as input; images can also be the input for forming a portfolio
       ◦ Jiang, J., Kelly, B., & Xiu, D. (2023) first used a CNN-like model with k-bar (candlestick) images as input, treating the prediction as binary classification to form a long-short portfolio
       ◦ Obaid, K., & Pukthuanthong, K. (2022) introduced a daily market sentiment index based on news photos (photo pessimism)
       ◦ Den, Y., & Vincent, K. (2024) extended the idea from Jiang, Kelly, & Xiu (2023), treating the prediction as multi-class classification, which gives a relationship between stocks' returns
 29. Factor investing
     • We have tons of firm characteristics & macroeconomic characteristics every month, so how do we choose which to use as feature input to the regression / classification models?
     • Calculate the feature importance of each characteristic with respect to the prediction to eliminate biased and useless characteristics
     Characteristic data (at publication): Empirical Asset Pricing via Machine Learning
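A hedged sketch of the importance-based screening step; the characteristics, their names, and the return-generating process are all made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
chars = pd.DataFrame(rng.normal(size=(1000, 4)),
                     columns=["mom12m", "bm", "size", "beta"])
ret = 0.5 * chars["mom12m"] - 0.3 * chars["bm"] + rng.normal(scale=0.5, size=1000)

gbr = GradientBoostingRegressor().fit(chars, ret)
importance = (pd.Series(gbr.feature_importances_, index=chars.columns)
                .sort_values(ascending=False))
print(importance)  # low-importance characteristics are candidates to drop
```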
 30. Appendix (I)
     • Variance of bagged trees
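The derivation itself isn't preserved in this transcript; the standard result it most likely showed: for $B$ bagged trees, each with variance $\sigma^2$ and pairwise correlation $\rho$,

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
= \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

As $B$ grows, the second term vanishes but the first does not, which is why the highly correlated trees discussed on slide 14 cap the variance reduction bagging can deliver.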