How I Learned to Stop Worrying and Love the Risk

Gianmarco De Francisci Morales, Senior Researcher, ISI Foundation

Risk
"Possibility of something bad happening" (Cambridge Dictionary)
Effect of uncertainty on objectives
Financial risk (e.g., liquidity, systemic)
Credit: possibility of default on a loan
Market: volatility of equity, currency, interest rate
Project risk: possibility of an event with a negative outcome on the project

A bit of context
Collaboration between ISI Foundation and Intesa Sanpaolo
ISI Foundation = private, non-profit, fundamental research institute
Intesa Sanpaolo (ISP) = largest bank in Italy by capitalization
Applied research projects of approx. 9 months
Mixed team of researchers and domain experts

Predicting Corporate Credit Risk: Network Contagion via Trade Credit
PLOS ONE 2020

Problem Setting
Banks provide credit to companies in various forms (loans, cash advances)
Creditworthiness (rating) of borrowers
Banks use credit risk models to assess the credit rating
Affects credit conditions and possible interventions
Changes with time and is affected by context

Trade Credit
Paying a provider in 90 days = a loan
Default on trade credit is more common than default towards banks
Can act as a buffer during periods of distress
Network perspective
Trade network = risk propagation
Can trigger chain reactions of default events
Can it be used to improve credit risk models?
[Figure legend: Customer firm, Provider firm, Money flow, Default firm, Adjacent firm, Risk propagation]

Goal
Integrate network effects into the prediction of the default probability P(d)
At time t, predict whether a given firm will default within a short-term horizon (3 months)
Online prediction task: prequential setting (predict, reveal, advance)
Find risky firms in advance to enact proactive measures to avoid the default
Limited resources: ranking task, act on the top-k firms
Probabilistic classifier to model the class of interest (default)
Few examples, high imbalance of class labels
Main metric: Recall@K (K depends on bank resources, here 5%)
[Paper excerpt, cropped on the slide] The target variable $Y_i^t$ is a logical "or" of lagged versions of the default indicator: $Y_i^t = D_i^{t+1} \lor D_i^{t+2} \lor D_i^{t+3}$. The data is drawn from a proprietary dataset belonging to Intesa Sanpaolo, a leading Italian commercial bank. The cropped footnotes refer to the definition of default introduced by Directive 2006/48/EC (the Capital Requirements Directive, CRD), later replaced by the CRR, and to the European Banking Authority standards on a new definition of default issued after the financial crisis.
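
The target variable and the main metric on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names, the dictionary-based data layout, and the rounding used to turn the 5% budget into K are all assumptions.

```python
from typing import Dict, Sequence


def short_term_target(defaults: Sequence[int], t: int, horizon: int = 3) -> int:
    """Y^t = D^{t+1} or ... or D^{t+horizon} for a single firm.

    `defaults[m]` is 1 if the firm is in default in month m, else 0
    (hypothetical layout chosen for this example).
    """
    window = defaults[t + 1 : t + 1 + horizon]
    return int(any(window))


def recall_at_k(scores: Dict[str, float], labels: Dict[str, int],
                fraction: float = 0.05) -> float:
    """Share of true defaulters that appear in the top `fraction` of firms by score."""
    k = max(1, round(len(scores) * fraction))  # rounding rule is an assumption
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = set(ranked[:k])
    positives = [firm for firm, y in labels.items() if y == 1]
    if not positives:
        return 0.0
    hits = sum(1 for firm in positives if firm in top_k)
    return hits / len(positives)
```

Because the bank can only act on a fixed share of firms, the metric rewards putting defaulters at the top of the ranking rather than calibrating every probability.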

Data Model
Longitudinal data (firms over time)
Two perimeters (Target and Extended)
Features from a plethora of different sources (financial statements, central bank registry, overdrafts, regulatory risk parameters, credit risk alerts, etc.)
Challenge: incomplete view (avg. 16% of transactions)
Network enrichment via record linkage

Network Enrichment
Bank transfer involving an external IBAN and firm name: does the firm have an ISP account?
Match the firm name linked to the external IBAN with a firm name in the ISP database
Training data: variability of spellings of a single firm inside the ISP firm registry (pairs of names referring to the same firm)
For each pair, compute standard string distance metrics as features
Application strategy for the model (multiple-bank phenomenon): if a client holds accounts with different banks, they are likely to transfer money between them
Only test pairs of firms that are linked by a bank transfer
Increases the amount of traced transactions by 450% and coverage by 200%, going from 281k links to 826k links per month
Table 1. Performance of the model for record linkage (caption cropped on the slide)
Precision     99.98%
Recall        73.03%
F1 measure    84.45%
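
The slide mentions "standard string distance metrics" without naming them. The sketch below shows what such pair features could look like using only the Python standard library. The specific metrics (a `difflib` similarity ratio, token Jaccard, first-token match) and the normalization step are assumptions for illustration, not the metrics used in the project.

```python
from difflib import SequenceMatcher
from typing import Dict


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (assumed preprocessing)."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def pair_features(name_a: str, name_b: str) -> Dict[str, float]:
    """Similarity features for one candidate pair of firm names."""
    a, b = normalize(name_a), normalize(name_b)
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    same_first = bool(a) and bool(b) and a.split()[0] == b.split()[0]
    return {
        "ratio": SequenceMatcher(None, a, b).ratio(),
        "token_jaccard": len(tokens_a & tokens_b) / len(union) if union else 1.0,
        "same_first_token": float(same_first),
    }
```

A binary classifier trained on such features for known same-firm pairs can then be applied only to pairs of firms linked by a bank transfer, which keeps the number of comparisons small.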

Direct Contagion
Percentage of adjacent nodes of insolvent firms at time t that experienced a default δ months later, for δ ∈ {3, 6, 9, 12}

Models
Model the fragility of firms to network spillovers
Capture network spillover effects from the supply chain on the P(d) of each firm
Sequential modeling approach: the output of the first, single-firm model is used in the subsequent network model
First model captures the effect of the single firm's features
Predicts the P(d) of each firm in isolation
Second model captures network spillovers
Leverages the output of the first model, together with the network structure and the position of the firm in the supply chain
Determines the influence of the neighborhood of each firm on the P(d) of the firm
[Diagram labels: Firm Features, Single Firm Model, Single Firm P(d), Network, Network Features, Network Model, Network Firm P(d)]

Single-Firm Model
Prequential validation
Trained on Extended Perimeter
Random Forest classifier, 500 trees
Avg. R@K ~ 64%
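
Prequential validation (predict, reveal, advance) can be written as a short loop. The sketch below is generic and hedged: the `Snapshot` layout and the `fit` and `metric` callables are hypothetical placeholders, and no claim is made about how the Random Forest was actually trained or retrained.

```python
from typing import Callable, List, Sequence, Tuple

Snapshot = Tuple[List[List[float]], List[int]]  # (features, labels) for one month
Model = Callable[[List[float]], float]


def prequential(snapshots: Sequence[Snapshot],
                fit: Callable[[List[List[float]], List[int]], Model],
                metric: Callable[[List[float], List[int]], float]) -> List[float]:
    """Score each month with a model trained only on previously revealed months."""
    seen_x: List[List[float]] = []
    seen_y: List[int] = []
    results: List[float] = []
    for features, labels in snapshots:
        if seen_x:  # need at least one revealed month to train on
            model = fit(seen_x, seen_y)
            scores = [model(row) for row in features]  # predict
            results.append(metric(scores, labels))
        seen_x.extend(features)  # reveal the labels of this month
        seen_y.extend(labels)    # advance to the next month
    return results
```

The point of the scheme is that every prediction is out-of-time: the model never sees labels from the month it is evaluated on.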

Feature Importance
Local: P(d) predicted by a model based on financial features (amount borrowed by the firm)
Rating: P(d) from the officially regulated rating model of the specific firm, which has a longer time horizon (one year) and uses features from the balance sheet of the firm
Overdraft: number of days of overdraft in the last three months
Hist: Boolean indicator, 1 if the firm has been in default at any point in its past

Network Spillover Model
Logistic Regression trained on Target Perimeter
Features: only network-based (no single-firm features)
Fragility (client and supplier)
Normalized PPR (Effective Importance)
Instance weighting by how much we know of their transaction network

Personalized PageRank
How close is the firm to other firms which have had a default?
Assume risk spreads as a random walk
Restart from nodes Q, uniform over firms in default at $t' < t$, with $\alpha = 0.25$
Temporal discounting (for $\Delta t = t' - t$) does not work better
Normalize for the in-degree of node i
[Paper excerpt, cropped on the slide] Firms closer to a defaulting customer are modeled as more likely to be [...]. Therefore, a restart probability is imposed: with probability $\alpha$ the random walker follows the transaction [...], and with probability $1 - \alpha$ it restarts its random walk from its origin. Being closer to multiple defaulting customers is likely to [...] firm. Therefore, the random walk is allowed to (re)start from a set [...] represented by a distribution Q over the nodes of the transaction network. Q has non-zero values only for firms which have been in default (more on the choice of Q later). Q is also called a restart vector. The feature is computed as the stationary distribution of the random walker. This distribution exists and is unique, and can be easily computed by the PPR algorithm. For every other node, the PPR from the restart vector is computed as the solution to the recurrent equation
$$\mathrm{PPR}_\alpha = \alpha \, \mathrm{PPR}_\alpha M + (1 - \alpha) Q,$$
where M is the row-stochastic adjacency matrix of the transaction network, Q is the restart vector distribution, and $\alpha \in (0, 1)$ is a damping parameter. The $\mathrm{PPR}_\alpha$ value obtained by the algorithm is normalized to reduce its bias tow[...] nodes:
$$\mathrm{PPR}(i) = \frac{\mathrm{PPR}_\alpha(i)}{|\overleftarrow{N}(i)|}.$$
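
The recurrence $\mathrm{PPR}_\alpha = \alpha \, \mathrm{PPR}_\alpha M + (1 - \alpha) Q$ can be solved by simple power iteration. The sketch below uses plain dictionaries and is only an illustration: the handling of nodes without outgoing links (their mass is sent back to Q) and of nodes with zero in-degree (left unnormalized) are assumptions, not choices documented on the slide.

```python
from typing import Dict


def personalized_pagerank(edges: Dict[str, Dict[str, float]],
                          restart: Dict[str, float],
                          alpha: float = 0.25,
                          iterations: int = 100) -> Dict[str, float]:
    """Iterate PPR <- alpha * PPR * M + (1 - alpha) * Q.

    edges[u][v] is the weight of link u -> v; each row is normalized to build M.
    restart is Q, a distribution over nodes (e.g. uniform over firms in default).
    """
    nodes = sorted(set(edges) | {v for nbrs in edges.values() for v in nbrs} | set(restart))
    ppr = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - alpha) * restart.get(n, 0.0) for n in nodes}
        for u in nodes:
            out = edges.get(u, {})
            total = sum(out.values())
            if total > 0:
                for v, w in out.items():
                    nxt[v] += alpha * ppr[u] * w / total
            else:
                # Assumption: a node without out-links returns its mass to Q.
                for v, q in restart.items():
                    nxt[v] += alpha * ppr[u] * q
        ppr = nxt
    return ppr


def normalize_by_in_degree(ppr: Dict[str, float],
                           edges: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """PPR(i) = PPR_alpha(i) / |in-neighbors of i| (unchanged if there are none)."""
    in_degree = {n: 0 for n in ppr}
    for nbrs in edges.values():
        for v in nbrs:
            in_degree[v] += 1
    return {n: ppr[n] / in_degree[n] if in_degree[n] else ppr[n] for n in ppr}
```

With $\alpha = 0.25$ the walker restarts often, so the score decays quickly with distance from the firms in default.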

Fragility
Exposure to risk from the network
Accounts Receivable = amount of revenue in credit to customers
Sales = revenue from trading
Weight = normalized transaction weight of the link from j to i
P(d) = output of the single-firm model
[Paper excerpt, cropped on the slide] A similar reasoning can be applied in the opposite direction, that is, how default risk spreads from suppliers to customers. In this case, the economic interpretation is more oriented to the market power of the customer with[...] chain. Larger customers, in terms of purchases, have greater market power [...] reflected in the ability to obtain deferred payments and other support m[...] suppliers in the event of a liquidity shortage. Moreover, the higher the trad[...] the customer i owned by the supplier j, the higher the implicit stake of the [...] business. In other words, the higher the customer's trade debt to its supplier, [...] its sensitivity to the supplier's financial soundness. The $\mathrm{FRG}_s$ coefficient [...] expected to be positive. The final formulas for computing the fragility are:
$$\mathrm{FRG}_c(i) = \frac{AR_i}{S_i} \times \operatorname{logit}\Big(\sum_{j \in \overleftarrow{N}(i)} w_{ji} \, P(d)_j\Big), \qquad \mathrm{FRG}_s(i) = \frac{AP_i}{P_i} \times \operatorname{logit}\Big(\sum_{j \in \overrightarrow{N}(i)} w_{ij} \, P(d)_j\Big),$$
where AR and AP are accounts receivable and accounts payable, S and P are sales and purchases, $\overleftarrow{N}(i)$ and $\overrightarrow{N}(i)$ are the in-neighbors and out-neighbors of i in the transaction network, $w_{ij}$ is the normalized weight of the edge between i [...], and $P(d)_j$ is the probability of default of j as computed by the model in the [...].
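
Both fragility features share the same shape: a balance-sheet ratio multiplied by the logit of a weighted neighborhood risk. A minimal sketch under stated assumptions follows. The function and argument names are made up for this example, and the clipping inside `logit` is an added safeguard that is not mentioned in the source.

```python
import math
from typing import Dict


def logit(p: float, eps: float = 1e-9) -> float:
    """log(p / (1 - p)), with p clipped away from 0 and 1 (clipping is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fragility(exposure: float, volume: float,
              weights: Dict[str, float], pd_single: Dict[str, float]) -> float:
    """(exposure / volume) * logit(sum_j w_j * P(d)_j).

    Client side:   exposure = AR_i, volume = S_i, weights = w_ji over in-neighbors.
    Supplier side: exposure = AP_i, volume = P_i, weights = w_ij over out-neighbors.
    `pd_single` holds the P(d) produced by the single-firm model.
    """
    neighborhood_risk = sum(w * pd_single[j] for j, w in weights.items())
    return (exposure / volume) * logit(neighborhood_risk)
```

Since the weights are normalized, the weighted sum stays in [0, 1] and the logit is well defined; a riskier neighborhood or a larger share of receivables over sales both push the client-side fragility up when the neighborhood risk is above 0.5.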

Network Model Performance
Instance weighting: incoming amount over sales, outgoing amount over purchases
R@K ~ 50% of the single-firm model, without any local information about the firm itself
Testimony of the power of the network
[Paper excerpt, cropped on the slide] [...] more complete our knowledge of the supply chain, the more reliable the [...] of the influence of the network on the risk of the company will be. [...] is definitely the case for the fragility features, which explicitly rely on how [...] the firm's financial position the network captures, but is also true indirectly for [...] feature, as the presence or absence of a link (and its weight) clearly affects the [...] walks which the feature is based on. For these reasons, we employ an instance weighting scheme, so that the model can [...] the data points which are more reliable. For each firm i, we define an instance [...] for the machine learning model as:
$$W(i) = \frac{1}{2}\left(\frac{\sum_{j \in \overleftarrow{N}(i)} w_{ji}}{S_i} + \frac{\sum_{j \in \overrightarrow{N}(i)} w_{ij}}{P_i}\right).$$
The weight is therefore the average of the in-coverage and the out-coverage of the [...] with respect to the balance sheet data (sales S and purchases P). More in [...] the first term is the ratio of the sum of in-weights of the network to the sales of [...], while the second term is the ratio of the sum of out-weights of the network to [...] purchases of the firm. Therefore, for a well-mapped firm this weight will be close to [...] it will be close to 0 for firms which the network has little information on.
Network spillover model. The overall combined model is as follows:
$$Y = f(\mathrm{PPR}, \mathrm{FRG}_c, \mathrm{FRG}_s, \mathrm{FRG}_c \cdot \mathrm{FRG}_s),$$
where PPR is the personalized PageRank, and $\mathrm{FRG}_c$ and $\mathrm{FRG}_s$ are the two fragility [...].
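
The instance weight W(i) and the feature vector of the combined model are simple enough to write down directly. The sketch below is illustrative: the function names are hypothetical, and returning zero coverage when sales or purchases are missing or zero is an assumption.

```python
from typing import List, Sequence


def instance_weight(in_weights: Sequence[float], out_weights: Sequence[float],
                    sales: float, purchases: float) -> float:
    """W(i) = 1/2 * (sum of in-weights / S_i + sum of out-weights / P_i)."""
    in_coverage = sum(in_weights) / sales if sales > 0 else 0.0        # assumption for S_i = 0
    out_coverage = sum(out_weights) / purchases if purchases > 0 else 0.0
    return 0.5 * (in_coverage + out_coverage)


def network_features(ppr: float, frg_c: float, frg_s: float) -> List[float]:
    """Inputs of Y = f(PPR, FRG_c, FRG_s, FRG_c * FRG_s), including the interaction term."""
    return [ppr, frg_c, frg_s, frg_c * frg_s]
```

For example, a firm whose observed incoming transactions cover half of its sales and whose outgoing transactions cover a quarter of its purchases gets a weight of 0.375, so it counts less during training than a firm whose network is fully mapped.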

Feature Importance
Most important features: PPR, fragility of clients
Stable over time

Hybrid Model
XGBoost with single-firm and network features
Feature engineering
Systematic feature selection
Deployment in pre-production environment
Fig 11. Recall@K for the XGBoost model with mixed single-firm and network features as a function of time in the prequential setting, compared to a logistic regression model on single-firm features (baseline). The average R@K is 68.1% and the AUC is 90.5%.
Table 5. Performance of the XGBoost model with respect to the baseline on 3 out-of-time snapshots (values in %).
           AUC                  P@K                  R@K
Month      Baseline  XGBoost    Baseline  XGBoost    Baseline  XGBoost
2018-12    68.0      91.4       3.9       6.5        40.8      68.4
2019-03    86.3      91.9       7.6       9.9        54.3      70.0
2019-06    85.6      89.8       7.2       9.6        47.6      63.6

Summary
Network-based model for short-term default forecasting
Incorporates trade credit information in the credit risk model by looking at the transaction network
Network features based on data mining and domain expertise
Network model alone achieves 50% of the recall of the single-firm model
Hybrid model improves over the baseline by almost 20 percentage points

Continuous-Action Reinforcement Learning for Portfolio Allocation of a Life Insurance Company
ECML PKDD 2021

Problem Setting
Insurance company
Premia contribute common funds to an investment portfolio
Clients acquire the right to compensation in case of an accident (e.g., death)
Assets and liabilities are interdependent
More complex than traditional portfolio optimization
Long time horizon (30y) and sporadic rebalancing

Liabilities
Derive from insurance contracts with the clients and from portfolio performance
Compensation in case of adverse events
Annual returns of the common financial portfolio
Withdrawals might increase whenever these returns are too low
Annual guaranteed minimum requires the company to make up the difference

Goal
Optimize risk-adjusted returns of the investment portfolio
Ensure future liabilities are covered despite market fluctuations
Liabilities are stochastic and correlated to assets
Match the investment portfolio with the due dates of liabilities

Baseline: Modern Portfolio Theory
Markowitz's mean-variance optimization (1952)
Efficient frontier: maximum expected return for a given variance level
Problems:
Does not consider liabilities and negative cash flows
Single decision point (rebalancing), no path dependency

Solution
Model the system as a Markov Decision Process (Markov chain with decisions)
MDP = <States, Actions, Transition Probabilities, Rewards>
State = current portfolio, future liabilities (continuous)
Action = portfolio allocation (point on a (k-1)-simplex, k available assets, continuous)
Solve MDP = find the optimal policy: a (stochastic) mapping of states to actions that maximizes the expected reward
Use Reinforcement Learning to solve the MDP

Contributions
Realistic stochastic model of asset-liability management of insurance
Definition of a risk-adjusted optimization problem, over a pre-determined time horizon
Implementation of a custom solution based on Deep Deterministic Policy Gradient (DDPG), compatible with standard Python libraries for RL

Optimization Problem
Given a time horizon T and an initial portfolio P(0)
Find the asset allocation for every t ∈ [1, T] that
Maximizes the overall risk-adjusted returns of the portfolio
Takes into account volatility (standard deviation of the annual returns)
Respects financial constraints
$$\mu = \frac{1}{T}\sum_{t=1}^{T} \widetilde{R}(t), \qquad \sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\big(\widetilde{R}(t) - \mu\big)^2}, \qquad \operatorname*{argmax}_{\bar{X}_0, \ldots, \bar{X}_{T-1}} \; \mathbb{E}_{\varepsilon}\left[\mu - \lambda \cdot \sigma\right]$$
μ = average return within the same realization, σ = risk measure, X_i = asset allocation at the i-th time unit, λ = risk aversion as the weight of the volatility, ε = economic scenario
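
The objective on this slide averages, over economic scenarios, the mean annual return penalized by λ times its standard deviation. A small sketch of that computation follows, assuming the simulator has already produced one list of annual returns per scenario. The function names and the data layout are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Sequence


def risk_adjusted_return(annual_returns: Sequence[float], risk_aversion: float) -> float:
    """mu - lambda * sigma for one realization of the economic scenario."""
    mu = fmean(annual_returns)
    sigma = pstdev(annual_returns)  # population standard deviation (1/T), as in the formula
    return mu - risk_aversion * sigma


def objective(scenarios: Sequence[Sequence[float]], risk_aversion: float) -> float:
    """Monte Carlo estimate of E[mu - lambda * sigma] over simulated scenarios."""
    return fmean([risk_adjusted_return(r, risk_aversion) for r in scenarios])
```

A larger λ makes volatile allocations less attractive, which is how the risk-aversion parameter steers the optimum away from the most profitable but riskiest asset.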

RL Implementation
Reinforcement Learning: learn how an agent should interact with an environment to maximize the expected (across stochastic realizations) cumulative reward
Actor-Critic schema: agent composed of two modules
Critic: learns to approximate the reward of an action in a given state (approximation of the environment)
Actor: given a state, learns to produce actions that maximize the value estimated by the critic
Deep Deterministic Policy Gradient algorithm to produce continuous actions
Customized extension to ensure compliance with financial constraints
[Figure 1: The agent-environment interaction in reinforcement learning]
[Report excerpt, cropped on the slide] In Reinforcement Learning, the conceptual framing of the problem is based [...] interaction of an agent with an environment. The agent can perform actions that m[...]ment, and in turn can receive updated perceptions of the evolution of the environment. [...] diagram is depicted in Figure 1. Besides perceptions, at every iteration the agent receives a reward, a scalar va[...] the desirability of the current situation. The reward is used by the agent in or[...] good a sequence of actions (often called strategy) is. The starting point is a bla[...] agent's internal parameters randomly initialized to close-to-zero values; in the R[...] corresponds to an agent behaving randomly, all actions being initially equivalentl[...] time the agent performs an action and receives an updated description of the wo[...] a reward value, these bits of knowledge are used to improve the agent's internal co[...] actions and their effects. Despite this single learning step being fundamentally supe[...] mentioning that many RL problems present credit assignment problems: the positive [...] of a strategy is often evident (and modeled by a reward) after several actions are exe[...]

Financial Constraints
1. Structural: keep the problem mathematically sound, e.g., allocation should be positive and sum to one (implemented via softmax architecture)
2. Parametric: restrict the allocation exposure to desirable ranges, e.g., equity below 14%, and sum of all bonds between 20% and 80% (implemented via regularization)
3. State-dependent: depend on the current state of the simulation, e.g., portfolio turnover limited to 10% of the current portfolio value (implemented via optimization and projection)
Additional regulatory constraints considered explicitly
Keep the current discounted value of future liabilities and the market value of the assets close
Capital injection/ejection to keep the constraint satisfied (injection equivalent to borrowing cash)
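
Two of these constraint types are easy to illustrate in isolation. The sketch below shows a softmax that maps unconstrained actor outputs onto the simplex, and a turnover cap. Note the swap: the slide says the state-dependent constraint is implemented via optimization and projection, whereas this example uses a simpler proportional shrink towards the target allocation, and it assumes turnover is measured as half the L1 distance between allocations.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Map unconstrained outputs to a positive allocation that sums to one."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def limit_turnover(current: Sequence[float], target: Sequence[float],
                   max_turnover: float = 0.10) -> List[float]:
    """Move from `current` towards `target`, trading at most `max_turnover`.

    Turnover is taken as half the L1 distance between allocations (assumption).
    The result is a convex combination of the two, so it stays on the simplex.
    """
    turnover = 0.5 * sum(abs(t - c) for c, t in zip(current, target))
    if turnover <= max_turnover:
        return list(target)
    step = max_turnover / turnover
    return [c + step * (t - c) for c, t in zip(current, target)]
```

Building the structural constraint into the architecture means the actor can never emit an invalid allocation, while the turnover cap depends on the current portfolio and therefore has to be applied at each step.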

Test Scenarios
Simplifying assumptions
Single-decision bandit
Asset allocation chosen only at t=0
No rebalancing
Assets are sold to replenish cash at t>0 whenever it becomes negative
Two scenarios
3 assets: optimal solution known with 1% precision
6 assets: optimal solution unfeasible with exhaustive search
Warm-up strategy (pre-training) for the critic network

Scenario 1
Three assets: Cash, Equity, Bond
Parametric constraint sets 0.17 as upper bound for equity
Ground truth via a set of simulations with 0.01 grid step (5151 actions)
Extract coarser results by increasing the grid step size
Use the coarsest grid (step = 0.20) for the warm-up phase of the Critic

Reward Surface

Scenario 1: Ground Truth
[Table caption, cropped on the slide: "[...]t action found with the given grid, correspond[...]mated on the 500 fixed realizations of the econ[...]"]
Step    # actions    Best action           Best reward
0.20    21           [0.0, 0.20, 0.80]     2.552
0.10    66           [0.0, 0.10, 0.90]     2.707
0.05    231          [0.0, 0.15, 0.85]     2.790
0.02    1326         [0.0, 0.16, 0.84]     2.799
0.01    5151         [0.0, 0.17, 0.83]     2.811
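
The action counts in the table match the number of points of a regular grid on the simplex. The sketch below enumerates such a grid and counts it in closed form. The function names are hypothetical, and the grid is parameterized by the number of steps per unit (100 for a 0.01 step) to avoid floating-point issues.

```python
from itertools import product
from math import comb
from typing import List, Tuple


def simplex_grid(n_assets: int, steps: int) -> List[Tuple[float, ...]]:
    """All allocations whose entries are multiples of 1/steps and sum to one."""
    grid = []
    for counts in product(range(steps + 1), repeat=n_assets - 1):
        rest = steps - sum(counts)  # units left for the last asset
        if rest >= 0:
            grid.append(tuple(c / steps for c in counts) + (rest / steps,))
    return grid


def grid_size(n_assets: int, steps: int) -> int:
    """Closed-form count of grid points (stars and bars)."""
    return comb(steps + n_assets - 1, n_assets - 1)
```

With three assets, `len(simplex_grid(3, 100))` gives the 5151 actions of the finest grid. With six assets at the same step, `grid_size(6, 100)` is 96,560,646 candidate allocations, each of which would need its own set of simulations, which illustrates why exhaustive search is described as unfeasible in the second scenario.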

Scenario 1: Results

Scenario 2
Assets = cash, and Italian bonds with 3-, 5-, 10-, 20-, and 30-year tenors
No parametric constraints: any part of the action space might contain the optimum
λ set to a high value of 4 to avoid an optimal solution consisting only of the most profitable and most risky asset (30y bond)
Negative cash flows concentrated at 5 and 10 years

Scenario 2: Results

Summary
RL for asset-liability management of an insurance company
Minimal assumptions: generally applicable to any asset, liability, and economic scenario
Improves on mean-variance optimization via grid Monte Carlo simulations
Designed for complex multi-period optimization (testing is work in progress)
Risk-adjusted optimization problem could be integrated in the MDP formulation

What Could Go Wrong? Some war stories

It's a Relationship
Communicate effectively: common language
Show that you care: understand the problem, acquire domain knowledge
Build trust: deliver on your promises

It's Research
"Works on paper": 10x more effort to run on realistic scenarios
Scope creep: "appetite comes with eating"
No ground truths: hard to generate trust

The Right Level of Abstraction
Computer scientists like crisp, abstract problem definitions: neither applicable nor useful
Domain experts want a problem definition as realistic as possible: unfeasible or too complex
Getting it right is 50% of the job
© Christoph Niemann https://www.christophniemann.com

Interpretability
Regulatory reasons
Humans want to know
Causal reasoning
Trust
Simple ≻ Complex

Acknowledgements
Claudia Berloco, Daniele Frassineti, Greta Greco, Hashani Kumarasinghe, Marco Lamieri, Emanuele Massaro, Arianna Miola, Shuyi Yang, Carlo Abrate, Alessio Angius, Alan Perotti, Stefano Cozzini, Francesca Iadanza, Laura Li Puma, Simone Pavanelli, Stefano Pignataro, Silvia Ronchiadin

Thanks! @gdfm7
