Risk
- "Possibility of something bad happening" (Cambridge Dictionary)
- Effect of uncertainty on objectives
- Financial risk (e.g., liquidity, systemic)
  - Credit risk: possibility of default on a loan
  - Market risk: volatility of equity, currency, interest rates
- Project risk: possibility of an event with a negative outcome on the project

A bit of context
- Collaboration between ISI Foundation and Intesa Sanpaolo
  - ISI Foundation: private, non-profit, fundamental research institute
  - Intesa Sanpaolo (ISP): largest bank in Italy by capitalization
- Applied research projects of approx. 9 months
- Mixed team of researchers and domain experts

Problem Setting
- Banks provide credit to companies in various forms (loans, cash advances)
- Creditworthiness (rating) of borrowers
  - Banks use credit risk models to assess the credit rating
  - Affects credit conditions and possible interventions
  - Changes with time and is affected by context

Trade Credit
- Paying a provider in 90 days is equivalent to a loan
- Default on trade credit is more common than default towards banks
- Can act as a buffer during distress periods
- Network perspective
  - Trade network = risk propagation
  - Can trigger chain reactions of default events
  - Can it be used to improve credit risk models?
[Figure: trade network with customer firm, provider firm, money flow, default firm, adjacent firm and risk propagation]

Goal
- Integrate network effects into the prediction of the default probability P(d)
- At time t, predict whether a given firm will default within a short-term horizon (3 months)
- Online prediction task
  - Prequential setting (predict, reveal, advance)
- Find risky firms in advance to enact proactive measures that avoid the default
  - Limited resources → ranking task: act on the top-K firms
  - Probabilistic classifier to model the class of interest (default)
  - Few examples, high imbalance of class labels
- Main metric: Recall@K (K depends on bank resources, here 5%)

Thus, our target variable $Y_i^t$ is a logical 'or' of lagged versions of $D_i^t$:
$Y_i^t = D_i^{t+1} \lor D_i^{t+2} \lor D_i^{t+3}$.

Data description. Data is drawn from a proprietary dataset belonging to Intesa Sanpaolo, a leading Italian commercial bank. The dataset is highly rep…

1 The definition of default was introduced by Directive 2006/48/EC (known as the Capital Requirements Directive, CRD), later replaced … (CRR). The definition of default of an obligor specified in Article 17… the days-past-due criterion for default identification, indications of un… return to non-defaulted status and treatment of the definition of default …
2 Following the financial crisis, the European Banking Authority … standards around the definition of default to achieve greater alignment … A new definition of default needs to be implemented by banks by the …
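
The target and the metric above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: it assumes (hypothetically) that each firm's history is a list `defaults` of monthly 0/1 default indicators $D_i^m$, builds $Y_i^t$ as the logical 'or' over the next three months, and computes Recall@K as the share of actual defaults that fall among the top K% of firms ranked by predicted P(d).

```python
from typing import Sequence


def make_target(defaults: Sequence[int], t: int, horizon: int = 3) -> int:
    """Y_i^t = D_i^{t+1} or ... or D_i^{t+horizon} for a single firm.

    `defaults[m]` is assumed (hypothetically) to be the 0/1 default
    indicator D_i^m of the firm in month m.
    """
    future = defaults[t + 1 : t + 1 + horizon]
    return int(any(future))


def recall_at_k(scores: Sequence[float], labels: Sequence[int], k_frac: float = 0.05) -> float:
    """Fraction of all positives (defaults) found among the top-K ranked firms."""
    n = len(scores)
    k = max(1, round(k_frac * n))  # K as a share of the firms, here 5%
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    positives = sum(labels)
    return hits / positives if positives else 0.0


# Usage: a firm that defaults in month 3 is a positive at t = 0, 1 and 2
history = [0, 0, 0, 1, 0, 0]
print([make_target(history, t) for t in range(3)])  # [1, 1, 1]
```

In a prequential run, the same two functions would be applied month by month: score all firms at t, reveal $Y^t$ once the three-month window has passed, update the model, and advance.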

Data Model
- Longitudinal data (firms over time)
- Two perimeters (Target and Extended)
- Features from a plethora of different sources (financial statements, central bank registry, overdrafts, regulatory risk parameters, credit risk alerts, etc.)
- Challenge: incomplete view (avg. 16% of transactions)
  - Network enrichment via record linkage

Network Enrichment
- A bank transfer involves an external IBAN and a firm name: does the firm also have an ISP account?
  - Match the firm name linked to the external IBAN with the firm names in the ISP database
- Training data: spelling variants of the same firm inside the ISP firm registry (pairs of names referring to the same firm)
  - For each pair, compute standard string distance metrics as features
- Application strategy for the model (multiple-bank phenomenon): a client holding accounts with different banks is likely to transfer money between them
  - Only test pairs of firms that are linked by a bank transfer
- Increases the amount of traced transactions by 450% and coverage by 200%, going from 281k to 826k links per month

Table 1. Performance of the model for record linkage on the test set
Precision    99.98%
Recall       73.03%
F1 measure   84.45%
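
A minimal sketch of the pair-feature step, using only Python's standard library. The specific metrics (Levenshtein distance, the `difflib.SequenceMatcher` ratio, token Jaccard) and the name normalization are illustrative choices, since the source only says "standard string distance metrics"; the classifier that consumes these features is not shown. The hypothetical `candidate_pairs` helper applies the multiple-bank blocking rule: an external name is compared only with the ISP client it exchanged a transfer with.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Iterator, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs (dynamic programming over two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def normalize(name: str) -> str:
    """Lower-case, turn dots and commas into spaces, collapse whitespace."""
    for ch in ".,":
        name = name.replace(ch, " ")
    return " ".join(name.lower().split())


def pair_features(name_a: str, name_b: str) -> Dict[str, float]:
    """String-distance features for one candidate pair of firm names."""
    a, b = normalize(name_a), normalize(name_b)
    lev = levenshtein(a, b)
    ta, tb = set(a.split()), set(b.split())
    return {
        "levenshtein": lev,
        "levenshtein_norm": lev / max(len(a), len(b), 1),
        "seq_ratio": SequenceMatcher(None, a, b).ratio(),
        "token_jaccard": len(ta & tb) / len(ta | tb) if ta | tb else 1.0,
    }


def candidate_pairs(transfers: Iterable[Tuple[str, int]],
                    isp_names: Dict[int, str]) -> Iterator[Tuple[str, str]]:
    """Blocking (hypothetical layout): pair each external counterparty name only
    with the name of the ISP client that sent or received the transfer."""
    for external_name, isp_client_id in transfers:
        yield external_name, isp_names[isp_client_id]
```

Blocking on transfer counterparts keeps the number of comparisons proportional to the number of transfers rather than to the size of the whole registry.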

Models
- Model the fragility of firms to network spillovers
  - Capture spillover effects from the supply chain on the P(d) of each firm
- Sequential modeling approach: the output of the first (single-firm) model is used in the subsequent network model
- First model captures the effect of the single firm's features
  - Predicts the P(d) of each firm in isolation
- Second model captures network spillovers
  - Leverages the output of the first model, together with the network structure and the position of the firm in the supply chain
  - Determines the influence of each firm's neighborhood on the firm's P(d)
[Diagram: Firm Features → Single Firm Model → Single Firm P(d); Single Firm P(d), Network and Network Features → Network Model → Network Firm P(d)]
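
The sequential design can be sketched as two stages where the second stage only sees the first stage's outputs on the firm's neighbours. This is a toy illustration under stated assumptions: both stages are written as logistic models with hypothetical coefficients, and the network feature is a simple weighted average of the neighbours' stage-1 P(d), standing in for the fragility and PageRank features described in the following slides.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: firm -> list of (in-neighbour, edge weight)
InEdges = Dict[str, List[Tuple[str, float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def single_firm_pd(features: Sequence[float], coef: Sequence[float], intercept: float) -> float:
    """Stage 1: P(d) of a firm in isolation, from its own features only."""
    return sigmoid(intercept + sum(c * x for c, x in zip(coef, features)))


def neighbour_exposure(firm: str, in_edges: InEdges, pd: Dict[str, float]) -> float:
    """Weighted average of the stage-1 P(d) of the firm's in-neighbours."""
    edges = in_edges.get(firm, [])
    total = sum(w for _, w in edges)
    return sum(w * pd[j] for j, w in edges) / total if total else 0.0


def network_pd(firm: str, in_edges: InEdges, pd: Dict[str, float], a: float, b: float) -> float:
    """Stage 2: uses network information only, never the firm's own features."""
    return sigmoid(a + b * neighbour_exposure(firm, in_edges, pd))
```

Keeping stage 2 free of single-firm features is what makes it possible to measure how much predictive signal the network carries on its own.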

Feature Importance
- Local: P(d) predicted by a model based on financial features (amount borrowed by the firm)
- Rating: P(d) from the officially regulated rating model of the specific firm; it has a longer time horizon (one year) and uses features from the firm's balance sheet
- Overdraft: the number of days of overdraft in the last three months
- Hist: boolean indicator equal to 1 if the firm has been in default at any point in its past

Network Spillover Model
- Logistic regression trained on the Target perimeter
- Features: only network-based (no single-firm features)
  - Fragility (client and supplier)
  - Normalized PPR (effective importance)
- Instance weighting by how much we know of their transaction network

Personalized PageRank
- How close is the firm to other firms which have had a default?
- Assume risk spreads as a random walk
  - Restart from nodes Q, uniform over firms in default at t′ < t, with α = 0.25
  - Temporal discounting (for Δt = t′ − t) does not work better
- Normalize for the in-degree of node i

… model that firms closer to a defaulting customer are more likely to be … Section Basic definitions. Therefore, we impose a restart probability on the random walker: with probability $\alpha$ the random walker follows the transaction network, and with probability $1 - \alpha$ it restarts its random walk from its origin. … model that being closer to multiple defaulting customers is likely to … firm. Therefore, we allow the random walk to (re)start from a set of nodes represented by a distribution $Q$ over the nodes of the transaction network. For our application, $Q$ has non-zero values only for firms which have been in default (more on the choice of $Q$ later). $Q$ is also called a restart vector. Finally, we compute the feature as the stationary distribution of the random walker. This distribution exists and is unique, and can be easily computed by the PPR algorithm. For every other node we compute the PPR from the restart vector as the solution to the recurrent equation

$$\mathrm{PPR}_\alpha = \alpha\, \mathrm{PPR}_\alpha M + (1 - \alpha)\, Q,$$

where $M$ is the row-stochastic adjacency matrix of the transaction network, $Q$ is the restart vector distribution, and $\alpha \in (0, 1)$ is a damping parameter. We normalize the $\mathrm{PPR}_\alpha$ value obtained by the algorithm to reduce its bias toward nodes with high in-degree:

$$\mathrm{PPR}(i) = \frac{\mathrm{PPR}_\alpha(i)}{|\overleftarrow{N}(i)|},$$

where $\overleftarrow{N}(i)$ denotes the in-neighbors of node $i$.
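
The recurrence above can be solved by power iteration. The sketch below is a plain-Python illustration, not the production implementation: edges are given as a dict of weighted out-edges (a hypothetical layout), rows are normalized so that M is row-stochastic, and iteration stops when the L1 change falls below a tolerance. Nodes without out-edges simply leak mass in this sketch, which still gives a unique fixed point for α < 1; how the original handles such nodes is not stated. Leaving nodes with no in-neighbours unnormalized is also an assumption.

```python
from typing import Dict


def personalized_pagerank(out_edges: Dict[str, Dict[str, float]],
                          restart: Dict[str, float],
                          alpha: float = 0.25,
                          tol: float = 1e-12,
                          max_iter: int = 1000) -> Dict[str, float]:
    """Iterate PPR <- alpha * PPR M + (1 - alpha) * Q until it stops changing."""
    nodes = set(out_edges) | set(restart)
    for targets in out_edges.values():
        nodes |= set(targets)
    # Row-normalize the edge weights so that M is row-stochastic
    M = {}
    for u, targets in out_edges.items():
        total = sum(targets.values())
        if total > 0:
            M[u] = {v: w / total for v, w in targets.items()}
    q = {n: restart.get(n, 0.0) for n in nodes}
    p = dict(q)
    for _ in range(max_iter):
        nxt = {n: (1 - alpha) * q[n] for n in nodes}   # restart term
        for u, row in M.items():
            for v, m_uv in row.items():
                nxt[v] += alpha * p[u] * m_uv          # follow an edge
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def normalized_ppr(out_edges: Dict[str, Dict[str, float]],
                   restart: Dict[str, float],
                   alpha: float = 0.25) -> Dict[str, float]:
    """PPR(i) = PPR_alpha(i) / in-degree(i); nodes without in-neighbours are left as-is."""
    p = personalized_pagerank(out_edges, restart, alpha)
    in_deg: Dict[str, int] = {}
    for targets in out_edges.values():
        for v in targets:
            in_deg[v] = in_deg.get(v, 0) + 1
    return {n: p[n] / in_deg[n] if in_deg.get(n) else p[n] for n in p}
```

A uniform restart vector over the defaulted firms would be built as `{f: 1 / len(defaulted) for f in defaulted}`, where `defaulted` is a hypothetical set of firms in default before t.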

Fragility
- Exposure to risk from the network
- Accounts receivable = amount of revenue in credit to customers
- Sales = revenue from trading
- Weight = normalized transaction weight of the link from j to i
- P(d) = output of the single-firm model

A similar reasoning can be applied in the opposite direction, that is, to how default risk spreads from suppliers to customers. In this case, the economic interpretation is more oriented to the market power of the customer within the supply chain. Larger customers, in terms of purchases, have greater market power, reflected in the ability to obtain deferred payments and other support … from suppliers in the event of a liquidity shortage. Moreover, the higher the trade … the customer i owned by the supplier j, the higher the implicit stake of the … business. In other words, the higher the customer's trade debt to its supplier, the higher its sensitivity to the supplier's financial soundness. The $\mathrm{FRG}_s$ coefficient is expected to be positive. The final formulas for computing the fragility are:

$$\mathrm{FRG}_c(i) = \frac{AR_i}{S_i} \times \operatorname{logit}\Big(\sum_{j \in \overleftarrow{N}(i)} w_{ji}\, P(d)_j\Big), \qquad \mathrm{FRG}_s(i) = \frac{AP_i}{P_i} \times \operatorname{logit}\Big(\sum_{j \in \overrightarrow{N}(i)} w_{ij}\, P(d)_j\Big),$$

where $AR$ and $AP$ are accounts receivable and accounts payable, $S$ and $P$ are sales and purchases, $\overleftarrow{N}(i)$ and $\overrightarrow{N}(i)$ are the in-neighbors and out-neighbors of $i$ in the transaction network, $w_{ij}$ is the normalized weight of the edge between $i$ and $j$, and $P(d)_j$ is the probability of default of $j$ as computed by the model in the …
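
The two fragility formulas share one shape, a balance-sheet ratio times the logit of a weighted neighbour P(d), so a single helper covers both. This is a minimal sketch, assuming each neighbour is passed as a hypothetical `(weight, P(d)_j)` pair with normalized weights (so the weighted sum stays in [0, 1]), and clipping the argument of the logit to keep it finite; the clipping constant is our assumption, not the source's.

```python
import math
from typing import Sequence, Tuple


def logit(p: float, eps: float = 1e-6) -> float:
    """log(p / (1 - p)), with p clipped away from 0 and 1 (clipping is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fragility(ratio: float, neighbours: Sequence[Tuple[float, float]]) -> float:
    """ratio * logit(sum_j w * P(d)_j) over (w, P(d)_j) pairs."""
    exposure = sum(w * pd_j for w, pd_j in neighbours)
    return ratio * logit(exposure)


def client_fragility(receivables: float, sales: float,
                     in_neighbours: Sequence[Tuple[float, float]]) -> float:
    """FRG_c(i) = AR_i / S_i * logit(sum over in-neighbours j of w_ji * P(d)_j)."""
    return fragility(receivables / sales, in_neighbours)


def supplier_fragility(payables: float, purchases: float,
                       out_neighbours: Sequence[Tuple[float, float]]) -> float:
    """FRG_s(i) = AP_i / P_i * logit(sum over out-neighbours j of w_ij * P(d)_j)."""
    return fragility(payables / purchases, out_neighbours)
```

A weighted neighbour P(d) of exactly 0.5 gives a logit of 0, so the fragility is then zero regardless of the balance-sheet ratio; below 0.5 the feature turns negative.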

Network Model Performance
- Instance weighting: incoming amount over sales, outgoing amount over purchases
- R@K ≈ 50% of that of the single-firm model, without any local information about the firm itself
- Testimony of the power of the network

… The more complete our knowledge of the supply chain, the more reliable the estimate of the influence of the network on the risk of the company will be. This is definitely the case for the fragility features, which explicitly rely on how much of the firm's financial position the network captures, but it is also true indirectly for the PPR feature, as the presence or absence of a link (and its weight) clearly affects the random walks on which the feature is based. For these reasons, we employ an instance weighting scheme, so that the model can focus on the data points which are more reliable. For each firm $i$, we define an instance weight for the machine learning model as:

$$W(i) = \frac{1}{2}\left(\frac{\sum_{j \in \overleftarrow{N}(i)} w_{ji}}{S_i} + \frac{\sum_{j \in \overrightarrow{N}(i)} w_{ij}}{P_i}\right).$$

The weight is therefore the average of the in-coverage and the out-coverage of the firm with respect to the balance sheet data (sales $S$ and purchases $P$). More in detail, the first term is the ratio of the sum of in-weights of the network to the sales of the firm, while the second term is the ratio of the sum of out-weights of the network to the purchases of the firm. Therefore, for a well-mapped firm this weight will be close to 1, while it will be close to 0 for firms on which the network has little information.

… network spillover model. The overall combined model is as follows:

$$Y = f(\mathrm{PPR}, \mathrm{FRG}_c, \mathrm{FRG}_s, \mathrm{FRG}_c \cdot \mathrm{FRG}_s),$$

where PPR is the personalized PageRank and $\mathrm{FRG}_c$ and $\mathrm{FRG}_s$ are the two fragility …
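
The instance weight and the inputs of the combined model can be sketched as follows. Assumptions, not taken from the source: the in/out amounts are raw transaction totals, a zero denominator yields zero coverage, W(i) is not capped at 1, and the weighted log-loss (normalized by the sum of weights) illustrates how per-firm weights typically enter logistic regression training.

```python
import math
from typing import List, Sequence


def instance_weight(in_amounts: Sequence[float], out_amounts: Sequence[float],
                    sales: float, purchases: float) -> float:
    """W(i): average of in-coverage (vs. sales) and out-coverage (vs. purchases)."""
    in_cov = sum(in_amounts) / sales if sales > 0 else 0.0
    out_cov = sum(out_amounts) / purchases if purchases > 0 else 0.0
    return 0.5 * (in_cov + out_cov)


def spillover_features(ppr: float, frg_c: float, frg_s: float) -> List[float]:
    """Inputs of Y = f(PPR, FRG_c, FRG_s, FRG_c * FRG_s), interaction term included."""
    return [ppr, frg_c, frg_s, frg_c * frg_s]


def weighted_log_loss(y_true: Sequence[int], p_pred: Sequence[float],
                      weights: Sequence[float], eps: float = 1e-12) -> float:
    """Log-loss in which each firm counts in proportion to its W(i)."""
    total = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, eps), 1 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)
```

A firm with W(i) = 0 contributes nothing to the loss, so poorly mapped firms cannot pull the network coefficients toward noise.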

Hybrid Model
- XGBoost with single-firm and network features
- Feature engineering
- Systematic feature selection
- Deployment in a pre-production environment

Fig 11. Recall@K for the XGBoost model with mixed single-firm and network features as a function of time in the prequential setting, compared to a logistic regression model on single-firm features (baseline). The average R@K is 68.1% and the AUC is 90.5%.

Table 5. Performance of the XGBoost model with respect to the baseline on 3 out-of-time snapshots (values in %).

Month     AUC                 P@K                 R@K
          Baseline  XGBoost   Baseline  XGBoost   Baseline  XGBoost
2018-12   68.0      91.4      3.9       6.5       40.8      68.4
2019-03   86.3      91.9      7.6       9.9       54.3      70.0
2019-06   85.6      89.8      7.2       9.6       47.6      63.6
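
As a quick check of the "almost 20 percentage points" claim in the summary, the per-snapshot R@K gains of Table 5 can be averaged. The numbers are copied from the table; nothing else is assumed.

```python
# R@K (%) per out-of-time snapshot from Table 5: (baseline, XGBoost)
r_at_k = {
    "2018-12": (40.8, 68.4),
    "2019-03": (54.3, 70.0),
    "2019-06": (47.6, 63.6),
}
gains = {month: round(xgb - base, 1) for month, (base, xgb) in r_at_k.items()}
mean_gain = sum(gains.values()) / len(gains)
print(gains)                # {'2018-12': 27.6, '2019-03': 15.7, '2019-06': 16.0}
print(round(mean_gain, 1))  # 19.8
```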

Summary
- Network-based model for short-term default forecasting
- Incorporates trade credit information into the credit risk model by looking at the transaction network
- Network features based on data mining and domain expertise
- Network model alone achieves 50% of the recall of the single-firm model
- Hybrid model improves over the baseline by almost 20 percentage points

Problem Setting
- Insurance company
  - Premiums contribute common funds to an investment portfolio
  - Clients acquire the right to compensation in case of accident (e.g., death)
- Assets and liabilities are inter-dependent
  - More complex than traditional portfolio optimization
  - Long time horizon (30y) and sporadic rebalancing

Liabilities
- Derive from insurance contracts with the clients and from portfolio performance
  - Compensation in case of adverse events
  - Annual returns of the common financial portfolio
- Withdrawals might increase whenever these returns are too low
- A guaranteed minimum annual return requires the company to make up the difference

Goal
- Optimize the risk-adjusted returns of the investment portfolio
- Ensure future liabilities are covered despite market fluctuations
  - Liabilities are stochastic and correlated with assets
  - Match the investment portfolio with the due dates of liabilities

Baseline: Modern Portfolio Theory
- Markowitz's mean-variance optimization (1952)
- Efficient frontier: maximum expected return for a given variance level
- Problems:
  - Does not consider liabilities and negative cash flows
  - Single decision point (rebalancing not modeled), no path dependency
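
To make the baseline concrete, the sketch below finds one point of the efficient frontier by brute force: among long-only allocations on a grid, it picks the one with the highest expected return whose variance stays below a cap. Expected returns and covariances are hypothetical inputs, and practical mean-variance solvers use quadratic programming rather than a grid. The result is a single static allocation, which is exactly the limitation listed above.

```python
import itertools
from typing import Iterator, Optional, Sequence, Tuple


def simplex_grid(k: int, steps: int) -> Iterator[Tuple[float, ...]]:
    """All long-only allocations of k assets whose weights are multiples of 1/steps."""
    # Stars and bars: choose k-1 bar positions among steps+k-1 slots
    for bars in itertools.combinations(range(steps + k - 1), k - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(steps + k - 2 - prev)
        yield tuple(p / steps for p in parts)


def portfolio_stats(w: Sequence[float], mu: Sequence[float],
                    cov: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Expected return w'mu and variance w'Cov w."""
    mean = sum(wi * mi for wi, mi in zip(w, mu))
    var = sum(w[i] * w[j] * cov[i][j] for i in range(len(w)) for j in range(len(w)))
    return mean, var


def frontier_point(mu: Sequence[float], cov: Sequence[Sequence[float]],
                   max_var: float, steps: int = 20) -> Optional[Tuple[Tuple[float, ...], float, float]]:
    """Highest-return grid allocation with variance <= max_var."""
    best = None
    for w in simplex_grid(len(mu), steps):
        mean, var = portfolio_stats(w, mu, cov)
        if var <= max_var and (best is None or mean > best[1]):
            best = (w, mean, var)
    return best
```

The grid size grows combinatorially with the number of assets, which is one reason exhaustive search stops being practical beyond a handful of assets.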

Solution
- Model the system as a Markov Decision Process (a Markov chain with decisions)
- MDP
  - State = current portfolio and future liabilities (continuous)
  - Action = portfolio allocation (point on a (k−1)-simplex, with k available assets; continuous)
- Solving the MDP = finding the optimal policy: a (stochastic) mapping from states to actions that maximizes the expected reward
- Use Reinforcement Learning to solve the MDP
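
A minimal sketch of one MDP transition for this setting, with deliberately toy dynamics that are our assumptions rather than the project's stochastic model: the state holds the year, the market value of the assets and a hypothetical schedule of future liability payments; the action is an allocation on the simplex; one step applies a sampled vector of asset returns (one draw of the economic scenario) and pays the liability due. Using the period's portfolio return as the reward is also a hypothetical choice.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ALMState:
    t: int                          # current year
    assets: float                   # market value of the portfolio
    liabilities: Tuple[float, ...]  # hypothetical schedule: payment due in each future year


def step(state: ALMState, allocation: Sequence[float],
         returns: Sequence[float]) -> Tuple[ALMState, float]:
    """One toy transition: grow the portfolio, then pay the liability due this year."""
    if min(allocation) < 0 or abs(sum(allocation) - 1.0) > 1e-9:
        raise ValueError("allocation must lie on the simplex")
    growth = sum(w * r for w, r in zip(allocation, returns))  # portfolio return
    value = state.assets * (1.0 + growth)
    due = state.liabilities[0] if state.liabilities else 0.0
    next_state = ALMState(state.t + 1, value - due, state.liabilities[1:])
    return next_state, growth  # hypothetical reward: the period's portfolio return
```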

Contributions
- Realistic stochastic model of the asset-liability management of an insurance company
- Definition of a risk-adjusted optimization problem over a pre-determined time horizon
- Implementation of a custom solution based on Deep Deterministic Policy Gradient (DDPG), compatible with standard Python libraries for RL

Optimization Problem
- Given a time horizon T and an initial portfolio P(0)
- Find the asset allocation for every t ∈ [1, T] that
  - maximizes the overall risk-adjusted returns of the portfolio,
  - taking into account volatility (standard deviation of the annual returns),
  - respecting financial constraints

$$\mu = \frac{1}{T}\sum_{t=1}^{T} g_R(t), \qquad \sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\big(g_R(t) - \mu\big)^2}$$

$$\operatorname*{arg\,max}_{\bar X_0, \dots, \bar X_{T-1}} \; \mathbb{E}_{\varepsilon}\big[\mu - \lambda \cdot \sigma\big]$$

where μ = average return within the same realization, σ = risk measure, $\bar X_i$ = asset allocation at the i-th time unit, λ = risk aversion (weight of the volatility), ε = economic scenario.
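
The objective can be estimated by Monte Carlo: simulate several realizations of the economic scenario ε, compute μ and σ of the annual returns $g_R(t)$ within each realization, and average $\mu - \lambda\sigma$ across realizations. In the sketch below the annual returns of each realization are passed in directly; producing them for a given allocation requires the ALM simulator, which is not shown. σ uses the 1/T form, matching the formula above.

```python
import math
from typing import Sequence


def risk_adjusted_return(annual_returns: Sequence[float], lam: float) -> float:
    """mu - lambda * sigma for one realization of the economic scenario."""
    T = len(annual_returns)
    mu = sum(annual_returns) / T
    sigma = math.sqrt(sum((g - mu) ** 2 for g in annual_returns) / T)
    return mu - lam * sigma


def expected_objective(realizations: Sequence[Sequence[float]], lam: float) -> float:
    """Monte Carlo estimate of E_eps[mu - lambda * sigma]."""
    return sum(risk_adjusted_return(r, lam) for r in realizations) / len(realizations)
```

Comparing candidate allocations then amounts to comparing these estimates on a fixed set of realizations, as done with the 500 fixed realizations in Scenario 1.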

RL Implementation
- Reinforcement Learning: learn how an agent should interact with an environment to maximize the expected (across stochastic realizations) cumulative reward
- Actor-critic schema: the agent is composed of two modules
  - Critic: learns to approximate the reward of an action in a given state (approximation of the environment)
  - Actor: given a state, learns to produce actions that maximize the value estimated by the critic
- Deep Deterministic Policy Gradient algorithm to produce continuous actions
- Customized extension to ensure compliance with financial constraints

Figure 1: The agent-environment interaction in reinforcement learning.

In Reinforcement Learning, the conceptual framing of the problem is based on the interaction of an agent with an environment. The agent can perform actions that modify the environment, and in turn can receive updated perceptions of the evolution of the environment. The diagram is depicted in Figure 1. Besides perceptions, at every iteration the agent receives a reward, a scalar value … the desirability of the current situation. The reward is used by the agent in order to … how good a sequence of actions (often called a strategy) is. The starting point is a blank slate … the agent's internal parameters randomly initialized to close-to-zero values; in the R… corresponds to an agent behaving randomly, all actions being initially equivalent… Every time the agent performs an action and receives an updated description of the world and a reward value, these bits of knowledge are used to improve the agent's internal co… actions and their effects. Despite this single learning step being fundamentally supe…, it is worth mentioning that many RL problems present credit assignment problems: the positive … of a strategy is often evident (and modeled by a reward) only after several actions are executed.
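
To illustrate the actor-critic idea behind DDPG without neural networks, the sketch below solves a one-step toy problem with a continuous action. It is a deliberate simplification, not DDPG itself: there is no state, no replay buffer and no target networks. The critic is a quadratic Q(a) = w0 + w1·a + w2·a² fitted by least squares to rewards sampled around the current action, and the deterministic actor (a single parameter θ) moves along the critic's gradient dQ/da, which is the deterministic policy gradient step. The reward function is hypothetical.

```python
import random
from typing import List, Sequence


def reward(a: float) -> float:
    """Hypothetical one-step reward, maximal at a = 0.3."""
    return -(a - 0.3) ** 2


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_critic(actions: Sequence[float], rewards: Sequence[float]) -> List[float]:
    """Critic: least-squares fit of Q(a) = w0 + w1*a + w2*a^2 via normal equations."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for a, r in zip(actions, rewards):
        phi = (1.0, a, a * a)
        for i in range(3):
            b[i] += phi[i] * r
            for j in range(3):
                A[i][j] += phi[i] * phi[j]
    return solve_linear(A, b)


def train(theta: float = 0.9, lr: float = 0.1, iters: int = 100,
          n_samples: int = 20, noise: float = 0.1, seed: int = 0) -> float:
    rng = random.Random(seed)
    for _ in range(iters):
        # Exploration: perturb the actor's deterministic action with Gaussian noise
        actions = [theta + rng.gauss(0.0, noise) for _ in range(n_samples)]
        rewards = [reward(a) for a in actions]
        _, w1, w2 = fit_critic(actions, rewards)
        # Actor: deterministic policy gradient step along dQ/da at a = theta
        theta += lr * (w1 + 2.0 * w2 * theta)
    return theta
```

Because the toy reward is exactly quadratic, the fitted critic is exact and each actor step reduces to θ ← θ + lr·(0.6 − 2θ), so θ converges geometrically to 0.3. With a neural critic, the same update uses the gradient of the network output with respect to the action.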

Financial Constraints
1. Structural: keep the problem mathematically sound, e.g., allocations should be positive and sum to one (implemented via a softmax architecture)
2. Parametric: restrict the allocation exposure to desirable ranges, e.g., equity below 14% and the sum of all bonds between 20% and 80% (implemented via regularization)
3. State-dependent: depend on the current state of the simulation, e.g., portfolio turnover limited to 10% of the current portfolio value (implemented via optimization and projection)
- Additional regulatory constraints considered explicitly
  - Keep the current discounted value of future liabilities and the market value of the assets close
  - Capital injection/ejection to keep the constraint satisfied (injection is equivalent to borrowing cash)
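
One helper per constraint type can be sketched as follows. The softmax is the standard mapping from unconstrained outputs to the simplex. The quadratic penalty and the turnover rule are assumptions, since the source only says "regularization" and "optimization and projection": turnover is measured here as the sum of absolute weight changes, and the move toward the proposed allocation is shrunk until it fits. This shrinking is a simple feasible correction, not an exact Euclidean projection; because both allocations lie on the simplex, any convex combination of them does too, so the structural constraint is preserved.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Structural: positive weights summing to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def parametric_penalty(weights: Sequence[float],
                       bounds: Sequence[Tuple[Sequence[int], float, float]],
                       scale: float = 100.0) -> float:
    """Parametric (hypothetical regularizer): quadratic penalty when the summed
    weights of each asset group leave their [lo, hi] range."""
    penalty = 0.0
    for idx, lo, hi in bounds:
        s = sum(weights[i] for i in idx)
        penalty += max(0.0, s - hi) ** 2 + max(0.0, lo - s) ** 2
    return scale * penalty


def limit_turnover(current: Sequence[float], proposed: Sequence[float],
                   max_turnover: float = 0.10) -> List[float]:
    """State-dependent: shrink the move so that sum |delta w| <= max_turnover."""
    turnover = sum(abs(p - c) for p, c in zip(proposed, current))
    if turnover <= max_turnover:
        return list(proposed)
    s = max_turnover / turnover
    return [c + s * (p - c) for c, p in zip(current, proposed)]
```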

Test Scenarios
- Simplifying assumptions
  - Single-decision bandit: asset allocation chosen only at t = 0
  - No rebalancing
  - Assets are sold to replenish cash at t > 0 whenever it becomes negative
- Two scenarios
  - 3 assets: optimal solution known with 1% precision
  - 6 assets: optimal solution infeasible to find with exhaustive search
- Warm-up strategy (pre-training) for the critic network
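
The cash-replenishment rule can be sketched as follows. The source only says that assets are sold when cash becomes negative; selling the other holdings pro rata, and raising an error when they are insufficient (a case that the capital injection mentioned under the regulatory constraints might cover), are assumptions of this sketch.

```python
from typing import Dict


def replenish_cash(holdings: Dict[str, float], cash_key: str = "cash") -> Dict[str, float]:
    """If cash is negative, sell the other assets pro rata to bring it back to zero."""
    h = dict(holdings)
    deficit = -h[cash_key]
    if deficit <= 0:
        return h  # nothing to do
    others = {k: v for k, v in h.items() if k != cash_key and v > 0}
    total = sum(others.values())
    if total < deficit:
        raise ValueError("not enough assets to cover the cash deficit")
    for k, v in others.items():
        h[k] = v - deficit * v / total  # each asset covers its share of the deficit
    h[cash_key] = 0.0
    return h
```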

Scenario 1
- Three assets: cash, equity, bond
- Parametric constraint sets 0.17 as the upper bound for equity
- Ground truth via a set of simulations with a 0.01 grid step (5151 actions)
  - Extract coarser results by increasing the grid step size
  - Use the coarsest grid (step = 0.20) for the warm-up phase of the critic

Scenario 1: Ground Truth
Best action found with the given grid and the corresponding reward, estimated on the 500 fixed realizations of the economic scenario.

Step   # actions   Best action [cash, equity, bond]   Best reward
0.20   21          [0.0, 0.20, 0.80]                  2.552
0.10   66          [0.0, 0.10, 0.90]                  2.707
0.05   231         [0.0, 0.15, 0.85]                  2.790
0.02   1326        [0.0, 0.16, 0.84]                  2.799
0.01   5151        [0.0, 0.17, 0.83]                  2.811
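
The "# actions" column follows from counting the points of a regular grid on the 2-simplex (cash, equity and bond weights that are non-negative multiples of the step and sum to one): with n = 1/step there are (n+1)(n+2)/2 of them. A short enumeration confirms the counts in the table.

```python
from math import comb
from typing import List, Tuple


def simplex_grid_3(step: float) -> List[Tuple[float, float, float]]:
    """All (cash, equity, bond) allocations on a grid with the given step."""
    n = round(1 / step)
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(n + 1)
            for j in range(n + 1 - i)]


for step in (0.20, 0.10, 0.05, 0.02, 0.01):
    n = round(1 / step)
    count = len(simplex_grid_3(step))
    assert count == comb(n + 2, 2)
    print(step, count)  # 0.2 21, 0.1 66, 0.05 231, 0.02 1326, 0.01 5151
```

These counts include allocations above the 0.17 equity cap, so the cap does not shrink the grid itself; the best actions in the table all respect it.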

Scenario 2
- Assets: cash and Italian bonds with 3-, 5-, 10-, 20- and 30-year tenors
- No parametric constraints: any part of the action space might contain the optimum
- λ set to a high value of 4 to avoid an optimal solution consisting only of the most profitable and most risky asset (30y bond)
- Negative cash flows concentrated at 5 and 10 years

Summary
- RL for asset-liability management of an insurance company
- Minimal assumptions: generally applicable to any asset, liability, and economic scenario
- Improves on mean-variance optimization via grid Monte Carlo simulations
- Designed for complex multi-period optimization (testing is work in progress)
- Risk-adjusted optimization problem could be integrated into the MDP formulation

It's a Relationship
- Communicate effectively
  - Common language
  - Show that you care
- Understand the problem
  - Acquire domain knowledge
- Build trust
  - Deliver on your promises

It's Research
- "Works on paper"
  - 10x more effort to run on realistic scenarios
- Scope creep: "appetite comes with eating"
- No ground truths
  - Hard to generate trust