Slide 1

REINFORCEMENT LEARNING BASED SIMULATION OF MARKETS
Sumitra Ganesh, Ph.D.
Research Director - Multi-agent Learning & Simulation, JPMorgan AI Research

Slide 2

REINFORCEMENT LEARNING - IN THEORY

Slide 3

REINFORCEMENT LEARNING - IN PRACTICE

Slide 4

WHAT THIS TALK IS ABOUT
If your environment is composed of multiple agents … how do you model the system of interacting agents?

Slide 5

EXAMPLE: SIMPLE MARKET IN PRINTER PAPER
- Shops have to decide how to price and manage inventory
- Customers have to decide whether to buy or not, and from which shop
[Diagram: shops X, Y, Z connected to customers Alice, Bob, Carl]

Slide 6

NOT ALL AGENTS ARE THE SAME!
- Customers could have different preferences on cost vs. delivery time
- Shops could have different costs / constraints for storage
- Agents could have varying levels of connectivity and hence access to information
[Diagram: shops X, Y, Z connected to customers Alice, Bob, Carl]
Heterogeneity + interactions -> complex real-world dynamics

Slide 7

WHAT WOULD HAPPEN IF SHOP “Z” CHANGED ITS PRICING?
- Depends on how Alice and Bob change their buying behavior in response
- Depends on how X and Y change their pricing behavior in response
[Diagram: shops X, Y, Z connected to customers Alice, Bob, Carl]
Need to simulate how other agents would respond to evaluate the impact of the change on sales, revenue, and market share

Slide 8

OUR APPROACH: USE RL TO BUILD AN AGENT-BASED SIMULATOR
1. How do agents differ from each other? Agents have different reward functions and connectivity.
2. How do agents behave? How do these behaviors evolve? Use RL to automatically learn agent behaviors/policies.
3. What types of agents are present in the system? Use RL to calibrate the agent composition using real data.

Slide 9

USING REWARD FUNCTIONS TO SPECIFY AGENT TYPES

Slide 10

HOW DO AGENTS DIFFER FROM EACH OTHER?
- Connectivity: Agents have different levels of connectivity to other agents. This constrains agent actions and the amount of information available. Specify a deterministic / stochastic network between agents.
- Reward: Agents make different tradeoffs due to preferences / constraints. Assumption: agents are rational and will try to maximize their reward.
Shop: w1 ⋅ Sales Revenue − w2 ⋅ Storage Cost − w3 ⋅ Shortfall
Customer: −w1 ⋅ Cost − w2 ⋅ Delivery Time
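As a rough illustration of how a reward function can specify an agent type, here is a minimal Python sketch; the class names, field names, and example weights are hypothetical, not the simulator's actual interface.

```python
from dataclasses import dataclass

@dataclass
class ShopType:
    """Shop agent type: preference weights over the reward terms."""
    w_revenue: float
    w_storage: float
    w_shortfall: float

    def reward(self, sales_revenue, storage_cost, shortfall):
        # w1 * Sales Revenue - w2 * Storage Cost - w3 * Shortfall
        return (self.w_revenue * sales_revenue
                - self.w_storage * storage_cost
                - self.w_shortfall * shortfall)

@dataclass
class CustomerType:
    """Customer agent type: trades off cost against delivery time."""
    w_cost: float
    w_delivery: float

    def reward(self, cost, delivery_time):
        # -w1 * Cost - w2 * Delivery Time
        return -self.w_cost * cost - self.w_delivery * delivery_time

# Heterogeneity comes purely from the weights, e.g. a price-sensitive
# customer vs. an impatient one:
price_sensitive = CustomerType(w_cost=1.0, w_delivery=0.1)
impatient = CustomerType(w_cost=0.2, w_delivery=1.0)
```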

Slide 11

EXAMPLE: FINANCIAL MARKET IN A SINGLE ASSET
- Market makers have to decide how to price and manage inventory
- Investors have to decide whether to buy / sell / do nothing
Key differences from the printer-paper market:
- All agents can both buy and sell; inventory can be negative
- The value of inventory can fluctuate with time - inventory P&L and risk
[Diagram: market makers X, Y, Z and investors Alice, Bob, Carl, with a reference price (exchange)]

Slide 12

MARKET MAKER AND INVESTOR TYPES
Market maker reward terms: sales revenue, inventory P&L, risk, market share target
Investor reward terms: execution cost, inventory P&L, risk, quantity target
[Diagram: market makers X, Y, Z and investors Alice, Bob, Carl, with a reference price (exchange)]
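To make the reward terms concrete, here is a minimal, hypothetical sketch of a market maker's per-step reward built from the four terms listed above; the functional forms and default weights are assumptions for illustration only.

```python
def market_maker_reward(spread_pnl, inventory, ref_price_change,
                        market_share, target_share,
                        w_risk=0.1, w_share=0.5):
    """Toy market-maker reward combining the slide's four terms:
    sales revenue (spread P&L), inventory P&L, risk, market share target."""
    inventory_pnl = inventory * ref_price_change         # mark-to-market of the (possibly negative) inventory
    risk_penalty = w_risk * inventory ** 2               # penalize large long or short positions
    share_penalty = w_share * (market_share - target_share) ** 2
    return spread_pnl + inventory_pnl - risk_penalty - share_penalty
```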

Slide 13

USING RL TO LEARN AGENT BEHAVIORS

Slide 14

LEARN SEPARATE POLICIES FOR EACH AGENT?
Each agent $i$ would learn its own policy $\pi_i$ from its own local experience $(s^{(i)}_t, a^{(i)}_t, R^{(i)}_t)$.
- Hard to scale
- Unstable
- Doesn't leverage the commonality across agents

Slide 15

SHARED POLICY LEARNING
Learn a shared policy $\pi_\theta(a \mid s, \lambda)$ across agents that takes the agent type $\lambda$ as input.
The policy is trained with the local experiences of all the agents simultaneously using a policy gradient method:
$$G_\theta = \frac{1}{N} \sum_{i=1}^{N} \underbrace{\sum_{t=0}^{T} \nabla_\theta \ln \pi_\theta\!\left(a^{(i)}_t \mid s^{(i)}_t, \lambda_i\right) \cdot R^{(i)}_t}_{\text{gradient for the } i\text{-th agent}}$$
where the outer $1/N$ sum averages the gradient across agents.
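A minimal sketch of this shared-policy update in PyTorch-style code, assuming a policy network that takes the agent's type vector λ as an extra input; the data layout and names are illustrative, not the actual implementation.

```python
import torch

def shared_policy_update(policy, optimizer, agent_batches):
    """One REINFORCE-style step on the shared policy pi_theta(a | s, lambda).

    `agent_batches` holds one dict per agent i with that agent's local
    experience over an episode: states s_t, actions a_t, returns R_t,
    and its type vector lambda_i (repeated for each step).
    """
    per_agent_losses = []
    for batch in agent_batches:                                  # i = 1..N
        logits = policy(batch["states"], batch["agent_type"])    # pi_theta(. | s_t, lambda_i)
        log_probs = torch.distributions.Categorical(logits=logits).log_prob(batch["actions"])
        per_agent_losses.append(-(log_probs * batch["returns"]).sum())  # sum over t = 0..T
    loss = torch.stack(per_agent_losses).mean()                  # 1/N average across agents
    optimizer.zero_grad()
    loss.backward()                                              # yields G_theta (up to sign)
    optimizer.step()
```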

Slide 16

RL-BASED INVESTOR LEARNS TO STRATEGICALLY TIME TRADES
Quantity target: 75% buy, 25% sell; w = weight for PnL in the reward function.
[Plots of trading behavior for w = 0, 0.25, 0.75, 1.00]
- Low w: matches the quantity targets, but with strategic timing of trades
- High w: ignores the targets; opportunistic buying/selling to maximize PnL

Slide 17

RL-BASED MARKET MAKER LEARNS TO SKEW PRICING
- Market makers learn to sell at a discount (skew pricing) when they have a large inventory, to get rid of it faster
- The shared policy learns to vary the intensity of price skewing depending on the agent's risk aversion and connectivity
[Plot: price skew vs. inventory (more favorable sell price ↔ more favorable buy price), with curves for low to high connectivity]

Slide 18

USING RL TO CALIBRATE

Slide 19

WHY IS THIS HARD?
[Diagram: calibration algo ↔ agent composition]
- Agent behaviors depend on the types of agents present in the system
- Slow - agents have to learn and reach equilibrium for each agent composition

Slide 20

SIMULTANEOUSLY LEARN AGENT COMPOSITION AND BEHAVIORS
- RL-based calibrator picks different actions - agent compositions - and receives a reward based on the calibration loss
- Two-time scale - the calibrator learns at a slower pace than the agents in the system, so the agents are able to adapt their policy to the new composition
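A schematic of the two-timescale loop as a hypothetical Python sketch: the calibrator (slow, outer loop) proposes an agent composition, the agents' shared policy adapts over several inner updates, and the calibrator is rewarded with the negative calibration loss. All object names and methods here are assumptions, not the PHANTOM API.

```python
def two_timescale_calibration(calibrator, shared_policy, simulator,
                              real_stats, calibration_loss,
                              outer_steps=200, inner_steps=20):
    """Outer loop: calibrator picks agent compositions (slow timescale).
    Inner loop: agents' shared policy adapts to each composition (fast timescale)."""
    for _ in range(outer_steps):
        composition = calibrator.sample_composition()            # calibrator's action
        for _ in range(inner_steps):
            episodes = simulator.rollout(shared_policy, composition)
            shared_policy.update(episodes)                       # agents adapt to the new composition
        sim_stats = simulator.summary_stats(shared_policy, composition)
        loss = calibration_loss(sim_stats, real_stats)           # e.g. mismatch of market-share / flow-vs-price curves
        calibrator.update(composition, reward=-loss)             # slow update from the calibration reward
```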

Slide 21

OUR APPROACH OUTPERFORMS BAYESIAN OPTIMIZATION
Fit market share and flow vs. price curves using real foreign exchange market data from [1].
[1] Barzykin, A., Bergault, P., and Guéant, O. (2021b). Market making by an FX dealer: tiers, pricing ladders and hedging rates for optimal risk control. http://arxiv.org/abs/2112.02269

Slide 22

PHANTOM
- Framework to build RL-driven multi-agent simulations
- Built to scale using Ray/RLlib
- Interested in using? Contact: [email protected]
PUBLICATIONS:
- Calibration of Shared Equilibria in General Sum Partially Observable Markov Games, NeurIPS 2020. Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso. https://arxiv.org/abs/2006.13085
- Towards a fully RL-based Market Simulator, ICAIF 2021. Leo Ardon, Nelson Vadori, Thomas Spooner, Mengda Xu, Jared Vann, Sumitra Ganesh. https://arxiv.org/abs/2110.06829

Slide 23

THANKS!