Reinforcement learning-based simulation of markets

REINFORCEMENT LEARNING BASED SIMULATION OF MARKETS
Sumitra Ganesh, Ph.D.
Research Director - Multi-agent Learning & Simulation
JPMorgan AI Research

REINFORCEMENT LEARNING - IN THEORY

REINFORCEMENT LEARNING - IN PRACTICE

WHAT THIS TALK IS ABOUT
If your environment is composed of multiple agents …
How do you model the system of interacting agents?

EXAMPLE: SIMPLE MARKET IN PRINTER PAPER
Shops have to decide how to price and manage inventory
Customers have to decide whether to buy or not, and from which shop
[Figure: shops (X, Y, Z) and customers (Alice, Bob, Carl)]

NOT ALL AGENTS ARE THE SAME!
Customers could have different preferences on cost vs. delivery time
Shops could have different costs / constraints for storage
Agents could have varying levels of connectivity and hence access to information
Heterogeneity + interactions -> complex real-world dynamics
[Figure: shops (X, Y, Z) and customers (Alice, Bob, Carl)]

WHAT WOULD HAPPEN IF SHOP “Z” CHANGED ITS PRICING?
Depends on how Alice and Bob change their buying behavior in response
Depends on how X and Y change their pricing behavior in response
Need to simulate how other agents would respond to evaluate impact of change on sales, revenue, market share
[Figure: shops (X, Y, Z) and customers (Alice, Bob, Carl)]

OUR APPROACH: USE RL TO BUILD AN AGENT-BASED SIMULATOR
1. How do agents differ from each other? Agents have different reward functions and connectivity
2. How do agents behave? How do these behaviors evolve? Use RL to automatically learn agent behaviors/policies
3. What types of agents are present in the system? Use RL to calibrate the agent composition using real data

USING REWARD FUNCTIONS TO SPECIFY AGENT TYPES

HOW DO AGENTS DIFFER FROM EACH OTHER?
Connectivity: Agents have different levels of connectivity to other agents
  Constrains agent actions and amount of information
  Specify a deterministic / stochastic network between agents
Reward: Agents make different tradeoffs due to preferences / constraints
  Assumption: agents are rational and will try to maximize their reward
  Shop: $w_1 \cdot \text{Sales Revenue} - w_2 \cdot \text{Storage Cost} - w_3 \cdot \text{Shortfall}$
  Customer: $-w_1 \cdot \text{Cost} - w_2 \cdot \text{Delivery Time}$
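
A minimal Python sketch of how the weights in these reward functions could define agent types. The `Offer` class, the `best_offer` helper, and all numeric values are hypothetical and chosen only for illustration; the slide gives the functional form of the rewards, not these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """What one shop offers a customer (hypothetical structure)."""
    shop: str
    cost: float           # price the customer would pay
    delivery_days: float  # delivery time


def shop_reward(sales_revenue, storage_cost, shortfall, w1, w2, w3):
    # Shop: w1 * Sales Revenue - w2 * Storage Cost - w3 * Shortfall
    return w1 * sales_revenue - w2 * storage_cost - w3 * shortfall


def customer_reward(offer, w1, w2):
    # Customer: -w1 * Cost - w2 * Delivery Time
    return -w1 * offer.cost - w2 * offer.delivery_days


def best_offer(offers, w1, w2):
    # A rational customer picks the offer that maximizes its own reward.
    return max(offers, key=lambda o: customer_reward(o, w1, w2))


offers = [
    Offer("X", cost=10.0, delivery_days=5.0),  # cheap but slow
    Offer("Y", cost=12.0, delivery_days=1.0),  # pricier but fast
]

# Two customer types that differ only in their reward weights.
price_sensitive = best_offer(offers, w1=1.0, w2=0.1)
time_sensitive = best_offer(offers, w1=1.0, w2=1.0)
print(price_sensitive.shop, time_sensitive.shop)
```

With these illustrative numbers the two customers face the same offers but choose different shops, which is the sense in which the reward weights act as the agent type.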

EXAMPLE: FINANCIAL MARKET IN A SINGLE ASSET
Market makers have to decide how to price and manage inventory
Investors have to decide whether to buy / sell / do nothing
Key differences:
  All agents can both buy and sell; inventory can be negative
  Value of inventory can fluctuate with time - inventory P&L and risk
[Figure labels: Reference price (exchange); Market makers; Investors; Alice, Bob, Carl; X, Y, Z]

MARKET MAKER AND INVESTOR TYPES
Market maker reward terms: Sales revenue, Inventory P&L, Risk, Market share target
Investor reward terms: Execution cost, Inventory P&L, Risk, Quantity target
[Figure labels: Reference price (exchange); Market makers; Investors; Alice, Bob, Carl; X, Y, Z]

USING RL TO LEARN AGENT BEHAVIORS

LEARN SEPARATE POLICIES FOR EACH AGENT?
Agent experience -> agent policy:
  $(s_t^{(1)}, a_t^{(1)}, R_t^{(1)}) \rightarrow \pi_1$
  $(s_t^{(2)}, a_t^{(2)}, R_t^{(2)}) \rightarrow \pi_2$
  $(s_t^{(3)}, a_t^{(3)}, R_t^{(3)}) \rightarrow \pi_3$
Hard to scale
Unstable
Doesn’t leverage the commonality across agents

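SHARED POLICY LEARNING
$\pi_\theta(a \mid s, \lambda)$
Learn a shared policy across agents that takes the agent type $\lambda$ as input
Policy is trained with the local experiences of all the agents simultaneously using a policy gradient method
$G_\theta = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{T} \nabla_\theta \ln \pi_\theta(a_t^{(i)} \mid s_t^{(i)}, \lambda_i) \cdot R_t^{(i)}$
The inner sum over $t$ is the gradient for the $i$-th agent; $\frac{1}{N} \sum_{i=1}^{N}$ averages across agents.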
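
The estimator $G_\theta$ can be sketched in plain Python for a small linear-softmax policy. This is an illustration under stated assumptions, not the authors' implementation: the linear-softmax parameterization, the concatenation of state and agent type as policy input, and every name below are hypothetical. $R_t^{(i)}$ is taken as given for each step.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_policy(theta, state, agent_type, action):
    """Gradient of ln pi_theta(action | state, lambda) w.r.t. theta.

    Assumed policy: softmax over theta[k] . x, where x is the local state
    concatenated with the agent type lambda.
    """
    x = list(state) + list(agent_type)
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in theta]
    probs = softmax(logits)
    # d/d theta[k] ln pi(a | x) = (1[k == a] - pi(k | x)) * x
    return [[((1.0 if k == action else 0.0) - probs[k]) * xi for xi in x]
            for k in range(len(theta))]


def shared_policy_gradient(theta, trajectories):
    """trajectories: one (agent_type, [(state, action, reward), ...]) per agent."""
    n_actions, n_features = len(theta), len(theta[0])
    total = [[0.0] * n_features for _ in range(n_actions)]
    for agent_type, steps in trajectories:   # sum over agents i
        for state, action, reward in steps:  # sum over time steps t
            g = grad_log_policy(theta, state, agent_type, action)
            for k in range(n_actions):
                for j in range(n_features):
                    total[k][j] += g[k][j] * reward
    n_agents = len(trajectories)
    return [[v / n_agents for v in row] for row in total]  # average across agents


# Two agents of different types share the same parameters theta.
theta = [[0.0, 0.0], [0.0, 0.0]]  # 2 actions, 2 features (state, type)
trajectories = [
    ((1.0,), [((1.0,), 0, 2.0)]),  # agent type 1.0: took action 0, reward 2
    ((0.0,), [((1.0,), 1, 1.0)]),  # agent type 0.0: took action 1, reward 1
]
G = shared_policy_gradient(theta, trajectories)
```

Because every agent's experience contributes to the same `theta`, adding an agent adds data rather than a new policy to train.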

RL-BASED INVESTOR LEARNS TO STRATEGICALLY TIME TRADES
Quantity target: 75% buy, 25% sell; w = weight for PnL in reward function
[Figure: panels for w=0, w=0.25, w=0.75, w=1.00]
Figure annotations:
  Matches targets but strategic timing of trades
  Ignores targets; opportunistic buying/selling to maximize PnL

RL-BASED MARKET MAKER LEARNS TO SKEW PRICING
Market makers learn to sell at a discount (skew pricing) when they have a large inventory, to get rid of it faster
The shared policy learns to vary the intensity of price skewing depending on the agent’s risk aversion and connectivity
[Figure labels: More favorable sell price; More favorable buy price; Inventory; Increasing connectivity; Connectivity: Low, High]
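
To make "skew pricing" concrete, here is a hand-written rule in Python. It is not the learned policy from the talk: the linear skew, the `skew_per_unit` parameter, and the numbers are assumptions used only to show what shifting quotes with inventory looks like.

```python
def skewed_quotes(reference_price, half_spread, inventory, skew_per_unit):
    """Return (bid, ask) shifted against the market maker's inventory.

    Hypothetical linear rule: a long inventory lowers both quotes, so the
    maker sells at a discount and buys less eagerly; a short inventory
    raises both quotes.
    """
    skew = skew_per_unit * inventory
    bid = reference_price - half_spread - skew  # price at which the maker buys
    ask = reference_price + half_spread - skew  # price at which the maker sells
    return bid, ask


flat = skewed_quotes(100.0, 0.5, inventory=0.0, skew_per_unit=0.01)
long_inventory = skewed_quotes(100.0, 0.5, inventory=10.0, skew_per_unit=0.01)
print(flat, long_inventory)
```

In this sketch `skew_per_unit` plays the role of the skew intensity that, per the slide, the shared policy learns to vary with risk aversion and connectivity.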

USING RL TO CALIBRATE

WHY IS THIS HARD?
Agent behaviors depend on the types of agent present in the system
Slow - agents have to learn and reach equilibrium for each agent composition
[Figure labels: Agent composition; Calibration algo]

SIMULTANEOUSLY LEARN AGENT COMPOSITION AND BEHAVIORS
RL-based calibrator picks different actions - agent compositions
Receives reward based on calibration loss
Two-time scale - calibrator learns at a slower pace than agents in the system, so agents are able to adapt their policy to the new composition
[Figure label: Agent composition]
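
The two-time-scale structure can be illustrated with a toy loop in Python. This sketch deliberately replaces both the RL agents and the RL-based calibrator with simple deterministic updates, so it shows only the fast/slow interplay. The `equilibrium_behavior` map, the learning rates, and the scalar "composition" are all hypothetical.

```python
def equilibrium_behavior(composition):
    # Hypothetical: the behavior agents would settle on for a fixed composition.
    return 2.0 * composition


def run_two_time_scale(target, steps=2000, fast_lr=0.5, slow_lr=0.01):
    """Jointly adapt agent behavior (fast) and agent composition (slow)."""
    composition, behavior = 0.1, 0.0
    for _ in range(steps):
        # Fast time scale: agents adapt to the current composition.
        behavior += fast_lr * (equilibrium_behavior(composition) - behavior)
        # Slow time scale: the calibrator nudges the composition to shrink the
        # calibration error (a stand-in for the RL calibrator's update).
        composition += slow_lr * (target - behavior)
        composition = min(1.0, max(0.0, composition))  # keep it a valid fraction
    calibration_loss = (target - behavior) ** 2
    return composition, behavior, calibration_loss


composition, behavior, loss = run_two_time_scale(target=1.0)
print(composition, behavior, loss)
```

Because `slow_lr` is much smaller than `fast_lr`, the behavior stays close to its equilibrium for the current composition at every step, so the calibrator never has to wait for a full re-training between composition changes.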

OUR APPROACH OUTPERFORMS BAYESIAN OPTIMIZATION
Fit market share and flow vs. price curves using real foreign exchange market data from [1].
[1] Barzykin, A., Bergault, P., and Guéant, O. (2021b). Market making by an FX dealer: tiers, pricing ladders and hedging rates for optimal risk control. http://arxiv.org/abs/2112.02269

PHANTOM
Framework to build RL-driven multi-agent simulations
Built to scale using Ray/RLlib
Interested in using? Contact: [email protected]
PUBLICATIONS:
Calibration of Shared Equilibria in General Sum Partially Observable Markov Games, NeurIPS 2020, Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso. https://arxiv.org/abs/2006.13085
Towards a fully RL-based Market Simulator, ICAIF 2021, Leo Ardon, Nelson Vadori, Thomas Spooner, Mengda Xu, Jared Vann, Sumitra Ganesh. https://arxiv.org/abs/2110.06829

THANKS!
