
Reinforcement learning-based simulation of markets

Anyscale
April 05, 2022

Agent-based simulation of markets provides a useful tool for policy optimization, counterfactual analysis, and market mechanism design. In this talk, we will present our work in modeling complex economic systems as a network of heterogeneous, utility maximizing agents with partial observability. We demonstrate how reinforcement learning (RL) can be used to solve two primary challenges in agent-based modeling — finding the equilibrium with multiple strategic agents and calibrating the model using real data. These techniques have been useful in enabling the practical application of agent-based modeling.

Transcript

  1. REINFORCEMENT LEARNING BASED SIMULATION OF MARKETS
     Sumitra Ganesh, Ph.D., Research Director - Multi-agent Learning & Simulation, JPMorgan AI Research
  2. WHAT THIS TALK IS ABOUT
     If your environment is composed of multiple agents, how do you model the system of interacting agents?
  3. EXAMPLE: SIMPLE MARKET IN PRINTER PAPER
     Shops have to decide how to price and manage inventory. Customers have to decide whether to buy or not, and from which shop.
     [Diagram: shops X, Y, Z connected to customers Alice, Bob, Carl]
  4. NOT ALL AGENTS ARE THE SAME!
     Customers could have different preferences on cost vs. delivery time. Shops could have different costs / constraints for storage. Agents could have varying levels of connectivity and hence access to information.
     Heterogeneity + interactions -> complex real-world dynamics
  5. WHAT WOULD HAPPEN IF SHOP "Z" CHANGED ITS PRICING?
     It depends on how Alice and Bob change their buying behavior in response, and on how X and Y change their pricing behavior in response. We need to simulate how the other agents would respond in order to evaluate the impact of the change on sales, revenue, and market share.
  6. OUR APPROACH: USE RL TO BUILD AN AGENT-BASED SIMULATOR
     1) How do agents differ from each other? Agents have different reward functions and connectivity.
     2) How do agents behave, and how do these behaviors evolve? Use RL to automatically learn agent behaviors/policies.
     3) What types of agents are present in the system? Use RL to calibrate the agent composition using real data.
  7. HOW DO AGENTS DIFFER FROM EACH OTHER?
     Connectivity: agents have different levels of connectivity to other agents, which constrains their actions and the amount of information they see. We specify a deterministic / stochastic network between agents.
     Reward: agents make different tradeoffs due to preferences / constraints. Assumption: agents are rational and try to maximize their reward.
     Shop reward: w1 · Sales Revenue − w2 · Storage Cost − w3 · Shortfall
     Customer reward: −w1 · Cost − w2 · Delivery Time
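     As a rough illustration of these per-type rewards (a minimal sketch, not code from the talk or the PHANTOM framework; the class and field names are assumptions), each agent type combines the same step-level quantities into a scalar reward with its own weights:

```python
from dataclasses import dataclass

# Illustrative sketch only: each agent type turns the quantities it cares about
# into a scalar reward using its own weight vector.

@dataclass
class ShopStep:
    sales_revenue: float
    storage_cost: float
    shortfall: float

@dataclass
class CustomerStep:
    cost: float
    delivery_time: float

def shop_reward(step: ShopStep, w1: float, w2: float, w3: float) -> float:
    # w1 * Sales Revenue - w2 * Storage Cost - w3 * Shortfall
    return w1 * step.sales_revenue - w2 * step.storage_cost - w3 * step.shortfall

def customer_reward(step: CustomerStep, w1: float, w2: float) -> float:
    # -w1 * Cost - w2 * Delivery Time
    return -w1 * step.cost - w2 * step.delivery_time

# Heterogeneity comes from each agent's weight vector, e.g. a price-sensitive
# customer vs. one that cares mostly about delivery time.
price_sensitive = customer_reward(CustomerStep(cost=10.0, delivery_time=2.0), w1=1.0, w2=0.1)
impatient = customer_reward(CustomerStep(cost=10.0, delivery_time=2.0), w1=0.1, w2=1.0)
```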
  8. EXAMPLE: FINANCIAL MARKET IN A SINGLE ASSET
     Market makers have to decide how to price and manage inventory. Investors have to decide whether to buy, sell, or do nothing.
     Key differences: all agents can both buy and sell, so inventory can be negative; and the value of inventory can fluctuate with time, giving rise to inventory P&L and risk.
     [Diagram: market makers X, Y, Z quoting to investors Alice, Bob, Carl around a reference price from an exchange]
  9. MARKET MAKER AND INVESTOR TYPES
     Market maker reward terms: sales revenue, inventory P&L, risk, market share target.
     Investor reward terms: execution cost, inventory P&L, risk, quantity target.
  10. LEARN SEPARATE POLICIES FOR EACH AGENT?
     One option is to give each agent i its own policy \pi_i, trained only on that agent's own experience (s_t^{(i)}, a_t^{(i)}, R_t^{(i)}). This is hard to scale, unstable, and doesn't leverage the commonality across agents.
  11. SHARED POLICY LEARNING
     Instead, learn a shared policy \pi_\theta(a \mid s, \lambda) across agents that takes the agent type \lambda as input. The policy is trained with the local experiences of all the agents simultaneously, using a policy gradient method:
     G_\theta = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{T} \nabla_\theta \ln \pi_\theta(a_t^{(i)} \mid s_t^{(i)}, \lambda_i) \cdot R_t^{(i)}
     where the inner sum is the policy gradient for the i-th agent and the outer sum averages across the N agents.
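     A minimal PyTorch sketch of this shared-policy update (assuming discrete actions and an agent type provided as a feature vector; the names SharedPolicy and shared_policy_loss are illustrative, not the authors' implementation):

```python
import torch
import torch.nn as nn

# Sketch of shared policy learning: one network pi_theta(a | s, lambda) serves all
# agents, with the agent type lambda concatenated to the observation. The loss below
# is the negative of G_theta from the slide (REINFORCE-style), averaged over agents.

class SharedPolicy(nn.Module):
    def __init__(self, obs_dim: int, type_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + type_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, agent_type):
        logits = self.net(torch.cat([obs, agent_type], dim=-1))
        return torch.distributions.Categorical(logits=logits)

def shared_policy_loss(policy, trajectories):
    """trajectories: one entry per agent i, each a tuple of
    (obs [T, obs_dim], agent_type [T, type_dim], actions [T], returns [T])."""
    per_agent = []
    for obs, agent_type, actions, returns in trajectories:
        dist = policy(obs, agent_type)
        # sum over t of log pi_theta(a_t^(i) | s_t^(i), lambda_i) * R_t^(i)
        per_agent.append((dist.log_prob(actions) * returns).sum())
    # average across the N agents; minimizing this loss follows the gradient G_theta
    return -torch.stack(per_agent).mean()
```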
  12. RL-BASED INVESTOR LEARNS TO STRATEGICALLY TIME TRADES
     Setup: quantity target of 75% buy / 25% sell; w = weight on PnL in the investor's reward function, with results shown for w = 0, 0.25, 0.75, 1.00.
     At low w the investor matches its quantity targets while strategically timing its trades; at high w it ignores the targets and buys/sells opportunistically to maximize PnL.
  13. RL-BASED MARKET MAKER LEARNS TO SKEW PRICING
     Market makers learn to sell at a discount (skew their pricing) when they have a large inventory, to get rid of it faster. The shared policy learns to vary the intensity of price skewing depending on the agent's risk aversion and connectivity.
     [Plot: buy/sell price skew vs. inventory, for low to high connectivity]
  14. WHY IS THIS HARD?
     Agent behaviors depend on the types of agents present in the system, i.e. the agent composition. This makes the calibration algorithm slow: agents have to learn and reach equilibrium again for every candidate agent composition.
  15. SIMULTANEOUSLY LEARN AGENT COMPOSITION AND BEHAVIORS
     An RL-based calibrator picks different actions (agent compositions) and receives a reward based on the calibration loss. Learning is two-time-scale: the calibrator learns at a slower pace than the agents in the system, so the agents are able to adapt their policies to each new composition.
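     A toy sketch of the two-time-scale structure (the simulator, RL training step, and loss below are hypothetical stubs, and a simple accept/reject update stands in for the RL-based calibrator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins so the sketch runs; in practice these would be the RL
# training step for the agents, the market simulator, and a loss against real data.
def train_shared_policy_step(policy_state, composition):
    return policy_state  # placeholder: agents adapt to the current composition

def simulate(policy_state, composition):
    return composition   # placeholder: run the market and return summary statistics

def calibration_loss(stats, target=np.array([0.5, 0.3, 0.2])):
    return float(np.sum((stats - target) ** 2))  # placeholder: distance to real data

def calibrate(init_composition, n_outer=200, inner_steps=20, step=0.05):
    """Two-time-scale sketch: the composition (slow) is updated only after the
    agents (fast) have had several learning steps to adapt to it."""
    composition = np.asarray(init_composition, dtype=float)
    policy_state = None
    for _ in range(n_outer):
        for _ in range(inner_steps):                     # fast time scale: agents learn
            policy_state = train_shared_policy_step(policy_state, composition)
        candidate = composition + step * rng.normal(size=composition.shape)
        candidate = np.clip(candidate, 1e-6, None)
        candidate /= candidate.sum()                     # keep a valid mixture over agent types
        # slow time scale: keep the candidate composition if it lowers the calibration loss
        if calibration_loss(simulate(policy_state, candidate)) < \
           calibration_loss(simulate(policy_state, composition)):
            composition = candidate
    return composition

print(calibrate([1 / 3, 1 / 3, 1 / 3]))
```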
  16. OUR APPROACH OUTPERFORMS BAYESIAN OPTIMIZATION
     We fit market share and flow vs. price curves using real foreign exchange market data from [1].
     [1] Barzykin, A., Bergault, P., and Guéant, O. (2021). Market making by an FX dealer: tiers, pricing ladders and hedging rates for optimal risk control. http://arxiv.org/abs/2112.02269
  17. PHANTOM
     A framework to build RL-driven multi-agent simulations, built to scale using Ray/RLlib. Interested in using it? Contact: [email protected]
     PUBLICATIONS:
     Calibration of Shared Equilibria in General Sum Partially Observable Markov Games, NeurIPS 2020. Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso. https://arxiv.org/abs/2006.13085
     Towards a Fully RL-based Market Simulator, ICAIF 2021. Leo Ardon, Nelson Vadori, Thomas Spooner, Mengda Xu, Jared Vann, Sumitra Ganesh. https://arxiv.org/abs/2110.06829
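     The deck does not show PHANTOM's API, but as a rough sketch of the RLlib mechanism it builds on, a shared policy across heterogeneous agents can be configured by mapping every agent ID to a single policy entry. The environment name, spaces, and exact config keys below are assumptions and vary across Ray versions:

```python
from ray import tune

# Rough RLlib (not PHANTOM) sketch: all agents (shops/customers or market makers/
# investors) map to one "shared_policy", so their combined experience trains a single
# network, as in the shared policy learning slide. "market_env" is a hypothetical
# multi-agent environment assumed to be registered elsewhere; the agent's type can be
# exposed to the policy through its observation.

config = {
    "env": "market_env",
    "multiagent": {
        # (policy_cls, obs_space, act_space, config); None falls back to RLlib defaults
        "policies": {"shared_policy": (None, None, None, {})},
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: "shared_policy",
    },
    "num_workers": 2,
}

tune.run("PPO", config=config, stop={"training_iteration": 10})
```

     With this setup, heterogeneity comes from each agent's observations (including its type) and rewards rather than from separate networks, which is what allows the approach to scale with the number of agents.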