DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker

PWL SF Talk

Tom Santero

March 30, 2017

Transcript

  1. DeepStack: Expert-Level Artificial Intelligence
    in Heads-Up No-Limit Poker
    @tsantero
    PWL SF March 30, 2017


  2. Texas Hold’em
    "truly a 'sawdust joint,' with … oiled sawdust covering the floors"
    originated in Texas
    one of many variations of the game of poker
    most widely played card game in the world

  3. Texas Hold’em
    originated in Texas
    one of many variations of the game of poker
    most widely played card game in the world

  4. [diagram: a ten-handed table, seats numbered 1–10]
    gameplay moves clockwise

  5. [diagram: dealer button (D), small blind (SB), big blind (BB)]
    Blinds: 1/2

  6. flop, turn, river
    hole cards ∀ players
    player actions: check, bet, raise, fold

  7. “The honor’s in the dollar, kid.”
    —Seth Davis, Boiler Room (2000)
    Keeping Score in Hold’em
    Chip Stack / Cost of Big Blind = # BBs
    BB / 1000 = mBB (milli-big-blind)
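
    As a quick worked conversion of the scoring above (the dollar amounts below are made up, not from the talk):

      # Hypothetical worked example: convert a cash result into
      # big blinds (BB) and milli-big-blinds (mBB).
      big_blind = 2.00              # dollars per big blind
      winnings = 30.00              # dollars won over a session
      bbs = winnings / big_blind    # 15.0 BB
      mbbs = bbs * 1000             # 15000 mBB
      print(bbs, mbbs)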

  8. For example…..


  9. tsantero’s hand: 6♠ 5♠ (cowboy’s hand unseen)
    Preflop:
    I raise to $20
    cowboy raises to $45
    I call
    all other players fold
    Odds of flopping:
    Trips 29 to 1
    Straight 75 to 1
    Flush 118 to 1

  10. tsantero’s hand: 6♠ 5♠
    flop: 3 2 4
    pot size: $90
    I bet $65
    cowboy calls

  11. tsantero’s hand: 6♠ 5♠
    board: 3 2 4
    pot size: $210
    I shove all in ($420)
    cowboy calls
    turn: J

  12. tsantero’s hand: 6♠ 5♠
    board: 3 2 4 J
    cowboy’s hand: 3 3
    4/38 ≈ 10%

  13. tsantero’s hand: 6♠ 5♠
    board: 3 2 4 J
    river: J
    cowboy’s hand: 3 3 (a full house on the river)


  14. Imperfect information games require more
    complex reasoning than similarly sized perfect
    information games.
    The correct decision at a particular moment
    depends upon the probability distribution over
    private information that the opponent holds,
    which is revealed through their past actions.



  15. Heads-Up No-Limit Texas Hold’em:
    total number of decision points: 10^160 (10^170 in Go)
    past approaches include pre-solving the entire game or
    abstraction via Counterfactual Regret Minimization (CFR)
    DeepStack: attempts to solve for a Nash Equilibrium

  16. It’s time for some
    game theory.


  17. Nash Equilibrium
    Let (S, f) represent a game with n players
    S_i = strategy set for player i, such that
    S = S_1 × S_2 × S_3 × … × S_n is the set of strategy profiles
    f(x) = (f_1(x), f_2(x), …, f_n(x)) is the payoff function
    evaluated at x ∈ S

  18. Nash Equilibrium contd.
    Let x_i denote a strategy of player i and x_-i the strategies of all other players
    If each player i ∈ {1, …, n} chooses strategy x_i, then player i’s payoff is f_i(x)
    A profile x* ∈ S is a Nash Equilibrium when
    ∀i, x_i ∈ S_i : f_i(x_i*, x_-i*) ≥ f_i(x_i, x_-i*)

  19. Nash Equilibrium contd.
    ∀i, x_i ∈ S_i : f_i(x_i*, x_-i*) ≥ f_i(x_i, x_-i*)
    the strategy profile is a Nash Equilibrium so long as no
    unilateral deviation by any single player
    improves that player’s payoff

  20.                    Prisoner 2
                     CONFESS       LIE
    Prisoner 1
       CONFESS       -6, -6       0, -10
       LIE           -10, 0       -2, -2
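
    Connecting the matrix to the definition above, a small brute-force check (my own sketch, not from the deck) confirms that mutual confession is the only pure-strategy Nash Equilibrium:

      # Brute-force check of pure-strategy Nash equilibria for the
      # prisoner's dilemma payoffs shown on the slide.
      from itertools import product

      ACTIONS = ["CONFESS", "LIE"]

      # PAYOFFS[(a1, a2)] = (payoff to Prisoner 1, payoff to Prisoner 2)
      PAYOFFS = {
          ("CONFESS", "CONFESS"): (-6, -6),
          ("CONFESS", "LIE"):     (0, -10),
          ("LIE",     "CONFESS"): (-10, 0),
          ("LIE",     "LIE"):     (-2, -2),
      }

      def is_nash(a1, a2):
          # Neither player may gain by unilaterally switching actions.
          p1, p2 = PAYOFFS[(a1, a2)]
          best1 = max(PAYOFFS[(alt, a2)][0] for alt in ACTIONS)
          best2 = max(PAYOFFS[(a1, alt)][1] for alt in ACTIONS)
          return p1 >= best1 and p2 >= best2

      for profile in product(ACTIONS, repeat=2):
          print(profile, "Nash" if is_nash(*profile) else "not Nash")
      # Only (CONFESS, CONFESS) survives: mutual confession is the equilibrium,
      # even though (LIE, LIE) would leave both players better off.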

  21. Oskar Morgenstern and John von Neumann


  22. Oskar Morgenstern and John von Neumann
    a mixed strategy Nash Equilibrium will exist in any
    zero-sum game involving a finite set of actions
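
    As a concrete (and entirely toy) illustration, the mixed equilibrium of a 2x2 zero-sum game can be computed from the indifference conditions; matching pennies is used here, not poker:

      # Mixed-strategy equilibrium of a 2x2 zero-sum game via indifference:
      # each player mixes so the other is indifferent between their two actions.
      def solve_2x2_zero_sum(a, b, c, d):
          # Row player's payoff matrix [[a, b], [c, d]]; the column player gets the negation.
          # Assumes no pure-strategy saddle point, so the equilibrium mix is interior.
          p = (d - c) / ((a - c) - (b - d))   # P(row plays action 0)
          q = (d - b) / ((a - b) - (c - d))   # P(col plays action 0)
          value = p * (q * a + (1 - q) * b) + (1 - p) * (q * c + (1 - q) * d)
          return p, q, value

      # Matching pennies: row wins 1 on a match, loses 1 otherwise.
      print(solve_2x2_zero_sum(1, -1, -1, 1))   # (0.5, 0.5, 0.0)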

  23. a mixed strategy Nash Equilibrium will exist in any
    zero-sum game involving a finite set of actions
    decision to bet/call, raise, or fold
    will vary depending on the situation
    [example: K K in hand on a 5♠ K J board]

  24. a mixed strategy Nash Equilibrium will exist in any
    zero-sum game involving a finite set of actions
    decision to bet/call, raise, or fold
    will vary depending on the situation
    [example: K K in hand on a 5♦ A♣ J♦ board]

  25. Exploitable vs Non-exploitable play
    a player’s strategy:
    a probability distribution over their range of public actions

    determining a player’s range:
    a function of which cards they’ll play, and for what price
    expected utility:
    an estimation of the payout function

  26. Exploitable vs Non-exploitable play
    a Nash Equilibrium strategy:
    seeks to maximize expected utility against a
    best-response opponent strategy
    exploitability:
    the difference between the expected utility of a best-response
    strategy and the expected utility under a NE
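
    As a toy illustration of exploitability (a 2x2 zero-sum game, not HUNL, and entirely my own example):

      # How much does a best-responding opponent gain when the row player
      # drifts away from the equilibrium mix? That gap is the exploitability.
      def value_vs_best_response(p, a, b, c, d):
          # Row player mixes action 0 with probability p; payoff matrix [[a, b], [c, d]].
          # The column player picks whichever column minimizes the row player's value.
          return min(p * a + (1 - p) * c, p * b + (1 - p) * d)

      # Matching pennies: the equilibrium mix is 50/50 and the game value is 0.
      equilibrium_value = value_vs_best_response(0.5, 1, -1, -1, 1)   #  0.0
      biased_value      = value_vs_best_response(0.8, 1, -1, -1, 1)   # ~-0.6
      print(equilibrium_value - biased_value)                         # ~0.6 = exploitability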

  27. DeepStack: limited exploitability via heuristic search
    local strategy:
    computation over the current public state
    depth-limited lookahead:
    a learned value function replaces search below the depth limit
    restricted set of lookahead actions:
    only a small menu of bets and raises is considered at each decision point

  28. Continual Resolving
    initial state:
    strategy is determined by our range of play
    next actions:
    discard the previous strategy and determine the action
    using the CFR algorithm
    input: our stored range and a vector of opponent
    counterfactual values
    update our range and the opponent counterfactual values
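
    A heavily simplified sketch of this loop (the helper functions are hypothetical placeholders, not DeepStack’s actual API):

      # Heavily simplified sketch of continual re-solving; run_cfr and
      # sample_action are hypothetical placeholders passed in by the caller.
      def resolve_and_act(public_state, our_range, opponent_cfvs,
                          run_cfr, sample_action):
          # Discard any previous strategy: re-solve the current public state
          # with CFR, constrained by our stored range and the opponent's
          # counterfactual values.
          strategy, cfvs_per_action = run_cfr(public_state, our_range, opponent_cfvs)

          # strategy[hand][action] = probability of taking `action` with `hand`
          action = sample_action(strategy, our_range)

          # Bayes' rule: hands that take the chosen action more often become
          # relatively more likely in our updated range.
          new_range = {h: p * strategy[h][action] for h, p in our_range.items()}
          total = sum(new_range.values()) or 1.0
          new_range = {h: p / total for h, p in new_range.items()}

          # Carry forward the opponent counterfactual values computed for the
          # chosen action, ready for the next re-solve.
          return action, new_range, cfvs_per_action[action]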

  29. Rules for Resolving
    our action:
    replace opponent counterfactual values with those
    computed in the re-solved strategy for our chosen action
    update our own range using the computed strategy and Bayes’ rule
    chance action:
    replace the opponent counterfactual values with those computed
    for this chance action from the last resolve
    update our own range by zeroing out impossible hands given
    the new public state

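    The chance-action rule can be sketched the same way (again a toy representation, with hands as tuples of card strings):

      # Toy sketch of the chance-action update: when a new public card appears,
      # zero out hands that conflict with it and renormalize our range.
      def update_range_on_chance(our_range, new_public_cards):
          updated = {hand: p for hand, p in our_range.items()
                     if not set(hand) & set(new_public_cards)}
          total = sum(updated.values()) or 1.0
          return {hand: p / total for hand, p in updated.items()}

      # usage with hypothetical card strings
      our_range = {("As", "Kd"): 0.5, ("Jh", "Jc"): 0.5}
      print(update_range_on_chance(our_range, ["Jc"]))   # only ("As", "Kd") remains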

  30. Limited Depth Lookahead “intuition”
    neural net:
    learned counterfactual function that approximates the
    resulting values if the public state were to be solved
    with the current iteration’s ranges
    input:
    public state = players’ ranges, pot size, public cards
    output:
    vector of counterfactual values for holding any hand
    for both players

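    A rough sketch of that value-function interface in PyTorch (layer sizes are illustrative and the public-card encoding is omitted; this is not the paper’s exact architecture):

      import torch
      import torch.nn as nn

      N_HANDS = 1326  # number of distinct two-card private hands

      class CFVNet(nn.Module):
          def __init__(self, hidden=500):
              super().__init__()
              # inputs: both players' ranges plus a normalized pot-size feature
              self.body = nn.Sequential(
                  nn.Linear(2 * N_HANDS + 1, hidden), nn.PReLU(),
                  nn.Linear(hidden, hidden), nn.PReLU(),
                  # outputs: a counterfactual value per hand, for each player
                  nn.Linear(hidden, 2 * N_HANDS),
              )

          def forward(self, range_p1, range_p2, pot_size):
              x = torch.cat([range_p1, range_p2, pot_size], dim=-1)
              cfvs = self.body(x)
              return cfvs[..., :N_HANDS], cfvs[..., N_HANDS:]

      # usage: batched range vectors and a pot-size feature in [0, 1]
      net = CFVNet()
      r1 = torch.full((1, N_HANDS), 1.0 / N_HANDS)
      r2 = torch.full((1, N_HANDS), 1.0 / N_HANDS)
      v1, v2 = net(r1, r2, torch.tensor([[0.1]]))
      print(v1.shape, v2.shape)   # torch.Size([1, 1326]) for each player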

  31. Sparse Lookahead Trees
    reduction of actions:
    generate a sparse lookahead tree limited to only 4 action types:
    fold, call, 2 or 3 bet sizes, all-in
    optimization for decision speed:
    allows DeepStack to play at conventional human speed
    decision points reduced from 10^160 to 10^7
    can be solved in < 5 seconds using a GeForce GTX 1080
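
    An illustrative sketch of the restricted action menu described above; the particular bet sizes (fractions of the pot) are my own placeholders, not the paper’s exact set:

      # Build a small action menu: fold, call, a couple of bet sizes, all-in.
      def sparse_actions(pot, stack, to_call):
          actions = ["fold", "call"]
          for fraction in (1.0, 2.0):           # "2 or 3 bet sizes"
              bet = int(fraction * pot)
              if to_call < bet < stack:
                  actions.append(f"bet {bet}")
          if stack > 0:
              actions.append("all-in")
          return actions

      print(sparse_actions(pot=100, stack=950, to_call=20))
      # ['fold', 'call', 'bet 100', 'bet 200', 'all-in']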

  32. Training the NN
