
DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker


PWL SF Talk

Tom Santero

March 30, 2017


Transcript

  1. DeepStack: Expert-Level Artificial Intelligence
    in Heads-Up No-Limit Poker
    @tsantero
    PWL SF March 30, 2017


  2. Texas Hold’em
    "truly a 'sawdust joint, with…oiled sawdust covering the floors."
    originated in Texas
    one of many variations

    of the game of poker
    most widely played card

    game in the world


  3. Texas Hold’em
    originated in Texas
    one of many variations

    of the game of poker
    most widely played card

    game in the world


  4. [diagram: ten seats around the table, numbered 1–10]

  5. [diagram: ten seats around the table, numbered 1–10]
    gameplay moves clockwise

  6. [diagram: dealer button (D), small blind (SB), big blind (BB)]
    Blinds: 1/2

  7. [diagram: D, SB, and BB positions, one seat further clockwise]

  8. [diagram: D, SB, and BB positions, one seat further clockwise]

  9. [diagram: D, SB, and BB positions, one seat further clockwise]

  10. flop, turn, river
    hole cards ∀ players
    player actions:
    check, bet, raise, fold


  11. “The honor’s in the dollar, kid.”
    —Seth Davis, Boiler Room (2000)
    Keeping Score in Hold’em
    # BBs = Chip Stack / Cost of Big Blind
    mBB = BB / 1000

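The scoring convention above can be sketched numerically (my own illustration; the dollar amounts and hand count are made up):

```python
# Scoring in big blinds and milli-big-blinds (mBB).
# Assumption: 1 mBB = 1/1000 of a big blind, the unit poker-bot
# results are typically reported in (mBB per game, i.e. per hand).

def winnings_in_mbb_per_game(total_won, big_blind, hands_played):
    """Convert winnings into milli-big-blinds per game (hand)."""
    big_blinds_won = total_won / big_blind
    return big_blinds_won * 1000 / hands_played

# e.g. winning $500 at a $2 big blind over 1000 hands:
print(winnings_in_mbb_per_game(500, 2, 1000))  # 250.0 mBB/g
```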

  12. For example…


  13. tsantero’s hand: 6♠ 5♠    cowboy’s hand: unknown
    Preflop:
    I raise to $20
    cowboy raises to $45
    I call
    all other players fold
    Odds of flopping:
    Trips 29 to 1
    Straight 75 to 1
    Flush 118 to 1

  14. tsantero’s hand: 6♠ 5♠
    flop: 2 3 4 (two diamonds)
    pot size: $90
    I bet $65
    cowboy calls

  15. tsantero’s hand: 6♠ 5♠
    board: 2 3 4 (two diamonds), turn J
    pot size: $210
    I shove all in ($420)
    cowboy calls

  16. tsantero’s hand: 6♠ 5♠ (a flopped straight)
    cowboy’s hand: 3 3 (a set of threes)
    board: 2 3 4 (two diamonds), turn J
    4/38 ~ 10%
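The slide's "4/38 ~ 10%" is the standard quick estimate: outs divided by unseen cards. As a trivial sketch:

```python
# Quick poker estimate: chance of hitting = outs / unseen cards.
def hit_probability(outs, unseen_cards):
    return outs / unseen_cards

print(round(hit_probability(4, 38), 3))  # 0.105, i.e. roughly 10%
```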

  17. tsantero’s hand: 6♠ 5♠ (a straight)
    cowboy’s hand: 3 3
    board: 2 3 4 J, river J
    (the second J fills cowboy’s full house, threes full of jacks)

  18. [image-only slide]


  19. Imperfect information games require more
    complex reasoning than similarly sized perfect
    information games.
    The correct decision at a particular moment
    depends upon the probability distribution over
    private information that the opponent holds,
    which is revealed through their past actions.



  20. Heads-Up No-Limit Texas Hold’em:
    total number of decision points: 10^160 (10^170 in Go)
    past approaches include pre-solving the entire game or
    abstraction via Counterfactual Regret Minimization (CFR)
    DeepStack: attempts to solve for a Nash Equilibrium


  21. It’s time for some
    game theory.


  22. Nash Equilibrium
    Let (S, f) represent a game with n players
    Si = the set of strategies for player i, such that:
    S = S1 x S2 x S3 … x Sn
    is the set of strategy profiles
    f(x) = (f1(x), f2(x), … fn(x))
    is the payoff function
    evaluated at x ∈ S


  23. Nash Equilibrium contd.
    Let xi be player i’s strategy and x-i the strategies of all other players
    If each player i ∈ {1, … n} chooses strategy xi, then player i’s payoff is fi(x);
    x* ∈ S is a Nash Equilibrium such that
    ∀i, xi ∈ Si : fi(xi*, x-i*) ≥ fi(xi, x-i*)


  24. Nash Equilibrium contd.
    ∀i, xi ∈ Si : fi(xi*, x-i*) ≥ fi(xi, x-i*)
    the strategy profile x* is a Nash Equilibrium so long as no
    unilateral deviation from the strategy by any single player
    results in an improved payoff for that player


  25. Prisoner’s Dilemma payoffs (Prisoner 1, Prisoner 2):

                        Prisoner 2
                    CONFESS      LIE
    Prisoner 1
      CONFESS       -6, -6      0, -10
      LIE          -10,  0     -2, -2
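The definition from slides 22–24 can be checked mechanically on this matrix: a pure-strategy profile is a Nash Equilibrium iff no unilateral deviation improves the deviating player's payoff. A minimal sketch:

```python
# Brute-force search for pure-strategy Nash equilibria in the
# prisoner's dilemma payoffs above, given as (Prisoner 1, Prisoner 2).

ACTIONS = ["CONFESS", "LIE"]
PAYOFFS = {
    ("CONFESS", "CONFESS"): (-6, -6),
    ("CONFESS", "LIE"):     (0, -10),
    ("LIE", "CONFESS"):     (-10, 0),
    ("LIE", "LIE"):         (-2, -2),
}

def is_nash(a1, a2):
    """True iff neither player gains by deviating unilaterally."""
    p1, p2 = PAYOFFS[(a1, a2)]
    best1 = max(PAYOFFS[(d, a2)][0] for d in ACTIONS)  # P1's best deviation
    best2 = max(PAYOFFS[(a1, d)][1] for d in ACTIONS)  # P2's best deviation
    return p1 >= best1 and p2 >= best2

equilibria = [(a1, a2) for a1 in ACTIONS for a2 in ACTIONS if is_nash(a1, a2)]
print(equilibria)  # [('CONFESS', 'CONFESS')]
```

Note that (LIE, LIE) pays both players more, yet it is not an equilibrium: either prisoner can improve from -2 to 0 by confessing.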

  26. Oskar Morgenstern and John von Neumann


  27. Oskar Morgenstern and John von Neumann
    a mixed strategy
    Nash Equilibrium
    will exist in any
    zero-sum game
    involving a finite
    set of actions


  28. a mixed strategy Nash Equilibrium will exist in any
    zero-sum game involving a finite set of actions
    [card diagram: pocket kings; board includes a 5, a K, and a J, with spades showing]
    decision to bet/call, raise, or fold
    will vary depending on the situation

  29. a mixed strategy Nash Equilibrium will exist in any
    zero-sum game involving a finite set of actions
    [card diagram: pocket kings; board includes a 5, an A, and a J of mixed suits]
    decision to bet/call, raise, or fold
    will vary depending on the situation
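One way to watch a mixed-strategy equilibrium emerge in a finite zero-sum game (a toy aside, not from the talk): fictitious play on matching pennies, where each player repeatedly best-responds to the opponent's empirical play; the game's only Nash Equilibrium is the mixture (1/2, 1/2) for both players.

```python
# Fictitious play on matching pennies, a finite zero-sum game.
# The empirical action frequencies converge toward the mixed-strategy
# Nash Equilibrium (1/2, 1/2) guaranteed by von Neumann's minimax theorem.

A = [[1, -1],
     [-1, 1]]  # row player's payoffs; the column player gets the negation

row_counts = [1, 0]  # empirical action counts, seeded arbitrarily
col_counts = [0, 1]

for _ in range(100000):
    # Each player best-responds to the opponent's empirical mixture so far.
    row_br = max(range(2), key=lambda i: sum(A[i][j] * col_counts[j] for j in range(2)))
    col_br = max(range(2), key=lambda j: -sum(A[i][j] * row_counts[i] for i in range(2)))
    row_counts[row_br] += 1
    col_counts[col_br] += 1

row_mix = row_counts[0] / sum(row_counts)
print(round(row_mix, 2))  # approaches 0.5
```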

  30. [image-only slide]

  31. Exploitable vs Non-exploitable play
    a player’s strategy:
    probability distribution over a range of public actions
    determine a player’s range:
    a function of which cards they’ll play, and at what price
    expected utility:
    estimation of the payoff function


  32. Exploitable vs Non-exploitable play
    a Nash Equilibrium strategy:
    seeks to maximize expected utility against a

    best-response opponent strategy
    exploitability:
    difference in expected utility of a best-response
    strategy and the expected utility under a NE

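The exploitability definition above can be made concrete on a tiny zero-sum game; rock-paper-scissors stands in for poker here (my own illustration, not DeepStack code):

```python
# Exploitability of a strategy = equilibrium value minus the value the
# strategy actually secures against a best-responding opponent.

# Rock-paper-scissors payoffs for the row player (opponent gets the negation).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
GAME_VALUE = 0.0  # RPS is symmetric, so the equilibrium value is 0

def best_response_value_against(row_strategy):
    """Value the row player is held to by a best-responding column player."""
    # The column player picks the column minimizing the row player's payoff.
    return min(sum(A[i][j] * row_strategy[i] for i in range(3)) for j in range(3))

def exploitability(row_strategy):
    return GAME_VALUE - best_response_value_against(row_strategy)

print(exploitability([1/3, 1/3, 1/3]))  # 0.0: the equilibrium is unexploitable
print(exploitability([1.0, 0.0, 0.0]))  # 1.0: always-rock loses 1/game to paper
```

The uniform mixture is the Nash Equilibrium here, so its exploitability is zero; any pure strategy gives up a full unit per game to a best response.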

  33. DeepStack: limited exploitability via heuristic search
    local strategy:
    computation over the current public state
    depth-limited lookahead:
    learned value function
    restricted set of lookahead actions:
    only a small set of actions is considered at each decision point


  34. Continual Resolving
    initial state:
    strategy is determined by our range of play
    next actions:
    discard the previous strategy and determine the action
    using the CFR algorithm.
    input: our stored range and a vector of opponent
    counterfactual values
    update our range and the opponent counterfactual values

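The CFR algorithm mentioned above is built on regret matching. As a sketch (not DeepStack's implementation), here it runs against a fixed, exploitable opponent mixture (the numbers are made up); the average strategy converges to a best response, and in self-play the same update drives the average strategy toward a Nash Equilibrium:

```python
# Regret matching, the core update inside CFR (illustrative sketch).
# Each iteration plays proportionally to accumulated positive regret.

A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]  # rock-paper-scissors, row player's payoffs

def regret_matching_strategy(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1/3] * 3

regrets = [0.0, 0.0, 0.0]
strategy_sum = [0.0, 0.0, 0.0]
opponent = [0.4, 0.4, 0.2]  # fixed opponent mixture (made up): light on scissors

for _ in range(10000):
    strategy = regret_matching_strategy(regrets)
    strategy_sum = [s + x for s, x in zip(strategy_sum, strategy)]
    # expected value of each of our actions against the opponent's mixture
    action_values = [sum(A[i][j] * opponent[j] for j in range(3)) for i in range(3)]
    expected = sum(s * v for s, v in zip(strategy, action_values))
    regrets = [r + v - expected for r, v in zip(regrets, action_values)]

avg = [s / sum(strategy_sum) for s in strategy_sum]
print([round(x, 2) for x in avg])  # [0.0, 1.0, 0.0]: always play paper
```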

  35. Rules for Resolving
    our action:
    replace opponent counterfactual values with those
    computed in the re-solved strategy for our chosen action
    update our own range using the computed strategy and Bayes’ rule
    chance action:
    replace the opponent counterfactual values with those computed
    for this chance action from the last resolve
    update our own range by zeroing out impossible hands given
    the new public state

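The two range updates described above can be sketched on a made-up three-hand range (`strategy_given_hand` and the impossibility mask are illustrative assumptions, not DeepStack's data):

```python
# Sketch of the two re-solving range updates on a toy 3-hand range.

def bayes_update(range_probs, strategy_given_hand):
    """After our own action: P(hand | action) ∝ P(action | hand) * P(hand)."""
    posterior = [p * s for p, s in zip(range_probs, strategy_given_hand)]
    total = sum(posterior)
    return [p / total for p in posterior]

def zero_impossible(range_probs, impossible):
    """After a chance action: zero out hands the new public card rules out."""
    masked = [0.0 if dead else p for p, dead in zip(range_probs, impossible)]
    total = sum(masked)
    return [p / total for p in masked]

our_range = [0.5, 0.3, 0.2]
# We bet; suppose the re-solved strategy bets hand 0 always,
# hand 1 half the time, and hand 2 never.
our_range = bayes_update(our_range, [1.0, 0.5, 0.0])
print([round(p, 3) for p in our_range])  # [0.769, 0.231, 0.0]
# A new public card makes hand 1 impossible (it would use that card).
our_range = zero_impossible(our_range, [False, True, False])
print([round(p, 3) for p in our_range])  # [1.0, 0.0, 0.0]
```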

  36. Limited Depth Lookahead “intuition”
    neural net:
    learned counterfactual function that approximates the
    resulting values if the public state were to be solved
    with the current iteration’s ranges
    input:
    public state = players’ ranges, pot size, public cards
    output:
    vector of counterfactual values for holding any hand
    for both players


  37. Sparse Lookahead Trees
    reduction of actions:
    generate a sparse lookahead tree limited to only 4 actions:
    fold, call, 2 or 3 bet sizes, all-in
    optimization for decision speed:
    allows DeepStack to play at conventional human speed
    decision points reduced from 10^160 to 10^7
    can be solved in < 5 seconds using a GeForce GTX 1080

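A back-of-the-envelope check of the reduction (the branching factor of 4 comes from the slide; the lookahead depth of 10 is my own assumption):

```python
# Total nodes in a full lookahead tree with fixed branching and depth.
def tree_size(branching, depth):
    return sum(branching ** d for d in range(depth + 1))

# 4 actions per decision point, lookahead depth 10 (assumed):
print(tree_size(4, 10))  # 1398101, within a couple orders of 10**7
```

Even with generous depth, a 4-action tree stays many orders of magnitude below the 10^160 decision points of the unabstracted game.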

  38. Training the NN


  39. Results


  40. Results


  41. Questions?
