Slide 1

Slide 1 text

DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker
@tsantero
PWL SF
March 30, 2017

Slide 2

Slide 2 text

Texas Hold’em
"truly a 'sawdust joint,' with… oiled sawdust covering the floors."
originated in Texas
one of many variations of the game of poker
most widely played card game in the world

Slide 3

Slide 3 text

Texas Hold’em
originated in Texas
one of many variations of the game of poker
most widely played card game in the world

Slide 4

Slide 4 text

1 2 3 4 5 6 7 8 9 10

Slide 5

Slide 5 text

1 2 3 4 5 6 7 8 9 10 gameplay moves clockwise

Slide 6

Slide 6 text

D SB BB Blinds: 1/2

Slide 7

Slide 7 text

D SB BB

Slide 8

Slide 8 text

BB D SB

Slide 9

Slide 9 text

SB BB D

Slide 10

Slide 10 text

flop, turn, river
hole cards ∀ players
player actions: check, bet, raise, fold

Slide 11

Slide 11 text

“The honor’s in the dollar, kid.”
 - Seth Davis, Boiler Room (2000)
Keeping Score in Hold’em
Chip Stack / Cost of Big Blind = # BBs
BB / 100 = mBB

Slide 12

Slide 12 text

For example…..

Slide 13

Slide 13 text

tsantero’s hand: 6♠ 5♠
cowboy’s hand: (not shown)
Preflop:
 I raise to $20
 cowboy raises to $45
 I call
 all other players fold
Odds of flopping:
 Trips 29 to 1
 Straight 75 to 1
 Flush 118 to 1

Slide 14

Slide 14 text

tsantero’s hand: 6♠ 5♠
flop: 3♣ 2♥ 4♦
pot size: $90
I bet $65
cowboy calls

Slide 15

Slide 15 text

tsantero’s hand: 6♠ 5♠
flop: 3♣ 2♥ 4♦
turn: J
pot size: $210
I shove all in ($420)
cowboy calls

Slide 16

Slide 16 text

tsantero’s hand: 6♠ 5♠
flop: 3♣ 2♥ 4♦
turn: J
cowboy’s hand: 3♠ 3♦
4/38 ~ 10%

Slide 17

Slide 17 text

tsantero’s hand: 6♠ 5♠
flop: 3♣ 2♥ 4♦
turn: J
river: J
cowboy’s hand: 3♠ 3♦

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text


 Imperfect information games require more complex reasoning than similarly sized perfect information games. The correct decision at a particular moment depends upon the probability distribution over private information that the opponent holds, which is revealed through their past actions.


Slide 20

Slide 20 text

Heads-Up No Limit Texas Hold’em:
total number of decision points: 10^160 (10^170 in Go)
past approaches include pre-solving the entire game or abstraction via Counterfactual Regret Minimization (CFR)
DeepStack: attempts to solve for a Nash Equilibrium

Slide 21

Slide 21 text

It’s time for some game theory.

Slide 22

Slide 22 text

Nash Equilibrium
Let $(S, f)$ represent a game with $n$ players
$S_i$ = strategy set for player $i$, such that:
$S = S_1 \times S_2 \times S_3 \times \dots \times S_n$ is the set of strategy profiles
$f(x) = (f_1(x), f_2(x), \dots, f_n(x))$ is the payoff function evaluated at $x \in S$

Slide 23

Slide 23 text

Nash Equilibrium contd.
Let $x_i$ be a strategy of player $i$ and $x_{-i}$ be the strategies of all other players
When each player $i \in \{1, \dots, n\}$ chooses strategy $x_i$, the resulting profile is $x = (x_1, \dots, x_n)$ and player $i$'s payoff is $f_i(x)$
A strategy profile $x^* \in S$ is a Nash Equilibrium if
$\forall i,\ x_i \in S_i : f_i(x_i^*, x_{-i}^*) \geq f_i(x_i, x_{-i}^*)$

Slide 24

Slide 24 text

Nash Equilibrium contd.
$\forall i,\ x_i \in S_i : f_i(x_i^*, x_{-i}^*) \geq f_i(x_i, x_{-i}^*)$
the strategy profile is a Nash Equilibrium so long as no unilateral deviation from the strategy by any single player results in an improved payoff for that player

Slide 25

Slide 25 text

                          Prisoner 2
                      CONFESS     LIE
Prisoner 1  CONFESS   -6, -6      0, -10
            LIE       -10, 0      -2, -2
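
The equilibrium condition from the previous slides can be checked by brute force on this table. The following is a minimal sketch, assuming each cell lists Prisoner 1's payoff first and Prisoner 2's second; the names `PAYOFFS`, `is_nash`, and `pure_nash_equilibria` are illustrative and do not come from the talk.

```python
from itertools import product

ACTIONS = ["CONFESS", "LIE"]

# PAYOFFS[(a1, a2)] = (payoff to Prisoner 1, payoff to Prisoner 2).
# Assumption: each cell on the slide is ordered (Prisoner 1, Prisoner 2).
PAYOFFS = {
    ("CONFESS", "CONFESS"): (-6, -6),
    ("CONFESS", "LIE"): (0, -10),
    ("LIE", "CONFESS"): (-10, 0),
    ("LIE", "LIE"): (-2, -2),
}


def is_nash(a1, a2):
    """True if neither prisoner can improve their payoff by deviating alone."""
    u1, u2 = PAYOFFS[(a1, a2)]
    p1_cannot_improve = all(u1 >= PAYOFFS[(d, a2)][0] for d in ACTIONS)
    p2_cannot_improve = all(u2 >= PAYOFFS[(a1, d)][1] for d in ACTIONS)
    return p1_cannot_improve and p2_cannot_improve


def pure_nash_equilibria():
    """Enumerate every pure strategy profile and keep the equilibria."""
    return [profile for profile in product(ACTIONS, ACTIONS) if is_nash(*profile)]


print(pure_nash_equilibria())
```

Under that cell ordering, the only profile that survives the check is both prisoners confessing, even though both would do better if they both lied.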

Slide 26

Slide 26 text

Oskar Morgenstern and John von Neumann

Slide 27

Slide 27 text

Oskar Morgenstern and John von Neumann
a mixed strategy Nash Equilibrium will exist in any zero-sum game involving a finite set of actions

Slide 28

Slide 28 text

a mixed strategy Nash Equilibrium will exist in any zero-sum game involving a finite set of actions
decision to bet/call, raise, or fold will vary depending on the situation
[cards shown: K K; 5♠ K J]

Slide 29

Slide 29 text

a mixed strategy Nash Equilibrium will exist in any zero-sum game involving a finite set of actions
decision to bet/call, raise, or fold will vary depending on the situation
[cards shown: K K; 5♦ A♣ J]

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Exploitable vs Non-exploitable play
a player’s strategy: probability distribution over range of public actions
determine a player’s range: function of the cards they’ll play for what price
expected utility: estimation of the payout function

Slide 32

Slide 32 text

Exploitable vs Non-exploitable play
a Nash Equilibrium strategy: seeks to maximize expected utility against a best-response opponent strategy
exploitability: difference in expected utility of a best-response strategy and the expected utility under a NE
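
The exploitability definition on this slide can be made concrete in a toy game. The following is a minimal sketch using rock-paper-scissors rather than poker; the payoff matrix, the equilibrium value of 0, and the function names are assumptions chosen for illustration.

```python
# Row player's payoffs for rock-paper-scissors
# (rows: our action, columns: opponent's action; order is rock, paper, scissors).
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
GAME_VALUE = 0.0  # assumption: the known equilibrium value of rock-paper-scissors


def value_vs_best_response(strategy, payoff):
    """Our expected payoff when the opponent picks the column that hurts us most."""
    n_cols = len(payoff[0])
    column_values = [
        sum(p * payoff[row][col] for row, p in enumerate(strategy))
        for col in range(n_cols)
    ]
    return min(column_values)


def exploitability(strategy, payoff, game_value):
    """How much we give up, relative to equilibrium, against a best response."""
    return game_value - value_vs_best_response(strategy, payoff)


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]
print(exploitability(uniform, RPS, GAME_VALUE))
print(exploitability(always_rock, RPS, GAME_VALUE))
```

The uniform mix cannot be exploited, while always playing rock loses a full unit per game to an opponent who always plays paper.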

Slide 33

Slide 33 text

DeepStack: limited exploitability via heuristic search
local strategy: computation over the current public state
depth-limited lookahead: learned value function
restricted set of lookahead actions
exploitability: difference in expected utility of a best-response strategy and the expected utility under a NE

Slide 34

Slide 34 text

Continual Resolving
initial state: strategy is determined by our range of play
next actions: discard previous strategy and determine action using CFR algorithm
input: our stored range and a vector of opponent counterfactual values
update our range and opponent counterfactual values
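
CFR is built on regret matching, applied at every decision point of the game tree. The following is a minimal sketch of regret matching at a single decision in rock-paper-scissors self-play. It is not DeepStack's re-solving step, which works on ranges and counterfactual value vectors over a lookahead tree. The payoff matrix, the function names, and the nonzero starting regret (used only to break the symmetry of the game) are assumptions for illustration.

```python
# Row player's payoffs for rock-paper-scissors (rows: player 1, columns: player 2).
# The game is zero-sum, so player 2's payoff is the negation.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret: play uniformly


def self_play(payoff, iterations, initial_regrets):
    """Run regret matching for both players on a square payoff matrix.

    Returns the two players' average strategies over all iterations.
    """
    n = len(payoff)
    regrets = [list(initial_regrets), [0.0] * n]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        s1 = regret_matching(regrets[0])
        s2 = regret_matching(regrets[1])
        # Expected payoff of each pure action against the opponent's current mix.
        u1 = [sum(payoff[a][b] * s2[b] for b in range(n)) for a in range(n)]
        u2 = [sum(-payoff[a][b] * s1[a] for a in range(n)) for b in range(n)]
        for strategy, utils, regret, total in (
            (s1, u1, regrets[0], strategy_sums[0]),
            (s2, u2, regrets[1], strategy_sums[1]),
        ):
            expected = sum(p * u for p, u in zip(strategy, utils))
            for a in range(n):
                regret[a] += utils[a] - expected  # regret for not having played a
                total[a] += strategy[a]
    return [[x / iterations for x in sums] for sums in strategy_sums]


average_p1, average_p2 = self_play(RPS, iterations=10_000, initial_regrets=[1.0, 0.0, 0.0])
print(average_p1, average_p2)
```

The per-iteration strategies keep cycling, but the averaged strategies should drift toward roughly one third per action, which is the equilibrium mix for this game.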

Slide 35

Slide 35 text

Rules for Resolving
our action:
 replace opponent counterfactual values with those computed in the re-solved strategy for our chosen action
 update our own range using the computed strategy and Bayes’ rule
chance action:
 replace the opponent counterfactual values with those computed for this chance action from the last resolve
 update our own range by zeroing out impossible hands given the new public state
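
The two range updates above can be sketched as follows. This is a toy illustration, not DeepStack's implementation: the two-hand range, the betting probabilities, the card strings, and the function names are all hypothetical, and renormalizing after each update is an assumption made here so the range stays a probability distribution.

```python
def update_range_after_action(hand_range, strategy, action):
    """Bayes' rule: P(hand | action) is proportional to P(action | hand) * P(hand)."""
    weighted = {
        hand: prob * strategy[hand].get(action, 0.0)
        for hand, prob in hand_range.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("the strategy never takes this action")
    return {hand: w / total for hand, w in weighted.items()}


def update_range_after_chance(hand_range, public_cards):
    """Zero out hands that contain a card now visible on the board."""
    blocked = set(public_cards)
    kept = {
        hand: (0.0 if blocked & set(hand) else prob)
        for hand, prob in hand_range.items()
    }
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no hand in the range is possible on this board")
    return {hand: p / total for hand, p in kept.items()}  # renormalizing: assumption


# Hypothetical two-hand range and strategy, for illustration only.
our_range = {("6s", "5s"): 0.5, ("Ks", "Kd"): 0.5}
our_strategy = {
    ("6s", "5s"): {"bet": 0.8, "check": 0.2},
    ("Ks", "Kd"): {"bet": 0.4, "check": 0.6},
}

after_bet = update_range_after_action(our_range, our_strategy, "bet")
after_card = update_range_after_chance(after_bet, ["Kd"])
print(after_bet)
print(after_card)
```

After the bet, the hand that bets more often takes up a larger share of the range; once a card from the other hand appears on the board, that hand drops to zero.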

Slide 36

Slide 36 text

Limited Depth Lookahead
“intuition” neural net: learned counterfactual value function that approximates the resulting values if the public state were to be solved with the current iteration’s ranges
input: public state = players’ ranges, pot size, public cards
output: vector of counterfactual values for holding any hand for both players

Slide 37

Slide 37 text

Sparse Lookahead Trees
reduction of actions: generate a sparse lookahead tree limited to only 4 actions: fold, call, 2 or 3 bet actions, all-in
optimization for decision speed: allows DeepStack to play at conventional human speed
decision points reduced from 10^160 to 10^7
can be solved in < 5 seconds using a GeForce GTX 1080

Slide 38

Slide 38 text

Training the NN

Slide 39

Slide 39 text

Results

Slide 40

Slide 40 text

Results

Slide 41

Slide 41 text

Questions?