DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker

PWL SF Talk

Tom Santero

March 30, 2017
Transcript

  1. Texas Hold’em: "truly a 'sawdust joint,' with…oiled sawdust covering the floors." Originated in Texas; one of many variations of the game of poker; the most widely played card game in the world.
  2. Texas Hold’em: originated in Texas; one of many variations of the game of poker; the most widely played card game in the world.
  3. [Diagram: ten seats numbered 1–10 around the table.] Gameplay moves clockwise.
  4. “The honor’s in the dollar, kid.” —Seth Davis, Boiler Room (2000)
     Keeping Score in Hold’em: Chip Stack ÷ Cost of Big Blind = # BBs; 1 mBB = 1/1000 BB.
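A quick arithmetic sketch of this scoring convention (the dollar figures below are invented for illustration): stacks are normalized by the cost of the big blind, and poker-AI results are conventionally reported in milli-big-blinds.

    # Keeping score in big blinds (BB): normalizing by the big blind makes
    # results comparable across stake levels. 1 mBB = 1/1000 of a big blind.

    def stack_in_bbs(chip_stack: float, big_blind: float) -> float:
        """Chip stack divided by the cost of the big blind gives # BBs."""
        return chip_stack / big_blind

    def winnings_in_mbb(chips_won: float, big_blind: float) -> float:
        """Winnings expressed in milli-big-blinds."""
        return 1000 * chips_won / big_blind

    # Example (hypothetical numbers): a $500 stack at a $5 big blind is
    # 100 BB; winning $2 at that stake is 400 mBB.
    print(stack_in_bbs(500, 5))    # 100.0
    print(winnings_in_mbb(2, 5))   # 400.0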
  5. tsantero’s hand: 6♠ 5♠. cowboy’s hand: face down. Preflop: I raise to $20, cowboy raises to $45, I call, all other players fold. Odds of flopping: trips 29 to 1, straight 75 to 1, flush 118 to 1.
  6. tsantero’s hand: 6♠ 5♠. Flop: 3♣ 2♥ 4♦ (a flopped straight). Pot size: $90. I bet $65, cowboy calls.
  7. tsantero’s hand: 6♠ 5♠. Board: 3♣ 2♥ 4♦ J. Pot size: $210. I shove all in ($420), cowboy calls.
  8. tsantero’s hand: 6♠ 5♠. cowboy’s hand: 3♠ 3♦ (a set of threes). Board: 3♣ 2♥ 4♦ J. Cowboy’s odds of improving: 4/38 ≈ 10%.
  9. tsantero’s hand: 6♠ 5♠. cowboy’s hand: 3♠ 3♦. Board: 3♣ 2♥ 4♦ J J. The river J fills cowboy’s full house, beating the straight.
  10. Imperfect information games require more complex reasoning than similarly sized perfect information games. The correct decision at a particular moment depends upon the probability distribution over private information that the opponent holds, which is revealed through their past actions.

  11. Heads-Up No-Limit Texas Hold’em: total number of decision points: 10^160 (vs. 10^170 in Go). Past approaches include pre-solving the entire game or abstraction via Counterfactual Regret Minimization (CFR). DeepStack attempts to solve for a Nash Equilibrium.
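CFR is driven by regret matching: each player repeatedly plays actions in proportion to how much they regret not having played them, and the time-averaged strategies converge to a Nash Equilibrium in two-player zero-sum games. A minimal sketch on rock-paper-scissors (an illustrative toy game, not from the talk):

    import random

    ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
    PAYOFF = [[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]]  # PAYOFF[my_action][opponent_action], zero-sum

    def strategy_from_regrets(regrets):
        # Play each action in proportion to its positive cumulative regret;
        # fall back to uniform when no regret is positive.
        positive = [max(r, 0.0) for r in regrets]
        total = sum(positive)
        return [p / total for p in positive] if total > 0 else [1 / ACTIONS] * ACTIONS

    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]

    for _ in range(100_000):
        strategies = [strategy_from_regrets(r) for r in regrets]
        actions = [random.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for p in range(2):
            opponent, played = actions[1 - p], actions[p]
            for a in range(ACTIONS):
                # Regret: how much better action a would have done than the
                # action we actually played against this opponent action.
                regrets[p][a] += PAYOFF[a][opponent] - PAYOFF[played][opponent]
                strategy_sums[p][a] += strategies[p][a]

    # The time-averaged strategy approaches the mixed NE (1/3, 1/3, 1/3).
    total = sum(strategy_sums[0])
    print([round(s / total, 3) for s in strategy_sums[0]])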
  12. Nash Equilibrium. Let (S, f) represent a game with n players, where Si is the strategy set for player i, S = S1 × S2 × S3 … × Sn is the set of strategy profiles, and f(x) = (f1(x), f2(x), … fn(x)) is the payoff function evaluated at x ∈ S.
  13. Nash Equilibrium contd. Let xi be the strategy of player i and x-i the strategies of all other players. If player i ∈ {1, … n} chooses strategy xi, their payoff is fi(x). A profile x* ∈ S is a Nash Equilibrium iff ∀i, xi ∈ Si : fi(xi*, x-i*) ≥ fi(xi, x-i*).
  14. Nash Equilibrium contd. ∀i, xi ∈ Si : fi(xi*, x-i*) ≥ fi(xi, x-i*): the strategy profile is a Nash Equilibrium so long as no unilateral deviation from the strategy by any single player improves that player’s payoff.
  15. Prisoner’s dilemma payoffs (Prisoner 1’s payoff listed first):

                               Prisoner 2: CONFESS    Prisoner 2: LIE
          Prisoner 1: CONFESS        -6, -6                0, -10
          Prisoner 1: LIE           -10,  0               -2,  -2
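A small sketch checking the equilibrium condition from slides 13–14 against this payoff matrix: a profile is a Nash Equilibrium iff neither prisoner can improve their own payoff by deviating unilaterally.

    from itertools import product

    ACTIONS = ["CONFESS", "LIE"]
    # payoffs[(a1, a2)] = (payoff to Prisoner 1, payoff to Prisoner 2)
    payoffs = {
        ("CONFESS", "CONFESS"): (-6, -6),
        ("CONFESS", "LIE"): (0, -10),
        ("LIE", "CONFESS"): (-10, 0),
        ("LIE", "LIE"): (-2, -2),
    }

    def is_nash(a1, a2):
        u1, u2 = payoffs[(a1, a2)]
        # No unilateral deviation by either player improves that player's payoff.
        no_better_1 = all(payoffs[(d, a2)][0] <= u1 for d in ACTIONS)
        no_better_2 = all(payoffs[(a1, d)][1] <= u2 for d in ACTIONS)
        return no_better_1 and no_better_2

    for profile in product(ACTIONS, repeat=2):
        if is_nash(*profile):
            print(profile)  # ('CONFESS', 'CONFESS') is the unique equilibrium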
  16. Oskar Morgenstern and John von Neumann: a mixed strategy Nash Equilibrium will exist in any zero-sum game involving a finite set of actions.
  17. A mixed strategy Nash Equilibrium will exist in any zero-sum game involving a finite set of actions. [Example hands shown as card images.] The decision to bet/call, raise, or fold will vary depending on the situation.
  18. A mixed strategy Nash Equilibrium will exist in any zero-sum game involving a finite set of actions. [A second set of example hands shown as card images.] The decision to bet/call, raise, or fold will vary depending on the situation.
  19. Exploitable vs. non-exploitable play. A player’s strategy: a probability distribution over a range of public actions. Determining a player’s range: a function of the cards they’ll play, and for what price. Expected utility: an estimation of the payoff function.
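A hypothetical sketch of these three objects (the hands, probabilities, and utilities below are invented for illustration): a strategy maps each private hand to a distribution over actions, a range is a distribution over hands, and expected utility averages the payoff over both.

    # Strategy: per-hand probability distribution over public actions.
    strategy = {
        "AA":  {"raise": 0.90, "call": 0.10, "fold": 0.00},
        "72o": {"raise": 0.05, "call": 0.00, "fold": 0.95},
    }

    # Range: probability distribution over the hands a player could hold.
    range_ = {"AA": 0.6, "72o": 0.4}

    # Hypothetical payoffs for each (hand, action) pair.
    utility = {("AA", "raise"): 5.0, ("AA", "call"): 2.0, ("AA", "fold"): 0.0,
               ("72o", "raise"): -1.0, ("72o", "call"): -2.0, ("72o", "fold"): 0.0}

    # Expected utility: payoff averaged over the range and the strategy.
    ev = sum(range_[h] * p * utility[(h, a)]
             for h, dist in strategy.items() for a, p in dist.items())
    print(round(ev, 3))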
  20. Exploitable vs. non-exploitable play. A Nash Equilibrium strategy seeks to maximize expected utility against a best-response opponent strategy. Exploitability: the difference between the expected utility of a best-response strategy and the expected utility under a NE.
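A minimal sketch of this measure on rock-paper-scissors (again an illustrative toy game, not from the talk): the equilibrium value of this symmetric zero-sum game is 0, so exploitability reduces to how much a best-responding opponent wins against a given strategy.

    PAYOFF = [[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]]  # row player's payoff, zero-sum

    def exploitability(strategy):
        # The best-responding column opponent picks the pure action that
        # maximizes their own payoff, i.e. -PAYOFF, against our mixture.
        best_response_value = max(
            sum(-PAYOFF[i][j] * strategy[i] for i in range(3)) for j in range(3)
        )
        return best_response_value - 0.0  # equilibrium value of the game is 0

    print(exploitability([1/3, 1/3, 1/3]))     # 0.0: the NE is unexploitable
    print(exploitability([0.5, 0.25, 0.25]))   # 0.25: a rock-heavy mix is exploited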
  21. DeepStack: limited exploitability via heuristic search. Local strategy: computation over the current public state. Depth-limited lookahead: a learned value function. Restricted set of lookahead actions: a sparse action set (see slide 25) keeps the re-solved game small.
  22. Continual re-solving. Initial state: the strategy is determined by our range of play. Subsequent actions: discard the previous strategy and determine the action using the CFR algorithm, taking our stored range and a vector of opponent counterfactual values as input, then update our range and the opponent counterfactual values.
  23. Rules for re-solving. On our own action: replace the opponent counterfactual values with those computed in the re-solved strategy for our chosen action, and update our own range using the computed strategy and Bayes’ rule. On a chance action: replace the opponent counterfactual values with those computed for this chance action from the last re-solve, and update our own range by zeroing out impossible hands given the new public state.
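A minimal, hypothetical sketch of the two range updates just described (hand names and probabilities are invented for illustration; DeepStack tracks all 1,326 two-card holdings):

    def update_range_after_our_action(our_range, strategy, action):
        """Bayes' rule: P(hand | action) is proportional to P(action | hand) * P(hand)."""
        posterior = {h: p * strategy[h].get(action, 0.0) for h, p in our_range.items()}
        total = sum(posterior.values())
        return {h: p / total for h, p in posterior.items()}

    def update_range_after_chance(our_range, public_cards):
        """Zero out hands that conflict with the newly revealed public cards."""
        posterior = {h: (0.0 if set(h) & set(public_cards) else p)
                     for h, p in our_range.items()}
        total = sum(posterior.values())
        return {h: p / total for h, p in posterior.items()}

    our_range = {("As", "Ks"): 0.5, ("7h", "2c"): 0.5}
    strategy = {("As", "Ks"): {"raise": 0.8, "fold": 0.2},
                ("7h", "2c"): {"raise": 0.1, "fold": 0.9}}

    print(update_range_after_our_action(our_range, strategy, "raise"))
    # Raising shifts the range toward strong hands: ('As','Ks') now ~0.889.
    print(update_range_after_chance(our_range, ["Ks", "Qd", "2d"]))
    # The Ks on board makes ('As','Ks') impossible: all weight moves to ('7h','2c').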
  24. Limited-depth lookahead. “Intuition” neural net: a learned counterfactual value function that approximates the resulting values if the public state were to be solved with the current iteration’s ranges. Input: the public state (players’ ranges, pot size, public cards). Output: a vector of counterfactual values for holding any hand, for both players.
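A hypothetical sketch of that input/output contract (the layer sizes, board encoding, and single dense layer are illustrative stand-ins, not DeepStack's actual architecture):

    import numpy as np

    N_HANDS = 1326  # number of distinct two-card private holdings

    def counterfactual_value_net(range_p1, range_p2, pot_size,
                                 public_card_features, weights):
        # Input: both players' ranges, the pot size, and public-card features.
        x = np.concatenate([range_p1, range_p2, [pot_size], public_card_features])
        hidden = np.maximum(0.0, weights["W1"] @ x + weights["b1"])  # ReLU layer
        out = weights["W2"] @ hidden + weights["b2"]
        # Output: a counterfactual-value vector per hand, for each player.
        return out[:N_HANDS], out[N_HANDS:]

    # Illustrative shapes: 52-dim board encoding, 500 hidden units.
    rng = np.random.default_rng(0)
    in_dim = 2 * N_HANDS + 1 + 52
    weights = {"W1": rng.normal(0, 0.01, (500, in_dim)), "b1": np.zeros(500),
               "W2": rng.normal(0, 0.01, (2 * N_HANDS, 500)),
               "b2": np.zeros(2 * N_HANDS)}
    cfv1, cfv2 = counterfactual_value_net(rng.dirichlet(np.ones(N_HANDS)),
                                          rng.dirichlet(np.ones(N_HANDS)),
                                          pot_size=0.1,
                                          public_card_features=np.zeros(52),
                                          weights=weights)
    print(cfv1.shape, cfv2.shape)  # (1326,) (1326,)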
  25. Sparse lookahead trees. Reduction of actions: generate a sparse lookahead tree limited to only 4 kinds of action: fold, call, 2 or 3 bet sizes, and all-in. Optimization for decision speed: allows DeepStack to play at conventional human speed. Decision points are reduced from 10^160 to 10^7, and the resulting game can be solved in < 5 seconds using a GeForce GTX 1080.
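A hedged sketch of that action restriction (the pot-fraction bet sizes are illustrative; the talk only specifies "2 or 3 bet sizes"):

    def sparse_actions(pot_size, stack, facing_bet, bet_fractions=(1.0, 2.0)):
        """Build the restricted action set at one lookahead node."""
        actions = []
        if facing_bet:
            actions.append("fold")
        actions.append("call")  # or check when there is no bet to match
        for frac in bet_fractions:
            bet = frac * pot_size
            if bet < stack:  # bets at or above the stack collapse into all-in
                actions.append(f"bet {bet:g}")
        actions.append("all-in")
        return actions

    print(sparse_actions(pot_size=100, stack=450, facing_bet=True))
    # ['fold', 'call', 'bet 100', 'bet 200', 'all-in']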