[Master's Thesis] Diversity and Novelty as Objectives in Poker

Slide 1

Slide 1 text

Diversity and Novelty as Objectives in Poker
Jéssica Pauli de C. Bonson
Supervisor: Dr. Malcolm I. Heywood
Co-Supervisor: Dr. Andrew R. McIntyre

Slide 2

Slide 2 text

Motivation
● Evolutionary algorithms can lead to efficient solutions without a predefined design
● Downsides:
  ○ prone to early convergence
  ○ may be deceived by a non-informative or deceptive fitness function

Slide 3

Slide 3 text

Deceptive tasks

Slide 4

Slide 4 text

Diversity applied to Deceptive Tasks

Slide 5

Slide 5 text

Texas Hold'em Poker

Slide 6

Slide 6 text

Methodology
● Evolve agents with various combinations of diversity maintenance methods and fitness-based evolution
● Compare them in ten scenarios covering diverse types of hands and opponents
● Analyze the effects on diversity, performance, and behaviors

Slide 7

Slide 7 text

SBB architecture

Slide 8

Slide 8 text

Methodology: Fitness Function
● The performance of a player is measured by the average chips won per hand
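The fitness measure above can be illustrated with a minimal sketch. It assumes each hand is recorded as a signed chip delta (positive for chips won, negative for chips lost); the function name and the sample values are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def average_chips_per_hand(chip_deltas):
    """Return the mean chips won per hand (negative if the player lost overall)."""
    if not chip_deltas:
        raise ValueError("at least one hand is required")
    return mean(chip_deltas)


# Hypothetical results for five hands: positive = chips won, negative = chips lost.
results = [20, -10, 0, 50, -10]
fitness = average_chips_per_hand(results)
print(fitness)
```

Averaging per hand, rather than summing, keeps scores comparable between players that were evaluated on different numbers of hands.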

Slide 9

Slide 9 text

Methodology: Opponents
● Static opponents
● Bayesian opponents
● Hall of Fame opponents

Slide 10

Slide 10 text

Methodology: Diversity
● Diversity maintenance methods:
  ○ bid diversity
  ○ genotypic diversity
  ○ behavioral diversity
  ○ novelty search
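As a rough illustration of the last method, novelty search is commonly described as scoring an individual by its mean distance to its k nearest neighbors in behavior space. The sketch below follows that general definition only. The two-dimensional behavior vectors, the Euclidean distance, and the value of k are assumptions made for the example; the slides do not state which behavior descriptor or distance the thesis uses.

```python
import math


def novelty_score(behavior, others, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors."""
    if not others:
        raise ValueError("need at least one other behavior")
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Hypothetical behavior descriptors for a population of four agents.
population = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]

# Score each agent against every other agent in the population.
scores = [
    novelty_score(b, population[:i] + population[i + 1:], k=2)
    for i, b in enumerate(population)
]
most_novel = population[scores.index(max(scores))]
print(most_novel)
```

The agent far from the cluster receives the highest score, which is the pressure novelty search applies in place of (or alongside) fitness.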

Slide 11

Slide 11 text

Methodology: Inputs
● Groups
  ○ Game State
  ○ Opponent Model

Slide 12

Slide 12 text

Experiments
● Opponent Complexity Group
● Degrees of Diversity Group
● Diversity Models Group
● Analysis of the behaviors

Slide 13

Slide 13 text

Results

Slide 14

Slide 14 text

Opponent Complexity Group

Slide 15

Slide 15 text

Degrees of Diversity Group

Slide 16

Slide 16 text

Results: Diversity Models Group
● Comparisons using the cumulative plots
  ○ Diversity and Performance
  ○ 9 models in 10 scenarios
  ○ Tests: Friedman, Bonferroni-Dunn, and Nemenyi
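The Friedman test listed above ranks the models within each scenario and checks whether their average ranks differ more than chance would suggest. The sketch below computes the standard Friedman chi-square statistic from a scenarios-by-models score table. The helper names and the small 4x3 table are hypothetical illustration data, not results from the thesis, which compares 9 models in 10 scenarios.

```python
def average_ranks(rows, higher_is_better=True):
    """Rank the models within each scenario (1 = best), then average per model."""
    n_models = len(rows[0])
    totals = [0.0] * n_models
    for row in rows:
        order = sorted(range(n_models), key=lambda j: row[j],
                       reverse=higher_is_better)
        i = 0
        while i < n_models:
            # Tied scores share the mean of the rank positions they occupy.
            j = i
            while j + 1 < n_models and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1
            for pos in range(i, j + 1):
                totals[order[pos]] += shared
            i = j + 1
    return [t / len(rows) for t in totals]


def friedman_statistic(rows):
    """Friedman chi-square: 12N/(k(k+1)) * (sum of R_j^2 - k(k+1)^2/4)."""
    n, k = len(rows), len(rows[0])
    ranks = average_ranks(rows)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks)
                                     - k * (k + 1) ** 2 / 4)


# Hypothetical scores: 4 scenarios (rows) x 3 models (columns).
scores = [
    [0.9, 0.5, 0.1],
    [0.8, 0.6, 0.2],
    [0.7, 0.4, 0.3],
    [0.9, 0.3, 0.5],
]
ranks = average_ranks(scores)
stat = friedman_statistic(scores)
print(ranks, stat)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom. Post-hoc tests such as Bonferroni-Dunn and Nemenyi then work from the same average ranks to decide which pairs of models differ.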

Slide 17

Slide 17 text

Results: Diversity Models Group
● Most models were able to improve the diversity of the agents
● Two models translated diversity into performance

Slide 18

Slide 18 text

Results: Diversity Models Group
● The results indicate that novelty search alone does not work well for Texas Hold'em Poker
● The model with novelty search and fitness was significantly better than the one without fitness

Slide 19

Slide 19 text

Results: Analysis of Behaviors
● Formulated the hypothesis that novelty would incentivize bluffing
● The model with only novelty search bluffed as much as 3 of the other models, and significantly more than the other 5

Slide 20

Slide 20 text

Playing Styles (Balanced and Unbalanced)

Slide 21

Slide 21 text

Cumulative Score vs Cumulative Hands Won

Slide 22

Slide 22 text

Conclusions
● Diversity maintenance methods were able to improve diversity and performance
● Novelty search alone was not enough to improve either diversity or performance
● Diversity was useful mainly to increase the exploitation of chips per hand

Slide 23

Slide 23 text

Future Work
● Find a way to deploy a subset of the agents
● Further test diversity and novelty on a more ambiguous and complex version of Poker

Slide 24

Slide 24 text

Thank you!

Slide 25

Slide 25 text

Extras

Slide 26

Slide 26 text

Motivation
● How to deal with deceptive tasks?
  ○ Diversity maintenance
    ■ Genotypic diversity
    ■ Behavioral diversity
    ■ Novelty search

Slide 27

Slide 27 text

Background: Inputs
● Game State Inputs
  ○ Hand Strength, Effective Potential, Pot Odds, Betting Position, Round
● Opponent Model Inputs
  ○ Last Action, Overall Long-term Aggressiveness, Overall Short-term Aggressiveness, Hand Aggressiveness, Tight/Loose, Passive/Aggressive, Bluffing, Chips, Self Overall Short-term Aggressiveness

Slide 28

Slide 28 text

Background: Hands
● Each point corresponds to a poker hand.
● Training points are balanced in nine categories, per hand strength.
● Real-world hands: 60% weak, 30% intermediate, 10% strong.

Slide 29

Slide 29 text

Differences from Previous Work
● The main differences between the work developed by Alberta's group and this research:
  ○ evolve a diverse group of capable agents
  ○ agents evolve their strategies from scratch
  ○ agents work as teams of programs
  ○ it is not possible to use simulations
  ○ use poker as a domain, not as the goal

Slide 30

Slide 30 text

Overall flow of the classic Genetic Algorithm

Slide 31

Slide 31 text

Most used inputs per team

Slide 32

Slide 32 text

Possible Questions
● Training chart? Too noisy
● Tested before in other tasks
● Why not tournament? To focus on diversity
● Normalized between 0.0 and 10.0
  ○ Better for SBB due to previous work results

Slide 33

Slide 33 text

Bluffing
● Behavior
  ○ teams play fewer hands to avoid losing chips due to weaker hands
  ○ they also increase their bluffing, to exploit the opponent's weaker hands
● The teams are using their opponent modeling inputs to find when the opponent seems to have weaker hands, and then bluff