learning to play hearts - university of albertanathanst/talks/learning...learning in hearts nathan...

41
Learning to Play Hearts Nathan Sturtevant University of Alberta - AI Seminar February 3, 2006 1

Upload: others

Post on 01-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning to Play HeartsNathan Sturtevant

University of Alberta - AI SeminarFebruary 3, 2006

1

Page 2: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Hearts• Trick-based card game

2

2

Page 3: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Sample Trick

3

Page 4: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Sample Trick

4

Page 5: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Hearts• Trick-based card game

• Want to minimize your points

• One point for every heart (♥)

• 13 points for Q♠

• If one player takes all 26 points

(shoots the moon) others get 26 each

5

5

Page 6: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Challenge

• Learn to play the game of Hearts well:

• Multi-Player Game

• Imperfect Information

• Learning

6

6

Page 7: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Multi-Player Games

• A lot of work in two-player games:

• Checkers, chess, backgammon, scrabble, othello, go…

• Much less in multi-player games

7

7

Page 8: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Multi-Player Games

• Differences:

• Maxn algorithm; generalization of minimax

• Less pruning possible

• Weaker theoretical properties

8

8

Page 9: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Hearts Example

(0, 0, 13) (13, 0, 0)

1

3

2

3 3 3

(0, 0, 13) (0, 0, 13)

A♠ 3♣

K♠ Q♠ K♠ Q♠2

9

Page 10: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Imperfect Information• In practice we can’t see opponents cards

• Monte-Carlo Sampling

• Generate perfect-information sample hands for opponents

• Analyze samples

• Combine results

10

10

Page 11: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Previous Work

• Hearts program based on previous ideas

• Hand-tuned evaluation function

• Non-linear evaluation function

• Somewhat slow

• Plays as well (better than) best computers?

11

11

Page 12: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

12

Page 13: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Average Scores

Per Game Per Hand

Expert Program 56.1 5.16

Opponent Avg. 76.3 6.97

Played 90 games, each to 100 points.

13

Page 14: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Learning in Hearts• University of Mass. Course Project

(Perkins, 1998)

• Operational Advice

(Fürnkranz, et. al., 2000)

• State sampling with imperfect-information

(Fujita and Ishii, 2005)

14

14

Page 15: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Our Approach

• Use search-based approach to train

• Similar to what was used in Backgammon

• TD-Gammon plays at the level of the best humans

15

15

Page 16: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Backgammon

• Why did learning in backgammon work well?

• Already developed good neural networks

• Stochastic element helps exploration

• Good features

16

16

Page 17: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Hearts

• Promising domain for learning:

• Game fixed length (13 moves)

• Cards dealt randomly

• Occasionally get good cards

17

17

Page 18: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Hearts Difficulty

• Cards have relative value

• 5♣ is good when 2-4♣ already played

• 5♣ is bad when 6-A♣ already played

18

18

Page 19: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Approach

• Define features for game

• Use perceptron (regression) to predict game score given features

• Use maxn to play given predicted score

• Use TD(λ) to train

19

19

Page 20: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Features

• What features to use for each player?

• 52 cards they could have in their hand

• 52 cards they could have taken

• 104 features per player

• 416 total features

20

20

Page 21: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Valuable Feature• Interesting feature: P1 has the lowest ♥

• [P1 has 2♥] or

• [P1 has 3♥] and

[[P1 has taken 2♥] or [P2 has taken 2♥] [[P3 has taken 2♥] or [P4 has taken 2♥]]

• …

21

21

Page 22: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Features• We defined basic ‘atomic’ features

• Only evaluate features on trick boundaries

• Sample Features

• Which suits do we hold low/high cards

• Which suits are we ‘short’

• Which suits does the ‘leader’ have

22

22

Page 23: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Features

• These features still inadequate

• Combinations of features more interesting than ‘atomic’ features

• Combine features using AND operator

23

23

Page 24: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Learning Part I• Learn to avoid the Q♠

• 60 ‘atomic features’

• λ = 0.75

• α = 1/[13 x # active features]

• Predict expected points in game

• Train against previous program

• Randomly switch position each game

24

24

Page 25: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

QoS Features

2.75

3.00

3.25

3.50

3.75

4.00

4.25

Break Even 1x Features 2x Features 3x Features 4x Features

Games Played

Ave

rage

Sco

re

200k

25

Page 26: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Analysis

• What is the network learning

• Easily understand by examining weights assigned to feature sets

26

26

Page 27: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Features - Avoid Q♠Rank Weight We have We have We have Opponent

1 -0.103 1 low ♠ Lead Q♠ no ♠

2 -0.097 1 low ♠ No ♥ Lead Q♠ no ♠

3 -0.096 2 low ♠ K♠ Q♠ two ♠

4 -0.093 1 low ♠ No ♣ Lead Q♠ no ♠

5 -0.090 1 low ♠ No ♦ Lead Q♠ no ♠

148 -0.040 1 low ♠ Q♠ Lead no ♠

27

Page 28: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Features - Take Q♠Rank Weight We Have We have We have We have

1 0.125 Q♠ 1 low ♠ Lead

2 0.123 Q♠ 1 low ♠

3 0.117 Q♠ No ♣ No ♥ Lead

4 0.116 A/K/Q♠ Lead

5 0.112 Q♠ No ♣ No ♥ No ♦

28

Page 29: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Learning Part II

• Learn to avoid taking ♥

• Removed 14 Q♠-specific features

• 42 new point (♥) related features (0-13)

• Same learning parameters

29

29

Page 30: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Hearts Features

2.75

3.00

3.25

3.50

3.75

Break Even 1x Features 2x Features 3x Features

Games Played

Ave

rage

Sco

re

200k

30

Page 31: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Learning Part III

• Learn to play the full game of Hearts

• No ‘shooting the moon’

• Take best 10,000 features from the Q♠

• Take best 1,000 features from ♥ points

• Train against expert and by self-play

31

31

Page 32: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Steady-State Evaluation

• Test the learned networks

• 4 players, 2 player types

• 24 ways of assigning player types

• Repeat each hand 24 - 2 times

• Play 100 hands

32

32

Page 33: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

ArrangementPlayer 1 Player 2 Player 3 Player 4

Expert Trained Trained Trained

Trained Expert Trained Trained

Expert Expert Trained Trained

Trained Trained Expert Trained

Expert Trained Expert Trained

Trained Expert Expert Trained

Expert Expert Expert Trained

33

Page 34: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Games Against Expert

5.05.56.06.57.07.58.08.5

Break-Even Expert Trained Self Trained

Games Played

Ave

rage

Sco

re

10k

34

Page 35: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Games Against Expert

5.05.56.06.57.07.58.08.5

Break Even Expert Trained Self Trained

Games Played

Ave

rage

Sco

re

300k

35

Page 36: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Trained Play

5.5

6.0

6.5

7.0

7.5

Self Trained Expert Trained Break Even

Games Played

Ave

rage

Sco

re

250k

36

Page 37: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Summary

• Learned to beat ‘expert’ by a large margin

• Program plays well, but lacks deep analysis of game

37

37

Page 38: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Ongoing Work• Learn with ‘shooting the moon’ turned on

• Duplicate all features three times

• Points are split

• We have all the points

• Someone else has all the points

• Compare steady-state play

38

38

Page 39: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Steady-State Results(1) (2) Score (1) Score (2)

Self-Trained90k games Expert 6.27 7.10

Expert-Trained220k games Expert 6.17 7.35

Self-Trained90k games

Expert-Trained220k games 6.35 7.03

39

Page 40: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

Future Work• Optimize code

• Different algorithm than maxn

• Better ways of combining features

• Play against humans

• Passing cards

• Imperfect information

40

40

Page 41: Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan Sturtevant Adam White Hearts • Trick-based card game • Want to minimize your

Learning In Hearts Nathan SturtevantAdam White

• Joint work with Adam White

• Thanks to Rich Sutton and Mark Ring

Thank You

41

41