Wisdom of Crowds in Human Memory: Reconstructing Events by Aggregating Memories across Individuals

Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine

Joint work with: Brent Miller, Pernille Hemmer, Mike Yi, Michael Lee, Bill Batchelder, Paolo Napoletano

Wisdom of crowds phenomenon

Group estimate often performs as well as or better than best individual in the group

Examples of wisdom of crowds phenomenon

Who Wants to Be a Millionaire?

Galton’s Ox (1907): Median of individual estimates comes close to true answer

[Figure: histogram; y-axis: Number of People]

Recollection of 9/11 Event Sequence (Altmann, 2003)

A A A A A A A A A A A A C C A A A A A A A A C E E E
B B B B B C C D D B C B A A B B B B C C D E D A C C
C C D C D B B B B D B E B B D D E F D E F B A B A A
D E F D C D E F F C D C D E E F D D B B B C B C B D
F D C E F F D C E E E D F D F E F C F D C D F D D B
E F E F E E F E C F F F E F C C C E E F E F E F F F

[Figure legend: Correct; Most frequent response (i.e., mode)]

A = One plane hits the WTC
B = A second plane hits the WTC
C = One plane crashes into the Pentagon
D = One tower at the WTC collapses
E = One plane crashes in Pennsylvania
F = A second tower at the WTC collapses

Research goal: aggregating responses

Individual orderings: D A B C, A B D C, B A D C, A C B D, A D B C

→ Aggregation Algorithm →

group answer: A B C D

=?

ground truth: A B C D

Task constraints

No communication between individuals

There is always a true answer (ground truth)

Aggregation algorithm never has access to ground truth: the methods are unsupervised, and ground truth is used only for evaluation

Is this research part of psychology?

Yes

Effective aggregation of human judgments requires cognitive models


Overview of talk

Ordering problems: what is the order of US presidents?

Matching problems: memory for pairs (what object was paired with what person?)

Recognition memory problems: what words were studied?

Experts in crowds: how to find experts in the absence of feedback

Ulysses S. Grant

James Garfield

Rutherford B. Hayes

Abraham Lincoln

Andrew Johnson

James Garfield

Ulysses S. Grant

Rutherford B. Hayes

Andrew Johnson

Abraham Lincoln

Recollecting Order from Declarative Memory

time

Place these presidents in the correct order

Experiment: Order all 44 US presidents

Similar to Roediger and Crowder (1976); Healy, Havas, Parker (2000)

Methods:
26 participants (college undergraduates)
Names of presidents written on cards
Cards could be shuffled on large table

Measuring performance

Kendall’s Tau: the number of adjacent pair-wise swaps

Ordering by individual: A B E C D
True order: A B C D E

A B E C D → A B C E D (τ = 1) → A B C D E (τ = 1 + 1 = 2)
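The swap count can be checked with a short sketch. It counts the item pairs whose relative order differs between a recalled ordering and the true ordering, which equals the minimum number of adjacent swaps. The function name `kendall_tau_distance` is illustrative and not taken from the talk.

```python
from itertools import combinations

def kendall_tau_distance(ordering, truth):
    """Count item pairs whose relative order differs between the two orderings.

    This equals the minimum number of adjacent pairwise swaps needed to
    turn `ordering` into `truth`.
    """
    position = {item: i for i, item in enumerate(ordering)}
    return sum(
        1
        for a, b in combinations(truth, 2)  # a precedes b in the true order
        if position[a] > position[b]        # ...but follows it in the recalled one
    )

# The slide's example: individual ordering A B E C D vs. true order A B C D E.
print(kendall_tau_distance("ABECD", "ABCDE"))  # 2
```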

Empirical Results

[Figure: performance of each individual; x-axis: Individuals (ordered from best to worst); reference level: random guessing]

A Bayesian (generative) approach

Individual orderings: D A B C, A B D C, B A D C, A C B D, A D B C

← Generative Model ←

A B C D (latent random variable): shared group knowledge

Bayesian models

We extend two models:
Thurstone’s (1927) model
Estes’ (1972) perturbation model

Bayesian Thurstonian Approach

Each item has a true coordinate on some dimension

… but there is noise because of encoding and/or retrieval error

Each person’s mental representation is based on (latent) samples of these distributions

The observed ordering is based on the ordering of the samples

People draw from distributions with common means but different variances

[Figure: distributions for items A, B, C with each person's samples. Person 1, observed ordering: A < B < C. Person 2, observed ordering: A < C < B]
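The generative story above can be sketched as a small simulation: each person adds Gaussian noise to the shared item coordinates and reports the order of the noisy samples. This is a minimal sketch of the generative direction only, not the Bayesian inference used in the talk. The coordinates and noise levels below are made-up values for illustration.

```python
import random

def thurstonian_ordering(means, sigma, rng):
    """Sample one person's ordering: add Gaussian noise to each item's
    true coordinate, then sort items by their noisy samples."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in means.items()}
    return sorted(samples, key=samples.get)

rng = random.Random(0)
means = {"A": 1.0, "B": 2.0, "C": 3.0}  # hypothetical true coordinates
# Common means, but a different (hypothetical) noise level per person.
for person, sigma in [("Person 1", 0.2), ("Person 2", 1.5)]:
    print(person, " < ".join(thurstonian_ordering(means, sigma, rng)))
```

A person with a small sigma will usually reproduce the true order, while a person with a large sigma will often swap neighboring items.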

Bayesian Inference Problem

Given the orderings from individuals, infer:
the mean for each item
the standard deviation for each person

Markov Chain Monte Carlo (MCMC)

Inferred Distributions for 44 US Presidents

George Washington (1)
John Adams (2)
Thomas Jefferson (3)
James Madison (4)
James Monroe (6)
John Quincy Adams (5)
Andrew Jackson (7)
Martin Van Buren (8)
William Henry Harrison (21)
John Tyler (10)
James Knox Polk (18)
Zachary Taylor (16)
Millard Fillmore (11)
Franklin Pierce (19)
James Buchanan (13)
Abraham Lincoln (9)
Andrew Johnson (12)
Ulysses S. Grant (17)
Rutherford B. Hayes (20)
James Garfield (22)
Chester Arthur (15)
Grover Cleveland 1 (23)
Benjamin Harrison (14)
Grover Cleveland 2 (25)
William McKinley (24)
Theodore Roosevelt (29)
William Howard Taft (27)
Woodrow Wilson (30)
Warren Harding (26)
Calvin Coolidge (28)
Herbert Hoover (31)
Franklin D. Roosevelt (32)
Harry S. Truman (33)
Dwight Eisenhower (34)
John F. Kennedy (37)
Lyndon B. Johnson (36)
Richard Nixon (39)
Gerald Ford (35)
James Carter (38)
Ronald Reagan (40)
George H.W. Bush (41)
William Clinton (42)
George W. Bush (43)
Barack Obama (44)

[Figure legend: median and minimum sigma]

Model can predict individual performance

[Figure: scatter plot, one point per individual; x-axis: inferred noise level for each individual; y-axis: distance to ground truth; R = 0.941]

(Weak) Wisdom of Crowds Effect

[Figure: performance of each individual compared with the Thurstonian model; x-axis: Individuals]

Model’s ordering is as good as the best individual (but not better)

Extension of Estes (1972) Perturbation Model

Main idea: item order is perturbed locally

Our extension: perturbation noise varies between individuals and items

True order: A B C D E

[Figure: recalled order, with items displaced locally from their true positions]
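The idea of local perturbation can be illustrated with a deliberately simplified process in which items occasionally trade places with an adjacent neighbor. This is an assumption-laden sketch, not the extended Estes model from the talk: the single parameter `theta`, the number of `steps`, and the function name `perturb` are all hypothetical.

```python
import random

def perturb(order, theta, steps, rng):
    """At each step, visit each position and, with probability theta,
    swap the item there with a randomly chosen adjacent neighbor."""
    recalled = list(order)
    n = len(recalled)
    for _ in range(steps):
        for i in range(n):
            if rng.random() < theta:
                j = i + rng.choice([-1, 1])
                if 0 <= j < n:  # ignore swaps that would leave the list
                    recalled[i], recalled[j] = recalled[j], recalled[i]
    return recalled

rng = random.Random(1)
print(perturb("ABCDE", theta=0.2, steps=3, rng=rng))
```

Because every move is a swap of neighbors, items tend to stay close to their true positions, which is the qualitative pattern the perturbation account describes.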

Inferred Perturbation Matrix and Item Accuracy

1. George Washington (1)
2. John Adams (2)
3. Thomas Jefferson (3)
4. James Madison (4)
5. James Monroe (6)
6. John Quincy Adams (5)
7. Andrew Jackson (7)
8. Martin Van Buren (8)
9. William Henry Harrison (21)
10. John Tyler (11)
11. James Knox Polk (16)
12. Zachary Taylor (18)
13. Millard Fillmore (9)
14. Franklin Pierce (20)
15. James Buchanan (13)
16. Abraham Lincoln (15)
17. Andrew Johnson (10)
18. Ulysses S. Grant (17)
19. Rutherford B. Hayes (19)
20. James Garfield (22)
21. Chester Arthur (14)
22. Grover Cleveland 1 (23)
23. Benjamin Harrison (12)
24. Grover Cleveland 2 (25)
25. William McKinley (24)
26. Theodore Roosevelt (28)
27. William Howard Taft (26)
28. Woodrow Wilson (30)
29. Warren Harding (27)
30. Calvin Coolidge (29)
31. Herbert Hoover (31)
32. Franklin D. Roosevelt (32)
33. Harry S. Truman (33)
34. Dwight Eisenhower (34)
35. John F. Kennedy (35)
36. Lyndon B. Johnson (36)
37. Richard Nixon (38)
38. Gerald Ford (37)
39. James Carter (39)
40. Ronald Reagan (40)
41. George H.W. Bush (41)
42. William Clinton (42)
43. George W. Bush (43)
44. Barack Obama (44)

[Figure: inferred perturbation matrix; x-axis: Output position; y-axis: True position. Side panel: item accuracy, with Abraham Lincoln, Richard Nixon, and James Carter labeled]

Strong wisdom of crowds effect

[Figure: performance of each individual compared with the Thurstonian model and the perturbation model; x-axis: Individuals]

Perturbation model’s ordering is better than the best individual

Alternative Heuristic Models

Many heuristic methods from voting theory, e.g., the Borda count method

Suppose we have 10 items:
assign a count of 10 to the first item, 9 to the second item, etc.
add counts over individuals
order items by the Borda count

i.e., rank by average rank across people
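The Borda steps above can be sketched in a few lines. The example input assumes that the letter string on the research-goal slide splits into five individual orderings of four items; that grouping, the function name `borda_order`, and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import defaultdict

def borda_order(orderings):
    """Aggregate orderings with the Borda count: with n items, the first
    item in each ordering scores n, the second n - 1, and so on."""
    scores = defaultdict(int)
    for ordering in orderings:
        n = len(ordering)
        for rank, item in enumerate(ordering):
            scores[item] += n - rank
    # Highest total first; ties are broken alphabetically (an arbitrary choice).
    return sorted(scores, key=lambda item: (-scores[item], item))

# Assumed grouping of the slide's example into five individual orderings.
orderings = ["DABC", "ABDC", "BADC", "ACBD", "ADBC"]
print(borda_order(orderings))
```

Because every person's ordering contributes equally, the Borda count cannot down-weight noisy individuals, which is one way it differs from the Bayesian models.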


Model Comparison

[Figure: performance of each individual compared with the Thurstonian model, the perturbation model, and the Borda count; x-axis: Individuals]

Recollecting order from episodic memory

http://www.youtube.com/watch?v=a6tSyDHXViM&feature=related

Place scenes in correct order (serial recall)

[Figure: scenes A, B, C, D placed along a time axis]

Recollecting Order from Episodic Memory


Study this sequence of images

Place the images in correct sequence (serial recall)

[Figure: ten images labeled A to J]

Average results across 6 problems

[Figure: mean result for each individual compared with the Thurstonian model, the perturbation model, and the Borda count; x-axis: Individuals]

Example calibration result for individuals

[Figure: scatter plot, one point per individual (pizza sequence; perturbation model); x-axis: inferred noise level; y-axis: distance to ground truth; R = 0.920]


Study these combinations

[Figure: items 1 to 5 paired with items A to E]

Find all matching pairs


Results across 8 problems

[Figure: mean accuracy (0 to 1) for each individual compared with Bayesian Matching and the Hungarian Algorithm; x-axis: Individuals; y-axis: Mean Accuracy]

General Knowledge Matching Problems


Dutch

Danish

Yiddish

Thai

Vietnamese

Chinese

Georgian

Russian

Japanese

A

B

C

D

E

F

G

H

I

godt nytår

gelukkig nieuwjaar

a gut yohr

С Новым Годом

สวัสดีปีใหม่

Chúc Mừng Nǎm Mới

გილოცავთ ახალ წელს

Modeling Results – Declarative Tasks

[Figure: mean accuracy (0 to 1) for each individual compared with Bayesian Matching and the Hungarian Algorithm; x-axis: Individuals; y-axis: Mean Accuracy]


Listen to these words…


Experiment

Study list: 10 lists of 15 spoken words

Recognition memory test: targets (15 items), lure (1 item), related distractors (15 items), unrelated distractors (15 items)

Confidence ratings: 5-point scale

1 = definitely not on list; 2 = probably not on list; 3 = not sure; 4 = probably on list; 5 = sure it was on the list

Mean confidence ratings for 12 subjects

[Figure: one panel per subject (Individual 1 to Individual 12) plus a panel labeled METHOD1; each shows mean confidence (1 to 5) for item types T (targets), L (lure), R (related distractors), and U (unrelated distractors)]

ROC plots for individuals

[Figure: ROC curves for Individual 1 to Individual 12; x-axis: False Alarm Rate; y-axis: Hit Rate]

Heuristic Aggregation Method

Group confidence = mean confidence rating across individuals
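A minimal sketch of this heuristic follows: average each test word's confidence rating across people. The ratings below are invented for illustration and are not data from the experiment. The function name `group_confidence` is likewise hypothetical.

```python
from statistics import mean

def group_confidence(ratings):
    """ratings[p][w] = confidence (1-5) that person p gave to test word w.
    Returns the mean rating per word across people."""
    return [mean(word_ratings) for word_ratings in zip(*ratings)]

# Hypothetical ratings from three people for four test words.
ratings = [
    [5, 4, 2, 1],
    [4, 5, 3, 1],
    [5, 3, 1, 2],
]
print(group_confidence(ratings))
```

Words can then be ranked by group confidence, so that an ROC curve for the aggregate is computed in the same way as for a single individual.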


Performance of Aggregate

[Figure: ROC curves for Individual 1 to Individual 12 and for the AGGREGATE; x-axis: False Alarm Rate; y-axis: Hit Rate]

Performance of Individuals and Aggregate

[Figure: AUC (0.75 to 1) for each individual and for METHOD1; x-axis: Individuals; y-axis: AUC]

Problem with aggregation method

Aggregate also suffers from false memories

[Figure: confidence (1 to 5) by item type T, L, R, U; y-axis: Confidence]

Potential Solution: identify group signature of false memories

[Figure: four panels showing the distribution of confidence ratings (1 to 5) for Study items, Lure, Related Distractors, and Unrelated Distractors]


Experiment

78 participants

17 ordering problems, each with 10 items:
Chronological events
Physical measures
Purely ordinal problems, e.g., Ten Amendments, Ten Commandments

Ordering states west-east


Oregon (1)

Utah (2)

Nebraska (3)

Iowa (4)

Alabama (6)

Ohio (5)

Virginia (7)

Delaware (8)

Connecticut (9)

Maine (10)

Ordering Ten Amendments


Freedom of speech & religion (1)

Right to bear arms (2)

No quartering of soldiers (4)

No unreasonable searches (3)

Due process (5)

Trial by Jury (6)

Civil Trial by Jury (7)

No cruel punishment (8)

Right to non-specified rights (10)

Power for the States & People (9)

Ordering Ten Commandments


Worship any other God (1)

Make a graven image (7)

Take the Lord's name in vain (2)

Break the Sabbath (3)

Dishonor your parents (4)

Murder (6)

Commit adultery (8)

Steal (5)

Bear false witness (9)

Covet (10)

Question

How many individuals do we need to average over?


Effect of Group Size: random groups

[Figure: performance as a function of group size (0 to 80) for randomly composed groups; x-axis: Group Size; curve labels: T=0, T=2, T=12]

How effective are small groups of experts?

Want to find experts endogenously – without feedback

Approach: select individuals with the smallest estimated noise levels based on previous tasks

We are identifying general expertise (“Spearman’s g”)
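The selection step can be sketched as follows, assuming each person already has a noise estimate inferred from previous tasks. The names and values are hypothetical. How the estimates are obtained (for example from the Thurstonian model) is outside this sketch.

```python
def select_experts(noise_by_person, k):
    """Return the k people with the smallest estimated noise levels."""
    return sorted(noise_by_person, key=noise_by_person.get)[:k]

# Hypothetical noise estimates from previous tasks (lower = more accurate).
noise = {"p1": 0.9, "p2": 0.3, "p3": 1.4, "p4": 0.5}
print(select_experts(noise, 2))  # ['p2', 'p4']
```

No ground truth is used at any point: the ranking of people depends only on noise levels estimated from the crowd's own responses.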


Group Composition based on prior performance

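[Figure: performance as a function of group size (0 to 80), with the best individuals added first; x-axis: Group size (best individuals first); curves indexed by the number of previous tasks T, with labels T = 0, T = 2, T = 8, and T = 12 appearing in the extracted text]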

Endogenous: no feedback required

Exogenous: selecting people based on actual performance

[Figure: two panels showing performance as a function of group size (0 to 40), one for endogenous and one for exogenous selection]

Summary

Aggregation of combinatorially complex data: going beyond numerical estimates or multiple choice questions

Incorporate individual differences: going beyond models that treat every vote equally; assume some individuals might be “experts”

Take cognitive processes into account: going beyond mere statistical aggregation

That’s all


Do the experiments yourself:

http://psiexp.ss.uci.edu/


Predictive Rankings: fantasy football

South Australian Football League (32 people rank 9 teams)

[Figure: performance of each individual compared with the Thurstonian model, the perturbation model, and the Borda count; x-axis: Individuals]

Australian Football League (29 people rank 16 teams)

[Figure: performance of each individual compared with the same three aggregation methods; x-axis: Individuals]

Online Experiments

Experiment 1 (Prior knowledge): http://madlab.ss.uci.edu/dem2/examples/

Experiment 2a (Serial Recall), study sequence of still images: http://madlab.ss.uci.edu/memslides/

Experiment 2b (Serial Recall), study video: http://madlab.ss.uci.edu/dem/

MDS solution of pairwise tau distances

[Figure: MDS solution for the states west-east problem; points: Individuals, Truth, Thurstonian Model; each point is labeled with its distance to truth]

MDS solution of pairwise tau distances

[Figure: MDS solution for the ten commandments problem; points: Individuals, Truth, Thurstonian Model; each point is labeled with its distance to truth]

Thurstonian Model – stereotyped event sequences

[Figure: three panels, each listing event1 to event10 with a number in parentheses, alongside a plot annotated with a correlation]

Bus (Recall): event1 (1), event2 (2), event3 (3), event4 (4), event5 (5), event6 (7), event7 (6), event8 (8), event9 (9), event10 (10); R = 0.890

Morning (Recall): R = 0.982

Wedding (Recall): R = 0.973

Thurstonian Model – “random” videos

[Figure: three panels, each listing event1 to event10 with a number in parentheses, alongside a plot annotated with a correlation]

Yogurt (Recall): R = 0.908

Pizza (Recall): event1 (1), event2 (3), event3 (4), event4 (5), event5 (2), event6 (6), event7 (7), event8 (9), event9 (10), event10 (8); R = 0.851

Clay (Recall): R = 0.928

Heuristic Aggregation Approach

Combinatorial optimization problem: maximize agreement in assigning N items to N responses

Hungarian algorithm:
construct a count matrix M, where M_ij = number of people that paired item i with response j
find row and column permutations to maximize the diagonal sum
runs in O(n^3)
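The objective can be illustrated on a tiny count matrix. The sketch below is a brute-force stand-in rather than the Hungarian algorithm itself: it tries every assignment, which is only practical for small N, but it maximizes the same agreement total. The matrix values and the function name `best_assignment` are invented for illustration.

```python
from itertools import permutations

def best_assignment(count):
    """Return the assignment of items to responses that maximizes total agreement.

    count[i][j] = number of people who paired item i with response j.
    Brute force over all permutations: only practical for small N. The
    Hungarian algorithm finds the same optimum in O(n^3).
    """
    n = len(count)
    return max(
        permutations(range(n)),
        key=lambda perm: sum(count[i][perm[i]] for i in range(n)),
    )

# Hypothetical 3x3 count matrix (not data from the talk).
count = [
    [7, 3, 0],
    [2, 3, 5],
    [1, 4, 5],
]
print(best_assignment(count))  # (0, 2, 1): item 0 -> response 0, 1 -> 2, 2 -> 1
```

Note that the best joint assignment is not the same as taking each row's most frequent response: rows 1 and 2 both favor response 2, and the one-to-one constraint forces one of them to take its second choice.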


Hungarian Algorithm Example

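| Phrase | Dutch | Danish | French | Japanese | Spanish | Arabic | Chinese | German | Italian | Russian | Thai | Vietnamese | Welsh | Georgian | Yiddish |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gelukkig Nieuwjaar | 7 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 2 |
| godt nytår | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 1 | 3 | 2 |
| bonne année | 0 | 0 | 14 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (phrase not recovered) | 0 | 0 | 0 | 9 | 0 | 0 | 2 | 0 | 1 | 0 | 3 | 0 | 0 | 0 | 0 |
| feliz año nuevo | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| عام سعيد | 0 | 1 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (phrase not recovered) | 0 | 0 | 0 | 2 | 0 | 0 | 12 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ein gutes neues Jahr | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| felice anno nuovo | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 1 | 0 | 0 | 0 | 0 | 0 |
| С Новым Годом | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | 0 | 1 | 2 | 0 |
| สวัสดีปีใหม่ | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 7 | 1 | 1 | 4 | 0 |
| Chúc Mừng Nǎm Mới | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 11 | 1 | 2 | 0 |
| Blwyddyn Newydd Dda | 0 | 4 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 6 | 1 | 2 |
| გილოცავთ ახალ წელს | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 2 | 0 | 1 | 6 |
| a gut yohr | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 2 | 2 | 2 |

[Legend: cell markers distinguish correct and incorrect assignments]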

What are methods for finding experts?

1) Self-reported expertise: unreliable; has led to claims of “myth of expertise”

2) Based on explicit scores, by comparing to ground truth; but ground truth might not be immediately available

3) Endogenously discover experts: use the crowd to discover experts; small groups of experts can be effective

Modified Perturbation Model

Predicting problem difficulty

[Figure: scatter plot of the 17 ordering problems; x-axis: std( ), dispersion of noise levels across individuals; y-axis: distance of group answer to ground truth; R = -0.752; labeled problems include “ordering states geographically” and “city size rankings”]

Mean p( “yes” )

[Figure: one panel per participant (aaa, ardor, azs, incognito, indigo, jshi, nobody, peter griffin, piper michelle, plutonium, scott bakula, sky) plus a panel labeled METHOD1; each shows mean p(“yes”) from 0 to 1 for item types T, L, R, U]

Note: confidence ratings were converted to yes/no judgments. Yes = rating >= 3; No = rating < 3
