Wisdom of Crowds in Human Memory: Reconstructing Events by Aggregating Memories across Individuals
Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine
Joint work with: Brent Miller, Pernille Hemmer, Mike Yi, Michael Lee, Bill Batchelder, Paolo Napoletano
Wisdom of crowds phenomenon
Group estimate often performs as well as or better than best individual in the group
Examples of wisdom of crowds phenomenon
Who Wants to Be a Millionaire?
Galton’s Ox (1907): median of individual estimates comes close to the true answer
[Figure; y-axis: number of people]
Recollection of 9/11 Event Sequence (Altmann, 2003)
A A A A A A A A A A A A C C A A A A A A A A C E E E
B B B B B C C D D B C B A A B B B B C C D E D A C C
C C D C D B B B B D B E B B D D E F D E F B A B A A
D E F D C D E F F C D C D E E F D D B B B C B C B D
F D C E F F D C E E E D F D F E F C F D C D F D D B
E F E F E E F E C F F F E F C C C E E F E F E F F F
Legend: Correct; Most frequent response (i.e., mode)
A = One plane hits the WTC
B = A second plane hits the WTC
C = One plane crashes into the Pentagon
D = One tower at the WTC collapses
E = One plane crashes in Pennsylvania
F = A second tower at the WTC collapses
Research goal: aggregating responses
Individual responses: D A B C; A B D C; B A D C; A C B D; A D B C
→ Aggregation algorithm → group answer: A B C D
Group answer =? ground truth: A B C D
Task constraints
No communication between individuals
There is always a true answer (ground truth)
Aggregation algorithm never has access to ground truth: unsupervised methods; ground truth only used for evaluation
Is this research part of psychology?
Yes
Effective aggregation of human judgments requires cognitive models
Overview of talk
Ordering problems: what is the order of US presidents?
Matching problems: memory for pairs (what object was paired with what person?)
Recognition memory problems: what words were studied?
Experts in crowds: how to find experts in the absence of feedback
Recollecting Order from Declarative Memory
Place these presidents in the correct order (along a time axis)
Ulysses S. Grant, James Garfield, Rutherford B. Hayes, Abraham Lincoln, Andrew Johnson
James Garfield, Ulysses S. Grant, Rutherford B. Hayes, Andrew Johnson, Abraham Lincoln
Experiment: Order all 44 US presidents
Similar to Roediger and Crowder (1976); Healy, Havas, Parker (2000)
Methods: 26 participants (college undergraduates); names of presidents written on cards; cards could be shuffled on a large table
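Measuring performance
Kendall’s tau: the number of adjacent pairwise swaps needed to turn an individual’s ordering into the true order
Ordering by individual: A B E C D
True order: A B C D E
A B E C D → A B C E D (1 swap) → A B C D E (1 + 1 = 2 swaps)
Kendall’s tau = 2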
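The swap count on this slide can be checked with a short sketch. It relies on a standard fact: the minimum number of adjacent swaps between two orderings equals the number of item pairs that appear in opposite relative order. The function name `kendall_tau_distance` is an illustrative choice, not something from the talk.

```python
from itertools import combinations

def kendall_tau_distance(ordering, truth):
    """Count item pairs whose relative order differs between two orderings.

    This equals the minimum number of adjacent swaps needed to turn
    `ordering` into `truth`.
    """
    position = {item: index for index, item in enumerate(ordering)}
    # For every pair (a, b) with a before b in the true order, the pair is
    # discordant when the individual placed b before a.
    return sum(1 for a, b in combinations(truth, 2) if position[a] > position[b])

# The example from the slide: individual ordering A B E C D, true order A B C D E.
distance = kendall_tau_distance("ABECD", "ABCDE")
print(distance)  # 2
```

The two discordant pairs here are (C, E) and (D, E), matching the two swaps shown on the slide.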
Empirical Results
[Figure: performance of individuals, ordered from best to worst, with a reference level for random guessing]
A Bayesian (generative) approach
Shared group knowledge: A B C D (latent random variable)
→ Generative model →
Observed individual responses: D A B C; A B D C; B A D C; A C B D; A D B C
Bayesian models
We extend two models: Thurstone’s (1927) model and the Estes (1972) perturbation model
Bayesian Thurstonian Approach
Each item (A, B, C) has a true coordinate on some dimension
… but there is noise because of encoding and/or retrieval error
Each person’s mental representation is based on (latent) samples of these distributions
The observed ordering is based on the ordering of the samples (Person 1, observed ordering: A < B < C)
People draw from distributions with common means but different variances (Person 1, observed ordering: A < B < C; Person 2, observed ordering: A < C < B)
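The generative story on these slides can be sketched in a few lines. This is a minimal simulation under stated assumptions: Gaussian noise, made-up item means, and made-up per-person standard deviations; the function name `simulate_ordering` is hypothetical and the talk does not specify an implementation.

```python
import random

def simulate_ordering(means, sigma, rng):
    """Draw one noisy sample per item and order the items by their samples."""
    # Each item has a shared true coordinate (its mean); a person perceives
    # it with Gaussian noise whose spread `sigma` is specific to that person.
    samples = {item: rng.gauss(mu, sigma) for item, mu in means.items()}
    return sorted(samples, key=samples.get)

# Hypothetical item coordinates, shared by everyone in the group.
means = {"A": 1.0, "B": 2.0, "C": 3.0}
rng = random.Random(0)

precise_person = simulate_ordering(means, 0.1, rng)  # low noise
noisy_person = simulate_ordering(means, 2.0, rng)    # high noise
print(precise_person, noisy_person)
```

With a small `sigma` the simulated ordering almost always matches the true order, while a large `sigma` produces frequent reversals, which is the individual difference the model is meant to capture.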
Bayesian Inference Problem
Given the orderings from individuals, infer: the mean for each item; the standard deviation for each person
Inference method: Markov chain Monte Carlo (MCMC)
Inferred Distributions for 44 US Presidents
George Washington (1)
John Adams (2)
Thomas Jefferson (3)
James Madison (4)
James Monroe (6)
John Quincy Adams (5)
Andrew Jackson (7)
Martin Van Buren (8)
William Henry Harrison (21)
John Tyler (10)
James Knox Polk (18)
Zachary Taylor (16)
Millard Fillmore (11)
Franklin Pierce (19)
James Buchanan (13)
Abraham Lincoln (9)
Andrew Johnson (12)
Ulysses S. Grant (17)
Rutherford B. Hayes (20)
James Garfield (22)
Chester Arthur (15)
Grover Cleveland 1 (23)
Benjamin Harrison (14)
Grover Cleveland 2 (25)
William McKinley (24)
Theodore Roosevelt (29)
William Howard Taft (27)
Woodrow Wilson (30)
Warren Harding (26)
Calvin Coolidge (28)
Herbert Hoover (31)
Franklin D. Roosevelt (32)
Harry S. Truman (33)
Dwight Eisenhower (34)
John F. Kennedy (37)
Lyndon B. Johnson (36)
Richard Nixon (39)
Gerald Ford (35)
James Carter (38)
Ronald Reagan (40)
George H.W. Bush (41)
William Clinton (42)
George W. Bush (43)
Barack Obama (44)
[Figure: inferred distribution for each president; legend: median and minimum sigma]
Model can predict individual performance
[Figure: scatter plot of the inferred noise level for each individual against that individual’s distance to ground truth; R = 0.941]
(Weak) Wisdom of Crowds Effect
[Figure: performance of individuals compared with the Thurstonian model; x-axis: individuals]
Model’s ordering is as good as the best individual (but not better)
Extension of Estes (1972) Perturbation Model
Main idea: item order is perturbed locally
Our extension: perturbation noise varies between individuals and items
[Figure: true order A B C D E and an example recalled order in which items are displaced locally]
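The idea of local perturbation can be illustrated with a deliberately simplified simulation. This is not the Estes (1972) model or the extension described in the talk; it is a stand-in that assumes each adjacent pair is swapped with a fixed probability on each of a few passes. The names `perturb`, `swap_prob`, and `n_steps` are hypothetical.

```python
import random

def perturb(order, swap_prob, n_steps, rng):
    """Return a copy of `order` after repeated random adjacent swaps."""
    items = list(order)
    for _ in range(n_steps):
        for i in range(len(items) - 1):
            # With probability swap_prob, exchange two neighbouring items,
            # so an item can only drift a short distance from its true slot.
            if rng.random() < swap_prob:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

rng = random.Random(0)
recalled = perturb("ABCDE", swap_prob=0.2, n_steps=2, rng=rng)
print(recalled)
```

Letting `swap_prob` differ by person, or by item, would be one simple way to mimic the individual and item differences mentioned on the slide.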
Inferred Perturbation Matrix and Item Accuracy
1. George Washington (1)
2. John Adams (2)
3. Thomas Jefferson (3)
4. James Madison (4)
5. James Monroe (6)
6. John Quincy Adams (5)
7. Andrew Jackson (7)
8. Martin Van Buren (8)
9. William Henry Harrison (21)
10. John Tyler (11)
11. James Knox Polk (16)
12. Zachary Taylor (18)
13. Millard Fillmore (9)
14. Franklin Pierce (20)
15. James Buchanan (13)
16. Abraham Lincoln (15)
17. Andrew Johnson (10)
18. Ulysses S. Grant (17)
19. Rutherford B. Hayes (19)
20. James Garfield (22)
21. Chester Arthur (14)
22. Grover Cleveland 1 (23)
23. Benjamin Harrison (12)
24. Grover Cleveland 2 (25)
25. William McKinley (24)
26. Theodore Roosevelt (28)
27. William Howard Taft (26)
28. Woodrow Wilson (30)
29. Warren Harding (27)
30. Calvin Coolidge (29)
31. Herbert Hoover (31)
32. Franklin D. Roosevelt (32)
33. Harry S. Truman (33)
34. Dwight Eisenhower (34)
35. John F. Kennedy (35)
36. Lyndon B. Johnson (36)
37. Richard Nixon (38)
38. Gerald Ford (37)
39. James Carter (39)
40. Ronald Reagan (40)
41. George H.W. Bush (41)
42. William Clinton (42)
43. George W. Bush (43)
44. Barack Obama (44)
[Figure: inferred perturbation matrix; x-axis: output position; y-axis: true position. Side panel: item accuracy, with Abraham Lincoln, Richard Nixon, and James Carter labeled]
Strong wisdom of crowds effect
[Figure: performance of individuals compared with the Thurstonian model and the perturbation model; x-axis: individuals]
Perturbation model’s ordering is better than the best individual
Alternative Heuristic Models
Many heuristic methods from voting theory, e.g., the Borda count method
Suppose we have 10 items:
assign a count of 10 to the first item, 9 to the second item, etc.
add counts over individuals
order items by the Borda count
i.e., rank by average rank across people
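The Borda steps listed above can be sketched as follows. The three example rankings are made up for illustration, and the alphabetical tie-break is an assumption, since the slide does not say how ties are resolved.

```python
def borda_order(rankings):
    """Aggregate rankings by Borda count and return the group ordering."""
    n_items = len(rankings[0])
    scores = {}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            # First place earns n_items points, second place n_items - 1, etc.
            scores[item] = scores.get(item, 0) + (n_items - position)
    # Highest total first; ties are broken alphabetically (an assumption).
    return sorted(scores, key=lambda item: (-scores[item], item))

# Three hypothetical individuals ordering four items.
rankings = ["ABCD", "ABDC", "BACD"]
group_order = borda_order(rankings)
print(group_order)  # ['A', 'B', 'C', 'D']
```

Here the totals are A = 11, B = 10, C = 5, D = 4. Every individual counts equally, which is the property the model-based approaches in the talk relax.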
Model Comparison
[Figure: performance of individuals compared with the Thurstonian model, the perturbation model, and the Borda count; x-axis: individuals]
Recollecting order from episodic memory
Video: http://www.youtube.com/watch?v=a6tSyDHXViM&feature=related
Place scenes in correct order (serial recall): scenes A B C D along a time axis
Study this sequence of images
Place the images in correct sequence (serial recall): images A B C D E F G H I J
Average results across 6 problems
[Figure: individuals compared with the Thurstonian model, the perturbation model, and the Borda count; x-axis: individuals; y-axis: Mean]
Example calibration result for individuals
[Figure: scatter plot of each individual’s inferred noise level against distance to ground truth; R = 0.920 (pizza sequence; perturbation model)]
Overview of talk
Ordering problems: what is the order of US presidents?
Matching problems: memory for pairs (what object was paired with what person?)
Recognition memory problems: what words were studied?
Experts in crowds: how to find experts in the absence of feedback
Study these combinations
[Figure: items 1 2 3 4 5 paired with items A B C D E]
Find all matching pairs
Results across 8 problems
[Figure: mean accuracy of individuals compared with Bayesian matching and the Hungarian algorithm; x-axis: individuals; y-axis: mean accuracy]
General Knowledge Matching Problems
Languages: Dutch, Danish, Yiddish, Thai, Vietnamese, Chinese, Georgian, Russian, Japanese
Labels: A B C D E F G H I
Phrases: godt nytår; gelukkig nieuwjaar; a gut yohr; С Новым Годом; สวัสดีปีใหม่; Chúc Mừng Năm Mới; გილოცავთ ახალ წელს
Modeling Results – Declarative Tasks
[Figure: mean accuracy of individuals compared with Bayesian matching and the Hungarian algorithm; x-axis: individuals; y-axis: mean accuracy]
Overview of talk
Ordering problems: what is the order of US presidents?
Matching problems: memory for pairs (what object was paired with what person?)
Recognition memory problems: what words were studied?
Experts in crowds: how to find experts in the absence of feedback
Listen to these words…
Experiment
Study list: 10 lists of 15 spoken words
Recognition memory test: targets (15 items); lure (1 item); related distractors (15 items); unrelated distractors (15 items)
Confidence ratings: 5-point confidence ratings
1 = definitely not on list; 2 = probably not on list; 3 = not sure; 4 = probably on list; 5 = sure it was on the list
Mean confidence ratings for 12 subjects
[Figure: mean confidence rating (1 to 5) by item type (T = targets, L = lure, R = related distractors, U = unrelated distractors); one panel per individual (Individuals 1 to 12) and one panel labeled METHOD1]
ROC plots for individuals
[Figure: ROC curves for Individuals 1 to 12; x-axis: false alarm rate; y-axis: hit rate]
Heuristic Aggregation Method
Group confidence = mean confidence rating across individuals
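As a minimal sketch of this averaging rule, the snippet below takes one dictionary of ratings per person and returns the mean rating per word. The words and ratings are invented for illustration, and `group_confidence` is a hypothetical name.

```python
from statistics import mean

def group_confidence(ratings_by_person):
    """Average each word's 1-5 confidence rating across individuals."""
    words = ratings_by_person[0].keys()
    return {word: mean(person[word] for person in ratings_by_person)
            for word in words}

# Hypothetical ratings from three people for three test words.
ratings = [
    {"chair": 5, "sleep": 4, "zebra": 1},
    {"chair": 4, "sleep": 5, "zebra": 2},
    {"chair": 3, "sleep": 3, "zebra": 1},
]
aggregate = group_confidence(ratings)
print(aggregate["chair"], aggregate["sleep"])  # 4 4
```

If "sleep" were a lure that many people falsely remember, its group confidence would be as high as that of a studied word, so simple averaging cannot remove a shared false memory.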
Performance of Aggregate
[Figure: ROC curves for Individuals 1 to 12 and for the aggregate; x-axis: false alarm rate; y-axis: hit rate]
Performance of Individuals and Aggregate
[Figure: area under the ROC curve (AUC) for individuals and METHOD1; x-axis: individuals; y-axis: AUC]
Problem with aggregation method
Aggregate also suffers from false memories
[Figure: confidence (1 to 5) by item type (T, L, R, U)]
Potential Solution: identify group signature of false memories
[Figure: four panels (Study, Lure, Related Distractors, Unrelated Distractors); x-axis: confidence rating 1 to 5]
Overview of talk
Ordering problems: what is the order of US presidents?
Matching problems: memory for pairs (what object was paired with what person?)
Recognition memory problems: what words were studied?
Experts in crowds: how to find experts in the absence of feedback
Experiment
78 participants
17 ordering problems, each with 10 items
Chronological events
Physical measures
Purely ordinal problems, e.g., Ten Amendments, Ten Commandments
Ordering states west-east
Oregon (1)
Utah (2)
Nebraska (3)
Iowa (4)
Alabama (6)
Ohio (5)
Virginia (7)
Delaware (8)
Connecticut (9)
Maine (10)
Ordering Ten Amendments
Freedom of speech & religion (1)
Right to bear arms (2)
No quartering of soldiers (4)
No unreasonable searches (3)
Due process (5)
Trial by Jury (6)
Civil Trial by Jury (7)
No cruel punishment (8)
Right to non-specified rights (10)
Power for the States & People (9)
Ordering Ten Commandments
Worship any other God (1)
Make a graven image (7)
Take the Lord's name in vain (2)
Break the Sabbath (3)
Dishonor your parents (4)
Murder (6)
Commit adultery (8)
Steal (5)
Bear false witness (9)
Covet (10)
Question
How many individuals do we need to average over?
Effect of Group Size: random groups
[Figure: effect of group size for random groups; x-axis: group size; legend entries as extracted: T = 0, T = 2, T = 12]
How effective are small groups of experts?
Want to find experts endogenously – without feedback
Approach: select individuals with the smallest estimated noise levels based on previous tasks
We are identifying general expertise (“Spearman’s g”)
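The selection step described above can be sketched as below, assuming the noise levels have already been estimated by a model on earlier tasks. The participant IDs, the numbers, and the function name `select_experts` are all hypothetical.

```python
def select_experts(noise_by_person, group_size):
    """Pick the people with the smallest mean estimated noise on prior tasks."""
    mean_noise = {person: sum(levels) / len(levels)
                  for person, levels in noise_by_person.items()}
    # Sort from least to most noisy and keep the first `group_size` people.
    return sorted(mean_noise, key=mean_noise.get)[:group_size]

# Hypothetical inferred noise levels from two previous tasks (T = 2).
noise = {
    "p1": [0.9, 1.1],
    "p2": [0.2, 0.3],
    "p3": [0.5, 0.4],
    "p4": [1.5, 1.2],
}
experts = select_experts(noise, group_size=2)
print(experts)  # ['p2', 'p3']
```

No ground truth enters this selection; it uses only the model's noise estimates, which is what makes the procedure endogenous.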
Group Composition based on prior performance
[Figure: performance as a function of group size, with the best individuals added first; curves by number of previous tasks T (legend entries as extracted: T = 0, T = 2, T = 8, T = 12)]
Endogenous: no feedback required
Exogenous: selecting people based on actual performance
[Figure: two panels comparing endogenous and exogenous selection; x-axis: group size (best individuals first)]
Summary
Aggregation of combinatorially complex data: going beyond numerical estimates or multiple choice questions
Incorporate individual differences: going beyond models that treat every vote equally; assume some individuals might be “experts”
Take cognitive processes into account: going beyond mere statistical aggregation
Predictive Rankings: fantasy football
South Australian Football League (32 people rank 9 teams)
Australian Football League (29 people rank 16 teams)
[Figure: for each league, individuals compared with the Thurstonian model, the perturbation model, and the Borda count; x-axis: individuals]
Online Experiments
Experiment 1 (Prior knowledge): http://madlab.ss.uci.edu/dem2/examples/
Experiment 2a (Serial Recall), study sequence of still images: http://madlab.ss.uci.edu/memslides/
Experiment 2b (Serial Recall), study video: http://madlab.ss.uci.edu/dem/
MDS solution of pairwise tau distances
[Figure: MDS solution for the states west-east problem; points: individuals (labeled with distance to truth), truth, and Thurstonian model]
MDS solution of pairwise tau distances
[Figure: MDS solution for the ten commandments problem; points: individuals, truth, and Thurstonian model]
Thurstonian Model – stereotyped event sequences
Bus (Recall): event1 (1), event2 (2), event3 (3), event4 (4), event5 (5), event6 (7), event7 (6), event8 (8), event9 (9), event10 (10); R = 0.890
Morning (Recall): event1 (1), event2 (2), event3 (3), event4 (4), event5 (5), event6 (6), event7 (7), event8 (8), event9 (9), event10 (10); R = 0.982
Wedding (Recall): event1 (1), event2 (2), event3 (3), event4 (4), event5 (5), event6 (6), event7 (7), event8 (8), event9 (9), event10 (10); R = 0.973
Thurstonian Model – “random” videos
Yogurt (Recall): event1 (1), event2 (2), event3 (3), event4 (5), event5 (7), event6 (6), event7 (4), event8 (8), event9 (9), event10 (10); R = 0.908
Pizza (Recall): event1 (1), event2 (3), event3 (4), event4 (5), event5 (2), event6 (6), event7 (7), event8 (9), event9 (10), event10 (8); R = 0.851
Clay (Recall): event1 (1), event2 (2), event3 (3), event4 (4), event5 (6), event6 (5), event7 (7), event8 (8), event9 (9), event10 (10); R = 0.928
Heuristic Aggregation Approach
Combinatorial optimization problem: maximize agreement in assigning N items to N responses
Hungarian algorithm:
construct a count matrix M, where M_ij = number of people who paired item i with response j
find row and column permutations to maximize the diagonal sum
complexity: O(n^3)
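The objective above can be shown on a tiny example. To stay self-contained, this sketch builds the count matrix and then searches all permutations by brute force; that is a swapped-in technique, not the Hungarian algorithm, and it is only practical for very small N. The pairing data and function names are hypothetical.

```python
from itertools import permutations

def count_matrix(pairings, n):
    """M[i][j] = number of people who paired item i with response j."""
    M = [[0] * n for _ in range(n)]
    for person in pairings:
        for item, response in enumerate(person):
            M[item][response] += 1
    return M

def best_assignment(M):
    """Brute-force search for the one-to-one assignment with maximal agreement."""
    n = len(M)
    # perm[i] is the response assigned to item i; the score is the total
    # number of individual pairings that agree with the assignment.
    return max(permutations(range(n)),
               key=lambda perm: sum(M[i][perm[i]] for i in range(n)))

# Four hypothetical people; person[i] is the response chosen for item i.
pairings = [(0, 1, 2), (0, 2, 1), (0, 1, 2), (1, 0, 2)]
M = count_matrix(pairings, 3)
assignment = best_assignment(M)
print(M)           # [[3, 1, 0], [1, 2, 1], [0, 1, 3]]
print(assignment)  # (0, 1, 2)
```

For realistic sizes, an O(n^3) solver such as SciPy's `scipy.optimize.linear_sum_assignment` (called with `maximize=True`) solves the same assignment problem without enumerating permutations.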
Hungarian Algorithm Example
Legend: correct; incorrect
Count matrix (rows: phrases; columns, in order: Dutch, Danish, French, Japanese, Spanish, Arabic, Chinese, German, Italian, Russian, Thai, Vietnamese, Welsh, Georgian, Yiddish)
gelukkig Nieuwjaar: 7 3 0 0 0 1 0 0 0 0 0 0 2 0 2
godt nytår: 2 3 0 0 0 0 0 2 0 2 0 0 1 3 2
bonne année: 0 0 14 0 1 0 0 0 0 0 0 0 0 0 0
[phrase not preserved]: 0 0 0 9 0 0 2 0 1 0 3 0 0 0 0
feliz año nuevo: 0 0 0 0 14 0 0 0 0 0 1 0 0 0 0
عام سعيد: 0 1 0 0 0 14 0 0 0 0 0 0 0 0 0
[phrase not preserved]: 0 0 0 2 0 0 12 0 0 0 0 1 0 0 0
ein gutes neues Jahr: 3 1 0 0 0 0 0 9 0 0 0 0 1 0 1
felice anno nuovo: 0 0 0 0 0 0 0 0 14 1 0 0 0 0 0
С Новым Годом: 0 0 1 0 0 0 0 0 0 11 0 0 1 2 0
สวัสดีปีใหม่: 0 0 0 1 0 0 1 0 0 0 7 1 1 4 0
Chúc Mừng Năm Mới: 0 0 0 0 0 0 0 0 0 1 0 11 1 2 0
Blwyddyn Newydd Dda: 0 4 0 1 0 0 0 0 0 0 1 0 6 1 2
გილოცავთ ახალ წელს: 0 0 0 2 0 0 0 1 0 0 3 2 0 1 6
a gut yohr: 3 3 0 0 0 0 0 3 0 0 0 0 2 2 2
What are methods for finding experts?
1) Self-reported expertise: unreliable; has led to claims of “myth of expertise”
2) Based on explicit scores, by comparing to ground truth; but ground truth might not be immediately available
3) Endogenously discover experts: use the crowd to discover experts; small groups of experts can be effective
Modified Perturbation Model
Predicting problem difficulty
[Figure: scatter plot with one point per problem (labeled 1 to 17); axes: std( ), the dispersion of noise levels across individuals, and distance of group answer to ground truth; R = -0.752; labeled problems: ordering states geographically, city size rankings]
Mean p( “yes” )
[Figure: mean p(“yes”) by item type (T, L, R, U); one panel per participant (aaa, ardor, azs, incognito, indigo, jshi, nobody, peter griffin, piper michelle, plutonium, scott bakula, sky) and one panel labeled METHOD1]
Note: confidence ratings were converted to yes/no judgments. Yes = rating >= 3; No = rating < 3
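The conversion rule in this note can be written out directly. The sketch below applies the threshold of 3 and then computes a hit rate and a false alarm rate from made-up ratings; the function names are hypothetical.

```python
def to_yes_no(rating, threshold=3):
    """Convert a 1-5 confidence rating to a yes/no judgment (True means yes)."""
    return rating >= threshold

def hit_and_false_alarm_rates(target_ratings, distractor_ratings, threshold=3):
    """Proportion of 'yes' responses to studied items and to unstudied items."""
    hits = sum(to_yes_no(r, threshold) for r in target_ratings)
    false_alarms = sum(to_yes_no(r, threshold) for r in distractor_ratings)
    return hits / len(target_ratings), false_alarms / len(distractor_ratings)

# Hypothetical ratings: four studied words and four unstudied words.
hit_rate, false_alarm_rate = hit_and_false_alarm_rates([5, 4, 3, 2], [1, 2, 3, 1])
print(hit_rate, false_alarm_rate)  # 0.75 0.25
```

Note that a rating of 3 ("not sure") counts as a yes under this rule, so the threshold is fairly lenient.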