aggregating information from the crowd · 2019-07-15 · aggregating information from the crowd...
TRANSCRIPT
![Page 1: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/1.jpg)
Aggregating information from the crowd
Anirban DasguptaIIT Gandhinagar
Joint work with Flavio Chiericetti, Nilesh Dalvi, Vibhor Rastogi, Ravi Kumar, Silvio Lattanzi
January 07, 2015
![Page 2: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/2.jpg)
Crowdsourcing
Many different modes of crowdsourcing
![Page 3: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/3.jpg)
Aggregating information using the Crowd: the expertise issue
Is IISc more than 100 years old?
Does IISc have more than UG than PG?
Yes Yes Yes No No
No!
Yes!
Yes YesNo
Typically, the answers to the crowdsourced tasks are unknown!
![Page 4: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/4.jpg)
Aggregating information using the Crowd: the effort issue
Does this article have appropriate references at all places?
Yes Yes Yes No No
Even expert users need to spend effort to give
meaningful answers
![Page 5: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/5.jpg)
• How to ensure that information collected is “useful”?
– Assume users are strategic
– effort put in when making judgments, truthful opinions
– design the right payment mechanism
• How to aggregate opinions from different agents?
– user behavior stochastic
– varying levels of expertise, unknown
– might not stick around to develop reputation
Elicitation & Aggregation
![Page 6: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/6.jpg)
This talk: only aggregation
• Formalizing a simple crowdsourcing task
– Tasks with hidden labels, varying user expertise
• Aggregation for binary tasks
– stochastic model of user behaviour
– algorithms to estimate task labels + expertise
• Continuous feedback
• Ranking
![Page 7: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/7.jpg)
Binary Task model
• Tasks have hidden labels:
– {-1, +1}
– E.g. labeling whether good quality article
• Each task is evaluated by a number of users
– not too many
• Each user outputs {-1, +1} per task
• Users and tasks fixedn users
m tasks
![Page 8: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/8.jpg)
Simple User model
• Each user performs set of tasks assigned to her
• Users have proficiency
– Indicates probability that the true signal is seen
– This is not observable
-1
+1
-1+1
+1
+1
[Dawid, Skene, '79]
Note: This does not model bias
![Page 9: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/9.jpg)
Stochastic model
G = user-item graph
q = vector of actual qualities
= rating on by user j on item i
Given n-by-m matrix U, estimate vectors q and p
+1
-1
+1
-1
![Page 10: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/10.jpg)
From users to items
• If all users are same, then simple majority/average will do
• Else, some notion of weighted majority e.g.
• We will try to estimate user reliabilities first
-1
-1
+1
??
![Page 11: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/11.jpg)
Intuition: if G is complete
• Consider the user x user matrix UUt
UUt = (#agreements - #disagreements) between j and k
is a rank one matrix
If we approximate, UUt ≈ E(UUt), w is rank-1 approximation of UUt
noise
![Page 12: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/12.jpg)
Arbitrary assignment graphs
Then
Hadamard product:
E[agree – disagree] on eachNumber of shared items
![Page 13: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/13.jpg)
Arbitrary assignment graphs
Then
Hadamard product:
E[agree – disagree] on eachNumber of shared items
Similar spectral intuitions hold, only slightly more
work is needed
![Page 14: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/14.jpg)
Algorithms● Core idea is to recover the “expected” matrix using spectral
techniques
● Ghosh, Kale, McAfee'11
– compute topmost eigenvector of item x item matrix
– proves small error for G dense random graph
● Karger, Oh, Shah'11
– using belief propagation on U
– proof of convergence for G sparse random
● Dalvi, D., Kumar, Rastogi'13
– for G an “expander”, use eigenvectors of both GG' and UU'
● EM based recovery Dawid & Skene'79
![Page 15: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/15.jpg)
Empirical: user proficiency can be more or less estimated
Correlation of predicted and actual proficiency on the Y-axis
[ Aggregating crowdsourced binary ratings, WWW'13
Dalvi, D., Kumar, Rastogi ]
![Page 16: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/16.jpg)
Aggregation
Formalizing a simple crowdsourcing task
– Tasks with hidden labels, varying user expertise
Aggregation for binary tasks – stochastic model of user behaviour
– algorithms to estimate task labels + expertise
Continuous feedback
Ranking
![Page 17: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/17.jpg)
Continuous feedback model
• Tasks are continuous:
– Quality
• Each user has a reliability
• Each user outputs a score per task
n users
m tasks
![Page 18: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/18.jpg)
Continuous feedback model
• Tasks are continuous:
– Quality
• Each user has a reliability
• Each user outputs a score per task
Minimize max
n users
m tasks
![Page 19: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/19.jpg)
Some simpler settings & obstacles
![Page 20: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/20.jpg)
Suppose that we know the
Single item, known variances
We want to minimize
![Page 21: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/21.jpg)
Suppose that we know the
Single item, known variances
We want to minimize
it is known that an asymptotically optimal estimate is
Loss =
![Page 22: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/22.jpg)
Single item, unknown variances
We want to minimize
Suppose that we do not know the
Only one sample, so cannot estimate Cannot compute weighted average
![Page 23: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/23.jpg)
Arithmetic Mean
In binary case for single item we can obtain the optimum by using a majority rule.
In a continuous case using the same approach we would compute the arithmetic mean.
![Page 24: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/24.jpg)
Arithmetic Mean
In binary case for single item we can obtain the optimum by using a majority rule.
In a continuous case using the same approach we would compute the arithmetic mean
and hence
![Page 25: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/25.jpg)
Arithmetic Mean
In binary case for single item we can obtain the optimum by using a majority rule.
In a continuous case using the same approach we would compute the arithmetic mean
and hence
Thus the loss
![Page 26: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/26.jpg)
Arithmetic Mean
In binary case for single item we can obtain the optimum by using a majority rule.
In a continuous case using the same approach we would compute the arithmetic mean
and hence
Thus the loss Is this optimal?
![Page 27: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/27.jpg)
Problem with Arithmetic mean
The AM would have error
![Page 28: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/28.jpg)
Problem with Arithmetic mean
The AM would have error
Same problem with the median algorithm
![Page 29: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/29.jpg)
Problem with Arithmetic mean
The AM would have error
By choosing the nearest pair of points, we have a much better
estimate
Same problem with the median algorithm
![Page 30: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/30.jpg)
Shortest gap algorithm
Maybe the optimal algo is to select one of two nearest samples?
In this setting, w.h.p., the two closest points are at distance
But arithmetic mean gives loss
![Page 31: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/31.jpg)
Last obstacle
More is not always better
In this setting, w.h.p., the first two closest points are at distance
Adding bad raters could
actually worsen the shortest
gap algorithm
Mean is not good here either
But so will be some other pair
![Page 32: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/32.jpg)
Single Item case
![Page 33: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/33.jpg)
Results
Theorem 1: There is an algo with expected loss
Theorem 2: There is an example where the gap between any algo and the known variance setting is
[Chiericetti, D., Kumar, Lattanzi' 14]
![Page 34: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/34.jpg)
Algorithm
Combination of two simple algorithms k-median algorithm return the rate of one of the k central raters
![Page 35: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/35.jpg)
Algorithm
Combination of two simple algorithms k-median algorithm return the rate of one of the k central raters
![Page 36: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/36.jpg)
Algorithm
Combination of two simple algorithms k-median algorithm return the rate of one of the k central raters
k-shortest gap Return one of the k closest points
![Page 37: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/37.jpg)
Algorithm
Combination of two simple algorithms k-median algorithm return the rate of one of the k central raters
k-shortest gap Return one of the k closest points
![Page 38: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/38.jpg)
Let be the length of the k-shortest gap
Compute the medianFind the shortest gap and return a point in it
Algorithm
![Page 39: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/39.jpg)
Proof Sketch
w.h.p. contains
WHP , length of the k-shortest gap is at most
Select the median points
![Page 40: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/40.jpg)
Proof Sketch
w.h.p. contains
WHP , length of the k-shortest gap is at most
Select the median points
If we consider points, then WHP there will be no
ratings with variance than
that are within distance
![Page 41: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/41.jpg)
Proof Sketch
Thus the distance of the shortest gap points to the
truth is bounded
![Page 42: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/42.jpg)
Lower bound
Instance: μ selected in
variance of j-th user =
Optimal algorithm (known variance) has loss
![Page 43: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/43.jpg)
Lower bound
Instance: μ selected at random in
variance of j-th user =
Optimal algorithm (known variance) has loss
We will show that maximum likelihood estimation cannot
distinguish between - L and + L → loss
![Page 44: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/44.jpg)
Lower Bound
Consider the two log-likelihoods
Claim: Irrespective of value of μ, can be positive or
negative with const prob.
![Page 45: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/45.jpg)
Lower Bound
Consider the two log-likelihoods
Claim: Irrespective of value of μ, can be positive or
negative with const prob.
![Page 46: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/46.jpg)
The idea is to use the same algorithm of constant number of items, but to use a smarter version of the k shortest gap that looks for k points at distance at most in all the items
Multiple items
![Page 47: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/47.jpg)
The idea is to use the same algorithm of constant number of items, but to use a smarter version of the k shortest gap that looks for k points at distance at most in all the items
Multiple items
![Page 48: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/48.jpg)
Multiple items
Theorem: For m=o(log n) , complete graph,
can get an expected loss of
Theorem: For m=Ω(log n), complete or dense random,
expected loss almost identical to the known variance case
![Page 49: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/49.jpg)
Aggregation
Formalizing a simple crowdsourcing task
– Tasks with hidden labels, varying user expertise
Aggregation for binary task
– stochastic model of user behaviour
– algorithms to estimate task labels + expertise
Continuous feedback
Ranking
![Page 50: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/50.jpg)
Crowdsourced rankings
![Page 51: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/51.jpg)
Crowdsourced rankings
How can we
aggregate noisy
rankings
![Page 52: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/52.jpg)
Crowdsourced rankings
How can we
aggregate noisy
rankings
![Page 53: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/53.jpg)
Mallows Model [Mallows 1957]
There is a hidden permutation σ
and a scale parameter β
A permutation π is generated as
κ(σ,π) = Kendall-Tau distance
Braverman, Mossel'09: Finding the MLE for single parameter Mallows
![Page 54: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/54.jpg)
Mallows Model
There is a hidden permutation σ
and a user specific scale parameter βi
![Page 55: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/55.jpg)
Single item with known parameters
Theorem: For m samples, if then can recover σ WHP.
Theorem: If then cannot recover σ
Approximate reconstruction versions of these theorems also hold
[Chiericetti, D, Kumar, Lattanzi, RANDOM'14]
Algo: Weighted Borda count, weights = thresholded β values
![Page 56: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/56.jpg)
Summary
● Host of interesting problems in crowdsourcing aggregation
– Specially for structured outputs
● For binary tasks
– Spectral techniques provide a powerful tool
● For gaussians
– new aggregation problems even for single item
– Combination of k-median & k-shortest gap
● For ranking
– Main technical contribution is calculating the swapping probs
– aggregation with known parameters is nontrivial
![Page 57: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/57.jpg)
Open questions• Continuous feedback
– More natural algorithms for aggregation?
– Better algorithms for multiple items
– Instance optimal algorithms?
– Non-gaussian distributions?
– Mixture learning with lots of components and single/constant samples per component?
• Ranking
– Better estimation of Mallows parameters
– Multiple items, under partial ranking/pairwise preferences?
• More realistic complex model of user?
– Incorporating user bias?
– different kind of expertise, not just reliability
![Page 58: Aggregating information from the crowd · 2019-07-15 · Aggregating information from the crowd Anirban Dasgupta IIT Gandhinagar Joint work with Flavio Chiericetti, Nilesh Dalvi,](https://reader034.vdocument.in/reader034/viewer/2022050518/5fa1cbd193c5da03d0664711/html5/thumbnails/58.jpg)
Thanks!