


4610 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 31, NO. 11, NOVEMBER 2020

Robust Cumulative Crowdsourcing FrameworkUsing New Incentive Payment Function and Joint

Aggregation ModelKamran Ghasedi Dizaji, Hongchang Gao, Yanhua Yang, Heng Huang ,

and Cheng Deng , Student Member, IEEE

Abstract— In recent years, crowdsourcing has gained tremendous attention in the machine learning community due to the increasing demand for labeled data. However, the labels collected by crowdsourcing are usually unreliable and noisy. This issue is mainly caused by: 1) nonflexible data collection mechanisms; 2) nonincentive payment functions; and 3) inexpert crowd workers. We propose a new robust crowdsourcing framework as a comprehensive solution for all these challenging problems. Our unified framework consists of three novel components. First, we introduce a new flexible data collection mechanism based on the cumulative voting system, allowing crowd workers to express their confidence for each option in multi-choice questions. Second, we design a novel payment function regarding the settings of our data collection mechanism. The payment function is theoretically proved to be incentive-compatible, encouraging crowd workers to truthfully disclose their beliefs to get the maximum payment. Third, we propose efficient aggregation models, which are compatible with both single-option and multi-option crowd labels. We define a new aggregation model, called simplex constrained majority voting (SCMV), and enhance it by using a probabilistic generative model. Furthermore, fast optimization algorithms are derived for the proposed aggregation models. Experimental results indicate higher quality for the crowd labels collected by our proposed mechanism without increasing the cost. Our aggregation models also outperform the state-of-the-art models on multiple crowdsourcing data sets in terms of accuracy and convergence speed.

Index Terms— Crowdsourcing aggregation model, data collection mechanism, incentive payment function.

I. INTRODUCTION

IN RECENT years, crowdsourcing has attracted extensive attention in diverse research communities due to its effective

Manuscript received June 19, 2018; revised March 6, 2019, July 5, 2019, and October 10, 2019; accepted November 15, 2019. Date of publication January 14, 2020; date of current version October 29, 2020. The work of Y. Yang and C. Deng was supported in part by the National Natural Science Foundation of China under Grant 61572388 and Grant 61703327, in part by the Key R&D Program-The Key Industry Innovation Chain of Shaanxi under Grant 2017ZDCXL-GY-05-04-02, Grant 2017ZDCXL-GY-05-02, and Grant 2018ZDXM-GY-176, and in part by the National Key R&D Program of China under Grant 2017YFE0104100. (Corresponding author: Cheng Deng.)

K. G. Dizaji and H. Gao are with the Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15260 USA.

Y. Yang and C. Deng are with Xidian University, Xi’an 710071, China (e-mail: [email protected]).

H. Huang is with the Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15260 USA, and also with JD Finance America Corporation, Mountain View, CA 94043 USA.

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNNLS.2019.2956523

platform for employing large groups of Internet users to solve complex problems, such as the fast and inexpensive annotation of huge data sets [1], introducing hybrid human–machine systems [2], [3], creation of user-generated content [4], financing of products [5], and even collecting ideas for new products [6]. Notably, the impressive performance of deep models on different applications [7]–[12] relies on large training data sets, which are mostly labeled using crowdsourcing platforms like Amazon Mechanical Turk. In particular, these platforms give a requester, who needs to complete a large task, access to on-demand crowd workers in order to collect crowd labels in a fast and inexpensive way. Crowdsourcing tasks are mostly formatted as a set of multi-choice questions, which the crowd workforce is supposed to complete by choosing a single option for each question. However, these single-option crowd labels are often noisy and unreliable due to various issues: 1) crowd workers are often not expert in the assigned task [13]–[16]; 2) data collection mechanisms are not flexible enough to allow crowd workers to completely convey their nondeterministic beliefs [17], [18]; and 3) payments to crowd workers are not proportional to their performance and consequently do not incentivize them to accurately disclose their beliefs [17], [19], [20].

A. Problem Statement

There are many studies in the literature that tackle these issues independently. To address the first issue, each question is usually labeled by multiple crowd workers, and these labels are aggregated using a crowdsourcing aggregation model to estimate the potential true labels [13], [15], [21], [22]. These aggregation models are mainly developed for single-option crowd labels and are based on generative models. The second issue is mainly caused by vanilla crowdsourcing data collection mechanisms that force the crowd workforce to report their beliefs using single-option labels, even when they are not sure about the true labels. This issue is explored in a few recent studies, which introduce more flexible data collection mechanisms, allowing the workers to express their nondeterministic beliefs [17], [18], [23]. The third issue is tackled by incentive payment functions, which reward workers based on their performance. These payment functions are designed with regard to the settings of a data

2162-237X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on November 03,2020 at 01:42:51 UTC from IEEE Xplore. Restrictions apply.


GHASEDI DIZAJI et al.: ROBUST CUMULATIVE CROWDSOURCING FRAMEWORK 4611

Fig. 1. Three crowd workers are hired to categorize the breed of a dog as Scotch, Yorkshire, or Australian. They express their single-option and Cumulative crowd labels using checked boxes and confidence bars, respectively. While the average score of the Cumulative labels correctly indicates a higher chance for Australian, the majority of single-option labels incorrectly suggests Yorkshire as the truth.

collection mechanism [17], [18], [20]. However, the proposed aggregation models for the first issue are incompatible with the flexible data collection mechanisms designed for the second issue [17]. These aggregation models are primarily developed to deal with single-option crowd labels and may not be the optimal candidate for handling multi-option (nondeterministic) crowd labels.

B. Proposed Solution

In this article, we propose a new unified crowdsourcing framework as a comprehensive solution for all three aforementioned issues. We first introduce a new data collection mechanism with a flexible interface for crowd workers, then provide an incentive payment function designed based on the settings of our data collection mechanism, and finally propose novel aggregation models to aggregate the nondeterministic crowd labels efficiently.

C. Data Collection Mechanism

In general, crowdsourcing data collection mechanisms include several multi-choice questions, where the crowd workers are expected to answer each question with only a single option, regardless of their confidence in the selected option. This mechanism may adversely affect the quality of the collected data, since crowd workers are obligated to convey their nondeterministic beliefs in a rigid single-option system. To address this issue, we introduce a flexible data collection approach, called the Cumulative mechanism, which leverages a cumulative voting strategy [24]. Our mechanism allows crowd workers to express their confidence levels for each option in multi-choice questions. In fact, the Cumulative mechanism can be considered the most general (i.e., flexible) case of single-option mechanisms, since crowd workers can choose a single option with 100% confidence, when they are sure about their answer; select multiple options with different confidence levels, when they have partial knowledge about the true label; and also skip the question, when they do not have any clue about the true label. Using a toy example, Fig. 1 demonstrates an advantage of the Cumulative mechanism over the single-option system in estimating the true labels.

D. Payment Function

The payment function is the other key factor affecting the quality of the crowd data. Equal payment does not necessarily align the incentives of crowd workers with those of the task requester.

Some workers (spammers) may randomly answer all questions without any effort in order to quickly complete their task and receive the fixed payment. An appropriate payment function should instead reward the workers who accurately report their beliefs and penalize those who carelessly answer the questions. Technically, a payment function is called incentive-compatible if crowd workers receive their maximum payment when they truthfully disclose their knowledge [17]. We define a new payment function by considering the settings of Cumulative crowd labels and theoretically prove its incentive-compatible property.

E. Aggregation Model

We also propose crowd aggregation models that are compatible with both Cumulative labels and single-option data, unlike the existing aggregation models, which can only deal with single-option labels. To do so, we propose simplex constrained majority voting (SCMV) as a novel discriminative aggregation model. SCMV boosts the discriminative ability of the majority voting (MV) model by considering unequal reliability parameters for crowd workers. We define an intuitive objective function for SCMV, prove the convexity of its optimization problem, and derive a fast optimization algorithm to solve the corresponding problem. Moreover, we enhance SCMV as a discriminative model by coupling it with a regularized variant of the Dawid and Skene (DS) method as a generative method [21]. A probabilistic framework, rather than a heuristic, is used to combine the discriminative SCMV and generative DS models. Our proposed joint model, denoted DS-SCMV, is trained by an alternating learning approach, where the truths are estimated by the contribution of both the SCMV and DS submodels, and their parameters are updated using the estimated truths.
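The core idea of weighting workers unequally when averaging their votes can be illustrated with a simplified sketch. This is not the paper's SCMV objective (which is not reproduced in this section); it only shows reliability-weighted voting over Cumulative (confidence-vector) labels, with all names chosen for illustration:

```python
import numpy as np

# Simplified illustration (not the paper's SCMV objective): reliability-
# weighted majority voting over Cumulative labels. Each worker reports a
# B-dimensional confidence vector per item; votes are averaged with a
# per-worker nonnegative weight normalized onto the simplex.
def weighted_vote(labels, weights):
    """labels: array (M workers, N items, B options) of confidence vectors;
    weights: one nonnegative reliability weight per worker."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                            # normalize worker weights
    scores = np.tensordot(w, labels, axes=1)   # (N, B) aggregated scores
    return scores.argmax(axis=1)               # estimated true options

# Two workers, two items, two options; worker 0 is trusted twice as much:
labels = np.array([[[1, 0], [0, 1]],
                   [[0, 1], [0, 1]]], dtype=float)
print(weighted_vote(labels, [2, 1]))   # worker 0 wins the disagreement on item 0
```

Plain MV is the special case of equal weights; SCMV, as described above, additionally constrains and learns these reliability parameters.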

F. Experimental Results

In our experiments, we first separately evaluate the Cumulative data collection mechanism and the proposed aggregation models. The Cumulative mechanism is shown to collect crowd labels with higher quality and similar cost compared with other mechanisms, benefiting from our incentive payment function. Moreover, our aggregation models not only achieve results on par with the state-of-the-art methods on several benchmark data sets (i.e., single-option data) but also show faster convergence rates. Finally, we examine the effectiveness of the proposed unified crowdsourcing framework by comparing it with alternative approaches using different crowd data collections, payment functions, and aggregation models, showing that our crowdsourcing framework outperforms the other approaches by significant margins.

G. Contributions

The main contributions of this article can be summarized as follows.

1) Introducing a flexible data collection mechanism based on the cumulative voting strategy.




2) Designing a new payment function regarding the settings of Cumulative crowd labels, along with a theoretical proof of its incentive-compatible property.

3) Proposing a joint aggregation model with an efficient learning framework for single- and multiple-option crowd labels.

II. RELATED WORK

A. Data Collection Mechanisms

Crowdsourcing task requesters usually employ the plurality voting system as their data collection mechanism, in which the crowd workforce can only choose a single option. However, recent studies have shown improvements in the quality of crowd labels when more flexible data collection mechanisms are used [17], [18], [23]. To acquire higher quality crowd labels, Shah and Zhou [18] enhanced the single-selection mechanism with two extensions: skip-based and confidence-based settings. In the former case, crowd workers can skip questions if they are not sure about the true labels; in the latter, they are able to report their confidence for a single selected option. Oyama et al. [25] used the confidence self-reported by crowd workers for each single-option crowd label to improve truth estimation. These confidence values are first binarized by an adaptive threshold for each worker and are then fed to an aggregation model as input. Kazai [26] explored the relation between self-reported confidence and the quality of crowd labels. More recently, the approval voting strategy was used for crowdsourcing data collection in order to benefit from the partial knowledge of crowd workers. In this approach, crowd workers can select multiple options that could be correct according to their personal beliefs [17]. Our data collection mechanism differs from previous studies and provides more flexibility for crowd workers, since they not only are able to select single or multiple options but also can express their confidence for each option.

B. Incentive Payment Functions

Traditional payment functions for crowdsourcing tasks simply pay all workers who complete their task equally. In other words, they do not discriminate between the truthful workers, who perform the task to the best of their knowledge, and the spammers, who randomly respond to the questions without any effort. To address this issue, two general groups of payment functions were developed to incentivize crowd workers to disclose their beliefs truthfully. The first group employs a small subset of questions, called “gold standard” questions, whose true labels are known a priori to the task requester. These gold standard questions are randomly distributed among the questions to evaluate the performance of workers [17], [18], [20], [27]. The second group, called peer-prediction payment functions, elicits additional information from crowd workers to assess the performance of other workers in the absence of any known true labels [19], [27], [28]. Our payment function falls in the first group, since it uses gold standard questions as well. However, the proposed payment function is unique in the sense that it is designed with regard to the nondeterministic crowd labels collected by the Cumulative mechanism.

C. Aggregation Models

There are a large number of crowdsourcing aggregation models in the literature for aggregating noisy and unreliable crowd labels [22], which can be broadly grouped into discriminative and generative models. On the one hand, discriminative models directly estimate the truths without any assumption about the distribution of crowd labels. Majority voting (MV) can be considered the most basic discriminative model, assuming equal weights for all workers when averaging their votes. Li and Yu [29] extended MV to iterative weighted MV (IWMV), a fast and intuitive weighted version of MV that is also theoretically guaranteed to optimize the derived error bound for a general class of aggregation rules. Moreover, Tian and Zhu [30] proposed max-margin majority voting (M3V), which enhances MV by incorporating the margin concept from support vector machines into their model. On the other hand, generative aggregation models build a probabilistic model for the distribution of crowd labels, given the unknown truths and parameters. The DS model is one of the earliest generative aggregation models, using a confusion matrix as the reliability parameter of each worker [21]. The diagonal elements of the confusion matrix represent the probability that a worker correctly predicts items of each class, and the off-diagonal elements indicate the chance of mislabeling items. Furthermore, several extensions of the DS model were proposed by using the Bayesian posterior distribution for the truths, given different prior distributions for the parameters [14], [31]. GLAD is another generative model, pairing one parameter for the reliability of each worker and another for the difficulty of each item within a logistic regression model [13]. Zhou et al. [15], [32] introduced Entropy, using a minimax conditional entropy for the crowd labels, where every worker and item has a confusion matrix as the reliability and difficulty parameter, respectively. Later, Tian and Zhu [30] regularized a variant of DS with the M3V model and jointly learned the parameters of both models. Dizaji et al. [16] also proposed a joint aggregation model, which differs from our proposed model in that ours imposes a nonnegativity constraint on the worker reliability parameters. In addition, our crowdsourcing framework is distinguished from these models by its comprehensive solution, addressing the broader problem from crowd data collection to label aggregation. Furthermore, unlike standard aggregation models, our proposed models are able to efficiently deal with multi-option Cumulative input data as well as single-option crowd labels.
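To make the generative family above concrete, the following is a minimal EM sketch of the classical DS model (not the paper's regularized DS-SCMV), assuming single-option labels stored as `labels[worker][item]` with `B` classes and no missing answers:

```python
import numpy as np

# Minimal EM for the classical Dawid-Skene model: alternate between
# estimating a posterior over each item's true label (E-step) and
# re-estimating the class prior and per-worker confusion matrices (M-step).
def dawid_skene(labels, B, iters=50):
    M, N = labels.shape
    # Initialize the truth posteriors T (N x B) with majority voting.
    T = np.zeros((N, B))
    for i in range(N):
        for w in range(M):
            T[i, labels[w, i]] += 1
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: class prior and confusion matrices conf[w, true, answered].
        prior = T.mean(axis=0)
        conf = np.full((M, B, B), 1e-6)     # smoothing avoids log(0)
        for w in range(M):
            for i in range(N):
                conf[w, :, labels[w, i]] += T[i]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over true labels given prior and confusions.
        logT = np.tile(np.log(prior + 1e-12), (N, 1))
        for i in range(N):
            for w in range(M):
                logT[i] += np.log(conf[w, :, labels[w, i]])
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T.argmax(axis=1)

# Two reliable workers and one who always answers 0; EM recovers the truth.
labels = np.array([[0, 1, 0, 1],
                   [0, 1, 0, 1],
                   [0, 0, 0, 0]])
print(dawid_skene(labels, B=2))   # [0 1 0 1]
```

The diagonal of each learned `conf[w]` is exactly the per-class accuracy described above, and its off-diagonal entries are the mislabeling probabilities.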

III. CUMULATIVE VOTING DATA COLLECTION INTERFACE

A. Existing Mechanisms

Typical crowdsourcing data collection mechanisms employ a simple-plurality (single-selection) voting system, asking the crowd workers to choose only a single option for each question [see Fig. 2(a)]. The skip-based single-selection mechanism simply adds the opportunity for the crowd workers to skip




Fig. 2. Four crowdsourcing data collection mechanisms for the following task: For each flag, mark the continent to which the country belongs. (a) Plurality voting system: The crowd workforce is asked to choose only a single option. (b) Skip-based plurality voting system: Crowd workers can choose a single option or skip the question [18]. (c) Approval voting system: Crowd workers are allowed to select multiple options that can be correct based on their beliefs [17]. (d) Cumulative voting system: Crowd workers are able to report their confidence values for every option or skip the question.

questions when they are not sure about the true labels [see Fig. 2(b)] [18]. However, the cardinal voting strategy allows voters to give each option an independent rating [33]. Approval voting can be seen as the simplest version of the cardinal voting system, offering two choices for each option, “approved” or “unapproved.” Shah et al. [17] used this mechanism for crowdsourcing data collection [see Fig. 2(c)]. Other cardinal voting systems provide more flexibility for the votes, like the cumulative voting system, which allows voters to split their votes among different candidates [24].

B. Cumulative Mechanisms

Considering the advantages of cumulative voting systems, we design a skip-based fractional cumulative mechanism to collect labels from crowd workers efficiently [see Fig. 2(d)]. In this mechanism, crowd workers can express their beliefs flexibly by assigning a different confidence level to each option. Therefore, they can select a single option when they are sure about their beliefs, represent their partial knowledge about a question by choosing multiple options, or skip a question when they do not have any clue about the true label. Unlike the approval voting system, which does not allow crowd workers to prioritize their selected options, our Cumulative mechanism lets them specify higher confidence levels for the more preferred options. Although crowd workers can skip a question by selecting all the options with the same confidence, we add an “I am not sure!” option to make the skip choice easier.
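The three behaviors above can be sketched as operations on a single data structure: a confidence vector over the options. This is an illustrative sketch, not the paper's interface code; the normalization and the skip-as-uniform convention are assumptions made for the example:

```python
# Illustrative sketch: a Cumulative crowd label as a confidence vector over
# B options, normalized onto the probability simplex (entries sum to 1).
def cumulative_label(confidences):
    """Normalize raw confidence scores into a simplex vector."""
    total = sum(confidences)
    if total == 0:  # "I am not sure!": treat a skip as uniform confidence
        return [1.0 / len(confidences)] * len(confidences)
    return [c / total for c in confidences]

# Single option chosen with 100% confidence (the plurality-voting case):
single = cumulative_label([0, 0, 1])      # [0.0, 0.0, 1.0]
# Partial knowledge split between the first and third options:
partial = cumulative_label([60, 0, 40])   # [0.6, 0.0, 0.4]
# Skipped question:
skipped = cumulative_label([0, 0, 0])     # uniform: [1/3, 1/3, 1/3]
```

Single-option labels and skips are thus special cases of the same representation, which is what makes the mechanism the most general of the four in Fig. 2.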

IV. INCENTIVE PAYMENT FUNCTION

A. Notations

Let us consider a crowdsourcing task with $N$ questions, where each one has $B$ options, and only one of the options is true. The task requester hires $M$ crowd workers to complete the task, while knowing the answers of $G$ “gold standard” questions in advance and randomly distributing them among the questions. The answers of a worker are indicated by $Q = \{\mathbf{q}_1, \ldots, \mathbf{q}_N\}$, where $\mathbf{q}_i = [q_i^1, \ldots, q_i^B]$ is a $B$-dimensional Cumulative label. In particular, $q_i^k$ shows the confidence of a worker in the $k$th option of the $i$th question. Hereafter, for any integer $K$, we use $[K]$ to denote the set $\{1, 2, \ldots, K\}$ for the sake of simpler notation. Throughout this article, nonbold, lowercase bold, and uppercase bold letters are used for scalars, vectors, and matrices, respectively.

As mentioned earlier, we use the $G$ gold standard questions to evaluate the performance of crowd workers. Knowing the true options of the gold standard questions, we take the workers' confidence in the corresponding options as the input to our payment function. Assuming that the maximum payment is bounded by $a_{\max}$, the payment function is defined by $f : \{q_1^{c_1}, \ldots, q_G^{c_G}\} \to [0, a_{\max}]$, where the set $\{c_1, \ldots, c_G\}$ indicates the a priori known true options of the $G$ gold standard questions. In the following, we first provide the definition of an incentive-compatible payment function [17], then introduce our payment function, and finally prove this property for the proposed payment function.

Definition 1: A mechanism is called incentive-compatible if the expected payment, from the view of the workers, is strictly maximized when they answer questions precisely according to their personal beliefs.

B. Expected Payment Function

Thus, the expected payment function from the workers' view plays a central role in proving the incentive-compatibility of a mechanism. Since crowd workers do not know the placement of the gold standard questions, the expectation is taken over the random choice of $G$ gold standard questions among the total $N$ questions. The personal beliefs and reported Cumulative labels of workers are indicated by $p_i^k$ and $q_i^k$, respectively, where $i$ and $k$ are the question and option indices. The following equation shows the expected payment function for the Cumulative crowd labels:

$$ \$ = \frac{1}{\binom{N}{G}} \sum_{(i_1,\ldots,i_G)\subseteq[N]} \; \sum_{(k_1,\ldots,k_G)\subseteq[B]^G} \left(\prod_{g=1}^{G} p_{i_g}^{k_g}\right) f\!\left(q_{i_1}^{k_1},\ldots,q_{i_G}^{k_G}\right) \tag{1} $$

where $(i_1,\ldots,i_G)\subseteq[N]$ in the outer summation reflects the randomness of the $G$ gold standard questions from a worker's view, and $(k_1,\ldots,k_G)\subseteq[B]^G$ ranges over different combinations of options as the potential true options. With $k_g$ and $i_g$ as the option and question indices, $p_{i_g}^{k_g}$ and $q_{i_g}^{k_g}$ denote a worker's personal belief and reported Cumulative label, which need not be equal. Note that this expected payment function is different from the one in [17]. Ours incorporates the confidence of each option that could be




different in the Cumulative labels, whereas Shah et al. [17] only considered the total confidence of all selected options, since approval voting data contains no more detailed information.
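For small $N$, $G$, and $B$, expectation (1) can be evaluated by brute-force enumeration, which is a useful sanity check. The following sketch is illustrative (the function names and the generic payment argument `f` are assumptions, not the paper's code):

```python
from itertools import combinations, product
from math import comb

# Brute-force evaluation of Eq. (1): average over all G-subsets of gold
# questions and sum over all option combinations, weighted by beliefs p.
def expected_payment(p, q, G, f):
    """p[i][k]: worker's belief; q[i][k]: reported confidence; f maps the
    tuple of reported confidences on the assumed true options to a payment."""
    N, B = len(p), len(p[0])
    total = 0.0
    for gold in combinations(range(N), G):        # (i_1, ..., i_G) ⊆ [N]
        for ks in product(range(B), repeat=G):    # (k_1, ..., k_G) ⊆ [B]^G
            weight = 1.0
            for i, k in zip(gold, ks):
                weight *= p[i][k]                 # ∏_g p_{i_g}^{k_g}
            total += weight * f(tuple(q[i][k] for i, k in zip(gold, ks)))
    return total / comb(N, G)                     # 1 / (N choose G)

# N = G = 1, B = 2: reduces to p^1 f(q^1) + p^2 f(q^2).
print(expected_payment([[0.7, 0.3]], [[1.0, 0.0]], 1, lambda t: sum(t)))  # 0.7
```

For $N = G = 1$ and $B = 2$, this reduces exactly to the two-term expectation used in the linear-payment counterexample of Section IV-C.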

C. Linear Payment Function

Although paying linearly proportional to the confi-dence level of gold standard questions, f (qc1

1 , .., qcGG ) =

a∏G

i=1 qcii + b, seems intuitively proper for Cumulative data,

this approach is not an incentive-compatible payment function.Using the proof with contradiction, we first assume that thelinear payment function is incentive-compatible and then showthe contradiction. Let us consider the special case when N =G = 1 and B = 2, and the expected payment function is

$ =∑

k⊆{1,2}pk f (qk) = p1 f (q1) + p2 f (q2)

= p f (q) + (1 − p) f (1 − q)

= p (aq + b) + (1 − p) (a(1 − q) + b). (2)

In order to hold the incentive-compatible property,the expected payment function should be maximized if aworker answers based on her/his belief (i.e., q = p). There-fore, the maximum expected payment is $sup = a(2 p2 −2 p +1)+b, when the worker truthfully answers the question,and the other expected payment is $other = a(2 pq − p −q + 1)+ b, when the worker imprecisely answers the question(i.e., q �= p). Now suppose the case when the worker’s beliefis 0.5 < p < 1, but the reported Cumulative label is q = 1and a > 0. In the following, we show that $sup < $other ,and consequently, the linear payment function is not incentive-compatible:

$$\$_{sup} - \$_{other} = a(2p^2 - 3p + 1) = a(2p - 1)(p - 1) < 0. \quad (3)$$
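This counterexample can also be checked numerically. The sketch below is our own illustration (the values $a = 1$, $b = 0$, and $p = 0.8$ are arbitrary choices); it shows that overstating the confidence strictly increases the expected linear payment:

```python
def linear_payment(q, a=1.0, b=0.0):
    """Linear payment f(q) = a*q + b on the confidence of one option."""
    return a * q + b

def expected_payment(p, q, a=1.0, b=0.0):
    """Expected payment for B = 2: belief (p, 1-p), reported label (q, 1-q)."""
    return p * linear_payment(q, a, b) + (1 - p) * linear_payment(1 - q, a, b)

p = 0.8                                 # worker's belief, with 0.5 < p < 1
truthful = expected_payment(p, q=p)     # report q = p  -> 0.68
overstated = expected_payment(p, q=1.0) # report q = 1  -> 0.80
print(truthful < overstated)            # True: truth-telling is not optimal
```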

D. Proposed Payment Function

We propose a new payment function for Cumulative crowd labels and summarize it in Algorithm 1. Our payment function has a multiplicative form over the $G$ gold standard questions and can be summarized as follows: 1) the payment is zero if a worker assigns zero confidence to the true option of even one gold standard question and 2) the payment decreases proportionally to the lack of confidence in the true options by the factor $1 - \log_{\epsilon}(q_i^{c_i})$, where $q_i^{c_i}$ is the crowd label of a worker for the true option of the $i$th gold standard question. Consequently, crowd workers select all the options that have a chance of being correct based on their personal beliefs, due to the first property (i.e., to avoid zero payment), and they assign more confidence to the more preferred options, due to the second property (i.e., to increase payment). In the proposed payment function, the scalar parameter $a_{\max}$ denotes the maximum payment amount, and $\epsilon$ is a very small positive number, which can be chosen as the smallest positive value representable on the machine. In the following, we provide the theorem and proof for the incentive compatibility of the proposed mechanism.

Theorem 1: The mechanism of Algorithm 1 is incentive-compatible.

Algorithm 1 Incentive Payment Function for Cumulative Crowd Labels

1 Input: A worker's confidence levels for the true options of $G$ "gold standard" questions, $\{q_1^{c_1}, \cdots, q_G^{c_G}\}$, where $q_i^{c_i} \in [0, 1]$
2 Output: worker's payment
$$f(q_1^{c_1}, \cdots, q_G^{c_G}) = a_{\max} \prod_{i=1}^{G} \big(1 - \log_{\epsilon} q_i^{c_i}\big)\, \mathbb{1}\{q_i^{c_i} \geq \epsilon\}$$
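A direct implementation of Algorithm 1 is straightforward. The sketch below is our own rendering (function and variable names are ours); it takes the worker's confidence on the true option of each gold standard question and applies the two properties above:

```python
import math
import sys

def cumulative_payment(q_true, a_max=1.0, eps=sys.float_info.min):
    """Algorithm 1: a_max * prod_i (1 - log_eps(q_i)) * 1{q_i >= eps}.
    q_true[i] is the worker's confidence on the true option of the i-th
    gold standard question."""
    payment = a_max
    for q in q_true:
        if q < eps:                 # zero confidence on any true option
            return 0.0              # -> the whole payment collapses to zero
        payment *= 1.0 - math.log(q) / math.log(eps)  # 1 - log_eps(q)
    return payment

print(cumulative_payment([1.0, 1.0]))   # 1.0: full confidence pays a_max
print(cumulative_payment([0.9, 0.0]))   # 0.0: one missed true option
```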

Proof: An incentive-compatible mechanism implies that the expected payment function is maximized when a worker discloses his/her personal beliefs. Thus, the maximum expected payment function ($\$_{sup}$) with input $q = p$ must be strictly greater than any other expected payment function ($\$_{other}$) with input $q \neq p$. In the case of $N = G = 1$, we need to prove that

$$\$_{other} = \sum_{k=1}^{B} p_k f(q_k) < \$_{sup} = \sum_{k=1}^{B} p_k f(p_k). \quad (4)$$

When $N = G = 1$, the payment function with $a_{\max} = 1$ is

$f(q^c) = (1 - \log_{\epsilon} q^c)\,\mathbb{1}\{q^c \geq \epsilon\}$. Thus

$$\begin{aligned}
\$_{other} - \$_{sup} &= \sum_{k=1}^{B} p_k \big[(1 - \log_{\epsilon} q_k)\mathbb{1}\{q_k \geq \epsilon\} - (1 - \log_{\epsilon} p_k)\mathbb{1}\{p_k \geq \epsilon\}\big] \\
&\leq -\frac{1}{\log_{10} \epsilon} \sum_{k \in I} p_k \log_{10} \frac{q_k}{p_k}, \quad \text{where } I = \{k \mid p_k > 0\} \\
&< \sum_{k \in I} p_k \Big(\frac{q_k}{p_k} - 1\Big) = \sum_{k \in I} (q_k - p_k) \leq 0. \quad (5)
\end{aligned}$$

Note that the key inequality in the fourth line is induced by the Gibbs inequality, since $p_k \neq q_k$ holds for at least one option based on the assumption $p \neq q$ for $\$_{other}$. Therefore, the first case of the proof is established. This case can also be seen as a variant of the proper logarithmic scoring rule. Considering the personal beliefs and Cumulative labels as probabilistic predictions, maximizing a proper scoring rule corresponds to reporting the true set of probabilities [27], [34].
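The single-question case can also be verified empirically. The sketch below is our own check (the belief vector and $\epsilon = 10^{-12}$ are arbitrary choices for illustration); it samples random reports from the simplex and confirms that none of them beats the truthful report in expectation:

```python
import math
import numpy as np

EPS = 1e-12  # small positive epsilon (arbitrary choice for illustration)

def f(q):
    """Per-question payment with a_max = 1."""
    return (1.0 - math.log(q) / math.log(EPS)) if q >= EPS else 0.0

def expected(p, q):
    """Expected payment: belief p, reported label q (both on the simplex)."""
    return sum(pk * f(qk) for pk, qk in zip(p, q))

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])        # a worker's belief over B = 3 options
truthful = expected(p, p)
for _ in range(1000):                # random alternative reports
    q = rng.dirichlet(np.ones(3))
    assert expected(p, q) <= truthful + 1e-12
print("truthful report maximizes the expected payment")
```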

When $N = G > 1$, we can rewrite the expected payment function as the following equation, since the belief distributions of crowd workers are assumed to be independent over different questions [17], [35]:

$$\prod_{i=1}^{G} \mathbb{E}\big[\big(1 - \log_{\epsilon} q_i^{c_i}\big)\mathbb{1}\{q_i^{c_i} \geq \epsilon\}\big]. \quad (6)$$

Because the payment function is nonnegative, this multiplicative function is maximized when each independent expectation is maximized separately. Thus, the problem reduces to the first case with $N = G = 1$, which is already proved.

When $N > G \geq 1$, the general expected payment function in (1) has two summations. The outer one is the expectation over the uniformly random placement of the $G$ gold standard


questions among the $N$ questions, and the inner one corresponds to the expectation with respect to different combinations of selected options. The second case ($N = G$) proves the maximization of every individual term in the inner summation. Therefore, the last case is also maximized when workers truthfully answer all the assigned questions. $\blacksquare$

V. AGGREGATION MODEL

A. Proposed Aggregation Models

We start by introducing SCMV as a new discriminative aggregation model, prove the convexity of its corresponding optimization problem, and then derive an effective optimization algorithm for estimating the true labels and updating the parameters. In addition, we define a joint aggregation model by integrating SCMV with a generative model. We design an alternating learning strategy for the joint model to estimate the true labels and update the parameters.

B. Notations

We denote $X = \{x_{11}, \ldots, x_{NM}\}$ as the input data for the aggregation models, where $x_{ij} = [x_{ij}^1, \ldots, x_{ij}^B]^T$ indicates the crowd label for the $i$th question obtained from the $j$th worker as a $B$-dimensional vector. The variables $N$ and $M$ denote the number of questions and workers, respectively. Note that single-option (deterministic) labels are represented by $x_{ij}^{k=m} = 1$ and $x_{ij}^{k \neq m} = 0$, where $m = \arg\max(q_{ij})$, and Cumulative (nondeterministic) labels are induced by $x_{ij}^k = q_{ij}^k / \sum_{k'} q_{ij}^{k'}$, which ensures that the labels sum to 1. The model predictions are denoted by $Y = [y_1, \ldots, y_N]$, where $y_i$ indicates the estimated truth for the $i$th question as a $B$-dimensional vector.

C. Discriminative Aggregation Model

1) MV: MV is a basic discriminative aggregation model, which averages the crowd labels of each question to estimate the truths. It effectively solves the following problem:

$$\min_{Y} \sum_{i=1}^{N} \|X_i \mathbf{1} - y_i\|_2^2 \quad (7)$$

where $X_i = [x_{i1}, \ldots, x_{iM}] \in \mathbb{R}^{B \times M}$ includes the crowd labels for the $i$th question.

2) Nonnegative Weighted MV: Unlike MV, which assigns equal reliability to all crowd workers through the all-ones vector $\mathbf{1}$, SCMV employs flexible weights for each crowd worker. It allocates a nonnegative $M$-dimensional vector $w$ as the model parameters to improve the discriminative ability of MV, leading to the following constrained optimization problem:

$$\min_{w \geq 0,\; Y \geq 0,\; \mathbf{1}^T Y = \mathbf{1}^T} \sum_{i=1}^{N} \|X_i w - y_i\|_2^2 + \lambda_w \|w\|_2^2 \quad (8)$$

where $\lambda_w$ is the hyperparameter regularizing the reliability parameters. The SCMV optimization problem has simplex constraints on the parameters and truths (under the min operation), which are both crucial for obtaining the optimal solution. Negative weights are desirable only for adversarial workers, whose labels tend to be the opposite of the true label for all questions. Adversarial workers are extremely rare in real crowdsourcing tasks, especially under incentive payment functions. We also find that the nonnegativity constraint on the parameters is empirically useful in our experiments and improves the results. Moreover, the constraints on the truths prevent a trivial solution: with $w = 0$ and $y_i = 0$, the optimization problem would be trivially minimized without the truth constraints. In addition, the $\ell_1$ constraint helps provide probabilistic interpretations and sparse solutions for the estimated truths.

3) Update w: Since the optimization problem has two unknown variables ($w$ and $y_i$), we solve it in an alternating fashion: the weights are updated while the truths are fixed, and the truths are inferred while the weights are assumed known. Thus, the optimization problem for learning the weights is

$$\min_{w \geq 0} \sum_{i=1}^{N} \|X_i w - y_i\|_2^2 + \lambda_w \|w\|_2^2. \quad (9)$$

Using the multiplicative update strategy [36], we derive the following rule for updating the parameters. To define the updating rule, we start with gradient descent and then tune the learning step $\eta$ to keep the updated weights nonnegative. Denoting the objective of (9) without constraints by $L_w$ and its gradient with respect to the parameter of each worker by $\nabla w_j = \partial L_w / \partial w_j$, we have

$$\begin{aligned}
w_j^{(t+1)} &= w_j^{(t)} - \eta \frac{\partial L_w}{\partial w_j} = w_j^{(t)} - \eta\big([\nabla w_j]_+ + [\nabla w_j]_-\big) \\
&= w_j^{(t)} - \frac{w_j^{(t)}}{[\nabla w_j]_+}\big([\nabla w_j]_+ + [\nabla w_j]_-\big) = -w_j^{(t)} \frac{[\nabla w_j]_-}{[\nabla w_j]_+} \quad (10) \\
&= w_j^{(t)} \sum_{i=1}^{N} x_{ij}^T y_i \Big/ \Big(\sum_{i=1}^{N} x_{ij}^T x_{ij} + \lambda_w\Big)
\end{aligned}$$

where $[\nabla w_j]_+$ and $[\nabla w_j]_-$ are the positive and negative parts of the gradient. Using the trick in [36] for nonnegative matrix updates, we replace the learning rate $\eta$ by $w_j^{(t)} / [\nabla w_j]_+$ in the third line, which satisfies the nonnegativity constraint on $w$.

4) Estimate Y: Considering the parameters as constants, we solve the following problem to infer the truths:

$$\min_{Y \geq 0,\; \mathbf{1}^T Y = \mathbf{1}^T} \sum_{i=1}^{N} \|X_i w - y_i\|_2^2. \quad (11)$$

Since the problem in (11) is independent for each question $i \in [N]$, we can decompose it into $N$ independent subproblems as follows:

$$\min_{y_i \geq 0,\; \mathbf{1}^T y_i = 1} y_i^T y_i - 2 y_i^T X_i w. \quad (12)$$

To solve this problem, one could use first-order methods, including gradient descent, subgradient descent, and Nesterov's optimal method [37], which require only the objective function value and its (sub)gradient at each iteration. However, we derive an efficient optimization


algorithm for this problem using the Lagrange multiplier method. Our empirical results confirm the fast convergence rate of this algorithm, where a few updating steps are sufficient for convergence.

5) Optimization Algorithm: We rewrite the objective in (12) as

$$\min_{y_i \geq 0,\; \mathbf{1}^T y_i = 1} y_i^T y_i - 2 y_i^T d_i \quad (13)$$

where $d_i = X_i w$ is introduced for simpler notation. The Lagrangian function for problem (13) is

$$L(y_i, \lambda, \mu_i) = y_i^T y_i - 2 y_i^T d_i - \lambda(\mathbf{1}^T y_i - 1) - \mu_i^T y_i \quad (14)$$

where $\mu_i$ and $\lambda$ denote the Lagrange multipliers corresponding to the two constraints in problem (13). Letting $y_i^*$, $\lambda^*$, and $\mu_i^*$ be the optimal variables, the Karush–Kuhn–Tucker (KKT) conditions [38] give

$$\begin{cases} y_i^* - 2 d_i - \lambda^* \mathbf{1} - \mu_i^* = 0 \\ \lambda^* \geq 0 \\ \mu_i^* \geq 0 \\ \mu_i^{*T} y_i^* = 0. \end{cases}$$

For each $k \in [B]$, the first equation in the above KKT conditions can be reformulated as $y_{ik}^* = 2 d_{ik} + \lambda^* + \mu_{ik}^*$. Based on the complementary slackness condition $\mu_i^{*T} y_i^* = 0$, we further have

$$y_{ik}^* = (2 d_{ik} + \lambda^*)_+ \quad (15)$$

where $(\cdot)_+ = \max(0, \cdot)$. Thus, we can obtain the optimal variable $y_{ik}^*$ given $\lambda^*$. Based on $\mathbf{1}^T y_i = 1$, we can define the piecewise linear function

$$f(\lambda^*) = \sum_{k} (2 d_{ik} + \lambda^*)_+ - 1 = 0 \quad (16)$$

which can be solved by finding the root of this function with Newton's method.
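As an illustration, the sketch below (our own code; the names are ours) solves (16) for one question by bisection rather than Newton's method, since the monotone piecewise linear form makes bracketing trivial, and then recovers $y_i$ from (15):

```python
import numpy as np

def estimate_truth(d, tol=1e-10):
    """Solve min_y y^T y - 2 y^T d s.t. y >= 0, 1^T y = 1 through the
    piecewise linear root equation sum_k (2 d_k + lam)_+ = 1 of Eq. (16)."""
    g = lambda lam: np.maximum(2.0 * d + lam, 0.0).sum() - 1.0
    lo = -2.0 * d.max()          # all terms inactive here, so g(lo) = -1
    hi = lo + 1.0                # the largest term alone already reaches 1
    while hi - lo > tol:         # bisection on the nondecreasing function g
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return np.maximum(2.0 * d + hi, 0.0)   # Eq. (15)

y = estimate_truth(np.array([0.4, 0.1, -0.2]))
print(y.round(6))                # approximately [0.8, 0.2, 0.0]
```

The returned vector is nonnegative and sums to 1, as required by the simplex constraint.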

6) Convexity Proof: We can prove that the optimization problem (8) for SCMV is jointly convex with respect to $w$ and $y_i$. The objective in (8) is the summation of $\lambda_w \|w\|_2^2$ and $f_i = \|X_i w - y_i\|_2^2$ over all questions $i \in [N]$. In problem (8), the inequality constraints, the equality constraint, and the regularization term are convex sets, an affine set, and a convex function, respectively. Because a summation of convex functions under convex inequality and affine equality constraints preserves convexity, the problem boils down to proving the convexity of $f_i$. It can be shown that $f_i$ is jointly convex over the variables $w$ and $y_i$, because the Hessian matrix $\nabla^2 f_i(w, y_i) = [X_i, -I]^T [X_i, -I] \succeq 0$ is positive semidefinite. In other words, this objective function can be reformulated as $f_i = \|[X_i, -I][w, y_i]^T\|_2^2$. We can then prove convexity simply, because $h = \|\cdot\|_2^2$ is convex and $g = [X_i, -I][w, y_i]^T$ is linear; therefore, the composition $h \circ g$ is also convex with respect to $[w, y_i]$. Consequently, problems (8), (9), (11), and (12) are all convex, and the optimality of the estimated truths and parameters is guaranteed.

Algorithm 2 SCMV Algorithm

1 Initialize $Y$ by the majority voting method
2 while not converged do
3   For each worker $j$:
4     $w_j^{(t+1)} = w_j^{(t)} \sum_{i=1}^{N} x_{ij}^T y_i \big/ \big(\sum_{i=1}^{N} x_{ij}^T x_{ij} + \lambda_w\big)$
5   For each question $i$: solve problem (12) to estimate $y_i$
6 end

7) Time Complexity: Algorithm 2 presents the updating steps of the SCMV algorithm, where we first initialize the truths using MV and then alternately update the reliability parameters and truths until convergence. Since the third step takes $O(NMB^2)$ and the fourth step needs $O(NMB + NB^2)$ operations in each iteration, the computational cost per iteration is $O(NMB^2)$.
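Putting the pieces together, a compact sketch of Algorithm 2 could look as follows. This is our own illustration, not the authors' released code: it stores the crowd labels as a dense $(N, M, B)$ array with zero rows for unanswered questions, uses a standard sort-based simplex projection in place of the Newton root-finding step, and stops with the paper's convergence criterion:

```python
import numpy as np

def simplex_truth(d):
    """Solve problem (12) for one question: y = (2 d + lam)_+, sum(y) = 1,
    via the standard sort-based simplex projection (instead of Newton)."""
    u = np.sort(2.0 * d)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.max(k[u + (1.0 - css) / k > 0])     # number of active entries
    lam = (1.0 - css[rho - 1]) / rho
    return np.maximum(2.0 * d + lam, 0.0)

def scmv(X, lam_w=0.1, max_iters=100, tol=1e-3):
    """SCMV (Algorithm 2). X: (N, M, B) array of crowd labels, zero rows
    for unanswered questions. Returns truths Y (N, B) and weights w (M,)."""
    N, M, B = X.shape
    Y = X.sum(axis=1)
    Y /= np.maximum(Y.sum(axis=1, keepdims=True), 1e-12)  # majority-voting init
    w = np.ones(M)
    for _ in range(max_iters):
        num = np.einsum('imb,ib->m', X, Y)           # sum_i x_ij^T y_i
        den = np.einsum('imb,imb->m', X, X) + lam_w  # sum_i x_ij^T x_ij + lam_w
        w = w * num / den                            # multiplicative update (10)
        D = np.einsum('imb,m->ib', X, w)             # d_i = X_i w
        Y_new = np.array([simplex_truth(d) for d in D])
        done = np.abs(Y_new - Y).sum() / N <= tol    # paper's stopping rule
        Y = Y_new
        if done:
            break
    return Y, w
```

Each row of the returned $Y$ stays on the probability simplex, so the estimated truths retain a probabilistic interpretation.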

D. Generative-Discriminative Joint Aggregation Model

Recent studies have shown that combining generative and discriminative models can enhance performance in different domains [39], [40]. However, optimizing the generative and discriminative models in a nonjoint manner leads to suboptimal aggregation results. To achieve optimal results, we employ a joint learning framework for updating the parameters of both submodels in our generative-discriminative aggregation model.

1) DS Model: We combine the discriminative SCMV model with a variant of the Dawid and Skene (DS) model to gain from the flexibility of generative models. Briefly, the standard DS model [21] allocates a $B \times B$ confusion matrix $\nu_j$ as the reliability parameter of each worker. While the diagonal element $\nu_j^{cc}$ represents the likelihood that the $j$th worker correctly answers questions of the $c$th class, the off-diagonal element $\nu_j^{ck}$ indicates the chance that the worker answers questions with true label $c$ as $k$. Thus, a worker with larger diagonal elements in the confusion matrix is more reliable than one with larger off-diagonal elements. The following equation shows the DS likelihood function:

$$p(X \mid Y, \nu) = \prod_{i=1}^{N} \prod_{j=1}^{M} \prod_{c=1}^{B} \prod_{k=1}^{B} \big(\nu_j^{ck}\big)^{x_{ij}^k y_i^c} \quad (17)$$

where the model parameters are subject to $\sum_k \nu_j^{ck} = 1$.

2) Coupling Discriminative and Generative Models: We use a probabilistic framework to combine our generative and discriminative submodels and employ the posterior distribution to determine the objective function of our joint model. In particular, we maximize the logarithm of the posterior distribution to learn the unknown variables of our joint model:

$$\begin{aligned}
\ln P(\nu, w \mid X, Y) &= \ln \Big(\prod_i P(w \mid X_i, y_i)\, P(\nu \mid X_i, y_i)\Big) \\
&\propto \sum_i \underbrace{\ln P(y_i \mid X_i, w) + \ln P(w)}_{\text{Discriminative Model}} + \underbrace{\ln P(X_i \mid y_i, \nu) + \ln P(\nu)}_{\text{Generative Model}}. \quad (18)
\end{aligned}$$


Let us assume that the discriminative model and its prior have Gaussian distributions, $P(y_i \mid X_i, w) = \mathcal{N}(y_i \mid X_i w, \beta^{-1} I)$ and $P(w) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$, where $\alpha$ and $\beta$ are the precisions of the distributions. The objective function of this discriminative model can be reformulated like the SCMV loss: $\ln(\prod_i P(w \mid X_i, y_i)) = -\beta/2 \sum_i \|X_i w - y_i\|_2^2 - \alpha/2\, w^T w$. For the prior distribution of $\nu$, we assume a Dirichlet distribution $P_0^\nu = \mathrm{Dir}(\mu)$ and use the Kullback–Leibler divergence $\mathrm{KL}(P^\nu \| P_0^\nu)$ to measure the divergence between the confusion matrix and its prior.

The objective in (18) assigns equal weights to the discriminative and generative submodels. However, we can adjust their effects using the weights $\gamma_D$ and $\gamma_G$ in $P(\nu, w \mid X, Y) = \prod_i P(w \mid X_i, y_i)^{\gamma_D} P(\nu \mid X_i, y_i)^{\gamma_G}$. The following equation gives the reformulated objective based on this weighted posterior:

$$\min_{w \geq 0,\; Y \geq 0,\; \mathbf{1}^T Y = \mathbf{1}^T,\; \nu_j} \mathrm{KL}(P^\nu \| P_0^\nu) - \sum_{ijck} x_{ij}^k y_i^c \log\big(\nu_j^{ck}\big) + \gamma \sum_i \|X_i w - y_i\|_2^2 + \lambda_w \|w\|_2^2 \quad (19)$$

where $\gamma = (\gamma_D \beta)/\gamma_G$ and $\lambda_w = \gamma_D \alpha$. To solve this problem, we employ an alternating learning strategy that optimizes the generative and discriminative objectives jointly. In this approach, the truths ($Y$) are estimated using the weighted contributions of both submodels, and the parameters ($w$, $\nu$) are updated using the inferred truths.

3) Estimate Y: Assuming the parameters $w$ and $\nu$ are fixed, we estimate the truths using

$$\min_{Y \geq 0,\; \mathbf{1}^T Y = \mathbf{1}^T} - \sum_{ijck} x_{ij}^k y_i^c \log\big(\nu_j^{ck}\big) + \gamma \sum_i \|X_i w - y_i\|_2^2. \quad (20)$$

We further simplify problem (20) by converting it into $N$ independent subproblems:

$$\min_{y_i \geq 0,\; \mathbf{1}^T y_i = 1} y_i^T y_i - 2 y_i^T \Big(X_i w + \frac{1}{\gamma} \sum_{jk} x_{ij}^k \log\big(\nu_j^{k}\big)\Big). \quad (21)$$

We can solve this problem efficiently using the optimization algorithm derived for problem (12).

4) Update w: The problem of optimizing the parameters of the discriminative submodel is equivalent to problem (9).

5) Update ν: We update the parameters of the generative submodel by solving the following objective:

$$\min_{\nu_j} \mathrm{KL}(P^\nu \| P_0^\nu) - \sum_{ijck} x_{ij}^k y_i^c \log\big(\nu_j^{ck}\big). \quad (22)$$

While the general closed-form solution of this problem is $P^\nu \propto P_0^\nu\, P(X \mid Y, \nu)$, we consider the Dirichlet distribution with hyperparameter $\mu$ as the prior $P_0^\nu = \mathrm{Dir}(\mu)$ and therefore update the parameters by $P_{\nu_j} = \mathrm{Dir}(\mu + \sum_i x_{ij}^k y_i^c)$.

6) Time Complexity: Algorithm 3 outlines the alternating learning framework for the joint DS-SCMV model. The time complexity of Steps 3–5, and hence the total cost, is $O(NMB^2)$ per iteration.
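The Dirichlet update for the confusion matrices has a simple closed form: add the confidence-weighted co-occurrence counts to the prior and normalize. A minimal sketch of the posterior-mean update (our own code; the array layout is an assumption of ours):

```python
import numpy as np

def update_confusion(X, Y, mu=1.0):
    """Dirichlet update nu_j ~ Dir(mu + sum_i x_ij^k y_i^c).
    X: (N, M, B) crowd labels; Y: (N, B) estimated truths.
    Returns posterior-mean confusion matrices nu of shape (M, B, B),
    where nu[j, c, k] estimates P(worker j answers k | true label c)."""
    counts = np.einsum('ic,imk->mck', Y, X)   # sum_i y_i^c x_ij^k
    alpha = mu + counts                       # Dirichlet posterior parameters
    return alpha / alpha.sum(axis=2, keepdims=True)
```

For a consistently accurate worker, the posterior mean concentrates on the diagonal as the prior weight `mu` shrinks.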

Algorithm 3 DS-SCMV Algorithm

1 Initialize $Y$ by the majority voting method
2 while not converged do
3   For each worker $j$:
4     $w_j^{(t+1)} = w_j^{(t)} \sum_{i=1}^{N} x_{ij}^T y_i \big/ \big(\sum_{i=1}^{N} x_{ij}^T x_{ij} + \frac{\lambda_w}{\gamma}\big)$
5   $P_{\nu_j} = \mathrm{Dir}\big(\mu + \sum_i x_{ij}^k y_i^c\big)$
6   For each question $i$: solve problem (20) to estimate $y_i$
7 end

TABLE I

DATA SET DESCRIPTIONS

VI. EXPERIMENTAL RESULTS

In this section, we first compare the proposed models with the state-of-the-art aggregation models on several benchmark data sets with single-option data. Then, the Cumulative data collection mechanism is examined using the statistics of the collected crowd labels. Finally, we evaluate our unified crowdsourcing framework against alternative approaches.

A. Evaluation of Aggregation Models

Here, we assess our aggregation models using several single-option crowdsourcing data sets. Single-option crowd labels can be considered a special case of Cumulative crowd labels, in which crowd workers are only allowed to select a single option for each question. Comparing SCMV and DS-SCMV with alternative aggregation models on multiple benchmark data sets is a challenging evaluation for our models, since they are primarily developed to work with nondeterministic (i.e., multi-option) Cumulative labels.

1) Data Sets: We use five crowdsourcing data sets in this experiment, summarized in the first part of Table I. Web Search [41]: 2665 query-URL pairs are labeled by 177 workers on a relevance rating scale from 1 to 5. Age [42]: 1002 face images are labeled by 165 workers to estimate the age of a person. RTE [43]: 800 sentence pairs are labeled by 164 workers to check whether the second sentence can be inferred from the first one. Temp [43]: 462 pairs of verb events are labeled by 76 workers to check whether the event described by the first verb occurs before or after the second one. Flowers [44]: 200 flower pictures are labeled by 36 workers to check whether the flower is a peach flower or not.


TABLE II

ERROR RATES (%) OF DISCRIMINATIVE (D), GENERATIVE (G), AND GENERATIVE-DISCRIMINATIVE (G-D) AGGREGATION MODELS ON SINGLE-OPTION CROWDSOURCING DATA SETS. THE FIRST AND SECOND BEST RESULTS ARE SHOWN IN BOLD AND BRACKETS, RESPECTIVELY. THE AVERAGE ERROR RATES ARE CALCULATED BASED ON THE TOTAL NUMBER OF QUESTIONS IN ALL DATA SETS

2) Alternative Models: We compare our SCMV and DS-SCMV models with several baseline methods, including MV, IWMV [29], max-margin majority voting (M3V) [30], Dawid and Skene (DS) [21], DS with Dirichlet prior (DS+Prior), multi-class minimax entropy (Entropy (M)) [15], ordinal minimax entropy (Entropy (O)) [15], CrowdSVM, and Gibbs CrowdSVM (G-CrowdSVM) [30]. Since Entropy (O) works only with ordinal labels, we compare it only on the ordinal data sets. G-CrowdSVM requires Gibbs sampling; thus, we report the mean and standard deviation of its results over five runs.

3) Hyperparameter Selection: We set the hyperparameters of our models in an unsupervised manner, since no true labels are available when training crowdsourcing aggregation models. Consequently, we choose the confusion matrix prior as $\mu = 10 B^2 M / \sum_{ijk} x_{ij}^k$ for all data sets, proportional to the ratio of the number of parameters to the number of labels. We also follow [30] in adopting the likelihood $p(X \mid Y, \nu)$ as the criterion for selecting the remaining hyperparameters $\lambda_w$ and $\gamma$. In particular, we choose them from the sets $\gamma^{set} = \{0.01, 0.1, 1\}$ and $\lambda_w^{set} = \{0.01, 0.1, 1, 10\}$ such that the selected hyperparameters yield the maximum likelihood $p(X \mid Y, \nu)$. For SCMV, we fix the truths $Y$ after training the model and then learn the confusion matrix in one shot to calculate the likelihood. This approach ensures that the hyperparameters are selected without any knowledge of the true labels.

4) Performance Comparison: Table II reports the results of the aforementioned aggregation models in three groups: discriminative, generative, and generative-discriminative models. First, SCMV consistently improves on the MV results by significant margins on all data sets and provides better results on average than the alternative discriminative models. In particular, it performs very well on the largest data set (Web Search), even compared with more complicated models with larger numbers of parameters. Furthermore, DS-SCMV achieves superior or competitive (always first or second best) results compared with the state-of-the-art methods and has the lowest error rate on average. It consistently outperforms its generative submodel (DS+Prior) due to the combination with the discriminative SCMV submodel. Our joint DS-SCMV model also has a 1.51% lower error rate on average than its naive nonjoint variant, in which DS and SCMV are learned separately. It is also worth mentioning that DS-SCMV uses fewer parameters than the competitive models, including the variants of the Entropy and CrowdSVM models.

5) Evaluation of Running Speed: We also analyze the convergence speed of the aggregation models on different data sets. We either use the released code or implement the alternative algorithms, and run their MATLAB code five times on a machine with an Intel Core i7-4790 CPU at 3.60 GHz and 16 GB of memory. The reported results are averages over these five runs. We use the hyperparameters selected in the previous experiment and run the models until $|Y^{(t)} - Y^{(t-1)}|/N \leq 10^{-3}$ or a maximum of 100 iterations. Fig. 3 illustrates the error rates of the models on the Web Search data set against the logarithm of training time (seconds). As shown, both SCMV and DS-SCMV converge significantly faster than the other models. In addition, compared with the state-of-the-art models on all data sets, DS-SCMV is on average 10.38, 27.12, and 58.24 times faster than CrowdSVM, Entropy (M), and G-CrowdSVM, respectively. Therefore, we consider DS-SCMV a practically better candidate than its best alternative, G-CrowdSVM, owing to its fast and efficient optimization algorithm, in contrast to the computationally expensive Gibbs sampling and matrix inverse calculations of G-CrowdSVM.

B. Evaluation of Data Collection Mechanisms

1) Collected Data Sets: To explore the effectiveness of our Cumulative data collection mechanism, we create two crowdsourcing tasks on Amazon Mechanical Turk.1 First, crowd workers are asked to mark the continent of the country whose flag is shown in each question. Second, using the Stanford Dogs data set [45], workers are asked to categorize four breeds of Terrier dogs: Kerry Blue, Scotch, Australian, and Yorkshire. Using four data collection mechanisms, namely the single selection, skip-based single selection [18], approval voting [17], and Cumulative voting strategies,

1https://www.mturk.com/


Fig. 3. Running speed of aggregation models on the Web Search data set.

we collect four data sets (Single, Skip, Approval, and Cumulative) for each task. Specifically, we only hired crowd workers located in the United States with an HIT approval rate above 85% and more than 1000 approved HITs.

Following [17] and [18], we use the same questions across the data collection mechanisms: 120 for the Flag data sets and 256 for the Dog data sets. We also ensure that each worker is randomly assigned to only one data collection mechanism. The second half of Table I shows the general statistics of these data sets. In addition, we generate synthetic approval voting data sets from the Cumulative data, denoted C-Approval, for the Dog and Flag tasks; these keep the selected options of the Cumulative data and discard the reported confidence values.

Since zero payment is not possible on the Amazon Mechanical Turk platform, we assign the same fixed payment to all data collection mechanisms. The bonuses for the Single, Skip, Approval, and Cumulative data sets are then calculated using the incentive multiplicative [18], incentive skip-multiplicative [18], incentive approval [17], and our incentive payment functions, respectively. Hence, we examine the efficiency of our data collection mechanism (Cumulative interface and payment function) in comparison with the three alternative approaches, Single [18], Skip [18], and Approval [17], using gold standard questions.

We do not expect crowd workers to understand the proposed incentive mechanism in Section IV. Hence, in addition to the payment function, we provide the following worker-friendly explanation, along with a few examples, to clarify the properties of our payment function. There are four questions whose answers are known to us, and your bonus will be calculated based on them.

1) Your maximum bonus is one dollar.
2) The bonus is paid based on the reported confidence for the true options.
3) The bonus is zero if you assign zero confidence to the true option of any of these questions.

2) Quality Comparison: Following [17], we compare the aforementioned data collection mechanisms using three statistical factors, shown in Fig. 4. The first ratio is the fraction of wrong labels among attempted questions [see Fig. 4(a)], the second ratio is the fraction of wrong labels when only one option is selected [see Fig. 4(b)], and the

Fig. 4. Statistics of the collected crowd labels using four data collection mechanisms: Single [18], Skip [18], Approval [17], and Cumulative. (a) Ratio 1: fraction of wrong labels among the attempted questions. (b) Ratio 2: fraction of wrong labels when only one option is selected. (c) Average bonus per worker (cents).

third factor is the average bonus per worker across data sets [see Fig. 4(c)]. As shown, Cumulative crowd labels have fewer wrong labels according to both ratios than the Single, Skip, and Approval mechanisms. Although the first and second ratios appear similar for each mechanism, the second ratio is slightly lower than the first for the Approval and Cumulative data sets. Hence, the Cumulative mechanism not only collects higher-quality crowd labels through its flexible interface but also does not increase the data collection cost according to the payments in our experiment [see Fig. 4(c)]. We believe the workers' more accurate predictions under the Cumulative mechanism arise because they are more confident about their single-option answers, given the possibility of zero payment, and select multiple options in case of uncertainty. It is also worth mentioning that the average data collection time per worker for the Cumulative mechanism is only 9% and 8% higher than for the Single and Skip mechanisms, respectively, and similar to the Approval mechanism.

C. Evaluation of the Proposed Framework

In the previous experiments, the lower error rates of SCMV and DS-SCMV on the single-option crowdsourcing data sets demonstrate their effectiveness on the aggregation task, and the fewer wrong labels in the Cumulative data sets indicate the benefit of the Cumulative mechanism for the data collection task. Here, we evaluate our unified crowdsourcing framework, which applies the proposed aggregation models (SCMV and DS-SCMV) to the Cumulative (Flag and Dog) data sets.

1) Alternative Approaches: For comparison, four crowdsourcing aggregation models, along with SCMV and DS-SCMV, are applied to the Single, Skip, Approval, C-Approval, and Cumulative data sets for both the Flag and Dog tasks. To extend the other aggregation models to nondeterministic (i.e., multi-option) crowd labels, we treat each selected option as an independent single-option label. We then use these synthetic labels for estimating the truths and updating the parameters, weighted by their corresponding confidence. The extended models include soft MV, soft DS, soft DS+Prior, and soft Entropy (M).

It should be noted that, for a fair comparison, we do not use the true labels of the gold standard questions in the aggregation task. In addition, there is a concern that some


TABLE III

ERROR RATES (%) OF AGGREGATION MODELS ON SINGLE-OPTION AND MULTI-OPTION CROWDSOURCING DATA SETS. THE FIRST AND SECOND BEST RESULTS ARE SHOWN IN BOLD AND BRACKETS, RESPECTIVELY. THE AVERAGE ERROR RATES ARE CALCULATED BASED ON THE TOTAL NUMBER OF QUESTIONS IN ALL DATA SETS

overconfident and underconfident workers may report their Cumulative labels with different ranges. Normalization preprocessing techniques exist that balance the confidence levels of crowd workers by the averages of their reported labels. However, we do not use them in our framework, since our aggregation models penalize incorrect high-confidence and correct low-confidence crowd labels. In addition, some workers may label with low/high confidence levels due to their poor/relevant expertise in the assigned task.

2) Performance Comparison: Table III reports the error rates for different combinations of aggregation models and data collection mechanisms. DS-SCMV achieves the lowest error rates on the Flag (14.17%) and Dog (7.81%) data sets when applied to the Cumulative data. Therefore, our unified crowdsourcing framework provides the best combination of data collection mechanism and aggregation model in this experiment. Moreover, the results generally show the superiority of the flexible (Approval and Cumulative) data collection mechanisms over the single-option (Single and Skip) ones. In particular, the average error rates (based on the total number of questions) for the Single, Skip, Approval, C-Approval, and Cumulative data sets are 20.57%, 18.79%, 12.82%, 12.50%, and 12.10%, respectively. Interestingly, the Cumulative data collection mechanism not only provides high-quality data with the lowest average error rate but also induces better approval voting crowd data, as seen by comparing the C-Approval and Approval data sets. This may imply that asking crowd workers to report their confidence makes them more careful in selecting their answers. On the other hand, DS-SCMV consistently provides the best performance across different data sets, confirming its efficiency independent of the particular data collection mechanism.

VII. CONCLUSION

In this article, we proposed a new crowdsourcing framework to address several challenging issues comprehensively, including nonflexible data collection interfaces, nonincentive payment functions, aggregation models incompatible with multi-option crowd labels, and inexpert crowd workers. We took advantage of the cumulative voting system to provide a flexible data collection mechanism, introduced an incentive payment function suited to the settings of our data collection mechanism, and proposed aggregation models compatible with our nondeterministic crowd labels. The statistics of our collected data and its error rates, independent of the aggregation models, confirmed the efficiency of the proposed data collection mechanism and the incentive-compatible payment function. Moreover, the proposed joint aggregation model achieved superior or on-par results compared with the state-of-the-art models on the deterministic (i.e., single-option) and nondeterministic (i.e., multi-option) data sets. As future work, we aim to investigate the quality of our data collection mechanism on a very large data set, analyze crowd workers' opinions about data collection using qualitative and quantitative metrics, and examine the efficiency of hybrid and online crowd-machine learning models.


Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on November 03,2020 at 01:42:51 UTC from IEEE Xplore. Restrictions apply.



Kamran Ghasedi Dizaji received the B.Sc. degree in electrical engineering from Qazvin University, Qazvin, Iran, in 2009, and the M.S. degree in biomedical engineering (bioelectric) from the Amirkabir University of Technology, Tehran, Iran, in 2012. He is currently pursuing the Ph.D. degree in electrical engineering with the University of Pittsburgh, Pittsburgh, PA, USA.

He is a member of the Data Science Laboratory, University of Pittsburgh, where he is a Research Assistant. His primary research interests include machine learning, big data mining, deep learning, and computer vision.

Hongchang Gao received the B.S. degree from the Ocean University of China, Qingdao, China, in 2011, and the M.S. degree from Beihang University, Beijing, China, in 2014. He is currently pursuing the Ph.D. degree with the University of Pittsburgh, Pittsburgh, PA, USA, under the supervision of Dr. H. Huang.

His research interests include machine learning, data mining, and computer vision.

Yanhua Yang received the B.E., M.S., and Ph.D. degrees in signal and information processing from Xidian University, Xi'an, China, in 2004, 2007, and 2017, respectively.

She is currently a Lecturer with the School of Computer Science and Technology, Xidian University. Her main research interests include complex action recognition and event detection.

Heng Huang received the B.S. and M.S. degrees from Shanghai Jiao Tong University, Shanghai, China, in 1997 and 2001, respectively, and the Ph.D. degree in computer science from Dartmouth College, Hanover, NH, USA, in 2006.

He is a John A. Jurenko Endowed Professor of computer engineering with the Electrical and Computer Engineering Department, University of Pittsburgh, Pittsburgh, PA, USA. He is also a Consulting Researcher with JD Finance American Corporation, Mountain View, CA, USA. His research interests include machine learning, big data mining, bioinformatics, and neuroinformatics.

Cheng Deng (S'09) received the B.E., M.S., and Ph.D. degrees in signal and information processing from Xidian University, Xi'an, China, in 2001, 2005, and 2009, respectively.

He is currently a Full Professor with the School of Electronic Engineering, Xidian University. He is the author or a coauthor of more than 50 scientific articles at top venues, including the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, the IEEE TRANSACTIONS ON MULTIMEDIA, the IEEE TRANSACTIONS ON CYBERNETICS, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, the IEEE TRANSACTIONS ON IMAGE PROCESSING, the International Conference on Computer Vision (ICCV), the Conference on Computer Vision and Pattern Recognition (CVPR), the International Joint Conferences on Artificial Intelligence (IJCAI), and the Association for the Advancement of Artificial Intelligence (AAAI). His research interests include computer vision, multimedia processing and analysis, and information hiding.
