
On the impact of knowledge extraction and aggregation on crowdsourced annotation of visual artworks

Jasper Oosterman a,∗, Jie Yang a, Alessandro Bozzon a, Lora Aroyo b, Geert-Jan Houben a

a Delft University of Technology, P.O. Box 5, 2600 AA, Delft, The Netherlands
b VU University, De Boelelaan 1081a, 1081 HV, Amsterdam, The Netherlands

Article info

Article history:
Received 31 October 2014
Revised 14 June 2015
Accepted 3 July 2015
Available online xxx

Keywords:
Crowdsourcing
Cultural heritage
Artwork annotation

Abstract

Cultural heritage institutions increasingly provide online access to their collections. To make collections of visual artworks suitable for access and retrieval, the represented visual objects (e.g. plants or animals) need detailed and thorough annotations. Crowdsourcing has proven a viable tool to cater for the pitfalls of automatic annotation techniques. However, differently from traditional photographic image annotation, the artwork annotation task requires workers to possess the knowledge and skills needed to identify and recognise the occurrences of visual classes. The extent to which crowdsourcing can be effectively applied to artwork annotation is still an open research question. Based on a real-life case study from the Rijksmuseum Amsterdam, this paper investigates the performance of a crowd of workers drawn from the CrowdFlower platform. Our contributions include a detailed analysis of crowd annotations based on two annotation configurations, and a comparison of these crowd annotations with the ones from trusted annotators. In this study we apply a novel method for the automatic aggregation of local (i.e. bounding box) annotations, and we study how different knowledge extraction and aggregation configurations affect the identification and recognition aspects of artwork annotation. Our work sheds new light on the process of crowdsourcing artwork annotations, and shows how techniques that are effective for photographic image annotation cannot be straightforwardly applied to artwork annotation, thus paving the way for new research in the area.

© 2015 Published by Elsevier B.V.

1. Introduction

Visual artwork1 annotation recently emerged as an important multidisciplinary discipline, fuelled by the growing needs of cultural heritage institutions. Galleries, Libraries, Archives, and Museums (GLAMs) have the mission of ensuring that the art produced by mankind is properly preserved, described, catalogued, and made accessible to the public. To unlock the value of their artwork collections, GLAMs must enable and facilitate browsing and retrieval for a broad yet unforeseen variety of users, having an unknown variety of needs. To this end, textual annotations are used to describe the instances of classes of visual objects, e.g. objects, plants, animals and human body parts, represented in the artworks. Image annotation is a notoriously hard problem for computers to solve but, thanks to recent progress in computer vision techniques [2,3], it is now possible to correctly identify the presence of several visual classes in photographic images. Alas, such techniques cannot be applied to cultural heritage collections due to the unique nature of visual artworks [1]. Therefore, GLAMs employ professionals, mostly art historians, to analyse artworks and create annotations about the occurrence of visual classes of interest; but the quality and extent of their annotation work is subject to temporal, monetary, and knowledge limitations.

∗ Corresponding author. Tel.: +31 15 278 6346. E-mail addresses: [email protected], [email protected] (J. Oosterman).
1 A visual artwork is an artistic expression represented on a flat surface (e.g., canvas or sheet of paper) in the form of a painting, printing or drawing [1].

Crowdsourcing has emerged as a viable solution to complement (or substitute) computer vision algorithms for many "difficult" visual analysis tasks, including the annotation of visual content [4–7]. Crowdsourced artwork annotation is a representative example of a Crowdsourced Knowledge Creation (CKC) task [8], i.e. a class of crowdsourcing tasks where workers are requested to stress their high-level cognitive abilities (e.g., knowledge synthesis, data interpretation), and draw from their experience or education, in order to solve problems for which a unique, factual solution might not exist. In the case of artwork annotation, a crowd worker must possess the knowledge and skills required to: (1) understand the abstract, symbolic, or allegorical interpretation of the reality depicted in the artwork to identify the occurrences of visual classes; and (2) recognise the type of such visual classes, describing them with an expressive text.

Crowdsourcing of artwork annotation is still an open research challenge. Domain-specific experts are hard and expensive to recruit. Knowledgeable contributors (e.g. pro-amateurs and enthusiasts) might be present in anonymous human computation marketplaces, but must be located, engaged and motivated.

The identification and recognition of visual classes are aspects of artwork annotation that can be influenced by the CKC process design. The knowledge extraction step is of great importance, as it requires work interfaces that guide, but do not constrain, the high-level cognitive and memory processes of contributors. This is in contrast to traditional "computational" crowdsourcing tasks, where a well-defined work interface guarantees execution efficiency and consistency. Also, the aggregation of knowledge from individual workers must account for the broad diversity of opinions and interpretations that a crowd knowledge elicitation task might imply. This work studies the crowdsourcing of artwork annotation, and addresses the following research questions:

RQ1: Can non-professional annotators from crowdsourcing platforms provide high quality artwork annotations?
RQ2: To what extent can the extraction and aggregation steps of a crowdsourced knowledge creation process influence the identification and recognition aspects of visual artwork annotation?

To answer these questions, we partnered with the Rijksmuseum Amsterdam2, and set up an extensive evaluation campaign aimed at testing the performance of workers from human computation platforms when asked to identify and recognise occurrences of visual object classes in artworks. We assembled a collection of 80 Rijksmuseum prints, and focused on the "flowers" class, to target an area of expertise that is likely present in a general population. Three trusted assessors created a reference annotation ground-truth to assess the number, type, and location of the flowers depicted in the dataset3. To test the effect of the CKC's knowledge extraction step, we evaluated two annotation configurations: an Artwork–centric configuration, where textual annotations about visual objects are specified for the whole artwork; and a Class–centric configuration, where occurrences of visual objects are identified using bounding boxes with distinct textual annotations.

2 http://rijksmuseum.nl. The Rijksmuseum Amsterdam is the largest and most prestigious museum in the Netherlands.
3 The dataset and the results of this study are available for download from http://bit.ly/CN-SI-artworks.

The experiment was performed on the CrowdFlower human computation marketplace, and involved a crowd of 235 workers. For each knowledge extraction configuration, we tested the impact of different aggregation methods on the identification and recognition performance. We analyse the quality of annotations provided by crowd workers to study the richness of their vocabulary, and its overlap with the annotations created by domain experts.

The main contribution of this paper is a study of how extraction and aggregation methods affect the annotation quality of visual artworks in an Artwork–centric and a Class–centric configuration. To enable our study we created a novel algorithm for aggregating annotations in the Class–centric configuration.

Results confirm the unique nature of the artwork annotation problem, showing how crowdsourcing techniques that are effective for photographic image annotation cannot be straightforwardly applied. The high percentage of workers who dropped out during recruitment testifies to the challenges related to the identification of visual objects, even when as simple as flowers. The experiments highlight the impact that the CKC process can have on identification and recognition quality: an Artwork–centric configuration enhances recognition aspects, and comes with a richer annotation vocabulary; a Class–centric configuration, on the other hand, guarantees better identification performance, but poorer recognition and vocabulary.

The remainder of the paper is structured as follows. In Section 2 we present the related work. Section 3 details and exemplifies the complexities of visual object identification and recognition in artwork annotation. Next, Section 4 describes the design and execution of our evaluation. Sections 5 and 6 present and discuss experimental results. Section 7 concludes and sets the scene for future work.


2. Related work

Recent literature [1,9] shows how, in contrast to photographic images, which carefully represent the real world, artworks provide less and typically inconsistent visual information (texture, colour, depth, etc.); this, together with the lack of sufficiently large training sets and the presence of a sizeable number of visual classes to be recognised, is among the main causes of ineffective automatic artwork annotation algorithms. Hybrid image annotation methods [10,11] emerged as a promising solution to reduce costs and error rates by complementing automatic techniques with crowdsourced annotations. Inspired by these works, our paper focuses on crowdsourcing visual object occurrence annotations, a form of annotation where strict quality control and effective aggregation techniques are needed to cater for the natural difficulties related to the drawing of geometric coordinates to bound image objects [12].

Previous works investigated how crowds can support the artwork annotation process [13–15]. For instance, "The Steve project" [13] studied crowd tagging of collections from more than 12 USA-based museums and compared crowd and professional taggers. The authors found crowd annotators, drawn from museum attendees, to use a different vocabulary than professional ones, but such annotations were effective in improving the retrieval of the artworks. In [15], crowds without prior domain knowledge were engaged to annotate prints, while gaming mechanisms stimulated them to learn about the domain. Based on the gaming mechanism introduced by Luis von Ahn's popular ESP-game [16], several games with a purpose [17,18] have been proposed to collect artwork metadata. Differently from previous works, which exploited domain-knowledgeable volunteers or museum attendees, our approach explicitly focuses on crowds drawn from human computation platforms, i.e. anonymous individuals for which no assumption can be made about their familiarity with artworks or their domain knowledge.

Recent studies [4,19,20] compared the performance of expert annotators and annotators from human computation platforms. All studies agreed on the potential, the scalability, and the reduced costs of crowds compared to experts, but also mentioned that additional actions, such as repetition and worker qualification, are needed to obtain high quality annotations. Standard aggregation techniques for crowd results include removing results failing qualification tests and subsequently using majority voting to combine the results [4].

Most studies, however, focused on tasks that required workers to have only basic skills and common knowledge. There is demand for more complex tasks, for example requiring creativity [21]. Only recently, several works advocated for specialised crowdsourcing techniques for knowledge creation tasks, such as nichesourcing [22] or community sourcing [23,24]. For instance, in the context of domain-specific ontology and taxonomy creation, Noy et al. and Chilton et al. [25,26] found a crowd performance of around 80% correctness which, although lower than that of domain experts, was very promising and above all scalable.

The results described in this paper are rooted in our previous work on crowdsourced knowledge creation. The early work [27] defined the problem of artwork annotation in the context of the Rijksmuseum Amsterdam in the Netherlands. The subsequent works [8,28] introduced an experimental methodology, and reported preliminary results mainly focused on Artwork–centric knowledge extraction. To the best of our knowledge, our work is the first that systematically studies the performance of Artwork–centric and Class–centric knowledge extraction techniques in human computation platforms, and assesses their identification and location performance with respect to a high-quality ground truth.

3. Challenges in artwork annotation

This section elaborates on the major challenges related to the annotation of visual objects in artworks. We describe the process currently employed at the Rijksmuseum Amsterdam, highlighting typical annotation requirements, and exemplifying how compliance to such requirements is hindered by the nature of visual artworks.

3.1. The need for professional annotation of artworks

The Rijksmuseum has a collection of over 1 million artworks, 700,000 of which are prints, that the museum wants to make accessible for online consumption. The museum currently conducts the following digitization process. First, a high quality digital representation is created. Then, a team of 6 professional, in-house annotators describe the artwork. During discussions with the museum's curator of the online collection, the museum's interest regarding artwork annotation was stated: descriptions of both the art-historical aspects (such as creator, material and date of creation) and the depicted visual objects, such as depicted persons, buildings, flora and fauna. People using or studying the collection often try to answer questions regarding a visual object class X such as: "How many prints depict X?"; "Which are the types of X most commonly represented in a given artistic period?"; or "How are X depicted in different genres and periods?".

To enable these retrieval scenarios, annotations must possess the following two properties: (1) coverage, i.e. all instances of the visual object classes represented in the artwork should be identified, (possibly) located, and annotated; and (2) expressiveness, i.e. visual objects should be recognised and annotated with texts that serve a broad spectrum of knowledge levels; this is to allow both common and expert users to access the collection by using the most familiar language.

Professional annotators are given 25 min to retrieve an artwork from storage, analyse it, find relevant information and publications online and in their library, and describe it by entering annotations and references in collection management software4. Twenty minutes are devoted to the description of art-historical aspects, while only 5 min are allocated to visual object classes. The latter involves two steps: the identification of instances of a given class and, subsequently, their recognition and association with a representative label.

4 The allocated time is constrained by budgetary considerations and cannot be significantly increased.

The flower annotation case study. The size and importance of the Rijksmuseum make it an exemplary case of a cultural heritage institution striving for high-quality annotation of digital (digitized) collections.

Fig. 1. "Scent" by Petrus Cool.

Let us consider the print in Fig. 1, drawn from the 700K prints collection. According to the requirements defined before, professional annotators from the Rijksmuseum must identify all the instances of depicted flowers, and describe them with their common and scientific names. The print depicts a woman holding flowers sitting in a flower-decorated cart pulled by dogs; the print contains 71 flower instances, distributed in at least 8 areas (highlighted with black bounding boxes), not considering the flowers decorating the cart or the decorations resembling flowers on the cart's wheels.

While focusing on prints and flower annotations, we argue this use case to be representative of a broader class of artwork annotation problems: on the one hand, it reflects a typical annotation subject, i.e. a non-photographic representation of (a possibly symbolic) reality containing many, diverse visual objects. On the other hand, "flower" is an example of a common visual object class, which is not related to historical aspects, and for which frequency and specificity are a major bottleneck in the manual annotation process. The next sections describe two main challenges that affect the creation of qualitative annotations.

3.2. Challenges in visual object identification

Visual artworks often provide an abstract, symbolic, or allegorical interpretation of reality. In such a context, the identification of all visual object occurrences is a very time-consuming and error-prone task, complicated by: (1) the lack of colours or details; (2) the abstract or stylised representation of the visual class occurrence; (3) the size, density, or composition of the depicted visual objects; and (4) subjective or personal interpretations. In such conditions, in order to identify all the occurrences of visual objects, an annotator must show both commitment to the annotation task (to account for the potentially high number of occurrences), and some degree of experience in the art domain, to be able to correctly infer the content of visual artworks.

3.3. Challenges in visual object recognition

To correctly recognise visual objects, and describe them with expressive text, domain-specific expertise is often required. Let us consider the domain of flowers: arguably, everyone is exposed, to some extent, to knowledge about flowers: in the mind of the writers, it is difficult to imagine someone not being able to recognise the red flower in Fig. 2 as a rose. However, going beyond such a shallow description requires domain-specific knowledge. Would the reader be able to tell a Rosa canina from a Rosa multiflora?5 And which terminology would the reader be able to use?

5 Both Rosa canina and Rosa multiflora belong to the Rosa genus and, through the eyes of a non-expert, they look pretty similar.

Fig. 2. An extract of the flower taxonomy for box 8 of Fig. 1.

In our case study, the Rijksmuseum is interested in annotating each flower instance at different levels of specificity, according to the flowers (formally: plants) taxonomy. It ranges from the top element Kingdom, for example Plantae (all plants), to the most specific element Species, e.g. Hyacinthus orientalis, one plant. Fig. 2 depicts part of the flower description taxonomy. Above the species level is genus, which describes multiple flower species, e.g. the Hyacinthus genus. Above genus is family, which describes multiple flower genera, e.g., the Asparagaceae family. Annotations could use both the common, e.g. Dutch hyacinth, or the botanical, e.g. Hyacinthus orientalis, flower name.
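Purely as an illustration, the taxonomy path of this hyacinth example can be written down as a small data structure; this is a sketch in Python, with field names of our own choosing:

    # Taxonomy path for the example in Fig. 2, from the top element
    # (Kingdom) down to a single plant (Species).
    hyacinth = {
        "kingdom": "Plantae",               # all plants
        "family": "Asparagaceae",           # describes multiple genera
        "genus": "Hyacinthus",              # describes multiple species
        "species": "Hyacinthus orientalis",
        "common_name": "Dutch hyacinth",    # common vs. botanical form
    }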

4. Experimental design

Our goal is to study the annotation coverage and accuracy of non-professional annotators drawn from a crowd of anonymous workers, and to analyse their performance under different knowledge creation process configurations. To this end, we instrumented an extensive evaluation campaign, discussed in this section.

Section 4.1 describes the experimental dataset; Section 4.2 introduces the two annotation configurations subject of our study; Section 4.3 provides details about the setup and quality control mechanisms of the experiments performed on the CrowdFlower platform; Section 4.4 describes the procedure employed for the normalisation of crowd annotations; Section 4.5 describes the gathering of annotations from domain experts; finally, Section 4.6 introduces the aggregation methods and evaluation metrics used for quantitative assessment.

4.1. Experimental dataset

In collaboration with personnel of the Rijksmuseum Amsterdam, we selected 80 prints containing at least one flower instance. We then instrumented a ground-truth creation task, aimed at defining, for each print, the exact number and location of each contained flower instance.

We recruited 3 trusted annotators. They were selected from the SEALINCMedia project6 staff; we considered individuals familiar with the targeted collections and with the technology used for image annotation, namely the Annotorious library.7

6 https://sealincmedia.wordpress.com. SEALINCMedia is part of the Dutch national program COMMIT.

Fig. 3. Distributions of the number of flowers and types and bounding box statistics per print difficulty class (Easy/Average/Hard), based on the ground-truth data created by the trusted annotators: (a) number of flowers (log); (b) number of types; (c) average bounding box size (log); (d) average overlapped bounding box ratio.

7 http://annotorious.github.io/. Annotorious is a JavaScript image annotation library developed in the context of the Europeana project. Europeana is a large European content aggregator for Cultural Heritage and strives to make Cultural Heritage more accessible.

Annotorious allows users to draw (square) bounding boxes on images. Each trusted annotator was instructed to identify, in each print, all flowers conforming to the following definition:

Definition. A flower is considered to be the flowering part of a plant with petals and distinguishable from leaves. A branch can have multiple flowers (but each of those flowers has the same name). A flower bud counts as a flower.

For each print, they were asked to independently: (1) draw a bounding box around each flower instance; and (2) count the number of different flower types depicted in the print. To guarantee maximum correctness and precision, annotators were given no time limits, and they were allowed to stop and resume at any time.

Upon completion of the independent annotation sessions we gathered the results and created a new version of each print featuring the bounding boxes from each annotator. We then organised three deliberation sessions (in total 8 h, spread over 3 days) where annotators were asked to discuss their work. For each print in the dataset, they needed to agree on a unique set of bounding boxes, and on a unique number of flower types. For each bounding box, the annotators were asked to also agree on its location and size; they were asked to redraw each bounding box as accurately as possible, so as to guarantee that each flower occurrence was fully, but minimally, contained by it. Trusted annotators were also asked to provide comments and remarks about the issues they faced during the annotation process, including properties of the annotated prints, such as the presence of little flowers, low contrast between flowers and leaves, and flower orientation and overlap, that could have hindered the recognition activity.

These comments were used to support an additional discussion session aimed at classifying each print in the dataset according to the difficulty encountered in its annotation. We identified two difficulty dimensions, related to identification and recognition, which led to three difficulty classes: easy, which identifies prints where flower instances are easy to identify and recognise (20 prints); average, which identifies prints where identification is easy, but recognition is hard (29 prints); and hard, which contains prints where both identification and recognition are hard to perform (31 prints). Notably, no print in the dataset contained flower instances that were easy to recognise but hard to identify.

This carefully crafted annotation work led to the creation of 1047 bounding boxes. On average, each of the 80 prints contains 13 flower instances (σ = 15, Max = 71) and 3 flower types (σ = 3, Max = 16). Figs. 3a–d depict the distribution, for each print in the dataset, of: (1) the number of identified flower instances; (2) the number of unique flower types; (3) the average size of the annotation bounding boxes; and (4) the average number of overlapping bounding boxes. Prints in the average difficulty class (easy to identify, hard to recognise) have, on average, the fewest flowers and overlapping bounding boxes. The annotation difficulty is strictly related to the average size of the bounding boxes. The number of flower types is similarly distributed across difficulty classes.

4.2. Experimental configurations

We consider two experimental configurations, each testing a different knowledge extraction modality. In the Artwork–centric configuration, crowd workers define annotations on the artwork as a whole; in contrast, the Class–centric configuration requires annotations to be defined for each visual object instance, thus including its exact location in the artwork. In both configurations, annotators are asked to identify and recognise the occurrences of objects in a given visual class. Then, for each configuration, we assess how different aggregation methods impact the quality of the resulting annotations.

Artwork–centric knowledge extraction. In this configuration, we ask annotators to analyse the artwork as a whole. Identification is performed by counting the occurrences of the visual objects of interest, and the number of object class types therein represented. Recognition also occurs globally, by asking the worker to provide labels for each distinct object type identified in the print.

The Artwork–centric annotation configuration has the advantage of the simplicity and speed with which an annotator can perform the assigned task. The main drawback, on the other hand, is that workers are not allowed to provide information about where and how (in terms of size, clarity, etc.) in the artwork a given object instance is represented. Likewise, labels are not directly associated with occurrences of the visual object class, thus providing no insights about their importance or role in the artwork. Fig. 4 depicts the user interface used in our evaluation for Artwork–centric annotation. The print to annotate is presented on the left-hand side. The worker can open the image at full screen by clicking on it. The right-hand side lists the annotation fields. Workers are requested to indicate the number of flower instances, and the number of distinct flower types they identify in the print. Three text fields are available to provide up to three labels describing the flower types in the print. To avoid biases in the knowledge extraction process, no auto-suggestion functionality was provided: workers had to manually specify flower names, in either their botanical or common form.

Fig. 4. Artwork–centric annotation user interface. Static image on the left, annotation fields on the right.

To account for unidentified flower types, workers can report a print to contain only flower occurrences of an imaginative nature (fantasy); or simply indicate their inability to specify any suitable label (unable). In both cases, an additional text box is prompted to collect comments about the reasons for their judgment.

Class–centric knowledge extraction. This configuration improves on the Artwork–centric one by providing workers with the ability to pinpoint the occurrences of the sought visual object in the artwork. Fig. 5 depicts the user interface used to collect Class–centric annotations: using the Annotorious library, workers were asked to draw bounding boxes directly on the image. Labels are specified for each identified occurrence, thus allowing for more fine-grained and localised annotations. A bounding box is supposed to fully contain the identified flower occurrence. Like in the Artwork–centric user interface, the right-hand side provides a set of annotation fields that allow workers to report the number of occurrences of the objects of interest, and the number of distinct flower types for which a distinct label has been provided. An additional field is automatically filled with the number of bounding boxes drawn for the current print. Such fields are defined for quality control purposes, but are also used to further study the behaviour of crowd annotators.

Fig. 5. Class–centric annotation user interface. Interactive image on which bounding boxes are drawn on the left. Annotation fields appear per drawn bounding box.

4.3. Crowdsourced artwork annotation task design

The experiment was performed on CrowdFlower8, a human computation platform that recruits workers worldwide. We set up two annotation jobs, one for each annotation configuration. Each task required the annotation of five prints, and was rewarded with €0.18. Compensation was defined after several pilot experiments, used to test-drive the annotation interfaces and to receive feedback on the monetary reward (which resulted in a 3 out of 5 score). The moderate sum was set so as to attract intrinsically motivated people while still giving some monetary appreciation. Workers were recruited in the Level 1 Contributors class of CrowdFlower9 to exclude workers known by CrowdFlower to deliver low quality work.

8 http://crowdflower.com
9 Level 1 contributors are "high performance contributors who account for 60% of monthly judgments and maintain a high level of accuracy across a basket of CrowdFlower jobs."

Quality control mechanisms. Each annotation task was introduced by a description page, listing instructions about the requested input, an explanation of the user interface, examples of target annotations, and the same flower definition given to trusted annotators.

Each worker had to first perform a qualification task, as a necessary pre-condition to participate in the rest of the experiment.

In the qualification task, workers were asked to annotate 5 prints, sampled from the ones belonging to the easy difficulty class and having an unambiguous number of flowers and flower types, as determined by the trusted annotators. Each task paid €0.18 upon completion. Workers were evaluated against the number of flowers and number of types defined in the ground truth; we allowed an off-by-one error to account for small counting errors. Workers that failed to correctly annotate more than one print in the qualification task were blocked from performing further tasks. Successful workers were allowed to continue with other annotation tasks; to avoid learning bias, prints used in the qualification tasks were not shown twice to the same worker. For the same reason, workers that performed tasks in the Artwork–centric job were not allowed to perform tasks in the Class–centric job, and vice-versa. As an additional quality control mechanism, one hidden control question, a so-called honeypot, was added to each annotation task. Each control question used a print with an unambiguous number of flowers and flower types, as determined by the trusted annotators. Workers were paid regardless of their performance on the control question. Prints were shown to workers in random order. Each worker could annotate every remaining print in the set, but only once. We designed the task such that each print could be annotated by at most 10 annotators. If the overall accuracy of a worker across all executed tasks on control questions dropped below 60%, the annotations from such workers were discarded.
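For illustration, a minimal Python sketch of these two screening rules, under our own assumptions about how the answers are stored (all names are hypothetical):

    def passes_print(answer, truth, tolerance=1):
        # Off-by-one tolerance on both the flower count and the type count.
        return (abs(answer["n_flowers"] - truth["n_flowers"]) <= tolerance
                and abs(answer["n_types"] - truth["n_types"]) <= tolerance)

    def passes_qualification(answers, truths, max_failures=1):
        # Workers failing more than one of the qualification prints are blocked.
        failures = sum(1 for a, t in zip(answers, truths) if not passes_print(a, t))
        return failures <= max_failures

    def keep_worker(honeypot_outcomes, threshold=0.60):
        # Discard all annotations of a worker whose accuracy on the hidden
        # control questions drops below 60%.
        return sum(honeypot_outcomes) / len(honeypot_outcomes) >= threshold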

4.4. Annotation labels pre-processing

The labels produced underwent a pre-processing step aimed at providing a uniform label dictionary and, thus, supporting the subsequent analysis. Due to the employment of a worldwide workforce, labels were assumed to be provided in multiple languages, and to include spelling errors. To achieve maximum accuracy, each flower label was manually mapped to a Wikipedia10 page using the search functionality on that site. If no corresponding page could be found, we used the label plus the word wiki on a regular search engine to find wikis in other languages. Subsequently we used the Show this page in other languages functionality of Wikipedia to recover the English form. DBpedia is a semantic knowledge base, based on information from Wikipedia; almost every page on the English Wikipedia has a corresponding DBpedia page with the same content. For example, the DBpedia page about the Californian Rose is http://dbpedia.org/page/Rosa_californica. Thanks to the link with DBpedia, it has been possible to reconcile synonyms, and to automatically associate each flower label with the Genus, Species, or Family level. Each label was also classified as a common name or as a botanical name.

10 Although Wikipedia has information on many topics, the flower domain is very well represented, having most of the flower taxonomy present.
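The mapping itself was performed manually; purely for illustration, the reconciliation step can be sketched in Python once each raw label has been resolved to an English Wikipedia/DBpedia entry. The small lookup table below is a hypothetical stand-in for that manual work:

    # Hypothetical, hand-curated lookup built from the manual Wikipedia/DBpedia
    # mapping: raw label -> (canonical name, taxonomic level, naming form).
    LOOKUP = {
        "sunflower": ("Helianthus", "genus", "common"),
        "helianthus": ("Helianthus", "genus", "botanical"),
        "dutch hyacinth": ("Hyacinthus orientalis", "species", "common"),
    }

    def normalise(label):
        # Reconciles synonyms (e.g. Sunflower vs. Helianthus) and attaches
        # the Genus/Species/Family level and the common/botanical form.
        return LOOKUP.get(label.strip().lower())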

4.5. Label quality assessment

As the prior expertise of crowd workers is unknown, we assess the quality of the labels they provide. For this purpose, we sought botanical experts through our social and work environments, but also looked for candidates in our geographical surroundings. Initially we only contacted members of a garden-historical society, but unfortunately those members did not have time available. We then expanded our inquiries to the Wageningen University of Agriculture in the Netherlands and to friends, family and acquaintances who were known to be working with, or passionate about, flowers. Our efforts resulted in three plant-related researchers from Wageningen University, a former forest ranger, and a practitioner in the flower domain, who volunteered their time to annotate our dataset collection with flower labels. We trusted their self-assessment of their capabilities after showing them sample images from our print dataset. We let these domain experts annotate the images, and we test the crowd labels against the ones produced by these annotators with known domain expertise.

Two domain experts annotated all (80) prints. The other three annotated 24, 13 and 3 prints, respectively. In total, 186 flower labels were provided. The majority of the labels provided by experts (80%) were defined at the genus level. The botanical form was used in 37% of the labels. In 57 annotation tasks (28%) at least one expert reported being unable to name any of the flowers in the print. Notably, for 12 prints (11 hard and 1 average) no expert was able to provide any label. The remaining 68 prints received on average 2.7 flower labels (σ = 1.45, Max = 7) each. The most frequently used labels have been Rose (20% of all labels), Lily (17%), Tulip (6%), Carnation (5%) and Iris (5%).

On average, experts respectively provided 2.40 (σ = 1.60), 2.40 (σ = 0.73), and 1.64 (σ = 1.47) unique labels for easy, average, and hard prints. The perceived experts' vocabulary featured 59 distinct flower names. By reconciling the botanical and common names of flowers, such as Helianthus and Sunflower, the number of unique labels decreases to 52. Fifty-six percent of the experts' vocabulary is expressed in botanical form. Looking at the distribution of unique labels, most were expressed at the genus specificity level (66% versus 34% at the species level).

4.6. Evaluation and aggregation

In this subsection we define the metrics used to assess the quality of individual annotations and define our aggregation methods for multiple annotations. Section 4.6.3 describes our novel bounding box aggregation algorithm.

4.6.1. Evaluation of individual annotations

We assess crowd workers' contribution quality by comparing their annotation results with the ground-truth created by trusted annotators. In the Artwork–centric configuration, we compare the number of flowers and the number of flower types. In the Class–centric configuration, for which we have the bounding box information, we define 3 additional metrics related to the precision of the bounding boxes.

Matching: given a ground truth bounding box gb, a bounding box b created by a worker is defined as matching if it overlaps with gb. If b matches multiple gb's, we take the closest one (measured by the L1 distance between the bounding box vectors) as the matched ground truth bounding box.
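A minimal Python sketch of this matching rule, assuming boxes are given as (px, py, qx, qy) vectors with p the upper-left and q the bottom-right corner (the helper names are ours, not from the paper):

    def overlaps(a, b):
        # True when two (px, py, qx, qy) boxes share any area.
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def match(worker_box, gt_boxes):
        # Return the overlapping ground-truth box closest in L1 distance
        # between the two box vectors, or None if no box overlaps.
        candidates = [gb for gb in gt_boxes if overlaps(worker_box, gb)]
        if not candidates:
            return None
        return min(candidates,
                   key=lambda gb: sum(abs(w - g) for w, g in zip(worker_box, gb)))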

Centroid distance: If a worker bounding box b matches a ground truth bounding box gb, we define its centroid distance as

cDist = \sqrt{\left(\frac{gb_{cX} - b_{cX}}{gb_w}\right)^2 + \left(\frac{gb_{cY} - b_{cY}}{gb_h}\right)^2}

where gb_w, gb_h denote the width and height of the ground truth bounding box, and b_w, b_h denote the width and height of the matched bounding box. Intuitively, cDist measures the distance between the centres of the two bounding boxes; to account for differences in bounding box size, cDist is calculated by normalising the coordinate distances by the ground truth bounding box width and height.

Area overlap: If a worker bounding box b matches a ground truth bounding box gb, we define its area overlap as

aOverlap = \frac{gb_a \cap b_a}{gb_a \cup b_a}

where gb_a, b_a denote the areas of the ground truth bounding box and the worker bounding box. Note that for aOverlap we simplify the intersection/union of the geometric areas of the two bounding boxes using set operators. The value of aOverlap varies in the range (0, 1], in which 1 means that gb and b perfectly cover each other.

Notice that cDist and aOverlap measure different properties: a worker bounding box close to a ground truth bounding box (i.e., with a small value of cDist) might not necessarily cover a large portion of the area of the same ground truth bounding box (i.e., a large value of aOverlap), and vice versa.
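Both metrics can be sketched in Python under the same (px, py, qx, qy) box representation; this is an illustrative sketch, not the authors' code:

    import math

    def centroid_distance(gb, b):
        # cDist: centre-to-centre distance, normalised by the ground-truth
        # box width (x axis) and height (y axis).
        gb_w, gb_h = gb[2] - gb[0], gb[3] - gb[1]
        gcx, gcy = (gb[0] + gb[2]) / 2.0, (gb[1] + gb[3]) / 2.0
        bcx, bcy = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
        return math.hypot((gcx - bcx) / gb_w, (gcy - bcy) / gb_h)

    def area_overlap(gb, b):
        # aOverlap: intersection area over union area; 1 means the two
        # boxes perfectly cover each other.
        ix = max(0.0, min(gb[2], b[2]) - max(gb[0], b[0]))
        iy = max(0.0, min(gb[3], b[3]) - max(gb[1], b[1]))
        intersection = ix * iy
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return intersection / (area(gb) + area(b) - intersection)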

4.6.2. Artwork–centric annotation aggregation

As is common in crowdsourcing, the works of several contributors are aggregated in order to produce a single "crowd truth".

Identification. The number of flowers and number of flower types identified in a print are numerical variables. We therefore adopt as aggregation operators traditional numeric functions: (1) the median of the figures provided by each worker for a given print, which guarantees robustness to outliers; and (2) the maximum value provided by one of the annotators, which, on the other hand, can be seen as a conservative measure. We decided to experiment with the maximum function to account for the under-specification tendency of workers described in the previous section.

The ratios of the erroneous number of flowers (ξ_flower) and number of types (ξ_types) reported for a print are defined as follows:

\xi_{flower} = \frac{|\#FL_{at} - \#FL_{gt}|}{\#FL_{gt}}, \qquad \xi_{types} = \frac{|\#TY_{at} - \#TY_{gt}|}{\#TY_{gt}} \qquad (1)

where #FL_at and #TY_at are, respectively, the average number of flowers and the average number of flower types provided by workers, and #FL_gt and #TY_gt are the number of flowers and number of flower types in the ground truth.
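For example, if a print contains #FL_gt = 10 flowers in the ground truth and workers report on average #FL_at = 7 flowers, then ξ_flower = |7 − 10| / 10 = 0.3.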

Recognition. We aggregate the labels provided by the crowd for each artwork using three different methods: (1) the union of all crowd labels; (2) labels specified by at least 2 crowd workers; and (3) labels specified by the majority of workers annotating a print.
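A minimal Python sketch of the count and label aggregation operators described above (function and variable names are ours):

    from collections import Counter
    from statistics import median

    def aggregate_counts(counts):
        # The two operators for the number of flowers / types:
        # the outlier-robust median and the conservative maximum.
        return median(counts), max(counts)

    def aggregate_labels(label_sets, rule="union"):
        # label_sets: one set of normalised labels per worker for a print.
        freq = Counter(label for labels in label_sets for label in set(labels))
        if rule == "union":
            return set(freq)
        if rule == "at_least_2":
            return {label for label, n in freq.items() if n >= 2}
        if rule == "majority":
            return {label for label, n in freq.items() if n > len(label_sets) / 2}
        raise ValueError("unknown rule: " + rule)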

4.6.3. Class–centric annotation aggregation

Identification. For this configuration we assess, as in the Artwork–centric one, both the number of flowers and types using the median and maximum. Given the local nature of Class–centric annotations, we also address the problem of bounding box aggregation. This task is not trivial, as it is hindered by several factors, e.g.: (1) the usage of different pointing devices, e.g. mouse or touch screens, which affects the drawing action; (2) workers might not be able to correctly identify an entity instance; and (3) the presence of malicious or poorly motivated workers, that might incorrectly (e.g. by drawing random or very big bounding boxes), or only partially, perform the task at hand. To cater for such problems, we propose a novel method to aggregate the Class–centric annotations produced by multiple workers, each providing multiple bounding boxes for the same image, so that bounding boxes can be aggregated together to obtain a correct, high-quality identification. We note that a similar method [29] has been used for aggregating bounding boxes from multiple sources. However, that method assumes each image has only one object, i.e. each worker only provides one bounding box per image. See Fig. 6 for an example of the application of a bounding box aggregation method.

Fig. 6. An example of aggregating multiple individual bounding boxes: (a) individual; (b) aggregated.

Fig. 7 summarises the steps that compose our method for bounding box merging. The bounding boxes defined for a print are first pre-processed to remove low quality ones, i.e. bounding boxes having an area more than 3σ (3 standard deviations) above the mean area of all the bounding boxes specified for the same image. The threshold of 3σ is chosen to retain as many bounding boxes as possible that could contribute to the aggregation. Such a heuristic is justified by the empirical observation that some users draw very large bounding boxes that contain multiple flowers, in contrast with other, more committed and precise users.

Fig. 7. Steps of our Class–centric annotation aggregation method.

Then, the method requires the identification of all groups of bounding boxes in a print. Such a step can be modelled as a clustering task, where the goal is to find clusters of bounding boxes such that all bounding boxes within the same cluster target the same visual object occurrence, and bounding boxes of different clusters target different occurrences. We test the performance of three clustering techniques: (1) simple geometric clustering, where bounding boxes are defined as belonging to the same cluster if they overlap (also partially); (2) k-means; and (3) Gaussian Mixture Model (GMM) [30]. For the two unsupervised clustering methods, a bounding box B_i is represented by the coordinates of its upper-left point p and its bottom-right point q, i.e., B = (p_x, p_y, q_x, q_y)^T.

Note that for the unsupervised clustering methods, we need to explicitly set the number of clusters K. Due to the wide range of flower occurrences across images, K is set differently for each print according to the annotations from workers. Different estimation strategies could be instantiated according to this principle. We compare: (1) the maximum number of bounding boxes drawn by one of the print annotators; (2) the median number among the workers; and (3) a value automatically obtained according to a model selection criterion such as the Bayesian information criterion (BIC) [31].
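As an illustration of strategy (3), a sketch using scikit-learn's GaussianMixture to pick the K that minimises the BIC over a candidate range; the candidate range and function name are our own assumptions:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def estimate_k_by_bic(boxes, k_max):
        # boxes: (n, 4) array of (px, py, qx, qy) bounding box vectors.
        X = np.asarray(boxes, dtype=float)
        best_k, best_bic = 1, float("inf")
        for k in range(1, min(k_max, len(X)) + 1):
            # Fit a K-component GMM and score it with the BIC.
            bic = GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            if bic < best_bic:
                best_k, best_bic = k, bic
        return best_k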

This clustering step is then followed by the derivation, for

ach cluster or bounding boxes, of a single, representative

ounding box. Similarly as for the setting of the K number of

usters, we compare the performance of several aggregation

ethods, which, for each of the four bounding box vertexes,

lect (1) the maximum; or (2) the median value among the

nes of the bounding boxes in the cluster.
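The K estimation strategies and the per-vertex aggregation can be sketched as follows; this is again an illustration with assumed names and data layout, and we read "maximum" as the outermost extent of the cluster, consistent with the later observation (Section 5.2.3) that maximum produces larger representative boxes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_k(boxes_per_worker, strategy="max"):
    """Strategies (1) and (2): K from the per-worker box counts."""
    counts = [len(b) for b in boxes_per_worker]
    return int(max(counts)) if strategy == "max" else int(np.median(counts))

def estimate_k_bic(all_boxes, k_max=20):
    """Strategy (3): pick the K whose GMM minimises the BIC [31]."""
    X = np.asarray(all_boxes, dtype=float)
    ks = range(1, min(k_max, len(X)) + 1)
    bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in ks]
    return list(ks)[int(np.argmin(bics))]

def representative_box(cluster, how="median"):
    """Collapse a cluster of boxes (px, py, qx, qy) into one box:
    per-coordinate median, or the enclosing box for 'maximum'."""
    X = np.asarray(cluster, dtype=float)
    if how == "median":
        return np.median(X, axis=0)
    return np.array([X[:, 0].min(), X[:, 1].min(),
                     X[:, 2].max(), X[:, 3].max()])
```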

Recognition. To enable comparison, bounding box labels are managed as global artwork labels, and assessed using the same metrics as in the Artwork–centric configuration.

5. Results

This section discusses the outcomes of the experiments described in Section 4. For each configuration, we assess multiple annotation aggregation methods, and we analyse the resulting identification and recognition performance of crowd workers.

5.1. Artwork–centric knowledge extraction evaluation

A total of 151 workers started a qualification task, out of which 67 (44%) failed. Out of the remaining 84 workers, 40 (26%) decided to stop at the qualification task. The remaining 44 workers performed a total of 475 annotation tasks. Each worker annotated an average of 10.8 prints (σ = 7.6). Despite the high variance, each print was sufficiently annotated, by an average of 5.9 workers (σ = 1.4).

5.1.1. Identification

For each annotation task, we analyse the number of flowers and flower types indicated by crowd workers. Table 1 describes the under-/over-specification of these values with respect to the ground truth, broken down according to the difficulty class of the corresponding prints. Regardless of the difficulty class, workers tend to report fewer flowers than the ones actually contained in the print; hard prints, which on average contain more flowers, are also the ones for which this under-specification effect is most evident. A similar result is observable for hard prints for the reported number of flower types. Crowd workers also under-specified for easy and average prints, but the effect is less evident, especially for the number of types. Note that this trend is highly correlated with the number of flower instances in the print, but less correlated with the size of such instances (which is smaller on average than in easy prints).


5.1.2. Recognition

Workers provided a total of 461 flower name labels. Almost all labels were provided in English, except for some labels provided in Italian, Spanish and Dutch. All prints received at least one flower label from at least one worker. In 69 annotation tasks (14%) at least one crowd worker reported being unable to name any flowers in the print. Figs. 8 and 9 depict the distribution of workers' labels in the 3 print difficulty classes. Fantasy labels were equally distributed. Prints where multiple workers were unable are distributed as follows: easy (1), average (7), and hard (11). As expected, easy prints received on average more unique labels than both average and hard. We account the higher number of unique labels for hard prints to the higher number of depicted flower types.

Fig. 10a depicts the distribution of label specificity for prints belonging to the three difficulty classes. Genus labels and common names are consistently the most used by crowd workers (respectively 85% and 77% of all annotations). Family names are rarer, but mainly specified in easy prints. Workers' vocabulary included 74 distinct flower labels (58 after reconciliation of the botanical and common version of the same flower name). On average, respectively 4.30 (σ = 2.90), 2.13 (σ = 1.22), and 4.03 (σ = 2.06) unique labels were provided for easy, average, and hard prints. Twenty-eight percent of flower labels were defined at species level, 67% at genus, and 5% at family level. Sixty-one percent of workers' vocabulary is expressed in botanical form. The labels most frequently used by crowd workers are Rose (33% of all labels), Lily (15%), Tulip (7%), Sunflower (6%) and Carnation (5%). Fig. 10b depicts the vocabulary size per worker. Four workers, who provided only unable and fantasy labels, are not reported. On average each worker has a vocabulary size of 5.4 (σ = 3.5, Max = 16) distinct labels. The vocabulary size is strongly correlated (c = 0.75, p ≪ 0.005) with the amount of labels a worker provided. To account for this we calculate a worker's label diversity using Shannon entropy, which is also shown in Fig. 10b (indicated by a line). Users who provide more labels also use such labels more often.
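For clarity, the label diversity measure used here is the standard Shannon entropy of a worker's label distribution; a minimal sketch (with a hypothetical function name) follows.

```python
from collections import Counter
from math import log2

def label_diversity(labels):
    """Shannon entropy of a worker's label distribution: higher values
    mean annotations are spread over many distinct labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Example: label_diversity(["rose", "rose", "tulip", "lily"]) == 1.5
```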

5.1.3. Aggregation

Table 2 compares the results, in terms of error ratio and broken down according to the difficulty class of the corresponding prints, obtained after applying the median and maximum aggregation functions. Performance is better (i.e. higher identification precision) for prints in the easy and average difficulty classes than in the hard one. On the other hand, workers perform worse on the identification of the number of flowers in the easy class than in the average class. Such a result is mostly due to 5 prints in the easy class, for which there is an error rate higher than 50%. Manually inspecting the annotations for these prints, we observed that such errors are accountable to badly followed instructions (contrary to our flower definition, workers did not count flower buds) or to a wrong interpretation of a flower composition (some workers counted each individual flower of the plant, a hyacinth, while others counted them as a single flower).


Table 1
Average difference and under-/over-specification of the number of flowers and types by workers in Artwork–centric annotation.

                  Number of flowers                        Number of types
           Under   Equal   Above   Avg. diff.        Under   Equal   Above   Avg. diff.
Easy        80%     20%      0%    −5.35 ± 7.93       40%     40%     20%    −0.94 ± 1.76
Average     72%     21%      7%    −0.90 ± 0.96       24%     52%     24%    −0.11 ± 1.09
Hard        97%      0%      3%    −9.66 ± 11.14      71%     23%      6%    −1.52 ± 1.88

Table 2
Error rate for the number of flowers and types per class for different aggregation methods in Artwork–centric annotations.

                   # Flowers error ratio                       # Types error ratio
Method       Easy          Average       Hard           Easy          Average       Hard
Median       0.34 ± 0.30   0.14 ± 0.20   0.45 ± 0.25    0.11 ± 0.17   0.11 ± 0.23   0.30 ± 0.27
Maximum      0.11 ± 0.20   0.16 ± 0.25   0.41 ± 0.50    0.27 ± 0.54   1.27 ± 3.75   0.22 ± 0.25

Fig. 8. Percentage of workers that specified at least 1 (a) fantasy, (b) label, (c) unable across different print classes in Artwork–centric annotation.

Fig. 9. Number of (a) labels and (b) unique labels on prints across different classes in Artwork–centric annotation.

Fig. 10. Analysis of labels in Artwork–centric annotations: (a) specificity of used labels per print class; (b) vocabulary size (points) and label diversity (line) per worker.

5.1.4. Interpretation of the results

Workers tended to under-specify the number of flowers and flower types, especially for prints in the hard category. We attribute this result to the fatigue effect [32] that might occur when the number of visual class instances increases; we interpret the propensity of workers to conservatively estimate the number as an indication of genuine effort.

Despite no requirements from our side, a surprisingly large proportion of the labels (23% in total, covering 61% of the workers' vocabulary) used the botanical form for flower names. This result hints at two possible explanations: (1) the annotators knew the botanical name; or (2) annotators actively looked up the flower in (Web) knowledge bases to retrieve a suitable name. We interpret the result as a sign of knowledgeable and intrinsically motivated workers, who stood out from the crowd despite the task complexity and the moderate reward.


No aggregation function consistently provided better performance. The maximum aggregation method performs better for the identification of the number of flowers. This result can be intuitively explained as follows: even when workers tend to under-specify the number of object instances, a single good performer suffices for good-quality identification. On the other hand, median performs better in the identification of the number of flower types. Again, such a result can be motivated by the behaviour of crowd workers who, on average, were more often correct. In such a condition, using the majority of worker annotations is a better approach, which can compensate to some extent for the presence of outliers.


Table 3
Average difference and under-/over-specification of the number of flowers and types by workers in Class–centric annotations.

                  Number of flowers                         Number of types
           Under   Equal   Above   Avg. diff.         Under   Equal   Above   Avg. diff.
Easy        75%     15%     10%    −3.70 ± 4.45        45%     20%     35%    −1.29 ± 2.33
Average     72%     21%      7%    −0.75 ± 0.86        45%     31%     24%    −0.26 ± 0.69
Hard        87%      3%     10%    −11.53 ± 14.83      71%      6%     23%    −1.63 ± 2.14

Table 4
Quality and number of bounding boxes compared to ground truth annotations in Class–centric annotations.

                  Quality of workers                          Number of bounding boxes
           Matching ratio   cDist         aOverlap       Under   Equal   Above   Avg. diff.
Easy        0.70 ± 0.30     0.15 ± 0.14   0.23 ± 0.19     80%     10%     10%    −3.92 ± 4.54
Average     0.76 ± 0.29     0.25 ± 0.30   0.49 ± 0.20     72%     28%      0%    −1.20 ± 1.31
Hard        0.43 ± 0.28     0.33 ± 0.45   0.59 ± 0.23    100%      0%      0%    −12.59 ± 15.13


5.2. Class–centric knowledge extraction evaluation

Eighty-four workers started the qualification tasks, out of which 21 (25%) failed. Of the remaining 63 workers, 17 (27%) decided to stop at the qualification task. The remaining 46 workers created 552 annotations. All labels were provided in English. On average each worker annotated 12.0 prints (σ = 17.1). Despite the high variance, each print was sufficiently annotated by, on average, 6.9 workers (σ = 1.37). A total of 3442 bounding boxes were drawn.

5.2.1. Identification

Table 3 reports the identification performance in terms of number of flowers and number of types for Class–centric annotations. Workers tend to define fewer flowers and fewer flower types than the ones actually contained in prints, especially for the hard ones. On the other hand, average prints, which contain the least number of flowers and types, are the ones where workers perform best. Table 4 shows the quality of the created bounding boxes across different print classes. The matching ratio in this table denotes the ratio of matched bounding boxes out of the ones defined in the ground truth for the considered prints.


The better matching ratio is obtained for average prints, while hard prints have significantly worse bounding box quality. Considering the reference ground truth, this result suggests a correlation with the number of flowers in a print, as prints with fewer flowers feature a better matching ratio. cDist and aOverlap are, however, measures that are more influenced by the sizes of the ground truth bounding boxes: intuitively, it is easier for workers to accurately position boxes for large flowers, although their overlap is not necessarily better (i.e. it is more difficult for workers to draw accurate bounding boxes).

In the right part of Table 4 we report the performance, in terms of under-/over-specification of the number of flowers, which can be achieved by counting the number of bounding boxes specified by workers. When comparing these results with the ones in Table 3, we observe how workers often (in 22% of the executed tasks) drew fewer bounding boxes than the number they specified in the dedicated text field of the user interface for the same annotated print. Only in a handful of tasks (0.7%) did they draw more bounding boxes.
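For concreteness, the sketch below gives plausible formulations of the two bounding box quality measures discussed here; the exact definitions of cDist and aOverlap are given in Section 4, so the normalisation choices below are our own assumptions rather than the paper's definitions.

```python
import numpy as np

def centroid_distance(a, b):
    """Distance between the centres of two boxes (px, py, qx, qy);
    a plausible stand-in for the cDist measure."""
    ca = np.array([(a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0])
    cb = np.array([(b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0])
    return float(np.linalg.norm(ca - cb))

def area_overlap(worker_box, gt_box):
    """Intersection area relative to the ground truth box area;
    a plausible stand-in for the aOverlap measure."""
    w = max(0.0, min(worker_box[2], gt_box[2]) - max(worker_box[0], gt_box[0]))
    h = max(0.0, min(worker_box[3], gt_box[3]) - max(worker_box[1], gt_box[1]))
    gt_area = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    return (w * h) / gt_area if gt_area else 0.0
```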

5.2.2. Recognition

Workers drew 3442 bounding boxes, of which 1583 (46%) contained a label that could be mapped to a DBpedia resource. 616 (18%) were annotated as fantasy, while for 1149 bounding boxes (33%) workers indicated they were unable to name the flower. The ratio of unable annotations is similarly distributed across print difficulty classes: respectively 34% in easy prints, 31% in average prints, and 37% in hard prints. The size of bounding boxes has no effect on the ability of workers to provide a label for a flower. The same applies to the number of flowers or flower types in the print.

Fig. 11a shows the specificity of the provided labels per print difficulty category. Both family names (the most generic) and botanical forms of species are rarely used. The large majority of labels, for all three difficulty classes, uses the genus specificity in common name form. For prints in the average difficulty class fewer species labels are used than in the other classes.

In total 1566 labels corresponding to flower names were provided. Workers featured a vocabulary of 45 distinct flower labels (43 after reconciliation of the botanical and common version of the same flower name). On average, respectively 4.05 (σ = 3.78), 2.03 (σ = 1.12), and 2.97 (σ = 2.18) unique labels were provided for easy, average, and hard prints.

Crowd workers specified 24% of the labels at the species level, 69% at the genus level, and 7% at the family level of the flower taxonomy. Thirty-six percent of their vocabulary is expressed in botanical form. The labels most frequently used by workers are Rose (43%), Tulip (14%), Lily (11%), Daisy (6%) and Sunflower (4%). The botanical form is used in 6% of the labels.

5.2.3. Aggregation

Similarly to the analysis performed for the Artwork–centric configuration, in Table 5 we first report the performance of the different aggregation methods on the number of flowers and number of flower types explicitly specified by workers. Median always outperforms maximum for the aggregation of the number of types. For the number of flowers, however, maximum outperforms median in both easy and hard prints, while having worse performance in average prints.



Fig. 11. Analysis of labels in Class–centric annotations: (a) specificity of used labels per print class; (b) vocabulary size (points) and label diversity (line) per worker.

Table 5
Error rate for the number of flowers and types per class for different aggregation methods in Class–centric annotation.

                   Flowers error ratio                          Types error ratio
Method       Easy          Average       Hard           Easy          Average       Hard
Median       0.25 ± 0.29   0.12 ± 0.17   0.48 ± 0.27    0.17 ± 0.28   0.13 ± 0.23   0.40 ± 0.31
Maximum      0.12 ± 0.22   0.18 ± 0.24   0.46 ± 0.70    0.57 ± 0.87   1.11 ± 4.04   0.72 ± 0.75

Table 6
Error rates of different methods for setting the number of clusters K.

                   Error rate                                   Error sign
Method       Easy          Average       Hard           Under   Equal   Above
Max          0.15 ± 0.28   0.11 ± 0.19   0.34 ± 0.26     32%     39%     29%
Median       0.23 ± 0.28   0.12 ± 0.16   0.49 ± 0.26     68%     30%      2%
BIC          0.96 ± 0.82   1.26 ± 0.97   0.59 ± 0.42     45%      5%     50%

Next, we compare the performance of the three K estimation techniques per print difficulty class for the clustering step of our novel algorithm presented in Section 4.6.3. To this end, we define the error rate

$$\xi = \frac{|\#BB - \#FL_{gt}|}{\#FL_{gt}},$$

which measures the normalised difference between the K determined by a configuration ($\#BB$, the number of bounding boxes it retains) and the ground truth number of flowers ($\#FL_{gt}$). Table 6 reports the resulting performance. Simpler methods substantially outperform BIC. This result suggests that, for the purpose of estimating the number of clusters, the number of bounding boxes given by the workers is a less noisy signal than the overall coordinates of all input bounding boxes. Maximum outperforms median in both the easy and hard difficulty classes, while being only slightly better for average prints. The result can be justified by the fact that the average category has the lowest number of flowers. By comparing the figures in Table 6 and Table 5, we observe how using the maximum value is best both for aggregating the reported number of flowers and the bounding boxes; however, for average and hard prints, estimating the number of flowers in a print using bounding boxes leads to more precise results.
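As a worked example of the error rate just defined, a configuration that retains 12 bounding boxes for a print containing 10 ground-truth flowers yields ξ = |12 − 10|/10 = 0.2; the one-line sketch below (with hypothetical argument names) computes it.

```python
def k_error_rate(n_boxes, n_flowers_gt):
    """xi = |#BB - #FL_gt| / #FL_gt: normalised difference between the
    estimated number of clusters and the ground-truth flower count."""
    return abs(n_boxes - n_flowers_gt) / n_flowers_gt
```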

Table 7
Performance of different algorithm configurations.

Configuration        Matching ratio   cDist         aOverlap
Geometric_median     0.46 ± 0.12      0.31 ± 0.20   0.45 ± 0.16
k-means_median       0.82 ± 0.22      0.28 ± 0.20   0.48 ± 0.20
GMM_median           0.81 ± 0.22      0.28 ± 0.20   0.49 ± 0.16
Geometric_max        0.48 ± 0.29      0.58 ± 0.51   0.28 ± 0.18
k-means_max          0.83 ± 0.21      0.49 ± 0.92   0.38 ± 0.18
GMM_max              0.83 ± 0.21      0.48 ± 0.92   0.39 ± 0.18

Being the best aggregation method, we select the value returned by maximum as input for the next steps. Table 7 compares the performance, in terms of matching ratio, cDist and aOverlap, of the different algorithm configurations that can be obtained by varying the clustering technique and the strategy for picking the representative bounding box.

In all clustering techniques, median performs significantly better than maximum for both cDist and aOverlap, but with a slightly worse matching ratio. Such a result can be explained by the fact that larger representative bounding boxes are more likely to match with ground truth bounding boxes, but will also feature bigger and less accurate areas. Among the tested clustering techniques, the ones based on unsupervised learning clearly outperform the simple geometric clustering, while there is no significant difference between k-means and GMM.

Fig. 12. kmeans_median (red) and GMM_median (blue) performance on images of different difficulty classes: (a) matching ratio; (b) centroid distance; (c) area overlapping.

To conclude, we investigate how the performance of GMM and k-means varies according to the print difficulty class, and report the results in Fig. 12 (red for k-means and blue for GMM). It can be observed how GMM is less effective for easy prints, while more effective for hard prints. Technically, the result could be explained by the nature of GMM, a more flexible version of k-means that models the variance and covariance of the bounding box coordinate vector. The coordinates of the ending point in drawing a bounding box for a small flower are more influenced by the starting point than for a large flower, which could make GMM perform better than the simpler k-means clustering on hard prints; on the other hand, by being less robust than k-means in this respect, GMM performs worse on easy prints, where the effect of small pointing errors is more evident. We note, however, that none of the performance differences in the three print classes are statistically significant.

5.2.4. Interpretation of the results

In this configuration, workers were asked to both count the number of flowers in the print and to draw bounding boxes around them (see Section 4); a counter in the user interface reported the number of currently drawn bounding boxes. Despite the presence of this self-defined anchor, workers often drew fewer bounding boxes than the number of identified flowers. The result can be interpreted in two ways. On the one hand, as there was a direct relation between the number and size of flowers in prints, workers could have simply ignored very small instances; on the other hand, due to fatigue, workers may simply have stopped annotating even if they knew that more flowers existed in the print. An analysis performed over several annotation tasks suggests that both explanations are correct to some extent. The result hints at the importance of intrinsically motivated workers for Class–centric artwork annotation, as a high number of instances to annotate might be discouraging.

Annotation specificity hints at a similar conclusion: 36% of the annotations, but only 6% of the workers' vocabulary, are expressed in botanical form. This suggests less knowledgeable workers, but also workers less inclined to online research (e.g. in knowledge bases), possibly due to the additional effort needed to draw bounding boxes, which diminishes their motivation to extend their domain knowledge.

Our method for bounding box aggregation shows promising results. The significantly improved aggregation results (up to 45% improvement in matching ratio, 20% in bounding box distance, and 30% in area overlap) achieved by the GMM and k-means algorithms demonstrate the need for sophisticated aggregation techniques. Due to workers generally under-specifying the number of flowers and flower types in a print, the maximum aggregation function is better suited for the estimation of the number of flower instances, which, in turn, provides better identification (matching ratio) performance. In contrast, the location of bounding boxes (centre and area) can be better achieved using a consensus-based function such as median.

These results demonstrate how, in this context, and contrary to traditional image annotation techniques, better outcomes can be achieved by selectively using methods not based on crowd consensus. This is supported by recent work on crowd disagreement [33].


6. Discussion

This section elaborates on the results reported in Section 5 with respect to the research questions defined in the introduction.

6.1. Research question 1

In this section we provide answers to RQ1: Can non-professional annotators from crowdsourcing platforms provide high quality artwork annotations?

6.1.1. Identification

Sections 5.1 and 5.2 provide quantitative evidence of the effectiveness of crowd workers in identifying and locating flower instances. Despite the moderate reward and the demanding nature of the annotation tasks, our experiments attracted a considerable number of skilled and motivated workers; they often matched the identification performance of our trusted annotators, especially on prints of easy and average difficulty (matching rates of 89% and 84%, respectively); hard prints led to worse, but still satisfactory, figures (a matching rate of 59%). Even with the Class–centric configuration, which was more labour intensive, workers achieved good location accuracy (bounding box matching rates of 70%, 76% and 43% for easy, average and hard prints, respectively). These results are indicative of the willingness and ability of crowds to identify visual classes in artworks. Indeed, the identification of elements in images is a task that is familiar to crowd workers and requires little to no domain knowledge. On the other hand, artworks often present additional complexities with respect to photographic images (e.g. abstract, symbolic, or allegoric interpretations): the results of our experiments are in line with the outcomes of other recent studies [25,26], and provide additional evidence of the suitability of crowdsourcing as an accurate tool for cultural heritage content annotation.




Table 8
Comparison of the number of common and new labels, split by print difficulty class and aggregation method, in Artwork–centric annotations.

                       Matching label                             New label
Method           Easy          Average       Hard           Easy          Average       Hard
All              1.30 ± 0.80   0.79 ± 0.50   1.00 ± 0.79    3.00 ± 2.88   1.25 ± 2.35   3.45 ± 2.35
Label freq. ≥ 2  0.95 ± 0.60   0.68 ± 0.55   0.65 ± 0.67    0.65 ± 1.09   0.29 ± 0.46   0.65 ± 0.81
Majority         0.70 ± 0.66   0.64 ± 0.56   0.40 ± 0.50    0.15 ± 0.37   0.18 ± 0.39   0.25 ± 0.55

Table 9
Label comparison per print difficulty class for different aggregation methods in Class–centric annotations.

                       Matching label                             New label
Method           Easy          Average       Hard           Easy          Average       Hard
All              1.21 ± 1.24   0.79 ± 0.57   1.05 ± 1.00    2.84 ± 2.98   1.18 ± 1.06   2.35 ± 2.39
Label freq. ≥ 2  1.11 ± 1.24   0.64 ± 0.62   0.80 ± 0.89    1.79 ± 2.04   0.89 ± 0.88   1.70 ± 1.42
Majority         0.95 ± 1.18   0.64 ± 0.62   0.80 ± 0.89    1.53 ± 1.58   0.89 ± 0.83   1.45 ± 1.19



6.1.2. Recognition

In terms of recognition performance, workers consistently provide a high number of labels and show a rich vocabulary. In Table 8 and Table 9 we compare, for each experimental configuration, the number of unique labels provided by both the domain experts and the crowd (Matching label) and the number of unique labels provided by the crowd but not by the experts (New label).

The vocabulary size of crowd workers is comparable to the one of the experts (58 and 43 labels for the Artwork–centric and Class–centric configurations, respectively, compared to 52), and, for both configurations, we observed how workers often used labels with lower specificity (genus and family) in the flower classification taxonomy. This result suggests familiarity with the domain-specific vocabulary. On the other hand, it can also be interpreted as an indicator of the potential information needs of crowd workers, who prefer using laymen terminology to describe botanical entities: workers were allowed to look up flower names online, but they deliberately chose to specify common names at a lower level of specificity.

The previous interpretation is supported by the following observation. In both configurations (Artwork–centric and Class–centric), and even using the most conservative label aggregation policy (i.e. using all crowd labels), experts and crowd workers share a limited vocabulary: 34%, 28%, 34% (Artwork–centric) and 43%, 44%, 52% (Class–centric), respectively for easy, average, and hard prints. The size of the shared vocabulary only slightly decreases (on average) with stricter aggregation conditions. On the other hand, the decrease is more evident in terms of new labels (right part of Table 8 and Table 9): the number of new labels introduced by crowd workers is relatively consistent across aggregation methods and print difficulties.

Both crowd workers and experts showed a similar tendency to have low agreement on their labels. This is an interesting phenomenon, also observed in other knowledge-intensive content annotation use cases (e.g. medical [33]). The result suggests the need for more articulated annotation campaigns, possibly organised in workflows [34] that interleave automatic and human operations. While this is subject of future work, we can envision the following annotation flow: crowd labels are used to instrument a Web retrieval step, where images of flowers associated with the provided label are collected. Such images can then be used in another crowdsourcing task as a comparison term, to visually verify the similarity of the labelled flower instance with real-world examples.

6.2. Research question 2

In this section we provide answers to RQ2: To what extent can the extraction and aggregation steps of a crowdsourced knowledge creation process influence the identification and recognition aspects of visual artwork annotation?

6.2.1. Extraction

The experimental evaluation shows that the adoption of different knowledge extraction interfaces has a relevant effect on the identification and recognition performance. Such an effect is not uniformly distributed across print annotation difficulties, but it allows for interesting considerations.

By comparing the figures of Table 1 (Artwork–centric) and Table 3 (Class–centric) we observe how the presence of the bounding box functionality, which should push workers to a more precise identification of flower instances, does not result in a significant difference (less than 10%) in the distribution of under- and over-specification of the number of flowers. However, we observe that using drawn bounding boxes as a way to count the number of flowers does lead to significantly better results (see Tables 2, 5 and 6); this is especially observable with more difficult prints, with a 21% and 24% decrease of the error rate for average and hard prints using the optimal aggregation method, respectively. On the other hand, the identification performance concerning the number of flower types shows a different trend, as workers in Artwork–centric annotation achieve a lower error than in Class–centric annotation. These results suggest that the identification aspect could benefit from an annotation interface that combines both global (Artwork–centric) and local (Class–centric) knowledge extraction interfaces, to be used according to the difficulty of the print to annotate.

i.org/10.1016/j.comnet.2015.07.008

Page 15: ARTICLE IN PRESS · ARTICLE IN PRESS JID: COMPNW [m3Gdc;July 22, 2015;16:6] ComputerNetworksxxx(2015)xxx–xxx Contents lists available at ScienceDirect ComputerNetworks

J. Oosterman et al. / Computer Networks xxx (2015) xxx–xxx 15

ARTICLE IN PRESSJID: COMPNW [m3Gdc;July 22, 2015;16:6]

Different observations can be made for the recognition aspect. The presence of a bounding box forces workers into providing a label for each identified visual class instance. Despite the availability of a "Don't Know" option for labels, workers often provided the same annotation even for flowers having clear visual differences. Moreover, by comparing Fig. 10 with Fig. 11, we observe that the labels provided in the Class–centric configuration are more frequently of a generic nature (genus and family) compared to the Artwork–centric configuration. A higher proportion of labels is also specified using the common name. The vocabulary size of workers in the Class–centric configuration is also smaller. These results suggest that, while a Class–centric configuration can help obtaining higher-granularity annotations, the overhead required for drawing bounding boxes might penalise the recognition capabilities of workers. A possible explanation of this result can be attributed to the well-known fatigue effect [32], which often occurs with repetitive tasks. Due to fatigue, workers could be led to provide wrong labels, thus introducing noise. On the other hand, the cost of providing a "Don't Know" annotation was a single click, considerably lower than typing a word or copying and pasting it. We were not able to find a comprehensive explanation for the different recognition performance in the Class–centric configuration. We rely on future work to obtain a better understanding of the impact of the cognitive load of an annotation interface on the quality of the retrieved annotations.

6.2.2. Aggregation

The experiments show that the aggregation step can impact the quality of the visual class identification results. The maximum and median functions find different optimal applications: the former allowed a better estimation of the number of flowers, while the latter was more suited to estimate the number of flower types contained in the print.

Once more, we can explain these results as a consequence of a fatigue effect [32] that emerged with workers. This was most apparent for images with many small instances in the Class–centric configuration. Some workers drew fewer bounding boxes than the number of flowers they had counted and reported. Countermeasures to the fatigue effect are also studied in the field of crowdsourcing, for example in [35], which studied the effect of inserting micro-breaks during tasks. Our results show that the adoption of a different aggregation function, e.g. maximum, can provide satisfactory results without the need for additional task execution time and, consequently, cost.

Altogether, the results discussed in this sub-section clearly indicate how artwork annotation demands different aggregation techniques with respect to photographic image annotation: consensus-based knowledge aggregation techniques [4,16,17] need to be supported by other methods, to counter some of the additional visual complexity of artworks.

7. Conclusion

In this paper we report the results of an evaluation, conducted in collaboration with the Rijksmuseum Amsterdam,

and aimed at studying how different knowledge extraction and aggregation configurations affect the identification and recognition aspects of artwork annotation. We instrumented two knowledge extraction configurations: an Artwork–centric design, where textual annotations about visual objects are specified for the whole artwork; and a Class–centric design, where occurrences of visual objects are identified using bounding boxes with distinct textual annotations. To support the Class–centric design, we proposed a novel bounding-box aggregation algorithm. Then, we experimented with different annotation aggregation methods, and tested their impact on identification and recognition performance.

We engaged with 235 workers from a crowdsourcing platform, and asked them to annotate 80 Rijksmuseum prints of varying annotation difficulty: easy, where both identification and recognition of flower instances are easy; average, where identification is easy and recognition is difficult; and hard, where both identification and recognition are difficult.

Both knowledge extraction configurations (Artwork– 1168

c 1169

ti 1170

o 1171

a 1172

to 1173

o 1174

1175

a 1176

c 1177

re 1178

C 1179

b 1180

su 1181

ta 1182

re 1183

re 1184

1185

le 1186

1187

d 1188

m 1189

su 1190

th 1191

a 1192

o 1193

ie 1194

1195

fu 1196

d 1197

m 1198

w 1199

in 1200

in 1201

b 1202

m 1203

1204

e 1205

m 1206

p 1207

a 1208

im 1209

m 1210

P

t



Both knowledge extraction configurations (Artwork–centric and Class–centric) resulted in satisfactory identification performance (Sections 5.1.1, 5.2.1 and 6.1.1). For tasks of easy and average difficulty, crowd workers can achieve an identification performance comparable to trusted annotators. However, as print difficulty increases, the performance of crowd workers lowers considerably.

In terms of recognition performance (Sections 5.1.2, 5.2.2 and 6.1.2), we observed that the crowd provided a rich vocabulary with little overlap with respect to domain experts, regardless of the knowledge extraction configuration. In the Class–centric configuration more workers provide a single label than in the Artwork–centric configuration. These results suggest that, while a Class–centric configuration can help obtaining higher-granularity annotations, the work overhead required for drawing bounding boxes might penalise the recognition capabilities of workers.

Crowd workers consistently provide labels at varying levels of specificity (species, genus, family) and show familiarity with domain specific (i.e. botanical) names. The crowd provides more distinct labels per image than domain experts. This suggests an opportunity for the creation of annotation sets that are complementary to the ones of experts, and that can accommodate a broader variety of information needs. On the other hand, the generally low agreement calls for further studies, in order to better assess the quality of each annotation.

Our experiments with different annotation aggregation functions (Sections 5.1.3, 5.2.3 and 6.2.2) show performance diversities. In terms of identification performance, the maximum function outperforms the median, as it counters the workers' tendency to under-specify the number of identified instances in complex artworks. On the other hand, aggregating the location of bounding boxes (centre and area) is better achieved using a consensus-based function such as the median.

These results show how: (1) the adoption of different knowledge extraction configurations and aggregation methods influences both the identification and recognition performance; and (2) artwork annotation demands different aggregation techniques than the ones used for photographic image annotation (e.g. image-centric annotations and majority voting).



While promising, these results were obtained by studying a single knowledge domain. Further investigations are needed in order to assess the impact of the crowdsourced knowledge creation process in areas of knowledge that are less common in the general population (e.g. the annotation of birds, castles, etc.). As part of the future work we also plan to test the impact of other steps of the Crowd Knowledge Creation process, e.g. the discovery of expert workers from the crowd to dynamically assign annotation tasks to the most suited performer, also proposed as a challenge in "The Future of Crowd Work" [34]. Other future work includes the assessment of our novel algorithm for aggregating bounding boxes in other contexts and on other datasets.

Acknowledgements

This publication was supported by the Dutch national program COMMIT. We would like to thank the members of the SEALINCMedia worktable and in particular the Rijksmuseum Amsterdam for their support.

References

[1] G. Carneiro, Artistic image analysis using graph-based learning approaches, IEEE Trans. Image Process. 22 (8) (2013) 3168–3178. http://dx.doi.org/10.1109/TIP.2013.2260167.
[2] Q. Le, Building high-level features using large scale unsupervised learning, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 8595–8598. http://dx.doi.org/10.1109/ICASSP.2013.6639343.
[3] Q. Wang, K. Boyer, Feature learning by multidimensional scaling and its applications in object recognition, in: 2013 26th Conference on Graphics, Patterns and Images (SIBGRAPI), 2013, pp. 8–15. http://dx.doi.org/10.1109/SIBGRAPI.2013.11.
[4] S. Nowak, S. Rüger, How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation, in: Proceedings of the International Conference on Multimedia Information Retrieval, MIR '10, ACM, New York, NY, USA, 2010, pp. 557–566. http://doi.acm.org/10.1145/1743384.1743478.
[5] T. Yan, V. Kumar, D. Ganesan, Crowdsearch: exploiting crowds for accurate real-time image search on mobile phones, in: Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, MobiSys '10, ACM, New York, NY, USA, 2010, pp. 77–90. http://doi.acm.org/10.1145/1814433.1814443.
[6] N. Sawant, J. Li, J.Z. Wang, Automatic image semantic interpretation using social action and tagging data, Multimedia Tools Appl. 51 (1) (2011) 213–246. http://dx.doi.org/10.1007/s11042-010-0650-8.
[7] S. Park, G. Mohammadi, R. Artstein, L.-P. Morency, Crowdsourcing micro-level multimedia annotations: the challenges of evaluation and interface, in: Proceedings of the ACM Multimedia 2012 Workshop on Crowdsourcing for Multimedia, CrowdMM '12, ACM, New York, NY, USA, 2012, pp. 29–34. http://doi.acm.org/10.1145/2390803.2390816.
[8] J. Oosterman, et al., Crowd versus experts: nichesourcing for knowledge intensive tasks in cultural heritage, in: Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web, 2014, pp. 567–568. http://dx.doi.org/10.1145/2567948.2576960.
[9] E.L. Spratt, A. Elgammal, Computational beauty: aesthetic judgment at the intersection of art and science, CoRR abs/1410.2488 (2014). http://arxiv.org/abs/1410.2488.
[10] J. Whitehill, T.-f. Wu, J. Bergsma, J.R. Movellan, P.L. Ruvolo, Whose vote should count more: optimal integration of labels from labelers of unknown expertise, in: NIPS'09, 2009, pp. 2035–2043.
[11] P. Welinder, S. Branson, P. Perona, S.J. Belongie, The multidimensional wisdom of crowds, in: Advances in Neural Information Processing Systems, 2010, pp. 2424–2432.
[12] H. Su, J. Deng, L. Fei-Fei, Crowdsourcing annotations for visual object detection, in: Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
[13] J. Trant, B. Wyman, Investigating social tagging and folksonomy in art museums with steve.museum, in: Proceedings of the WWW'06 Collaborative Web Tagging Workshop, 2006.
[14] A. Ellis, D. Gluckman, A. Cooper, A. Greg, Your paintings: a nation's oil paintings go online, tagged by the public, Museum and the Web (2012).
[15] M.C. Traub, J. van Ossenbruggen, J. He, L. Hardman, Measuring the effectiveness of gamesourcing expert oil painting annotations, in: M. de Rijke, T. Kenter, A. de Vries, C. Zhai, F. de Jong, K. Radinsky, K. Hofmann (Eds.), Advances in Information Retrieval, Lecture Notes in Computer Science, 8416, Springer International Publishing, 2014, pp. 112–123. http://dx.doi.org/10.1007/978-3-319-06028-6_10.
[16] L. von Ahn, L. Dabbish, Labeling images with a computer game, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '04, ACM, New York, NY, USA, 2004, pp. 319–326. http://doi.acm.org/10.1145/985692.985733.
[17] R. Gligorov, L.B. Baltussen, J. van Ossenbruggen, L. Aroyo, M. Brinkerink, J. Oomen, A. van Ees, Towards integration of end-user tags with professional annotations, in: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, 2010.
[18] C. Wieser, F. Bry, A. Bérard, R. Lagrange, Artigo: building an artwork search engine with games and higher-order latent semantic analysis, in: First AAAI Conference on Human Computation and Crowdsourcing, 2013.
[19] A. Kittur, E.H. Chi, B. Suh, Crowdsourcing user studies with Mechanical Turk, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '08, ACM, New York, NY, USA, 2008, pp. 453–456. http://doi.acm.org/10.1145/1357054.1357127.
[20] R. Snow, B. O'Connor, D. Jurafsky, A.Y. Ng, Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, Association for Computational Linguistics, Stroudsburg, PA, USA, 2008, pp. 254–263. http://dl.acm.org/citation.cfm?id=1613715.1613751.
[21] A. Kittur, Crowdsourcing, collaboration and creativity, ACM Crossroads 17 (2) (2010) 22–26.
[22] V. Boer, M. Hildebrand, L. Aroyo, P. Leenheer, C. Dijkshoorn, B. Tesfa, G. Schreiber, Nichesourcing: harnessing the power of crowds of experts, in: Knowledge Engineering and Knowledge Management, Lecture Notes in Computer Science, 7603, Springer Berlin Heidelberg, 2012, pp. 16–20. http://doi.acm.org/10.1007/978-3-642-33876-2_3.
[23] K. Heimerl, B. Gawalt, K. Chen, T. Parikh, B. Hartmann, Communitysourcing: engaging local crowds to perform expert work via physical kiosks, in: Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems, ACM, 2012, pp. 1539–1548.
[24] A. Bozzon, M. Brambilla, S. Ceri, M. Silvestri, G. Vesci, Choosing the right crowd: expert finding in social networks, in: Proceedings of the 16th International Conference on Extending Database Technology, EDBT '13, ACM, New York, NY, USA, 2013, pp. 637–648. http://doi.acm.org/10.1145/2452376.2452451.
[25] N.F. Noy, J. Mortensen, M.A. Musen, P.R. Alexander, Mechanical Turk as an ontology engineer?: using microtasks as a component of an ontology-engineering workflow, in: Proceedings of the 5th Annual ACM Web Science Conference, WebSci '13, ACM, New York, NY, USA, 2013, pp. 262–271. http://doi.acm.org/10.1145/2464464.2464482.
[26] L.B. Chilton, G. Little, D. Edge, D.S. Weld, J.A. Landay, Cascade: crowdsourcing taxonomy creation, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '13, ACM, New York, NY, USA, 2013, pp. 1999–2008. http://doi.acm.org/10.1145/2470654.2466265.
[27] C. Dijkshoorn, J. Oosterman, L. Aroyo, G. Houben, Personalization in crowd-driven annotation for cultural heritage collections, in: Workshop and Poster Proceedings of the 20th Conference on User Modeling, Adaptation, and Personalization, Montreal, Canada, July 16–20, 2012, 2012. http://ceur-ws.org/Vol-872/patch2012_paper_3.pdf.
[28] J. Oosterman, A. Nottamkandath, C. Dijkshoorn, A. Bozzon, G. Houben, L. Aroyo, Crowdsourcing knowledge-intensive tasks in cultural heritage, in: ACM Web Science Conference, 2014, pp. 267–268. http://doi.acm.org/10.1145/2615569.2615644.
[29] P. Welinder, P. Perona, Online crowdsourcing: rating annotators and obtaining cost-effective labels, in: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2010, pp. 25–32. http://doi.acm.org/10.1109/CVPRW.2010.5543189.
[30] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.
[31] G. Schwarz, Estimating the dimension of a model, Ann. Stat. 6 (2) (1978) 461–464. http://dx.doi.org/10.1214/aos/1176344136.
[32] R.A. Henning, S. Sauter, G. Salvendy, E.F.J. Krieg, Microbreak length, performance, and stress in a data entry task, Ergonomics 32 (7) (1989) 855–864. PMID: 2806221. http://dx.doi.org/10.1080/00140138908966848.


[33] O. Inel, K. Khamkham, T. Cristea, A. Dumitrache, A. Rutjes, J. van der Ploeg, L. Romaszko, L. Aroyo, R. Sips, CrowdTruth: machine-human computation framework for harnessing disagreement in gathering annotated data, in: The Semantic Web, ISWC 2014, 13th International Semantic Web Conference, Riva del Garda, Italy, October 19–23, 2014, Proceedings, Part II, 2014, pp. 486–504. http://dx.doi.org/10.1007/978-3-319-11915-1_31.
[34] A. Kittur, J.V. Nickerson, M. Bernstein, E. Gerber, A. Shaw, J. Zimmerman, M. Lease, J. Horton, The future of crowd work, in: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW '13, ACM, New York, NY, USA, 2013, pp. 1301–1318. http://dx.doi.org/10.1145/2441776.2441923.
[35] J.M. Rzeszotarski, E.H. Chi, P. Paritosh, P. Dai, Inserting micro-breaks into crowdsourcing workflows, in: Human Computation and Crowdsourcing: Works in Progress and Demonstration Abstracts, an Adjunct to the Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing, November 7–9, 2013, Palm Springs, CA, USA, 2013. http://www.aaai.org/ocs/index.php/HCOMP/HCOMP13/paper/view/7544.

Born in Amsterdam, the Netherlands, Jasper Oosterman studied Computer Science at the Delft University of Technology. His MSc thesis, titled 'Finding Experts in a Corporate Environment', concluded his MSc in 2011. Following his master, Jasper started a PhD student position supervised by Alessandro Bozzon at the Web Information Systems research group led by Geert-Jan Houben. The topic of his thesis is the maximization of Crowd Generated Knowledge and he plans to finish in 2015.

Jie Yang is a PhD student at the Web Information Systems Group of the Faculty of Engineering, Mathematics and Computer Science (EEMCS/EWI), Delft University of Technology (TU Delft). Jie's research is focused on crowdsourcing for knowledge intensive tasks.


Dr. Alessandro Bozzon is an Assistant Professor at the Delft University of Technology, and completed his PhD at Politecnico di Milano, Italy, in 2009. His current research interests are in the fields of Web data management; human computation, crowdsourcing, and games with a purpose; and information retrieval; with applications to crowdsourced knowledge creation, urban computing, and smart enterprise workforce.

Lora Aroyo is associate professor in Intelligent Information Systems at the Web and Media group of the Department of Computer Science, Free University Amsterdam, the Netherlands. Her specialities are all explicitly 'user centred': crowdsourcing, user and context modelling, recommender systems, and user centred design for semantic systems in e-learning. In the recent past Aroyo was scientific coordinator of the EU Integrated Project NoTube, on the integration of Web and TV data with the help of semantics. Aroyo also coordinated the NWO project CHIP: Cultural Heritage Information Personalization. This project was awarded a Semantic Web Challenge prize.

Geert-Jan Houben is full professor of Web Information Systems at the Software & Computer Technology department at TU Delft. His main research interests are in Web Engineering, Web Science, and User Modeling, Adaptation and Personalization. He is managing editor of the Journal of Web Engineering, and editorial board member for the Journal of Web Science, the International Journal of Web Science, and ACM Transactions on the Web. He is scientific director of Delft Data Science, TU Delft's coordinating initiative in Data Science, leading TU Delft's research program on Open & Online Education, and principal investigator in AMS, the Amsterdam Institute for Advanced Metropolitan Solutions.
