tags vs shelves: from social tagging to social classification

Tags vs Shelves: From Social Tagging to Social Classification. Hypertext 2011. Arkaitz Zubiaga, Christian Körner, Markus Strohmaier. UNED (Madrid, Spain) & Graz University of Technology (Graz, Austria). June 8th, 2011.


Page 1: Tags vs Shelves: From Social Tagging to Social Classification

Tags vs Shelves: From Social Tagging to Social Classification

Hypertext 2011

Arkaitz Zubiaga, Christian Körner, Markus Strohmaier

UNED (Madrid, Spain) & Graz University of Technology (Graz, Austria)

June 8th, 2011


Index

1 Motivation

2 User Behavior Measures

3 Experiments

4 Results

5 Conclusions & Outlook

Zubiaga, Korner, Strohmaier () Tags vs Shelves June 8th, 2011 2 / 31


Motivation

Book Cataloging

Librarians have been cataloging books for centuries.

The task of manually cataloging books becomes very expensive and effortful for large collections.

For instance, the Library of Congress reported an average cost of $94.58 for cataloging each book in 2002 (291,749 books, total: $27.5 million).

Given the enormous costs and efforts required for the task, research is moving towards automatic classification.


Automatic Classification of Books

Problem: it is not easy to get data representing the aboutness of the books.

In addition, the content of books is not always available digitally.

Solution:

Social tags provided by users have been shown to be helpful (Zubiaga et al., 2009) [1].
Social tagging sites like LibraryThing and GoodReads are gathering vast amounts of tag annotations on books.

[1] A. Zubiaga, R. Martínez, V. Fresno. Getting the Most Out of Social Annotations for Web Page Classification. DocEng 2009.


Tagging


Social Tagging


Problem Statement

Can we find a type of user whose tags more closely resemble the categorization by experts?

Can we characterize those users?


User Behavior

Körner et al. [2] suggested and described the existence of two kinds of user behavior: Categorizers and Describers.

                            Categorizer       Describer
Goal of Tagging             later browsing    later retrieval
Change of Tag Vocabulary    costly            cheap
Size of Tag Vocabulary      limited           open
Tags                        subjective        objective

Previous work suggests that Describers are more helpful for inferring semantic relations among tags.

Our goal is to discover whether this kind of tagging behavior affects the usefulness of tags for the social classification of books.

[2] C. Körner. Understanding the Motivation behind Tagging. Hypertext 2009.


User Behavior Measures

Tags per Post (TPP) – Verbosity

\[ \mathrm{TPP}(u) = \frac{\sum_{r} |T_{ur}|}{|R_u|} \tag{1} \]

Orphan Ratio (ORPHAN) – Diversity

\[ n = \left\lceil \frac{|R(t_{max})|}{100} \right\rceil \tag{2} \]

\[ \mathrm{ORPHAN}(u) = \frac{|T_u^o|}{|T_u|}, \qquad T_u^o = \{\, t \mid |R(t)| \le n \,\} \tag{3} \]

Tag Resource Ratio (TRR) – Verbosity + Diversity

\[ \mathrm{TRR}(u) = \frac{|T_u|}{|R_u|} \tag{4} \]

C. Körner, R. Kern, H.-P. Grahsl, and M. Strohmaier. Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation. Hypertext 2010.
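
A minimal Python sketch of Eqs. (1) to (4) for a single user. The symbol meanings are assumptions, since the slide does not define them: T_ur is taken as the set of tags user u assigned to resource r, R_u as the resources u annotated, T_u as the user's tag vocabulary, R(t) as the resources u annotated with tag t, and t_max as the user's most used tag. The function name `behavior_measures`, the input format, and the example data are hypothetical.

```python
from collections import defaultdict
from math import ceil


def behavior_measures(posts):
    """Compute TPP, ORPHAN and TRR for one user.

    `posts` maps each resource the user annotated to the set of tags
    the user assigned to it (an assumed input format).
    """
    # R(t): resources this user annotated with tag t (assumed meaning).
    resources_per_tag = defaultdict(set)
    for resource, tags in posts.items():
        for tag in tags:
            resources_per_tag[tag].add(resource)

    n_resources = len(posts)             # |R_u|
    n_tags = len(resources_per_tag)      # |T_u|
    if n_resources == 0 or n_tags == 0:
        raise ValueError("user has no tag assignments")

    # Eq. (1): average number of tags per annotated resource.
    tpp = sum(len(tags) for tags in posts.values()) / n_resources

    # Eq. (2): orphan threshold derived from the most used tag.
    max_usage = max(len(rs) for rs in resources_per_tag.values())
    n = ceil(max_usage / 100)

    # Eq. (3): share of the vocabulary used on at most n resources.
    orphans = [t for t, rs in resources_per_tag.items() if len(rs) <= n]
    orphan = len(orphans) / n_tags

    # Eq. (4): vocabulary size relative to the number of resources.
    trr = n_tags / n_resources
    return {"TPP": tpp, "ORPHAN": orphan, "TRR": trr}


# Hypothetical user with four annotated books.
example_posts = {
    "book-1": {"fantasy", "dragons"},
    "book-2": {"fantasy"},
    "book-3": {"fantasy", "to-read"},
    "book-4": {"history"},
}
measures = behavior_measures(example_posts)
print(measures)
```

In this toy example the user made 6 tag assignments on 4 books with a vocabulary of 4 tags, 3 of which are used only once.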


Computing measures

These 3 measures provide a weight for each user.

These weights make it possible to rank users according to each measure.

From these rankings, we choose subsets of users as extreme Categorizers (highest-ranked) and extreme Describers (lowest-ranked).

Subsets range from 10% to 100%, with a step size of 10%.


We select subsets of users according to the number of tag assignments.

Selecting by percentage of users would be unfair, since it would provide different amounts of data.
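
The selection described above can be sketched as follows. This is only an illustration under stated assumptions: users are ranked by one measure, lower values are assumed to indicate Categorizer behavior, and users are added from one end of the ranking until they cover the requested share of all tag assignments. The function `select_users` and the sample users are hypothetical.

```python
def select_users(scores, assignments, fraction, from_top=True):
    """Pick users from one end of a ranking until they cover `fraction`
    of all tag assignments.

    scores:      user -> measure value (e.g. TPP); lower values are
                 assumed to be more Categorizer-like
    assignments: user -> number of tag assignments made by that user
    from_top:    True starts at the Categorizer end of the ranking,
                 False starts at the Describer end
    """
    ranking = sorted(scores, key=scores.get, reverse=not from_top)
    budget = fraction * sum(assignments.values())
    chosen, used = [], 0
    for user in ranking:
        if used >= budget:
            break
        # A user is always added whole, so the budget may be overshot.
        chosen.append(user)
        used += assignments[user]
    return chosen


# Hypothetical data: four users, 100 tag assignments in total.
scores = {"ana": 1.0, "ben": 2.5, "cat": 4.0, "dan": 6.0}
assignments = {"ana": 10, "ben": 40, "cat": 30, "dan": 20}

categorizers_50 = select_users(scores, assignments, 0.5, from_top=True)
describers_50 = select_users(scores, assignments, 0.5, from_top=False)
print(categorizers_50, describers_50)
```

Both subsets cover 50 tag assignments here, so the two ends of the ranking are compared on equal amounts of data.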


Objective

We aim to analyze whether:

Categorizers provide tags that better help infer the categorization performed by experts.
Describers provide tags that more closely resemble book descriptions.


Experiments

Datasets

Set of 38,149 popular books, with categorization data provided by experts:

  27,299 categorized according to DDC (10 categories).
  24,861 categorized according to LCC (20 categories).

Tagging data from 153k+ users on LibraryThing and 110k+ users on GoodReads (100+ users annotated each book).

Additional descriptive data:

  Book synopses (Barnes & Noble).
  User reviews (LibraryThing, GoodReads, and Amazon.com).
  Editorial reviews (Amazon.com).


Tag-based Book Classification

Software: Multiclass Support Vector Machines (svm-multiclass [3]).

Vector representation of books, using tag frequency values.

We run 6 different training-set selections of 18,000 books each and report the average accuracy.

Accuracy:

\[ \text{accuracy} = \frac{\#\,\text{correct guesses}}{\#\,\text{test set}} \]

[3] http://svmlight.joachims.org/svm_multiclass.html
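
A small Python sketch of the accuracy ratio and of averaging it over several runs. It does not call svm-multiclass; the predicted and expected labels below are made-up placeholders standing in for classifier output, and the function name `accuracy` is hypothetical.

```python
from statistics import mean


def accuracy(predicted, expected):
    """Fraction of test items whose predicted category is correct."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Two hypothetical runs (the slides use 6); each pair is (predicted, expected).
runs = [
    (["A", "B", "B", "C"], ["A", "B", "C", "C"]),  # 3 of 4 correct
    (["A", "A", "B", "C"], ["B", "A", "C", "C"]),  # 2 of 4 correct
]
per_run = [accuracy(predicted, expected) for predicted, expected in runs]
average_accuracy = mean(per_run)
print(per_run, average_accuracy)
```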


Descriptiveness of Tags

Vectorial representation of books ($T_r$), using tag frequency values.

Vectorial representation of books ($R_r$), using term frequency values on descriptive data (synopses, reviews).

Cosine similarity between $T_r$ and $R_r$:

\[ \mathrm{similarity}_r = \cos(\theta_r) = \frac{T_r \cdot R_r}{\|T_r\| \, \|R_r\|} = \frac{\sum_{i=1}^{n} T_{ri} \times R_{ri}}{\sqrt{\sum_{i=1}^{n} (T_{ri})^2} \times \sqrt{\sum_{i=1}^{n} (R_{ri})^2}} \tag{5} \]
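
Eq. (5) can be sketched in Python over sparse frequency vectors. The representation as `Counter` objects keyed by tag or term, the function name `cosine_similarity`, and the sample counts are assumptions for illustration; returning 0.0 for an empty vector is a choice made here, not something stated on the slide.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(tag_vector, term_vector):
    """Cosine of the angle between two sparse frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(count * term_vector.get(term, 0)
              for term, count in tag_vector.items())
    norm_t = sqrt(sum(c * c for c in tag_vector.values()))
    norm_r = sqrt(sum(c * c for c in term_vector.values()))
    if norm_t == 0 or norm_r == 0:
        return 0.0  # assumed convention for an empty vector
    return dot / (norm_t * norm_r)


# Hypothetical book: tag frequencies vs. term frequencies in its reviews.
tags = Counter({"fantasy": 3, "dragons": 4})
review_terms = Counter({"fantasy": 3, "dragons": 4})
unrelated_terms = Counter({"cooking": 5})

same = cosine_similarity(tags, review_terms)
disjoint = cosine_similarity(tags, unrelated_terms)
partial = cosine_similarity(Counter({"x": 1}), Counter({"x": 1, "y": 1}))
print(same, disjoint, partial)
```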


Results

[Figure: results for GoodReads and LibraryThing. Columns per site: TPP (verb.), TRR (div.), ORP. (verb. + div.). Rows: Classification, Descriptiveness.]

1 TPP measure: Categorizers outperform Describers for classification.

2 All the measures (though especially TRR): Describers more closely resemble descriptive data.


3 Verbosity helps find extreme Categorizers.
  Users who think of a specific shelf on which to place the book tend to assign a tag identifying the shelf.


4 Diversity does not help find Categorizers on GoodReads.
  GoodReads suggests previously used tags to the user, which affects the diversity of tags.


5 Users providing non-descriptive tags (i.e., different from Describers) produce more accurate classification.


Conclusions & Outlook

Social classification of books with tagging data, distinguishing between extreme Categorizers and extreme Describers.

This complements previous research by showing that the users known as Categorizers produce more accurate classification.

Non-verbose, non-descriptive, shelf-driven tagging produces more accurate classification of books.

Outlook: Further analyzing tagging behavior to find generalists (users who provide general tags) and specialists (users who provide more specific tags focused on the subject).


Thank You

Achiu Arigato Danke Dhannvaad Dua Netjer en ek Efcharisto

Gracias Gracies Gratia Grazie Guishepeli Hvala Kiitos Koszonom Merce Merci Milaesker Obrigado Shukran Shukriya Tack Tak Takk

Tanan Tapadh leat Tesekkur ederim Thankyou Toda

E-mail: [email protected]

@arkaitz
