
Web Mining for Wikipedia: The Depth-Study Presentation

Liangjie Hong

Department of Computer Science & Engineering, Lehigh University

December 17th, 2010

Liangjie Hong Web Mining for Wikipedia


Outline

1 Wikipedia and Research about Wikipedia

2 Wikipedia as Social Media

3 Wikipedia as an external resource

4 Discussions

5 References


1 Wikipedia and Research about Wikipedia


The Success of Wikipedia

In many ways:

the 7th most visited website

17 million articles

365 million readers

257 active language editions


The Key Characteristics of Wikipedia

participants from all over the world, all of them volunteers

a vast knowledge base

a rich platform for collaborations

a showcase of Web 2.0 technology


The Research about Wikipedia

over 170 publications in top venues (journals, conferences)

from Computer Science and Sociology to Economics

a wide range of aspects regarding Wikipedia

consensus and debates

since the early stages of Wikipedia


The Research about Wikipedia

Researchers:

Fernanda B. Viegas, MIT Media Lab → IBM

Ed H. Chi, Palo Alto Research Center

Daniel S. Weld, Professor of Computer Science and Engineering, University of Washington

Aniket Kittur, Assistant Professor, Human-Computer Interaction Institute, Carnegie Mellon University

Jure Leskovec, Assistant Professor of Computer Science, Stanford University


Research Questions

What motivates the contributors?

How does collaboration take place between users?

What is the quality and trustworthiness of the content, compared to traditional expert-based encyclopedias?

How can Wikipedia be used as an external resource?


2 Wikipedia as Social Media


Wikipedia as Social Media

We will discuss:

User Interactions

Link structures & Topical Coverage

Quality Measurement


User Interactions

motivations

characteristics of collaboration: newcomers and vandalism

policies, conflicts and coordination

potential applications


User Interactions: Motivations

Research on the motivations of contributors has mostly been conducted through surveys of small groups of users.

[Forte and Bruckman 2005]: 22 Wikipedia users

[Kuznetsov 2006]: 102 student volunteers at New York University

[Nov 2007]: 140 Wikipedia users

[Viegas 2007]: 29 image contributors to Wikipedia

[Schroer and Hertel 2009]: 106 users of the German Wikipedia


User Interactions: Motivations

Some interesting findings:

ownership of articles [Forte and Bruckman 2005]

“fun” and “ideology” are two strong motivations [Nov 2007, Schroer and Hertel 2009]

information should be free [Viegas 2007]

articles can be improved by adding images [Viegas 2007]

credit and recognition [Schroer and Hertel 2009]


User Interactions: Motivations: Discussion

However, many questions regarding contributor motivation remain unanswered.

What motivates users from different cultures?

Do motivations change over time?

Have active contributors left Wikipedia, and if so, what were their motivations?


User Interactions: Characteristics: Newcomers

It is important to understand how newcomers get involved in Wikipedia.

What types of contributions do newcomers make?

What is the scope of their contributions?

How do they become senior users?


User Interactions: Characteristics: Newcomers

Some interesting findings [Viegas et al. 2004, Bryant et al. 2005, Kittur et al. 2007, Choi et al. 2010]:

first edits involve topics in which newcomers have expertise

over time, Wikipedia as a whole becomes more important to them

newcomers → vandalism

newcomers lack awareness of the community

meta-activities increase over time


User Interactions: Characteristics: Users' Work Over Time

From [Kittur et al. 2007]:

Figure: Direct work decreases over time

Figure: Indirect work increases over time
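As a rough illustration of the direct/indirect split, the sketch below computes, for each month, the share of edits that land in the article namespace. It assumes, as a simplification, that “direct work” means article-namespace edits and that everything else (talk, user, policy pages) counts as indirect work; the `(month, namespace)` records and the function name are hypothetical, not Kittur et al.'s data or code.

```python
from collections import Counter

# Namespace 0 is the article namespace; all other namespaces count as indirect work.
ARTICLE_NS = 0


def direct_work_share(edits):
    """Return {month: fraction of that month's edits made in the article namespace}."""
    totals = Counter()
    direct = Counter()
    for month, namespace in edits:
        totals[month] += 1
        if namespace == ARTICLE_NS:
            direct[month] += 1
    return {month: direct[month] / totals[month] for month in sorted(totals)}


# Hypothetical edit log: (month, namespace) pairs.
edits = [("2004-01", 0), ("2004-01", 0), ("2004-01", 1), ("2004-01", 0),
         ("2006-01", 0), ("2006-01", 4), ("2006-01", 1), ("2006-01", 0)]
print(direct_work_share(edits))  # {'2004-01': 0.75, '2006-01': 0.5}
```

A falling share over successive months is the pattern the first figure describes.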


User Interactions: Characteristics: Users' Work Over Time

From [Kittur et al. 2007]:

Figure: Changes in different types of work over time


User Interactions: Characteristics: Vandalism

From Wikipedia:

Vandalism is any addition, removal, or change of content made in a deliberate attempt to compromise the integrity of Wikipedia.


User Interactions: Characteristics: Vandalism

According to [Viegas et al. 2004], there are five types of vandalism:

mass deletion

offensive copy

phony copy: on the “Chemistry” page, the full text of the “Windows 98 readme” file was inserted

phony redirection

idiosyncratic copy: on “Cat”, a reader posted a lengthy note on the Unix cat command

and even image vandalism [Viegas 2007]
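Mass deletion is the easiest of these patterns to flag automatically. The sketch below marks a revision as a mass deletion when it removes at least 90% of the previous revision's characters; the threshold, the character-based size measure and the function name are illustrative assumptions, not the method of [Viegas et al. 2004], who identified these patterns through visualization.

```python
def is_mass_deletion(prev_text: str, new_text: str, threshold: float = 0.9) -> bool:
    """Return True if the new revision drops at least `threshold` of the old text.

    The 0.9 threshold is a hypothetical choice for illustration.
    """
    if not prev_text:
        return False  # an empty page cannot be mass-deleted
    removed_fraction = max(len(prev_text) - len(new_text), 0) / len(prev_text)
    return removed_fraction >= threshold


# Hypothetical history: a full article, a blanked page, then the restored article.
article = "Chemistry is the study of matter and its reactions. " * 20
history = [article, "", article]
flags = [is_mass_deletion(old, new) for old, new in zip(history, history[1:])]
print(flags)  # [True, False]
```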


User Interactions: Characteristics: Vandalism

Surprisingly, vandalism is usually detected and reverted very quickly.

mass deletion (median time: 2.8 minutes; mean time: 7.7 days) [Viegas et al. 2004]

user-labeled vandalism (median time: 11.3 minutes; mean time: 2.1 days) [Kittur et al. 2007]

the probability of viewing a damaged page is increasing [Priedhorsky et al. 2007]
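The gap between the median (minutes) and the mean (days) is what a heavily skewed distribution produces: most damage is reverted almost immediately, while a few cases survive for a long time and dominate the mean. The repair times below are invented purely to show the effect; they are not data from the cited studies.

```python
from statistics import mean, median

# Hypothetical repair times in minutes: eight quick reverts and two long-lived cases
# (3 days and 30 days).
repair_minutes = [1, 2, 2, 3, 3, 4, 5, 8, 3 * 24 * 60, 30 * 24 * 60]

print("median:", median(repair_minutes), "minutes")                 # median: 3.5 minutes
print("mean:", round(mean(repair_minutes) / (24 * 60), 1), "days")  # mean: 3.3 days
```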


User Interactions: Characteristics: Vandalism

From [Viegas et al. 2004]:

Figure: Left: history flow for the “Abortion” page, versions equally spaced; Right: history flow for the “Abortion” page, spaced by date


User Interactions: Characteristics: Vandalism

Sometimes it is hard to detect vandalism or edit wars. From [Viegas et al. 2004]:

Figure: “Chocolate” page spaced out by number of versions; two users fought over whether a kind of chocolate sculpture called “coulage” really existed and, consequently, whether the paragraph about it should appear on the page.
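A common heuristic for spotting reverts, and hence candidate edit wars like the one above, is to hash each revision's full text and flag a revision whose text exactly matches an earlier, non-adjacent revision (an “identity revert”). The sketch below assumes revisions arrive as a list of texts in chronological order; the function name is hypothetical, and this is an illustration rather than the method used in [Viegas et al. 2004].

```python
import hashlib


def find_identity_reverts(revisions):
    """Return (revert_index, restored_index) pairs for chronologically ordered texts.

    A revision is an identity revert when its text exactly matches an earlier,
    non-adjacent revision (adjacent duplicates are treated as null edits).
    """
    last_seen = {}  # text digest -> most recent index with that text
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and j < i - 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


revisions = [
    "Chocolate is made from cacao.",
    "Chocolate is made from cacao. Coulage is a chocolate sculpture.",
    "Chocolate is made from cacao.",
    "Chocolate is made from cacao. Coulage is a chocolate sculpture.",
    "Chocolate is made from cacao.",
]
print(find_identity_reverts(revisions))  # [(2, 0), (3, 1), (4, 2)]
```

Repeated reverts between the same two versions, as in this toy history, are a typical edit-war signature.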


User Interactions: Characteristics: Vandalism

From [Kittur et al. 2007, Priedhorsky et al. 2007]:

Figure: Small increase in proportion of edits marked as vandalism.

Figure: Probability of a typical view returning a damaged article.
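The quantity in the second figure can be read as the fraction of all page views that were served while an article was in a damaged state. The sketch below computes that fraction from per-revision view counts; the `(is_damaged, views)` input format, the function name and the numbers are hypothetical, and estimating views per revision (the hard part in practice) is assumed already done.

```python
def damaged_view_probability(revisions):
    """Fraction of views that hit a damaged revision.

    revisions: list of (is_damaged, views_while_current) pairs.
    """
    total = sum(views for _, views in revisions)
    damaged = sum(views for is_damaged, views in revisions if is_damaged)
    return damaged / total if total else 0.0


# Hypothetical article history with two short-lived damaged revisions.
history = [(False, 900), (True, 30), (False, 1000), (True, 70)]
print(damaged_view_probability(history))  # 0.05
```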


User Interactions: Characteristics: Vandalism

Some policies eventually emerged. Certain pages are in “protection mode” (e.g., the Main Page).


User Interactions: Policies and Coordination

Some indications of collaboration and coordination, from [Viegas et al. 2007]:


User Interactions: Policies and Coordination

“Talk pages” are places for policy making and coordination.

Talk pages were characterized as places where conflict was resolved. [Viegas et al. 2004]

Talk pages seem quite effective for resolving conflicts and reaching agreements. [Bryant et al. 2005]

Talk pages serve as a place for planning and other types of coordination. [Viegas et al. 2007]


User Interactions: Policies and Coordination

A closer view of policies [Butler et al. 2008, Beschastnikh et al. 2008, Kriplean et al. 2008]:

policies expanded

the details of policies grew significantly:

Copyrights: 341 words → 3200 words

What Wikipedia is not: 541 words → 5031 words

Civility: 1741 words → 2131 words

Consensus: 132 words → 2054 words

Deletion: 405 words → 2349 words

users frequently cite policies

“value systems” become vital
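The word counts above can be turned into growth factors with a few lines of arithmetic. The numbers come straight from the slide; only the variable names are ours.

```python
# Word counts of policy pages as listed on the slide: (earlier, later).
policy_words = {
    "Copyrights": (341, 3200),
    "What Wikipedia is not": (541, 5031),
    "Civility": (1741, 2131),
    "Consensus": (132, 2054),
    "Deletion": (405, 2349),
}

# Growth factor = later length / earlier length.
growth = {name: after / before for name, (before, after) in policy_words.items()}

for name, factor in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} x{factor:.1f}")
# Prints Consensus first (x15.6) and Civility last (x1.2).
```

Consensus grew roughly fifteenfold, while Civility grew by only about a fifth.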


User Interactions: Policies and Coordination

From [Beschastnikh et al. 2008]:

Figure: Each striation is a single policy. The width represents the fraction of cumulative policy citations. The 50 bottom striations are ordered from the most cited policies in the last week of the dataset (bottom) to the least cited (top). The top striation aggregates the remaining 142 policies.


User Interactions: Policies and Coordination

From [Beschastnikh et al. 2008]:

Figure: Each striation in this graph represents a single category. The width of the striation represents the fraction of cumulative policy citations for policies in the category. Striations are ordered from bottom to top according to their rank in the last week of the dataset.


User Interactions: Policies and Coordination

Barnstars: 1443 recipients, awarded by 1537 givers (382 of them administrators). These barnstar givers and receivers have contributed 9.7% and 11.8% of all edits to Wikipedia articles, respectively. [Kriplean et al. 2008]


User Interactions: Discussion

Many efforts come from the Computer-Human Interaction, Sociology, and Journalism communities. Some current issues:

Many results are based on small-scale surveys.

Most work focuses only on the English Wikipedia.

There is no work on interactions between articles and their talk pages.


Link Structures and Content Topical Coverage

The content of articles and the rich structures built between them are two hot research topics regarding Wikipedia.


Link Structures

Research questions:

Is the link structure of Wikipedia different from that of the Web?

How does the Wikipedia graph evolve?

How can the Wikipedia graph help us in various tasks?

Note the rich “entities” in Wikipedia:

articles

users

categories

edit history

· · ·


Link Structures: Basic Observations

The graph of Wikipedia articles is similar to that of the Web.

a “bow tie” structure exists [Capocci et al. 2006, Buriol et al. 2006]

indegree and outdegree distributions follow power-law distributions [Capocci et al. 2006, Buriol et al. 2006, Kamps and Koolen 2009]

preferential attachment [Capocci et al. 2006]

the graph becomes denser over time [Buriol et al. 2006, Kamps and Koolen 2009]

a common growth process across different languages [Zlatic et al. 2006]

Outlinks behave similarly to inlinks. [Kamps and Koolen 2009]
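The “bow tie” picture comes from studies of the Web graph: a large strongly connected core, an IN set that can reach the core, and an OUT set reachable from it. The sketch below computes that decomposition for a small directed graph by intersecting forward and backward reachability; this is quadratic and only suitable for toy inputs (large-scale studies use a linear-time SCC algorithm such as Tarjan's). Tendrils and disconnected pieces are lumped into one OTHER set, and the article names and function names are made up.

```python
from collections import defaultdict, deque


def reachable(starts, adj):
    """All nodes reachable from `starts` (inclusive) by BFS over `adj`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges):
    """Return (CORE, IN, OUT, OTHER) node sets for a directed edge list."""
    fwd, rev, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        fwd[u].add(v)
        rev[v].add(u)
        nodes.update((u, v))
    # SCC of n = nodes n can reach AND nodes that can reach n (quadratic; toy only).
    core = set()
    for n in nodes:
        scc = reachable([n], fwd) & reachable([n], rev)
        if len(scc) > len(core):
            core = scc
    out_set = reachable(list(core), fwd) - core
    in_set = reachable(list(core), rev) - core
    other = nodes - core - in_set - out_set  # tendrils, tubes, disconnected parts
    return core, in_set, out_set, other


links = [("Atom", "Molecule"), ("Molecule", "Chemistry"), ("Chemistry", "Atom"),
         ("Alchemy", "Chemistry"), ("Chemistry", "Periodic table"), ("Poetry", "Rhyme")]
core, in_set, out_set, other = bow_tie(links)
print(sorted(core), sorted(in_set), sorted(out_set), sorted(other))
```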


Link Structures: Edit Graph

Users participate in the revision of articles; therefore, the relationships between articles, between articles and users, and between users can be exploited as graph structures.

reveal shared edit patterns among users [Jesus et al. 2009]

revision actions are associated with users to build graphs that identify opposing groups and discover controversial topics [Vuong et al. 2008, Brandes et al. 2009]

However, these methods are evaluated only anecdotally or on a small subset of Wikipedia.
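One simple way to build such a graph is to project the bipartite user-article edit log onto users, connecting two users whenever they edited the same article. The sketch below does that; the edit log and function name are hypothetical, and the cited methods build richer graphs (for example, from which users remove or restore whose text) rather than plain co-editing.

```python
from collections import defaultdict
from itertools import combinations


def co_edit_graph(edit_log):
    """Project a (user, article) edit log onto a weighted user-user graph.

    Edge weight = number of distinct articles both users edited.
    """
    editors = defaultdict(set)  # article -> set of users who edited it
    for user, article in edit_log:
        editors[article].add(user)
    weights = defaultdict(int)
    for users in editors.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return dict(weights)


log = [("alice", "Abortion"), ("bob", "Abortion"), ("carol", "Abortion"),
       ("alice", "Chocolate"), ("bob", "Chocolate"), ("alice", "Abortion")]
print(co_edit_graph(log))
# {('alice', 'bob'): 2, ('alice', 'carol'): 1, ('bob', 'carol'): 1}
```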


Link Structures: Edit Graph

From [Vuong et al. 2008]:

Figure: Top 20 articles by revision count


Link Structures: Predicting Missing Links

Since Wikipedia is written entirely by volunteers, links between articles may be missing or misplaced.

cluster similar pages → candidate links [Adafre and de Rijke 2005]

matrix factorization [West et al. 2009]

However:

no ground truth

human evaluation

anecdotal results
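The “cluster similar pages, then propose candidates” idea can be sketched in a few lines: find the pages whose outgoing links most resemble a given page's, and suggest the targets they link to that the page itself lacks. Using Jaccard overlap of outlink sets as the similarity measure, the function names and the page data are assumptions for illustration, not the exact procedure of [Adafre and de Rijke 2005].

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two link sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_links(page: str, outlinks: dict, k: int = 2) -> list:
    """Suggest link targets for `page` borrowed from its k most similar pages."""
    mine = outlinks[page]
    others = [p for p in outlinks if p != page]
    # Most similar pages first; ties broken alphabetically for stable output.
    neighbours = sorted(others, key=lambda p: (-jaccard(mine, outlinks[p]), p))[:k]
    scores = {}
    for p in neighbours:
        sim = jaccard(mine, outlinks[p])
        for target in outlinks[p] - mine - {page}:
            scores[target] = scores.get(target, 0.0) + sim
    # Candidates supported by more similar pages rank higher.
    return sorted(scores, key=lambda t: (-scores[t], t))


outlinks = {
    "Chemistry": {"Atom", "Molecule"},
    "Physics": {"Atom", "Energy", "Molecule"},
    "Biology": {"Cell", "Molecule"},
    "Poetry": {"Rhyme"},
}
print(suggest_links("Chemistry", outlinks))  # ['Energy', 'Cell']
```

As the slide notes, there is no ground truth for such suggestions, so in practice they would still need human review.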


Topical Coverage

People want to know how broad Wikipedia's topical coverage is. Two kinds of coverage have been investigated:

Comparable Coverage: the coverage compared to other sources

Independent Coverage: the coverage of topics in its own taxonomy


Topical Coverage: Comparable Coverage

[Halavais and Lackaff 2008] conducted two experiments. In the first:

3,000 articles from Wikipedia

these articles were assigned to Library of Congress categories (at the broadest level), manually, by two labelers

the distribution was compared to Bowker’s Books in Print, a catalogue of printed books


Topical Coverage: Comparable Coverage

From [Halavais and Lackaff 2008]:

Figure: Wikipedia vs. Books in Print, percentage of total in each category, ranked by percentage difference between the two collections.


Topical Coverage: Comparable Coverage

From [Halavais and Lackaff 2008]:

Figure: Wikipedia vs. Books in Print, percentage of total in each category, ranked by percentage difference between the two collections.
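The ranking in these figures can be reproduced from per-category counts. The sketch below reads “percentage difference” as the difference between the two collections' shares in percentage points, which is our assumption about the paper's definition; the category counts and the function name are hypothetical, not Halavais and Lackaff's data.

```python
def rank_by_share_difference(wiki_counts, books_counts):
    """Rank categories by Wikipedia share minus Books in Print share (percentage points)."""
    wiki_total = sum(wiki_counts.values())
    books_total = sum(books_counts.values())
    categories = set(wiki_counts) | set(books_counts)
    diffs = {
        c: 100 * wiki_counts.get(c, 0) / wiki_total - 100 * books_counts.get(c, 0) / books_total
        for c in categories
    }
    # Largest Wikipedia over-representation first; ties broken by name.
    return sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical counts per Library of Congress class, not the paper's data.
wikipedia = {"Science": 30, "Law": 10, "Music": 60}
books_in_print = {"Science": 40, "Law": 40, "Music": 20}
for category, diff in rank_by_share_difference(wikipedia, books_in_print):
    print(f"{category:<8} {diff:+.1f}")
```

A positive value means the category takes up a larger share of Wikipedia than of Books in Print.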


Topical Coverage: Comparable Coverage

From [Halavais and Lackaff 2008]:

Figure: Average number of edits of articles in each category


Topical Coverage: Comparable Coverage

The second experiment by [Halavais and Lackaff 2008] compares articles from Wikipedia with:

Encyclopedia of Linguistics

New Princeton Encyclopedia of Poetry and Poetics

Encyclopedia of Physics


Topical Coverage: Comparable Coverage

From [Halavais and Lackaff 2008]:

Figure: Coverage comparison between printed encyclopedias and Wikipedia


Topical Coverage: Independent Coverage

From [Kittur et al. 2009]:

Figure: Distribution of topics in Wikipedia from January 2008, along with change since July 2006
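Producing a topic distribution like this requires mapping each article to a small set of top-level categories through Wikipedia's category graph. The sketch below shows one simplified way to do it: climb from an article's categories with breadth-first search and split the article equally among the nearest top-level categories. The weighting rule, the function name and the toy category graph are assumptions for illustration, not a reproduction of [Kittur et al. 2009].

```python
from collections import deque


def top_level_weights(article_cats, parents, top_level):
    """Split one article's weight equally among its nearest top-level categories.

    article_cats: the article's own categories; parents: category -> parent categories;
    top_level: the set of top-level categories. Returns {top-level category: weight}.
    """
    depth = {c: 0 for c in article_cats}
    queue = deque(depth)
    found, best_depth = set(), None
    while queue:
        cat = queue.popleft()
        if best_depth is not None and depth[cat] > best_depth:
            break  # BFS order: everything left in the queue is farther away
        if cat in top_level:
            found.add(cat)
            best_depth = depth[cat]
            continue
        for parent in parents.get(cat, ()):
            if parent not in depth:
                depth[parent] = depth[cat] + 1
                queue.append(parent)
    return {cat: 1 / len(found) for cat in found}


parents = {
    "Chocolate": ["Foods", "Cocoa"],
    "Foods": ["Culture"],
    "Cocoa": ["Agriculture"],
    "Agriculture": ["Applied sciences"],
}
top = {"Culture", "Applied sciences"}
print(top_level_weights(["Chocolate"], parents, top))  # {'Culture': 1.0}
```

Summing these per-article weights over all articles gives a topic distribution of the kind shown in the figure.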


Topical Coverage: Independent Coverage

From [Kittur et al. 2009]:

Figure: Distribution of conflict in Wikipedia. Sizes represent normalized conflict, while the topic order (clockwise from People) reflects the absolute amount.


Quality modeling and Trustworthiness

There is a huge debate about whether the content of Wikipedia is of high quality.

Research communities have paid attention to this problem since 2004. There is still no consensus.


Quality modeling and Trustworthiness

Two types of “quality”:

The quality or trustworthiness compared to other encyclopedias

The quality of articles within Wikipedia


Quality modeling and Trustworthiness: “Absolute Quality”

All studies are surveys.

Yes. 42 articles, compared to Britannica. [Giles 2005]

No. No real comparison, only concerns. [Denning et al. 2005]

Yes. No real comparison, but much indirect evidence. [Fallis 2008]

Hard to judge. No transparency. [Santana and Wood 2009]


Quality modeling and Trustworthiness: “Absolute Quality”

The Nature study [Giles 2005]:

factual errors or misleading statements on both sides

writing style is an issue in Wikipedia

nearly 17% of 1000 Nature authors routinely referred to Wikipedia

But:

some Wikipedia articles were significantly wrong

Britannica’s editors refused to comment


Quality modeling and Trustworthiness: “Absolute Quality”

An interesting fact: Wikipedia articles are frequently cited in news media. [Lih 2004]

72 news sources

the three most cited articles are World War II, Islam and Astronomy

Daily Telegraph online: a third of citations


Quality modeling and Trustworthiness: “Relative Quality”

Here the task is to model and predict the “quality” of articles within Wikipedia.

featured articles vs. normal articles (vandalism articles)

machine learning techniques

high accuracy


Quality modeling and Trustworthiness: “Relative Quality”

Features or indicators used in many methods:

revision history of articles [Stvilia et al. 2005, Zeng et al. 2006, Stein and Hess 2007, Hu et al. 2007a, Hu et al. 2007b, Wilkinson and Huberman 2007, Wohner and Peters 2009]

content of articles (e.g., the length) [Hu et al. 2007a, Hu et al. 2007b, Blumenstock 2008]

reputation of authors and quality of articles [Hu et al. 2007a, Hu et al. 2007b, Stein and Hess 2007]

links to external resources [Lopes and Carrico 2008]
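Several of these methods let author reputation and article quality reinforce each other. The sketch below shows that idea as a HITS-style iteration: an article's quality is the contribution-weighted authority of its authors, and an author's authority is the contribution-weighted quality of their articles. It is a simplified illustration in the spirit of the reputation-based methods cited above, not any paper's exact model; the contribution shares, author names and function name are hypothetical.

```python
def quality_and_authority(contrib, iterations=20):
    """Iteratively score article quality and author authority from each other.

    contrib maps (author, article) -> fraction of the article written by that author.
    Quality of an article = sum over its authors of contribution * authority;
    authority of an author = sum over their articles of contribution * quality.
    Scores are divided by their maximum after each step so they stay in [0, 1].
    """
    authors = {a for a, _ in contrib}
    articles = {p for _, p in contrib}
    authority = dict.fromkeys(authors, 1.0)
    quality = dict.fromkeys(articles, 1.0)
    for _ in range(iterations):
        quality = dict.fromkeys(articles, 0.0)
        for (author, article), share in contrib.items():
            quality[article] += share * authority[author]
        top = max(quality.values())
        quality = {p: q / top for p, q in quality.items()}
        authority = dict.fromkeys(authors, 0.0)
        for (author, article), share in contrib.items():
            authority[author] += share * quality[article]
        top = max(authority.values())
        authority = {a: s / top for a, s in authority.items()}
    return quality, authority


contrib = {
    ("ann", "Physics"): 0.8, ("bob", "Physics"): 0.2,
    ("ann", "Chemistry"): 0.5, ("cid", "Chemistry"): 0.5,
    ("cid", "Cat"): 1.0,
}
quality, authority = quality_and_authority(contrib)
print({a: round(s, 2) for a, s in sorted(authority.items())})
```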


Quality modeling and Trustworthiness: “Relative Quality”

From [Blumenstock 2008]:

Figure: Word counts for featured and random articles.


Quality modeling and Trustworthiness: “Relative Quality”

From [Blumenstock 2008]:

Figure: Performance of word count in classifying featured vs. random articles.
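Because word count alone separates the two classes so well, the simplest possible classifier is a single cutoff: predict “featured” when an article has at least some number of words. The sketch below picks that cutoff by maximizing accuracy on labelled examples; the word counts and the function name are invented, and choosing the cutoff on the same data it is scored on overstates accuracy, so a real evaluation would use held-out articles.

```python
def best_threshold(featured_counts, random_counts):
    """Pick the word-count cutoff that best separates featured from random articles.

    Articles with count >= cutoff are predicted 'featured'. Returns (cutoff, accuracy).
    """
    labelled = [(c, True) for c in featured_counts] + [(c, False) for c in random_counts]
    best = (None, 0.0)
    for cutoff in sorted({c for c, _ in labelled}):
        correct = sum((c >= cutoff) == is_featured for c, is_featured in labelled)
        accuracy = correct / len(labelled)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Hypothetical word counts.
featured = [2400, 3100, 4500, 5200, 1900]
random_articles = [150, 400, 800, 2100, 300]
print(best_threshold(featured, random_articles))  # (1900, 0.9)
```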


Quality modeling and Trustworthiness: Discussion

“Absolute Quality” needs more investigation to reach consensus. “Relative Quality” is “solved”.


3 Wikipedia as an external resource


Research Problems

A wide range of research communities use Wikipedia as a resource to tackle problems such as:

Knowledge Extraction

Classification & Clustering

Entity Disambiguation

Information Retrieval & Web Search

and many others!


Structures of Wikipedia

The structures provided by Wikipedia:

redirect pages

categories

disambiguation pages

hyperlinks
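All four structures can be read directly from an article's wikitext. The regular-expression sketch below extracts redirect targets, categories, hyperlinks (with anchor text) and a disambiguation flag; it deliberately ignores most templates, nested links and namespaces other than Category, so treat it as an illustration rather than a parser. The function name, pattern names and output format are hypothetical.

```python
import re

# [[Target]], [[Target|anchor]] or [[Target#Section|anchor]]
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")
REDIRECT = re.compile(r"\s*#REDIRECT\s*\[\[([^\[\]|#]+)", re.IGNORECASE)
DISAMBIG = re.compile(r"\{\{\s*(?:disambiguation|disambig)\b", re.IGNORECASE)


def parse_structures(wikitext: str) -> dict:
    """Extract redirect target, categories, links and a disambiguation flag."""
    m = REDIRECT.match(wikitext)
    if m:
        return {"redirect": m.group(1).strip(), "categories": [], "links": [],
                "disambiguation": False}
    categories, links = [], []
    for target, anchor in LINK.findall(wikitext):
        target = target.strip()
        if target.lower().startswith("category:"):
            categories.append(target.split(":", 1)[1].strip())
        else:
            links.append((target, (anchor or target).strip()))
    return {"redirect": None, "categories": categories, "links": links,
            "disambiguation": bool(DISAMBIG.search(wikitext))}


page = """'''Cat''' may refer to:
* [[Cat]], the animal
* [[Cat (Unix)|cat]], a Unix command
{{disambiguation}}
[[Category:Disambiguation pages]]"""
info = parse_structures(page)
print(info["links"])       # [('Cat', 'Cat'), ('Cat (Unix)', 'cat')]
print(info["categories"])  # ['Disambiguation pages']
print(parse_structures("#REDIRECT [[United States]]")["redirect"])  # United States
```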


Knowledge Extraction

Natural Language Processing (NLP):

building thesaurus dictionaries

measuring semantic “relatedness”

[Milne et al. 2006, Strube and Ponzetto 2006, Gabrilovich and Markovitch 2007, Nakayama et al. 2007, Giuliano et al. 2010]
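One of the cited approaches, Explicit Semantic Analysis [Gabrilovich and Markovitch 2007], represents a text as a weighted vector of Wikipedia concepts (articles) and measures relatedness as the cosine between two such vectors. The toy version below uses four hand-written “articles” and plain TF-IDF weights; the class and method names are hypothetical, and the real system is considerably more refined (for example, it indexes all of Wikipedia).

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class ESA:
    """Toy Explicit Semantic Analysis: a text becomes a vector over Wikipedia concepts."""

    def __init__(self, concepts):
        # concepts: {article title: article text}.
        docs = {title: Counter(tokenize(text)) for title, text in concepts.items()}
        df = Counter(word for counts in docs.values() for word in counts)
        n = len(docs)
        self.index = {}  # word -> {concept title: tf-idf weight}
        for title, counts in docs.items():
            for word, tf in counts.items():
                weight = tf * math.log(n / df[word])
                if weight > 0:
                    self.index.setdefault(word, {})[title] = weight

    def vector(self, text):
        """Sum the concept weights of every indexed word in the text."""
        vec = Counter()
        for word in tokenize(text):
            for title, weight in self.index.get(word, {}).items():
                vec[title] += weight
        return vec

    def relatedness(self, a, b):
        """Cosine similarity of the two texts' concept vectors."""
        va, vb = self.vector(a), self.vector(b)
        dot = sum(weight * vb[title] for title, weight in va.items())
        norm_a = math.sqrt(sum(w * w for w in va.values()))
        norm_b = math.sqrt(sum(w * w for w in vb.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


esa = ESA({
    "Cat": "cat feline pet animal whiskers",
    "Dog": "dog canine pet animal bark",
    "Unix": "unix operating system command shell",
    "Chemistry": "chemistry atom molecule reaction",
})
print(round(esa.relatedness("a feline pet", "a canine pet"), 2))  # 0.6
print(esa.relatedness("a feline pet", "a shell command"))         # 0.0
```

The two pet phrases share no words except “pet”, yet they relate through the Cat and Dog concepts, which is the point of moving from words to concepts.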


Knowledge Extraction

Semantic Web:

Tim Berners-Lee’s compelling vision of the Semantic Web is hindered by a chicken-and-egg problem: rich structured data is required to motivate the development of applications.

[Wu and Weld 2007, Weld et al. 2008, Wu et al. 2008, Wu and Weld 2008]

Wikipedia as a bootstrapping tool for building knowledge bases
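Infoboxes are the most readily available structured data in Wikipedia, and systems in this line of work build on them. The sketch below only parses an existing, flat infobox into (subject, attribute, value) triples; it does not reproduce the learning-based extraction of the cited papers, and the function name and regular expression are assumptions.

```python
import re


def parse_infobox(wikitext: str):
    """Return (infobox type, {attribute: value}) for the first infobox, or None.

    Simplification: assumes a flat infobox (no nested templates, no piped
    links such as [[A|B]] inside values), which real pages often violate.
    """
    m = re.search(r"\{\{\s*Infobox\s+([^|\n}]+)(.*?)\}\}", wikitext,
                  re.DOTALL | re.IGNORECASE)
    if m is None:
        return None
    box_type, body = m.group(1).strip(), m.group(2)
    attrs = {}
    for field in body.split("|")[1:]:  # text before the first "|" is not a field
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return box_type, attrs


page = """{{Infobox university
| name        = Lehigh University
| established = 1865
| city        = Bethlehem, Pennsylvania
}}
'''Lehigh University''' is a private research university."""

box_type, attrs = parse_infobox(page)
subject = attrs.pop("name")
triples = [(subject, key, value) for key, value in attrs.items()]
print(box_type)  # university
print(triples)
```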


Classification and Clustering

Document classification and clustering are two of the most important applications of text mining.

enrich document representation

significantly improved performance


Classification and Clustering

Ways to enrich document representation:

category information [Schonhofen 2006]

entities [Huang et al. 2008, Wang et al. 2009, Odon de Alencar et al. 2010]

usually better than WordNet or the Open Directory Project


Entity Disambiguation

Entity disambiguation, especially Name Disambiguation, plays acentral role in many NLP applications (e.g., measuring relatedness,relation extraction).

Performance has been significantly improved by utilizing the structures of Wikipedia [Bunescu and Pasca 2006, Mihalcea 2007, Cucerzan 2007, Han and Zhao 2009, Fogarolli 2009].
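A common baseline in this line of work compares the context around a mention with the text of each candidate Wikipedia article. The sketch below uses plain bag-of-words cosine similarity and makes these assumptions:

- The candidate dictionary (`CANDIDATES`) is hypothetical. Such lists could be gathered from redirects, disambiguation pages and anchor text.
- The article texts (`ARTICLES`) are short stand-ins for real pages.
- The function names are made up.

The cited systems add richer evidence, such as categories and link structure.

```python
import math
from collections import Counter

# Hypothetical surface form -> candidate Wikipedia articles.
CANDIDATES = {
    "jordan": ["Michael Jordan", "Jordan (country)", "Jordan River"],
}
# Hypothetical short texts standing in for the full articles.
ARTICLES = {
    "Michael Jordan": "american basketball player chicago bulls nba championships",
    "Jordan (country)": "country middle east amman kingdom arab",
    "Jordan River": "river flows dead sea galilee valley",
}


def bow(text):
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def disambiguate(mention, context):
    """Return the candidate article most similar to the context, or None."""
    candidates = CANDIDATES.get(mention.lower(), [])
    if not candidates:
        return None
    ctx = bow(context)
    return max(candidates, key=lambda title: cosine(ctx, bow(ARTICLES[title])))


print(disambiguate("Jordan", "Jordan won six NBA championships with the Chicago Bulls"))
print(disambiguate("Jordan", "the river flows into the Dead Sea"))
```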

Information Retrieval and Web Search

Wikipedia has been shown to be effective in a variety of aspects of IR systems:

query expansion and pseudo relevance feedback [Li et al. 2007, Xu et al. 2009, Meij and de Rijke 2010]

query segmentation [Tan and Peng 2008]

cross-lingual IR [Potthast et al. 2008]
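For query expansion, a simple pseudo-relevance-feedback loop treats the top-ranked Wikipedia articles as relevant and appends their most frequent new terms to the query. This is a generic illustration, not the specific models of [Li et al. 2007, Xu et al. 2009, Meij and de Rijke 2010].

It rests on made-up assumptions:

- a three-article toy collection (`WIKI`);
- a term-count retrieval score;
- a tiny stopword list;
- hypothetical function names.

```python
from collections import Counter

# Toy stand-in for a Wikipedia article collection (title -> text); illustrative only.
WIKI = {
    "Solar power": "solar power converts sunlight into electricity using photovoltaic panels",
    "Photovoltaics": "photovoltaic cells convert sunlight to electricity panels silicon",
    "Sunburn": "sunburn is skin damage caused by ultraviolet radiation from sunlight",
}
STOPWORDS = {"is", "by", "from", "into", "to", "using", "caused"}


def score(query_terms, text):
    """Toy retrieval score: number of query-term occurrences in the article."""
    words = text.split()
    return sum(words.count(t) for t in query_terms)


def expand_query(query, k=2, m=3):
    """Append the m most frequent new terms from the top-k articles to the query."""
    q = query.lower().split()
    top = sorted(WIKI, key=lambda title: score(q, WIKI[title]), reverse=True)[:k]
    counts = Counter(
        w for title in top for w in WIKI[title].split()
        if w not in STOPWORDS and w not in q
    )
    return q + [w for w, _ in counts.most_common(m)]


print(expand_query("sunlight electricity"))
```

With this toy data, `expand_query("sunlight electricity")` keeps the two original terms and adds "photovoltaic", "panels" and one more term, all drawn from the two top-ranked articles. Nothing is drawn from "Sunburn", which ranks lower.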

Discussions

Current achievements:

“Relative Quality” assessment, semantic relatedness

general characteristics

many surveys conducted

Discussions

Open issues:

few research projects have been conducted on the whole dataset

even the sampling methods differ between studies

studies seldom compare with each other

almost no study addresses the applicability of the technology

Bibliography

Viegas, F. B., Wattenberg, M., and Dave, K. 2004. Studying cooperation and conflict between authors with history flow visualizations. In CHI ’04: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, New York, NY, USA, 575–582.

Lih, A. 2004. Wikipedia as participatory journalism: Reliable sources? Metrics for evaluating collaborative media as a news resource. In 5th International Symposium on Online Journalism.

Bryant, S. L., Forte, A., and Bruckman, A. 2005. Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. In GROUP ’05: Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work. ACM Press, New York, NY, USA, 1–10.

Adafre, S. F. and de Rijke, M. 2005. Discovering missing links in Wikipedia. In LinkKDD ’05: Proceedings of the 3rd international workshop on Link discovery. ACM, New York, NY, USA, 90–97.

Forte, A. and Bruckman, A. 2005. Why do people write for Wikipedia? Incentives to contribute to open-content publishing. In GROUP ’05 workshop: Sustaining community: The role and design of incentive mechanisms in online systems.

Stvilia, B., Twidale, M. B., Smith, L. C., and Gasser, L. 2005. Assessing information quality of a community-based encyclopedia. In Proceedings of the International Conference on Information Quality - ICIQ 2005. 442–454.

Giles, J. 2005. Internet encyclopaedias go head to head. Nature 438, 7070 (December), 900–901.

Denning, P., Horning, J., Parnas, D., and Weinstein, L. 2005. Wikipedia risks. Communications of the ACM 48, 12, 152–152.

Strube, M. and Ponzetto, S. P. 2006. WikiRelate! Computing semantic relatedness using Wikipedia. In AAAI’06: Proceedings of the 21st national conference on Artificial intelligence. AAAI Press, 1419–1424.

Zeng, H., Alhossaini, M. A., Ding, L., Fikes, R., and McGuinness, D. L. 2006. Computing trust from revision history. In PST ’06: Proceedings of the 2006 International Conference on Privacy, Security and Trust. ACM, New York, NY, USA, 1–1.

Zlatic, V., Bozicevic, M., Stefancic, H., and Domazet, M. 2006. Wikipedias: Collaborative web-based encyclopedias as complex networks. Phys. Rev. E 74, 1 (Jul), 016115.

Buriol, L. S., Castillo, C., Donato, D., Leonardi, S., and Millozzi, S. 2006. Temporal analysis of the wikigraph. In IEEE/WIC/ACM International Conference on Web Intelligence (WI ’06). IEEE Computer Society, Los Alamitos, CA, USA, 45–51.

Kuznetsov, S. 2006. Motivations of contributors to Wikipedia. ACM SIGCAS Computers and Society 36, 2, 1.

Bunescu, R. and Pasca, M. 2006. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06). Trento, Italy, 9–16.

Capocci, A., Servedio, V. D. P., Colaiori, F., Buriol, L. S., Donato, D., Leonardi, S., and Caldarelli, G. 2006. Preferential attachment in the growth of social networks: The internet encyclopedia Wikipedia. Physical Review E 74, 3 (Sep), 036116.

Milne, D., Medelyan, O., and Witten, I. H. 2006. Mining domain-specific thesauri from Wikipedia: A case study. In WI ’06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society, Washington, DC, USA, 442–448.

Schonhofen, P. 2006. Identifying document topics using the Wikipedia category network. In WI ’06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society, Washington, DC, USA, 456–462.

Priedhorsky, R., Chen, J., Lam, S. T. K., Panciera, K., Terveen, L., and Riedl, J. 2007. Creating, destroying, and restoring value in Wikipedia. In GROUP ’07: Proceedings of the 2007 international ACM conference on Supporting group work. ACM, New York, NY, USA, 259–268.

Viegas, F. B., Wattenberg, M., Kriss, J., and van Ham, F. 2007. Talk before you type: Coordination in Wikipedia. In HICSS ’07: Proceedings of the 40th Annual Hawaii International Conference on System Sciences. IEEE Computer Society, Washington, DC, USA, 78.

Gabrilovich, E. and Markovitch, S. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In IJCAI’07: Proceedings of the 20th international joint conference on Artificial intelligence. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1606–1611.

Hu, M., Lim, E.-P., Sun, A., Lauw, H. W., and Vuong, B.-Q. 2007a. Measuring article quality in Wikipedia: models and evaluation. In CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. ACM, New York, NY, USA, 243–252.

Hu, M., Lim, E.-P., Sun, A., Lauw, H. W., and Vuong, B.-Q. 2007b. On improving Wikipedia search using article quality. In WIDM ’07: Proceedings of the 9th annual ACM international workshop on Web information and data management. ACM, New York, NY, USA, 145–152.

Nakayama, K., Hara, T., and Nishio, S. 2007. Wikipedia mining for an association web thesaurus construction. In Web Information Systems Engineering – WISE 2007. Lecture Notes in Computer Science, vol. 4831. Springer Berlin / Heidelberg, 322–334.

Wilkinson, D. M. and Huberman, B. A. 2007. Cooperation and quality in Wikipedia. In WikiSym ’07: Proceedings of the 2007 international symposium on Wikis. ACM, New York, NY, USA, 157–164.

Stein, K. and Hess, C. 2007. Does it matter who contributes: a study on featured articles in the German Wikipedia. In HT ’07: Proceedings of the eighteenth conference on Hypertext and hypermedia. ACM, New York, NY, USA, 171–174.

Viegas, F. B. 2007. The visual side of Wikipedia. In HICSS ’07: Proceedings of the 40th Annual Hawaii International Conference on System Sciences. IEEE Computer Society, Washington, DC, USA, 85.

Nov, O. 2007. What motivates Wikipedians? Communications of the ACM 50, 11, 60–64.

Wu, F. and Weld, D. S. 2007. Autonomously semantifying Wikipedia. In CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. ACM, New York, NY, USA, 41–50.

Mihalcea, R. 2007. Using Wikipedia for Automatic Word Sense Disambiguation. In North American Chapter of the Association for Computational Linguistics (NAACL 2007).

Cucerzan, S. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Association for Computational Linguistics, Prague, Czech Republic, 708–716.

Li, Y., Luk, W. P. R., Ho, K. S. E., and Chung, F. L. K. 2007. Improving weak ad-hoc queries using Wikipedia as external corpus. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, NY, USA, 797–798.

Kittur, A., Suh, B., Pendleton, B. A., and Chi, E. H. 2007. He says, she says: conflict and coordination in Wikipedia. In CHI ’07: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, New York, NY, USA, 453–462.

Butler, B., Joyce, E., and Pike, J. 2008. Don’t look now, but we’ve created a bureaucracy: the nature and roles of policies and rules in Wikipedia. In CHI ’08: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems. ACM, New York, NY, USA, 1101–1110.

Halavais, A. and Lackaff, D. 2008. An analysis of topical coverage of Wikipedia. Journal of Computer-Mediated Communication 13, 2, 429–440.

Lopes, R. and Carrico, L. 2008. On the credibility of Wikipedia: an accessibility perspective. In WICOW ’08: Proceeding of the 2nd ACM workshop on Information credibility on the web. ACM, New York, NY, USA, 27–34.

Blumenstock, J. E. 2008. Size matters: word count as a measure of quality on Wikipedia. In WWW ’08: Proceeding of the 17th international conference on World Wide Web. ACM, New York, NY, USA, 1095–1096.

Beschastnikh, I., Kriplean, T., and McDonald, D. 2008. Wikipedian self-governance in action: Motivating the policy lens. In Proceedings of the 2008 AAAI International Conference on Weblogs and Social Media (ICWSM ’08).

Kriplean, T., Beschastnikh, I., and McDonald, D. W. 2008. Articulations of wikiwork: uncovering valued work in Wikipedia through barnstars. In CSCW ’08: Proceedings of the 2008 ACM conference on Computer supported cooperative work. ACM, New York, NY, USA, 47–56.

Fallis, D. 2008. Toward an epistemology of Wikipedia. Journal of the American Society for Information Science and Technology 59, 10, 1662–1674.

Weld, D. S., Hoffmann, R., and Wu, F. 2008. Using Wikipedia to bootstrap open information extraction. SIGMOD Rec. 37, 4, 62–68.

Wu, F., Hoffmann, R., and Weld, D. S. 2008. Information extraction from Wikipedia: moving down the long tail. In KDD ’08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, NY, USA, 731–739.

Potthast, M., Stein, B., and Anderka, M. 2008. A Wikipedia-based multilingual retrieval model. In Proceedings of the IR research, 30th European conference on Advances in information retrieval. ECIR ’08. Springer-Verlag, Berlin, Heidelberg, 522–530.

Huang, A., Milne, D., Frank, E., and Witten, I. H. 2008. Clustering documents with active learning using Wikipedia. In ICDM ’08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining. IEEE Computer Society, Washington, DC, USA, 839–844.

Tan, B. and Peng, F. 2008. Unsupervised query segmentation using generative language models and Wikipedia. In WWW ’08: Proceeding of the 17th international conference on World Wide Web. ACM, New York, NY, USA, 347–356.

Vuong, B.-Q., Lim, E.-P., Sun, A., Le, M.-T., and Lauw, H. W. 2008. On ranking controversies in Wikipedia: models and evaluation. In WSDM ’08: Proceedings of the international conference on Web search and web data mining. ACM, New York, NY, USA, 171–182.

Wu, F. and Weld, D. S. 2008. Automatically refining the Wikipedia infobox ontology. In WWW ’08: Proceeding of the 17th international conference on World Wide Web. ACM, New York, NY, USA, 635–644.

Brandes, U., Kenis, P., Lerner, J., and van Raaij, D. 2009. Network analysis of collaboration structure in Wikipedia. In WWW ’09: Proceedings of the 18th international conference on World wide web. ACM, New York, NY, USA, 731–740.

Santana, A. and Wood, D. J. 2009. Transparency and social responsibility issues for Wikipedia. Ethics and Information Technology 11, 2, 133–144.

Wang, P., Hu, J., Zeng, H.-J., and Chen, Z. 2009. Using Wikipedia knowledge to improve text classification. Knowl. Inf. Syst. 19, 3, 265–281.

Schroer, J. and Hertel, G. 2009. Voluntary engagement in an open web-based encyclopedia: Wikipedians and why they do it. Media Psychology 12, 1, 96–120.

Kamps, J. and Koolen, M. 2009. Is Wikipedia link structure different? In WSDM ’09: Proceedings of the Second ACM International Conference on Web Search and Data Mining. ACM, New York, NY, USA, 232–241.

Kittur, A., Chi, E. H., and Suh, B. 2009. What’s in Wikipedia?: mapping topics and conflict using socially annotated category structure. In CHI ’09: Proceedings of the 27th international conference on Human factors in computing systems. ACM, New York, NY, USA, 1509–1512.

Han, X. and Zhao, J. 2009. Named entity disambiguation by leveraging Wikipedia semantic knowledge. In CIKM ’09: Proceeding of the 18th ACM conference on Information and knowledge management. ACM, New York, NY, USA, 215–224.

Jesus, R., Schwartz, M., and Lehmann, S. 2009. Bipartite networks of Wikipedia’s articles and authors: a meso-level approach. In WikiSym ’09: Proceedings of the 5th International Symposium on Wikis and Open Collaboration. ACM, New York, NY, USA, 1–10.

Fogarolli, A. 2009. Word sense disambiguation based on Wikipedia link structure. In Proceedings of the 2009 IEEE International Conference on Semantic Computing. ICSC ’09. IEEE Computer Society, Washington, DC, USA, 77–82.

Xu, Y., Jones, G. J., and Wang, B. 2009. Query dependent pseudo-relevance feedback based on Wikipedia. In SIGIR ’09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, NY, USA, 59–66.

West, R., Precup, D., and Pineau, J. 2009. Completing Wikipedia’s hyperlink structure through dimensionality reduction. In CIKM ’09: Proceeding of the 18th ACM conference on Information and knowledge management. ACM, New York, NY, USA, 1097–1106.

Wohner, T. and Peters, R. 2009. Assessing the quality of Wikipedia articles with lifecycle based metrics. In Proceedings of the 5th International Symposium on Wikis and Open Collaboration. WikiSym ’09. ACM, New York, NY, USA, 16:1–16:10.

Odon de Alencar, R., Davis, Jr., C. A., and Goncalves, M. A. 2010. Geographical classification of documents using evidence from Wikipedia. In GIR ’10: Proceedings of the 6th Workshop on Geographic Information Retrieval. ACM, New York, NY, USA, 1–8.

Giuliano, C., Gliozzo, A., Gangemi, A., and Tymoshenko, K. 2010. Acquiring thesauri from wikis by exploiting domain models and lexical substitution. In The Semantic Web: Research and Applications. Lecture Notes in Computer Science, vol. 6089. Springer Berlin / Heidelberg, 121–135.

Meij, E. and de Rijke, M. 2010. Supervised query modeling using Wikipedia. In Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval. SIGIR ’10. ACM, New York, NY, USA, 875–876.

Choi, B., Alexander, K., Kraut, R. E., and Levine, J. M. 2010. Socialization tactics in Wikipedia and their effects. In CSCW ’10: Proceedings of the 2010 ACM conference on Computer supported cooperative work. ACM, New York, NY, USA, 107–116.
