Wikipedia on Twitter: Analyzing Tweets about Wikipedia

The University of Innsbruck was founded in 1669 and is one of Austria’s oldest universities. Today, with over 28,000 students and 4,500 staff, it is western Austria’s largest institution of higher education and research. For further information visit: www.uibk.ac.at.

#Wikipedia on Twitter:

Analyzing Tweets about Wikipedia

Eva Zangerle, Georg Schmidhammer, Günther Specht

Research Questions

RQ1: How popular are the various Wikipedias on Twitter, and in which language contexts are they referenced?

RQ2: Which features do Wikipedia articles that are popular on Twitter exhibit or share?

RQ3: Does the number of tweets about a certain article correlate with a recent edit, and hence an update, of the page?

Dataset

• Crawl of Twitter using the keyword "wikipedia"

• 2014/10/20 – 2015/03/10

• Total of 4.5 million tweets

• Cleaning of dataset

• Tweets with Wikipedia URL

• Normalization of URLs (also mobile URLs)

• Retweets remain within the set

22% of all Wikipedia article URLs are mobile URLs
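
The normalization of mobile URLs could be sketched roughly as follows. The slides do not state the rules that were applied, so this is only an illustration: it assumes that mobile hosts follow the `<lang>.m.wikipedia.org` pattern, that http and https variants count as the same page, and that fragments can be dropped. The function name `normalize_wikipedia_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_wikipedia_url(url: str) -> str:
    """Map a mobile Wikipedia URL onto its desktop form (assumed rule)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    labels = host.split(".")
    # Assumption: mobile hosts look like "<lang>.m.wikipedia.org".
    if len(labels) == 4 and labels[1] == "m" and labels[2:] == ["wikipedia", "org"]:
        host = ".".join([labels[0]] + labels[2:])
    # Assumption: scheme and fragment do not distinguish pages.
    return urlunsplit(("https", host, parts.path, parts.query, ""))


print(normalize_wikipedia_url("http://en.m.wikipedia.org/wiki/Cod_Wars"))
```

With a rule like this, a mobile and a desktop link to the same article collapse into one distinct URL before counting.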

Dataset

Characteristic       Raw          Cleaned
Tweets               4,530,967    2,468,055
Retweets             1,440,122    659,641
Distinct Users       1,730,984    844,975
Mentions             3,334,848    1,880,687
Distinct Hashtags    159,231      118,912
Hashtag Usages       1,528,458    778,737
Distinct URLs        1,447,124    1,121,825
URL Usages           3,393,846    2,793,900

63.24% of all tweets contain 1 URL (maximum: 6 URLs)

77.72% of all URLs point to a Wikipedia page

[Figure: Tweets per Day]

General Observations: Users

• Long-tailed distribution

• Average number of tweets per user: 2.92

• However: maximum number of tweets per user: 64,521

• 19 of the 20 most popular users are bots (404 users in total; 264k tweets)

E. Zangerle, G. Schmidhammer, G. Specht: Analysing the Usage of Wikipedia on Twitter: Understanding Inter-Language Links (accepted at HICSS 2016)

RQ1

Language Analyses

Language Distribution

• Analysis of tweeted Wikipedia articles with regard to language

• Extract Wikipedia edition (language) from URL
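
A minimal sketch of extracting the edition from a URL and tallying shares is given below. It assumes the edition is the first host label of `<lang>.wikipedia.org` (optionally with an `m` label for mobile); the helper names `wikipedia_edition` and `edition_shares` are hypothetical and not taken from the slides.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlsplit


def wikipedia_edition(url: str) -> Optional[str]:
    """Return the language code of a Wikipedia URL, or None if there is none."""
    labels = urlsplit(url).netloc.lower().split(".")
    if len(labels) < 3 or labels[-2:] != ["wikipedia", "org"]:
        return None
    # "www.wikipedia.org" and "m.wikipedia.org" carry no language code.
    if labels[0] in ("www", "m"):
        return None
    return labels[0]


def edition_shares(urls):
    """Share of each edition among the URLs that point to some edition."""
    counts = Counter(code for code in map(wikipedia_edition, urls) if code)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}


urls = [
    "https://en.wikipedia.org/wiki/A",
    "https://en.m.wikipedia.org/wiki/B",
    "http://ja.wikipedia.org/wiki/C",
    "https://example.com/x",
]
print(edition_shares(urls))
```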

Missing: context, underlying data.

Language           Total        Share
English (en)       1,349,623    52.81%
Japanese (ja)      579,157      22.66%
Spanish (es)       140,396      5.49%
Turkish (tr)       78,235       3.06%
French (fr)        64,139       2.51%
German (de)        52,256       2.04%
Russian (ru)       44,347       1.74%
Arabic (ar)        38,757       1.52%
Korean (ko)        27,261       1.07%
Portuguese (pt)    26,442       1.03%

Correlation of Language and Wikipedia Size Measures

Measure                     Spearman's ρ
Total number of articles    .76*
Edits                       .65*
Users                       .46*
Admins                      .42*
Active users                .39*
Images                      .39*
Depth¹                      .35*

* Significant at the 0.001 level

¹ Depth = (Edits / Articles) × (Non-Articles / Articles) × (1 − Stub ratio)
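
For readers unfamiliar with the measure, Spearman's ρ is the Pearson correlation of the ranks of two variables. The sketch below implements it in plain Python with average ranks for ties; the function names are hypothetical, and the sample values are toy numbers chosen for illustration, not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy values only: tweets per edition vs. a size measure per edition.
print(spearman_rho([120, 45, 30, 8], [5000, 900, 1200, 100]))
```

Because only ranks enter the computation, the measure is insensitive to the heavy skew that raw counts such as articles or edits per edition typically show.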

Tweet Languages

Tweets

Language      Share
English       42.90%
Japanese      21.92%
Spanish       5.77%
Arabic        2.56%
French        2.37%
Turkish       2.24%
German        1.75%
Indonesian    1.56%
Russian       1.35%

Wikipedias referenced

Language         Share
English (en)     52.81%
Japanese (ja)    22.66%
Spanish (es)     5.49%
Turkish (tr)     3.06%
French (fr)      2.51%
German (de)      2.04%
Russian (ru)     1.74%
Arabic (ar)      1.52%
Korean (ko)      1.07%

Inter-language links

Rows: Twitter language; columns: Wikipedia language

      en      ja      es      ar      fr      tr      de      id      ru      pt
en    97.33%  0.19%   0.42%   0.03%   0.33%   0.05%   0.35%   0.12%   0.10%   0.05%
ja    5.48%   93.56%  0.04%   0.01%   0.11%   0.03%   0.20%   0.01%   0.05%   0.01%
es    19.65%  0.28%   77.48%  0.01%   0.62%   0.03%   0.32%   0.07%   0.03%   0.51%
ar    26.58%  0.02%   0.12%   72.79%  0.17%   0.02%   0.02%   0.00%   0.00%   0.00%
fr    20.21%  0.19%   1.11%   1.92%   74.73%  0.03%   0.73%   0.02%   0.05%   0.17%
tr    20.78%  0.01%   0.17%   0.00%   0.18%   77.62%  0.83%   0.04%   0.10%   0.02%
de    21.15%  0.59%   1.41%   0.06%   0.44%   0.13%   74.94%  0.04%   0.04%   0.06%
id    49.83%  1.20%   1.77%   0.16%   0.60%   0.40%   0.91%   42.84%  0.06%   0.26%
ru    17.74%  0.10%   0.05%   0.00%   0.14%   0.03%   0.32%   0.00%   78.38%  0.01%
pt    28.90%  0.73%   6.91%   0.01%   0.75%   0.05%   0.46%   0.09%   0.03%   60.87%

20% of all tweets link to a Wikipedia in another language.

85% of all inter-language links do not have a counterpart in the original language.

Inter-Language Links

• 85% of all links leading to a Wikipedia in a language different from the tweet's language do not have a counterpart in the user's language

• Remaining 15%: the Wikipedia article actually linked is of significantly higher quality than its counterpart in the tweet's language
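
A cross-tabulation of tweet language against Wikipedia language, and the overall share of inter-language links, can be computed along these lines. This is a sketch under assumptions: each tweet is reduced to a `(tweet_language, wikipedia_language)` pair, and how the tweet language was detected is not stated in the slides. The function names are hypothetical and the sample pairs are toy values.

```python
from collections import Counter, defaultdict


def language_matrix(pairs):
    """Row-normalise (tweet_language, wikipedia_language) pairs into shares."""
    counts = defaultdict(Counter)
    for tweet_lang, wiki_lang in pairs:
        counts[tweet_lang][wiki_lang] += 1
    return {
        tweet_lang: {w: n / sum(row.values()) for w, n in row.items()}
        for tweet_lang, row in counts.items()
    }


def inter_language_share(pairs):
    """Fraction of pairs whose Wikipedia language differs from the tweet's."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(1 for t, w in pairs if t != w) / len(pairs)


# Toy pairs only, not data from the study.
pairs = [("de", "de"), ("de", "en"), ("en", "en"), ("en", "en")]
print(language_matrix(pairs))
print(inter_language_share(pairs))
```

Each row of the resulting matrix sums to 1, which matches the row-wise reading of the percentages in the table (for example, what share of German tweets link to the English Wikipedia).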

RQ2

Top Articles and Categories

Methods

• Tweets about English Wikipedia

• 52.81% of all tweets

• Total of 724,974 references to Wikipedia

• Total of 336,605 distinct English Wikipedia articles

• Extract article titles and categories from DBpedia

• Resolve extended URLs (e.g., diff pages, access to old revisions, etc.)
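
Resolving extended URLs to an article title might look like the sketch below. It is not the procedure used in the study: it assumes plain article links use the `/wiki/<Title>` path and that diff or old-revision links carry the title in a `title` query parameter of `/w/index.php`. The function name `article_title` is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit


def article_title(url: str) -> Optional[str]:
    """Best-effort extraction of the article title from a Wikipedia URL."""
    parts = urlsplit(url)
    if parts.path.startswith("/wiki/"):
        title = unquote(parts.path[len("/wiki/"):])
    else:
        # Assumption: diff and old-revision links look like
        # /w/index.php?title=<Title>&diff=...&oldid=...
        titles = parse_qs(parts.query).get("title")
        if not titles:
            return None
        title = titles[0]
    # Use underscores consistently; return None for an empty title.
    return title.replace(" ", "_") or None


print(article_title("https://en.wikipedia.org/w/index.php?title=Cod_Wars&diff=1&oldid=2"))
```

Links that carry only a revision id and no title return `None` here; mapping those to an article would need an extra lookup, which this sketch leaves out.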

Distribution: Tweets per Article

64% of all articles were tweeted only once
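
The share of articles tweeted exactly once follows directly from a per-article count. A minimal sketch, assuming each reference has already been reduced to an article title (the function name is hypothetical and the input is a toy list):

```python
from collections import Counter


def share_tweeted_once(article_references):
    """Fraction of distinct articles that were referenced exactly once."""
    per_article = Counter(article_references)
    if not per_article:
        return 0.0
    once = sum(1 for n in per_article.values() if n == 1)
    return once / len(per_article)


# Toy input: article "a" is referenced twice, the others once.
print(share_tweeted_once(["a", "a", "b", "c", "d"]))
```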

Top Articles

Article                           No. of Tweets    Share
diff                              54,432           7.51%
cod_wars                          6,868            0.95%
user:Giraffedata/comprised_of     4,541            0.63%
matthew_ziff                      2,100            0.29%
kidz_bop                          2,015            0.28%
gamergate                         1,703            0.23%
old_revision                      1,517            0.21%
search                            1,383            0.19%
the_little_mermaid_(1989_film)    1,370            0.19%

No article stands out in particular.

Top Categories

Category                                               No. of Tweets    Share
Living people                                          105,895          14.61%
English-language films                                 18,331           2.53%
American films                                         9,605            1.32%
Wars involving the United Kingdom                      7,487            1.03%
American male television actors                        7,255            1.00%
20th-century conflicts                                 7,158            0.99%
American male film actors                              6,981            0.96%
20th-century military history of the United Kingdom    6,968            0.96%
Law of the sea                                         6,953            0.96%
Wars involving Iceland                                 6,928            0.96%

RQ3

Edits and Tweets

Methods

• Crawled via MediaWiki API

• Tweets about English Wikipedia articles (724,974 references to 336,605 distinct articles)

• Observation period: ±24 hours around a tweet

• 543,788 edits in total

• 91,577 edits marked as minor

• 312,160 tweets link to an article edited within ±24 hours of the tweet

• 233,962 tweets: edit occurred before the tweet

• 215,192 tweets: edit occurred after the tweet

• No correlation between number of edits and number of tweets: Pearson's r = 0.06 (at the 0.001 significance level)

• Exception: events
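
The ±24-hour matching of edits to tweets can be illustrated as follows. This is a sketch under assumptions, not the study's code: edit timestamps per article are taken as already retrieved, an edit at exactly the tweet's timestamp is counted as "after", and the name `edits_around_tweet` is hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def edits_around_tweet(tweet_time, edit_times, window=WINDOW):
    """Return (edited_before, edited_after) for one tweet and its article."""
    # An edit counts as "before" if it precedes the tweet by at most `window`.
    before = any(timedelta(0) < tweet_time - e <= window for e in edit_times)
    # An edit counts as "after" if it follows the tweet by at most `window`.
    after = any(timedelta(0) <= e - tweet_time <= window for e in edit_times)
    return before, after


tweet = datetime(2015, 1, 1, 12, 0)
edits = [datetime(2014, 12, 31, 13, 0), datetime(2015, 1, 2, 11, 59)]
print(edits_around_tweet(tweet, edits))
```

A single tweet can have edits both before and after it, which would be consistent with the "before" and "after" counts above adding up to more than 312,160.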

Conclusion

RQ1: 20% of all tweets link to a Wikipedia of another language.

RQ2: No particular categories or articles are significantly more popular on Twitter. Long-tail distribution for articles (64% of all English articles were tweeted only once).

RQ3: No correlation between the number of edits and the popularity of an article on Twitter can be detected.

Future Work

• Look into inter-language links

• Tweets as quality measure

• Look into tweets about Wikipedia that do not mention a particular article (qualitatively)

• Interested in joining forces?

#questions? http://en.wikipedia.org/wiki/Q&A #wikipedia

@eva_zangerle

eva.zangerle@uibk.ac.at

http://www.evazangerle.at

@dbisibk

http://dbis-informatik.uibk.ac.at

https://www.facebook.com/dbisibk
