Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions

David Laniado, Daniela Iosub, Carlos Castillo, Mayo Fuster Morell and Andreas Kaltenbrunner

Universitat Pompeu Fabra, January 17, 2017

David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 1 / 58

Outline

1 Introduction
  Wikipedia content biases
  Wikipedia gender gap
  Wikipedia discussion spaces
  Goal and research questions

2 Framework of analysis
  Data acquisition and pre-processing
  User gender labelling
  Language and sentiment analysis

3 Results
  Emotions and status
  Emotions and gender
  Networked emotions

4 Conclusions

Wikipedia is a teenager

Happy birthday Wikipedia!

English Wikipedia is now 16 years old

Catalan Wikipedia will be 16 in March (the second oldest one)

The largest human knowledge repository

Fifth most visited web site
Among top results for search queries about almost any topic
Largest collaborative project
Conditions and reflects public opinion
... with some bias

Wikipedia’s social experiment

"The problem with Wikipedia is that it only works in practice. In theory, it can never work."

(Wikipedian popular joke)

A crazy idea: anyone can edit
All relevant points of view should be represented
Policies of Notability and Neutral Point of View
Quality assured by editors’ negotiations over content
The more people with different points of view contributing, the better the quality
→ Biases in the editor community may cause biases in the content

Bias in the content?

Top global biographies by birth country (Young-Ho Eom et al, 2015)
  top central biographies from each of the 24 major Wikipedias
  the 100 most central (PageRank) in each version’s hyperlink network

→ striking geographic bias
http://www.quantware.ups-tlse.fr/QWLIB/topwikipeople/geofigs/pagerank24x100.html

Top Women Biographies

Rank  NA  PageRank female figures   CC  Century  LC
1     24  Elizabeth II              UK  20       EN
2     17  Mary (mother of Jesus)    IL  -1       HE
3     12  Queen Victoria            UK  19       EN
4     6   Elizabeth I of England    UK  16       EN
5     2   Maria Theresa             AT  18       DE
6     1   Benazir Bhutto            PK  20       HI
7     1   Catherine the Great       PL  18       PL
8     1   Anne Frank                DE  20       DE
9     1   Indira Gandhi             IN  20       HI
10    1   Margrethe II of Denmark   DK  20       DA

Top 10 global female historical figures by PageRank for the 24 major Wikipedia editions (Young-Ho Eom et al, 2015)

NA → number of language editions in which a biography appears in the top 100 rank
CC → birth country code
LC → language code corresponding to the birth country

Top biographies by gender

Number of women among the top global biographies by birth century

Number of women among the 100 most central biographies for each language edition

Wikipedia editor gender gap

Estimated women participation
  2011 editor survey: 9%
  2013 editor survey: 13%
  corrected with propensity score estimation: 16% (Mako Hill and Shaw, 2013)

while in most online social networks women are more active!

Why do women not edit Wikipedia?

1 A lack of user-friendliness in the editing interface
2 Not having enough free time
3 A lack of self-confidence
4 Aversion to conflict and an unwillingness to participate in lengthy edit wars
5 Belief that their contributions are too likely to be reverted or deleted
6 Some find its overall atmosphere misogynistic
7 Wikipedia culture is sexual in ways they find off-putting
8 Being addressed as male is off-putting to women whose primary language has grammatical gender
9 Fewer opportunities than other sites for social relationships and a welcoming tone

(Sue Gardner, 2011)

Emotional factors and discussions

Importance of emotional factors
Discussion spaces are fundamental to the collaborative process
Discussion triggers emotions and breeds particular emotional environments

Wikipedia’s most visible side

Article talk pages

Discussions in article talk pages

Studying emotions and language in talk pages

Interactions in Wikipedia
  Implicit → editing
  Explicit → communication

Article talk pages → discussions about how to improve articles
User talk pages → a kind of public in-box

Goal: shed light on the emotional dimension of the interactions
  extensive analysis of emotions in explicit communication
  sentiment analysis of comments in article talk and personal talk pages

Research questions

1 How are the emotional and communication styles of editors affected by their status?

2 How are the emotional and communication styles of editors affected by their gender?

3 How are the emotional expressions affected by interacting with others in comment threads (emotional congruence)?

4 How are the emotional styles of editors related to those of the editors they interact more frequently with (emotional homophily)?

Publications

Results published in:

Laniado, D., Castillo, C., Kaltenbrunner, A., and Fuster Morell, M. F. (2012).
Emotions and dialogue in a peer-production community: the case of Wikipedia.
8th International Symposium on Wikis and Open Collaboration, WikiSym’12.

Iosub, D., Laniado, D., Castillo, C., Fuster Morell, M. F., and Kaltenbrunner, A. (2014).
Emotions under Discussion: Gender, Status and Communication in Online Collaboration.
PLoS ONE, 9(8).

Dataset: conversations
Extracting conversations among editors from the English Wikipedia

Articles                                          3 210 039
Articles with talk page (ATP)                       871 485  (27.1%)
Editors who comment on articles                     350 958
Editors with ≥ 100 comments on ATP                   12 231  (3.5%)
Total comments in ATP                            11 041 246
Comments containing ANEW words                    7 414 411  (67.2%)
Comments by editors with ≥ 100 comments on ATP    5 480 544  (49.6%)
Comments by these editors and with ANEW words     3 649 297  (33.3%)

Table: Data extracted from a complete dump of the English Wikipedia, dated March 12th, 2010

User gender labelling

≈ 12 000 users wrote ≥ 100 comments in article talk pages
Gender identified through Wikipedia API for ≈ 2 000 of them
A sample of 1 385 users for manual labelling through crowdsourcing (Crowdflower)

          Non-admins  Admins   Total
Men            1 087   1 526   2 613
Women             68      97     165
Unknown        6 850   2 603   9 453
Total          8 005   4 226  12 231

Table: Users with ≥ 100 comments by gender and administrator status.

Gender could be identified only for ≈ 50% of users:
  real name or username (50% of those identified)
  implicitly stated gender (27% of women, 20% of men)
  pronoun (15% of women, 10% of men)
  other indicators: userboxes, pictures, links to personal blogs...

Measuring the Emotional Content of Discussions

Lexicon-based methods

relying on three different instruments:

Affective norms for English words (ANEW)

Linguistic Inquiry and Word Count (LIWC)

SentiStrength

Measuring the Emotional Content of Discussions
Method 1: Affective norms for English words (ANEW)

Rates a list of 1060 frequent words on a 9 point scale in three dimensions:

Valence

Arousal

Dominance

assign emotion scores to each word from the lexicon

Bradley and Lang (1999).
Affective norms for English words (ANEW). Technical report C-1.
The Center for Research in Psychophysiology, University of Florida, FL.
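
The lexicon lookup described on this slide can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `TOY_VALENCE` dictionary, its values, the tokenizer and the choice to average the matched words are all assumptions made for the example, and the real ANEW ratings must be obtained from the technical report.

```python
import re

# Toy stand-in for the ANEW lexicon: word -> valence on the 1-9 scale.
# The values are illustrative placeholders, NOT the published ANEW ratings.
TOY_VALENCE = {"happy": 8.0, "good": 7.5, "war": 2.0, "abuse": 1.5}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mean_valence(comment, lexicon=TOY_VALENCE):
    """Average the valence of the lexicon words found in a comment.

    Returns None when the comment contains no lexicon word, so that such
    comments can be left out of later averages.
    """
    scores = [lexicon[word] for word in tokenize(comment) if word in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)
```

Returning `None` for comments without lexicon words mirrors the fact that only part of the comments in the dataset contain ANEW words and can be scored this way.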

Measuring the Emotional Content of Discussions
Method 2: Linguistic Inquiry and Word Count (LIWC)

Two scores for basic emotion (compared with ANEW valence)
  positive emotion
  negative emotion

Discrete measures of emotions (anger, anxiety, sadness, affect)
Other classes of words to characterize language (e.g. personal pronouns, tentative words, fillers...)

→ Count the proportion of words belonging to each class

Pennebaker J, Chung C, Ireland M, Gonzales A, Booth R (2010).
The development and psychometric properties of LIWC2007. Austin, TX.

Measuring the Emotional Content of Discussions
Method 2: Linguistic Inquiry and Word Count (LIWC)

Measure       Dictionary size  Examples
Anger         91               hate, kill, annoyed
Anxiety       84               worried, fearful, nervous
Sadness       101              crying, grief, sad
Tentative     155              maybe, perhaps, guess
Certainty     83               always, never
Fillers       9                blah, you know
Past          155              went, ran, had
Present       169              is, does, hear
Future        48               will, gonna
Social words  455              mate, talk, child

Table: Description of LIWC measures (as per http://www.liwc.net).
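
Counting the proportion of words that belong to each LIWC class could look roughly like this. The sketch assumes tiny hypothetical categories built only from the example words listed for the LIWC measures, a naive tokenizer, and scores expressed as percentages of all words in the comment; the real LIWC dictionaries are much larger and are not reproduced here.

```python
import re

# Tiny illustrative categories built from the example words of the LIWC table;
# these are assumptions for the sketch, not the real LIWC dictionaries.
TOY_CATEGORIES = {
    "anger": {"hate", "kill", "annoyed"},
    "sadness": {"crying", "grief", "sad"},
    "tentative": {"maybe", "perhaps", "guess"},
}


def category_proportions(comment, categories=TOY_CATEGORIES):
    """Return, per category, the percentage of the comment's words in that category."""
    words = re.findall(r"[a-z']+", comment.lower())
    if not words:
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(word in vocabulary for word in words) / len(words)
        for name, vocabulary in categories.items()
    }
```

For example, `category_proportions("maybe sad maybe hate")` counts two tentative words, one sadness word and one anger word out of four words in total.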

Measuring Relationship-Orientation with LIWC

Definition
Communication that promotes social affiliation and emotional connection:
  preoccupation with others (use of personal pronouns, e.g., I, you)
  preoccupation with the larger social domain (e.g., references to friends and family)
  expression of positive emotion

Examples
  We are glad to have you. If I can help at all let me know :)

  A-giau has smiled at you. Smiles promote WikiLove and hopefully this one has made your day better... Happy editing

  Congrats! Thank you for your dedication.

Measuring the Emotional Content of Discussions
Method 3: SentiStrength

SentiStrength
  Based on LIWC and developed for short web texts
  Accounts for modes of textual expression specific to the online environment, e.g. emoticons and abbreviations
  Provides a positive and a negative score for emotional valence
  Emotion score is the strongest positive and negative emotion expressed in a comment
  Final scores are averages over comments in a given category

Thelwall M, Buckley K, Paltoglou G, Cai D, Kappas A (2010).
Sentiment strength detection in short informal text.
Journal of the American Society for Information Science and Technology 61: 2544-2558.
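
The two aggregation steps named on this slide (strongest positive and strongest negative emotion per comment, then averages over a category of comments) can be illustrated with a small sketch. It does not call the SentiStrength tool: `TOY_STRENGTHS` is a hypothetical word list, and the handling of punctuation is a simplifying assumption. The scales follow SentiStrength's convention of 1 to 5 for positive and -1 to -5 for negative strength.

```python
# Hypothetical word strengths on SentiStrength-style scales:
# positive from 1 (none) to 5 (strong), negative from -1 (none) to -5 (strong).
TOY_STRENGTHS = {"nice": 2, "happy": 3, "love": 4, "bad": -2, "abuse": -4, "hate": -4}


def comment_scores(comment, strengths=TOY_STRENGTHS):
    """Return (positive, negative): the strongest word of each polarity in a comment."""
    positive, negative = 1, -1  # neutral baseline when no emotional word is found
    for token in comment.lower().split():
        value = strengths.get(token.strip(".,!?;:\"'()"), 0)
        if value > 0:
            positive = max(positive, value)
        elif value < 0:
            negative = min(negative, value)
    return positive, negative


def category_average(comments, strengths=TOY_STRENGTHS):
    """Average the per-comment scores over a non-empty group of comments."""
    scores = [comment_scores(comment, strengths) for comment in comments]
    count = len(scores)
    mean_positive = sum(pos for pos, _ in scores) / count
    mean_negative = sum(neg for _, neg in scores) / count
    return mean_positive, mean_negative
```

A group could be, for instance, all comments written by administrators; `category_average` then yields one positive and one negative average for that group.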

Example: results with three different emotional lexica

Table: Example messages with their corresponding valence (ANEW) or positive and negative scores (LIWC, SentiStrength)

Message 1, in “Exact trigonometric constants”:
Sounds like a good challenge - to be proven or disproven. I’m happy if it can be shown to go further using closed cubic polynomial solutions. The nice thing about these are that they are pretty easy to test numerically . . .

Message 2, in “House (astrology)”:
Seems you have not yet seen female lover after having sex who do not wish to have sex with the same lover any more :) Once you’ve seen it, you understand very well what war of Venus means compared to war of Mars.

Message 3, in “Harvey Mudd College”:
What about the whirlie hazing, the alcohol abuse, the emotional poverty, the suicide in 1995/6, the biotech plans which were stopped by pitzer protests . . .

           ANEW     LIWC         SentiStrength
           Valence  +      -     +     -
Message 1  7.4      12.5   0     3     -2
Message 2  5.5      6.8    4.5   4     -3
Message 3  1.6      4      8     1     -4

Sentiment analysis
Statistical tests

Compute average values with the three lexica for each user

Compare distribution of values for two groups of users (e.g. admins vs regulars, women vs men)

Most variables are not normally distributed

Mann-Whitney U-test
  Compare distributions of rankings
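
To make the rank comparison concrete, here is a minimal pure-Python sketch of the Mann-Whitney U-test with a normal approximation. It is an illustration under stated assumptions (two-sided test, no tie correction in the variance), not the code used for the reported results; in practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, z, p) for group_a vs group_b using a normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # assumption: no tie correction
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, z, p_two_sided
```

Each user's average score (for example mean LIWC anger) would be one value in `group_a` or `group_b`; a negative `z` means the first group tends to have lower ranks than the second.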

Emotions and Status

Table: Emotions and Status: Administrators promote a generally neutral tone on article talk pages. Regular editors express more negative emotion, and are more emotional.

(Article Talk)   Regular  Admin   Mann-Whitney U-test  p-value
LIWC
  Positive       2.369    2.409   -4.308               p < 0.001
  Negative       1.368    1.120   -18.578              p < 0.001
  Affect         3.784    3.661   -8.466               p < 0.001
  Anxiety        0.180    0.166   -5.834               p < 0.001
  Anger          0.554    0.446   -19.217              p < 0.001
  Sadness        0.175    0.166   -4.450               p < 0.001
SentiStrength
  Positive       1.805    1.774   -14.603              p < 0.001
  Negative       -2.005   -1.912  -23.046              p < 0.001

When difference is statistically significant (p-value in bold) the larger absolute value is underlined

Emotions and Status

Admins:
  more positive emotion (ANEW and LIWC)
  generally, emotionally reserved compared to regular users (LIWC)

Regular users:
  more emotional
    more affect, and more anxiety, anger and sadness (LIWC)
    stronger positive and negative words than admins (SentiStrength)

Personal talk pages
  In personal talk pages, admins are more emotional compared to the article talk pages
    more positive emotion compared to regular editors, but also more anxiety and sadness

Dialogue and Status

Table: Dialogue and Status: Administrators are more impersonal in article talk pages. Regular editors are more concerned with others.

(Article Talk)            Regular  Admin  Mann-Whitney U-test  p-value
Relationship-orientation
  Personal pronouns       5.135    4.815  -13.561              p < 0.001
  Use of “I”              2.456    2.429  -1.733               p=0.083
  Use of “You”            1.043    0.892  -12.573              p < 0.001
  Use of “Shehe”          0.609    0.526  -8.657               p < 0.001
  Social words            6.320    5.810  -19.013              p < 0.001
Certainty
  Certainty               1.426    1.317  -16.824              p < 0.001
  Tentativeness           3.199    3.169  -2.210               p < 0.001
  Filler words            0.168    0.155  -6.687               p < 0.001
Temporal Orientation
  Past                    2.376    2.305  -5.696               p < 0.001
  Present                 8.011    7.841  -8.060               p < 0.001
  Future                  1.114    1.166  -9.887               p < 0.001

When difference is statistically significant (p-value in bold) the larger absolute value is underlined

Dialogue and Status

Admins
  more neutral and impersonal tone
  less relationship oriented
  more concerned with the future
  tend to "rule with reason"

Regular users
  more relationship-oriented
    more personal pronouns and more social words
  more concerned with past
  more insecure, but not in personal spaces
    more certainty, tentative and filler words in article talk pages

Emotions and gender

ANEW words more used by women and men
Size accounts for difference in frequency

Emotions and gender

Women use words associated with more positive emotions

Result consistent and significant with the three lexicons

ANEW: Difference is not significant when normalising by article
→ difference might be due to topic selection: women choose to participate in topics which have more positive discussions

No significant difference in expression of negative emotions

Topics, emotions and gender
N≥1 ANEW words; corr=−0.64 (p=0.002)

[Scatter plot. x-axis: prop. of male comments (0.82 to 0.96); y-axis: mean valence (4.7 to 5.6). Topic categories shown: Computing, Arts, Philosophy, Language, Health, Mathematics, Belief, Sports, Agriculture, Environment, Techn. & app. sci., Law, Society, Business, Education, Culture, People, Science, Politics, Geography and places, History and events]

Figure: Mean valence (ANEW) for discussions of articles in different topic categories, vs the proportion of comments written by men

Dialogue and gender

Table: Dialogue and Gender: Women use a more relationship-oriented speech style.

(Article Talk)             Men           Women         Mann-Whitney U-test  p-value
Relationship-orientation
  Personal pronouns        4.964         5.420         -4.375               p < 0.001
  Use of “I”               2.488         2.764         -3.945               p < 0.001
  Use of “You”             0.936         0.957         -0.926               p=0.355
  Use of “Shehe” pronouns  0.541         0.713         -4.657               p < 0.001
  Social words             5.960         6.353         -3.487               p < 0.001
Certainty
  Certainty                1.346 (1397)  1.300 (1263)  -2.078               p = 0.038*
  Tentativeness            3.150         3.215         -1.162               p=0.245
  Filler words             0.161         0.160         -0.137               p=0.891
Temporal Orientation
  Past                     2.325         2.543         -4.305               p < 0.001
  Present                  7.897         8.180         -3.086               p = 0.002
  Future                   1.168         1.147         -1.008               p=0.314

When difference is statistically significant (p-value in bold) the larger absolute value is underlined. Cases where the averages are not informative are marked with an asterisk * and include the mean ranks Mann-Whitney U-test next to the averages in parentheses.

Dialogue and Gender

Women write longer messages

Women are more relationship-oriented
  more personal pronouns, in particular “I”, more social words

Women are not more insecure
  fewer certainty words, no significant difference for tentativeness and filler words

Women admins are more relationship oriented than men admins
  Different leadership style

Qualitative analysis: Relationship orientation

Manual classification of 100 comments
Three main types of comments high in relationship-orientation:
  inviting comments that explain the edit in a friendly tone, and call for further intervention and collaboration
  common perspective-building comments that are focused on understanding others and solving debates in a constructive manner
  appreciative comments that contain positive emotions and celebrate others’ actions

⇒ This suggests that relationship-orientation may be conducive to successful collaboration

Emotional congruence

Comparison of each comment with the comment it replies to
  not based only on our set of users, but on all comments (from all users)

Emotions: editors tend to reply with:
  more positive emotion
  less negative emotion
  less anger, anxiety and sadness
  stronger words, both positive and negative (SentiStrength)

Dialogue: editors tend to reply with:
  more relationship oriented speech
  less tentative and certainty words
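
One simple way to compare each comment with the comment it replies to is to pair every reply with its parent and look at the difference in a given score. The sketch below assumes a hypothetical thread representation (a dict mapping comment id to a `(parent_id, score)` pair) and uses the mean difference as the summary; the slides do not specify how the comparison was computed, so this is only an illustration of the pairing idea.

```python
from statistics import mean


def mean_reply_shift(comments):
    """Mean of score(reply) - score(parent) over all reply/parent pairs.

    `comments` maps a comment id to (parent_id, score); top-level comments
    have parent_id None. Returns None when the thread contains no replies.
    """
    shifts = [
        score - comments[parent_id][1]
        for parent_id, score in comments.values()
        if parent_id is not None and parent_id in comments
    ]
    return mean(shifts) if shifts else None
```

With a positive-emotion score as input, a positive result would indicate that replies tend to be more positive than the comments they answer.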

Emotional homophily
Mixing patterns: do users interact preferentially with similar users?

Disassortativity by activity
  users who write more comments tend to reply preferentially to less active users, and vice versa

Assortativity by gender
  Men interact more with other men, and women with other women

Assortativity by emotion and language
  Users interact more with others similar in emotional expression and communication style
  also in the network of communication on personal talk pages
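
A mixing pattern of this kind can be quantified with an assortativity coefficient. The sketch below uses the Pearson-correlation form for a numeric node attribute on an undirected reply network; the slides do not state which measure was used, so the formula choice, the edge list format and the attribute dictionary are assumptions for illustration.

```python
def attribute_assortativity(edges, attribute):
    """Pearson correlation of a numeric node attribute across the ends of each edge.

    `edges` is a list of (user_a, user_b) pairs and `attribute` maps each user
    to a number (e.g. average anger score). Values near +1 mean similar users
    are connected; values near -1 mean dissimilar users are connected.
    Returns None when the attribute does not vary across edge ends.
    """
    if not edges:
        raise ValueError("need at least one edge")
    xs, ys = [], []
    for user_a, user_b in edges:
        # Count each undirected edge in both directions so the measure is symmetric.
        xs += [attribute[user_a], attribute[user_b]]
        ys += [attribute[user_b], attribute[user_a]]
    count = len(xs)
    mean_value = sum(xs) / count  # xs and ys contain the same values, so one mean suffices
    covariance = sum((x - mean_value) * (y - mean_value) for x, y in zip(xs, ys)) / count
    variance = sum((x - mean_value) ** 2 for x in xs) / count
    if variance == 0:
        return None
    return covariance / variance
```

The edge list could be restricted to pairs of users who exchanged a minimum number of replies before calling the function, and the attribute could equally be activity (number of comments) to check for disassortativity.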

Emotional homophily
Example: homophily by expression of anger

  edges connect users who have exchanged at least 10 replies
  node color represents the level of anger expressed by a user, from low to high
  node size is proportional to the number of connections of a user
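The construction of such a network can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `build_network` is a hypothetical helper, the reply list is made up, and "exchanged at least 10 replies" is read here as the total number of replies in either direction between two users.

```python
from collections import Counter

def build_network(replies, min_replies=10):
    """Build an undirected reply network from (author, replied_to) pairs.

    An edge is kept only if the two users exchanged at least `min_replies`
    replies in total (both directions summed; an assumed interpretation).
    Returns the edge list and each user's number of connections (degree),
    which could drive node size in a plot.
    """
    pair_counts = Counter(
        frozenset((author, target))
        for author, target in replies
        if author != target  # ignore self-replies
    )
    edges = [
        tuple(sorted(pair))
        for pair, count in pair_counts.items()
        if count >= min_replies
    ]
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return edges, degree

# Hypothetical reply log: ann and bob exchange 10 replies, ann and cat only 3.
replies = [("ann", "bob")] * 6 + [("bob", "ann")] * 4 + [("ann", "cat")] * 3
edges, degree = build_network(replies)
```

With this toy input only the ann/bob pair reaches the threshold, so cat does not appear in the network at all.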


Conclusions

Administrators and experienced users play a pivotal role
  they tend to interact especially with less experienced users
  they promote a positive but impersonal environment

Men and women have different communication styles
  women participate in discussions that have a more positive tone
  men interact more with men, and women with women
  women use a more emotional and relationship-oriented language
  women admins have a relationship-oriented leadership style

⇒ promoting relationship-oriented leadership could lead to a more positive environment

⇒ giving women more space in the community could result in a more welcoming environment, for both women and men


Future work

Longitudinal analysis
  how do emotional styles of editors change over time and with increasing experience?
  how do emotions in the discussions affect participation?

Qualitative analysis and human annotation
  include non-textual emotional aspects such as emoticons, barnstars and virtual gifts
  deal with sarcasm, measure the extent of condescending or paternalistic language in comments addressed to women editors

Examine other online spaces
  similar conclusions might hold for other online spaces
  especially in discussions involving conflict, decision making and power dynamics




Questions?
