visual analysis of controversy in user-generated encyclopedias

15
Visual analysis of controversy in user-generated encyclopedias Ulrik Brandes 1 Jürgen Lerner 1 1 Department of Computer & Information Science, University of Konstanz, Konstanz, Germany Correspondence: Jürgen Lerner, Department of Computer and Information Science, University of Konstanz, Box D 67, 78457 Konstanz, Germany. Tel: +49 7531 88 4436; Fax: +49 7531 88 3577; E-mail: [email protected] A preliminary version of this paper appeared in Brandes and Lerner. 1 Abstract Wikipedia is a large and rapidly growing Web-based collaborative authoring environment, where anyone on the Internet can create, modify, and delete pages about encyclopedic topics. A remarkable property of some Wikipedia pages is that they are written by up to thousands of authors who may have contradicting opinions. In this paper, we show that a visual analysis of the 'who revises whom'-network gives deep insight into controversies. We propose a set of analysis and visualization techniques that reveal the dominant authors of a page, the roles they play, and the alters they confront. Thereby we provide tools to understand how Wikipedia authors collaborate in the presence of controversy. Information Visualization (2008) 7, 34 -- 48. doi:10.1057/palgrave.ivs.9500171 Keywords: Wikipedia; social network analysis; controversy Introduction Recently, the World Wide Web (WWW) has witnessed a shift from websites supplied by traditional information providers like universities or compa- nies to sites where every user can not only read but also modify content. A remarkable example of such sites is the user-generated online encyclopedia Wikipedia which allows every user (even anonymously) to create, modify, and delete pages about encyclopedic topics. This approach – which is so entirely different from traditional encyclopedia-writing by domain experts and supervised by editors – seemed to be destined to fail from the begin- ning. Not only could users (ignorantly or maliciously) introduce inaccurate information but also could delete previously written good articles, thereby making every progress impossible. Despite these concerns, Wikipedia turned out to produce much better articles than expected. A study carried out by Nature in 2005 suggests that the accuracy of Wikipedia articles about scien- tific topics comes close to the accuracy of their counterparts in the Ency- clopædia Britannica. 2 Viégas et al. 3,4 observed that antisocial behavior like vandalism (e.g., deletion of whole pages or insertion of vulgarities) is often repaired within minutes. Another indicator of Wikipedia’s success is simply its ever-increasing popularity: at the end of 2006, Wikipedia has more than five million articles – about 1.5 million alone in the English Wikipedia – and grows by several thousand articles per day (http://stats.wikimedia.org/). Furthermore, Wikipedia ranges among the top 20 in Alexa’s most visited sites (http://www.alexa.com/). In this paper, we are interested in how do Wikipedia authors collaborate when writing about controversial topics (such as abortion, gun rights vs gun control), delicate historic events, or persons who are highly impor- tant in politics. Such pages have often been revised up to tens of thou- sands of times by several thousand authors who, arguably, not all share the same opinion on the particular topic. Although Wikipedia policies 5

Upload: others

Post on 25-May-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Visual analysis of controversy in user-generated encyclopedias

Visual analysis of controversy inuser-generated encyclopedias�

Ulrik Brandes1

Jürgen Lerner1

1Department of Computer & InformationScience, University of Konstanz, Konstanz,Germany

Correspondence:Jürgen Lerner,Department of Computer andInformation Science, University ofKonstanz, Box D 67, 78457 Konstanz, Germany.Tel: +497531884436;Fax: +497531883577;E-mail: [email protected]

� A preliminary version of this paperappeared in Brandes and Lerner.1

AbstractWikipedia is a large and rapidly growing Web-based collaborative authoringenvironment, where anyone on the Internet can create, modify, and deletepages about encyclopedic topics. A remarkable property of some Wikipediapages is that they are written by up to thousands of authors who may havecontradicting opinions. In this paper, we show that a visual analysis of the 'whorevises whom'-network gives deep insight into controversies. We propose aset of analysis and visualization techniques that reveal the dominant authors ofa page, the roles they play, and the alters they confront. Thereby we providetools to understand how Wikipedia authors collaborate in the presence ofcontroversy.Information Visualization (2008) 7, 34--48. doi:10.1057/palgrave.ivs.9500171

Keywords: Wikipedia; social network analysis; controversy

IntroductionRecently, the World Wide Web (WWW) has witnessed a shift from websitessupplied by traditional information providers like universities or compa-nies to sites where every user can not only read but also modify content. Aremarkable example of such sites is the user-generated online encyclopediaWikipedia which allows every user (even anonymously) to create, modify,and delete pages about encyclopedic topics. This approach – which is soentirely different from traditional encyclopedia-writing by domain expertsand supervised by editors – seemed to be destined to fail from the begin-ning. Not only could users (ignorantly or maliciously) introduce inaccurateinformation but also could delete previously written good articles, therebymaking every progress impossible. Despite these concerns, Wikipedia turnedout to produce much better articles than expected. A study carried out byNature in 2005 suggests that the accuracy of Wikipedia articles about scien-tific topics comes close to the accuracy of their counterparts in the Ency-clopædia Britannica.2 Viégas et al.3,4 observed that antisocial behavior likevandalism (e.g., deletion of whole pages or insertion of vulgarities) is oftenrepaired within minutes. Another indicator of Wikipedia’s success is simplyits ever-increasing popularity: at the end of 2006, Wikipedia has more thanfive million articles – about 1.5 million alone in the English Wikipedia –and grows by several thousand articles per day (http://stats.wikimedia.org/).Furthermore, Wikipedia ranges among the top 20 in Alexa’s most visitedsites (http://www.alexa.com/).

In this paper, we are interested in how do Wikipedia authors collaboratewhen writing about controversial topics (such as abortion, gun rights vsgun control), delicate historic events, or persons who are highly impor-tant in politics. Such pages have often been revised up to tens of thou-sands of times by several thousand authors who, arguably, not all sharethe same opinion on the particular topic. Although Wikipedia policies5

Page 2: Visual analysis of controversy in user-generated encyclopedias

35

Figure 1 Small part of the revision history of the page Gunpolitics. This page has 1101 revisions in the November 2006database dump.

urge authors to take a neutral point of view and to provideonly facts rather than opinions, controversies are never-theless reflected in some pages. Since some facts appear tosupport more a certain opinion and reject or discredit theother, it is fiercely fought over whether such facts shouldbe mentioned and how could balance be established.

We do not see it as a fundamental drawback ofWikipedia that controversies are reflected in the devel-opment of (some of) its pages. Different opinions simplyexist in society and, since Wikipedia is ‘the free encyclo-pedia that anyone can edit,’ it is a good mirror of suchcontroversies. However, this gives rise to several impor-tant questions: First of all, to assess the neutrality of agiven controversial page, it is crucial (and very informa-tive) to know about ongoing and past disputes and aboutbeliefs and opinions of the various authors. Even moreimportant is to understand in general the social processof content-generation in Wikipedia. Concrete questionsinclude whether controversial pages converge at all orwhether they are destined to perpetual editing and, ifthey converge, is their content balanced or determinedby opinion groups. Furthermore, what are the roles thatWikipedia authors typically play when arguing for oragainst specific statements in the page.

Support to answer these questions comes fromWikipedia itself which makes available not only thecurrent content of a page but also its complete history.The analyst is thereby enabled to see all past versionsand time, content, comment, and author of the variousedits (see e.g., Figure 1). Needless to say that thetypical size of the revision history of a disputed pagecalls for automated visual and analytic support toget insight into the page’s development and authorcommunity.

In this paper, we show that a visual analysis of the ‘whorevises whom’-network gives deep insight into the author-community behind a controversial page. We provide a setof analysis and visualization techniques that reveal the

dominant authors, the roles they play, and the alters theyconfront.

The rest of this paper is organized as follows. Our contri-butions in relation to previous work are explained in thenext section. In the subsequent section we define therevision network. The fourth section introduces severalmeaningful author properties and how they are visuallyrepresented and the next section presents some illus-trating findings on particular pages.

Related work and contributionsWeb 2.0 is a common term for denoting those sites ofthe WWW where Internet users are not just readers butcan actively participate. Specific forms include blogs,wikis, podcasting, file sharing, and social networkingsites (see, e.g., Kolbitsch and Maurer6 for an overview).In this paper, we analyze the author community ofwikis, that is, Web-based collaborative authoring envi-ronments where anyone on the Internet can create,edit, and delete pages. The term wiki was coined byWard Cunningham, who launched the first wiki in1995.7 Wikipedia, which is currently the largest wiki,has been established in 2001 to collectively create anencyclopedia. Maybe due to its size, popularity, and rele-vance for understanding new forms of collective knowl-edge creation, Wikipedia receives increasing interest inresearch. For instance, Wikipedia’s growth rate, informa-tion quality, or edit histories have been analyzed.3,4,8–10

Other papers (e.g., Gabrilovich and Markovitch11 andStrube and Ponzetto12) use the collection of Wikipediaarticles to improve machine learning techniques for textcategorization and detection of semantic relatedness ofterms.

It has been widely recognized that user-generatedcontent is also a rich source for user opinions. Some papers(e.g., Chen et al.,13 Glance et al.,14 Liu et al.15 and Nigamand Hurst16) apply natural language processing (NLP) todetermine users’ sentiments about positive or negativeaspects of commercial products. Agrawal et al.17 arguedthat ‘links carry less noisy information than text’ andapplied a network analysis approach to divide newsgroupauthors into two opposite camps: those that have a posi-tive opinion on a certain topic and those that have anegative opinion. They completely ignored the contentof postings and used only the ‘responded-to’ relationshipbetween authors. It is argued (and validated) that peoplerespond more frequently to a message when they disagreethan when they agree. Thus, partitioning the networkinto two groups such that most links are between thegroups will reveal the opposing camps. Note that previouswork on user opinions13–17 assumes the existence ofonly two poles of opinion (positive and negative), whichis certainly a restriction to generality. However, researchabout multipolar conflicts (i.e., situations where thereare more than two camps that are mutually in oppo-sition) can be found in political science, for exampleRosecrance.18

Page 3: Visual analysis of controversy in user-generated encyclopedias

36

Our work here is based on the idea from Agrawalet al.17 that controversy is reflected in the replybehavior (revision behavior in our case) of authorsbut achieves several improvements. Instead of thestrict partition of authors into opinion groups, wepropose a visual analytics approach that can deal withmore complex and more realistic controversy struc-tures and in addition reveals authors’ involvement androles.

Independently, Kittur et al.19,20 applied a similar ideaby building the RevertGraph to analyze disagreementamong authors. Our proposal of the revision networkcan encode conflicts in more general situations, sinceusing only reverts ‘cannot detect conflicts between userswho were not involved in reverts’ Kittur,19 [p. 460].Furthermore, in addition to different opinion groups,our method reveals several author characteristics. Lastbut not least, the spectral layout method outlined inSection ‘What position do they take?’ seems to be prefer-able to the force-directed method from Kittur et al.19

and Suh et al.20 since it optimizes a well-defined crite-rion function, cannot be stuck in local minima, and isquite robust to noise (compare Brandes et al.21). Notethat Kittur et al. provide additional results in differentdirections by analyzing the global cost of coordinationand learning models to predict whether an article iscontroversial.

Viégas et al.3,4 proposed a history flow approach forthe visual analysis of the page history. The history flowdiagrams show the development of the content of a pageover time and are therefore orthogonal to our work sincewe analyze the page’s author community.

The determination of the authors’ positions developedin Section ‘What position do they take?’ is a general-ization of the method that we proposed for the anal-ysis of political conflicts.21 The method from the currentpaper can deal with more general conflict structures (e.g.,multipolar conflicts). Furthermore, we make several visualand analytical enhancements that have been necessary torepresent well the complex interaction structure betweenWikipedia authors.

Concrete contributions of our paper include thefollowing. First, the definition of the revision network is asimple, efficient, and language-independent way to repre-sent controversies among Wikipedia authors. Note thatthis approach can be applied to Wikipedia articles in anylanguage without the need for adapting NLP algorithms.This is a significant advantage since for most languages,text processing algorithms are not so highly developed asfor English. Second, we define a set of author character-istics or properties that give deep insight into the overallstructure of the community as well as into individualauthors’ roles. Third, we develop visualization techniquesto show the author characteristics simultaneously in asimple and easy to understand picture. Last but not least,several case studies of controversial pages have a valueon their own in revealing some typical author roles andpatterns of confrontation.

It is important to note that our analysis cannot anddoes not attempt to determine which opinion is moreacceptable.

Revision networkThe definition of the ‘who-revises-whom’-network (inshort revision network) is a crucial step to develop an effi-cient and robust method for analyzing interaction amongWikipedia authors. In contrast, approaches based on NLPwould not only have to solve the difficult task of auto-matically understand natural language (compare Agrawalet al.17) but would also have to deal with much largerfile sizes (see Section ‘Input data’). We describe the inputdata in the subsection below before defining the revisionnetwork in the next subsection.

Input dataWikipedia makes its complete database (containing allversions of every article since its initial creation) avail-able in XML-format.22 The files containing the completehistory of all pages can be extremely large. For instance,the complete dump for the English Wikipedia unpacksto more than 600 gigabytes (GB). Wikipedia makes alsoavailable so-called stub-files. These files contain meta-data about every revision but not the text (see Figure 2 fora small portion) and are still quite large. For the presentstudy we used the stub-file for the English Wikipedia(which is the largest one) from the 20061130 dump witha size of 23GB. (Note that this dump includes some revi-sions from December 2006, since it takes several daysto create it.) The number of revisions (edits) of a pageand the number of authors that made at least one revi-sions can also be quite large. The most-revised page inthe English Wikipedia is George W. Bush having 33,086revisions and 10,167 different authors (registered oranonymous). Parsing the XML-document has been doneusing a Java implementation of the event-based SAXinterfaces23 which proved to be very efficient for parsingsuch huge files. Constructing the whole document tree,as this is normally done by DOM parsers,24 would simplybe impossible (at least very inefficient and/or requiringuncommonly huge memory), given the file sizes.

To abstract from the particular format we define arevision or edit to be a tuple of the form

r = ( page, time, author, comment, revert),

where page is a text-string denoting the page-title, timecontains the exact timestamp of the revision (given bythe second), author is a real user name if the contributorof the revision has been logged in or an IP-address if therevision has been done anonymously, comment is free textexplaining what has been done or why this revision hasbeen necessary (often authors have kind of a discussionin consecutive comments, compare Figure 2), and revert isa Boolean flag labeling the revision. (A revert is a specific

Page 4: Visual analysis of controversy in user-generated encyclopedias

37

Figure 2 Six consecutive revisions of the page Gun politics in XML format. (The corresponding HTML-view is part of Figure 1.)

edit where the author sets back the page content to anearlier version.)

Network constructionGiven a sequence R= (r1, . . . , rN) of revisions on the samepage, which is ordered by increasing timestamps, the asso-ciated revision network is a directed, weighted graph G =(V, E,�) defined as follows (also compare Figure 3).

• V is the set of authors that performed a revision in R.• E ⊆ V × V is the set of revision edges. For two different

authors u, v ∈ V the edge (u, v) ∈ E is introduced if thereare two consecutive revisions ri, ri+1 ∈ R such that u isthe author of ri+1 and v the author of ri. An edge (u, v)can be read as ‘u revises changes made by v’.

Figure 3 Revision network arising from the six revisions shownin Figure 2. Both edges go in both directions but edges from theleft to the right have higher weights since the correspondingrevisions are performed faster, compare (1).

• The function �:R → R assigns weights to edges. For anedge (u, v) the weight �(u, v) indicates how ‘urgent’ uconsiders it to revise the changes made by v (see moredetailed explanation below).

Page 5: Visual analysis of controversy in user-generated encyclopedias

38

Before explaining how the edge weights are definedwe will briefly discuss the meaningfulness of the revisionnetwork. Edges with high weight are interpreted later asdisagreements between the connected authors. To see howthe edge weights have to be defined to achieve this goal,assume that there are two (fictitious) authors Alice andBob connected by an edge. If Alice makes only once arevision immediately after Bob, then this may or may notindicate that she disagrees with his edits. If, on the otherhand, it is the case that Alice revises dozens of timesBob’s revisions (and especially if these revisions happenvery fast, e.g., within an hour or even within minutes),then it becomes very likely that she does not at all agreewith his edits. It turns out later that there are indeed suchpairs of authors on some highly controversial pages. Tosummarize these considerations, we assume that domi-nant revision patterns are meaningful but that not toomuch confidence should be put on single revisions. Thissimply means that the revision network has a typical char-acteristic of social network data, namely that of beingnoisy, and that it should only be analyzed/visualized withrobust methods. In a sense, the same considerations wouldapply to the construction of ‘quotation links’ for the anal-ysis of newsgroups.17 There it has been claimed that ‘it ismore likely that the quotation is made by a person chal-lenging or rebutting it rather than by someone supportingit’ Agrawal et al.17 [p. 529]. Of course not every singlequotation is necessarily antagonistic, but a huge numberis likely to indicate disagreement.

Thus, to define edge weights such that they are likely toindicate the magnitude of disagreement, fast revisions areassigned higher weights and weights of several revisionsbetween the same authors are added up. So, let ri, ri+1 betwo consecutive revisions on the same page where u is theauthor of ri+1 and v the author of ri. Let ti and ti+1 denotethe timestamps of ri and ri+1 respectively, � = ti+1 − tithe time difference between the two revisions, and �maxa maximum time limit when a revision is still consideredas a disagreement. Then, the weight of the edge (u, v) isdefined to be

�(u, v)={−�/�max + 1 if ���max,

0 else.(1)

If there are more pairs of consecutive revisions where urevises v, then the edge weights of (u, v) are summed up.

In the examples we defined the time limit �max to beequal to the average revision time. If a revision occurs atabout the average time, it becomes more unlikely thatit is meant as a disagreement. On the other hand if therevision occurs much faster than the average time, theprobability increases that it is indeed a correction ofthe previous edit. It is reasonable to count revisions moreheavily if they are reverts since this indicates that thereverting author considers the previous edit as obsolete oreven harmful. An even more sophisticated constructionof revision edges could be achieved by taking into accountthe comments made by authors. Since comments are free

text and not standardized this would involve NLP andwill not be considered in this paper.

Sometimes several Wikipedia pages have stronglyrelated topics (see, e.g., Section ‘Gun politics’) and thenoften largely overlapping sets of authors. In these situa-tions, it is appropriate to combine the associated revisionnetworks by taking the union of their author sets andadding up edge weights.

Visual analysis of the revision networkIn this section, we define a series of characteristics of therevision network and its actors (the Wikipedia authors)and how they are visually represented. These characteris-tics include for all authors their position (i.e., which otherauthors do they confront), their involvement in contro-versy, an indicator telling whether they are mostly revisorsor mostly being revised, and an indicator telling whethertheir edit behavior is rather constant over time (so thatthey showed sustained interest in the page) or highlyconcentrated on small time periods. See Figure 4 for animage showing these and a few other properties. Techni-cally most involved is the determination of the authors’positions. We will treat this issue in the next subsection.Graphical representation of this and other indicators isexplained and illustrated in Section ‘Visual representationof author properties’. In Section ‘Filtering’, two possibili-ties to prune the revision network and to detect relevantsubstructures are examined.

What position do they take?The position of a particular author should express whichother authors she confronts. Confrontation is reflected inthe revision edges: if two authors take different positionsthey disagree with the edits of the other and thereforewill frequently revise each other. (Asymmetry of edgesis ignored here but will be used later to determine theauthors’ roles.) Thus, if two authors u and v are connectedby a revision edge of large weight, then we want to drawu and v on opposite sides of the image. The difficulty liesin the fact that we have to draw not only two authors butalso the whole network such that all confronting pairs aresimultaneously as far from each other as possible (compareFigure 5). This objective (which contrasts to most objec-tive functions for graph drawing that traditionally wantto keep edge lengths as short as possible25) is of coursedue to the negative interpretation of the revision edges.The good news is that this problem is efficiently solvable,as will be derived next.

Let G = (V, E,�) be a revision network with author setV of cardinality n= |V |. We associate with G its symmetricadjacency matrix A= (auv) with rows and columns indexedby V and entries auv = �(u, v) + �(v, u) corresponding tothe sum of the weights of the two directed edges betweenthe two endpoints (if an edge is not present, the weightis simply equal to zero). We want to draw the conflictnetwork in two-dimensional space. Thus, the positions of

Page 6: Visual analysis of controversy in user-generated encyclopedias

39

Figure 4 Example visualization of a revision network (determined from Gun politics and related pages). Nodes represent thedifferent authors. If two authors are on opposite sides they strongly revise each other. Other characteristics are represented asdescribed in the legend on the right-hand side (also see Section 'Visual representation of author properties'). The diagram at thebottom shows the total number of edits per month. For more on this particular network, see Section 'Gun politics'.

all n authors are represented by two vectors x, y ∈ Rn. Iffor two authors u and v the entry auv in the adjacencymatrix is large (i.e., if they frequently revise each other),then they are well-represented by the coordinate vector xif the entry xu is (say) strongly negative and the entry xvstrongly positive. Then, the value xuauvxv is negative andhas quite large absolute value. Summing this up over allpairs of authors, x is determined to minimize the objectivefunction

�A(x)=∑u,v∈V

xuauvxv = xTAx

under the condition that x must have unit length (to keepthe drawing to the screen size). It follows from an alter-native description of the eigenvalues of a matrix that thisterm is minimized if and only if x is equal to the eigen-vector of A associated to the smallest eigenvalue �min (see,e.g., Golub and van Loan26). The second coordinate vectory is chosen to minimize �A(y) under the condition that yis normalized and orthogonal to x. This is solved by takingfor y the eigenvector of A associated to the second smallesteigenvalue �′

min.The coordinate vectors derived so far would already

represent well some pure conflict patterns as in Figure 5(middle) and (right). However, real data is normally notso balanced. For instance, it might be the case that inFigure 5 (middle) one side of the triangle consists only ofvery weak edges so that it approaches a bipolar conflict

Figure 5 Sample of pure conflict patterns. Bipolar conflict(left), 3-polar conflict (middle), and two independent bipolarconflicts (right). Actors that are in conflict are drawn as far fromeach other as possible. Conflicts in real data are often a mixtureof these types.

Figure 6 Smooth transformation from pure 3-polar conflict(left) to bipolar conflict (right). The dashed edges of the inter-mediate graph (middle) are assumed to have lower weight.

(compare Figure 6). To achieve a smooth transformationbetween different conflict patterns, we scale y with theratio between the two minimal eigenvalues �′

min/�min.The derivation why this rescaling interpolates betweendifferent conflict patterns is deferred to the next section‘Derivation of the scaling factor’. Note that a justification

Page 7: Visual analysis of controversy in user-generated encyclopedias

40

Figure 7 Revision network related to the page Kroatien (Croatia) in the German Wikipedia. The three most involved authorsare mutually in conflict and are displayed as a triangle. Note that the edge weights between these three authors vary and lessconnected users are drawn closer together, compare Section `Derivation of the scaling factor'.

is also provided by the examples shown in this article(compare Figure 7).

The absolute values of the two coordinates of an authorv are a measure of how much v is involved in controversy,since they indicate how strongly v is connected to othersvia revision edges.

Putting this together, we get the following algorithmfor determining the authors’ positions and involvement,which takes as input the symmetric adjacency matrix Aof the revision network.

1. Compute the smallest and second smallest eigenvalue�min and �′

min of A and the associated (normalized andorthogonal) eigenvectors x and y.

2. Set s=�′min/�min as the network’s skewness and define for

an author v its position p(v)= (p1(v), p2(v))= (xv, syv) ∈R2 and its involvement i(v)=

√p1(v)

2 + p2(v)2.

Efficient computation of the extremal eigenvalues andvectors is possible, for example, with the so-called orthog-onal iteration, which can also exploit sparsity of thenetwork (see Golub and van Loan.26)

Note that, although our layout method seems to besimilar to multidimensional scaling (MDS) on a distancematrix, it enjoys a further desirable property: MDS wouldtry to achieve distance zero for all authors that are notconnected, whereas our method requires in addition thatauthors must confront (approximately) the same othersto be placed at the same position. Thereby, independentconflicts (as in Figure 5 (right)) can be recognized as suchin the final drawing.

Derivation of the scaling factor As it is mentioned above,we propose to scale the eigenvector y that is associatedto the second smallest eigenvalue with the ratio betweenthe two minimal eigenvalues �′

min/�min. Here we want togive a heuristic derivation of this particular choice, illus-trated on hypothetical networks that exhibit certain pureconflict structures.

As a simple example, assume that the revision networkhas a structure similar to that in Figure 5 (right), butwhere the disagreement edges on the vertical axis have asmaller weight than those on the horizontal axis. Moreprecisely, assume that the authors of the revision networkare partitioned into four classes C1, . . . , C4, where eachclass consists of r authors. Furthermore, assume that everyauthor in class C1 is connected to every author in classC2 by an edge of weight w (the horizontal conflict), thatevery author in class C3 is connected to every author inclass C4 by an edge of weight �w, for a � between zeroand one, (the vertical conflict), and that the network hasno other edges. It is easy to show that in this case thetwo minimal eigenvalues are �min=−rw and �′

min=−�rw.Furthermore, the eigenvector x associated to �min assigns(modulo normalization) the value 1 to every author inclass C1, value −1 to every author in class C2 and zeroto authors in classes three and four. Likewise, the eigen-vector y associated to �min assigns (modulo normaliza-tion) the value 1 to every author in class C3, value −1 toevery author in class C4 and zero to authors in classes oneand two. By scaling y with the factor �′

min/�min = � andusing x and �y as coordinates we obtain a network visu-alization similar to Figure 5 (right) but where the authors

Page 8: Visual analysis of controversy in user-generated encyclopedias

41

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

d(2,

3)/d

(1,2

)

delta

Figure 8 The dashed line shows the computed ratiod(2,3)/d(1,2) for varying �. The straight line is the plot of thefunction y = x.

in classes C3 and C4 (the vertical conflict) are drawn at �times the distance between authors in classes C1 and C2(the horizontal conflict). Since the former pairs of authorsdisagree less (by a � factor) than the latter, this layoutrepresents the data quite well. From a different angle, thelayout satisfies the property that the edge length betweenconflicting authors is proportional to the weight of theedges connecting them.

Likewise, we can show by simulation that this prop-erty (almost) holds for three-polar conflict structures andhence that the proposed scaling interpolates well betweenthree-polar and bipolar conflicts, as illustrated in Figure 6.For the simulation setup, assume that the set of authorsis partitioned into three classes C1, . . . , C3, each of sizer, such that each author is connected by disagreementedges to all authors in the two other classes but not toauthors in her own class. Assume further that the edgesbetween classes C1 and C2 as well as the edges betweenclasses C1 and C3 have a positive weight w, while theedges between classes C2 and C3 have a lower weight �wfor a � between zero and one. By letting � vary from oneto zero we get a smooth interpolation from a balancedthree-polar to a bipolar conflict structure and we want toshow that the computed positions reflect this transfor-mation. It is easy to show that authors in the same classare assigned the same position and authors in differentclasses are assigned different positions. Let d(i, j), i, j =1,2,3 denote the distance between an author in class Ciand an author in class Cj. Our goal is to show that it(almost) holds that d(2,3)=�d(1,2)=�d(1,3), that is, thatthe classes C2 and C3 are by a �-factor closer to each otherthan class C1 to C2 or to C3 (in particular, if � varies fromone to zero, then the classes C2 and C3 get closer untilwe display the structure of a pure bipolar conflict). Thisequation does not hold exactly but, as Figure 8 shows, theratio d(2,3)/d(1,2) is almost equal to �, and hence, the

computed positions follow the network structure reason-ably close.

Visual representation of author propertiesNext we define several additional characteristics of theauthors and explain how these (and the previouslydefined position and involvement) are graphically repre-sented.

Position and involvement Let ( p1(v), p2(v)) be the posi-tion of an author v and i(v) her involvement as definedin the previous subsection. The position coordinates( p1(v), p2(v)) could be directly used for drawing authorsin two-dimensional space. However, when doing so,many of the not-so-important authors would be placednear the center of the drawing, making it hard to recog-nize their positions (compare Figure 9 with Figure 10).To overcome this we normalize the positions to drawauthors on an ellipse: Let r1 be the horizontal half axis(value determined by the size of the image) and r2 = sr1the vertical half axis. We draw author v at the coordinates(r1 · p1(v)/i(v), r2p2(v)/i(v)). Normalizing author positionsto an ellipse rather than a circle has the advantage thatwe can still distinguish between the dominant conflict(shown horizontally) and secondary conflict (shownvertically) if their magnitude is different. Compare, forexample, Figure 11, where the dominant conflict is muchstronger than the secondary conflict, and 11 where thetwo major conflicts are of similar magnitude.

The area size of the node representing v is proportionalto the involvement i(v) (shape will be explained in thesubsection below). Thus, even after the normalization ofpositions it is still possible to distinguish between differentinvolvement of authors.

We draw the usernames (or IP-addresses in case ofanonymous contributors) of the most involved authorsas node labels. Printing all usernames would produceclutter, considering that the revision networks can haveseveral thousand authors.

Revisor vs being revised The out-degree d+(v) = ∑(v,u)∈E

�(v, u) of an author v indicates how strongly she revisesothers and is called her degree as a revisor, the indegreed−(v) = ∑

(u,v)∈E�(u, v) indicates how strongly she isrevised by others and is called her degree of being revised.We draw v as an ellipse with height/width-ratio propor-tional to d+(v)/d−(v), while keeping the area size propor-tional to the involvement i(v). (However, to avoid verythin ellipses we bound the aspect ratio.)

The distinction between revisors (high and narrow) andthose that are revised (wide and flat) is a very interestingone: Those who are mostly revisors seem to be quitesatisfied with a page and react only to revise changesmade by others. On the other hand, those that are mostlyrevised attempt to initialize changes to a page that arenot approved by the revisors and therefore corrected

Page 9: Visual analysis of controversy in user-generated encyclopedias

42

Figure 9 Revision network of the page George W. Bush in 2005. Two very busy revisors (Everyking and Shanes) opposedto numerous anonymous users -- all of them much less involved than the former two. It is likely that Everyking and Shanesplay the role of 'watchdogs', defending the page against vandalism.

Figure 10 Same image as Figure 9 without the normalization of positions. The structure among the less-involved authors is hardto recognize. Note that the authors' involvement is also encoded in Figure 9 by their size.

very fast. To use terms that are easy to remember, therevised authors play the role of revolutionaries, the revisorsthe role of reactionaries. Note that these roles are to be

understood relative to the content of a page: writing thepage as desired by the revised would probably interchangethe roles.

Page 10: Visual analysis of controversy in user-generated encyclopedias

43

Figure 11 Filtering in time: a peak in the revision plot of Gun politics during 2003 has been caused by authors that vanish inthe global image shown in Figure 4.

As an example consider Figure 4, where the anonymousauthor 24.12.208.181 is frequently revised – mostly bythe revisors Yaf and Mmx1 (see Section ‘Gun politics’). Itis likely that authors keeping a page on their watch-list,as well as authors fighting vandalism also play the role ofrevisors (see Section ‘Vandalism defense’).

Direction of revision edges The edges show a dark-grey tolight-grey gradient from the revising author to the revisedauthor (compare the edge from Yaf to 24.12.208.181in Figure 4). If an edge is almost symmetric it showsuniformly dark-grey. The information encoded by asym-metric edges is finer grained than that of the nodes’ aspectratio (as defined in Section ‘Revisor vs being revised’):An author who is both, revisor and revised, appears asa circle, nevertheless, she may have asymmetric edgesto some specific alters. The line thickness of an edge ischosen proportional to its weight and we show only theedges with the highest weights.

Steady vs unsteady participation One further indicatorprovides an important distinction between differentauthor roles: there are authors that show sustainedinterest in editing a certain page and there are authorsthat perform a huge number of edits in a small timeinterval and loose their interest afterwards (or some-times get blocked from editing Wikipedia). To assess thisdistinction we define a measure of how much does theweekly participation of an author vary. The decision ‘oneweek’ is in a certain sense arbitrary and exchangeable by

longer or shorter intervals of time. However, we havechosen a week as this marks how people normally orga-nize their work (an author being very active on weekendsand inactive during the week will not be considered asunsteady).

Let a particular author and page be fixed and let(ei)i=1,...,K denote this author’s number of edits on thatpage performed in week i. The sum � = ∑K

i=1ei/K is the

mean value (edits per week) and �2 = ∑Ki=1(ei − �)2/K is

the variance of the author’s edit volume. However, vari-ance is not yet an appropriate measure for the unsteadi-ness of a author, since authors with higher mean willnormally have higher variance. This drawback can beovercome by considering the relative standard deviation�/�. This makes sense since the edit volume is alwayspositive (authors with no edits are not in the network).However, the normalization gives un-proportional weightto authors that have very small mean, for example, thosethat perform only one edit to the page. Since we arenot interested in such peripheral authors, we will simplyignore them and apply the normalization only for thosethat exceed a certain minimum number of edits.

The relative standard deviation �/� is still not an appro-priate measure for the unsteadiness, due to an observedcharacteristic of the input data: the variance of the aggre-gated number of edits (i.e., edits performed by all authors)can reach extremely high values (see, e.g., Figure 12), sothat on those pages all (busy) authors will appear as highlyunsteady. Since we are interested in differences betweenthe authors (rather than absolute values), we subtract the

Page 11: Visual analysis of controversy in user-generated encyclopedias

44

Figure 12 Page on Hezbollah has a very high variance in its edit volume. The strong peak in 2006 (2213 edits in August 2006alone) is probably triggered by the 2006 Israel--Lebanon conflict (compare27).

minimum value of �/�, so that the minimum becomeszero, and normalize so that the maximal value becomesone.

The node color of an author is black if this unsteadinessindicator is zero, that is, if the author showed sustainedinterest in the page. It becomes red if this indicator is one,that is, if the participation frequency is the most volatile.For instance, the anonymous user 24.12.208.181 inFigure 4 is slightly unsteady and Yafnot very unsteady(compare Section ‘Gun politics’).

Total number of edits per month The aggregated editvolume performed by all authors of the analyzed page (orset of pages) is visualized in a bar chart at the bottom ofthe image. This diagram provides the information abouttime periods when this page was a ‘hot topic’ (compareSection ‘News triggered pages’) and can also provideclues to restrict the revision network to interesting timeintervals, see Section ‘Restriction to time intervals’.

FilteringVisualizing the complete revision network over the wholelifetime of the page gives an overview revealing the mostimportant authors, the roles they play, and the otherauthors they confront. Next we describe how relevantsub-structures of the revision network can be determined.

Restriction to time intervals The edit volume diagramshown at the bottom of the images reveals time pointswhen the page receives much interest. It is straightforward

to restrict the revision network by including only revi-sions within a certain time interval. For instance, Figure 11shows the revision network of Gun politics duringa rise of interest in the earlier stages of the page. Thedominant authors during that time are different fromthe dominant authors over the whole lifetime (shown inFigure 4). Restricting the network to specific time inter-vals also enables the analyst to examine the most recentdevelopment.

Restriction to relevant sub-networks A revision networkoften contains several ongoing controversies that arealmost independent, that is, involving disjoint sets ofauthors. For instance, one controversy can be due todifferent opinions of the authors (see, e.g., Figure 4and Section ‘Gun politics’) and another conflict canarise between vandals and vandalism repair (see, e.g.,Figure 13). Since such controversies might overlap in time,it is in general not possible to separate them by restrictionto time intervals as described in Section ‘Restriction totime intervals’. Instead, an approach based on networkclustering, which is described in the following, performsquite well in doing this task.

The goal of the network clustering is to put authorsthat strongly revise each other into the same cluster andauthors that have only little interaction into different clus-ters. The sub-networks induced by the various clusters arethen analyzed separately. In general, density-based graphclustering is a hard task (compare Gaertler28). We used avariant of spectral graph clustering heuristics proposed,for example, in Kannan et al.29 and McSherry.30 These

Page 12: Visual analysis of controversy in user-generated encyclopedias

45

Figure 13 Network clustering reveals a relevant sub-network of the revision network of Gun politics. Another controversycluster of larger aggregated edge weight is similar to Figure 4 and not shown separately. User Tawkerbot2 is not a real authorbut a script for vandalism repair; its dominant opponents are anonymous users. It seems that this image shows revisions causedby vandalism, overlapping in time with the dispute over different opinions shown in Figure 4.

spectral heuristics are efficient, received much empir-ical and theoretical support (see Kannan et al.29 andMcSherry30 and references therein), and also performedquite well in the examples that we considered. Figure 13shows a meaningful sub-network determined by networkclustering.

Examples of pages and patternsIn this section, we describe a sample of illustrating find-ings on specific pages and some patterns that could repeat-edly be observed.

Gun politicsThe issue gun rights vs gun control is a typical pro/contopic. Several Wikipedia pages, like Gun politics, Gunpolitics in the United States, etc. are related tothis topic and have largely overlapping author commu-nities. We took the union of the associated revisionnetworks which are built together from 4609 revisionsby 781 different authors. This network, which is shownin Figure 4, contains several interesting subnetworks thatare extracted either by filtering in time (compare Section‘Restriction to time intervals’) or by network clustering(compare Section ‘Restriction to relevant sub-networks’).For space limitations we will describe only the globalview in Figure 4.

The dominant confrontation in this network is clearlybetween Yaf and the anonymous user 24.12.208.181

(which we abbreviate in the following with 181). (Strictlyspoken it is not clear whether the same IP implies thesame person – however, looking at the sustained interestof 181 in gun politics makes us believe that this is thecase.) Looking at Yaf’s user-page – the user-page of a userUName can be accessed under

http://en.Wikipedia.org/wiki/User:UName

– makes it rather simple to guess that he/she advocates thefreedom to carry guns. In contrast, looking at the contri-butions of 181 – all contributions of a user UName can beaccessed under

http://en.Wikipedia.org/wiki/Special:Contributions/UName

– makes it almost evident that he/she takes the oppositepoint of view. The author 181 shows a slightly unsteadyedit behavior and is therefore drawn in dark-red inFigure 4. Indeed, 181 performed almost a hundred editsin Wikipedia – all of them between November 2005 andApril 2006 and almost all to pages related to gun poli-tics. Besides differences in opinion, another distinctionbetween these two users is that Yaf is more a revisorand 181 more revised (see Section ‘Revisor vs beingrevised’). The asymmetry of the edge between these twousers is mostly due to a couple of very quick revisions(within less than 5mins) where Yaf reverts edits madeby 181.

Page 13: Visual analysis of controversy in user-generated encyclopedias

46

Interestingly, some Wikipedia authors chose a usernamethat itself expresses a certain orientation. For instance, thename GunsKill (also shown in Figure 4) already givesa indication that this author may advocate more guncontrol (looking at his/her contributions further supportsthis). It is remarkable that this user is – similarly as 181 –more revised (mostly from Rhobite) than revisor.

While names like GunsKill indicate a certain opinionwith respect to a specific topic, names like Yafnot indi-cate a negative feeling towards another Wikipedia user(Yaf in this case). Not surprisingly, Yafnot and Yaf areon opposite sides in Figure 4. Yafnot shows a very highvariance in his/her edit behavior and is therefore drawnin red. Indeed, this author made only seven contributionsto Wikipedia – all on April 2, 2006 in a period of less than2h and all to the page Gun politics in the UnitedStates. Author Yafnot is an example of a user that didnot contribute much (only seven edits) but is quite a lotinvolved in controversy (among the nine most involvedusers in Figure 4).

Looking in detail at the sequence of edits of Gunpolitics in the United States on April 2, 2006,taking into account the positions of Yaf, Yafnot, and181 in Figure 4, and considering the purposeful name ofYafnot, on could come to the hypothesis that Yafnotand 181 are the same person. Indeed, Yaf had the sameidea, as the following quote (taken from the user talkpage of Rhobite, archive nine31) indicates:

User 24.12.208.181 has apparently taken the user nameYafnot after your 2nd Level warning. He has continuedto delete content of Gun politics in the United States.Thanks. Yaf 06:14, 2 April 2006 (UTC)

It is difficult to prove this hypothesis conclusively,without access to the log-files of the Wikipedia server.In any case, user Yafnot was blocked on April 2, 2006(still less than 2h after his/her first edit) by Rhobite forimpersonation.

Three-lateral conflictsWhile the page Gun politics is related to a typicalpro/con topic, other topics can lead to more compli-cated controversy structures. The three most involvedauthors (Fossa, Perun, and Capriccio) of the pageKroatien (Croatia) in the German Wikipedia are mutu-ally connected by strong revision edges and are thereforedrawn in a triangle. (Note that, since the edge betweenPerun and Capriccio has smaller weight, these twousers are drawn closer together, as it is claimed in Section‘Derivation of the scaling factor’.) The less involvedauthors are often connected to only one of the dominantthree and are therefore drawn at the opposite position.The deeper reason for this three-polar conflict structurecan only by found out by reading statements on userpages and user-talk pages. The history of Croatia isstrongly connected to the wars following the end of the

Yugoslavian republic and disputes related to this page areoften about the interpretation of historic events or overwhether certain famous persons can be categorized asCroats or rather Yugoslavs. A rough qualitative evaluationof the related discussions suggests that some users favormore a Croatian point of view, others prefer a Serbianpoint of view, and yet others see themselves as Yugoslavs(or more generally as Europeans) and strongly oppose tonationalism of every direction. However, when analyzingthe revision network related to Kroatien and reading therelated discussions it became evident that our non-NLPapproach reaches its limitations when applied to situ-ations where we have to make such subtle distinctionsin the disagreement relations. A promising direction forfuture work would be to merge the automatically gener-ated revision network with an interaction network thathas been constructed (manually or by a semi-automaticprocedure) by a qualitative analysis of discussion pages,that is, interpreting discussion entries as supportive,neutral, or antagonistic with respect to some other authorand adding this information to the revision network.Note that usually the number of entries on discussionpages is by far smaller than the number of revisions onthe related Wikipedia pages. Thus, this procedure wouldstill be applicable to high-interest pages, since the largestpart of the work is done automatically.

Vandalism defenseA typical pattern emerges when analyzing the pageGeorge W. Bush. This page is the most edited in theEnglish Wikipedia (more than 30,000 revisions by morethan 10,000 authors), is a frequent target of vandalism,and was the first Wikipedia page that become protected(cf. Viégas et al.4).

The network visualization (see Figure 9) reveals twodominant users playing the role of revisors, which areopposed to a huge number of much less involved anony-mous alters. User Shanes is a Wikipedia administratorand user Everyking a former administrator who had thisstatus in 2005. A significant difference between the pagesGun politics (see Figure 4) and George W. Bush is thatin the former the dominant authors confront dominantalters. It is likely that the users confronting Everykingand Shanes in Figure 9 are not really interested in writinga good article but rather want to vandalize the page. Onthe other hand, the dominant authors of Gun politicsseem to care about its content, since they contributed alot (although they have quite different ideas of what is agood Gun politics page).

Figure 10 shows the layout of the same network withoutthe normalization of authors’ position to an ellipse. Thecenter of this drawing obviously gets too crowded and it ishard to recognize the positions of the unimportant actors.

Obstinate vandalsThe revision network of some pages shows a struc-ture that could be described by ‘one user against the

Page 14: Visual analysis of controversy in user-generated encyclopedias

47

Figure 14 Revision network related to Freemasonry showing a certain author (Lightbringer) in opposition to nearly anybodyelse. Note that, in contrast to Shanes and Everyking in Figure 9, Lightbringer is more revised than revisor and, in addition,shows a very high variance in the edit frequency.

rest of the world.’ Figure 14 shows the network relatedto Freemasonry around the end of 2005. The authorLightbringer is in opposition to most other authors,is more revised than revisor and shows a very unsteadyedit behavior (drawn in red). Lightbringer’s user pagereveals that he/she is not only banned from Wikipediabut also tried to continue editing pages using various IP-addresses or different user names (so-called sock puppets).

On a first glance, it seems to be hard to distinguishbetween users who are fighting many vandals (‘watch-dogs’, e.g., Shanes and Everyking in Figure 9) and userswho are very active vandals opposed by many others (asLightbringer in Figure 14), since both types of usersstand rather on their own against a mass of other users.However, differences are the following two: first, vandalsare more likely to be revised and vandal-fighters are morelikely to be revisors. Second, active vandals that eventu-ally became blocked have necessarily a high edit varianceand therefore are shown in red color.

News-triggered pagesThe edit history of some Wikipedia pages is strongly influ-enced by political events. An extremal example is the pageon Hezbollah (see Figure 12). Although this page existsin Wikipedia since October 2001, it only became a hottopic during the 2006 Israel–Lebanon conflict and calmeddown afterwards. We showed in Brandes and Lerner27

that the pages Hezbollah and 2006 Israel--Lebanonconflict indeed have a very high correlation in theiredit frequencies over time. Furthermore, these correlation

values could be used to establish relationships betweenpages with high edit variance.

ConclusionWikipedia makes it possible to assess the author commu-nity behind an article by providing the complete edithistory of a page. However, the sheer number of edits andauthors makes it hard to understand this data withoutautomatic support.

The main contribution of our work lies in the proposedtechniques for visual analysis of the revision network.Our drawings easily reveal the authors that are the mostinvolved in controversy (taking the number of edits asa measure for user involvement would be insufficientas the example of Yafnot in Section ‘Gun politics’shows). Furthermore, our network visualizations showwho confronts whom and who plays which role.

Another contribution is that we identified some recur-rent patterns of confrontation in the examples we consid-ered: both Figure 4 and Figure 9 show a high asymmetryin the sense that users on one side of the conflict play therole of revisors and users on the other side are revised.However, the interpretation of the revisor vs revisedpattern can be quite different. In Figure 4 it seems tobe caused by differences in opinion and in Figure 9 byvandalism.

One issue for future work is to determine more conclu-sively the difference between opinion-triggered andvandalism-triggered confrontation. Possibilities include

Page 15: Visual analysis of controversy in user-generated encyclopedias

48

to make use of log data about user blocking, statementson talk pages or user-talk pages, or contributions ofan author to other pages. Another issue is to improvethe construction of the revision network by taking intoaccount whose text has been changed during a revisionor by augmenting the revision network by relations deter-mined from discussion pages, as this has been outlinedin Section ‘Three-lateral conflicts’. Note that the networkvisualization technique is totally independent from thenetwork construction procedure, as long as the edgescan be interpreted as disagreements or more generally asnegative relations between actors. In particular, the visu-alization technique could be used to visualize controver-sies among users in other domains, for example, Usenetgroups or blogs.

AcknowledgementsThis research was supported by DFG under grant Br2158/2-3.

References1 Brandes U, Lerner J. Visual analysis of controversy in user-

generated encyclopedias. Proceedings of the IEEE Symposium onVisual Analytics Science and Technology (VAST’07) (Sacramento,CA), IEEE Computer Society: Chicago, 2007; 179–186 (PaperChairs: John Dill and William Ribarsky).

2 Giles J. Internet encyclopaedias go head to head. Nature 2005;438: 900–901.

3 Viégas FB, Wattenberg M, Dave K. Studying cooperation andconflict between authors with history flow visualizations.Proceedings of the SIGCHI Conference on Human Factors in ComputingSystems (Vienna, Austria), ACM: New York, 2004; 575–582(Conference Chairs: Elizabeth Dykstra-Erickson and ManfredTscheligi).

4 Viégas FB, Wattenberg M, Kriss J, van Ham F. Talk before youtype: Coordination in Wikipedia. Proceedings of 40th AnnualHawaii International Conference on System Sciences (HICSS’07), IEEEComputer Society: Chicago, 2007; 78a (Conference Chair: RalphH. Sprague, Jr.).

5 Wikipedia. List of policies [WWW document]. http://en.Wikipedia.org/wiki/Wikipedia:List_of_policies (accessed 30November 2007).

6 Kolbitsch J, Maurer H. The transformation of the Web: howemerging communities shape the information we consume.Journal of Universal Computer Science 2006; 12: 187–213.

7 Leuf B, Cunningham W. The Wiki Way. Addison-Wesley: Reading,MA, 2001.

8 Holloway T, Bozicevic M, Börner K. Analyzing and visualizing thesemantic coverage of Wikipedia and its authors. Complexity 2007;12: 30–40.

9 Stvilia B, Twindale MB, Smith LC, Gasser L. Assessing informationquality of a community-based encyclopedia. Proceedings of theInternational Conference on Information Quality (Cambridge, MA),2005; 442–454 (unpublished) (Program Chairs: Michael Gertz,Michael Mielke, and Felix Naumann).

10 Voss J. Measuring Wikipedia. Proceedings of the InternationalConference of the International Society for Scientometrics andInformetrics (Stockholm, Sweden), 2005 (unpublished).

11 Gabrilovich E, Markovitch S. Overcoming the brittlenessbottleneck using Wikipedia: Enhancing text categorizationwith encyclopedic knowledge. Proceedings of the 21st NationalConference on Artificial Intelligence (Boston, Massachusetts), AAAIPress: New York, 2006 (Program Co-chairs: Yolanda Gil andRaymond J. Mooney).

12 Strube M, Ponzetto SP. WikiRelate! computing semanticrelatedness using Wikipedia. Proceedings of the Twenty-First

National Conference on Artificial Intelligence (AAAI’06) (Boston,MA), AAAI: New York, 2006 (Program Co-chairs: Yolanda Gil, andRaymond J. Mooney).

13 Chen C, Ibekwe-SanJuan F, Sanjuan E, Weaver C. Visual analysisof conflicting opinions. IEEE Symposium on Visual Analytics’06(Baltimore, MD), IEEE Computer Society: Chicago, 2006; 59–66(Paper Chairs: Pak Chung Wong and Daniel Keim).

14 Glance N, Hurst M, Nigam K, Siegler M, Stockton R, Tomokiyo T.Deriving marketing intelligence from online discussion.Proceeding of the Eleventh ACM SIGKDD International Conference onKnowledge Discovery in Data Mining (Chicago, Illinois), ACM: NewYork, 2005; 419–428 (General Chairs: Robert Grossman, ProgramChairs: Roberto Bayardo and Kristin Bennet).

15 Liu B, Hu M, Cheng J. Opinion observer: analyzing andcomparing opini ons on the Web. Proceedings of the 14thInternational Conference on World Wide Web (Chiba, Japan), ACM:New York, 2005; 342–351 (Conference Chairs: Allan Ellis andTatsuya Hagino).

16 Nigam K, Hurst M. Towards a robust metric of opinion. Proceedingsof the AAAI Spring Symposium on Exploring Attitude and Affectin Text (Palo Alto, CA), 2004 (AAAI Technical Report SS-04-07)(Co-Chairs: Yan Qu, James G. Shanahan, and Janyce Wiebe).

17 Agrawal R, Rajagopalan S, Srikant R, Xu Y. Mining newsgroupsusing networks arising from social behavior. Proceedings of the 12thInternational Conference on World Wide Web (Budapest, Hungary),ACM: New York, 2003; 529–535 (Conference Chairs: GusztávHencsey and Bebo White, Program Chairs: Yih-Farn Robin Chen,László Kovács, and Program Chair: Steve Lawrence).

18 Rosecrance RN. Bipolarity, multipolarity, and the future. Journalof Conflict Resolution 1966; 10: 314–327.

19 Kittur A, Suh B, Pendleton BA, Chi EH. He says, she says:conflict and coordination in Wikipedia. Proceedings of the SIGCHIConference on Human Factors in Computing Systems (San Jose,CA), ACM: New York, 2007; 453–462 (General Chair: Mary BethRosson, Program Chair: David Gilmore).

20 Suh B, Chi EH, Pendleton BA, Kittur A. Us vs. them:understanding social dynamics in Wikipedia with revert graphvisualizations. Proceedings of the IEEE Symposium on VisualAnalytics Science and Technology (VAST’07) (Sacramento, CA), IEEEComputer Society: Chicago, 2007; 163–170 (Papers Chairs: JohnDill and William Ribarsky).

21 Brandes U, Fleischer D, Lerner J. Summarizing dynamicbipolar conflict structures. IEEE Transactions on Visualization andComputer Graphics, special issue on Visual Analytics 2006; 12:1486–1499.

22 Wikimedia. Downloads [WWW document]. http://download.wikimedia.org/ (accessed 30 November 2007).

23 SAX. Simple API for XML [WWW document]. http://www.saxproject.org/ (accessed 30 November 2007).

24 DOM. Document object model [WWW document]. http://www.w3.org/DOM/ (accessed 30 November 2007).

25 Kaufmann M, Wagner D. Drawing Graphs. Springer Verlag: NewYork, NY.2001.

26 Golub GH, van Loan CF. Matrix Computations. John HopkinsUniversity Press: Baltimore, MA.1996.

27 Brandes U, Lerner J. Revision and co-revision in Wikipedia.Proceedings of the International Workshop on Bridging the GapBetween Semantic Web and Web 2.0 at the 4th European SemanticWeb Conference (ESWC’07) (Innsbruck, Austria), 2007; 85–96(unpublished) (Workshop Chairs: Bettina Hoser and AndreasHotho).

28 Gaertler M. Clustering. In: Brandes U and Erlebach T (Eds).Network Analysis. Springer Verlag: New York, NY, 2005; 187–215.

29 Kannan R, Vempala S, Vetta A. On clusterings: good, bad andspectral. Journal of the ACM 2004; 51: 497–515.

30 McSherry F. Spectral partitioning of random graphs. Proceedingsof the 42nd Annual IEEE Symposium on Foundations of ComputerScience (FOCS’01) (Las Vegas, Nevada), IEEE Computer Society:Chicago, 2001; 529–537 (Program Chair: Moni Naor).

31 Wikipedia. Archived user talk page of [WWW document]. http://en.Wikipedia.org/wiki/User_talk:Rhobite/Archive_9 (accessed 30November 2007).