
Motif-based success scores in coauthorship networks are highly sensitive to author name disambiguation∗

David F. Klosik† and Stefan Bornholdt‡

Institute for Theoretical Physics, University of Bremen, Hochschulring 18, 28359 Bremen, Germany

Marc-Thorsten Hütt§

School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany

Following the work of Krumov et al. [Eur. Phys. J. B 84, 535 (2011)] we revisit the question whether the usage of large citation datasets allows for the quantitative assessment of social (by means of coauthorship of publications) influence on the progression of science. Applying a more comprehensive and well-curated dataset containing the publications in the journals of the American Physical Society during the whole 20th century we find that the measure chosen in the original study, a score based on small induced subgraphs, has to be used with caution, since the obtained results are highly sensitive to the exact implementation of the author disambiguation task.

PACS numbers: 89.75.Hc, 89.65.-s, 01.30.-y

I. INTRODUCTION

Ever since the seminal work of Kuhn [1] it is widely accepted that the institutional process of knowledge production, i.e. science, cannot be fully described in purely logical, content-related terms, but has a significant social aspect to it. However, although the scientific community provides a comprehensive bookkeeping of its efforts by citing earlier work in new publications, only recently has this information been made widely accessible in the form of electronic datasets. With the aggregated citation information within a set of scientific publications at hand, one might now be able to quantitatively assess the extent to which the social embedding of science influences its structure and progression.

Traditionally, the focus of citation data analysis has been on the single publication level; indeed, the most prominent property of paper citation networks, in which the vertices represent papers while directed edges represent the citations between the publications, was described by de Solla Price as early as 1965 [2]: The number of citations a paper receives (i.e., the corresponding node's in-degree) is broadly distributed, rendering highly cited publications significantly more frequent than they would be if scientists cited earlier work randomly. Similar broad degree distributions have been found in networks describing, e.g., technical, social, or biological interactions [3], and the question of what the common underlying process might be led to a now well-studied model class for network growth governed by a rich-get-richer effect, commonly referred to as preferential attachment [4–11]. In more recent times, citation data itself has mainly been used to quantify the assessment of scientific research, thereby interpreting citations as indicators of impact or assignment of credit. To this end, numerous quantitative measures have been described which range from counting direct citations to considering also indirect citation paths [12–16], some of which aim at the scholar [17, 18] or the journal level [19]. Additionally, there have also been structural investigations, e.g., regarding the community level [20] or other topological properties such as the richness of feed-forward loops in citation networks [21].

∗ Published as Phys. Rev. E 90, 032811 (2014)

† [email protected]

‡ [email protected]

§ [email protected]

Another line of research has considered the collaboration network that can be constructed from citation data given the authorship metadata. More precisely, in a coauthorship network vertices represent authors and are connected by an undirected link if the two corresponding authors have coauthored one or more papers together. These networks were investigated about a decade ago [22, 23] with the main findings being the rather broad degree distribution and the strong small-world effect, i.e., the short paths between scholars in the network. There are also a few approaches combining both citation and collaboration data, e.g., [24].

In this study we will construct a coauthorship network from a citation dataset, while the actual citation information is used to assess the success of the resulting links in the network.
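The construction described above can be illustrated with a minimal Python sketch. It assumes that author names have already been disambiguated into labels; the function name `build_coauthorship_edges` and the toy input are hypothetical and not taken from the APS data or from the original study.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship_edges(papers):
    """Map each unordered author pair to the list of paper ids they share.

    `papers` maps a paper id to its list of (already disambiguated) author labels.
    """
    edge_papers = defaultdict(list)
    for paper_id, authors in papers.items():
        # sorted(set(...)) drops repeated labels and fixes the order within a pair
        for a, b in combinations(sorted(set(authors)), 2):
            edge_papers[(a, b)].append(paper_id)
    return dict(edge_papers)


# Hypothetical toy input: paper id -> author labels
papers = {
    "p1": ["A", "B"],
    "p2": ["A", "B", "C"],
    "p3": ["C", "D"],
}
edges = build_coauthorship_edges(papers)
```

Each key of `edges` is one undirected link, and its value plays the role of the list of jointly published papers attached to that link.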

While there have been studies about large-scale properties of coauthorship networks such as path lengths or community sizes and, of course, on the single-vertex scale, following Krumov et al. [25] here the focus will be on the intermediate level of small induced subgraphs known from the investigation of network motifs which originated in the biological context [26, 27] and whose application has drawn some considerable criticism since the null-model graph ensemble has to be chosen with great care [28, 29]. Note, however, that throughout this study score does not refer to the otherwise commonly used z score, since we are not interested in subgraph frequencies, but to the $q_m$ value that is defined below. We will focus on the three- and four-node undirected induced subgraphs shown in Fig. 1.

arXiv:1403.7827v2 [physics.soc-ph] 2 Oct 2014

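[Figure 1: the eight subgraphs, each labeled with the minimum number of distinct papers required to build it: Motif 0 (2), Motif 1 (1), Motif 2 (3), Motif 3 (3), Motif 4 (2), Motif 5 (4), Motif 6 (2), Motif 7 (1).]

FIG. 1. The subgraphs used in this study. In the bottom line the minimum number of distinct papers required to build the subgraph is given.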

We construct the collaboration network from a citation dataset provided by the American Physical Society (see Materials and Methods) by, first, identifying the articles' authors given in the article metadata. All author instances (i.e., names in author lists) are then grouped so that, ideally, all instances corresponding to the same actual scholar are represented by a single vertex in the network. There are different possible implementations of this author disambiguation task which has become an object of investigation in its own right [30, 31] with methods ranging from directly comparing the author names to incorporating metadata or even citation data. While in Krumov et al. one specific version has been used, here we find that the score proposed to assess the correlation between the collaboration pattern and the success of the corresponding papers is very sensitive to the exact implementation of the author disambiguation. We stress that this kind of ambiguity is not restricted to coauthorship networks but is also present in other network applications (see Sec. V). In the bibliography context, however, some effects of disambiguation errors have been pointed out in [32].

Furthermore, we show how the score is affected by the exclusion of large collaborations when the length of the author list above which papers are discarded is varied; in [25] this value was fixed at a number of eight.

II. COMPUTATION

First, we review the computation of the score proposed by Krumov et al. Two distinct authors who have coauthored are connected by an edge, $e$, that represents the list of all their jointly published papers, $P(e)$. Note that by this procedure a single publication can be represented by many edges. In order to assess whether there is a correlation between the collaboration structure on the small subgraph scale and the scientific impact of the publications contained in the subgraph's edges, the number of citations to those papers is used to, first, compute the average number of citations to the papers of a single edge

$$ w_2 = \langle c \rangle_e = \frac{1}{|P(e)|} \sum_{p \in P(e)} c(p) \tag{1} $$

and then take the average over all edges of all instances of a specific motif, $M_m$, i.e.,

$$ q_m = \frac{1}{N_m} \sum_{m' \in M_m} q_{m,m'} = \frac{1}{N_m E_m} \sum_{m' \in M_m} \sum_{e \in m'} \langle c \rangle_e \tag{2} $$

where $N_m = |M_m|$ gives the number of instances of the subgraph of type $m$ and $E_m$ yields the number of its edges (e.g., for the box motif $E_5 = 4$). A sketch of the procedure is given in Fig. 2. If one now shuffles the citation numbers among all publications, the $q_m$ is uniformly distributed over the motif types, indicating that the motif-scale collaboration patterns in the given coauthorship network and the papers' success are not correlated after reshuffling. With the particular choice of (1) the motif scores are expected to yield the average number of citations of all papers in the network for every motif. The fact that this holds in all our computations implies that we obtain proper averages of the often broadly distributed citation numbers. In [25] alternative edge scores have been proposed but with the current dataset (1) turns out to be the appropriate choice. It is important to notice that the shuffling procedure only affects the citation data on the edges of the otherwise fixed collaboration network, i.e., no topological shuffling is applied.
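As an illustration of Eqs. (1) and (2) and of the citation shuffling, the following Python sketch may help. It assumes that the motif instances of one type have already been enumerated and are given as lists of edges; all function names and the data layout are hypothetical choices for this sketch, not the implementation used in the study.

```python
import random
from statistics import mean


def edge_score(paper_ids, citations):
    """Eq. (1): mean citation count of the papers on one edge."""
    return mean(citations[p] for p in paper_ids)


def motif_score(instances, edge_papers, citations):
    """Eq. (2): average of the per-instance scores q_{m,m'} of one motif type.

    `instances` is a list of motif instances, each a list of edges (author
    pairs) that are keys of `edge_papers`.
    """
    per_instance = [
        mean(edge_score(edge_papers[e], citations) for e in inst)
        for inst in instances
    ]
    return mean(per_instance)


def shuffled_citations(citations, rng):
    """Null model: permute citation counts among papers; topology is untouched."""
    ids = list(citations)
    counts = [citations[p] for p in ids]
    rng.shuffle(counts)
    return dict(zip(ids, counts))
```

Calling `motif_score` repeatedly with the output of `shuffled_citations` gives the shuffled reference values; averaging over several such runs corresponds to the flat reference lines mentioned in the figure captions.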

Krumov et al. report considerably higher $q_m$ scores for the four-node motif called Motif 5 in Fig. 1 which they call the box motif. This subgraph stands out since it needs four distinct publications to be constructed (while e.g. the four-node clique, Motif 7, might contain only one single jointly published paper) and there must not be any other collaborations between the four authors than in the author pairs corresponding to the four edges. The box motif therefore is an anticlustered structure and considerably fewer instances are found than of the other motif types. Again, we stress that we do not focus on motif frequency.
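The defining property of the box motif, four edges forming a cycle with no further collaboration among the four authors, can be checked directly. The sketch below is a brute-force illustration for small graphs only; the adjacency-dictionary layout and the names `is_box` and `find_boxes` are assumptions made for this example.

```python
from itertools import combinations


def is_box(nodes, adj):
    """True if the subgraph induced by four nodes is a chordless 4-cycle."""
    nodes = set(nodes)
    # In an induced 4-cycle every node has exactly two neighbours inside the set;
    # a chord or a missing edge changes at least one of these counts.
    return len(nodes) == 4 and all(len(adj[v] & nodes) == 2 for v in nodes)


def find_boxes(adj):
    """Brute-force search over all 4-node sets; only sensible for tiny graphs."""
    return [quad for quad in combinations(sorted(adj), 4) if is_box(quad, adj)]
```

The degree test works because the 4-cycle is the only graph on four nodes in which every node has degree two.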

In order to keep the network topology and the dynamical quantity separated, in contrast to [25] we do not discard edges which are composed only of papers that did not receive any citation, but keep them in the network.

A. Maximum collaboration size

As mentioned above, in [25] all papers with more than eight authors have been discarded. Here we allow for different values of the maximum collaboration size (MCS) and investigate its influence on the $q_m$ score (Sec. IV A). In terms of the social aspects encoded in a coauthorship network the exclusion of very large collaborations can be argued for since the thousands of links between their authors will hardly represent the same degree of personal connection as an edge between, e.g., two scholars publishing in a team of two.
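A minimal sketch of the MCS filter, assuming papers are stored as a mapping from paper id to author list (a hypothetical layout). The helper `max_new_edges` only illustrates how quickly the number of author pairs grows with the size of a collaboration.

```python
def filter_by_mcs(papers, mcs):
    """Keep only papers whose author list has at most `mcs` entries."""
    return {pid: authors for pid, authors in papers.items() if len(authors) <= mcs}


def max_new_edges(a):
    """Number of author pairs (potential edges) in one paper with `a` authors."""
    return a * (a - 1) // 2
```

For instance, a single paper with 100 authors corresponds to 4950 author pairs, while a two-author paper corresponds to one.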


B. Author name disambiguation

While one of the more elaborate author disambiguation schemes might have been used here, in order to assess the sensitivity of the score proposed in [25] we instead chose two versions of a very simple measure which consider only the authors' names as provided in the author lists of the publications in the dataset [33]. In the all initials (allINIT) disambiguation two author instance names are considered identical if in addition to the surname all initials are compatible (meaning that J. Smith and John Smith are merged). The second implementation requires the full first name strings to match; we consider this strict (STRICT) since it will separate author instances if the first name is abbreviated in one and written out in the other case like the ones from the above example. However, there are surnames common enough to be shared by people with different given names; if these given names have compatible initials the allINIT method will spuriously merge them. In order to address this we apply a third disambiguation which we will refer to as SPLIT: We apply independently the allINIT and STRICT disambiguation to the same data and then track those allINIT names that are split into more than two distinct authors in the STRICT implementation. All papers with those names among the authors are then filtered out. On the resulting dataset we then perform all remaining steps of the calculation with the allINIT method applied. Note that the result of this procedure depends on all previous data filtering, especially the one according to the MCS, so the number of remaining papers has to be compared to the number of publications in the original data after discarding those with more than MCS authors. In the cases shown here the data had been prepared with an MCS value of 10 before the application of the SPLIT filtering and we find that while for the merge of the journals 36% and in PRC 32% are filtered out, in the remaining single journal portions around 80% of the papers are kept. For comparison, if one excluded all papers from the dataset that had at least one author whose given name is only provided in initialized form, not only would over half of the dataset be excluded, but also this filtering would be biased against large collaborations since there name abbreviation is used to save printing space.
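The three schemes can be sketched in Python as follows. This is an illustration under explicit assumptions, not the implementation used in the study: author instances are taken to be `(given, surname)` tuples, "compatible initials" is read as "identical sequence of initials", and all function names are hypothetical.

```python
from collections import defaultdict


def initials(given):
    """Initials of a given-name string, e.g. 'John A.' -> ('J', 'A')."""
    return tuple(part[0].upper() for part in given.replace(".", " ").split())


def allinit_key(given, surname):
    # Assumption: "compatible" is read as "identical sequence of initials".
    return (surname.lower(), initials(given))


def strict_key(given, surname):
    # The full given-name string must match (after whitespace/case cleanup).
    return (surname.lower(), " ".join(given.lower().split()))


def split_flagged(instances, threshold=2):
    """allINIT keys that STRICT splits into more than `threshold` distinct names."""
    variants = defaultdict(set)
    for given, surname in instances:
        variants[allinit_key(given, surname)].add(strict_key(given, surname))
    return {key for key, names in variants.items() if len(names) > threshold}


def split_filter(papers):
    """SPLIT: drop every paper that has an author whose allINIT key is flagged."""
    flagged = split_flagged(a for authors in papers.values() for a in authors)
    return {
        pid: authors
        for pid, authors in papers.items()
        if not any(allinit_key(*a) in flagged for a in authors)
    }
```

In this sketch J. Smith, John Smith, and Jane Smith share one allINIT key but form three STRICT names, so every paper listing any of them would be removed by `split_filter`.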

III. MATERIALS AND METHODS

The citation data used in this study is composed of all APS journals published between July 1893 and December 2009 and the associated information about citations between these papers, and may therefore be assumed to cover many major 20th century contributions to physics. The dataset can be requested at the American Physical Society [34]. The data is provided in separate files corresponding to the individual APS journals which emphasize different topics; we chose to exclude Reviews of Modern Physics and the online-only journals Physical Review Special Topics - Accelerators and Beams and Physics Education Research due to their considerably smaller size compared to the remaining journals. These remaining journals vary in size from Physical Review E with 35022 to Physical Review B with 133269 publications. In addition to extensive bibliographic metadata such as authorship, publication history, and PACS numbers the dataset also provides article type tags, thereby allowing us to filter nonstandard material (i.e., those tagged comment, erratum, reply, editorial, essay, publisher-note, retraction, miscellaneous) and restrict our analysis to standard research publications.

[Figure 2: flow from the data set via filtering (e.g. PRE-only), network construction, and author disambiguation to an example four-author subgraph (Petter Minnhagen, Petter Holme, M. E. J. Newman, Kim Sneppen) whose edges carry lists of papers with their citation counts $c_i$, from which $w_2$ and $q_{m,m'}$ are computed.]

FIG. 2. (Color online) A sketch of the score computation.

Since motif enumeration is computationally costly, for most of the calculations the RANDESU algorithm as described by Wernicke [35] is applied which instead of enumerating all subgraphs samples uniformly from the set of all motifs. By performing duplicate runs of the motif score computation with the same parameters we checked that the scores are not sensitive to the sampling procedure. For smaller networks the full enumeration is performed by application of the ESU algorithm, presented also in [35].
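For orientation, here is a compact Python sketch of the ESU enumeration scheme with optional per-depth sampling probabilities in the spirit of the randomized variant. It is a simplified illustration, not the code used for the paper: the adjacency-dictionary input, the requirement that node labels be comparable, and the parameter names `probs` and `rng` are assumptions of this sketch.

```python
import random


def esu(adj, k, probs=None, rng=None):
    """Yield the node sets of connected induced k-node subgraphs, each at most once.

    adj:   dict mapping each node to the set of its neighbours (undirected graph,
           node labels must be mutually comparable).
    probs: optional list of k per-depth probabilities; None means full enumeration.
    """
    rng = rng or random.Random()
    probs = probs or [1.0] * k

    def extend(sub, ext, root, closed):
        # `closed` holds the current subgraph plus all of its neighbours; a node
        # outside `closed` is an exclusive neighbour of the newly added node.
        if len(sub) == k:
            yield frozenset(sub)
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            if rng.random() >= probs[len(sub)]:
                continue  # sampling: skip this branch of the search tree
            new_ext = ext | {u for u in adj[w] if u > root and u not in closed}
            yield from extend(sub | {w}, new_ext, root, closed | adj[w] | {w})

    for v in adj:
        if rng.random() < probs[0]:
            yield from extend({v}, {u for u in adj[v] if u > v}, v, adj[v] | {v})
```

With all probabilities equal to one, every connected induced subgraph of size `k` is visited exactly once; with smaller probabilities each subgraph is reached with probability equal to the product of the per-depth values.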

IV. RESULTS

We computed the $q_m$ score in both the single journal portions and the whole APS dataset and found that the shuffling of the citation frequencies among the papers indeed yields a uniform distribution of the scores. As mentioned above, in all cases the appropriate edge weight to achieve this was $w_2 = \langle c \rangle_e$, i.e., the averaged citation frequency of all papers on the corresponding edge. In the following we show the influence of a variation of the maximal number of authors above which a paper is excluded from the data, as well as the sensitivity of the $q_m$ score against different implementations of the author disambiguation task.

A. MCS scan

In Fig. 3 the average network degree as a function of the MCS value is shown for both the allINIT and the STRICT disambiguation. Increasing the MCS value translates into the introduction of new potential nodes and edges to the network, and since a publication with $a$ authors can produce up to $a(a-1)/2$ new edges, $\langle k \rangle$ grows. Unlike PRA, PRB and PRE, the journals PRC, PRD and PRL which feature larger collaborations do not show a saturated average degree in the shown MCS interval. The fact that the STRICT disambiguation results in more distinct authors than the allINIT implementation is reflected by the lower average degrees.
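The dependence of the average degree on the MCS value can be reproduced on toy data with a short Python sketch. The data layout and the function name are hypothetical, and the sketch assumes that authors appearing only in discarded papers are not counted as nodes.

```python
from itertools import combinations


def average_degree(papers, mcs):
    """Mean degree <k> = 2E/N of the network built from papers with <= mcs authors."""
    nodes, edges = set(), set()
    for authors in papers.values():
        if len(authors) > mcs:
            continue  # paper exceeds the maximum collaboration size
        unique = sorted(set(authors))
        nodes.update(unique)
        # storing pairs in a set counts each coauthor link once
        edges.update(combinations(unique, 2))
    return 2 * len(edges) / len(nodes) if nodes else 0.0


# Hypothetical toy data: paper id -> author labels
papers = {"p1": ["A", "B"], "p2": ["B", "C", "D"], "p3": ["A", "B", "C", "D", "E"]}
scan = {mcs: average_degree(papers, mcs) for mcs in (2, 3, 5)}
```

In this toy example the average degree rises from 1.0 to 2.0 to 4.0 as papers with three and five authors are admitted.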

[Figure 3: seven panels (PRA, PRB, PRC, PRD, PRE, PRL, all journals) showing the average degree $\langle k \rangle$ versus MCS (0 to 80) for the allINIT and the STRICT disambiguation.]

FIG. 3. A scan of the MCS value translates to a variation of the average degree.

Especially with the STRICT disambiguation scheme we can reproduce the main finding of [25], the higher $q_m$ score for the box motif, for small enough values of the maximum collaboration size. Although in a few journal portions this result is rather robust, in most cases the box motif signal tends to decrease with increasing MCS; Fig. 4 shows two examples: In the merge of the journals the box motif keeps the highest score in the given MCS range, while in PRL the box motif signal not only gets less pronounced but is lost at MCS ≈ 15.

B. Influence of the author disambiguation

The exact implementation of the author disambiguation turned out to be crucial considering the $q_m$ distribution. From the network perspective it can be interpreted as a local perturbation (as illustrated in the sketch in Fig. 5) which strongly influences the $q_m$ scores computed on the few-node subgraph scale. While the SPLIT disambiguation shows a rather similar behavior to the STRICT case (only PRA and PRL do not show a maximum score for the box motif in the former case), by switching from the STRICT to the allINIT disambiguation scheme the $q_m$ score distribution can change qualitatively. For example, the journal merge does not show the box motif signal any longer and also in the case of PRB and PRE shown in Fig. 5 the box motif scores are suppressed.

[Figure 4: two panels showing $q_m$ versus motif type (0 to 7): PRL with MCS = 5, 8, 10, 15, 20, 25 and all journals with MCS = 5, 7, 8, 10, 15.]

FIG. 4. (Color online) Two example MCS scans (STRICT disambiguation). The box motif signal is weaker for larger MCS. The almost flat lines correspond to the averaged results of 30 runs with shuffled citation counts.

C. Distributions of the qm,m′ scores

The $q_m$ are averaged values over the $q_{m,m'}$ scores of the single instances of the specific motif type $m$ and as such are influenced of course not only by the top-ranked but also by the motif instances with lowest scores. In Fig. 6 the distributions of the $q_{m,m'}$ values for the different motif types are shown for the example of the PRB network portion. In the depicted range the distributions can be grouped according to the minimal number of distinct papers required to construct the specific motif (bottom row in Fig. 1). In the box motif, which requires the largest minimum number of distinct papers, very low $q_{m,m'}$ scores are suppressed, while the three- and four-node cliques, which need only one joint publication among the authors, often show very small scores. This is due to the distribution of the citation counts used in the computation of the scores: The many poorly cited publications can translate directly to low scores for cliques while it is unlikely for the four edges of a box motif, which carry at least four distinct papers, to exclusively consist of poorly cited papers. Indeed, the $q_{m,m'}$ distributions of the motif types 0, 4, 6 and 2, 3, which require the intermediate number of 2 and 3 papers, respectively, are consistent with this interpretation. This renders the investigation of the top motif instances according to the $q_{m,m'}$ scores less promising.

[Figure 5: four panels showing $q_m$ versus motif type (0 to 7): PRB (MCS = 5, 7, 8, 10, 15, 20) and PRE (MCS = 5, 7, 8, 10), each for the STRICT and the allINIT disambiguation.]

FIG. 5. (Color online) The influence of the local topological perturbation introduced by switching between the STRICT (top) and the allINIT (bottom) disambiguation scheme for the PRB and the PRE single journal portions. Again, the almost flat lines correspond to the averaged results of 30 runs with shuffled citation counts.

A paper with $a$ authors contributes to $a(a-1)/2$ edges which in turn may contribute to very many motif instances. One corresponding effect can be seen when examining the top motif instances according to the $q_{m,m'}$ scores: These lists can be dominated by highly-cited publications. For example, more than half of the top motifs in the PRA journal portion (SPLIT disambiguation, MCS = 10) share the publication "Quantum computation with quantum dots" by DiVincenzo and Loss [36]. If one is not, however, interested in compiling lists of top instances, but only in averaged values like the $q_m$, another approach might be used. The $q_m$ for a specific motif type $m$ can be rephrased as a weighted sum over the citation counts of all papers that contribute to instances of that motif; the weights, however, depend not only on how often the paper contributes but also on the numbers of other papers it shares its edges with:

$$ q_m = \sum_{i \in P} c_i \, \frac{1}{N_m E_m} \sum_{m' \in M_m} \sum_{e \in m'} \frac{1}{|P^{(i)}(e)|}, \qquad |P^{(i)}(e)| = \infty \ \text{if } i \notin P(e) \tag{3} $$

with $P$ being the set of all papers. A simple alternative would be to set these weights to $\delta_{i \in M_m}/|P_m|$ where $P_m$ is the set of all papers that at least once contribute to a subgraph instance of type $m$, i.e., to compute the average citation frequency of the papers that contribute to the motif type in question but consider each publication only once.

[Figure 6: normalized frequency (0.00 to 0.16) versus $q_{m,m'}$ (0 to 45) for PRB, one curve per motif type; the legend groups the curves as "only one paper needed" (motif types 1, 7), "two papers needed" (motif types 0, 4, 6), and "three papers needed" (motif types 2, 3), and lists motif type 5 separately.]

FIG. 6. (Color online) Normalized frequency of the $q_{m,m'}$ scores for the different motif types. The PRB journal portion has been used with MCS = 10 and the SPLIT filtering has been applied.
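The weighted-sum form of the score and the simpler per-paper average can be compared with a small Python sketch. It assumes, as the weighted-sum formula suggests, that a paper contributes $1/|P(e)|$ for each edge it lies on and nothing otherwise; the function names and the data layout are hypothetical.

```python
from collections import defaultdict


def paper_weights(instances, edge_papers):
    """Per-paper weights multiplying c_i in the weighted-sum form, for one motif type."""
    n_m, e_m = len(instances), len(instances[0])
    weights = defaultdict(float)
    for inst in instances:
        for e in inst:
            for p in edge_papers[e]:
                # a paper that is not on edge e contributes 1/inf = 0 and is skipped
                weights[p] += 1.0 / (len(edge_papers[e]) * n_m * e_m)
    return dict(weights)


def contributing_paper_average(instances, edge_papers, citations):
    """Alternative: plain mean over the distinct papers appearing in the motif type."""
    papers = {p for inst in instances for e in inst for p in edge_papers[e]}
    return sum(citations[p] for p in papers) / len(papers)
```

In the weighted sum a paper that sits alone on an edge shared by many instances receives a large weight, whereas the alternative counts every contributing paper exactly once.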

Indeed, the box motif shows the highest average citation counts compared to the other motif types for all single journals as well as the merge of all journals independently of the choice of the disambiguation scheme and for MCS values not smaller than 5. In Fig. 7 these citation averages are given for the example of the PRB and the PRE data and an MCS value of 10 (compare to Fig. 5). In the figure a box plot of the average of a random sample of size $|M_m|$ drawn from the citation distribution is given, showing that the average citation count for the box motif lies significantly above the sample average and is therefore not just due to the smaller sample size. However, since uniform sampling assumes that every paper is a part of every motif type with the same probability, the sampling procedure might be refined to incorporate the specific conditions imposed by the different motif types. In a first estimation, we restricted the samples for the box motif to those papers that have at least two coauthors with at least degree 2 each (i.e., have at least one other collaborator) for the case of the SPLIT disambiguation. Although this increases the sample average, the measured average citation count is still several (at least 5) standard deviations above the sample average.
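The comparison against random samples can be sketched as follows. This is only an illustration: whether the samples are drawn with or without replacement is not stated above, so sampling without replacement is an assumption here, and the function names are hypothetical.

```python
import random
from statistics import mean, stdev


def sample_means(all_counts, sample_size, n_samples, rng):
    """Means of `n_samples` uniform random samples (drawn without replacement)."""
    return [mean(rng.sample(all_counts, sample_size)) for _ in range(n_samples)]


def deviation_in_sd(observed, means):
    """Distance of an observed average from the sample means, in standard deviations."""
    return (observed - mean(means)) / stdev(means)
```

Restricting `all_counts` to the papers that satisfy a motif-specific condition gives the refined sampling described above.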

V. DISCUSSION AND CONCLUSION

[Figure 7: four panels (PRB and PRE, each with MCS = 10, for the allINIT and the STRICT disambiguation) showing averaged citations versus motif type (0 to 7).]

FIG. 7. (Color online) The average citation counts of the papers that contribute to at least one instance of a given motif type for the example of the PRB and PRE data. Compare to Fig. 5 where also the allINIT and the STRICT disambiguation are shown.

Motivated by the question to what extent small-scale patterns of collaboration among scholars influence the recognition of the resulting papers within the scientific community (in terms of the number of citations the publications receive) we searched for induced few-node subgraphs (known from motif analysis) in a comprehensive citation dataset provided by the American Physical Society. We then assigned a score to the detected subgraphs as described in Krumov et al. [25] based on the citation counts of the corresponding publications so that an average value, $q_m$, for every subgraph type $m$ can be computed. It turns out that this score is sensitive to certain details of the network aggregation from the data and the main result of the aforementioned study [25], i.e., the highest score for the anticlustered box motif, cannot be seen clearly in all cases: For example, discarding different amounts of the largest collaborations, which translates to a variation of the average network degree, influences the $q_m$ signature. A stronger effect can be seen when the network is locally perturbed by changing the implementation of the author disambiguation task; in the all initials method of author disambiguation the box-motif score can be suppressed, while a stricter disambiguation scheme yields a high score. Consequently, such a score should be used with care. This result might be of interest in other applications of few-node induced subgraph-based scores to data which is incomplete or not unambiguously processed into a graph representation since it highlights a possible difficulty on the motif scale in addition to the well-known problems concerning the proper null-model selection in the usual motif analysis. While the ambiguity concerning whether two author names are to be translated into one or two nodes is exclusive to coauthorship networks, several other types of complex networks have related underlying disambiguation tasks: Enzyme-centric metabolic networks (where the nodes are enzymes and a link represents common substrates and products) can also be formulated on the level of reactions (see, e.g., [29]); for many organisms, the enzyme inventory of the genome is incomplete [37, 38] and the enzyme-to-reaction mapping introduces similar issues as the author disambiguation. On the basis of the present investigation we expect that the topological properties of enzyme-centric and reaction-centric metabolic networks show similar differences as the coauthorship networks under disambiguation schemes of different stringency. For bacteria, transcriptional regulatory networks (where the nodes are genes and a directed link from gene $g_1$ to gene $g_2$ represents the regulation of $g_2$ by a transcription factor produced by $g_1$) are sometimes more adequately described on the level of groups of genes, called operons, under common regulation (see, e.g., [39]). In eukaryotic organisms, attempts to apply this concept lead to ambiguities [40], similar to those encountered in the author disambiguation. In computational neuroscience, the topology of cortical area networks is often analyzed on the basis of "regions of interest" (ROI) from diffusion spectrum imaging, rather than functionally defined cortical areas [41]. Relating these ROI to the functional cortical areas, and comparing results obtained with different segmentations of the cerebral cortex, again, is comparable to the author disambiguation task.

ACKNOWLEDGMENTS

We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) under Contracts No. BO 1242/6-1 and No. HU 937/9-1.

[1] T. S. Kuhn, The Structure of Scientific Revolutions (University of Chicago Press, Chicago, 1996).

[2] D. J. D. S. Price, Science 149, 510 (1965).

[3] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).

[4] D. J. D. S. Price, J. Amer. Soc. Inform. Sci. 27, 292 (1976).

[5] A.-L. Barabási and R. Albert, Science 286, 509 (1999).

[6] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Phys. Rev. Lett. 85, 4633 (2000).

[7] S. N. Dorogovtsev and J. F. F. Mendes, Phys. Rev. E 63, 056125 (2001).

[8] S. N. Dorogovtsev and J. F. F. Mendes, Phys. Rev. E 62, 1842 (2000).

[9] G. Bianconi and A.-L. Barabási, Phys. Rev. Lett. 86, 5632 (2001).


[10] P. L. Krapivsky and S. Redner, Phys. Rev. E 63, 066123 (2001).

[11] P. L. Krapivsky, S. Redner, and F. Leyvraz, Phys. Rev. Lett. 85, 4629 (2000).

[12] F. Radicchi, S. Fortunato, and C. Castellano, Proc. Natl. Acad. Sci. USA 105, 17268 (2008).

[13] J. Bollen, H. Van de Sompel, A. Hagberg, and R. Chute, PLoS ONE 4, e6022 (2009).

[14] P. Chen, H. Xie, S. Maslov, and S. Redner, Journal of Informetrics 1, 8 (2007).

[15] D. Walker, H. Xie, K.-K. Yan, and S. Maslov, Journal of Statistical Mechanics: Theory and Experiment 2007, P06010 (2007).

[16] D. F. Klosik and S. Bornholdt, arXiv:1301.7471 (2013).

[17] J. E. Hirsch, Proc. Natl. Acad. Sci. USA 102, 16569 (2005).

[18] F. Radicchi, S. Fortunato, B. Markines, and A. Vespignani, Phys. Rev. E 80, 056103 (2009).

[19] J. D. West, T. C. Bergstrom, and C. T. Bergstrom, College & Research Libraries 71, 236 (2010).

[20] P. Chen and S. Redner, Journal of Informetrics 4, 278 (2010).

[21] Z.-X. Wu and P. Holme, Phys. Rev. E 80, 037101 (2009).

[22] M. E. J. Newman, Phys. Rev. E 64, 016131 (2001).

[23] M. E. J. Newman, Phys. Rev. E 64, 016132 (2001).

[24] T. Martin, B. Ball, B. Karrer, and M. E. J. Newman, Phys. Rev. E 88, 012814 (2013).

[25] L. Krumov, C. Fretter, M. Müller-Hannemann, K. Weihe, and M.-T. Hütt, Eur. Phys. J. B 84, 535 (2011).

[26] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Science 298, 824 (2002).

[27] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon, Science 303, 1538 (2004).

[28] C. Fretter, M. Müller-Hannemann, and M.-T. Hütt, Phys. Rev. E 85, 056119 (2012).

[29] M. E. Beber, C. Fretter, S. Jain, N. Sonnenschein, M. Müller-Hannemann, and M.-T. Hütt, Journal of The Royal Society Interface 9, 3426 (2012).

[30] C. Schulz, A. Mazloumian, A. M. Petersen, O. Penner, and D. Helbing, arXiv:1401.6157 (2014).

[31] T. Christiano Silva and D. Raphael Amancio, Chaos 23, 013139 (2013).

[32] B. D. Fegley and V. I. Torvik, PLoS ONE 8, e70299 (2013).

[33] S. Milojević, Journal of Informetrics 7, 767 (2013).

[34] See https://publish.aps.org/datasets.

[35] S. Wernicke, IEEE/ACM Transactions on Computational Biology and Bioinformatics 3, 347 (2006).

[36] D. Loss and D. P. DiVincenzo, Phys. Rev. A 57, 120 (1998).

[37] A. M. Feist, M. J. Herrgård, I. Thiele, J. L. Reed, and B. Ø. Palsson, Nature Reviews Microbiology 7, 129 (2009).

[38] J. Schellenberger, J. O. Park, T. M. Conrad, and B. Ø. Palsson, BMC Bioinformatics 11, 213 (2010).

[39] C. Marr, M. Geertz, M.-T. Hütt, and G. Muskhelishvili, BMC Syst Biol 2, 18 (2008).

[40] T. Blumenthal, Briefings in Functional Genomics and Proteomics 3, 199 (2004).

[41] P. Hagmann, L. Cammoun, X. Gigandet, R. Meuli, C. J. Honey, V. J. Wedeen, and O. Sporns, PLoS Biol 6, e159 (2008).

