Complex Networks in Genomics and Proteomics
Ricard V. Solé
Romualdo Pastor-Satorras
SFI WORKING PAPER: 2002-06-026
SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.
©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.
www.santafe.edu
SANTA FE INSTITUTE
WILEY-VCH Verlag Berlin GmbH
May 29, 2002
1 Complex Networks in Genomics and Proteomics
Ricard V. Solé and Romualdo Pastor-Satorras
ICREA-Complex Systems Lab, Universitat Pompeu Fabra-IMIM
Dr Aiguader 80, 08003 Barcelona, Spain
Departament de Fisica i Enginyeria Nuclear
Universitat Politecnica de Catalunya
Campus Nord B4, 08034 Barcelona, Spain
1.1 Introduction
Complex multicellular organisms contain large genomes in which each structural gene is associated with at least one regulatory element and each regulatory element integrates the activity of at least two other genes. The nature of such regulation started to be understood from the analysis of small prokaryotic regulation subsystems, and the current picture indicates that the webs that shape cellular behavior are very complex. Actually, integration of extracellular signals often involves crosstalk between signal cascades, which has been suggested to share some common traits with neural networks [1]. In a related context, detailed analyses of subsets of interacting genes reveal that cell biology is highly modular [2]. Here "modules" are made up of many species of interacting molecules, and the functional relevance of these subnets is highlighted by the observation that they are conserved through evolution.

In many cases, proteins composed of multiple subunits behave as switch-like elements that can flip, for example, from an active to an inactive state and back. The switching behavior of these complexes, together with the underlying information processing that takes place at the network level, allows for a computational description of intracellular signaling. In this context, one might consider some key features of standard computational systems that should apply here. One particularly important aspect is the resilience of the signaling network under different sources of perturbation. The analysis of mutational robustness in different organisms revealed an extraordinary level of homeostasis: in many cases the total suppression of a given gene in a given organism leads to a small phenotypic effect or even to no effect at all [3, 4].

Following the analogy with engineered systems, the immediate explanation for such robustness would come from the presence of a high degree of redundancy. Under mutation, additional copies of a given gene might compensate for the failure of the other copy. However, the analysis of redundancy in genome data indicates that redundant genes are rapidly lost and that redundancy is not the leading mechanism responsible for mutational robustness [4].

The origin of robustness against mutations is particularly well highlighted by the analysis of genome-wide scale data of the budding yeast Saccharomyces cerevisiae [4]. The main conclusion of this study is that the major cause of robustness comes from the interactions among unrelated genes. This mechanism would be illustrated by the following example: given a metabolic network, completely unrelated enzymes can catalyse different reactions but contribute to a pathway whose goal is to sustain an optimal flux of metabolites. Under these conditions, mutations in genes encoding those enzymes will have little or mild effects. Additionally, it is interesting to see that many examples of experimental biotechnology manipulations involving the tinkering of one or two genes fail to reach the expected goals: very often, counterintuitive outcomes are obtained.

Figure 1.1: The domain of molecular interaction among p53 and MDM2 is shown in this 3D reconstruction [5]. MDM2 (here in cyan) binds a specific domain of p53 in a region (here shown in yellow) important for the interaction of p53 with components of the transcription machinery.
On the other hand, mutations involving some key genes can have very important consequences. This is the case, in particular, of the p53 tumor suppressor gene, Figure 1.1, which is known to play a critical role in genome stability and integrates many different signals related to cell-cycle or apoptosis (cell death) [6]. This and other tumor-suppressor genes prevent cell proliferation (thus keeping cell numbers under control) but can also promote apoptosis. The example of p53 is particularly important because it is mutated or there is a functional defect in the p53 pathway in approximately half of human cancers. The p53 network (partially shown in Figure 1.2) is quite well-known in mammals and involves genes that control, for example, apoptosis, the development of blood vessels, or cell differentiation. The core of this net is defined by the feedback loop existing between p53 and its negative regulator, the MDM2 oncoprotein. In invertebrates (such as Drosophila) homologues of p53 are known to be active throughout early development [7].

The fact that many mutations have little or no effect seems to be consistent with the presence of genes that either cannot propagate their failure or whose function can be replaced by other parts of the net. The presence of some genes that integrate multiple signals and can trigger widespread changes under their failure shows that the underlying network includes some highly-connected hubs. It seems to be a compromise between integration and homeostasis that should be observable when looking at the map of interactions within the cellular net.
Although a complete description of cellular networks would require the explicit consideration of dynamics, topological approaches (in which only the static architecture of the net is considered) are often successful in providing insight into biological complexity. This is the case, for example, of some models of ecological networks: even though populations fluctuate in time and changes in biomass or productivity take place at different scales, some of the fundamental regularities exhibited by food webs can be fairly well explained by means of static approaches [9]. Besides, the comparison of a wide range of complex networks (both natural and artificial) reveals that strong regularities are shared by them, even though their underlying components, the nature of their interactions, and their time scales are very different. In this chapter the topological patterns displayed by these networks will be explored. As will be shown, the compromise between stability and integration can be made explicit by looking at the large-scale organization of cellular networks.

Figure 1.2: Schematic architecture of the p53 network. The p53 node integrates information from very different parts of the system. Only part of the cell circuitry is shown here. For a detailed presentation, see Ref. [8].
1.2 Cellular networks
The molecular basis of genetic control in cells, particularly in eukaryotic cells (i.e. cells with nucleus) is one of the most basic active areas of molecular cell biology. Of particular interest is the understanding of the regulation mechanisms involved in the development of multicellular organisms. In most well-known case studies, such as in the fruit fly Drosophila melanogaster, it has been shown that regulation among the genes that control early development (such as fushi tarazu, Figure 1.3) takes place at the transcription level [10]. The web of interactions can be very complex, and an example of a sub-web of the genetic net associated to Drosophila early development is shown in Figure 1.3(b). Mutations in genes associated to early stages of development have typically a strong effect and sometimes, as it occurs with the so-called homeotic genes [11], they result in important morphological changes.
Models of gene regulation have a long history in theoretical biology [12, 13]. The discovery of the mechanisms of transcription regulation in the Lac operon of E. coli was followed by the formulation of some simple mathematical models [14]. Inspired by early models of neural networks, a standard formulation of gene regulation can be introduced by means of a dynamical system:

$$\frac{dx_i(t)}{dt} = \Phi_i[\mathbf{x}] - \gamma_i x_i, \qquad i = 1, \ldots, n, \tag{1.1}$$

where a set of $n$ different genes is defined. Here $\mathbf{x} = (x_1, \ldots, x_n)$ gives the activity state of each gene. Degradation is introduced by the last term $\gamma_i x_i$. The function $\Phi_i[\mathbf{x}]$ introduces the nature and extent of the interactions among components. An example of such type of model is:

$$\frac{dx_i(t)}{dt} = \Phi_i\left[\sum_{j=1}^{n} W_{ij}\, x_j(t) - \theta_i\right] - \gamma_i x_i(t), \tag{1.2}$$

where $\Phi_i(z)$ is a sigmoidal function of the local field $h_i = \sum_j W_{ij} x_j$, $\theta_i$ is a threshold, and the weights $W_{ij}$ give the sign and strength of the gene-gene interactions. Usually the set $\mathbf{W} = \{W_{ij}\}$ is generated from a given distribution $\rho(W)$ that is assumed to be symmetric and with zero mean. This type of net can display a huge variety of dynamical patterns, including oscillations and chaos [15]. But the really interesting behavior (see below) comes from the statistical properties derived from the presence of phase transitions [16] when the connectivity is tuned.
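As an illustration of the model of Eq. (1.2), the following minimal sketch integrates the equations with a simple Euler scheme. Several choices here are assumptions made only for the example and are not taken from the text: the logistic form of the sigmoid, the Gaussian (symmetric, zero-mean) distribution of the weights, the step size, and all function and variable names.

```python
import math
import random


def simulate_gene_network(W, theta, gamma, x0, dt=0.01, steps=5000):
    """Euler integration of dx_i/dt = Phi(sum_j W_ij x_j - theta_i) - gamma_i x_i."""

    def phi(z):
        # Logistic sigmoid: an assumed choice of sigmoidal function.
        return 1.0 / (1.0 + math.exp(-z))

    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Local field of each gene minus its threshold.
        fields = [sum(W[i][j] * x[j] for j in range(n)) - theta[i] for i in range(n)]
        # One explicit Euler step for every gene.
        x = [x[i] + dt * (phi(fields[i]) - gamma[i] * x[i]) for i in range(n)]
    return x


rng = random.Random(0)
n = 5
# Weights drawn from a symmetric, zero-mean distribution (Gaussian assumed here).
W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
theta = [0.0] * n   # thresholds (illustrative values)
gamma = [1.0] * n   # degradation rates (illustrative values)
x_final = simulate_gene_network(W, theta, gamma, [0.5] * n)
```

With unit degradation rates and a sigmoid bounded between 0 and 1, the activities stay inside the unit interval, which gives a quick sanity check on the integration.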
Why consider this type of mathematical approximation? Some attempts at building large-scale models of cellular nets based on near-realistic descriptions have failed to reproduce the whole spectrum of dynamical patterns displayed by even simple controlled systems. On the other hand, some key questions can find powerful answers in the generic properties exhibited by simple representations of real nets [16]. As an example, a striking feature of multicellular diversity is the surprisingly small repertoire of cell types, given the potentially astronomic diversity of cell states that would be obtained from the combinatorics of gene states [17]. Assuming that the number of genes in a multicellular organism is $N \approx 10^5$, $2^N$ different possible states are available. Yet, if cell types are considered as indicators of gene expression states, only $\sim 200 - 300$ states are actually realized.
Figure 1.3: (a) Spatial pattern of activity of a given gene involved in Drosophila development (the so-called fushi tarazu gene (FTZ); see also Figure 1.2). The darker areas correspond to higher levels of activity of FTZ, indicating what cells are expressing it. Cell-to-cell interactions generate this set of stripes with a characteristic length. In (b) an example of a real gene network is shown. It includes some part (i.e. a directed subgraph) of the genetic net involved in the development of the Drosophila fly. The names of the genes involved are indicated, such as FTZ = fushi tarazu. Only the connections are shown, not their sign.
In this section we will summarize some key features of this type of dynamical systems by considering the richness of their attractors when low-dimensional nets are used. Afterwards, the general scenario involving a large number of genes (i.e. large networks) will be considered.

1.2.1 Two-gene networks
The minimal number of genes needed in order to obtain a rich spectrum of behavioral patterns is given by two elements in interaction, although single-gene models with the appropriate nonlinearities can also display complex dynamic behavior [18]. Two-gene models allow us to understand particularly important problems, such as the dynamics of virus-cell interactions in bacteria [19]. An example is the following two-gene system with no self-interaction, described by the equations:

$$\frac{dx_1}{dt} = \frac{W_{12}\, x_2}{1 + W_{12}\, x_2} - x_1, \tag{1.3}$$

$$\frac{dx_2}{dt} = \frac{W_{21}\, x_1}{1 + W_{21}\, x_1} - x_2. \tag{1.4}$$

The fixed points are easily found; together with the trivial fixed point, $\mathbf{x}^*_0 = (0, 0)$, we get a second nontrivial point $\mathbf{x}^* = (x_1^*, x_2^*)$ given by:

$$x_1^* = \frac{\alpha}{W_{21} + \Omega}, \qquad x_2^* = \frac{\alpha}{W_{12} + \Omega}, \tag{1.5}$$

whose stability can be easily determined. Here $\Omega = W_{12} W_{21}$ and $\alpha = W_{12} W_{21} - 1$. The eigenvalues associated to the Jacobi matrix for this system for $\mathbf{x}^*_0 = (0, 0)$ are

$$\lambda_{\pm} = -1 \pm \sqrt{W_{12} W_{21}}, \tag{1.6}$$

and thus this point will be stable if $W_{12} W_{21} < 1$. There is an exchange of stability and $\mathbf{x}^*$ becomes stable when the previous condition does not hold (i.e. a transcritical bifurcation takes place) [22].
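The fixed-point and stability statements for the two-gene system without self-interaction can be checked numerically. The sketch below assumes the rational form of the interaction term, $W x / (1 + W x)$, and the convention that $W_{ij}$ is the effect of gene $j$ on gene $i$; the function names and the numerical weights are illustrative only.

```python
import math


def rhs(x1, x2, w12, w21):
    """Right-hand sides of the two-gene system with no self-interaction."""
    return (w12 * x2 / (1.0 + w12 * x2) - x1,
            w21 * x1 / (1.0 + w21 * x1) - x2)


def nontrivial_fixed_point(w12, w21):
    """Nontrivial fixed point, written with Omega = w12*w21 and alpha = Omega - 1."""
    omega = w12 * w21
    alpha = omega - 1.0
    return alpha / (w21 + omega), alpha / (w12 + omega)


def origin_eigenvalues(w12, w21):
    """Eigenvalues of the Jacobi matrix at (0, 0); assumes w12*w21 >= 0."""
    s = math.sqrt(w12 * w21)
    return -1.0 + s, -1.0 - s


# Illustrative weights with w12*w21 > 1, so the origin is unstable.
w12, w21 = 2.0, 3.0
x1_star, x2_star = nontrivial_fixed_point(w12, w21)
residual = rhs(x1_star, x2_star, w12, w21)
```

For these weights the residual of the right-hand side at the nontrivial point vanishes up to rounding, and the largest eigenvalue at the origin is positive, in line with the exchange of stability described above.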
When self-interactions are also considered (i.e. $W_{ii} \neq 0$) several attractors can be present as a consequence of the competition between positive feedbacks and mutual inhibition. One particular case is given by networks such that the matrix of connections $\mathbf{W}$ is symmetric, of the form:

$$\mathbf{W} = \begin{pmatrix} \gamma & \beta \\ \beta & \gamma \end{pmatrix}, \tag{1.7}$$

with $\beta \in \mathbb{R}$ and $\gamma > 0$. In other words, when there is self-activation by both genes plus cross interactions which can be positive or negative. The latter is a very common situation in real morphogenetic processes and is strongly related with the process of competition between species in ecosystems.
The stability analysis of this general problem can be performed by using the general Jacobi matrix, Eq. (1.8) [the matrix entries and the auxiliary quantity they depend on are illegible in this transcript]. For $\beta > 0$, the mutual reinforcement between both genes leads to the same state (indicated as homogeneous in Figure 1.4). Here $x_1^* = x_2^*$ [expression illegible] and it is stable (this point disappears at a critical value of $\beta$ [illegible]). For $\beta < -1$, the self-interaction is unable to sustain gene activity and it decays to zero. Finally, an interesting domain is observed for a range of $\beta$ [bounds illegible], where three attractors are present (the previous one, where both coexist, and two exclusion points). In Figure 1.4(b) we show an example of the flow field for the 3-attractor domain. We can see that there are three basins of attraction associated to each possible final state (fixed point).
These results, in particular the presence of multiple attractors for some parameter ranges, are especially important within the context of development [20, 21]. In many cases the behavior of cells that become differentiated is very similar to that of a switch. Depending on initial conditions or external perturbations, which might emerge from some other genes in the networks, the system can reach one or another basin of attraction and thus a different final state. More importantly, it has been shown that some well-defined, small sets of interacting genes (so-called modules) are responsible for specific spatial patterns emerging in morphogenetic processes [20, 21]. As a consequence, not only single genes, but modules, can be the target of selection.

1.2.2 Random networks
Beyond the specific wiring diagrams that can be considered in small-sized genetic nets, the study of large-$N$ nets has been dominated by randomly-wired systems [16]. Here genes are connected at random, with an average number of $\bar{K}$ connections per gene. An extensive literature on random Boolean networks has shown that a number of generic features are characteristic of these nets as a consequence of the presence of phase transition phenomena in random graphs [23].

Figure 1.4: Multistability in gene network models: (a) bifurcation diagram (fixed points versus cross-interaction strength) for the two-gene network model with a symmetric matrix [parameter values illegible]. Three basic domains are involved (see text), labeled homogeneous (single point), 3stable and bistable; (b) flow diagram of the model in the plane of the two morphogens [parameter value illegible], in the three-attractor domain, indicated as 3stable in (a).

In order to illustrate this idea, let us consider a graph $\Omega_{n,p}$ that consists of $n$ nodes joined by links with some probability $p$. Specifically, each possible link between two given nodes occurs with a probability $p$. The average number of links (also called the average degree) of a given node will be $\bar{K} = np$, and it can be easily shown that the probability $P(k)$ that a node has a degree $k$ (it is connected to $k$ other nodes) follows a Poisson distribution,

$$P(k) = e^{-\bar{K}}\, \frac{\bar{K}^k}{k!}. \tag{1.9}$$

This so-called Erdős-Rényi (ER) random graph [24] will be fairly well characterized by an average degree

$$\langle k \rangle = \sum_k k\, P(k) = \bar{K}, \tag{1.10}$$

where $P(k)$ shows a peak. The distribution $P(k)$ is in this sense a single-scaled distribution [25] and an example is shown in Figure 1.5(a).
The ER model displays a phase transition at a given critical average degree $\bar{K}_c = 1$ [23, 26]. At this critical point, a giant component forms: for $\bar{K} > \bar{K}_c$ a large fraction of the nodes are connected in a single cluster, whereas for $\bar{K} < \bar{K}_c$ the system is fragmented into small subwebs. This type of random model has been used in different contexts, including ecological, genetic, metabolic, and neural networks [26]. The importance of this phase transition is obvious in terms of the collective properties that arise at the critical point: communication among the whole system becomes possible, and thus information can flow from the units to the whole system and back. Besides, the transition occurs suddenly and implies an innovation. No less important, it takes place at a low cost in terms of the number of required links ($\sim N$).
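The sudden appearance of the giant component can be seen in a small numerical experiment. The sketch below builds an undirected random graph in which each pair of nodes is linked with probability equal to the mean degree divided by the number of nodes, and measures the fraction of nodes in the largest cluster with a union-find structure. The function name, graph size and seed are illustrative assumptions.

```python
import random


def giant_component_fraction(n, mean_degree, seed=0):
    """Fraction of nodes in the largest cluster of a random graph with p = mean_degree / n."""
    rng = random.Random(seed)
    p = mean_degree / n
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Each possible link is present independently with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    # Count cluster sizes and return the largest one as a fraction of n.
    sizes = {}
    for i in range(n):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n


below = giant_component_fraction(600, 0.5, seed=1)   # below the critical degree
above = giant_component_fraction(600, 3.0, seed=1)   # above the critical degree
```

Below the critical average degree the largest cluster contains only a small fraction of the nodes, whereas well above it most nodes belong to a single cluster.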
The ER model can be extended to directed graphs and has been analyzed by Kauffman within the context of genetic regulatory networks [16]. In the language presented in section 1.2, this will correspond to a network in which genes are randomly connected, and regulated by an average of $\bar{K}$ other genes. This means that the $N \times N$ matrix $\mathbf{W} = \{W_{ij}\}$ will have $\bar{K} N$ nonzero elements, distributed at random. The probability that a gene is regulated by exactly $k$ other genes will be then given by the distribution (1.9). Beyond the specific time-dependent features associated to the particular model chosen, one important characteristic of these systems is the presence of the percolation threshold: once a critical average connectivity $\bar{K}_c = 1$ (the ratio of directed links to genes) is reached, the system becomes suddenly connected. Below the critical threshold the system is essentially disconnected and thus changes in a given gene cannot propagate to the rest of the system. The presence of the percolation threshold allows the system to exhibit a complex dynamical behavior, including deterministic chaos, Figure 1.5(b).

One consequence of these models (but strongly tied to the topological properties of sparse random graphs) is that a high diversity of attractors compatible with a high degree of homeostasis seems to naturally emerge close to the percolation threshold. However, early evidence indicated that the degree distributions that characterize real genetic nets are far from Poissonian. Actually, as we will see in section 1.4, the topology of real networks strongly departs from the Erdős-Rényi scenario.
Figure 1.5: (a) An example of a directed random network with Poissonian structure. Here each node is a gene in a model gene network and arrows indicate the regulatory connections. This type of graph is characterized by an average degree $\bar{K}$; together with the appropriate nonlinear coupling among genes, it can generate different types of dynamical patterns, including deterministic chaos. An example of the strange attractors obtained from these nets is shown in (b) in two different views.
1.3 Three interconnected levels of cellular nets

Gene regulation takes place at different levels and involves the participation of proteins. The whole cellular network includes three levels of integration: the genome, and the regulation pathways defined by interactions among genes; the proteome, defined by the set of proteins and their interactions; and the metabolic network, also under the control of proteins that operate as enzymes.

Unlike the relatively unchanging genome, the dynamic proteome changes through time in response to intra- and extracellular environmental signals. The proteome is particularly important. Proteins unify genome structure on the one hand and functional biology on the other: they are both the products of genes and regulate reactions or pathways.

Complicating the study of gene function is the fact that multiple proteins can arise from a single gene. In eukaryotic organisms, genes appear fragmented into pieces (exons) separated by non-coding domains (introns). After transcription, the resulting messenger RNA (mRNA) is generated by the excision and elimination of introns followed by the joining of exons. This process is called splicing. Once the mRNA is formed, it will be translated into a protein by the translation machinery.
A very important feature is that splicing can occur in different ways so that different sets of exons are joined together. In this way, different mRNAs (and thus different proteins) are produced. The combinatorial potential of this so-called alternative splicing is obvious. In some cases, thousands of different proteins are potentially available for a given gene.

Alternative splicing expands genome complexity in an extraordinary fashion. In this context, although the genomes of complex organisms might not strongly differ in terms of their number of genes, the underlying proteome complexity can be very different. As will be discussed in the next sections, the actual structure of protein networks is shown to be strongly heterogeneous and shares several previously unsuspected traits with many different systems.

1.4 Small world graphs and scale-free nets

The analysis of the topological structure of protein interaction maps (in the budding yeast Saccharomyces cerevisiae and other simple organisms) revealed a surprising result: the protein-protein interaction net shares some universal features with the topological organization of other complex nets, both natural and artificial, ranging from technological networks [27, 25, 28, 29], neural networks [30], metabolic pathways [31, 32, 33], and food webs [34, 35] to the human language graph [36]. These studies actually offer the first global view of the proteome map and show that it strongly departs from the simple Erdős-Rényi scenario.

The first feature characteristic of the proteome map is that the probability $P(k)$ that a given protein interacts with other $k$ proteins has a scale-free (SF) nature, i.e. it follows a power law, $P(k) \sim k^{-\gamma}$, with a sharp exponential cut-off for large $k$. Thus most proteins have a small number of links with other proteins and a few of them are highly connected (hubs). Those last ones are likely to be very important to cell function [37, 38, 6].

The second feature is the presence of the so-called small world (SW) property [30, 39, 40]. Small world graphs have a number of surprising features that make them especially relevant to understand how interactions among individuals, metabolites, or species lead to the robustness and homeostasis observed in nature. The SW pattern can be detected from the analysis of two basic statistical properties of the network (see footnote 1): (a) the clustering coefficient $C$ and (b) the average path length $\ell$.
The proteome graph (see Figure 1.6) is defined by a pair $\Omega_p = (W_p, E_p)$, where $W_p = \{p_i\}$, $(i = 1, \ldots, N)$ is the set of $N$ proteins (nodes) and $E_p = \{\{p_i, p_j\}\}$ is the set of edges/connections between proteins. The adjacency matrix $\xi_{ij}$ indicates that an interaction exists between proteins $p_i, p_j \in W_p$ ($\xi_{ij} = 1$) or that the interaction is absent ($\xi_{ij} = 0$). Two connected proteins are thus called adjacent and the degree $k_i$ of a given protein is the number of edges that connect it with other proteins. Let us consider the adjacency matrix and indicate by $\Gamma_i = \{p_j \mid \xi_{ij} = 1\}$ the set of nearest neighbors of a protein $p_i \in W_p$. The clustering coefficient for this protein is defined as the ratio between the actual number of connections between the proteins $p_j \in \Gamma_i$, and the total possible number of connections, $k_i(k_i - 1)/2$ [30] (see Figure 1.6). Denoting

$$\mathcal{L}_i = \sum_{j=1}^{N} \xi_{ij} \left[ \sum_{k \in \Gamma_i;\; j < k} \xi_{jk} \right], \tag{1.11}$$

we define the clustering coefficient of the $i$-th protein as

$$C(i) = \frac{2 \mathcal{L}_i}{k_i (k_i - 1)}, \tag{1.12}$$

where $k_i$ is the degree of the $i$-th protein. The clustering coefficient is defined as the average of $C(i)$ over all the proteins,

$$C = \frac{1}{N} \sum_{i=1}^{N} C(i). \tag{1.13}$$

The average path length $\ell$ is defined as follows: Given two proteins $p_i, p_j \in W_p$, let $\ell_{ij}$ be the length of the shortest path connecting these two proteins, following the links present in the network. The average path length $\ell$ is defined as:

$$\ell = \frac{2}{N (N - 1)} \sum_{i < j} \ell_{ij}. \tag{1.14}$$

Footnote 1: Since the proteome map is a disconnected network, these quantities are actually defined on the giant component, defined as the largest cluster of connected nodes in the network [23].

Figure 1.6: Measuring the clustering from a proteome graph $\Omega_p$. Here each node (black circles) is a protein and physical interactions are indicated by means of edges connecting nodes.

For the ER graph, we have a clustering coefficient inversely proportional to the network size, $C^{rand} \approx \bar{K}/N$; this is a very small quantity, that tends to zero for large networks. The average path length, on the other hand, is proportional to the logarithm of the network size, $\ell^{rand} \approx \log(N) / \log(\bar{K})$. At the other extreme, regular lattices with only nearest-neighbor connections among units exhibit a long average path length. Graphs with SW structure are characterized by a high clustering, $C \gg C^{rand}$, while possessing an average path comparable to an ER graph with the same average connectivity and number of nodes, $\ell \approx \ell^{rand}$.
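The two small-world statistics defined above, the clustering coefficient and the average path length, can be computed directly from an adjacency structure. The following sketch stores the graph as a dictionary of neighbor sets and uses breadth-first search for shortest paths. Two conventions are assumptions of this example: nodes with fewer than two neighbors contribute zero to the average clustering, and the path-length average runs over connected pairs only. The small test graph is hypothetical.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average over nodes of C(i) = 2 L_i / (k_i (k_i - 1)); degree < 2 counts as 0."""
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue
        nb = list(neighbors)
        # Number of links actually present among the neighbors of this node.
        links = sum(1 for a in range(k) for b in range(a + 1, k) if nb[b] in adj[nb[a]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over connected pairs, using BFS from every node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c_toy = clustering_coefficient(toy)
l_toy = average_path_length(toy)
```

For this toy graph the individual clustering values are 1, 1, 1/3 and 0, and the six pairwise distances sum to 8, so both averages can be verified by hand.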
The experimental observations on the proteome map can be summarized as follows:

1. The proteome map is a sparse graph, with a small average number of links per protein. In Ref. [41] an average connectivity $\bar{K} \sim 1.[\text{digit illegible}] - 2.3$ was reported for the proteome map of S. cerevisiae. This observation is also consistent with the study of the global organization of the E. coli gene network from available information on transcriptional regulation [42].

2. It exhibits a SW pattern, different from the properties displayed by purely random (ER) graphs. In particular, Ref. [41] reported the values $C = 2.2 \times 10^{-2}$ and $\ell = 7.14$, to be compared with the values corresponding to an ER network with comparable size and average connectivity, $C^{rand} = 1 \times 10^{-3}$ and $\ell^{rand} = 9.0$.

3. The degree distribution of links follows a power-law with a well-defined cut-off. To be more precise, Jeong et al. [37] reported a functional form for the degree distribution of S. cerevisiae

$$P(k) \simeq (k_0 + k)^{-\gamma}\, e^{-k/k_c}. \tag{1.15}$$

Parameters reported in Ref. [37] are $k_0 \simeq 1$, $\gamma \approx 2.4$ and a cut-off $k_c \approx 20$. In Figure 1.7 we check this functional dependence on the cumulated degree distribution of the protein map (see footnote 2) used in Ref. [37]. A fit to the form (1.15) yields the values $k_0 \simeq 1.1$, $k_c \simeq 1[\text{digit illegible}]$, and $\gamma = 2.5 \pm 0.2$, compatible with the results found in [37, 41]. This particular form of the degree distribution could have adaptive significance as a source of robustness against mutations.

Figure 1.7: Cumulated degree distribution $P_{cum}(k)$ for the yeast proteome map from Ref. [37]. The degree distribution has been fitted to the scaling behavior $P(k) \simeq (k_0 + k)^{-\gamma} e^{-k/k_c}$, with an exponent $\gamma \approx 2.5$ and a sharp cut-off $k_c$ [value partly illegible].
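To see what a power law with an exponential cut-off looks like in practice, the sketch below evaluates the functional form of Eq. (1.15) with the parameter values quoted from Ref. [37] and builds the corresponding cumulated distribution. Normalizing over degrees from 1 up to a finite maximum is an assumption of this example, as are the function names.

```python
import math


def degree_distribution(gamma, k0, kc, kmax=1000):
    """Normalized P(k) for k = 1..kmax, proportional to (k0 + k)^(-gamma) * exp(-k / kc)."""
    weights = [(k0 + k) ** (-gamma) * math.exp(-k / kc) for k in range(1, kmax + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def cumulative(pk):
    """Cumulated distribution: entry for degree k is the sum of P(k') over k' >= k."""
    out, running = [], 0.0
    for p in reversed(pk):
        running += p
        out.append(running)
    return out[::-1]


# Parameter values quoted in the text from Ref. [37].
pk = degree_distribution(gamma=2.4, k0=1.0, kc=20.0)
pcum = cumulative(pk)
```

Plotting `pcum` on doubly logarithmic axes would show an approximately straight line at small degrees followed by a rapid drop beyond the cut-off.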
The highly heterogeneous character of these maps has important consequences within the context of molecular cell biology [32, 6]. It indicates that the evolution of proteome/genome complexity has been driven towards a well-defined topological pattern that provides the substrate for an extraordinary homeostatic stability against random mutational events.

Footnote 2: Data available at the website http://www.nd.edu/~networks/database/index.html.
1.5 Scale-free proteomes: gene duplication models

Several models have been proposed in order to explain the regularities displayed by the proteome map [44, 45, 46]. These models of proteome evolution are based on a gene duplication plus rewiring process that includes the basic ingredients of proteome growth and intends to reproduce the previous set of observations. The first component of the models allows the system to grow by means of the copy process of previous units (together with their wiring). The second introduces novelty by means of changes in the wiring pattern, usually constrained to the newly created genes. This constraint is required if we assume that conservation of gene (protein) interactions is due to functional restrictions and that further changes in the regulation map are limited. Such a constraint would be strongly relaxed when involving a newly created (and redundant) unit. The models proposed so far are intended to capture the topological properties of the proteome map. No explicit functionality is included in the description of the proteins and this is certainly a drawback. But by ignoring the specific features of the protein-protein interactions and the underlying regulation dynamics, one can explore the question of how much the network topology is due to the duplication and diversification processes.

In this chapter we will focus in particular on the model described in Refs. [45, 46]. This model considers single-gene duplications, which occur in most cases due to unequal crossover [47], plus re-wiring. Multiple duplications should be considered in future extensions of these models: molecular evidence shows that even whole-genome duplications have actually occurred in S. cerevisiae [48] (see also Ref. [49]). Re-wiring has also been used in dynamical models of the evolution of robustness in complex organisms [50].
The proteome graph at any given step $t$ (i.e. after $t$ duplications) will be indicated as $\Omega_p(t)$. The rules of the model, summarized in Figure 1.8, are implemented as follows. Each time step: (a) one node in the graph is randomly chosen and duplicated; (b) the links emerging from the newly generated node are removed with probability $\delta$; (c) finally, new links (not previously present) can be created between the new node and all the rest of the nodes with probability $\alpha$. Step (a) implements gene duplication, in which both the original and the replicated proteins retain the same structural properties and, consequently, the same set of interactions. The rewiring steps (b) and (c) implement the possible mutations of the replicated gene, which translate into the deletion and addition of interactions, with different probabilities.
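The three rules (a) to (c) translate almost directly into a simulation. The following is a minimal sketch, not the implementation used in Refs. [45, 46]: the seed graph of two linked proteins, the constant addition probability, and all names and parameter values are assumptions chosen for illustration.

```python
import random


def grow_proteome(n_final, delta, alpha, seed=0):
    """Grow a graph by node duplication, link deletion (delta) and link addition (alpha)."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: two interacting proteins
    while len(adj) < n_final:
        new = len(adj)
        parent = rng.randrange(new)
        # (a) duplication: the copy inherits all the links of the chosen node.
        inherited = set(adj[parent])
        # (b) each inherited link of the new node is removed with probability delta.
        kept = {v for v in sorted(inherited) if rng.random() >= delta}
        # (c) links not previously present are created with probability alpha.
        added = {v for v in range(new) if v not in inherited and rng.random() < alpha}
        adj[new] = kept | added
        for v in adj[new]:
            adj[v].add(new)
    return adj


# Illustrative parameter values only.
graph = grow_proteome(n_final=200, delta=0.5, alpha=0.01, seed=3)
degrees = [len(neighbors) for neighbors in graph.values()]
```

The list `degrees` can then be histogrammed to inspect the degree distribution generated by the duplication and rewiring process.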
1.5.1 Mean-field rate equation for the average connectivity

Since the model just presented has two free parameters, namely the deletion probability $\delta$ and the addition probability $\alpha$, one preliminary task is to constrain their possible values by using the available empirical data. One average property that can be determined is the evolution of the average number of interactions per protein/gene through time, which can be compared with the evidence from real proteomes [37, 41], as well as recent analysis of large-scale perturbation experiments [51].
Figure 1.8: Growing network by duplication of nodes. First (a) duplication occurs after randomly selecting a node (arrow). The links from the newly created node (white) now can experience deletion (b) and new links can be created (c); these events occur with probabilities $\delta$ and $\alpha$, respectively.
Let us indicate by $\bar{K}_N$ and $L_N$ the average connectivity of the system and its number of links, respectively, when it is composed of $N$ proteins. These magnitudes satisfy the relation $L_N = \bar{K}_N N / 2$. It is easy to check (see also Ref. [44]) that, at a mean-field level, the number of links $L_N$ fulfills the following rate equation

$$L_{N+1} = L_N + \bar{K}_N + \alpha (N - \bar{K}_N) - \delta \bar{K}_N, \tag{1.16}$$

where the last two terms correspond to the addition of links to a fraction $\alpha$ of the $N - \bar{K}_N$ units not connected to the duplicated node, plus the deletion of any of the new $\bar{K}_N$ links, with probability $\delta$. Using the continuous approximation

$$\frac{d\bar{K}_N}{dN} \simeq \bar{K}_{N+1} - \bar{K}_N, \tag{1.17}$$

Eq. (1.16) can be written

$$\frac{d\bar{K}_N}{dN} = \frac{1}{N} \left[ \bar{K}_N + 2\alpha (N - \bar{K}_N) - 2\delta \bar{K}_N \right], \tag{1.18}$$

whose solution is

$$\bar{K}_N = \frac{\alpha}{\alpha + \delta}\, N + \left( \bar{K}_1 - \frac{\alpha}{\alpha + \delta} \right) N^{\Gamma}, \tag{1.19}$$

where $\Gamma = 1 - 2(\alpha + \delta)$ and $\bar{K}_1$ is the initial connectivity at $N = 1$. For any constant value of $\alpha$ and $\delta$ this model leads to an increasing connectivity through time. In order to have a finite $\bar{K}$ in the limit of large $N$, one possible solution is to impose an addition rate $\alpha$ that is a function of the size of the network, with the form

$$\alpha(N) = \frac{\beta}{N}, \tag{1.20}$$

where $\beta$ is a constant. That is, the rate of addition of new links (the establishment of new viable interactions between proteins) is inversely proportional to the network size, and thus much smaller than the deletion rate $\delta$, in agreement with the rates observed in [41]. In this case, for large $N$, the differential rate equation (1.18) takes the form

$$\frac{d\bar{K}_N}{dN} = \frac{1}{N} (1 - 2\delta) \bar{K}_N + \frac{2\beta}{N}. \tag{1.21}$$

The solution of this equation is

$$\bar{K}_N = \frac{2\beta}{2\delta - 1} + \left( \bar{K}_1 - \frac{2\beta}{2\delta - 1} \right) N^{1 - 2\delta}. \tag{1.22}$$

For $\delta > 1/2$ a finite connectivity is reached in the limit of a large network,

$$\bar{K} \equiv \lim_{N \to \infty} \bar{K}_N = \frac{2\beta}{2\delta - 1}. \tag{1.23}$$

In order to reduce the number of independent parameters of the model, Ref. [45] used the available experimental data to estimate the average degree $\bar{K}$ and the ratio of addition and deletion rates in the yeast proteome, $\alpha/\delta$ [41], to find a relation between $\beta$ and $\delta$, which, together with Eq. (1.23), yields a numerical estimate of $\beta$ and $\delta$. Since it is clear that this estimate is strongly dependent on the assumed value $\alpha/\delta$, Ref. [46] followed a more pragmatic approach, considering a $\delta$-dependent model and fixing the actual value of $\delta$ by comparing numerical simulations with experimental data.
1.5.2 Rate equation for the node distribution $n_k$

The rate equation approach to evolving networks [52] can be fruitfully applied to the proteome model under consideration [46]. This approach focuses on the time evolution of the number $n_k(t)$ of nodes in the network with exactly $k$ links at time $t$. Defining our network by means of the set of numbers $n_k(t)$, we have that the total number of nodes $N$ is given by

$$N = \sum_k n_k, \qquad (1.24)$$

while the total number of links is given by

$$L = \frac{1}{2} \sum_k k\, n_k. \qquad (1.25)$$

Time is divided into periods. In each period, $t \to t + 1$, one node is duplicated at random, so that $N \to N + 1$. If, after each duplication, there is a probability $\delta$ to delete each link from the just-duplicated node, the probability of increasing the number of nodes at degree $k$, by direct duplication without link deletion, is given by

$$\Pr\left[n_k \to n_k + 1\right] = \frac{n_k}{N}(1 - \delta k). \qquad (1.26)$$

On the other hand, a node of degree $k$ can be created from the duplication of a node of degree $k + 1$ in which a link is deleted, contributing with a probability

$$\Pr\left[n_k \to n_k + 1\right] = \frac{n_{k+1}}{N}\, \delta (k + 1). \qquad (1.27)$$

The probability of degree change, from duplication of a node connected to a node of degree $k - 1$ (which thereby becomes a degree-$k$ node), is given by

$$\Pr\left[(n_{k-1}, n_k) \to (n_{k-1} - 1, n_k + 1)\right] = \frac{n_{k-1}}{N}(k - 1)(1 - \delta). \qquad (1.28)$$
Finally, in the same period, we proceed to add each of the $N - k_i$ possible new links with probability $\alpha = \beta/N$, where $k_i$ is the connectivity of the just-duplicated node. In the limit $N \gg k_i$, we can simply consider the addition of $N\alpha = \beta$ new links to the graph. When this last step is performed with the correlated prescription given for the model (i.e. adding links from the duplicated node to the rest of the nodes in the graph), it leads to a nonlocal rate equation for the functions $n_k$ [46]. For the sake of simplicity, we will now consider the simpler case of an uncorrelated addition of links (new links created between any two nodes in the graph). However, it can be proved that both prescriptions lead to qualitatively similar results [46].

The case of uncorrelated addition of links can be represented as the distribution of $2\alpha N$ new link ends among the $N$ nodes in the network. This event contributes with a probability

$$\Pr\left[(n_k, n_{k+1}) \to (n_k - 1, n_{k+1} + 1)\right] = \frac{n_k}{N}\, 2\alpha N = \frac{n_k}{N}\, 2\beta. \qquad (1.29)$$

The probabilities (1.26), (1.27), (1.28), and (1.29) define the rate equation for the connectivity distribution

$$\frac{dn_k(t)}{dt} = \frac{n_k}{N} + \frac{\delta}{N}\left[(k+1)\, n_{k+1} - k\, n_k\right] + \frac{1-\delta}{N}\left[(k-1)\, n_{k-1} - k\, n_k\right] + \frac{2\beta}{N}\left[n_{k-1} - n_k\right]. \qquad (1.30)$$
Since at each time step a new node is added, Eq. (1.30) satisfies the condition

$$\frac{dN}{dt} = \sum_k \frac{dn_k(t)}{dt} = 1, \qquad (1.31)$$

which yields the expected result $N(t) = N_0 + t$, where $N_0$ is the initial number of nodes in the network. In order to solve Eq. (1.30), we impose the homogeneous condition on the population numbers

$$n_k(t) = N(t)\, p_k \simeq t\, p_k, \qquad (1.32)$$

where $p_k$ is the probability of finding a node of connectivity $k$, which we assume to be independent of time. With this approximation, the rate equation reads

$$\delta (k+1)\, p_{k+1} - (k + 2\beta)\, p_k + \left[(k-1)(1-\delta) + 2\beta\right] p_{k-1} = 0. \qquad (1.33)$$
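The node-conservation condition of Eq. (1.31) can be verified directly on the right-hand side of Eq. (1.30). The sketch below is illustrative only: the function name `rate_rhs`, the degree histogram `n` and the parameter values are assumptions, and degrees outside the list are treated as empty so that the duplication, deletion and addition terms telescope exactly.

```python
def rate_rhs(n, delta, beta):
    """Right-hand side of Eq. (1.30) for k = 0 .. len(n).

    n[k] is the number of nodes of degree k; degrees outside the list are empty.
    """
    total = float(sum(n))                       # N, Eq. (1.24)

    def get(k):
        return n[k] if 0 <= k < len(n) else 0.0

    rhs = []
    for k in range(len(n) + 1):                 # one extra slot for degree len(n)
        dup = get(k)                            # duplication term n_k
        dele = delta * ((k + 1) * get(k + 1) - k * get(k))
        inherit = (1.0 - delta) * ((k - 1) * get(k - 1) - k * get(k))
        add = 2.0 * beta * (get(k - 1) - get(k))
        rhs.append((dup + dele + inherit + add) / total)
    return rhs

n = [5.0, 40.0, 25.0, 15.0, 10.0, 5.0]          # hypothetical degree histogram
rhs = rate_rhs(n, delta=0.562, beta=0.155)
total_rate = sum(rhs)                           # should equal dN/dt = 1
print(total_rate)
```

The three bracketed terms of Eq. (1.30) only move nodes between degree classes, so their sums over $k$ vanish and only the duplication term contributes to the total.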
Eq. (1.33) can be solved using the generating functional method [53]. Let us define the generating functional

$$\phi(z) = \sum_k z^k p_k. \qquad (1.34)$$

Introducing this definition into Eq. (1.33), we obtain an equation for $\phi(z)$, whose solution is

$$\phi(z) = \left[\frac{\delta - z(1-\delta)}{2\delta - 1}\right]^{-2\beta/(1-\delta)}. \qquad (1.35)$$
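The intermediate step between the recurrence (1.33) and the solution (1.35) can be sketched as follows. This is a reconstruction under the assumptions that $p_{-1} = 0$ and that $\phi$ is normalized, $\phi(1) = 1$; the prime denotes $d/dz$.

```latex
% Multiply Eq. (1.33) by z^k and sum over k, using
%   \sum_k (k+1) p_{k+1} z^k = \phi', \quad \sum_k k p_k z^k = z\phi',
%   \sum_k (k-1) p_{k-1} z^k = z^2 \phi', \quad \sum_k p_{k-1} z^k = z\phi .
\begin{align}
  \delta\,\phi' - z\,\phi' - 2\beta\,\phi + (1-\delta)\,z^{2}\phi' + 2\beta z\,\phi &= 0, \\
  (1-z)\,\bigl[\delta - (1-\delta)z\bigr]\,\phi' &= 2\beta\,(1-z)\,\phi, \\
  \frac{\phi'}{\phi} &= \frac{2\beta}{\delta - (1-\delta)z}, \\
  \ln\phi(z) &= -\frac{2\beta}{1-\delta}\,\ln\bigl[\delta - (1-\delta)z\bigr] + \text{const}.
\end{align}
% Fixing the constant with \phi(1) = 1, where \delta - (1-\delta) = 2\delta - 1,
% gives the bracketed form of Eq. (1.35).
```

The factorization $\delta - z + (1-\delta)z^2 = (1-z)[\delta - (1-\delta)z]$ is what allows the common factor $(1-z)$ to cancel.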
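Knowing $\phi(z)$ we can immediately compute the average connectivity

$$\bar{K} = \sum_k k\, p_k \equiv \left. z \frac{d\phi(z)}{dz} \right|_{z=1} = \frac{2\beta}{2\delta - 1}, \qquad (1.36)$$

in agreement with the mean-field prediction of Eq. (1.23). On the other hand, performing a Taylor expansion of $\phi(z)$ around $z = 0$ we can obtain $p_k$ as

$$p_k = \frac{1}{k!} \left. \frac{d^k \phi(z)}{dz^k} \right|_{z=0}. \qquad (1.37)$$

Applying this formula to the function (1.35), and using Stirling's approximation for large $k$, we can obtain the asymptotic behavior of $p_k$, given by

$$p_k \sim (k_0 + k)^{-\gamma}\, e^{-k/k_c}, \qquad (1.38)$$

with

$$\gamma = -k_0 = 1 - \frac{2\beta}{1-\delta}, \qquad k_c = \frac{1}{\ln\left[\delta/(1-\delta)\right]}. \qquad (1.39)$$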
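The Taylor coefficients of Eq. (1.37) can also be generated exactly, which gives a quick consistency check on Eqs. (1.33), (1.36) and (1.39). The sketch below is an illustration, not the authors' procedure: it uses the binomial series of the generating functional (1.35), and the function name, the cut at `k_max = 400` and the parameter values (deletion rate 0.562, addition constant 0.155) are assumptions.

```python
import math

def degree_distribution(delta, beta, k_max):
    """Taylor coefficients p_k of Eq. (1.35) around z = 0, for k = 0 .. k_max."""
    a = 2.0 * beta / (1.0 - delta)          # minus the exponent in Eq. (1.35)
    q = (1.0 - delta) / delta               # series ratio, q < 1 for delta > 1/2
    p = [(delta / (2.0 * delta - 1.0)) ** (-a)]   # p_0 = phi(0)
    for k in range(k_max):
        # Ratio of successive binomial-series coefficients of (1 - q z)^(-a).
        p.append(p[-1] * (k + a) / (k + 1) * q)
    return p

delta, beta = 0.562, 0.155
p = degree_distribution(delta, beta, k_max=400)

norm = sum(p)                                        # should be close to 1
mean_k = sum(k * pk for k, pk in enumerate(p))       # compare with Eq. (1.36)
gamma = 1.0 - 2.0 * beta / (1.0 - delta)             # Eq. (1.39)
k_c = 1.0 / math.log(delta / (1.0 - delta))          # Eq. (1.39)

# Largest residual of the stationary recurrence, Eq. (1.33), for k = 1 .. 50.
residual = max(
    abs(delta * (k + 1) * p[k + 1] - (k + 2 * beta) * p[k]
        + ((k - 1) * (1 - delta) + 2 * beta) * p[k - 1])
    for k in range(1, 51)
)
print(norm, mean_k, gamma, k_c, residual)
```

For these parameters the exponent from Eq. (1.39) lies below 1, which is the analytical shortcoming discussed in the text.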
As we can observe from the previous result, we recover the same functional form experimentally observed in [37]. However, it is important to notice that for the whole parameter range in which the exponential cut-off $k_c$ is well defined, we obtain a value of the degree exponent, as given by Eq. (1.39), that is $\gamma < 1$. The same result holds when considering the rate equation for the correlated model, in which the link addition is fully correlated with the new duplicated node [46]. This result is unsatisfactory, because it does not correspond with the results from numerical simulations of the model [46]. This discrepancy is explained by the fact that the $N \to \infty$ solution presented is only meaningful for $\delta > 1/2$ (see Eq. (1.36)). Yet the master equation was defined on the basis of an independent-event approximation that only makes sense for $\delta \ll 1$. The master equation itself should become valid for $\delta \to 0$, but then the convergence results assumed at $N \to \infty$ seem questionable, as indicated by the fact that we get an analytic, but negative, $\bar{K}$.

There is, however, something qualitative still to be learned from these equations, in the neighborhood of $\delta \sim 1/2$, small $\beta$. This is a neighborhood where the convergence results at large $N$ still give sensible answers, even if they are not quantitatively correct due to marginal approximations in the underlying master equation. Yet at the same time, since this is the smallest value of $\delta$ where we can get answers, it is the one where the master equation we have constructed is likely to be the best approximation to the much more complicated true equation (one with frequent coupled events).
Figure 1.9: a) Degree exponent $\gamma$ as a function of the deletion rate $\delta$ from computer simulations of the proteome model with average connectivity $\bar{K} = 2.5$. Network size $N = 2 \times 10^3$. The degree distribution is averaged over 1000 different network realizations. b) Degree distribution $P(k)$ for the same model with $\delta = 0.562$, averaged over 10000 different network realizations. The distribution can be fitted to the form $P(k) \sim (k_0 + k)^{-\gamma} e^{-k/k_c}$, with an exponent $\gamma = 2.5 \pm 0.1$ and a sharp cut-off $k_c \simeq 37$.
1.5.3 Numerical simulations
The proteome model defined in Section 1.5 depends effectively on two independent parameters: the average connectivity of the network $\bar{K}$ and the deletion rate of newly created links $\delta$, the addition rate $\beta$ being computed from Eq. (1.23). The average connectivity can be estimated from the experimental results from real proteome maps. The data analyzed in Ref. [37] yield a value $\bar{K} \simeq 2.40$. As a safe estimate, one can impose the value $\bar{K} = 2.5$ [46], and consider values $\delta > 1/2$, in accordance with Eq. (1.23). In spite of the drawbacks of the analytical study of the model (Section 1.5.2), one should expect the model to yield, for each value of $\delta$, the functional form of Eq. (1.38) for the degree distribution, with a degree exponent $\gamma$ which is a function of $\delta$ (for a fixed average connectivity $\bar{K} = 2.5$). From numerical simulations of the model one can then compute the function $\gamma(\delta)$ and select the value of $\delta$ that yields a degree exponent in agreement with the experimental observations. Fig. 1.9(a) shows values of $\gamma$ estimated by fitting the functional form (1.38) to the degree distribution obtained from computer simulations of the model, averaging over 1000 networks of size $N = 2 \times 10^3$ nodes, of the same order as those found in the maps analyzed in Ref. [37]. Apart from a concave region for $\delta$ very close to $1/2$, $\gamma$ is an increasing function of $\delta$. The value of $\delta$ yielding the degree exponent closest to the experimentally observed one is then

$$\delta = 0.562. \qquad (1.40)$$
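A single realization of the growth rules can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the simulation code used for Fig. 1.9: the seed graph of two linked proteins, the function names, the random seed and the choice to keep isolated nodes and to allow a link between the duplicate and its parent are all hypothetical. Link addition follows the correlated prescription (new links attach to the duplicated node) with the size-dependent rate of Eq. (1.20).

```python
import random

def grow_proteome(n_final, delta, beta, seed=0):
    """Grow a network by node duplication, link deletion and link addition."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}                  # assumed seed: two linked proteins
    while len(adj) < n_final:
        n = len(adj)
        parent = rng.randrange(n)           # node chosen for duplication
        alpha = beta / n                    # size-dependent addition rate, Eq. (1.20)
        # Inherited links: each one survives with probability 1 - delta.
        links = {v for v in adj[parent] if rng.random() >= delta}
        # New links to nodes that were not neighbours of the parent.
        links |= {v for v in range(n)
                  if v not in adj[parent] and rng.random() < alpha}
        adj[n] = links                      # the duplicate gets label n
        for v in links:
            adj[v].add(n)                   # keep the graph undirected
    return adj

def mean_degree(adj):
    """Average connectivity of the whole graph, isolated nodes included."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

net = grow_proteome(n_final=500, delta=0.562, beta=0.155, seed=1)
print(len(net), mean_degree(net))
```

Estimating $\gamma(\delta)$ as described in the text would require repeating such runs over many realizations and values of the deletion rate, then fitting the form (1.38) to the averaged degree histogram.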
In Figure 1.10(a) we show the topology of the giant component of a typical realization of the network model of size $N = 2 \times 10^3$. This figure clearly resembles the giant component of a real yeast network, as we can see by comparing with Figure 1.10(b)³; we can appreciate the presence of a few highly connected hubs plus many nodes with a relatively small number of connections, in close resemblance to the real proteome map. On the other hand, Figure 1.9(b) shows the connectivity distribution $P(k)$ obtained for networks of size $N = 2 \times 10^3$, averaged over 10000 realizations, for $\delta = 0.562$. In this figure we observe that the resulting connectivity distribution can be fitted to a power law with an exponential cut-off, of the form given by Eq. (1.38), with parameters $\gamma = 2.5 \pm 0.1$ and $k_c \simeq 37$, in fair agreement with the measurements reported by [41] and [37].

³ Figure kindly provided by W. Basalaj (see http://www.cl.cam.uk/~wb204/GD99/#Mewes).

Figure 1.10: a) Topology of the giant component of the map obtained with the proteome model with parameters $\bar{K} = 2.5$ and $\delta = 0.565$. Network size $N = 2 \times 10^3$. b) Topology of a real yeast proteome map obtained from the MIPS database [43].
Finally, Table 1.1 reports the values of $\bar{K}$, $\gamma$, the clustering coefficient, and the average path length obtained for the proteome model, compared with the values reported for the yeast S. cerevisiae by Ref. [41], those calculated for the map used in Ref. [37], and those corresponding to an ER random graph with size and average connectivity comparable with both the model and the real data. All the magnitudes displayed by the model compare quite well with the values measured for the yeast, and represent a further confirmation of the SW conjecture for the protein networks advanced by [41].
Table 1.1: Comparison between the observed regularities in the yeast proteome reported by Wagner [41], those calculated from the proteome map used by Jeong et al. [37], the model predictions with $N = 2000$, $\delta = 0.562$ and $\bar{K} = 2.50$, and an ER network with the same size and connectivity as the model.

                          Wagner's data        Jeong's data         Proteome model       ER model
$\bar{K}$                 1.83                 2.40                 2.4 ± 0.6            2.50 ± 0.05
$\gamma$                  2.5                  2.4                  2.5 ± 0.1            -
Clustering coefficient    $2.2 \times 10^{-2}$   $7.1 \times 10^{-2}$   $1.0 \times 10^{-2}$   $1 \times 10^{-3}$
Average path length       7.14                 6.81                 5.5 ± 0.7            8.0 ± 0.2
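The two small-world quantities compared in Table 1.1 can be computed from an adjacency structure as sketched below. This is an illustrative sketch under stated assumptions: the function names and the toy graph are hypothetical, nodes with fewer than two neighbours are assigned zero clustering (other conventions exclude them from the average), and the path length is averaged over pairs inside the giant component only, which must contain at least two nodes.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj):
    """Average over nodes of (links among neighbours) / (possible links)."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue                        # assumed convention: contributes 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def giant_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for u in adj:
        if u not in seen:
            comp = set(bfs_distances(adj, u))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best

def average_path_length(adj):
    """Mean shortest-path length over ordered pairs in the giant component."""
    comp = giant_component(adj)
    sub = {u: adj[u] & comp for u in comp}
    total = sum(sum(bfs_distances(sub, u).values()) for u in sub)
    return total / (len(sub) * (len(sub) - 1))

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c_toy = clustering_coefficient(toy)
l_toy = average_path_length(toy)
print(c_toy, l_toy)
```

For the toy graph the clustering coefficient is 7/12 and the average path length is 4/3, which can be confirmed by hand from the six node pairs.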
1.6 Discussion
Simple models of complex biological interactions have been used throughout the last decades as powerful metaphors of natural complexity. Networks pervade biology and there is little doubt that untangling biological complexity demands a considerable degree of simplification. This view works well when generic mechanisms are at work: percolation close to criticality in random graphs would be a perfect example in this context. Since information transfer (network communication) is a key property of all biosystems, reaching a threshold in connectivity allows information to propagate in a very effective way under a low wiring cost.

Similar principles might be operating in technology graphs [28, 54], and the striking similarities between man-made networks (such as electronic circuits or software graphs) and natural webs suggest that an organizing principle involving optimal communication might be at work in both types of systems. This seems a reasonable possibility, since the cost of wiring is an important constraint in both cases. For technology graphs, however, random failure typically leads to collapse and thus there is no robustness associated with the scale-free architecture. Biological systems might have taken advantage of the SF patterns that arise from optimization of path length under low cost [55] and make use of the source of robustness for free that might be generated.

As occurs with many other aspects of biological complexity, historical constraints play an important role in shaping network topology. Not surprisingly, some of the oldest actors in the metabolic scene seem to be highly connected, thus suggesting a leading role of preferential attachment [26], at least at early stages of the evolution of metabolism. But the proteome map is a very large web incorporating a large amount of plasticity that might have been tuned through evolution in order to reach optimally wired pathways. Future research will provide a new perspective on how biological nets get organized through evolution and what the contributions of emergence, selection, and tinkering to network biocomplexity are.
Acknowledgements
We thank the members of the Complex Systems Lab for useful discussions. This work has been partially supported by the European Network Contract No. ERBFMRXCT980183, the European Commission - Fet Open project COSIN IST-2001-33555, a grant PB97-0693, and by the Santa Fe Institute (R. V. S.). R. P.-S. acknowledges financial support from the Ministerio de Ciencia y Tecnología (Spain).
References
[1] D. Bray. Protein molecules as computational elements in living cells. Nature 376, 307–312 (1995).
[2] L. H. Hartwell, J. J. Hopfield, S. Leibler, and A. W. Murray. From molecular to modular cell biology. Nature 402, C47–C52 (1999).
[3] P. Ross-Macdonald, P. S. R. Coelho, T. Roemer, S. Agarwal, A. Kumar, R. Jansen, K. H. Cheung, A. Sheehan, D. Symoniatis, L. Umansky, M. Heldtman, F. K. Nelson, H. Iwasaki, K. Hager, M. Gerstein, P. Miller, G. S. Roeder, and M. Snyder. Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature 402, 413–418 (1999).
[4] A. Wagner. Robustness against mutations in genetic networks of yeast. Nature Genet. 24, 355–361 (2000).
[5] P. H. Kussie, S. Gorina, V. Marechal, B. Elenbaas, J. Moreau, A. J. Levine, and N. P. Pavletich. Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 274, 948–953 (1996).
[6] B. Vogelstein, D. Lane, and A. J. Levine. Surfing the p53 network. Nature 408, 307–310 (2000).
[7] S. Jin. Identification and characterization of a p53 homologue in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 97, 7301–7306 (2000).
[8] K. W. Kohn. Molecular interaction map of the mammalian cell cycle control and DNA repair systems. Mol. Biol. Cell 10, 2703–2734 (1999).
[9] R. J. Williams and N. D. Martinez. Simple rules yield complex food webs. Nature 404, 180–183 (2000).
[10] H. Lodish, A. Berk, S. L. Zipursky, and P. Matsudaira, Molecular Cell Biology, 4th edition (W. H. Freeman, New York, 2000).
[11] W. J. Gehring, Master Control Genes in Development and Evolution (Yale University Press, New Haven, 1998).
[12] J. Hasty, D. McMillen, F. Isaacs, and J. J. Collins. Computational studies of gene regulatory networks: in numero molecular biology. Nature Reviews Genet. 2, 268–279 (2001).
[13] P. Smolen, D. A. Baxter, and J. H. Byrne. Mathematical modeling of gene networks. Neuron 26, 567–580 (2000).
[14] B. Goodwin, Temporal organization in cells (Academic Press, New York, 1963).
[15] J. E. Lewis and L. Glass. Steady states, limit cycles and chaos in models of complex biological networks. Int. J. Bif. Chaos 1, 477–483 (1991).
[16] S. A. Kauffman, Origins of Order (Oxford University Press, New York, 1993).
[17] S. B. Carroll. Chance and necessity: the evolution of morphological complexity and diversity. Nature 409, 1102–1109 (2000).
[18] M. Laurent and N. Kellershohn. Multistability: a major means of differentiation and evolution in biological systems. Trends Biochem. Sci. 24, 418–422 (1999).
[19] M. Ptashne, A Genetic Switch (Blackwell, Cambridge, 1992).
[20] I. Salazar, J. Garcia-Fernandez, and R. V. Solé. Gene networks capable of pattern formation: from induction to reaction-diffusion. J. Theor. Biol. 205, 587–603 (2000).
[21] R. V. Solé, I. Salazar, and J. Garcia-Fernandez. Common Pattern Formation, Modularity and Phase Transitions in a Gene Network Model of Morphogenesis. Physica A 305, 640–647 (2002).
[22] J. M. T. Thompson and H. B. Stewart, Nonlinear dynamics and chaos (John Wiley & Sons, New York, 1986).
[23] B. Bollobás, Random Graphs (Academic Press, London, 1985).
[24] P. Erdős and P. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–60 (1960).
[25] L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley. Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97, 11149–11152 (2000).
[26] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002).
[27] R. A. Albert, H. Jeong, and A.-L. Barabási. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).
[28] R. Ferrer i Cancho, C. Janssen, and R. V. Solé. The topology of technology graphs: small world pattern in electronic circuits. Phys. Rev. E 63, 32767 (2001).
[29] R. Pastor-Satorras, A. Vázquez, and A. Vespignani. Dynamical and correlation properties of the internet. Phys. Rev. Lett. 87, 258701 (2001).
[30] D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393, 440–442 (1998).
[31] D. Fell and A. Wagner. The small world of metabolism. Nature Biotech. 18, 1121 (2000).
[32] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature 407, 651–654 (2001).
[33] J. Podani, Z. Oltvai, H. Jeong, B. Tombor, A.-L. Barabási, and E. Szathmáry. Comparable system-level organization of Archaea and Eukaryotes. Nature Genetics 29, 54–56 (2001).
[34] J. M. Montoya and R. V. Solé. Small world patterns in food webs. J. Theor. Biol. 214, 405–412 (2002).
[35] R. J. Williams, N. D. Martinez, E. L. Berlow, J. A. Dunne, and A.-L. Barabási. Two degrees of separation in complex food webs (2001). Santa Fe working paper 01-07-036.
[36] R. Ferrer i Cancho, C. Janssen, and R. V. Solé. The small world of human language. Procs. Roy. Soc. London B 268, 2261–2266 (2001).
[37] H. Jeong, S. Mason, A. L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature 411, 41 (2001).
[38] S. Maslov and K. Sneppen. Specificity and stability in topology of protein networks. Science 296, 910–913 (2002).
[39] D. J. Watts, Small Worlds (Princeton University Press, Princeton, 1999).
[40] M. E. J. Newman. Models of the Small World. J. Stat. Phys. 101, 819–841 (2000).
[41] A. Wagner. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol. Biol. Evol. 18, 1283–1292 (2001).
[42] D. Thieffry, A. M. Huerta, E. Perez-Rueda, and J. Collado-Vides. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. BioEssays 20, 433–440 (1998).
[43] H. W. Mewes, K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, and D. Frishman. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 27, 44–48 (1999).
[44] A. Vázquez, A. Flammini, A. Maritan, and A. Vespignani. Modelling of protein interaction networks (2001). cond-mat/0108043.
[45] R. V. Solé, R. Pastor-Satorras, E. Smith, and T. Kepler. A model of large-scale proteome evolution. Adv. Complex Syst. 5, 43–54 (2002).
[46] R. Pastor-Satorras, E. Smith, and R. V. Solé. Evolving protein interaction networks through gene duplication (2002). Santa Fe working paper 02-02-008.
[47] S. Ohno, Evolution by gene duplication (Springer, Berlin, 1970).
[48] K. H. Wolfe and D. C. Shields. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387, 708–713 (1997).
[49] A. Wagner. Evolution of gene networks by gene duplications: A mathematical model and its implications on genome organization. Proc. Natl. Acad. Sci. USA 91, 4387–4391 (1994).
[50] S. Bornholdt and K. Sneppen. Robustness as an evolutionary principle. Proc. Roy. Soc. Lond. B 267, 2281–2286 (2000).
[51] A. Wagner. Estimating coarse gene network structure from large-scale gene perturbation data (2001). Santa Fe working paper 01-09-051.
[52] P. L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of growing random networks. Phys. Rev. Lett. 85, 4629 (2000).
[53] C. W. Gardiner, Handbook of stochastic methods, 2nd edition (Springer, Berlin, 1985).
[54] S. Valverde, R. Ferrer Cancho, and R. V. Solé. Scale-free networks from optimal design. Santa Fe working paper 02-04-019.
[55] R. Ferrer Cancho and R. V. Solé. Optimization in Complex Networks. Santa Fe working paper 01-11-068.