Complex Networks in Genomics and Proteomics

Ricard V. Solé
Romualdo Pastor-Satorras

SFI WORKING PAPER: 2002-06-026

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

www.santafe.edu

SANTA FE INSTITUTE



Complex Networks in Genomics and Proteomics

Ricard V. Solé and Romualdo Pastor-Satorras

WILEY-VCH Verlag Berlin GmbH
May 29, 2002


1 Complex Networks in Genomics and Proteomics

Ricard V. Solé and Romualdo Pastor-Satorras

ICREA-Complex Systems Lab, Universitat Pompeu Fabra-IMIM, Dr Aiguader 80, 08003 Barcelona, Spain

Departament de Física i Enginyeria Nuclear, Universitat Politècnica de Catalunya, Campus Nord B4, 08034 Barcelona, Spain

1.1 Introduction

Complex multicellular organisms contain large genomes in which each structural gene is associated with at least one regulatory element and each regulatory element integrates the activity of at least two other genes. The nature of such regulation started to be understood from the analysis of small prokaryotic regulation subsystems, and the current picture indicates that the webs that shape cellular behavior are very complex. Indeed, the integration of extracellular signals often involves crosstalk between signal cascades, which has been suggested to share some common traits with neural networks [1]. In a related context, detailed analyses of subsets of interacting genes reveal that cell biology is highly modular [2]. Here "modules" are made up of many species of interacting molecules, and the functional relevance of these subnets is highlighted by the observation that they are conserved through evolution.

In many cases, proteins composed of multiple subunits behave as switch-like elements that can flip, for example, from an active to an inactive state and back. The switching behavior of these complexes, together with the underlying information processing that takes place at the network level, allows for a computational description of intracellular signaling. In this context, one might consider some key features of standard computational systems that should apply here. One particularly important aspect is the resilience of the signaling network under different sources of perturbation. The analysis of mutational robustness in different organisms revealed an extraordinary level of homeostasis: in many cases the total suppression of a given gene in a given organism leads to a small phenotypic effect or even to no effect at all [3, 4].

Following the analogy with engineered systems, the immediate explanation for such robustness would come from the presence of a high degree of redundancy. Under mutation, additional copies of a given gene might compensate for the failure of the other copy. However, the analysis of redundancy in genome data indicates that redundant genes are rapidly lost and that redundancy is not the leading mechanism responsible for mutational robustness [4].

The origin of robustness against mutations is particularly well highlighted by the analysis of genome-wide data of the budding yeast Saccharomyces cerevisiae [4]. The main conclusion of this study is that the major cause of robustness comes from the interactions among unrelated genes. This mechanism would be illustrated by the following example: given a metabolic network, completely unrelated enzymes can catalyse different reactions but contribute to a pathway whose goal is to sustain an optimal flux of metabolites. Under these conditions, mutations in genes encoding those enzymes will have little or mild effects. Additionally, it is interesting to see that many examples of experimental biotechnology manipulations involving the tinkering of one or two genes fail to reach the expected goals: very often, counterintuitive outcomes are obtained.

Figure 1.1: The domain of molecular interaction between p53 and MDM2 is shown in this 3D reconstruction [5]. MDM2 (here in cyan) binds a specific domain of p53 in a region (here shown in yellow) important for the interaction of p53 with components of the transcription machinery.

On the other hand, mutations involving some key genes can have very important consequences. This is the case, in particular, of the p53 tumor suppressor gene (Figure 1.1), which is known to play a critical role in genome stability and integrates many different signals related to the cell cycle or apoptosis (cell death) [6]. This and other tumor-suppressor genes prevent cell proliferation (thus keeping cell numbers under control) but can also promote apoptosis. The example of p53 is particularly important because it is mutated, or there is a functional defect in the p53 pathway, in approximately half of human cancers. The p53 network (partially shown in Figure 1.2) is quite well known in mammals and involves genes that control, for example, apoptosis, the development of blood vessels, or cell differentiation. The core of this net is defined by the feedback loop existing between p53 and its negative regulator, the MDM2 oncoprotein. In invertebrates (such as Drosophila), homologues of p53 are known to be active throughout early development [7].

The fact that many mutations have little or no effect seems to be consistent with the presence of genes that either cannot propagate their failure or whose function can be replaced by other parts of the net. The presence of some genes that integrate multiple signals and can trigger widespread changes under their failure shows that the underlying network includes some highly connected hubs. There seems to be a compromise between integration and homeostasis that should be observable when looking at the map of interactions within the cellular net.

Although a complete description of cellular networks would require the explicit consideration of dynamics, topological approaches, in which only the static architecture of the net is considered, are often successful in providing insight into biological complexity. This is the case, for example, of some models of ecological networks: although populations fluctuate in time and changes in biomass or productivity take place at different scales, some of the fundamental regularities exhibited by food webs can be fairly well explained by means of static approaches [9]. Besides, the comparison of a wide range of complex networks (both natural and artificial) reveals that they share strong regularities, even though their underlying components, the nature of their interactions, and their time scales are very different. In this chapter the topological patterns displayed by these networks will be explored. As will be shown, the compromise between stability and integration can be made explicit by looking at the large-scale organization of cellular networks.

Figure 1.2: Schematic architecture of the p53 network. The p53 node integrates information from very different parts of the system. Only part of the cell circuitry is shown here. For a detailed presentation, see Ref. [8].

1.2 Cellular networks

The molecular basis of genetic control in cells, particularly in eukaryotic cells (i.e. cells with a nucleus), is one of the most basic and active areas of molecular cell biology. Of particular interest is the understanding of the regulation mechanisms involved in the development of multicellular organisms. In the best-known case studies, such as the fruit fly Drosophila melanogaster, it has been shown that regulation among the genes that control early development (such as fushi tarazu, Figure 1.3) takes place at the transcription level [10]. The web of interactions can be very complex, and an example of a sub-web of the genetic net associated with Drosophila early development is shown in Figure 1.3(b). Mutations in genes associated with early stages of development typically have a strong effect and sometimes, as occurs with the so-called homeotic genes [11], they result in important morphological changes.

Models of gene regulation have a long history in theoretical biology [12, 13]. The discovery of the mechanisms of transcription regulation in the Lac operon of E. coli was followed by the formulation of some simple mathematical models [14]. Inspired by early models of neural networks, a standard formulation of gene regulation can be introduced by means of a dynamical system:

$$\frac{dx_i(t)}{dt} = \Phi_i^{\mu}[\mathbf{x}] - \gamma_i x_i, \qquad i = 1, \ldots, n, \qquad (1.1)$$

where a set of $n$ different genes is defined. Here $\mathbf{x} = (x_1(t), \ldots, x_n(t))$ gives the activity state of each gene. Degradation is introduced by the last term, $\gamma_i x_i$. The functions $\Phi_i^{\mu}[\mathbf{x}]$ introduce the nature and extent of the interactions among components. An example of this type of model is:

$$\frac{dx_i(t)}{dt} = \Phi_i^{\mu}\left( \sum_{j=1}^{n} W_{ij} x_j(t) - \theta_i \right) - \gamma_i x_i(t), \qquad (1.2)$$

where $\Phi_i^{\mu}(z)$ is a sigmoidal function of the local field $h_i = \sum_j W_{ij} x_j$, $\theta_i$ is a threshold, and the weights $W_{ij}$ give the sign and strength of the gene-gene interactions. Usually the set $\mathbf{W} = \{W_{ij}\}$ is generated from a given distribution $\rho(W)$ that is assumed to be symmetric and with zero mean. This type of net can display a huge variety of dynamical patterns, including oscillations and chaos [15]. But the really interesting behavior (see below) comes from the statistical properties derived from the presence of phase transitions [16] when the connectivity is tuned.
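As a rough numerical illustration of Eq. (1.2), the following sketch integrates the model with a forward Euler scheme. The logistic form of the sigmoid, the Gaussian choice for the (symmetric, zero-mean) weight distribution, the step size, and all function names are assumptions made here for illustration; the text does not prescribe any of them.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used here as one possible sigmoidal response (assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def random_weights(n, sigma=1.0, seed=0):
    """n x n interaction weights drawn from a zero-mean Gaussian (assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(n)]


def simulate(weights, thresholds, decay, x0, dt=0.01, steps=5000):
    """Forward Euler integration of
    dx_i/dt = sigmoid(sum_j W_ij x_j - theta_i) - gamma_i x_i."""
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        # Local field acting on each gene at the current state.
        fields = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [
            x[i] + dt * (sigmoid(fields[i] - thresholds[i]) - decay[i] * x[i])
            for i in range(n)
        ]
    return x


if __name__ == "__main__":
    n = 5
    final_state = simulate(random_weights(n), [0.0] * n, [1.0] * n, [0.5] * n)
    print(final_state)
```

With unit degradation rates and a response bounded between 0 and 1, the activities stay in the unit interval; richer behavior (oscillations, chaos) would require exploring other weight matrices and longer runs.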

Why consider this type of mathematical approximation? Some attempts at building large-scale models of cellular nets based on near-realistic descriptions have failed to reproduce the whole spectrum of dynamical patterns displayed by even simple controlled systems. On the other hand, some key questions can find powerful answers in the generic properties exhibited by simple representations of real nets [16]. As an example, a striking feature of multicellular diversity is the surprisingly small repertoire of cell types, given the potentially astronomic diversity of cell states that would be obtained from the combinatorics of gene states [17]. Assuming that the number of genes in a multicellular organism is $N$, of the order of a power of ten whose exponent is illegible in the source, $2^N$ different possible states are available. Yet, if cell types are considered as indicators of gene expression states, only 200-300 states are actually realized.

Figure 1.3: (a) Spatial pattern of activity of a given gene involved in Drosophila development (the so-called fushi tarazu gene (FTZ); see also Figure 1.2). The darker areas correspond to higher levels of activity of FTZ, indicating which cells are expressing it. Cell-to-cell interactions generate this set of stripes with a characteristic length. In (b) an example of a real gene network is shown. It includes some part (i.e. a directed subgraph) of the genetic net involved in the development of the Drosophila fly. The names of the genes involved are indicated, such as FTZ = fushi tarazu. Only the connections are shown, not their sign.

In this section we will summarize some key features of this type of dynamical system by considering the richness of their attractors when low-dimensional nets are used. Afterwards, the general scenario involving a large number of genes (i.e. large networks) will be considered.

1.2.1 Two-gene networks

The minimal number of genes needed in order to obtain a rich spectrum of behavioral patterns is given by two elements in interaction, although single-gene models with the appropriate nonlinearities can also display complex dynamic behavior [18]. Two-gene models make it possible to understand particularly important problems, such as the dynamics of virus-cell interactions in bacteria [19]. An example is the following two-gene system with no self-interaction, described by the equations:

$$\frac{dx_1}{dt} = \frac{W_{12} x_2}{1 + W_{12} x_2} - x_1, \qquad (1.3)$$

$$\frac{dx_2}{dt} = \frac{W_{21} x_1}{1 + W_{21} x_1} - x_2. \qquad (1.4)$$

The fixed points are easily found; together with the trivial fixed point, $P^*_0 = (0, 0)$, we get a second nontrivial point $P^*_1 = (x_1^*, x_2^*)$ given by:

$$x_1^* = \frac{\alpha}{W_{21} + \Omega}, \qquad x_2^* = \frac{\alpha}{W_{12} + \Omega}, \qquad (1.5)$$

whose stability can be easily determined. Here $\Omega = W_{12} W_{21}$ and $\alpha = W_{12} W_{21} - 1$. The eigenvalues associated to the Jacobi matrix for this system for $P^*_0 = (0, 0)$ are

$$\lambda_{\pm} = -1 \pm \sqrt{W_{12} W_{21}}, \qquad (1.6)$$

and thus this point will be stable if $W_{12} W_{21} < 1$. There is an exchange of stability and $P^*_1$ becomes stable when the previous condition does not hold (i.e. a transcritical bifurcation takes place) [22].
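The fixed-point and stability statements for the two-gene system without self-interaction can be checked numerically. The sketch below assumes the saturating form of Eqs. (1.3)-(1.4), in which each gene responds to the other through w*x / (1 + w*x), and solves for the nontrivial fixed point in closed form; the function names are hypothetical.

```python
import math


def rhs(x1, x2, w12, w21):
    """Right-hand sides of the two-gene system (assumed saturating response)."""
    return (
        w12 * x2 / (1.0 + w12 * x2) - x1,
        w21 * x1 / (1.0 + w21 * x1) - x2,
    )


def nontrivial_fixed_point(w12, w21):
    """Closed-form nontrivial fixed point, written with the product of the
    two weights (omega) and omega - 1 (alpha)."""
    omega = w12 * w21
    alpha = omega - 1.0
    return alpha / (w21 + omega), alpha / (w12 + omega)


def origin_eigenvalues(w12, w21):
    """Eigenvalues of the Jacobi matrix at (0, 0); assumes w12 * w21 >= 0."""
    root = math.sqrt(w12 * w21)
    return -1.0 + root, -1.0 - root


if __name__ == "__main__":
    x1, x2 = nontrivial_fixed_point(2.0, 3.0)
    print(x1, x2, rhs(x1, x2, 2.0, 3.0))
    print(origin_eigenvalues(2.0, 3.0))   # one positive eigenvalue: origin unstable
    print(origin_eigenvalues(0.5, 0.5))   # both negative: origin stable
```

The two parameter sets sit on either side of the condition that the product of the weights equals one, which is where the exchange of stability takes place.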

When self-interactions are also considered (i.e. $W_{ii} \neq 0$) several attractors can be present as a consequence of the competition between positive feedbacks and mutual inhibition. One particular case is given by networks such that the matrix of connections $\mathbf{W}$ is symmetric, of the form:

$$\mathbf{W} = \begin{pmatrix} \gamma & \mu \\ \mu & \gamma \end{pmatrix}, \qquad (1.7)$$

with $\mu \in \mathbb{R}$ and $\gamma > 0$. In other words, there is self-activation by both genes plus cross interactions, which can be positive or negative. The latter is a very common situation in real morphogenetic processes and is strongly related to the process of competition between species in ecosystems.

The stability analysis of this general problem can be performed by using the general Jacobi matrix:

[Equation (1.8): the symmetric 2 x 2 Jacobi matrix of the two-gene system evaluated at a fixed point $(x_1^*, x_2^*)$; its entries, and the auxiliary quantities $\Gamma_{ij}$ in which they are written, are not recoverable from the source.]

For a positive cross-interaction, the mutual reinforcement between both genes leads to the same state (indicated as homogeneous in Figure 1.4). Here $x_1^* = x_2^*$ and the state is stable; this point disappears at a negative threshold value of the cross-interaction whose expression in terms of $\gamma$ is not recoverable from the source. For a cross-interaction below $-1$, the self-interaction is unable to sustain gene activity and it decays to zero. Finally, an interesting domain is observed between that threshold and zero, where three attractors are present (the previous one, where both coexist, and two exclusion points). In Figure 1.4(b) we show an example of the flow field for the 3-attractor domain. We can see that there are three basins of attraction, one associated to each possible final state (fixed point).

These results, in particular the presence of multiple attractors for some parameter ranges, are especially important within the context of development [20, 21]. In many cases the behavior of cells that become differentiated is very similar to that of a switch. Depending on initial conditions or external perturbations, which might emerge from some other genes in the network, the system can reach one or another basin of attraction and thus a different final state. More importantly, it has been shown that some well-defined, small sets of interacting genes (so-called modules) are responsible for specific spatial patterns emerging in morphogenetic processes [20, 21]. As a consequence, not only single genes, but also modules, can be the target of selection.

1.2.2 Random networks

Beyond the specific wiring diagrams that can be considered in small-sized genetic nets, the study of large-$N$ nets has been dominated by randomly wired systems [16]. Here genes are connected at random, with an average number of $\bar{K}$ connections per gene. An extensive literature on random Boolean networks has shown that a number of generic features are characteristic of these nets as a consequence of the presence of phase transition phenomena in random graphs [23].

Figure 1.4: Multistability in gene network models: (a) bifurcation diagram (fixed points versus cross-interaction strength) for the two-gene network model with a symmetric matrix [parameter values illegible in the source]. Three basic domains are involved, labeled homogeneous (single point), 3stable, and bistable (see text); (b) flow diagram of the model in the plane of the two morphogen concentrations, in the three-attractor domain, indicated as 3stable in (a).

In order to illustrate this idea, let us consider a graph $\Omega_{n,p}$ that consists of $n$ nodes joined by links with some probability $p$. Specifically, each possible link between two given nodes occurs with a probability $p$. The average number of links (also called the average degree) of a given node will be $\bar{K} = np$, and it can be easily shown that the probability $P(k)$ that a node has a degree $k$ (it is connected to $k$ other nodes) follows a Poisson distribution,

$$P(k) = e^{-\bar{K}} \frac{\bar{K}^k}{k!}. \qquad (1.9)$$

This so-called Erdős-Rényi (ER) random graph [24] will be fairly well characterized by an average degree

$$\langle k \rangle = \sum_k k P(k) = \bar{K}, \qquad (1.10)$$

where $P(k)$ shows a peak. The distribution $P(k)$ is in this sense a single-scaled distribution [25] and an example is shown in Figure 1.5(a).

The ER model displays a phase transition at a given critical average degree $\bar{K}_c = 1$ [23, 26]. At this critical point, a giant component forms: for $\bar{K} > \bar{K}_c$ a large fraction of the nodes are connected in a single cluster, whereas for $\bar{K} < \bar{K}_c$ the system is fragmented into small subwebs. This type of random model has been used in different contexts, including ecological, genetic, metabolic, and neural networks [26]. The importance of this phase transition is obvious in terms of the collective properties that arise at the critical point: communication among the whole system becomes possible, and thus information can flow from the units to the whole system and back. Besides, the transition occurs suddenly and implies an innovation. No less important, it takes place at a low cost in terms of the number of required links ($\sim N$).
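The appearance of the giant component can be seen in a small simulation. The sketch below builds an undirected ER graph by testing every pair of nodes once and tracks connected clusters with a union-find structure; the network size, the seed, and the function name are arbitrary choices made here, not values taken from the text.

```python
import random


def largest_component_fraction(n, mean_degree, seed=0):
    """Fraction of nodes in the largest cluster of an ER graph with
    link probability p = mean_degree / n (so the mean degree is (n - 1) * p)."""
    rng = random.Random(seed)
    p = mean_degree / n
    parent = list(range(n))
    size = [1] * n

    def find(a):
        # Find the cluster representative, halving paths along the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Attach the smaller cluster to the larger one.
                    if size[ri] < size[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri
                    size[ri] += size[rj]
    return max(size[find(i)] for i in range(n)) / n


if __name__ == "__main__":
    for k in (0.5, 1.0, 1.5, 2.0):
        print(k, largest_component_fraction(1000, k))
```

Below the critical average degree the largest cluster holds only a small fraction of the nodes, while above it a single cluster spans most of the graph.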

The ER model can be extended to directed graphs and has been analyzed by Kauffman within the context of genetic regulatory networks [16]. In the language presented in section 1.2, this will correspond to a network in which genes are randomly connected, and regulated by an average of $\bar{K}$ other genes. This means that the $N \times N$ matrix $\mathbf{W} = \{W_{ij}\}$ will have $\bar{K}N$ nonzero elements, distributed at random. The probability that a gene is regulated by exactly $k$ other genes will then be given by the distribution (1.9). Beyond the specific time-dependent features associated to the particular model chosen, one important characteristic of these systems is the presence of the percolation threshold: once a critical average connectivity $\bar{K}_c = 1$ (the ratio of directed links to genes) is reached, the system suddenly becomes connected. Below the critical threshold the system is essentially disconnected and thus changes in a given gene cannot propagate to the rest of the system. The presence of the percolation threshold allows the system to exhibit a complex dynamical behavior, including deterministic chaos (Figure 1.5(b)).

One consequence of these models (but strongly tied to the topological properties of sparse random graphs) is that a high diversity of attractors compatible with a high degree of homeostasis seems to emerge naturally close to the percolation threshold. However, early evidence indicated that the degree distributions that characterize real genetic nets are far from Poissonian. Actually, as we will see in section 1.4, the topology of real networks strongly departs from the Erdős-Rényi scenario.

Figure 1.5: (a) An example of a directed random network with Poissonian structure. Here each node is a gene in a model gene network and arrows indicate the regulatory connections. This type of graph is characterized by an average degree $\bar{K}$; together with the appropriate nonlinear coupling among genes, it can generate different types of dynamical patterns, including deterministic chaos. An example of the strange attractors obtained from these nets is shown in (b) in two different views.

1.3 Three interconnected levels of cellular nets

Gene regulation takes place at different levels and involves the participation of proteins. The whole cellular network includes three levels of integration: (i) the genome, and the regulation pathways defined by interactions among genes; (ii) the proteome, defined by the set of proteins and their interactions; and (iii) the metabolic network, also under the control of proteins that operate as enzymes.

Unlike the relatively unchanging genome, the dynamic proteome changes through time in response to intra- and extracellular environmental signals. The proteome is particularly important. Proteins unify genome structure on the one hand and functional biology on the other: they are both the products of genes and the regulators of reactions or pathways.

Complicating the study of gene function is the fact that multiple proteins can arise from a single gene. In eukaryotic organisms, genes appear fragmented into pieces (exons) separated by non-coding domains (introns). After transcription, the resulting messenger RNA (mRNA) is generated by the excision and elimination of introns followed by the joining of exons. This process is called splicing. Once the mRNA is formed, it will be translated into a protein by the translation machinery.


A very important feature is that splicing can occur in different ways so that different sets of exons are joined together. In this way, different mRNAs (and thus different proteins) are produced. The combinatorial potential of this so-called alternative splicing is obvious. In some cases, thousands of different proteins are potentially available for a given gene.

Alternative splicing expands genome complexity in an extraordinary fashion. In this context, although the genomes of complex organisms might not strongly differ in terms of their number of genes, the underlying proteome complexity can be very different. As will be discussed in the next sections, the actual structure of protein networks is shown to be strongly heterogeneous and shares several previously unsuspected traits with many different systems.

1.4 Small world graphs and scale-free nets

The analysis of the topological structure of protein interaction maps (in the budding yeast Saccharomyces cerevisiae and other simple organisms) revealed a surprising result: the protein-protein interaction net shares some universal features with the topological organization of other complex nets, both natural and artificial, ranging from technological networks [27, 25, 28, 29], neural networks [30], metabolic pathways [31, 32, 33], and food webs [34, 35] to the human language graph [36]. These studies actually offer the first global view of the proteome map and show that it strongly departs from the simple Erdős-Rényi scenario.

The first characteristic feature of the proteome map is that the probability $P(k)$ that a given protein interacts with $k$ other proteins has a scale-free (SF) nature, i.e. it follows a power law, $P(k) \sim k^{-\gamma}$, with a sharp exponential cut-off for large $k$. Thus most proteins have a small number of links with other proteins and a few of them are highly connected (hubs). These last ones are likely to be very important to cell function [37, 38, 6].

The second feature is the presence of the so-called small world (SW) property [30, 39, 40]. Small world graphs have a number of surprising features that make them especially relevant to understanding how interactions among individuals, metabolites, or species lead to the robustness and homeostasis observed in nature. The SW pattern can be detected from the analysis of two basic statistical properties of the network (see footnote 1): (a) the clustering coefficient $C$ and (b) the average path length $L$.

The proteome graph (see Figure 1.6) is defined by a pair $\Omega_p = (W_p, E_p)$, where $W_p = \{p_i\}$, $(i = 1, \ldots, N)$ is the set of $N$ proteins (nodes) and $E_p = \{\{p_i, p_j\}\}$ is the set of edges/connections between proteins. The adjacency matrix $\xi_{ij}$ indicates that an interaction exists between proteins $p_i, p_j \in W_p$ ($\xi_{ij} = 1$) or that the interaction is absent ($\xi_{ij} = 0$). Two connected proteins are thus called adjacent and the degree $k_i$ of a given protein is the number of edges that connect it with other proteins. Let us consider the adjacency matrix and indicate by $\Gamma_i = \{p_j \mid \xi_{ij} = 1\}$ the set of nearest neighbors of a protein $p_i \in W_p$. The clustering coefficient for this protein is defined as the ratio between the actual number of connections between the proteins $p_j \in \Gamma_i$, and the total possible number of connections, $k_i(k_i - 1)/2$ [30] (see Figure 1.6). Denoting

$$\mathcal{L}_i = \sum_{j=1}^{N} \xi_{ij} \left[ \sum_{k \in \Gamma_i;\; j < k} \xi_{jk} \right], \qquad (1.11)$$

we define the clustering coefficient of the $i$-th protein as

$$C(i) = \frac{2 \mathcal{L}_i}{k_i (k_i - 1)}, \qquad (1.12)$$

where $k_i$ is the degree of the $i$-th protein. The clustering coefficient is defined as the average of $C(i)$ over all the proteins,

$$C = \frac{1}{N} \sum_{i=1}^{N} C(i). \qquad (1.13)$$

Footnote 1: Since the proteome map is a disconnected network, these quantities are actually defined on the giant component, defined as the largest cluster of connected nodes in the network [23].

Figure 1.6: Measuring the clustering from a proteome graph $\Omega_p$. Here each node (black circles) is a protein and physical interactions are indicated by means of edges connecting nodes.

The average path length $L$ is defined as follows: given two proteins $p_i, p_j \in W_p$, let $L_{ij}$ be the length of the shortest path connecting these two proteins, following the links present in the network. The average path length $L$ is defined as:

$$L = \frac{2}{N(N-1)} \sum_{i<j} L_{ij}. \qquad (1.14)$$

For the ER graph, we have a clustering coefficient inversely proportional to the network size, $C_{rand} \approx \bar{K}/N$; this is a very small quantity that tends to zero for large networks. The average path length, on the other hand, is proportional to the logarithm of the network size, $L_{rand} \approx \log(N)/\log(\bar{K})$. At the other extreme, regular lattices with only nearest-neighbor connections among units exhibit a long average path length. Graphs with SW structure are characterized by a high clustering, $C \gg C_{rand}$, while possessing an average path length comparable to that of an ER graph with the same average connectivity and number of nodes, $L \approx L_{rand}$.
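Both small-world statistics are straightforward to compute for a graph stored as a dictionary of neighbor sets. The following sketch is one possible implementation under two assumptions that the text does not spell out: nodes with fewer than two neighbors contribute zero to the average clustering, and the graph is connected (for a disconnected map one would first restrict it to the giant component). Function names are hypothetical.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average over all nodes of 2 * (links among neighbors) / (k * (k - 1)).
    Nodes with k < 2 are counted as zero (assumption)."""
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue
        # Count each linked pair of neighbors once (a < b).
        links = sum(1 for a in neighbors for b in neighbors if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all pairs of a connected graph,
    using a breadth-first search from every node."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    n = len(adj)
    # total runs over ordered pairs, i.e. twice the sum over i < j.
    return total / (n * (n - 1))


if __name__ == "__main__":
    # A triangle (0, 1, 2) with one extra node attached to node 2.
    example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(clustering_coefficient(example), average_path_length(example))
```

For the toy graph in the usage example, the two triangle-only nodes have clustering 1, the hub has 1/3, and the pendant node contributes 0.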

The experimental observations on the proteome map can be summarized as follows (digits marked […] are illegible in the source):

1. The proteome map is a sparse graph, with a small average number of links per protein. In Ref. [41] an average connectivity $\bar{K}$ between 1.[…] and 2.3 was reported for the proteome map of S. cerevisiae. This observation is also consistent with the study of the global organization of the E. coli gene network from available information on transcriptional regulation [42].

2. It exhibits a SW pattern, different from the properties displayed by purely random (ER) graphs. In particular, Ref. [41] reported a clustering coefficient $C$ of 2.2 x 10^(-[…]) and an average path length $L$ of […].14, to be compared with the values corresponding to an ER network with comparable size and average connectivity, $C_{rand}$ = 1 x 10^(-[…]) and $L_{rand}$ = […].0.

3. The degree distribution of links follows a power law with a well-defined cut-off. To be more precise, Jeong et al. [37] reported a functional form for the degree distribution of S. cerevisiae

$$P(k) \simeq (k_0 + k)^{-\gamma} e^{-k/k_c}. \qquad (1.15)$$

Parameters reported in Ref. [37] are $k_0 \simeq 1$, $\gamma \approx 2.4$ and a cut-off $k_c \approx 20$. In Figure 1.7 we check this functional dependence on the cumulated degree distribution of the protein map (see footnote 2) used in Ref. [37]. A fit to the form (1.15) yields the values $k_0 \simeq 1.1$, $k_c \simeq$ 1[…], and $\gamma$ = 2.[…] ± 0.2, compatible with the results found in [37, 41]. This particular form of the degree distribution could have adaptive significance as a source of robustness against mutations.

Figure 1.7: Cumulated degree distribution $P_{cum}(k)$ for the yeast proteome map from Ref. [37]. The degree distribution has been fitted to the scaling behavior $P(k) \simeq (k_0 + k)^{-\gamma} e^{-k/k_c}$, with an exponent $\gamma \approx$ 2.[…] and a sharp cut-off $k_c \approx$ 1[…].
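To get a feeling for the shape of a power law with an exponential cut-off, the sketch below tabulates a normalized distribution of that form and its cumulated version. The parameter values in the usage example are the ones attributed above to Ref. [37]; the truncation at a maximum degree, the definition of the cumulated distribution as a sum over degrees greater than or equal to k, and the function names are assumptions made here.

```python
import math


def degree_distribution(k0, gamma, kc, k_max=1000):
    """Normalized P(k) proportional to (k0 + k)**(-gamma) * exp(-k / kc),
    tabulated for k = 1 .. k_max (truncation is an assumption)."""
    weights = [(k0 + k) ** (-gamma) * math.exp(-k / kc) for k in range(1, k_max + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def cumulative(p):
    """P_cum(k) = sum of P(k') over k' >= k, for k = 1 .. len(p)."""
    out = []
    running = 0.0
    for value in reversed(p):
        running += value
        out.append(running)
    return out[::-1]


if __name__ == "__main__":
    p = degree_distribution(k0=1.0, gamma=2.4, kc=20.0)
    p_cum = cumulative(p)
    for k in (1, 2, 5, 10, 20, 50):
        print(k, p[k - 1], p_cum[k - 1])
```

Plotting the cumulated values on doubly logarithmic axes shows an approximately straight region at small degrees followed by a rapid drop beyond the cut-off.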

The highly heterogeneous character of these maps has important consequences within the context of molecular cell biology [32, 6]. It indicates that the evolution of proteome/genome complexity has been driven towards a well-defined topological pattern that provides the substrate for an extraordinary homeostatic stability against random mutational events.

Footnote 2: Data available at the web site http://www.nd.edu/~networks/database/index.html.

1.5 Scale-free proteomes: gene duplication models

Several models have been proposed in order to explain the regularities displayed by the proteome map [44, 45, 46]. These models of proteome evolution are based on a gene duplication plus rewiring process that includes the basic ingredients of proteome growth and is intended to reproduce the previous set of observations. The first component of the models allows the system to grow by means of the copy process of previous units (together with their wiring). The second introduces novelty by means of changes in the wiring pattern, usually constrained to the newly created genes. This constraint is required if we assume that conservation of gene (protein) interactions is due to functional restrictions and that further changes in the regulation map are limited. Such a constraint would be strongly relaxed when involving a newly created (and redundant) unit. The models proposed so far are intended to capture the topological properties of the proteome map. No explicit functionality is included in the description of the proteins and this is certainly a drawback. But by ignoring the specific features of the protein-protein interactions and the underlying regulation dynamics, one can explore the question of how much of the network topology is due to the duplication and diversification processes.

In this chapter we will focus in particular on the model described in Refs. [45, 46]. This model considers single-gene duplications, which occur in most cases due to unequal crossover [47], plus rewiring. Multiple duplications should be considered in future extensions of these models: molecular evidence shows that even whole-genome duplications have actually occurred in S. cerevisiae [48] (see also Ref. [49]). Rewiring has also been used in dynamical models of the evolution of robustness in complex organisms [50].

The proteome graph at any given step $t$ (i.e. after $t$ duplications) will be indicated as $\Omega_p(t)$.

The rules of the model, summarized in Figure 1.8, are implemented as follows. Each time step: (a) one node in the graph is randomly chosen and duplicated; (b) the links emerging from the newly generated node are removed with probability $\delta$; (c) finally, new links (not previously present) can be created between the new node and all the rest of the nodes with probability $\alpha$. Step (a) implements gene duplication, in which both the original and the replicated proteins retain the same structural properties and, consequently, the same set of interactions. The rewiring steps (b) and (c) implement the possible mutations of the replicated gene, which translate into the deletion and addition of interactions, with different probabilities.
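The three rules can be turned into a short simulation. The sketch below is illustrative only and assumes details the text does not fix: a seed graph of two linked proteins, an adjacency-set representation, and the reading that "not previously present" excludes every partner inherited from the duplicated node. The function names `duplication_step` and `grow_network` are hypothetical, and `delta` and `alpha` stand for the deletion and addition probabilities of steps (b) and (c).

```python
import random


def duplication_step(adj, delta, alpha, rng):
    """Apply rules (a)-(c) once to the graph adj (dict: node -> set of neighbours)."""
    n = len(adj)
    parent = rng.randrange(n)      # (a) choose the node to duplicate uniformly at random
    new = n                        # label of the duplicate
    inherited = adj[parent]
    # (b) each link inherited from the parent is removed with probability delta
    links = {j for j in inherited if rng.random() >= delta}
    # (c) each link that was not inherited is created with probability alpha
    for j in range(n):
        if j not in inherited and rng.random() < alpha:
            links.add(j)
    adj[new] = links
    for j in links:                # keep the graph undirected
        adj[j].add(new)
    return new


def grow_network(n_final, delta, alpha, seed=0):
    """Grow from an assumed seed of two linked nodes until n_final nodes exist."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_final:
        duplication_step(adj, delta, alpha, rng)
    return adj
```

A call such as `grow_network(300, delta=0.562, alpha=0.001)` returns the adjacency sets of one realization; isolated nodes are kept in the dictionary, so extracting the giant component is left to the caller.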

1.5.1 Mean-field rate equation for the average connectivity

Since the model just presented has two free parameters, namely the deletion probability $\delta$ and the addition probability $\alpha$, one preliminary task is to constrain their possible values by using the available empirical data. One average property that can be determined is the evolution of the average number of interactions per protein/gene through time, which can be compared with the evidence from real proteomes [37, 41], as well as recent analysis of large-scale perturbation experiments [51].


Figure 1.8: Growing network by duplication of nodes. First (a) duplication occurs after randomly selecting a node (arrow). The links from the newly created node (white) can now experience deletion (b) and new links can be created (c); these events occur with probabilities $\delta$ and $\alpha$, respectively.

Let us indicate by $K_N$ and $L_N$ the average connectivity of the system and its number of links, respectively, when it is composed of $N$ proteins. These magnitudes satisfy the relation $L_N = K_N N / 2$. It is easy to check (see also Ref. [44]) that, at a mean-field level, the number of links $L_N$ fulfills the following rate equation:

$$ L_{N+1} = L_N + K_N + \alpha (N - K_N) - \delta K_N, \tag{1.16} $$

where the last two terms correspond to the addition of links to a fraction $\alpha$ of the $N - K_N$ units not connected to the duplicated node, plus the deletion of any of the new $K_N$ links, with probability $\delta$. Using the continuous approximation

$$ \frac{dK_N}{dN} \simeq K_{N+1} - K_N, \tag{1.17} $$

Eq. (1.16) can be written

$$ \frac{dK_N}{dN} = \frac{1}{N} \left[ K_N + 2\alpha (N - K_N) - 2\delta K_N \right], \tag{1.18} $$

whose solution is

$$ K_N = \frac{\alpha}{\alpha + \delta} N + \left( K_1 - \frac{\alpha}{\alpha + \delta} \right) N^{\Gamma}, \tag{1.19} $$

where $\Gamma = 1 - 2(\alpha + \delta)$ and $K_1$ is the initial connectivity at $N = 1$. For any constant value of $\alpha$ and $\delta$ this model leads to an increasing connectivity through time. In order to have a finite $K_N$ in the limit of large $N$, one possible solution is to impose an addition rate $\alpha$ that is a function of the size of the network, with the form

$$ \alpha(N) = \frac{\beta}{N}, \tag{1.20} $$


where $\beta$ is a constant. That is, the rate of addition of new links (the establishment of new viable interactions between proteins) is inversely proportional to the network size, and thus much smaller than the deletion rate $\delta$, in agreement with the rates observed in [41]. In this case, for large $N$, the differential rate equation (1.18) takes the form

$$ \frac{dK_N}{dN} = \frac{1}{N} (1 - 2\delta) K_N + \frac{2\beta}{N}. \tag{1.21} $$

The solution of this equation is

$$ K_N = \frac{2\beta}{2\delta - 1} + \left( K_1 - \frac{2\beta}{2\delta - 1} \right) N^{1 - 2\delta}. \tag{1.22} $$

For $\delta > 1/2$ a finite connectivity is reached in the limit of a large network,

$$ K \equiv \lim_{N \to \infty} K_N = \frac{2\beta}{2\delta - 1}. \tag{1.23} $$

In order to reduce the number of independent parameters of the model, Ref. [45] used the available experimental data to estimate the average degree $K$ and the ratio of addition and deletion rates in the yeast proteome, $\alpha/\delta$ [41], to find a relation between $\beta$ and $\delta$, which, together with Eq. (1.23), yields a numerical estimate of $\beta$ and $\delta$. Since it is clear that this estimate is strongly dependent on the assumed value of $\alpha/\delta$, Ref. [46] followed a more pragmatic approach, considering a $\delta$-dependent model and fixing the actual value of $\delta$ by comparing numerical simulations with experimental data.
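The mean-field argument of this subsection can be checked numerically. The sketch below iterates the discrete link balance (each duplication adds the $K_N$ inherited links, adds a fraction $\alpha = \beta/N$ of the $N - K_N$ remaining ones, and deletes a fraction $\delta$ of the inherited ones) and compares the result with the large-$N$ limit $2\beta/(2\delta - 1)$. The function names and the initial value $K_1 = 1$ are assumptions made for illustration.

```python
def mean_connectivity(n_final, delta, beta, k1=1.0):
    """Iterate the mean-field link balance with alpha = beta / N; return K at size n_final."""
    k = k1                 # K_1, assumed initial connectivity at N = 1
    links = k / 2.0        # L_1 = K_1 * 1 / 2
    for n in range(1, n_final):
        alpha = beta / n
        links += k + alpha * (n - k) - delta * k   # L_{N+1} obtained from L_N
        k = 2.0 * links / (n + 1)                  # K_{N+1} = 2 L_{N+1} / (N + 1)
    return k


def asymptotic_connectivity(delta, beta):
    """Large-N limit 2*beta / (2*delta - 1), finite only for delta > 1/2."""
    if delta <= 0.5:
        raise ValueError("the limit is finite only for delta > 1/2")
    return 2.0 * beta / (2.0 * delta - 1.0)
```

Because the transient decays as $N^{1 - 2\delta}$, the iteration approaches the limit quickly for $\delta$ well above $1/2$ and only slowly when $\delta$ is close to $1/2$.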

1.5.2 Rate equation for the node distribution $n_k$

The rate equation approach to evolving networks [52] can be fruitfully applied to the proteome model under consideration [46]. This approach focuses on the time evolution of the number $n_k(t)$ of nodes in the network with exactly $k$ links at time $t$. Defining our network by means of the set of numbers $n_k(t)$, we have that the total number of nodes $N$ is given by

$$ N = \sum_k n_k, \tag{1.24} $$

while the total number of links is given by

$$ L = \frac{1}{2} \sum_k k \, n_k. \tag{1.25} $$

Time is divided into periods. In each period, $t \to t + 1$, one node is duplicated at random, so that $N \to N + 1$. If, after each duplication, there is a probability $\delta$ to delete each link from the just-duplicated node, the probability of increasing the number of nodes at degree $k$, by direct duplication without link deletion, is given by

$$ \Pr[n_k \to n_k + 1] = \frac{n_k}{N} (1 - \delta k). \tag{1.26} $$


On the other hand, a node of degree $k$ can be created from the duplication of a node of degree $k + 1$ in which a link is deleted, contributing with a probability

$$ \Pr[n_k \to n_k + 1] = \frac{n_{k+1}}{N} \, \delta \, (k + 1). \tag{1.27} $$

The probability of degree change, from duplication of a node connected to a degree-$(k-1)$ node, is given by

$$ \Pr[(n_{k-1}, n_k) \to (n_{k-1} - 1, n_k + 1)] = \frac{n_{k-1}}{N} (k - 1)(1 - \delta). \tag{1.28} $$

Finally, in the same period, we proceed to add each of the $N - k_i$ possible new links with probability $\alpha = \beta/N$, where $k_i$ is the connectivity of the just-duplicated node. In the limit $N \gg k_i$, we can simply consider the addition of $N\alpha = \beta$ new links to the graph. When this last step is performed with the correlated prescription given for the model (i.e. adding links from the duplicated node to the rest of the nodes in the graph), it leads to a nonlocal rate equation for the functions $n_k$ [46]. For the sake of simplicity, we will consider now the simpler case of an uncorrelated addition of links (new links created between any two nodes in the graph). However, it can be proved that both prescriptions lead to qualitatively similar results [46].

The case of uncorrelated addition of links can be represented as the distribution of $2\alpha N$ new link ends among the $N$ nodes in the network. This event contributes with a probability

$$ \Pr[(n_k, n_{k+1}) \to (n_k - 1, n_{k+1} + 1)] = \frac{n_k}{N} 2\alpha N = \frac{n_k}{N} 2\beta. \tag{1.29} $$

The probabilities (1.26), (1.27), (1.28), and (1.29) define the rate equation for the connectivity distribution

$$ \frac{dn_k(t)}{dt} = \frac{n_k}{N} + \frac{\delta}{N} \left[ (k+1) n_{k+1} - k n_k \right] + \frac{1 - \delta}{N} \left[ (k-1) n_{k-1} - k n_k \right] + \frac{2\beta}{N} \left[ n_{k-1} - n_k \right]. \tag{1.30} $$
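As a sanity check on the four contributions to the rate equation (direct duplication, deletion of one inherited link, degree gain of a neighbour of the duplicate, and uncorrelated link addition), the sketch below evaluates the right-hand side for a given list of counts $n_k$. The list representation, the convention $n_{-1} = 0$, and the function name `rate_rhs` are assumptions made for illustration.

```python
def rate_rhs(n, delta, beta):
    """Return dn_k/dt for k = 0 .. len(n), where n[k] is the number of nodes of degree k."""
    total = float(sum(n))          # N, the current number of nodes

    def count(k):
        # n_k, taken as zero outside the tabulated range
        return n[k] if 0 <= k < len(n) else 0.0

    rhs = []
    for k in range(len(n) + 1):
        duplication = count(k) / total
        deletion = delta / total * ((k + 1) * count(k + 1) - k * count(k))
        inheritance = (1.0 - delta) / total * ((k - 1) * count(k - 1) - k * count(k))
        addition = 2.0 * beta / total * (count(k - 1) - count(k))
        rhs.append(duplication + deletion + inheritance + addition)
    return rhs
```

The last three contributions only move nodes between degree classes, so they cancel when summed over $k$, and the total of the returned list equals one node gained per time step.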

Since at each time step a new node is added, Eq. (1.30) satisfies the condition

$$ \frac{dN}{dt} = \sum_k \frac{dn_k(t)}{dt} = 1, \tag{1.31} $$

that yields the expected result $N(t) = N_0 + t$, where $N_0$ is the initial number of nodes in the network. In order to solve Eq. (1.30), we impose the homogeneous condition on the population number

$$ n_k(t) = N(t) \, p_k \simeq t \, p_k, \tag{1.32} $$

where $p_k$ is the probability of finding a node of connectivity $k$, which we assume to be independent of time. With this approximation, the rate equation reads

$$ \delta (k + 1) p_{k+1} - (k + 2\beta) p_k + \left[ (k - 1)(1 - \delta) + 2\beta \right] p_{k-1} = 0. \tag{1.33} $$


Eq. (1.33) can be solved using the generating functional method [53]. Let us define the generating functional

$$ \phi(z) = \sum_k z^k p_k. \tag{1.34} $$

Introducing this definition into Eq. (1.33), we obtain an equation for $\phi(z)$, whose solution is

$$ \phi(z) = \left( \frac{\delta - z (1 - \delta)}{2\delta - 1} \right)^{-2\beta/(1 - \delta)}. \tag{1.35} $$
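For readers who want the intermediate steps, the following is a sketch of how the stationary recurrence leads to the closed form. It is reconstructed here rather than quoted from Ref. [46], and it assumes $p_{-1} = 0$ and the normalization $\phi(1) = 1$, with $\phi(z) = \sum_k z^k p_k$, $\delta$ the deletion probability and $\beta$ the addition constant.

```latex
% Stationary recurrence for the degree probabilities p_k:
%   \delta (k+1) p_{k+1} - (k + 2\beta) p_k + [(k-1)(1-\delta) + 2\beta] p_{k-1} = 0 .
% Multiply by z^k and sum over k >= 0, using
%   \sum_k (k+1) p_{k+1} z^k = \phi'(z),      \sum_k k p_k z^k = z \phi'(z),
%   \sum_k (k-1) p_{k-1} z^k = z^2 \phi'(z),  \sum_k p_{k-1} z^k = z \phi(z).
\begin{align}
  \left[ \delta - z + (1-\delta) z^2 \right] \phi'(z) &= 2\beta (1 - z) \, \phi(z), \\
  \delta - z + (1-\delta) z^2 &= (1 - z) \left[ \delta - (1-\delta) z \right], \\
  \frac{\phi'(z)}{\phi(z)} &= \frac{2\beta}{\delta - (1-\delta) z}, \\
  \phi(z) &= \left( \frac{\delta - (1-\delta) z}{2\delta - 1} \right)^{-2\beta/(1-\delta)} .
\end{align}
```

The last line follows by integrating the logarithmic derivative and fixing the constant with $\phi(1) = 1$, which requires $\delta > 1/2$ for the base of the power to be positive at $z = 1$.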

Knowing $\phi(z)$ we can compute immediately the average connectivity

$$ K = \sum_k k \, p_k \equiv \left. z \frac{d\phi(z)}{dz} \right|_{z=1} = \frac{2\beta}{2\delta - 1}, \tag{1.36} $$

in agreement with the mean-field prediction of Eq. (1.23). On the other hand, performing a Taylor expansion of $\phi(z)$ around $z = 0$ we can obtain $p_k$ as

$$ p_k = \frac{1}{k!} \left. \frac{d^k \phi(z)}{dz^k} \right|_{z=0}. \tag{1.37} $$

Applying this formula to the function (1.35), and using Stirling's approximation for large $k$, we can obtain the asymptotic behavior of $p_k$, given by

$$ p_k \sim (k_0 + k)^{-\gamma} e^{-k/k_c}, \tag{1.38} $$

with

$$ \gamma = -k_0 = 1 - \frac{2\beta}{1 - \delta}, \qquad k_c = \frac{1}{\ln[\delta/(1 - \delta)]}. \tag{1.39} $$

As we can observe from the previous result, we recover the same functional form experimentally observed in [37]. However, it is important to notice that for all the parameter range in which the exponential cut-off $k_c$ is well-defined, we obtain a value of the degree exponent, as given by Eq. (1.39), that is $\gamma < 1$. The same result holds when considering the rate equation for the correlated model, in which the link addition is fully correlated with the new duplicated node [46]. This result is unsatisfactory, because it does not correspond with the results from numerical simulations of the model [46]. This discrepancy is explained by the fact that the $N \to \infty$ solution presented has only meaning for $\delta > 1/2$ (see Eq. (1.36)). Yet the master equation was defined on the basis of an independent-event approximation that only makes sense for $\delta \ll 1$. The master equation itself should become valid for $\delta \to 0$, but then the convergence results assumed at $N \to \infty$ seem questionable, as indicated by the fact that we get an analytic, but negative, $K$.

There is, however, something qualitative still to be learned from these equations, in the neighborhood of $\delta \sim 1/2$, small $\beta$. This is a neighborhood where the convergence results at large $N$ still give sensible answers, even if they are not quantitatively correct due to marginal approximations in the underlying master equation. Yet at the same time, since this is the smallest value of $\delta$ where we can get answers, it is the one where the master equation we have constructed is likely to be the best approximation to the much more complicated true equation (one with frequent coupled events).


Figure 1.9: a) Degree exponent $\gamma$ as a function of the deletion rate $\delta$ from computer simulations of the proteome model with average connectivity $K = 2.5$. Network size $N = 2 \times 10^3$. The degree distribution is averaged over 1000 different network realizations. b) Degree distribution for the same model, $\delta = 0.562$, averaged over 10000 different network realizations. The distribution can be fitted to the form $P(k) \sim (k_0 + k)^{-\gamma} e^{-k/k_c}$, with an exponent $\gamma = 2.5 \pm 0.1$ and a sharp cut-off $k_c \simeq 37$.

1.5.3 Numerical simulations

The proteome model defined in section 1.5 depends effectively on two independent parameters: the average connectivity of the network $K$ and the deletion rate of newly created links $\delta$, the addition rate $\beta$ being computed from Eq. (1.23). The average connectivity can be estimated from the experimental results from real proteome maps. The data analyzed in Ref. [37] yield a value $K \simeq 2.40$. As a safe estimate, one can impose the value $K = 2.5$ [46], and consider values $\delta > 1/2$, in accordance with Eq. (1.23). In spite of the drawbacks of the analytical study of the model, section 1.5.2, one should expect the model to yield, for each value of $\delta$, the functional form Eq. (1.38) for the degree distribution, with a degree exponent $\gamma$ which is a function of $\delta$ (for a fixed average connectivity $K = 2.5$). From numerical simulations of the model one can then compute the function $\gamma(\delta)$ and select the value of $\delta$ that yields a degree exponent in agreement with the experimental observations. Fig. 1.9(a) shows values of $\gamma$ estimated from the functional form (1.38) for the degree distribution obtained from computer simulations of the model, averaging over 1000 networks of size $N = 2 \times 10^3$ nodes, of the same order as those found in the maps analyzed in Ref. [37]. Apart from a concave region for $\delta$ very close to $1/2$, $\gamma$ is an increasing function of $\delta$. The value of $\delta$ yielding the degree exponent closest to the experimentally observed one is then

$$ \delta = 0.562. \tag{1.40} $$
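Once the average connectivity $K$ and the deletion rate $\delta$ are chosen, the addition constant $\beta$ follows by inverting the large-network limit $K = 2\beta/(2\delta - 1)$. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def addition_constant(mean_degree, delta):
    """Invert K = 2*beta / (2*delta - 1) for beta; meaningful only for delta > 1/2."""
    if delta <= 0.5:
        raise ValueError("a finite average connectivity requires delta > 1/2")
    return mean_degree * (2.0 * delta - 1.0) / 2.0
```

For $K = 2.5$ and $\delta = 0.562$ this gives $\beta = 2.5 \times 0.124 / 2 = 0.155$, so that for a network of $N = 2 \times 10^3$ nodes the per-pair addition probability $\alpha = \beta/N$ is of order $10^{-4}$.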

In Figure 1.10(a) we show the topology of the giant component of a typical realization of the network model of size $N = 2 \times 10^3$. This Figure clearly resembles the giant component of real yeast networks, as we can see by comparing with Figure 1.10(b)³; we can appreciate the presence of a few highly connected hubs plus many nodes with a relatively small number of connections, in close resemblance to the real proteome map. On the other hand, Figure 1.9(b) shows the connectivity distribution $P(k)$ obtained for networks of size $N = 2 \times 10^3$, averaged over 10000 realizations, for $\delta = 0.562$. In this Figure we observe that the resulting connectivity distribution can be fitted to a power law with an exponential cut-off, of the form given by Eq. (1.38), with parameters $\gamma = 2.5 \pm 0.1$ and $k_c \simeq 37$, in fair agreement with the measurements reported by [41] and [37].

³ Figure kindly provided by W. Basalaj (see http://www.cl.cam.uk/~wb204/GD99/#Mewes).

Figure 1.10: a) Topology of the giant component of the map obtained with the proteome model with parameters $K = 2.5$ and $\delta = 0.565$. Network size $N = 2 \times 10^3$. b) Topology of a real yeast proteome map obtained from the MIPS database [43].

Finally, Table 1.1 reports the values of $K$, $\gamma$, $C$, and $L$ obtained for the proteome model, compared with the values reported for the yeast S. cerevisiae by Ref. [41], those calculated for the map used in Ref. [37], and those corresponding to an ER random graph with size and average connectivity comparable with both the model and the real data. All the magnitudes displayed by the model compare quite well with the values measured for the yeast, and represent a further confirmation of the SW conjecture for the protein networks advanced by [41].

Table 1.1: Comparison between the observed regularities in the yeast proteome reported by Wagner [41], those calculated from the proteome map used by Jeong et al. [37], the model predictions with $N = 2000$, $\delta = 0.562$ and $K = 2.50$, and an ER network with the same size and connectivity as the model.

            Wagner's data          Jeong's data           Proteome model         ER model
$K$         1.83                   2.40                   2.4 ± 0.6              2.50 ± 0.05
$\gamma$    2.5                    2.4                    2.5 ± 0.1              —
$C$         $2.2 \times 10^{-2}$   $7.1 \times 10^{-2}$   $1.0 \times 10^{-2}$   $1 \times 10^{-3}$
$L$         7.14                   6.81                   5.5 ± 0.7              8.0 ± 0.2
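The quantities compared in the table can be measured on any simulated or empirical map. The sketch below assumes the usual small-world definitions, which may differ in detail from those used in Refs. [37, 41]: the clustering coefficient is the average over all nodes of the fraction of linked neighbour pairs (nodes with fewer than two neighbours contribute zero), and the path length is the mean shortest-path distance over connected pairs. The adjacency-set representation with integer node labels and the function names are assumptions.

```python
from collections import deque


def mean_degree(adj):
    """Average connectivity of a graph given as dict: node -> set of neighbours."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Average local clustering; nodes with degree below 2 contribute zero."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        linked = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over connected ordered pairs, by BFS from every node."""
    dist_sum = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else 0.0
```

Restricting `adj` to the giant component before calling `average_path_length` avoids mixing components of very different sizes; the breadth-first search from every node costs a time proportional to the number of nodes times the number of links, which is modest for maps of a few thousand proteins.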


1.6 Discussion

Simple models of complex biological interactions have been used over the last decades as powerful metaphors of natural complexity. Networks pervade biology and there is little doubt that untangling biological complexity demands a considerable degree of simplification. This view works well when generic mechanisms are at work: percolation close to criticality in random graphs would be a perfect example in this context. Since information transfer (network communication) is a key property of all biosystems, reaching a threshold in connectivity allows information to propagate in a very effective way under a low wiring cost.

Similar principles might be operating in technology graphs [28, 54], and the striking similarities between man-made networks (such as electronic circuits or software graphs) and natural webs suggest that an organizing principle involving optimal communication might be at work in both types of systems. This seems a reasonable possibility, since the cost of wiring is an important constraint in both cases. For technology graphs, however, random failure typically leads to collapse and thus there is no robustness associated with the scale-free architecture. Biological systems might have taken advantage of the SF patterns that arise from optimization of path length under low cost [55] and make use of the source of robustness for free that might be generated.

As occurs with many other aspects of biological complexity, historical constraints play an important role in shaping network topology. Not surprisingly, some of the oldest actors in the metabolic scene seem to be highly connected, thus suggesting a leading role of preferential attachment [26], at least at early stages of the evolution of metabolism. But the proteome map is a very large web incorporating a large amount of plasticity that might have been tuned through evolution in order to reach optimally wired pathways. Future research will provide a new perspective on how biological nets get organized through evolution and what the contributions of emergence, selection, and tinkering to network biocomplexity are.

Acknowledgements

We thank the members of the Complex Systems Lab for useful discussions. This work has been partially supported by the European Network Contract No. ERBFMRXCT980183, the European Commission - Fet Open project COSIN IST-2001-33555, a grant PB97-0693 and by the Santa Fe Institute (R. V. S.). R. P.-S. acknowledges financial support from the Ministerio de Ciencia y Tecnología (Spain).

References

[1] D. Bray. Protein molecules as computational elements in living cells. Nature 376, 307–312 (1995).

[2] L. H. Hartwell, J. J. Hopfield, S. Leibler, and A. W. Murray. From molecular to modular cell biology. Nature 402, C47–C52 (1999).

[3] P. Ross-Macdonald, P. S. R. Coelho, T. Roemer, S. Agarwal, A. Kumar, R. Jansen, K. H. Cheung, A. Sheehan, D. Symoniatis, L. Umansky, M. Heldtman, F. K. Nelson, H. Iwasaki, K. Hager, M. Gerstein, P. Miller, G. S. Roeder, and M. Snyder. Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature 402, 413–418 (1999).

[4] A. Wagner. Robustness against mutations in genetic networks of yeast. Nature Genet. 24, 355–361 (2000).

[5] P. H. Kussie, S. Gorina, V. Marechal, B. Elenbaas, J. Moreau, A. J. Levine, and N. P. Pavletich. Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 274, 948–953 (1996).

[6] B. Vogelstein, D. Lane, and A. J. Levine. Surfing the p53 network. Nature 408, 307–310 (2000).

[7] S. Jin. Identification and characterization of a p53 homologue in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 97, 7301–7306 (2000).

[8] K. W. Kohn. Molecular interaction map of the mammalian cell cycle control and DNA repair systems. Mol. Biol. Cell 10, 2703–2734 (1999).

[9] R. J. Williams and N. D. Martinez. Simple rules yield complex food webs. Nature 404, 180–183 (2000).

[10] H. Lodish, A. Berk, S. L. Zipursky, and P. Matsudaira, Molecular Cell Biology, (W. H. Freeman, New York, 2000). 4th edition.

[11] W. J. Gehring, Master Control Genes in Development and Evolution, (Yale University Press, New Haven, 1998).

[12] J. Hasty, D. McMillen, F. Isaacs, and J. J. Collins. Computational studies of gene regulatory networks: in numero molecular biology. Nature Reviews Genet. 2, 268–279 (2001).

[13] P. Smolen, D. A. Baxter, and J. H. Byrne. Mathematical modeling of gene networks. Neuron 26, 567–580 (2000).

[14] B. Goodwin, Temporal organization in cells, (Academic Press, New York, 1963).

[15] J. E. Lewis and L. Glass. Steady states, limit cycles and chaos in models of complex biological networks. Int. J. Bif. Chaos 1, 477–483 (1991).

[16] S. A. Kauffman, Origins of Order, (Oxford University Press, New York, 1993).

[17] S. B. Carroll. Chance and necessity: the evolution of morphological complexity and diversity. Nature 409, 1102–1109 (2000).

[18] M. Laurent and N. Kellershohn. Multistability: a major means of differentiation and evolution in biological systems. Trends Biochem. Sci. 24, 418–422 (1999).

[19] M. Ptashne, A Genetic Switch, (Blackwell, Cambridge, 1992).

[20] I. Salazar, J. Garcia-Fernandez, and R. V. Solé. Gene networks capable of pattern formation: from induction to reaction-diffusion. J. Theor. Biol. 205, 587–603 (2000).

[21] R. V. Solé, I. Salazar, and J. Garcia-Fernandez. Common Pattern Formation, Modularity and Phase Transitions in a Gene Network Model of Morphogenesis. Physica A 305, 640–647 (2002).

[22] J. M. T. Thompson and H. B. Stewart, Nonlinear dynamics and chaos, (John Wiley & Sons, New York, 1986).

[23] B. Bollobás, Random Graphs, (Academic Press, London, 1985).

[24] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–60 (1960).


[25] L. A. N. Amaral, A. Scala, M. Barthélemy, and H. E. Stanley. Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97, 11149–11152 (2000).

[26] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002).

[27] R. A. Albert, H. Jeong, and A.-L. Barabási. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).

[28] R. Ferrer i Cancho, C. Janssen, and R. V. Solé. The topology of technology graphs: small world pattern in electronic circuits. Phys. Rev. E 63, 32767 (2001).

[29] R. Pastor-Satorras, A. Vázquez, and A. Vespignani. Dynamical and correlation properties of the internet. Phys. Rev. Lett. 87, 258701 (2001).

[30] D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393, 440–442 (1998).

[31] D. Fell and A. Wagner. The small world of metabolism. Nature Biotech. 18, 1121 (2000).

[32] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature 407, 651–654 (2001).

[33] J. Podani, Z. Oltvai, H. Jeong, B. Tombor, A.-L. Barabási, and E. Szathmáry. Comparable system-level organization of Archaea and Eukaryotes. Nature Genetics 29, 54–56 (2001).

[34] J. M. Montoya and R. V. Solé. Small world patterns in food webs. J. Theor. Biol. 214, 405–412 (2002).

[35] R. J. Williams, N. D. Martinez, E. L. Berlow, J. A. Dunne, and A.-L. Barabási. Two degrees of separation in complex food webs, (2001). Santa Fe working paper 01-07-036.

[36] R. Ferrer i Cancho, C. Janssen, and R. V. Solé. The small world of human language. Procs. Roy. Soc. London B 268, 2261–2266 (2001).

[37] H. Jeong, S. Mason, A.-L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature 411, 41 (2001).

[38] S. Maslov and K. Sneppen. Specificity and stability in topology of protein networks. Science 296, 910–913 (2002).

[39] D. J. Watts, Small Worlds, (Princeton University Press, Princeton, 1999).

[40] M. E. J. Newman. Models of the Small World. J. Stat. Phys. 101, 819–841 (2000).

[41] A. Wagner. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol. Biol. Evol. 18, 1283–1292 (2001).

[42] D. Thieffry, A. M. Huerta, E. Perez-Rueda, and J. Collado-Vides. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. BioEssays 20, 433–440 (1998).

[43] H. W. Mewes, K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, and D. Frishman. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 27, 44–48 (1999).

[44] A. Vázquez, A. Flammini, A. Maritan, and A. Vespignani. Modelling of protein interaction networks, (2001). cond-mat/0108043.

[45] R. V. Solé, R. Pastor-Satorras, E. Smith, and T. Kepler. A model of large-scale proteome evolution. Adv. Complex. Syst. 5, 43–54 (2002).


[46] R. Pastor-Satorras, E. Smith, and R. V. Solé. Evolving protein interaction networks through gene duplication, (2002). Santa Fe working paper 02-02-008.

[47] S. Ohno, Evolution by gene duplication, (Springer, Berlin, 1970).

[48] K. H. Wolfe and D. C. Shields. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387, 708–713 (1997).

[49] A. Wagner. Evolution of gene networks by gene duplications: A mathematical model and its implications on genome organization. Proc. Natl. Acad. Sci. USA 91, 4387–4391 (1994).

[50] S. Bornholdt and K. Sneppen. Robustness as an evolutionary principle. Proc. Roy. Soc. Lond. B 267, 2281–2286 (2000).

[51] A. Wagner. Estimating coarse gene network structure from large-scale gene perturbation data, (2001). Santa Fe working paper 01-09-051.

[52] P. L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of growing random networks. Phys. Rev. Lett. 85, 4629 (2000).

[53] C. W. Gardiner, Handbook of stochastic methods, (Springer, Berlin, 1985). 2nd edition.

[54] S. Valverde, R. Ferrer Cancho, and R. V. Solé. Scale-free networks from optimal design. Santa Fe working paper 02-04-019.

[55] R. Ferrer Cancho and R. V. Solé. Optimization in Complex Networks. Santa Fe working paper 01-11-068.