how are navigation in networks and splicing in parasites related? shai carmi bar-ilan university...

How are navigation in networks and splicing in parasites related?
Shai Carmi
Bar-Ilan University, Department of Physics and the Faculty of Life Sciences
Summer 2010, USA

Navigation in networks with local information
- Navigation is important in communication networks, transportation networks, and social networks.
- Knowledge of the entire network is usually not feasible.
- Use greedy navigation.
[Figures: the Internet at the Autonomous Systems level; MapQuest. Carmi et al., PNAS 104, (2007); Boguñá & Krioukov, PRL 102, (2009)]

Scale-free networks
- Nomenclature: in a network (graph), links (edges) connect nodes (vertices). The degree of a node, k, is its number of links.
- In the last decade, measurements showed that almost all natural networks are scale-free.
- Nodes in scale-free networks have degrees spanning all orders of magnitude, including nodes with an extremely large number of links (hubs).
- Degree distribution: $P(k) \sim k^{-\gamma}$.
- Small $\gamma$: the network is highly heterogeneous; many hubs exist.
- Large $\gamma$: the network is homogeneous, with fewer hubs, similar to purely random networks.

Navigation models
1. Navigating to the hub. S. Carmi, P. L. Krapivsky, and D. ben-Avraham, Physical Review E 78, (2008).
2. Kleinberg's navigation model. S. Carmi, S. Carter, J. Sun, and D. ben-Avraham, Physical Review Letters 102, (2009).

How to find the most connected node in the network: an algorithm
- Start from a given node.
- Go to the neighbor with the highest degree (break ties arbitrarily).
- Keep going until reaching a peak: a node whose degree is greater than the degrees of all of its neighbors.
- Only knowledge of the neighbors' degrees is required!
- Basins of attraction are formed around each hub.
[Figure: example network with nodes labeled a-p. Courtesy of Hernan Rozenfeld.]

Who cares?
- Practical interest: fast message routing to the most connected node (for example, in wireless sensor networks).
- Theoretical interest:
  - A new decomposition procedure based on association to hubs.
  - The number and sizes of basins can be used to characterize networks.
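The hub-search algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`build_adjacency`, `find_peak`, `basins_of_attraction`) and the small test graph are hypothetical, and the tie-breaking rule (first neighbor in sorted order, move only to a strictly higher degree) is one arbitrary choice among many.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def find_peak(adj, start):
    """Greedy ascent: repeatedly move to the highest-degree neighbor.

    Stops at a peak, i.e. a node with no neighbor of strictly higher degree.
    Only the degrees of the current node's neighbors are inspected.
    """
    node = start
    while True:
        neighbors = sorted(adj[node])  # sorted only to make tie-breaking deterministic
        if not neighbors:
            return node
        best = max(neighbors, key=lambda v: len(adj[v]))
        if len(adj[best]) <= len(adj[node]):
            return node  # no strictly better neighbor: this node is a peak
        node = best


def basins_of_attraction(adj):
    """Group every node by the peak its greedy ascent ends at."""
    basins = defaultdict(set)
    for node in adj:
        basins[find_peak(adj, node)].add(node)
    return dict(basins)


if __name__ == "__main__":
    # Hypothetical toy graph: two small hubs (0 and 5) joined by a path 3-4.
    edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (5, 6), (5, 7)]
    for peak, members in basins_of_attraction(build_adjacency(edges)).items():
        print(peak, sorted(members))
```

Because each step strictly increases the degree, the walk always terminates, and every node belongs to exactly one basin.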
[Figure: Rao et al., JMB 2004]

Basins distribution in scale-free networks: the giant basin
- How does the basin topology depend on the degree exponent $\gamma$ ($P(k) \sim k^{-\gamma}$)?
- For $\gamma < \gamma_c \approx 3$, the largest hub attracts all nodes, forming a giant basin.
- For larger $\gamma$, the network is fragmented into numerous basins whose size distribution decays as a power law.
- Mathematically:
  - The size of the largest basin scales as $S \sim N$ for $\gamma < \gamma_c$ and as $S \sim N^{1/(\gamma-1)}$ for large $\gamma$.
  - The probability that a node belongs to a basin of size s, Q(s), decays as a power law in s for small s [exponent not recoverable].

Theory: a transition at $\gamma = 3$
- We prove that the probability that a node of degree k is a peak is approximately $\exp[-A k^{3-\gamma}]$, where A is a k-independent constant.
- For $\gamma > 3$, many nodes with large degree will be peaks.
- For large $\gamma$, we prove that the size of the largest basin scales as $S \sim N^{1/(\gamma-1)}$.
- The first two moments of the number of basins and of the number of solitary basins can be approximated analytically.

Deterministic fractal (u,v)-nets
- The power-law behavior of Q(s) can be explained using a fractal scale-free network model.
- Each link in generation n splits into u+v=w links in generation n+1.

A short summary
- Greedy search for the most connected node partitions the network into basins of attraction.
- For scale-free networks with $\gamma > 3$, there are many basins (corresponding to the network modules).
- The transition at $\gamma = 3$ and the power-law distribution of small basin sizes can be explained analytically.
- The Internet and the glass network have a giant basin.

Generalization to lattices
- All degrees are equal.
- A node's importance is determined by its height or energy.
- Assume each node is attracted to its lowest neighbor.
- Basins of attraction have a simple physical interpretation.
[Figure labels: valley, peak, saddle]

A fun exercise in probability
- The number of valleys. R(s): the probability that a node is the valley of a basin of size s.
- In 1D, R(1)=1/30, and R(s) decays as 1/s!, much faster than the power law for networks.
- In 2D, the density of peaks and of valleys is 1/5, and of saddles 1/15. R(1)=109/4290.
- Density of craters is 3/715.
- Density of ridges is 1/20.

The navigation problem and the Kleinberg model
- We know short paths exist in social networks (six degrees of separation). But how do people find them?
- The Kleinberg model (Nature 406, 845 (2000)):
  * Underlying lattice; one long-range link for each node; a long-range link has length r with probability $\sim r^{-\alpha}$.
  * Greedy navigation: the message is always sent to the neighbor geographically nearest to the destination.
- Kleinberg proved (T: delivery time; d: dimension; L: lattice linear size):
  - For $\alpha = d$, $T \sim \ln^2 L$.
  - For $\alpha \neq d$, $T \sim L^x$ for some exponent x.
- For $\alpha = d$, greedy navigation can find short paths!
- An accurate expression for the delivery time was an open problem for 9 years.
- We prove: [Equation: asymptotic expression for the delivery time]
- We also show that short paths can be found for $\alpha \neq d$ if messages can be lost.
[Figure: a sharp transition]

Trypanosoma brucei
- Parasitic eukaryotes that diverged … million years ago.
- Causative agents of African sleeping sickness (30,000 deaths per year; the best treatment is from 1916).
- Transfer from the gut of the tsetse fly to the bloodstream of humans and cattle.
- Unique biology:
  - Kinetoplast
  - RNA editing with gRNA
  - Antigenic variation
  - trans-splicing
[Figures: from Mark Field's lab website; IAEA]

mRNA processing
- T. brucei genes have no promoters.
- Gene expression is regulated by controlling mRNA stability and translation.
[Figure: a polycistronic transcript (Gene1 to Gene4) processed by trans-splicing (SL) and polyadenylation (AAAA). Credit: Itai Dov Tkacz.]

Splicing overview
- SL: Spliced Leader RNA.
- See also: Liang et al., Euk. Cell (2003).

Open questions
- Where are the splice sites?
- Is there alternative trans-splicing?
Mapping transcript boundaries: a deep-sequencing approach
[Workflow diagram, two parallel library preparations shown side by side]
- Total RNA from insect-form
- Poly(A)+ RNA selection | Terminator exonuclease treatment
- First strand cDNA synthesis with random hexamer or oligo(dT) primers | First strand cDNA synthesis with random hexamer primers
- Second strand cDNA synthesis with RNaseH-derived RNA primers | Second strand cDNA synthesis with SL primer
- cDNA fragmentation and size selection
- Addition of adapters and amplification
- Illumina sequencing
- ~30 million useful reads!
N. G. Kolev, J. B. Franklin, S. Carmi, H. Shi, S. Michaeli, and C. Tschudi, PLoS Pathogens (in press).

Data analysis results
- 532 transcripts with a misannotated start codon.
- 898 annotated genes not producing a transcript.
- 1,114 new transcripts, including conserved coding and non-coding ones.
- 394 genes with non-coding transcripts in their 3' UTR.
- Trans-splicing and polyadenylation of snoRNA clusters.
- Transcription initiation sites of the polycistronic units.
- Digital gene expression.

Splice-site composition
[Figure labels: PPT (PolyPyrimidine Tract), the 3' splice site, 5' UTR, ORF; comparison panel: Human]
- No signal observed in the exon, except for a small purine excess.
- No G at -3.
- Non-AG splice sites are due to sequencing errors and strain differences.
- Pyrimidine peak at about -25; its distance from the AG varies: unique to trypanosomes.

Splice site composition
- Define the PPT as the longest stretch of pyrimidines (separated by no more than one purine) in the 200 nts upstream of the splice site.
[Histograms: median 43 nts; median 18 nts]

UTRs
[Histograms: median 130 nts; median 388 nts]

Alternative splicing
- Uncertainty of splice-site usage: the Shannon entropy $H = -\sum_i p_i \log p_i$, where $p_i$ is the relative usage of site i.

Alternative splicing
[Figure: relative usage of trans-splice sites (%) vs. nt position relative to the START codon (ATG)]
- Sites near the ORF are stronger.
- Some sites are found in frame.

Alternative splicing
- Dispersion: the average distance (nts) of all weak splice sites from the strongest one.
[Figure axes: position relative to primary splice site (nt); gene number]

Why alternative splicing?
- Usually does not create protein isoforms.
- Noise?
- Regulatory role?
  - The affinity of splice sites could depend on environmental conditions.
  - Different 5' UTRs can carry sequences that determine the fate of the mRNA.
- Future studies will find out whether splice-site usage varies between environments, life-cycle stages, and strains.

Polyadenylation sites
[Histogram: median 142 nts]

Summary
- Deep sequencing of Trypanosoma brucei mRNA reveals the transcriptome of the parasite at single-nucleotide resolution.
- Hundreds of genes reannotated.
- Splice sites and polyadenylation sites mapped for the first time.
- The splice-site sequence is HAG.
- PPT length and distance from the splice site are highly variable.
- A considerable amount of alternative splicing, previously unpredicted.
- Polyadenylation occurs preferentially at adenosines, but its location is highly irregular.
- Evidence for coupling of polyadenylation and trans-splicing of the downstream gene.

Does splicing regulate gene expression?
[Figure: splicing factor silenced]
- Gene expression is regulated by the presence of splicing factors.
- What is the molecular mechanism?
- No significant sequence motifs.

Downregulation
- Tb nucleobase/nucleoside transporter 8.1. Downregulated in all lines.
- Regulatory sequence:
  CAGTATCATCCCCACTTAAGGAAACTGTAAGCTTAGT
  CACTTCCCTCCTTTCTCTTTCTTTTTGTACGAAGGTT
  AAAGCCACAAGACTCTCTTACTGAACTCAGGCAAGT
  GAACAACACCGCACTAAACCAGAATCGCATAAGTTA
  CATCCACTATCCATCCACTCGGGTTTAACTGAATTGC
  ATCGCTGGATACCTTTCGTGTGCAATG
[Sequence annotations: polypyrimidine tract (PPT); 3' splice site; START codon; 5' UTR]
- C-rich PPT!
- Particularly short PPT-AG distance!

Hypothesis
- Binding of splicing factors (U2AF65) to the PPT is weak because of the short distance to the AG.
- Binding of the PTB (PPT-binding) protein to its target, the C-rich PPT, is required for efficient splicing.
- Knockdown of U2AF65 or PTB1 decreases the splicing factors' affinity and the splicing efficiency.
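The PPT definition used in this talk (the longest stretch of pyrimidines, separated by no more than one purine, in the 200 nts upstream of the splice site) and the PPT-AG distance can be sketched as follows. This is a minimal sketch under stated assumptions, not the analysis code behind the slides: the function name `find_ppt`, the 0-based coordinates, and the reading of "separated by no more than one purine" as "no two consecutive purines inside the tract" are all assumptions. The test sequence is a made-up toy, not a T. brucei sequence.

```python
PYRIMIDINES = set("CT")


def find_ppt(seq, ag_index, window=200):
    """Locate the longest polypyrimidine tract upstream of an AG splice site.

    seq       -- DNA/RNA sequence (U is treated as T)
    ag_index  -- 0-based index of the 'A' of the AG dinucleotide
    window    -- number of upstream nucleotides to search (200 in the talk)

    Assumption: a tract starts and ends on a pyrimidine and may contain
    isolated single purines, but never two purines in a row.
    Returns (start, end, distance) with end exclusive and
    distance = ag_index - end, or None if no pyrimidine is found.
    Ties in length are resolved in favor of the most upstream tract.
    """
    seq = seq.upper().replace("U", "T")
    lo = max(0, ag_index - window)
    best = None
    i = lo
    while i < ag_index:
        if seq[i] not in PYRIMIDINES:
            i += 1
            continue
        last = i  # index of the last pyrimidine accepted into the tract
        k = i + 1
        while k < ag_index:
            if seq[k] in PYRIMIDINES:
                last = k
            elif k - last > 1:
                break  # second consecutive purine ends the tract
            k += 1
        if best is None or (last + 1 - i) > (best[1] - best[0]):
            best = (i, last + 1)
        i = last + 1
    if best is None:
        return None
    start, end = best
    return start, end, ag_index - end


if __name__ == "__main__":
    toy = "GAGGTTTCTTTATCCGAGAACAG"  # hypothetical sequence; AG of interest at index 21
    print(find_ppt(toy, 21))
```

Returning the distance alongside the tract coordinates makes it easy to flag genes with an unusually short PPT-AG distance, the feature the hypothesis above relies on.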
[Figure: "Normal" (rest of intron, PPT, AG, 5' UTR) vs. "Short PPT-AG distance and C-rich PPT" (rest of intron, PPT, AG, 5' UTR)]

Experiment design
[Figure: three reporter constructs, each with promoter, intron, AG, 5' UTR, and reporter. Labels: Procyclin, Tb, Luciferase, PPT, spacer, TTTTTTTTT.]
- Transfect constructs into U2AF65-silenced cells.
- Expect:
  (1) Downregulation of luciferase activity in response to U2AF65 silencing.
  (2-3) Elimination of downregulation.

Upregulation
- Tb asparagine synthetase a, putative. Upregulated in U2AF65-silenced cells.

Hypothesis
- Biochemical evidence that upregulation is due to cytoplasmic binding of U2AF65 to the 3' UTR of the mature mRNA.
- U2AF65 binding is expected when trans-splicing occurs in the 3' UTR.
- It is possible that U2AF65 binding to the 3' UTR of the mature mRNA is responsible for downregulation of the species with the downstream polyadenylation site.
[Figure: the mRNA species degraded in the presence of U2AF65 (5' UTR, ORF, 3' UTR containing a PPT, poly(A) tail) vs. the other species (5' UTR, ORF, 3' UTR, poly(A) tail)]

Experiment design
[Figure: two reporter constructs (1, 2), each with promoter, intron + 5' UTR, reporter, and polyadenylation site (PA). Labels: Procyclin, Luciferase, Tb UTR, PPT.]
- Transfect constructs into U2AF65-silenced cells.
- Expect:
  (1) Upregulation of luciferase activity in response to U2AF65 silencing.
  (2) Elimination of upregulation.
- Results are expected in the upcoming few months.

Summary
- The mapping of splice sites and polyadenylation sites by deep sequencing improves our understanding of these processes.
- The presence or absence of specific splicing factors regulates the expression of some genes.
- Regulation is likely to be related to structural features of the mRNA rather than to sequence motifs.
- Model genes were selected for which we have conjectures about the molecular mechanism of regulation.
- Reporter gene assays are being carried out to test these conjectures.

Acknowledgements
Navigation in networks:
- Prof. Daniel ben-Avraham (Clarkson University, NY); students: Dr. Hernan Rozenfeld, Stephen Carter, Jie Sun
- Prof. Paul Krapivsky (Boston University)
Splicing in trypanosomes:
- Prof. Shulamit Michaeli (Bar-Ilan); students: Sachin Kumar-Gupta, Asher Pivko, Ilana Naboishchikov
- Prof. Elisabetta Ullu, Prof. Christian Tschudi (Yale); staff: Dr. Joseph Franklin, Dr. Nikolay Kolev, Dr. Huafang Shi
Thesis advisor: Prof. Shlomo Havlin (Bar-Ilan).
Funding: Adams Fellowship Program of the Israel Academy of Sciences and Humanities.
Thank you for your attention!

My research interests
- Networks
  - Modeling: flow, diffusion, percolation, disease spreading, navigation
  - Data analysis: the Internet, glass models
- Biology
  - General: protein interaction (comp), DNA editing (comp)
  - Trypanosomes: unfolded protein response (comp + expr), splicing regulation (comp + expr), mapping alternative splicing (comp)
- Diffusion
  - Anomalous functionals (theory)
  - Microscopy (biophysics)

Random network models
In a network, links (edges) connect computers/individuals (nodes).
1. Simplest model: a regular lattice.
   * Good for purely spatial, local interactions.
2. Erdős-Rényi (ER) network model ($G_{N,p}$): fully random.
   * Number of nodes N, probability of a link p.
   * Narrow degree distribution (Poisson).
3. Scale-free (SF) networks: emergence of hubs.
   * Broad degree distribution: $P(k) \sim k^{-\gamma}$.
   * Nodes with extremely high degree exist (hubs).
   * Found to describe most real-world systems.

Basins of attraction vs. community detection
- The calculation of the basins of attraction provides a decomposition of the network. How does it compare with state-of-the-art community detectors?
- Most community detectors use global information.
- More importantly, community detection and separation into basins have different goals. Consider this example:
  - Community detectors: maximize links within communities; minimize links between communities.
  - Basins of attraction: separate nodes by the hub they associate with.
[Figure: example network. Annotation: "Not really two communities!"]

Tie breaking
- What happens when the highest-degree neighbor of a node has the same degree as the node itself?
- In our local search, a node can be a peak even if it has neighbors of equal degree.
- In a recursive search, we surf over ridges of connected nodes of equal degree to reach the true hub.
- Fewer basins exist, but the other results remain qualitatively the same.

[Figure: 2D random surface example]

Kleinberg model simulations
- Our solution agrees with numerical results (navigation simulations and iteration of the master equation).

Message loss probability
- Kleinberg's model is unrealistic: why does the network need to be fine-tuned (have $\alpha = d$) for greedy routing to work?
- The missing ingredient: message loss probability.
- We calculated $T_z(L)$ analytically, where z is the probability of successful completion of a single step.
- The system is small-world for a much wider range of $\alpha$!
- This explains why the system need not be fine-tuned to become navigable.
[Figure panels: no message loss; with message loss, z=0.9, 1D]

Splicing machinery and sequence
[Figure: mammalian splicing machinery. Yeast conserved branch site: TACTAAC.]

Splicing regulation
[Figure labels: splicing enhancer, splicing silencer, hnRNP]
- SR proteins create bridges to stabilize the spliceosome.
- In trypanosomes:
  - U2AF65 and U2AF35 exist and do not interact.
  - U2AF65 interacts with SF1.
  - Interacting SR proteins were identified.
  - hnRNP proteins exist.

Predicting splicing heterogeneity
- What determines whether a gene will be differentially spliced?
- Look at 100 nts upstream and downstream of the strongest site.
- Rank all potential splice sites: TAG-3, AAG, CAG-2, GAG-1.
- Heterogeneity rank of a gene = (sum of ranks of all other AG dinucleotides) / (rank of the strongest site).
- The average heterogeneity rank is about 10 for high-uncertainty genes, but only about 7 for low-uncertainty genes (P = …).
- Signatures do not look meaningful, but analysis shows that longer 5' UTRs, shorter PPTs, and a longer PPT-AG distance also contribute significantly to heterogeneity.

Explaining abundance
- A-rich exons are more abundant.
- Other correlations: genes with a longer PPT and a shorter 5' UTR are more abundant.
- Splice-site ambiguity is anti-correlated with abundance.
[Figure axes: dispersion; abundance]
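The splice-site "uncertainty" used to separate high- and low-uncertainty genes is a Shannon entropy over the relative usage of a gene's trans-splice sites. The following is a minimal sketch, not the analysis code behind the slides: the function name `splice_site_entropy`, the use of read counts per site as input, and the base-2 logarithm are assumptions made for illustration.

```python
import math


def splice_site_entropy(usage_counts, base=2.0):
    """Shannon entropy of splice-site usage for one gene.

    usage_counts -- non-negative usage values (e.g. read counts), one per
                    trans-splice site of the gene (assumed input format)
    base         -- logarithm base (2 gives bits; an assumption)

    Returns 0 when a single site carries all the usage and grows as usage
    is spread more evenly over more sites.
    """
    total = sum(usage_counts)
    if total <= 0:
        raise ValueError("at least one site must have non-zero usage")
    entropy = 0.0
    for count in usage_counts:
        if count > 0:  # sites with zero usage contribute nothing
            p = count / total
            entropy -= p * math.log(p, base)
    return entropy


if __name__ == "__main__":
    # Hypothetical counts: one dominant site vs. evenly used sites.
    print(splice_site_entropy([95, 3, 2]))
    print(splice_site_entropy([25, 25, 25, 25]))
```

A gene with one dominant site scores near zero, while a gene whose reads are split evenly across sites scores highest, which is the sense in which ambiguity can be compared against abundance.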
