computational functional genomics

Post on 08-Feb-2016

Lital Haham, Sivan Pearl

Introduction

• Piles of information but only flakes of knowledge.

• The existing information: collections of genomic sequences, expression profiles, protein-protein interactions and many more…

Introduction

• Computational biology strives to extract the maximal possible information from known sequences by classifying them according to their homologous relationships and predicting their biochemical activity, cellular function, three-dimensional structure and evolutionary origin.

The COG: Clusters of Orthologous Groups of proteins

• Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

• COGs reflect one-to-one, one-to-many and many-to-many orthologous relationships.

• The purpose of COG is to serve as a platform for the functional annotation of newly sequenced genomes and for the study of genome evolution.

The COG: statistics

• As of 2003, there were 3,307 COGs including 74,059 proteins from 43 genomes.

• The genomes come from Bacteria, Archaea and Eukaryota.

• The database includes 17 functional groups.

The COG: making your own

• The COG construction procedure is based on the notion that any group of at least three proteins from distant genomes that are more similar to each other than to any other protein from the same genomes most likely belongs to an orthologous family.

The COG: making your own

• All-against-all protein sequence comparison.

• Detect and collapse paralogs.

• Detect triangles of mutually consistent, genome-specific best hits.

• Merge triangles with a common side to form COGs.
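The last two steps can be sketched in Python. This is a simplified illustration, not the actual COG software: it assumes paralogs have already been collapsed, treats two proteins as linked only when they are reciprocal genome-specific best hits, and uses a hypothetical `GENOME|id` naming scheme so that each protein's genome can be read from its name. All function names are illustrative.

```python
from itertools import combinations


def genome_of(protein):
    # Hypothetical naming convention: "GENOME|protein_id".
    return protein.split("|", 1)[0]


def reciprocal_edges(best_hit):
    """best_hit maps (query_protein, target_genome) -> the query's best hit in that genome.

    An edge is kept only when the hit's best hit back in the query's genome is the query.
    """
    edges = set()
    for (query, _target_genome), hit in best_hit.items():
        if best_hit.get((hit, genome_of(query))) == query:
            edges.add(frozenset((query, hit)))
    return edges


def find_triangles(edges):
    """Triangles of mutually linked proteins that come from three different genomes."""
    neighbours = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    triangles = set()
    for edge in edges:
        a, b = tuple(edge)
        for c in neighbours[a] & neighbours[b]:
            if len({genome_of(a), genome_of(b), genome_of(c)}) == 3:
                triangles.add(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles):
    """Merge triangles sharing a side (two proteins) into COG-like groups, via union-find."""
    triangles = list(triangles)
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    side_owner = {}
    for i, tri in enumerate(triangles):
        for side in combinations(sorted(tri), 2):
            if side in side_owner:
                parent[find(i)] = find(side_owner[side])
            else:
                side_owner[side] = i

    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return list(groups.values())


# Toy data: within each group every protein is the best hit of every other one.
best = {}
for group in (["A|1", "B|1", "C|1", "D|1"], ["A|3", "B|3", "C|3"], ["A|9", "B|9"]):
    for p in group:
        for q in group:
            if genome_of(p) != genome_of(q):
                best[(p, genome_of(q))] = q

cogs = merge_triangles(find_triangles(reciprocal_edges(best)))
print(sorted(sorted(g) for g in cogs))
# [['A|1', 'B|1', 'C|1', 'D|1'], ['A|3', 'B|3', 'C|3']]
```

The pair `A|9`/`B|9` spans only two genomes, so it never forms a triangle and is left out, matching the "at least 3 proteins from distant genomes" rule.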

The COG: making your own

The COG: adding new genomes

• The COGNITOR program adds new proteins to pre-existing COGs on the basis of multiple genome-specific best hits.

• In this way, 60-80% of the proteins of prokaryotes could be included in COGs.
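A minimal sketch of the idea behind COGNITOR, assuming the new protein's best hit in each reference genome is already known and that the protein is assigned only when at least `min_hits` of those best hits fall into the same COG. The function name `assign_to_cog` and the threshold of 2 are illustrative assumptions, not COGNITOR's documented parameters; the COG identifiers in the example are toy data.

```python
from collections import Counter


def assign_to_cog(best_hits, protein_to_cog, min_hits=2):
    """Assign a new protein to a pre-existing COG by voting over its best hits.

    best_hits: the new protein's best-hit protein in each reference genome.
    protein_to_cog: COG membership of already classified proteins.
    Returns the COG id, or None when no COG collects at least min_hits best hits.
    """
    votes = Counter(protein_to_cog[hit] for hit in best_hits if hit in protein_to_cog)
    if not votes:
        return None
    cog, count = votes.most_common(1)[0]
    return cog if count >= min_hits else None


membership = {"B|1": "COG0001", "C|1": "COG0001", "D|7": "COG0420"}  # toy membership table
print(assign_to_cog(["B|1", "C|1", "D|7"], membership))  # COG0001
print(assign_to_cog(["D|7"], membership))                # None
```

Requiring several agreeing best hits, rather than one, is what makes the assignment robust to a single spurious similarity.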

The COG: more applications

• Detecting missed genes.

• Convenient for a variety of evolution-oriented analyses of protein families.

Methods

• Experimental methods: biochemical and genetic experiments.

• Computational methods: homology method (BLAST), mRNA expression, phylogenetic profile, fusion method (Rosetta Stone analysis) and gene neighbour method.

Homology method

• Homology method: searches for proteins whose amino acid (AA) sequences are similar.

• 40-70% of the genes in a new genome can be assigned some function this way.

• The function assigned is mainly a molecular function (e.g., a biochemical activity).
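As a toy stand-in for a BLAST-based homology search, the sketch below transfers the annotation of the most similar reference protein when sequence identity is high enough. It assumes pairwise alignments are already available as equal-length strings with `-` for gaps; the 40% identity cut-off and the function names are illustrative assumptions, not part of BLAST.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def transfer_annotation(alignments, min_identity=40.0):
    """alignments: (aligned_query, aligned_reference, annotation) triples, one per reference hit.

    Returns the annotation of the most identical reference, or None if it is below min_identity.
    """
    best_identity, best_annotation = 0.0, None
    for aligned_query, aligned_reference, annotation in alignments:
        identity = percent_identity(aligned_query, aligned_reference)
        if identity > best_identity:
            best_identity, best_annotation = identity, annotation
    return best_annotation if best_identity >= min_identity else None


hits = [("MKVL-E", "MKVIAE", "kinase"), ("MKVLE", "AAAAA", "transporter")]
print(transfer_annotation(hits))  # kinase
```

Note that what gets transferred is a label for the molecular function of the hit; the sketch says nothing about the cellular context of the new protein.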

mRNA expression

• Analysis of correlated mRNA expression levels makes it possible to establish functional linkages: changes in mRNA expression are measured across different cell types or environments, and genes whose levels change together are inferred to be functionally linked.
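One common way to turn expression data into functional links is to compute the Pearson correlation between the expression profiles of each gene pair across conditions and keep pairs above a cut-off. The sketch below does this; the 0.9 threshold, the gene names and the toy values are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_links(profiles, min_r=0.9):
    """profiles: gene -> expression levels in the same ordered list of conditions."""
    links = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= min_r:
            links.append((a, b, round(r, 3)))
    return links


expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(coexpression_links(expression))  # [('geneA', 'geneB', 0.999)]
```

`geneC` is perfectly anti-correlated with `geneA`; with a one-sided cut-off it is not linked, although some analyses also treat strong negative correlation as evidence of a linkage.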

Phylogenetic profile

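• Describes the pattern of presence or absence of a particular protein across a set of organisms.

• Number of possible profiles for $n$ genomes: $2^n$, which is about $10^9$ for $n \approx 30$.

• This number far exceeds the number of protein families, so two unrelated proteins are unlikely to share the same profile by chance.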

Phylogenetic profile

• Why would two proteins always be inherited together into new species, or both be lost, unless the two function together?

• If two proteins have the same phylogenetic profile, they are inferred to be functionally linked, i.e. engaged in a common pathway or complex.
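A minimal sketch of the profile comparison: each protein's profile is a tuple of 0/1 flags (absent/present) over the same ordered list of genomes, and pairs of proteins with identical profiles are reported as candidate functional partners. The `max_mismatches` option, which tolerates a small Hamming distance, is an assumption added for illustration; with its default of 0 the code implements the exact-match rule above. Protein names and profiles are toy data.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    return sum(1 for x, y in zip(p, q) if x != y)


def profile_links(profiles, max_mismatches=0):
    """profiles: protein -> tuple of 0/1 presence flags, one per genome (same genome order)."""
    links = []
    for a, b in combinations(sorted(profiles), 2):
        if hamming(profiles[a], profiles[b]) <= max_mismatches:
            links.append((a, b))
    return links


profiles = {
    "P1": (1, 0, 1, 1, 0),
    "P2": (1, 0, 1, 1, 0),
    "P3": (0, 1, 1, 0, 1),
    "P4": (1, 0, 1, 0, 0),
}
print(profile_links(profiles))                    # [('P1', 'P2')]
print(profile_links(profiles, max_mismatches=1))  # [('P1', 'P2'), ('P1', 'P4'), ('P2', 'P4')]
```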

Phylogenetic profile

Phylogenetic profile: example

• Analysis of three proteins, RL7, FlgL and His5, according to their phylogenetic profiles.

• RL7: more than half of the proteins with similar profiles have a function associated with the ribosome.

• FlgL: more than half of the proteins with similar profiles are flagellar proteins or cell-wall maintenance proteins.

• His5: more than half of the proteins with similar profiles are involved in amino acid metabolism.

Phylogenetic profile: example

RL7 and proteins with similar phylogenetic profiles:

• RL7: ribosome L7
• RL15: ribosome L15
• RL17: ribosome L17
• PTH: peptidyl-tRNA hydrolase
• RNC: ribonuclease III
• PgsA: phospholipid synthesis
• YGGH: hypothetical
• YBEX: hypothetical
• RL34: ribosome L34
• RL36: ribosome L36
• RL27: ribosome L27
• RL25: ribosome L25
• YQCB: hypothetical
• YABO: hypothetical
• YCEC: hypothetical
• RFH: peptide release factor
• ClpB: heat shock protein
• YJFH: hypothetical
• RS14: ribosome S14
• G3P3: dehydrogenase
• RL4: ribosome L4
• NONE: hypothetical
• GrpE: co-chaperone
• GidB: glucose-inhibited division
• RL24: ribosome L24
• DEF: polypeptide deformylase
• RL20: ribosome L20
• MesJ: cell cycle protein
• RL19: ribosome L19
• RL21: ribosome L21
• RL9: ribosome L9
• SmpB: small protein B

Phylogenetic profile

| Keyword | No. proteins | No. neighbors in keyword group | No. neighbors in random group |
|---|---|---|---|
| Ribosome | 60 | 197 | 27 |
| Transcription | 36 | 17 | 10 |
| tRNA synthase and ligase | 26 | 11 | 5 |
| Membrane proteins | 25 | 89 | 5 |
| Flagellar | 21 | 89 | 3 |
| Iron, ferric, and ferritin | 19 | 31 | 2 |
| Galactose metabolism | 18 | 31 | 2 |
| Molybdopterin and molybdenum | 12 | 6 | 1 |
| Hypothetical | 1084 | 108226 | 8440 |

Phylogenetic profiles link proteins with similar keywords.
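One way to read the table is to divide the number of profile neighbors found within each keyword group by the number found for the random group, giving a rough enrichment factor. This calculation is an illustration added here; it assumes the two neighbor columns are directly comparable counts for groups of equal size.

```python
# (neighbors in keyword group, neighbors in random group), copied from the table above
counts = {
    "Ribosome": (197, 27),
    "Transcription": (17, 10),
    "tRNA synthase and ligase": (11, 5),
    "Membrane proteins": (89, 5),
    "Flagellar": (89, 3),
    "Iron, ferric, and ferritin": (31, 2),
    "Galactose metabolism": (31, 2),
    "Molybdopterin and molybdenum": (6, 1),
    "Hypothetical": (108226, 8440),
}

# Enrichment = observed neighbors / neighbors expected from a random group.
enrichment = {keyword: observed / background for keyword, (observed, background) in counts.items()}

for keyword, ratio in sorted(enrichment.items(), key=lambda item: -item[1]):
    print(f"{keyword:30s} {ratio:5.1f}x")
```

Every keyword group has more profile neighbors than its random counterpart, which is the sense in which phylogenetic profiles "link proteins with similar keywords".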

Fusion method, or Rosetta Stone analysis

• Some pairs of interacting proteins have homologs in another organism that are fused into a single protein chain.

• When two separate proteins A and B in one organism are expressed as a single fused protein in another species, there is a high probability that A and B are functionally linked.
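The detection step can be sketched as follows, assuming precomputed local alignments (for example from a BLAST-like search) of the proteins of one genome against proteins of other genomes, given as `(query, target, start, end)` with coordinates on the target. Two query proteins are linked when they align to non-overlapping parts of the same target, the candidate "Rosetta Stone" protein. The `similar_pairs` argument is a hypothetical way to exclude pairs that are simply homologous to each other; names and data are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def rosetta_stone_links(alignments, similar_pairs=frozenset()):
    """alignments: iterable of (query, target, start, end), coordinates on the target.

    Returns {frozenset({A, B}): rosetta_stone_target} for query pairs A, B that
    align to non-overlapping regions of the same target protein.
    """
    by_target = defaultdict(list)
    for query, target, start, end in alignments:
        by_target[target].append((query, start, end))

    links = {}
    for target, segments in by_target.items():
        for (qa, sa, ea), (qb, sb, eb) in combinations(segments, 2):
            pair = frozenset((qa, qb))
            non_overlapping = ea < sb or eb < sa
            if qa != qb and non_overlapping and pair not in similar_pairs:
                links.setdefault(pair, target)
    return links


hits = [
    ("A", "AB_fused", 1, 100),    # A matches the N-terminal part of the fused protein
    ("B", "AB_fused", 120, 250),  # B matches the C-terminal part
    ("D", "Y", 1, 100),           # D and E overlap on Y, so no link is inferred
    ("E", "Y", 50, 150),
]
print(rosetta_stone_links(hits))  # {frozenset({'A', 'B'}): 'AB_fused'}
```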

Fusion method

The Rosetta Stone model

Fusion method: what is it good for?

• Predicts protein pairs that have related biological functions.

• Predicts potential protein-protein interactions.

• Can reveal protein complexes or protein pathways.

Fusion method: what is it good for?

Fusion method

• The group searched the 4,290 protein sequences of the E. coli genome.

• The proteins could form at most (4290 × 4289)/2 = 9,199,905 pairwise interactions, but far fewer are expected.

• 6,809 candidate pairwise interactions were found.
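The numbers on this slide can be checked directly: the number of unordered protein pairs is $\binom{4290}{2}$, and the 6,809 candidates are a tiny fraction of it.

```python
from math import comb

proteins = 4290        # E. coli protein sequences searched
candidate_pairs = 6809  # pairs linked by Rosetta Stone sequences

possible_pairs = comb(proteins, 2)  # 4290 * 4289 / 2
fraction = candidate_pairs / possible_pairs

print(possible_pairs)     # 9199905
print(f"{fraction:.3%}")  # 0.074%
```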

Fusion method: validation

• Look for similar functions in the existing annotations, which would imply at least a functional interaction.

• Of the E. coli pairs found by the Rosetta Stone analysis, 68% share at least one keyword in their annotations, whereas only 15% of randomly selected pairs of E. coli proteins share a keyword.
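A sketch of the keyword test, assuming each protein's annotation has been reduced to a set of keywords. The function name and the toy annotations are invented for illustration and do not reproduce the 68% and 15% figures; the same function would be applied once to the predicted pairs and once to random pairs.

```python
def fraction_sharing_keyword(pairs, keywords):
    """pairs: iterable of (protein_a, protein_b); keywords: protein -> set of annotation keywords.

    Returns the fraction of pairs whose annotations share at least one keyword.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if keywords.get(a, set()) & keywords.get(b, set()))
    return shared / len(pairs)


annotations = {
    "A": {"ribosome", "translation"},
    "B": {"ribosome"},
    "C": {"transport"},
    "D": {"flagellar"},
}
print(fraction_sharing_keyword([("A", "B"), ("C", "D")], annotations))  # 0.5
```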

Fusion method: validation

• Of the protein pairs in a database of experimentally determined interactions, 6.4% are linked by Rosetta Stone sequences.

• When the phylogenetic profile method was applied to the interactions predicted by the fusion method, it supported more than 8 times as many of them as it did for randomly chosen sets of interactions.

Fusion method: missing pairs

• False negatives:

The interacting proteins were never fused in any of the genomes examined.

The fused protein disappeared during the course of evolution.

Fusion method: false alarms

• False positives:

A physical interaction is predicted for proteins that are fused but are only co-regulated and do not actually interact.

The method cannot distinguish between homologs that bind and homologs that do not.

Fusion method: false alarms

• In E. coli, the false positive rate due to the inability to distinguish homologs is about 82%.

• To reduce these errors, “promiscuous” domains (domains found fused to many different partners) were identified and removed during the analysis.

• Filtering out only 5% of all domains removes the majority of falsely predicted interactions.
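One way to implement the filter, sketched under stated assumptions: a domain's promiscuity is taken to be the number of distinct partner domains it is fused with, and the top 5% most promiscuous domains are removed before links are predicted. Both the promiscuity measure and the helper names are illustrative assumptions, not the exact procedure used in the study.

```python
import math
from collections import defaultdict


def promiscuous_domains(fusions, top_fraction=0.05):
    """fusions: iterable of (domain_x, domain_y) pairs observed fused in one protein chain.

    Returns the set of domains in the top `top_fraction` by number of distinct partners.
    """
    partners = defaultdict(set)
    for x, y in fusions:
        partners[x].add(y)
        partners[y].add(x)
    ranked = sorted(partners, key=lambda d: len(partners[d]), reverse=True)
    n_remove = math.ceil(top_fraction * len(ranked))
    return set(ranked[:n_remove])


def filter_fusions(fusions, removed):
    """Drop every fusion that involves a removed (promiscuous) domain."""
    return [(x, y) for x, y in fusions if x not in removed and y not in removed]


# Toy data: one hub domain fused to 17 partners, plus one specific fusion (20 domains in total).
fusions = [("SH3", f"D{i}") for i in range(1, 18)] + [("K", "L")]
removed = promiscuous_domains(fusions)
print(removed)                           # {'SH3'}
print(filter_fusions(fusions, removed))  # [('K', 'L')]
```

Removing a single hub here discards 17 of the 18 candidate links, illustrating how filtering a small fraction of domains can eliminate most of the predictions they would generate.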

Fusion method: false alarms

Neighbour method

• Functional links between genes can be identified by examining whether the proximity of the genes is conserved across multiple genomes.

• Especially powerful for uncovering functional linkages in prokaryotes, where operons are common.

Neighbour method

Neighbour method: definitions

• ‘Close’: two genes are on the same strand (and therefore transcribed in the same direction) and lie within 300 bp of each other.

• Direct link: two close genes whose orthologs are also close in at least two other genomes from different phylogenetic groups.

• Inferred link: two genes that are not close, but whose orthologs are close in at least three other genomes from different phylogenetic groups.
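The ‘close’ and direct-link definitions can be written as two small predicates. The gene representation, the dictionary layouts and the function names are hypothetical, and orthology across genomes is assumed to be given; the 300 bp cut-off and the "at least two other genomes from different phylogenetic groups" rule come from the definitions above. Inferred links are not sketched.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    strand: str  # "+" or "-"
    start: int
    end: int


def close(g1, g2, max_gap=300):
    """'Close': same strand (hence same transcription direction) and at most max_gap bp apart."""
    if g1.strand != g2.strand:
        return False
    gap = max(g1.start, g2.start) - min(g1.end, g2.end)  # negative if the genes overlap
    return gap <= max_gap


def direct_link(pair_by_genome, group_of, query_genome, min_groups=2):
    """pair_by_genome: genome -> (gene_a, gene_b), the orthologous pair in that genome.
    group_of: genome -> phylogenetic group.

    The query pair must be close, and the orthologs must also be close in other
    genomes that span at least min_groups distinct phylogenetic groups.
    """
    a, b = pair_by_genome[query_genome]
    if not close(a, b):
        return False
    groups = {group_of[g] for g, (x, y) in pair_by_genome.items()
              if g != query_genome and close(x, y)}
    return len(groups) >= min_groups


pairs = {
    "G0": (Gene("a", "+", 100, 1000), Gene("b", "+", 1200, 2000)),
    "G1": (Gene("a1", "+", 0, 900), Gene("b1", "+", 1000, 1800)),
    "G2": (Gene("a2", "-", 5000, 5900), Gene("b2", "-", 6100, 7000)),
    "G3": (Gene("a3", "+", 0, 900), Gene("b3", "-", 1000, 1800)),  # opposite strands
}
groups = {"G0": "group1", "G1": "group2", "G2": "group3", "G3": "group4"}  # toy labels
print(direct_link(pairs, groups, "G0"))  # True
```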

Neighbour method: definitions

Neighbour method

• Proximity between genes is maintained mostly because it facilitates their co-transfer to another organism.

• Example: restriction-modification systems.

Neighbour method: validation

• Identify links between genes that are annotated in KEGG or COG, and calculate the fraction of those that fall in the same functional pathway or category.

• The functional correspondence is correlated with the minimal number of phylogenetic groups in which the proximity is detected.
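A sketch of this validation, assuming each annotated predicted link records the number of phylogenetic groups in which the proximity was detected and whether both genes share a KEGG pathway or COG category. The records below are invented toy data. Raising the threshold N can only keep fewer links, so any gain in accuracy is traded against coverage.

```python
def accuracy_by_threshold(links, thresholds):
    """links: list of (n_groups, same_category) for annotated predicted links.

    For each minimal number of phylogenetic groups N, return
    (number of links kept, fraction of kept links in the same pathway/category).
    """
    table = {}
    for n in thresholds:
        kept = [same for groups, same in links if groups >= n]
        table[n] = (len(kept), sum(kept) / len(kept) if kept else None)
    return table


toy_links = [(2, True), (2, False), (2, False), (3, True), (3, True), (3, False), (4, True), (5, True)]
for n, (count, accuracy) in accuracy_by_threshold(toy_links, [2, 3, 4, 5]).items():
    print(n, count, accuracy)
```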

Neighbour method: validation

[Figure: N tradeoff]

Neighbour method: example

Happy end???

• The group analyzed the 6,217 proteins of the yeast Saccharomyces cerevisiae by combining several methods.

• One can expect each protein to be functionally linked to perhaps 5–50 other proteins, giving roughly 30,000–300,000 biologically meaningful links (about 6,000 proteins × 5–50 links each).

Happy end???

Networks

• When methods for detecting functional linkages are applied to all the proteins of an organism, a network of interacting, functionally linked proteins can be traced.

• As methods for detecting protein linkages improve, it seems likely that most of an organism's proteins will be included in the network.
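Combining the pairwise links from all methods into a network can be sketched with a plain adjacency map and a breadth-first search for connected components. The protein names and links below are hypothetical, and connected components are only one simple way to "trace" such a network.

```python
from collections import defaultdict, deque


def build_network(links):
    """links: iterable of (protein_a, protein_b, method) tuples from any linkage method."""
    graph = defaultdict(set)
    for a, b, _method in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Breadth-first search from every unvisited protein; each search yields one component."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


links = [
    ("P1", "P2", "phylogenetic profile"),
    ("P2", "P3", "gene neighbour"),
    ("P4", "P5", "Rosetta Stone"),
]
print(sorted(sorted(c) for c in connected_components(build_network(links))))
# [['P1', 'P2', 'P3'], ['P4', 'P5']]
```

Links found by different methods are merged into one graph, so a protein connected by one method to a cluster found by another joins that cluster.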

Networks

Happy Purim!