Bioinformatics and Evolutionary Genomics: High throughput “functional” data / functional genomics / Omics

Post on 15-Jan-2016


Page 1: Bioinformatics and Evolutionary Genomics High throughput “functional” data / functional genomics / Omics

Bioinformatics and Evolutionary Genomics

High throughput “functional” data / functional genomics / Omics

Page 2

High-throughput data on gene function

• What do I mean: omics, microarray, ChIP-on-chip
• Why are people generating these data?

– Post-genomic era / systems biology: the challenge is to understand the roles of, e.g., the 6,000 gene products in yeast and how they interact to create a eukaryotic organism.

– Because they can: automation can also be applied to areas of molecular biology beyond sequencing.

– To have “screens” for the research question at hand, rather than having to test one guess at a time.

• What about evolutionary genomics?
• Yeast
• Accuracy / noise

Page 3

HTP data

• What do they mean: experimental knowledge, but still, what do they mean in terms of e.g. function?

• A deluge

• Bioinformatics is needed for basic data handling, and has IMHO only scratched the surface in terms of coming up with biological questions with which we can probe these data

Page 4

Microarray data

Page 5

Microarray data

Two conditions, often used for “screens”

Page 6

(Correlated) mRNA expression

• mRNA levels are systematically measured under a variety of different cellular conditions, and genes are grouped if they show a similar transcriptional response to these conditions.
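The grouping idea on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any particular paper: it assumes Pearson correlation as the similarity measure and a fixed cut-off of 0.9, and the gene names and log-ratio values are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                pairs.append((g1, g2))
    return pairs

# Made-up log-ratios for three hypothetical genes across five conditions.
profiles = {
    "geneA": [1.0, 2.0, 0.5, -1.0, 0.0],
    "geneB": [1.1, 1.9, 0.4, -0.8, 0.1],
    "geneC": [-1.0, 0.2, 1.5, 0.3, -0.5],
}
print(coexpressed_pairs(profiles))
```

Here only geneA and geneB respond similarly across the conditions, so they form the single reported pair. Real analyses typically use many more conditions and a clustering step on top of the pairwise similarities.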

Page 7

Profile Similarity Identifies Sterol-Pathway Disturbance Resulting from Deletion of Uncharacterized ORF YER044c (ERG28) and from Dyclonine Treatment

(A) Prominent gene clusters responding to interference with ergosterol biosynthesis.

(B) Comparison of the transcript profile of an erg28Δ strain to that of an erg3Δ strain.

(C) Sterol content of wild-type (left) and erg28Δ (right) strains.

Hughes et al. 2000 Cell

Page 8

Conventional hierarchical clustering of co-expression data can fail, because genes can play a role in multiple cellular processes and their common regulatory element may be detectable only in a subset of experiments.

Instead: detect genes that are co-expressed under a subset of conditions, yielding a comprehensive set of overlapping ‘transcriptional modules’.

Ihmels et al. 2002 Nature Genetics
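Why co-expression over all conditions can hide a real link is easy to show with a toy example. The sketch below is not the algorithm of Ihmels et al.; it only compares the Pearson correlation of two made-up profiles over all conditions with the correlation restricted to a hand-picked subset. The gene profiles and the choice of subset are assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_on_subset(profile_a, profile_b, condition_indices):
    """Correlate two profiles using only the listed condition indices."""
    a = [profile_a[i] for i in condition_indices]
    b = [profile_b[i] for i in condition_indices]
    return pearson(a, b)

# Two hypothetical genes: co-regulated in conditions 0-3, not in conditions 4-7.
gene_x = [2.0, 1.0, -1.0, -2.0, 0.5, -0.5, 1.5, -1.5]
gene_y = [1.8, 1.1, -0.9, -2.1, -0.5, 0.6, -1.4, 1.6]

overall = correlation_on_subset(gene_x, gene_y, range(8))
in_subset = correlation_on_subset(gene_x, gene_y, [0, 1, 2, 3])
print(f"all conditions: {overall:.2f}, subset: {in_subset:.2f}")
```

The pair looks only weakly correlated over all eight conditions, yet almost perfectly correlated within the first four. A module-finding method has to discover such condition subsets itself rather than being given them, which is the hard part this sketch leaves out.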

Page 9

Citric acid cycle? Different activity under different experimental conditions

Page 10

Rapid divergence in expression between duplicate genes inferred from microarray & promoter data

0.1 = 3.2 My

Page 11

Clustering conditions where the conditions are genes: yet another way to get to functional “links”

Page 12

Yeast-2-hybrid

Pairs of proteins to be tested for interaction are expressed as fusion proteins ('hybrids') in yeast: one protein is fused to a DNA-binding domain, the other to a transcriptional activator domain. Any interaction between them is detected by the formation of a functional transcription factor.

Page 13

Examples from the original Ito publication:
A autophagy
B spindle pole body function
C vesicular transport

Arrows indicate the orientation of the two-hybrid interaction, pointing from the bait to the prey.

Page 14

Accuracy of Y2H and how to improve it

Page 15

Improving reliability using protein-complex reasoning / internal consistency

Internal filtering!

Page 16

Accuracy of Y2H and how to improve it

Page 17

Mass spectrometry of purified complexes.

• Individual proteins are tagged and used as 'hooks' to biochemically purify whole protein complexes. These are then separated and their components identified by mass spectrometry.

Page 18
Page 19

Page 20
Page 21

Socio-affinity indices: dotted lines, 5–10; dashed lines, 10–15; plain lines, >15. Bait proteins are shown in bold, and shaded circles around groups of proteins indicate cores and modules.

Exosome Ski

Stages in mRNA degradation
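The line-style legend in the caption above amounts to a simple binning of the socio-affinity index. A small sketch of that mapping follows; the function name is hypothetical, and two details are assumptions because the caption does not state them: a value of exactly 10 is treated as dashed, and values below 5 are treated as not drawn.

```python
def edge_style(socio_affinity):
    """Map a socio-affinity index to the line style used in the figure legend.

    Assumptions: exactly 10 counts as 'dashed'; values below 5 return None
    (taken to mean that no line is drawn).
    """
    if socio_affinity > 15:
        return "plain"
    if socio_affinity >= 10:
        return "dashed"
    if socio_affinity >= 5:
        return "dotted"
    return None

for score in (3.0, 7.5, 12.0, 15.0, 20.0):
    print(score, edge_style(score))
```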

Page 22

[Figure panels: pdb, Y2H, Cellular Function, Phylogenetic profile]

Page 23

Protein interactions: literature databases

• Literature derived, normally manually curated (as opposed to text mining)
• Biased?
• No new knowledge
• Useful for benchmarking & for the study of the evolution of e.g. protein complexes
• For example: Munich Information Center for Protein Sequences (MIPS)
• Databases that contain literature and omics: Database of Interacting Proteins (DIP), Biomolecular Interaction Network Database (BIND)

Page 24

Systematic screening for lethality of knockouts on a rich medium

• The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function. A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs. Of the deleted ORFs, 17 percent were essential for viability in rich medium.

Winzeler et al. 1999 Science

Page 25

Genetic interactions (synthetic lethal/sick)

• Two nonessential genes that cause lethality when mutated at the same time form a synthetic lethal interaction. Such genes are often functionally associated and their encoded proteins may also interact physically.

Tong et al. 2001 Science
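The definition above translates directly into a filter over viability data. The following is a minimal sketch under stated assumptions: viability is reduced to a yes/no call, the input layout (one dictionary for single deletions, one for double deletions) is hypothetical, and the gene names and outcomes are made up. Untested double mutants are skipped rather than guessed.

```python
from itertools import combinations

def synthetic_lethal_pairs(single_viable, double_viable):
    """Return pairs of nonessential genes whose double deletion is lethal.

    single_viable: dict gene -> True if the single deletion strain is viable.
    double_viable: dict frozenset({g1, g2}) -> True if the double mutant is viable.
    """
    pairs = []
    for g1, g2 in combinations(sorted(single_viable), 2):
        both_nonessential = single_viable[g1] and single_viable[g2]
        outcome = double_viable.get(frozenset((g1, g2)))  # None if not tested
        if both_nonessential and outcome is False:
            pairs.append((g1, g2))
    return pairs

# Made-up example: g4 is essential on its own, so it cannot be part of a pair.
single_viable = {"g1": True, "g2": True, "g3": True, "g4": False}
double_viable = {
    frozenset(("g1", "g2")): False,
    frozenset(("g1", "g3")): True,
    frozenset(("g2", "g3")): False,
    frozenset(("g1", "g4")): False,
}
print(synthetic_lethal_pairs(single_viable, double_viable))
```

A "synthetic sick" call would need a quantitative fitness measure instead of the yes/no viability used here.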

Page 26
Page 27

One thing we can do with synthetic lethals

• Ideker: protein interactions

Page 28

What to do with synthetic lethals?

Kelley and Ideker 2005 Nature Biotech

Page 29
Page 30

ChIP-on-chip

• Tagged strains (one strain for each regulator).
• A microarray for each strain shows which pieces of DNA are found in excess if you isolate the regulator plus bound DNA.
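"Found in excess" can be made concrete as an enrichment ratio per DNA region. The sketch below assumes one intensity for the immunoprecipitated sample and one for a control per region, and calls a region bound when the log2 ratio reaches a cut-off of 1.0 (a twofold excess). The region names, intensities, and cut-off are all illustrative; real analyses use replicates and an error model.

```python
from math import log2

def enriched_regions(ip_signal, control_signal, min_log2_ratio=1.0):
    """Return regions whose IP/control log2 ratio reaches the cut-off.

    ip_signal, control_signal: dict region -> positive intensity.
    """
    enriched = {}
    for region, ip in ip_signal.items():
        ratio = log2(ip / control_signal[region])
        if ratio >= min_log2_ratio:
            enriched[region] = ratio
    return enriched

# Made-up intensities for three hypothetical promoter regions.
ip_signal = {"promoter_1": 800.0, "promoter_2": 210.0, "promoter_3": 500.0}
control_signal = {"promoter_1": 200.0, "promoter_2": 200.0, "promoter_3": 250.0}
print(enriched_regions(ip_signal, control_signal))
```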

Page 31

GFP localization

• Mating of strains carrying fluorescent protein markers specific for organelles with strains carrying a fluorescent protein tag for each gene

Page 32

Other functional genomics data: the omes

• Quantitative proteomics
• Kinome
• PTMome

• (Almost) all of these data are freely and publicly available

• Take-home message: “wow, this exists!!!”

Page 33

Bioinformatics for Benchmarking & Integration

[Figure: accuracy versus coverage of interaction data sets. Axes: Accuracy (fraction of data confirmed by reference set) and Coverage (fraction of reference set covered by data). Data sets shown: purified complexes (TAP), purified complexes (HMS-PCI), yeast two-hybrid, mRNA co-expression, genomic context, synthetic lethality, and combined evidence (two methods, three methods). Legend: raw data, filtered data, parameter choices.]
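The two axis definitions on this slide can be computed directly from sets of protein pairs. This is a minimal sketch with made-up pairs: interactions are treated as unordered, the reference set stands in for a curated collection, and "combined evidence" is approximated as the pairs reported by both hypothetical methods.

```python
def normalize(pairs):
    """Turn (a, b) pairs into unordered pairs so that A-B equals B-A."""
    return {frozenset(p) for p in pairs}

def accuracy_and_coverage(predicted, reference):
    """Accuracy: fraction of data confirmed by the reference set.
    Coverage: fraction of the reference set covered by the data."""
    predicted, reference = normalize(predicted), normalize(reference)
    confirmed = predicted & reference
    return len(confirmed) / len(predicted), len(confirmed) / len(reference)

# Made-up reference set and two hypothetical high-throughput data sets.
reference = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
method_1 = [("A", "B"), ("B", "C"), ("A", "E"), ("C", "E")]
method_2 = [("B", "A"), ("C", "D"), ("A", "D")]

# Pairs supported by both methods.
combined = normalize(method_1) & normalize(method_2)

print("method 1:", accuracy_and_coverage(method_1, reference))
print("method 2:", accuracy_and_coverage(method_2, reference))
print("combined:", accuracy_and_coverage(combined, reference))
```

In this toy case, requiring support from both methods raises accuracy and lowers coverage. Because any reference set is incomplete, an accuracy measured this way is a lower-bound estimate: an unconfirmed pair is not necessarily wrong.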

Page 34

Advanced integration