EVOLUTIONARY AND COMPUTATIONAL GENOMICS

Shin-Han Shiu

Plant Biology / CMB / EEBB / Genetics / QBMI

Post on 22-Dec-2015


Page 1:

EVOLUTIONARY AND COMPUTATIONAL GENOMICS

Shin-Han Shiu

Plant Biology / CMB / EEBB / Genetics / QBMI

Page 2:

Duplicate genes in the genome

Arabidopsis gene families*

*: Clusters from Markov clustering, using all-against-all BLAST E-values as distance measures
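The footnote names the clustering method without showing how it groups genes. A minimal sketch of Markov clustering on a toy similarity graph is given below. The gene names, E-values, the -log10(E) edge weighting, the self-loop rule, and the inflation value are all illustrative assumptions, not the settings behind the slide.

```python
import math

def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(weights, inflation=2.0, iterations=30):
    """Toy Markov clustering; returns a set of frozensets of node indices."""
    n = len(weights)
    m = [row[:] for row in weights]
    for j in range(n):
        # Self-loop as strong as the node's best hit (1.0 if it has no hits).
        m[j][j] = max(weights[i][j] for i in range(n)) or 1.0
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each row that keeps flow is an attractor; its nonzero columns form a cluster.
    return {frozenset(j for j, x in enumerate(row) if x > 1e-6)
            for row in m if any(x > 1e-6 for x in row)}

# Hypothetical all-against-all BLAST hits: (query, subject) -> E-value.
hits = {("g1", "g2"): 1e-50, ("g1", "g3"): 1e-50, ("g2", "g3"): 1e-50,
        ("g4", "g5"): 1e-30}
genes = sorted({g for pair in hits for g in pair})
index = {g: i for i, g in enumerate(genes)}
weights = [[0.0] * len(genes) for _ in genes]
for (a, b), e_value in hits.items():
    w = -math.log10(e_value)  # smaller E-value -> larger edge weight
    weights[index[a]][index[b]] = weights[index[b]][index[a]] = w

families = sorted(sorted(genes[j] for j in c) for c in mcl(weights))
print(families)  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

The toy graph has no hits between the two families, so the split is unambiguous. With real BLAST output, weak cross-family hits are common and the inflation parameter controls how finely families are split.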

Page 3:

Focus I: Duplication Mechanism and Loss Rate

[Diagram: Gene duplications; Mechanisms; Preferential retention; Consequences]

Page 4:

Gain vs. Loss

Three rounds of whole-genome duplications in the Arabidopsis lineage

15,000* → 30,000 → 60,000 → 120,000

Genome duplications + tandem duplications – gene losses = Arabidopsis gene content: 21,000**

*: Number of orthologous groups in shared families between Arabidopsis and rice.

**: Number of genes in shared families.
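The arithmetic behind this slide can be checked in a few lines. The sketch below uses only the numbers on the slide. Treating the 15,000 orthologous groups as the pre-duplication gene count and ignoring tandem duplications are simplifying assumptions, so the loss figure is a lower bound.

```python
ancestral_groups = 15_000    # orthologous groups shared with rice (slide footnote *)
wgd_rounds = 3               # whole-genome duplications in the Arabidopsis lineage
arabidopsis_genes = 21_000   # genes in shared families (slide footnote **)

# Gene count expected after each doubling if no gene were ever lost.
expected = [ancestral_groups * 2 ** r for r in range(wgd_rounds + 1)]
print(expected)  # [15000, 30000, 60000, 120000]

# Fraction of the no-loss expectation that is actually observed.
retained_fraction = arabidopsis_genes / expected[-1]

# Tandem duplications add further copies, so this is a minimum number of losses.
minimum_losses = expected[-1] - arabidopsis_genes
print(f"retained: {retained_fraction:.1%}, lost: at least {minimum_losses:,}")
```

Under these assumptions, less than a fifth of the copies created by the three doublings remain, which is the point of the "Gain vs. Loss" comparison.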

Page 5:

Receptor Kinase

Shiu & Bleecker 2001. Science’s STKE

Page 6:

RLK family sizes in 16 eukaryotes

Columns: organism; number of genes; kinase superfamily size; RLK/Pelle family size; RLK/Pelle as percent of total genes; RLK/Pelle as percent of total kinases.

Organism                    Genes    Kinases  RLK/Pelle  % genes  % kinases
Arabidopsis thaliana        25,814   1,041    615        2.382    59.1
Oryza sativa subsp. indica  ~35,000  1,607    1,131      3.231    70.4
Chlamydomonas reinhardtii   ~12,200  414      2          0.016    0.5
Plasmodium falciparum       5,334    94       1          0.019    1.1
Plasmodium yoelii           7,681    70       1          0.013    1.4
Caenorhabditis elegans      19,484   417      1          0.005    0.2
Drosophila melanogaster     13,808   262      1          0.007    0.4
Anopheles gambiae           15,088   216      1          0.007    0.5
Ciona intestinalis          15,852   316      6          0.038    1.9
Fugu rubripes               33,609   632      6          0.018    0.9
Mus musculus                22,444   495      3          0.013    0.6
Homo sapiens                22,980   472      4          0.017    0.8
Saccharomyces cerevisiae    6,449    113      0          -        -
Candida albicans            6,164    95       0          -        -
Neurospora crassa           10,082   104      0          -        -
Schizosaccharomyces pombe   4,945    109      0          -        -
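The two percentage columns follow directly from the three count columns. The sketch below recomputes them for three rows copied from the table. The function name is hypothetical, and the rice gene count is the slide's approximate value (~35,000).

```python
def rlk_percentages(total_genes, kinases, rlk_pelle):
    """Return RLK/Pelle size as a percent of all genes and of all kinases."""
    return 100 * rlk_pelle / total_genes, 100 * rlk_pelle / kinases

# (number of genes, kinase superfamily size, RLK/Pelle family size)
counts = {
    "Arabidopsis thaliana": (25_814, 1_041, 615),
    "Oryza sativa subsp. indica": (35_000, 1_607, 1_131),  # gene count approximate
    "Homo sapiens": (22_980, 472, 4),
}

percentages = {name: rlk_percentages(*row) for name, row in counts.items()}
for name, (pct_genes, pct_kinases) in percentages.items():
    print(f"{name:28s} {pct_genes:6.3f}  {pct_kinases:5.1f}")
```

The contrast is the point of the table: RLK/Pelle genes make up well over half of all kinases in the two plants and under 2% in every other organism listed.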

Page 7:

Intro 1: Duplicate genes in the genome

*: Clusters from Markov clustering, using all-against-all BLAST E-values as distance measures

Protein kinase: ~1,000

Hanada & Shiu, in prep.

Page 8:

Focus II: Differential Retention of Duplicates

[Diagram: Gene duplications; Mechanisms; Preferential retention; Consequences]

Page 9:

Genome Remodeling in Polyploids

Genome duplication occurs frequently in plants

What is the fate of duplicates?

Mostly lost

But how fast did gene losses occur?

Page 10:

What is a tiling array?

Does not rely on annotation

[Figure: transcript array vs. tiling array. Labeled features: exon, UTR, intron, cis-regulatory elements, novel genes, MAR (matrix attachment regions)]
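The phrase "does not rely on annotation" can be illustrated with a short sketch of how tiled probes are laid out. The sequence, probe length, step size, and exon coordinates below are hypothetical toy values chosen for readability, not parameters of any real array design.

```python
def tile_probes(sequence, probe_length, step):
    """Return (start, probe) pairs placed at a fixed step along the sequence."""
    last_start = len(sequence) - probe_length
    return [(s, sequence[s:s + probe_length])
            for s in range(0, last_start + 1, step)]

def overlaps(start, length, regions):
    """True if the half-open interval [start, start + length) overlaps any region."""
    return any(start < end and start + length > begin for begin, end in regions)

sequence = "ACGTACGTAC"    # toy genomic sequence
annotated_exons = [(0, 4)]  # toy annotation as half-open (begin, end) intervals

# Tiling probes cover the whole sequence, whether or not a region is annotated.
probes = tile_probes(sequence, probe_length=4, step=2)

# Probes outside the annotation are the ones an annotation-based design would lack.
unannotated = [(s, p) for s, p in probes if not overlaps(s, len(p), annotated_exons)]
print(probes)       # [(0, 'ACGT'), (2, 'GTAC'), (4, 'ACGT'), (6, 'GTAC')]
print(unannotated)  # [(4, 'ACGT'), (6, 'GTAC')]
```

Because probe placement depends only on coordinates, signal from introns, regulatory regions, or unannotated genes can be detected as well.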

Page 11:

Differences in Duplicability

Columns: Category, Arabidopsis, Human

Defense response
Proteolysis
Transport
Ion channel activity
Metabolism
Development
Protein kinase activity
Transcription factor activity

Duplicability: the propensity for the retention of a duplicate gene

Computational analysis of genome-wide trends

Page 12:

Gene family expansion and stress responsiveness

Exp: recently expanded genes

T: tandem, N: non-tandem

Page 13:

Focus III: Functional Consequences

[Diagram: Gene duplications; Mechanisms; Preferential retention; Consequences]

Page 14:

Functional Consequences of Duplication

Functional divergence and conservation

Is it because of changes in cis-regulatory elements or coding sequences?

How are duplicates retained: subfunctionalization or neofunctionalization?

Page 15:

Stress cis-regulatory logic

Develop bioinformatics pipelines for cis-element prediction and inference of stress expression patterns

[Flowchart: Expression data; Clusters of genes with similar expression profiles; Over-represented sequence motifs in 5’ regions; Machine learning; Motif functional prediction; Cis-regulatory logic; Experimental validations]
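One step of this pipeline, finding motifs over-represented in the 5’ regions of a co-expressed cluster, can be sketched with a hypergeometric test. The slide does not say which statistic the pipeline uses, so the test itself is an assumption, as are the motif, the gene names, and the toy promoter sequences.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items from `population` with `successes` hits."""
    upper = min(successes, draws)
    favorable = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k, upper + 1))
    return favorable / comb(population, draws)

# Hypothetical 5' regions: three genes carry the motif, seven do not.
promoters = {f"g{i}": ("AAACGTGAA" if i <= 3 else "AAATTTTAA")
             for i in range(1, 11)}
cluster = ["g1", "g2", "g3"]  # hypothetical co-expressed cluster
motif = "ACGTG"               # hypothetical candidate motif

with_motif = {gene for gene, seq in promoters.items() if motif in seq}
hits_in_cluster = sum(gene in with_motif for gene in cluster)

# Chance of seeing at least this many motif-bearing genes in a random cluster.
p_value = hypergeom_sf(hits_in_cluster, len(promoters), len(with_motif), len(cluster))
print(f"{hits_in_cluster}/{len(cluster)} cluster genes carry {motif}, p = {p_value:.4f}")
```

In a real analysis, many candidate motifs and clusters are tested, so the p-values would need correction for multiple testing before any motif is passed on to the later steps.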

Page 16:

Detailed Functional Studies of Duplicate Genes

Functional analyses of DDF1 and DDF2 transcription factors

Derived from a recent whole-genome duplication in Arabidopsis

Related to the well-known CBF factors involved in cold and drought stress

[Diagram, repeated for Arabidopsis thaliana and Arabidopsis lyrata: DDFs; Promoter GFP; Knockouts; Over-expression studies; Interacting proteins; Binding targets]

Page 17:

Recent completion …