evolutionary and computational genomics shin-han shiu plant biology / cmb / eebb / genetics / qbmi
Post on 22-Dec-2015
EVOLUTIONARY AND COMPUTATIONAL GENOMICS
Shin-Han Shiu
Plant Biology / CMB / EEBB / Genetics / QBMI
Duplicate genes in the genome
Arabidopsis gene families*
*: Clusters of Markov clustering using all-against-all BLAST E values as distance measures
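As a rough illustration of the footnote, the sketch below runs a minimal Markov clustering loop (expansion followed by inflation) on a toy similarity matrix. Several things here are assumptions and not taken from the slides: the conversion of BLAST E values to edge weights with -log10(E), the self-loop weights, the inflation value of 2, and the five hypothetical genes. It is a teaching sketch, not the MCL program or the settings used for the Arabidopsis families.

```python
import math

def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] for c in range(n)] for r in range(n)]

def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_cluster(sim, inflation=2.0, iterations=50, eps=1e-6):
    """Minimal MCL: alternate expansion and inflation, then read clusters."""
    n = len(sim)
    m = [row[:] for row in sim]
    for i in range(n):
        # Self-loop weight = strongest edge of the node (an assumption).
        m[i][i] = max(max(sim[r][i] for r in range(n)), 1.0)
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                   # expansion
        m = normalize_columns([[v ** inflation for v in row] for row in m])  # inflation
    clusters = set()
    for row in m:
        members = tuple(j for j, v in enumerate(row) if v > eps)
        if members:
            clusters.add(members)
    return sorted(list(c) for c in clusters)

# Hypothetical all-against-all E values: genes 0-2 hit each other,
# genes 3-4 hit each other, and there are no hits between the groups.
e_values = {(0, 1): 1e-50, (0, 2): 1e-50, (1, 2): 1e-50, (3, 4): 1e-50}
n_genes = 5
similarity = [[0.0] * n_genes for _ in range(n_genes)]
for (a, b), e in e_values.items():
    weight = -math.log10(e)        # assumed E value -> similarity transform
    similarity[a][b] = similarity[b][a] = weight

families = markov_cluster(similarity)
print(families)
```

Each returned list is one putative gene family. With real BLAST output the graph is far denser, and the inflation parameter controls how finely the families are split.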
Focus I: Duplication Mechanism and Loss Rate
Gene duplications: Mechanisms, Preferential retention, Consequences
Gain vs. Loss
3 rounds of whole-genome duplications in the Arabidopsis lineage
15,000* -> 30,000 -> 60,000 -> 120,000
Arabidopsis gene content: 21,000**
*: Number of orthologous groups in shared families between Arabidopsis and rice.
**: Number of genes in shared families.
Genome duplications + tandem duplications – gene losses =
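The arithmetic behind the gain vs. loss comparison can be sketched as follows. It uses only the numbers on the slide (15,000 ancestral groups, 3 rounds of whole-genome duplication, 21,000 genes today) and deliberately ignores tandem duplications, so the implied loss is a lower bound under that assumption. The variable names are illustrative.

```python
# Numbers taken from the slide; everything else is simple arithmetic.
ancestral_groups = 15_000      # orthologous groups shared with rice
wgd_rounds = 3                 # whole-genome duplications in the lineage
observed_genes = 21_000        # genes in shared families today

# Each whole-genome duplication doubles the gene count if nothing is lost.
expected_without_loss = ancestral_groups * 2 ** wgd_rounds

# Net loss implied when tandem duplications are ignored (an assumption;
# counting tandem gains would make the implied loss even larger).
implied_net_loss = expected_without_loss - observed_genes
retained_fraction = observed_genes / expected_without_loss

print(expected_without_loss)        # 120000
print(implied_net_loss)             # 99000
print(f"{retained_fraction:.1%}")   # 17.5%
```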
Receptor Kinase
Shiu & Bleecker 2001. Science’s STKE
RLK family sizes in 16 eukaryotes
Organism                    Number of genes  Kinase superfamily  RLK/Pelle family size  Percent of total genes  Percent of total kinases
Arabidopsis thaliana        25,814           1,041               615                    2.382                   59.1
Oryza sativa subsp. indica  ~35,000          1,607               1,131                  3.231                   70.4
Chlamydomonas reinhardtii   ~12,200          414                 2                      0.016                   0.5
Plasmodium falciparum       5,334            94                  1                      0.019                   1.1
Plasmodium yoelii           7,681            70                  1                      0.013                   1.4
Caenorhabditis elegans      19,484           417                 1                      0.005                   0.2
Drosophila melanogaster     13,808           262                 1                      0.007                   0.4
Anopheles gambiae           15,088           216                 1                      0.007                   0.5
Ciona intestinalis          15,852           316                 6                      0.038                   1.9
Fugu rubripes               33,609           632                 6                      0.018                   0.9
Mus musculus                22,444           495                 3                      0.013                   0.6
Homo sapiens                22,980           472                 4                      0.017                   0.8
Saccharomyces cerevisiae    6,449            113                 0                      -                       -
Candida albicans            6,164            95                  0                      -                       -
Neurospora crassa           10,082           104                 0                      -                       -
Schizosaccharomyces pombe   4,945            109                 0                      -                       -
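The two percentage columns follow directly from the three count columns. The short sketch below recomputes them for three rows of the table; the function name and the choice of rows are illustrative, and the rice gene count is the approximate figure (~35,000) given on the slide.

```python
# (organism, total genes, kinase superfamily size, RLK/Pelle family size)
rows = [
    ("Arabidopsis thaliana", 25814, 1041, 615),
    ("Oryza sativa subsp. indica", 35000, 1607, 1131),  # gene count is approximate
    ("Homo sapiens", 22980, 472, 4),
]

def rlk_percentages(total_genes, kinases, rlks):
    """Return (percent of all genes, percent of all kinases) for the RLK/Pelle family."""
    return round(100 * rlks / total_genes, 3), round(100 * rlks / kinases, 1)

percentages = {name: rlk_percentages(g, k, r) for name, g, k, r in rows}

for name, (pct_genes, pct_kinases) in percentages.items():
    print(f"{name}: {pct_genes}% of genes, {pct_kinases}% of kinases")
```

The contrast is the point of the table: the family makes up more than half of all kinases in the two plants and about 1% or less in the animals shown.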
Intro 1: Duplicate genes in the genome
*: Clusters of Markov clustering using all-against-all BLAST E values as distance measures
Protein kinase: ~1,000
Hanada & Shiu, in prep.
Focus II: Differential Retention of Duplicates
Gene duplications: Mechanisms, Preferential retention, Consequences
Genome Remodeling in Polyploids
Genome duplications occur frequently in plants
What is the fate of the duplicates?
Mostly lost
But how fast did the gene losses occur?
What is a tiling array?
Does not rely on annotation
Exon
UTR
Intron
Cis-regulatory elements
Novel genes
MAR (Matrix attachment regions)
Transcript array
Tiling array
Differences in Duplicability
Category / Arabidopsis / Human
Defense response
Proteolysis
Transport
Ion channel activity
Metabolism
Development
Protein kinase activity
Transcription factor activity
Duplicability
The propensity for the retention of a duplicate gene
Computational analysis of genome-wide trends
Gene family expansion and stress responsiveness
Exp: recently expanded genes
T: tandem, N: non-tandem
Focus III: Functional Consequences
Gene duplications: Mechanisms, Preferential retention, Consequences
Functional Consequences of Duplication
Functional divergence and conservation
Is it because of changes in cis-regulatory elements or in coding sequences?
How are duplicates retained: subfunctionalization or neofunctionalization?
Stress cis-regulatory logic
Develop bioinformatics pipelines for cis-element prediction and inference of stress expression patterns
Clusters of genes with similar expression profiles
Machine learning
Motif functional prediction
Cis-regulatory logic
Expression data
Over-represented sequence motifs in 5’ regions
Experimental validations
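One step of such a pipeline, testing whether a sequence motif is over-represented in the 5’ regions of a co-expressed cluster, can be sketched with a hypergeometric tail probability. This is a common choice and an assumption here, not necessarily the statistic used in the actual pipeline. The motif, gene names, and sequences are all hypothetical toy data.

```python
from math import comb

def hypergeom_tail(population, with_motif, cluster_size, cluster_with_motif):
    """P(X >= cluster_with_motif) when drawing cluster_size genes at random."""
    total = comb(population, cluster_size)
    upper = min(with_motif, cluster_size)
    return sum(
        comb(with_motif, i) * comb(population - with_motif, cluster_size - i)
        for i in range(cluster_with_motif, upper + 1)
    ) / total

def motif_enrichment(promoters, cluster, motif):
    """Count motif-containing promoters genome-wide and in the cluster, then test."""
    carriers = {gene for gene, seq in promoters.items() if motif in seq}
    in_cluster = len(carriers & cluster)
    p_value = hypergeom_tail(len(promoters), len(carriers), len(cluster), in_cluster)
    return in_cluster, len(carriers), p_value

# Hypothetical 5' regions (toy sequences, far shorter than real promoters).
promoters = {
    "g1": "TTACGTAA", "g2": "GGACGTCC", "g3": "ACGTACGT", "g4": "TTTTTTTT",
    "g5": "GGGGCCCC", "g6": "AATTAATT", "g7": "CCACGTGG", "g8": "TGCATGCA",
}
cluster = {"g1", "g2", "g3"}   # genes with similar expression profiles (assumed)

hits, carriers, p_value = motif_enrichment(promoters, cluster, "ACGT")
print(hits, carriers, round(p_value, 4))
```

With real data the test would be repeated over many candidate motifs and clusters, so the resulting p-values would need a multiple-testing correction before any motif is carried forward to experimental validation.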
Detailed Functional Studies of Duplicate Genes
Functional analyses of DDF1 and DDF2 transcription factors
Derived from a recent whole-genome duplication in Arabidopsis
Related to the well-known CBF factors involved in cold and drought stress
DDFs
Promoter GFP
Knockouts
Over-expression studies
Interacting proteins
Binding targets
DDFs
Promoter GFP
Knockouts
Over-expression studies
Interacting proteins
Binding targets
Arabidopsis thaliana Arabidopsis lyrata
Recent completion …