3/3/2014
2/27/2014
Bioinformatics and Evolutionary Genomics
Evolution of Genomes, Proteomes, Networks and Complexes
Berend Snel Associate Professor
Theoretical Biology and Bioinformatics Department of Biology
Science Faculty Utrecht University
Today
• Introduction on general aims of the course and on procedural stuff
• Lecture on homology and domains
Requests
• The group is very heterogeneous with respect to previous knowledge (IBMB, GB, research projects, PhD students)
• PLEASE: interrupt / ask questions when I am going too fast, when I use jargon, when I make jumps/conclusions that to me seem obvious and 100% logical, but to you are erratic; please point out my implicit assumptions regarding what everybody knows
• Computer exercises: more experienced people help
http://www.johnpool.net/projects.html
A tsunami of omics & genomics data
Genomes Epigenomes Proteomes Interactomes Transcriptomes Phosphoproteomes …
Why? (post)genomic era, systems biology, etc. We now have the (human) genome; how does this make a human?
This data tsunami & this course?
• UNIQUE opportunity to study evolution using tons of public data not even generated for this purpose!
• Also, it has been abundantly shown that understanding the evolution of stuff, and especially of “protein/DNA sequences”, is an incredible help in understanding the function of the proteins (and thus likely also of genomes and networks)
• “interplay between network and genome evolution”
(protein coding) Gene Content
bag of genes
Why does the genome contain the genes that it does
[Figure: genome tree with branch support values (69 to 100) and scale bar 0.1. Species: C. pneumoniae, C. trachomatis, M. tuberculosis, M. pneumoniae, M. genitalium, B. subtilis, T. pallidum, T. maritima, B. burgdorferi, P. horikoshii, M. thermoautotrophicum, A. fulgidus, M. jannaschii, S. cerevisiae, C. elegans, A. aeolicus, E. coli, H. influenzae, R. prowazekii, H. pylori 26695, Synechocystis sp., H. pylori J99, A. pernix. Labeled clades: Proteobacteria, Eukarya, Euryarchaeota, Low G+C Gram-Positive Bacteria]
Snel, Bork, Huynen. Nature Genet 1999; Huynen, Snel, Bork. Science 1999
Fritz-Laylin et al. Cell 2010
… however
Fokkens and Snel PLoS Comp Biol 2009
Mediator
• Essential for transcription
• Associated with general transcription machinery
• Bridge
• 25 subunits
• Four submodules
[Figure: rooted eukaryotic species tree, repeated in four panels. Species: S.po, Y.lip, S.cer, K.lac, C.alb, D.han, A.fum, A.nid, G.zea, N.cra, U.may, C.neo, P.chr, L.bic, R.ory, E.cun, D.mel, H.sap, M.mus, C.ele, E.hist, D.dis, C.mer, A.thal, O.tau, C.rei, T.the, Cryp, Thei, P.falc, Phyt, T.pse, P.tri, Giard, N.gru, L.maj, Tryp. Supergroups: Animals, Amoebozoa, Fungi, Excavata, Chromalveolates*, Archaeplastids]
Many partial losses, few gains
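The slides do not say how gains and losses were inferred. As a minimal sketch of one way to count them, the block below uses Dollo parsimony (a single gain at the last common ancestor of all species that have the gene, and losses for every clade below it that lacks the gene). The tree shape and the species names `sp1` to `sp6` are hypothetical illustration data, not the Mediator data.

```python
# Toy Dollo-parsimony count: one gain, then as few losses as needed.
# The tree is a nested tuple; a leaf is a plain string (species name).
# All names and the tree shape are hypothetical illustration data.

def leaf_set(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaf_set(child) for child in node))

def gain_node(node, present):
    """Return the smallest subtree that contains every species with the gene."""
    if isinstance(node, str):
        return node
    for child in node:
        if present <= leaf_set(child):
            return gain_node(child, present)
    return node

def count_losses(node, present):
    """Count maximal clades below the gain point in which the gene is absent."""
    if not (leaf_set(node) & present):
        return 1  # the whole clade lacks the gene: one loss on its stem branch
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, present) for child in node)

def dollo(tree, present):
    """Return (species descending from the gain point, number of losses)."""
    present = set(present)
    if not present or not present <= leaf_set(tree):
        raise ValueError("present must be a non-empty subset of the tree leaves")
    origin = gain_node(tree, present)
    return leaf_set(origin), count_losses(origin, present)

species_tree = ((("sp1", "sp2"), "sp3"), ("sp4", ("sp5", "sp6")))
clade, losses = dollo(species_tree, {"sp1", "sp3", "sp5"})
print(sorted(clade), losses)
```

With a patchy presence pattern such as `{"sp1", "sp3", "sp5"}`, the gain is placed at the root and every absence becomes a separate loss, which is the "few gains, many losses" picture. Dollo parsimony assumes a gene cannot be gained twice, so it ignores lateral gene transfer and undetected homologs.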
Presence of genes across genomes: orthology
bags of genes
Why does the genome contain the genes that it does
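As a minimal sketch of the "bag of genes" view, the block below treats each genome as a set of orthologous group identifiers, builds a presence/absence profile per group, and computes the fraction of shared groups between two genomes. The genome names and the `OG1` to `OG6` identifiers are hypothetical, and dividing by the smaller genome is one possible normalization, assumed here for illustration.

```python
# Genomes as "bags of genes": each genome is a set of orthologous group IDs.
# Genome names and group IDs are hypothetical illustration data.

genomes = {
    "genome_A": {"OG1", "OG2", "OG3", "OG4"},
    "genome_B": {"OG1", "OG2", "OG5"},
    "genome_C": {"OG1", "OG6"},
}

def presence_profiles(genomes):
    """Map each orthologous group to a 0/1 tuple over the sorted genome names."""
    names = sorted(genomes)
    all_groups = sorted(set().union(*genomes.values()))
    return {og: tuple(int(og in genomes[name]) for name in names)
            for og in all_groups}

def shared_fraction(genes_a, genes_b):
    """Shared groups divided by the size of the smaller genome (an assumption)."""
    if not genes_a or not genes_b:
        raise ValueError("both genomes must contain at least one gene")
    return len(genes_a & genes_b) / min(len(genes_a), len(genes_b))

profiles = presence_profiles(genomes)
for og, profile in profiles.items():
    print(og, profile)
print(shared_fraction(genomes["genome_A"], genomes["genome_B"]))
```

The profiles only make sense if the group identifiers really are orthologous groups, which is why orthology comes up as soon as presence and absence are compared across genomes.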
Bag of genes, what about the homologs within a genome? (paralogs, duplicates)
[Figure: The Mitotic Checkpoint. Diagram labels: Anaphase Promoting Complex/Cyclosome (APC/C), Ub, Cdk1, Cyclin B, Securin, Initiating Anaphase; the Mitotic Checkpoint blocks (X) this step]
The Mitotic Checkpoint Complex (MCC)
[Figure: the Mitotic Checkpoint and the MCC (BubR1, Mad2, Cdc20) at the kinetochore; further labels: Mps1, AurB, Bub1]
[Figure: domain architectures of ScBub1 and HsBub1 (fungi & vertebrates), hsBubR1 (vertebrates) and scMad3p (fungi); domain labels: TPR, KEN, BUB3-binding, CDI, Kinase]
9 independent duplications. 7 cases where a mad3-like and a bub1-like protein arose out of a bubmad-like ancestor. Parallel or convergent evolution?
What about the kinase domain in human bubr1?
degeneration of motifs essential for catalysis
• the preserved ‘catalytic’ residues are essential for BUBR1 conformational stability in vitro and in cells. Uncoupling potential enzymatic activity from structural stability shows that catalysis is dispensable for BUBR1 function in mitosis.
• Suggests that BUBR1 is an atypical pseudokinase.
What about the kinase domain in human (and fly)
What about all these BubMad-containing species? What about the ancestor?
Subfunctionalization? (cf. TOR but with convergent evolution of domain architecture)
What about all these bubmad-containing species? What about the ancestor? Experimentally testing subfunctionalization
An assembled hsBubMad protein (bubr1 1-714 & bub1 734-1085) is able to functionally replace both bub1 and bubr1 in human cells!
This course
• I want to study the evolution of genomes, pathways and networks, so that is why I study gene/protein evolution
• At the end you should be more or less able to replicate e.g. bubmad, mediator
• Understanding that many bioinformatic challenges are a mix of conceptual and technical problems (e.g. why orthology is such an incredibly persistent problem)
• “What you should ~know” in order to do this kind of research
• Topics are interrelated
  – e.g. orthology already comes up in the homology lecture, but the proper explanation follows a day after
  – e.g. trees can be used to time a duplication to eukaryogenesis, but the proper discussion of eukaryogenesis has its own lecture
Homology (& domains)
• Absolute basis of any comparative analysis; affects MSA and trees; detection still being improved
Gene Phylogeny & Orthology
• How do we get such trees and how do we interpret them
• Trees reveal some of the most important genome evolution processes: LGT, duplication, loss,
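To make the link between a gene tree and orthology concrete, here is a minimal sketch that labels each internal node as a duplication or a speciation with the species-overlap rule (a node is called a duplication when its child subtrees share a species) and then classifies gene pairs by the node that joins them. This rule is a simple stand-in chosen for illustration, not necessarily the method used in the course; the leaf names in `species|gene` form are hypothetical.

```python
from itertools import combinations, product

# Toy gene tree as nested tuples; a leaf is "species|gene".
# Names are hypothetical illustration data.

def species_of(leaf):
    """Species part of a 'species|gene' leaf name."""
    return leaf.split("|", 1)[0]

def leaves(node):
    """List of leaf names below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def classify_pairs(node, orthologs=None, paralogs=None):
    """Collect gene pairs joined at speciation (orthologs) or duplication (paralogs) nodes."""
    if orthologs is None:
        orthologs, paralogs = set(), set()
    if isinstance(node, str):
        return orthologs, paralogs
    child_leaves = [leaves(child) for child in node]
    child_species = [{species_of(leaf) for leaf in ls} for ls in child_leaves]
    # Species-overlap rule: shared species between children implies a duplication.
    is_duplication = any(a & b for a, b in combinations(child_species, 2))
    target = paralogs if is_duplication else orthologs
    for left, right in combinations(child_leaves, 2):
        for pair in product(left, right):
            target.add(frozenset(pair))
    for child in node:
        classify_pairs(child, orthologs, paralogs)
    return orthologs, paralogs

gene_tree = ((("human|g1", "mouse|g1"), ("human|g2", "mouse|g2")), "yeast|g")
orthologs, paralogs = classify_pairs(gene_tree)
print(len(orthologs), "orthologous pairs,", len(paralogs), "paralogous pairs")
```

In this toy tree the duplication predates the human/mouse split, so `human|g1` and `mouse|g2` come out as paralogs, while `yeast|g` is orthologous to all four genes. Gene loss and LGT can mislead this rule, which is part of why orthology remains a persistent problem.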
(Eukaryotic) tree of life
• New genomes, cool/exotic animals / protists
• Berend / Eelco / John / Like / Leny sitting behind his/her computer and thinking: should I include this genome? How should I interpret an absence? What source of species could I best use for homology-based gene prediction?
• Crucial when interpreting gene trees:
  – Knowing it by heart >>> having to look it up
• With regard to evolutionary signaling cell biology (kinases, small GTPases, etc.), the diversity in present-day genomes is staggering and dwarfs e.g. the human-fruit fly difference
Large scale orthology
• Needed to move beyond anecdotes, but difficult to get
[Figure: gene tree with support values and scale bar 0.5. Labeled clades: RapGAP (animals (LSE), fungi, dicty); RalGAPB (oomycetes, dicty, naegleria, fungi, animals); RalGAPA (dicty, naegleria, fungi, animals); RheBGAP (TSC2; oomycetes, diatoms, red algae, animals, fungi, dicty, tetrahymena). Labeled sequences: PHYSOJ14061 Phytophthora sojae 142624, PHYINF15173 Phytophthora infestans PITG 15173]
Eukaryogenesis / LECA
• Biological topic, eukaryogenesis / LECA, about which these types of analyses are telling us a lot. But it also impacts a lot of things we do: we see it back in gene trees and it impacts getting orthologous groups across eukaryotes.
Gene content evolution
• Fundamental level of genome evolution
• Gene invention: inability to detect homologs vs real lack of homologs; no detectable homolog does not simply mean a novel gene
• Evolutionary modules?
• Trying to move large scale but remember the pitfalls
Whole Genome Duplication (WGD)
• Like LECA, WGD is important biology for which bioinformatics is needed, but which also impacts our data
• And which is a welcome source of information for our analyses (Lidija, bubmad): independent and reliable reconstruction of part of the history of genes
Using HTP data to study evolution of networks / complexes
• Is the number of conserved interactions between e.g. yeast and human 10% or 95%???
• On top of all the genome analysis pitfalls also all the HTP data pitfalls …
• Duplicates vs orthologs
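The 10% versus 95% question depends heavily on what is counted. As a minimal sketch, the block below maps interactions from one species to another through an ortholog table and reports how many are conserved, how many could be mapped at all, and how many there were in total. All protein names, interactions and ortholog assignments are hypothetical illustration data, and calling an interaction conserved when any pair of orthologs interacts is an assumption made here.

```python
# Toy interaction conservation between two species via an ortholog table.
# All identifiers and interactions are hypothetical illustration data.

yeast_ppi = {frozenset(p) for p in [("y1", "y2"), ("y2", "y3"), ("y3", "y4"), ("y4", "y5")]}
human_ppi = {frozenset(p) for p in [("h1", "h2"), ("h3b", "h4")]}

# y3 has two human co-orthologs (a duplicate pair); y5 has no ortholog.
orthologs = {"y1": {"h1"}, "y2": {"h2"}, "y3": {"h3a", "h3b"}, "y4": {"h4"}}

def conservation(ppi_a, ppi_b, orthologs):
    """Return (conserved, mappable, total) counts for interactions in ppi_a."""
    conserved = mappable = 0
    for pair in ppi_a:
        if len(pair) != 2:
            continue  # skip self-interactions in this sketch
        a, b = tuple(pair)
        orth_a = orthologs.get(a, set())
        orth_b = orthologs.get(b, set())
        if not orth_a or not orth_b:
            continue  # at least one partner has no ortholog
        mappable += 1
        # Assumption: conserved if ANY combination of orthologs interacts.
        if any(frozenset((x, y)) in ppi_b for x in orth_a for y in orth_b):
            conserved += 1
    return conserved, mappable, len(ppi_a)

conserved, mappable, total = conservation(yeast_ppi, human_ppi, orthologs)
print(f"{conserved}/{mappable} of mappable, {conserved}/{total} of all interactions")
```

The same toy data gives 2 of 3 or 2 of 4 depending on the denominator, and the `y3` case shows how duplicates change the answer: only one of the two co-orthologs keeps the interaction. Missing interactions in HTP data would lower both numbers further.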
What can happen to an interaction in evolution?
Techniques AND biology
• Detective/forensics vs concepts; Large scale biology vs small-scale biology; Bioinformatics biology vs data/technique problems;
• A lot like police investigation … and less like Nobel prize winning physics ...
• Anything goes in genome evolution; many processes are often entangled (e.g. google subneofunctionalization)
Literature discussion
• You should have read the papers before the day of the discussion
• On the day itself the group is split into critique / defense
• Groups prepare defense and critique
• Discussion (critique starts because we all have read the paper):
  – Critique gives an outline of why the paper is weak
  – break
  – Defense responds to critique
  – break
  – Critique gives final comments, to which an immediate response can be made
  – Defense gives final comments, to which an immediate response can be made
Computer Exercises
• Computer exercises for some topics; many others are more difficult (e.g. evolution of interaction networks based on HTP analysis).
• Previous years too much cookbook. Attempt at less cookbook, more playing around → you should learn more; but it is slower
• Ask help from fellow students.
• Ties strongly into mini-projects
Mini projects
• A protein
• What do Swiss-Prot / SGD already know about your protein function-wise (GO, pathway)?
• Protein topology (domains, disorder, TM?, motifs?)
• Size of the (super)family in the genome your sequence is from
• Homologs across the tree of life
• Point of invention of the family? If a bacterial invention: mito or archaeal route?
• Does it have WGD duplicates?
• Tree of relevant sequences in diverse genomes
• Orthologs in relevant genomes
• (Normally relevant genomes would be a few metazoa, fungi, other opisthokonts, amoebozoa, stramenopiles, alveolates, plantae, excavates; see e.g. bubmad, but it depends on your protein)
Requests
• The group is very heterogeneous with respect to previous knowledge (IBMB, GB, research projects, PhD students)
• PLEASE: interrupt / ask questions when I am going too fast, when I use jargon, when I make jumps/conclusions that to me seem obvious and 100% logical, but to you are erratic; please point out my implicit assumptions regarding what everybody knows
• Computer exercises: more experienced people help
• And also apologies for some redundancy