Requests: A tsunami of omics & genomics data (theory.bio.uu.nl/snel/BEG/IntroductionBEG2014.pdf)


Page 1: Requests A tsunami of omics & genomics datatheory.bio.uu.nl/snel/BEG/IntroductionBEG2014.pdfA tsunami of omics & genomics data Genomes Epigenomes Proteomes Interactomes Transcriptomes

3/3/2014

1

2/27/2014 1

Bioinformatics and Evolutionary Genomics

Evolution of Genomes, Proteomes, Networks and Complexes

Berend Snel Associate Professor

Theoretical Biology and Bioinformatics Department of Biology

Science Faculty Utrecht University

Today

• Introduction on general aims of the course and on procedural stuff

• Lecture on homology and domains

Requests

• The group is very heterogeneous with respect to previous knowledge (IBMB, GB, research projects, PhD students)

• PLEASE: interrupt / ask questions when I am going too fast, when I use jargon, or when I make jumps/conclusions that seem obvious and 100% logical to me but erratic to you; please point out my implicit assumptions regarding what everybody knows

• Computer exercises: more experienced people help

http://www.johnpool.net/projects.html

A tsunami of omics & genomics data

Genomes Epigenomes Proteomes Interactomes Transcriptomes Phosphoproteomes …

Why? (post)genomic era, systems biology, etc. We now have the (human) genome; how does this make a human?

This data tsunami & this course?

• UNIQUE opportunity to study evolution using tons of public data not even generated for this purpose!

• Also, it has been abundantly shown that understanding the evolution of stuff, and especially of “protein/DNA sequences”, is an incredible help in understanding the function of the proteins (and thus likely also of genomes and networks)

• “interplay between network and genome evolution”


(protein coding) Gene Content

bag of genes

Why does the genome contain the genes that it does?

[Figure: genome tree with branch support values (69 to 100) and a scale bar of 0.1. Labelled clades: Proteobacteria, Eukarya, Euryarchaeota, Low G+C Gram-Positive Bacteria.]

Taxa in the figure: C. pneumoniae, C. trachomatis, M. tuberculosis, M. pneumoniae, M. genitalium, B. subtilis, T. pallidum, T. maritima, B. burgdorferi, P. horikoshii, M. thermoautotrophicum, A. fulgidus, M. jannaschii, S. cerevisiae, C. elegans, A. aeolicus, E. coli, H. influenzae, R. prowazekii, H. pylori 26695, Synechocystis sp., H. pylori J99, A. pernix

Snel Bork Huynen Nature Genet 1999 Huynen Snel Bork Science 1999
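The “bag of genes” view can be turned into a tree by first computing a distance between genomes from their shared gene content. The following is a minimal sketch, assuming each genome is reduced to a set of gene-family identifiers and assuming one possible normalization (shared families divided by the size of the smaller genome). The genome names, family names and function names are hypothetical, and this is not necessarily the exact procedure behind the tree on this slide.

```python
def gene_content_distance(genes_a, genes_b):
    """Distance = 1 - (shared families / size of the smaller gene set)."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both genomes need at least one gene family")
    shared = len(a & b)
    return 1.0 - shared / min(len(a), len(b))


def distance_matrix(genomes):
    """All-against-all distances for a dict {genome name: set of families}."""
    names = sorted(genomes)
    return {(x, y): gene_content_distance(genomes[x], genomes[y])
            for x in names for y in names}


# Hypothetical toy data: gene families present in three made-up genomes.
toy_genomes = {
    "genome_A": {"fam1", "fam2", "fam3", "fam4"},
    "genome_B": {"fam1", "fam2", "fam3"},
    "genome_C": {"fam1", "fam5"},
}

matrix = distance_matrix(toy_genomes)
for pair in [("genome_A", "genome_B"), ("genome_A", "genome_C")]:
    print(pair, matrix[pair])
```

Dividing by the smaller genome keeps a small genome that is a subset of a large one at distance 0; other normalizations give different trees, which is one of the pitfalls to keep in mind. The resulting matrix could then be fed to any distance-based tree method such as neighbor joining.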

Fritz-Laylin et al. Cell 2010

… however

Fokkens and Snel PLoS Comp Biol 2009

Mediator

• Essential for transcription

• Associated with general transcription machinery

• Bridge

• 25 subunits

• Four submodules

[Figure, shown twice: rooted eukaryotic species tree with supergroup labels Animals, Fungi, Amoebozoa, Archaeplastids, Chrom-Alveolates* and Excavata; one position is marked “?”.]

Taxa in the figure: S.po, Y.lip, S.cer, K.lac, C.alb, D.han, A.fum, A.nid, G.zea, N.cra, U.may, C.neo, P.chr, L.bic, R.ory, E.cun, D.mel, H.sap, M.mus, C.ele, E.hist, D.dis, C.mer, A.thal, O.tau, C.rei, T.the, Cryp, Thei, P.falc, Phyt, T.pse, P.tri, Giard, N.gru, L.maj, Tryp


[Figure, shown twice: rooted eukaryotic species tree with the same taxa, S.po through Tryp.]

many partial losses, few gains
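One simple way to count gains and losses from a presence/absence pattern on a species tree is Dollo parsimony: the gene is gained once, at the last common ancestor of all species that have it, and every absence below that point is explained by loss. The sketch below assumes a rooted tree written as nested tuples and a set of species in which the gene is present. The species names and function names are hypothetical, and this is not necessarily the method used for the analysis on the slide.

```python
def leaves(tree):
    """Set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def count_losses(tree, present):
    """Count maximal subtrees in which the gene is absent everywhere."""
    if not leaves(tree) & present:
        return 1          # whole clade lacks the gene: one loss event
    if isinstance(tree, str):
        return 0          # a leaf that has the gene
    return sum(count_losses(child, present) for child in tree)


def dollo(tree, present):
    """Return (gain node, number of losses) under Dollo parsimony."""
    present = set(present)
    if not leaves(tree) & present:
        raise ValueError("gene is not present in any species of the tree")
    node = tree
    # Walk down while only one child clade contains the gene.
    while not isinstance(node, str):
        holders = [c for c in node if leaves(c) & present]
        if len(holders) != 1:
            break
        node = holders[0]
    return node, count_losses(node, present)


# Hypothetical five-species tree and presence pattern.
toy_tree = ((("sp_A", "sp_B"), "sp_C"), ("sp_D", "sp_E"))
gain_node, losses = dollo(toy_tree, {"sp_A", "sp_C", "sp_D"})
print(losses)
```

Because Dollo parsimony forbids a second gain, it will by construction report one gain and possibly many losses; whether that assumption is reasonable depends on how likely lateral transfer and undetected homologs are for the gene family in question.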

Presence of genes across genomes: orthology

bags of genes

Why does the genome contain the genes that it does?

Bag of genes, what about the homologs within a genome? (paralogs, duplicates)

[Figure: Initiating Anaphase / The Mitotic Checkpoint. Labels: Anaphase Promoting Complex/Cyclosome (APC/C), Cdk1, Cyclin B, Securin, Ub (repeated), Mitotic Checkpoint, X.]


The Mitotic Checkpoint Complex (MCC)

[Figure: Mitotic Checkpoint diagram. Labels: BubR1, Mad2, Cdc20, Mps1, MCC, Kinetochore, AurB, Bub1.]

[Figure: domain architectures of ScBub1 / HsBub1 (fungi & vertebrates), hsBubR1 (vertebrates) and scMad3p (fungi). Domain labels: TPR, KEN, BUB3-binding, CDI, Kinase.]

9 independent duplications. 7 cases where a mad3-like and a bub1-like protein arose out of a bubmad-like ancestor. Parallel or convergent evolution?

What about the kinase domain in human bubr1?

degeneration of motifs essential for catalysis

• the preserved ‘catalytic’ residues are essential for BUBR1 conformational stability in vitro and in cells. Uncoupling potential enzymatic activity from structural stability shows that catalysis is dispensable for BUBR1 function in mitosis.

• Suggest that BUBR1 is an atypical pseudokinase.

What about the kinase domain in human (and fly)


What about all these BubMad containing species?

What about the ancestor?

?

Subfunctionalization? (cf. TOR but with convergent evolution of domain architecture)

What about all these bubmad containing species? What about the ancestor?: experimentally testing subfunctionalization

An hsBubMad protein assembled from bubr1 1-714 & bub1 734-1085 is able to functionally replace both bub1 and bubr1 in human cells!
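The hsBubMad construct joins two coordinate ranges from different proteins, and off-by-one errors are easy to make when such ranges are turned into string slices. A minimal sketch, assuming the ranges are 1-based and inclusive (the usual convention for protein positions): the sequences below are placeholder strings, not real protein sequences, and their lengths are chosen only so that the coordinates fit.

```python
def segment(sequence, start, end):
    """Return positions start..end using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(
            f"range {start}-{end} does not fit a sequence of length {len(sequence)}"
        )
    return sequence[start - 1:end]


def assemble(parts):
    """Concatenate (sequence, start, end) segments in the given order."""
    return "".join(segment(seq, start, end) for seq, start, end in parts)


# Placeholder strings, NOT real sequences (assumption for illustration only).
bubr1_placeholder = "R" * 1085
bub1_placeholder = "B" * 1085

chimera = assemble([
    (bubr1_placeholder, 1, 714),     # N-terminal part taken from bubr1
    (bub1_placeholder, 734, 1085),   # C-terminal part taken from bub1
])
print(len(chimera))  # 714 + (1085 - 734 + 1) = 1066
```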

This course

• I want to study the evolution of genomes, pathways and networks, so that is why I study gene/protein evolution

• At the end, you should be more or less able to replicate e.g. bubmad, mediator

• Understanding that many bioinformatic challenges are a mix of conceptual and technical problems (e.g. why orthology is such an incredibly persistent problem)

• “what you should ~know” in order to do this kind of research

• Topics are interrelated
– e.g. orthology already in homology lecture but proper explanation a day after
– e.g. that trees can be used to time a duplication to eukaryogenesis but proper discussion of eukaryogenesis has its own lecture

Homology (& domains)

• Absolute basis of any comparative analysis; affects MSA and trees; detection still being improved

Gene Phylogeny & Orthology

• How do we get such trees and how do we interpret them

• Trees reveal some of the most important genome evolution processes: LGT, duplication, loss,
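As an illustration of how a gene tree can be read for duplications, the sketch below uses the species-overlap heuristic: an internal node is called a duplication if its two child clades share at least one species, and a speciation otherwise. It assumes a rooted, binary gene tree written as nested tuples and leaf names of the form "species_gene"; the names are hypothetical. It ignores loss, LGT and tree error, so it is only a starting point for interpretation.

```python
def species_of(leaf):
    """Species part of a leaf label written as 'species_gene' (assumed format)."""
    return leaf.split("_")[0]


def annotate(tree):
    """Return (species below node, list of events) for a rooted binary gene tree.

    Each event is (kind, species in left clade, species in right clade),
    listed in post-order (children before their parent).
    """
    if isinstance(tree, str):
        return {species_of(tree)}, []
    left, right = tree
    species_left, events_left = annotate(left)
    species_right, events_right = annotate(right)
    kind = "duplication" if species_left & species_right else "speciation"
    event = (kind, sorted(species_left), sorted(species_right))
    return species_left | species_right, events_left + events_right + [event]


# Hypothetical gene tree: two paralogs (geneA, geneB) in human and mouse,
# and a single-copy yeast gene as the outgroup.
toy_gene_tree = (
    (("human_geneA", "mouse_geneA"), ("human_geneB", "mouse_geneB")),
    "yeast_gene",
)

_, events = annotate(toy_gene_tree)
for kind, left_species, right_species in events:
    print(kind, left_species, right_species)
```

In this toy tree the node joining the geneA and geneB clades is a duplication (both clades contain human and mouse), and because yeast branches off before it, the duplication is placed after the split from yeast. Pairs of genes whose last common node is a speciation are orthologs; pairs separated by a duplication node are paralogs.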

(Eukaryotic) tree of life

• New genomes, cool/exotic animals / protists

• Berend / Eelco / John / Like / Leny sitting behind his/her computer and thinking: should I include this genome? How should I interpret an absence? Which source species could I best use for homology-based gene prediction?

• Crucial when interpreting gene trees:
– Knowing it by heart >>> having to look it up

• With regard to evolutionary signaling cell biology (kinases, small GTPases etc.), the diversity in present-day genomes is staggering and dwarfs e.g. the human-fruit fly difference


Large scale orthology

• Needed to move beyond anecdotes, but difficult to get

[Figure: gene tree with support values (8 to 100) and a scale bar of 0.5. Labelled clades: RapGAP (animals (LSE), fungi, dicty); RalGAPB (oomycetes, dicty, naegleria, fungi, animals); RalGAPA (dicty, naegleria, fungi, animals); RheBGAP (TSC2; oomycetes, diatoms, red algae, animals, fungi, dicty, tetrahymena). Individually labelled sequences: PHYSOJ14061 Phytophthora sojae 142624; PHYINF15173 Phytophthora infestans PITG 15173.]

Eukaryogenesis / LECA

• Biological topic, eukaryogenesis / LECA, for which these types of analyses are telling us a lot. But it also impacts a lot of things we do: we see it back in gene trees and it impacts getting orthologous groups across eukaryotes.

Gene content evolution

• Fundamental level of genome evolution

• Gene invention -> inability to detect homologs vs real lack of homologs: not finding homologs does not simply mean a novel gene

• Evolutionary modules?

• Trying to move large scale but remember the pitfalls

Whole Genome Duplication (WGD)

• Like LECA, WGD is important biology that needs bioinformatics to be researched, but which also impacts our data

• And which is a welcome source of information for our analyses (Lidija, bubmad): independent and reliable reconstruction of part of the history of genes

Using HTP data to study evolution of networks / complexes

• Is the number of conserved interactions between e.g. yeast and human 10% or 95%???

• On top of all the genome analysis pitfalls also all the HTP data pitfalls …

• Duplicates vs orthologs
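Part of the reason the answer can range so widely is that “fraction of conserved interactions” depends on what goes in the denominator. The sketch below assumes a one-to-one ortholog mapping between two species and undirected interaction lists; all protein names and the function name are hypothetical. It reports the fraction relative to all interactions in species A and relative to only those interactions for which both partners have an ortholog in species B.

```python
def conserved_fraction(interactions_a, interactions_b, orthologs):
    """Return (conserved / all A interactions, conserved / testable A interactions).

    orthologs maps a protein in species A to its single ortholog in species B.
    The second value is None when no interaction is testable.
    """
    if not interactions_a:
        raise ValueError("species A has no interactions")
    network_b = {frozenset(pair) for pair in interactions_b}  # undirected edges
    testable = 0
    conserved = 0
    for x, y in interactions_a:
        if x in orthologs and y in orthologs:
            testable += 1
            if frozenset((orthologs[x], orthologs[y])) in network_b:
                conserved += 1
    of_all = conserved / len(interactions_a)
    of_testable = conserved / testable if testable else None
    return of_all, of_testable


# Hypothetical toy data.
yeast_interactions = [("y1", "y2"), ("y2", "y3"), ("y3", "y4"), ("y4", "y5")]
human_interactions = [("h2", "h1")]
yeast_to_human = {"y1": "h1", "y2": "h2", "y3": "h3"}

print(conserved_fraction(yeast_interactions, human_interactions, yeast_to_human))
```

With these toy data the same networks give 25% or 50% depending on the denominator, before even considering false negatives in the HTP data or how duplicates are mapped.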

What can happen to an interaction in evolution

?

?


Techniques AND biology

• Detective/forensics vs concepts; Large scale biology vs small-scale biology; Bioinformatics biology vs data/technique problems;

• A lot like police investigation … and less like Nobel prize winning physics ...

• Anything goes in genome evolution; many processes are often entangled (e.g. google subneofunctionalization)

Literature discussion

• You should have read the papers before the day of the discussion

• On the day itself the group is split into critique / defense

• Groups prepare defense and critique

• Discussion: (critique starts because we all have read the paper)
– Critique gives outline why paper is weak
– break
– Defense responds to critique
– break
– Critique gives final comments, to which an immediate response can be made
– Defense gives final comments, to which an immediate response can be made

Computer Exercises

• Computer exercises for some topics; many others are more difficult (e.g. evolution of interaction networks based on HTP analysis).

• Previous years: too much cookbook. Attempt at less cookbook, more playing around → you should learn more, but it is slower

• Ask help from fellow students.

• Ties strongly into mini-projects

Mini projects

• A protein

• What do Swiss-Prot / SGD already know about your protein function-wise (GO) (pathway)?

• Protein topology (domains, disorder, TM?, motifs?)

• Size of the (super)family in the genome your sequence is from

• Homologs across tree of life

• Point of invention of family? If bacterial invention, mito or archaeal route?

• Does it have WGD duplicates?

• Tree of relevant sequences in diverse genomes

• Orthologs in relevant genomes

• (normally relevant genomes would be a few metazoa, fungi, other opisthokonts, amoebozoa, stramenopiles, alveolates, plantae, excavates, see e.g. bubmad, but it depends on your protein)

Requests

• The group is very heterogeneous with respect to previous knowledge (IBMB, GB, research projects, PhD students)

• PLEASE: interrupt / ask questions when I am going too fast, when I use jargon, or when I make jumps/conclusions that seem obvious and 100% logical to me but erratic to you; please point out my implicit assumptions regarding what everybody knows

• Computer exercises: more experienced people help

• And also apologies for some redundancy