
Post on 01-Nov-2018


Page 1: Inferring functional constraints on Drosophila noncoding ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanRiva... · Inferring functional constraints on Drosophila

Casey M. Bergman

Faculty of Life Sciences

University of Manchester

[email protected]

Inferring functional constraints on Drosophila noncoding DNA from patterns of sequence evolution.

Page 2

Outline of Talk

• Noncoding DNA and Drosophila as a system

• Conserved noncoding sequences are selectively constrained.

• A framework for predicting enhancers and transcription factor motifs.

• Spatial constraints on conserved noncoding sequences

Page 3

Higher organisms have a higher proportion of noncoding DNA

• Bacteria: 15%

• Yeast: 30%

• Worm: 70%

• Fly: 75%

Page 4

The function of most noncoding DNA is unknown & unannotated

Bioinformatic & functional analysis of noncoding DNA ⇒

Genome organization

Transcriptional regulation

[Figure: gene map. Legend: box = exon. Labels: Mef2, CG15863, CG12130, CG1418, CG12133, Adam, CG12134, eve, TER94, Pka-R2, CG12128, BS 1360]

Page 5

[Figure: gene map (labels: Mef2, CG15863, CG12130, CG1418, CG12133, Adam, CG12134, eve, TER94, Pka-R2, CG12128, BS 1360, (A)n) with two annotation layers: Enhancers (labels as extracted: AR3/7, 2, APRCQ4/6, mes, 15RP2) and Transposable elements]

Goal: comprehensive functional annotation of noncoding sequences in Drosophila

Page 6

Why is annotation of cis-regulatory sequences important?

• Better understand development

• Better understand mechanisms of transcription

• Provide material for forward genetics

• Provide material for evolutionary biology

• Generate data for systems biology

Page 7

Why Drosophila as a model system?

~120 Mb of euchromatin

~15,000 genes

75% noncoding

Compact, deletion bias

Page 8

“Pseudogenes” decay rapidly by deletion in Drosophila

Petrov and Hartl (1998) Mol. Biol. Evol. 15:293-302

Page 9

Genes with complex expression have long intergenic regions in compact genomes

Nelson, Hersh & Carroll (2004) Genome Biology 5:R25

Page 10

Long introns & intergenic regions have slower rates of sequence evolution in Drosophila

Halligan & Keightley (2006) Genome Research 16:875-884

Page 11

A wealth of comparative genomic data exists for the genus Drosophila

http://species.flybase.net

http://rana.lbl.gov/drosophila

Page 12

image from Pavel Tomancak (MPI-Dresden)

Thousands of candidate expression patterns: BDGP embryonic in situ database

http://www.fruitfly.org/cgi-bin/ex/insitu.pl

Page 13

[Figure: genome browser view around eve (chromosome band 46C10, base positions 5034500 to 5041000). Tracks: Chromosome Bands; Protein-Coding Genes from FlyBase; Non-Coding Genes from FlyBase; FlyReg: Drosophila DNase I Footprint Database; D.mel./D.yakuba/D.pseudoob./A.gambiae Multiz Alignments & phastCons Scores (conservation rows: d_yakuba, d_pseudoobscura, a_gambiae). Footprint labels include eve, ttk, kni, hb, Kr, bcd, gt, prd and Unspecified]

Systematic annotation of cis-regulatory data in Drosophila: FlyReg & REDfly databases

Bergman et al. (2005) Bioinformatics 21:1747-1749

Gallo et al. (2006) Bioinformatics 22:381-383

Page 14

cis-regulatory annotation & systems biology

Ashburner & Bergman (2005) Genome Research 15:1661-1667

Page 15

[Figure: diagram of genes and transcription factors. Labels: shn, Abd-A, fkh, ko, Dll, dpp, mus209, tsh, bcd, salm, Antp, dl, Ubx, zen, kni, ftz, eve, hb, tll, Kr, Trl, grh, cad, h, en, gt, ttk]

cis-regulatory annotation & systems biology

Ashburner & Bergman (2005) Genome Research 15:1661-1667

Page 16

ORegAnno: Open Regulatory Annotation

Montgomery et al. (2006) Bioinformatics 22:637-640

Page 17

Outline of Talk

• Noncoding DNA and Drosophila as a system

• Conserved noncoding sequences are selectively constrained.

• A framework for predicting enhancers and transcription factor motifs.

• Spatial constraints on conserved noncoding sequences

Page 18

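Pattern of noncoding sequence evolution in Drosophila: the eve stripe 2 enhancer

[Figure: the eve stripe 2 enhancer compared across species (labels: mel, sim, yak, ere, tak, ana, pse), with a conserved block and a 500 bp spacer marked]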

Page 19

Are conserved blocks functionally constrained or simply mutational cold spots?

Bergman & Kreitman (2001) Genome Research 11:1335-1345

Clark (2001) Genome Research 11:1319-1320

median: 19 bp

Page 20

Using population genetics to test the mutational cold-spot hypothesis

If blocks are functionally constrained we predict the following:

1. Excess of polymorphic mutations in blocks relative to fixed differences

(“MK” test - blocks vs. spacers, polymorphism & divergence)

2. Excess of rare derived mutations in blocks relative to spacers

(Non-parametric test - blocks vs. spacers, frequency spectrum)

Page 21

Using population genetics to test the mutational cold-spot hypothesis

[Figure: schematic of a block, spacer, block arrangement with divergence (div.) and polymorphism (π) shown along the sequence. The slide repeats the two predictions from the previous page.]

Page 22

Using population genetics to test the mutational cold-spot hypothesis

[Figure: histogram of the fraction of SNPs (y-axis, 0.0 to 6.0) by derived allele frequency (x-axis, ten bins from 0-0.1 to 0.9-1.0), shown for spacer. The slide repeats the two predictions from page 20.]

Page 23

Using population genetics to test the mutational cold-spot hypothesis

[Figure: histogram of the fraction of SNPs (y-axis, 0.0 to 6.0) by derived allele frequency (x-axis, ten bins from 0-0.1 to 0.9-1.0), shown for block and spacer. The slide repeats the two predictions from page 20.]

Page 24

Harvesting data from GenBank using PDA: a pipeline to study polymorphism

Casillas & Barbadilla (2004) Nucl. Acids Res. 32:W166-W169

[Figure: flowchart of the PDA pipeline. Step numbers in the diagram: 1a, 1b, 2 to 9. Labels recovered from the diagram, grouped approximately:]

• Get sequences & annotations: input from sequences from GenBank, corresponding to the Drosophila genus

• Annotation categories: gene, CDS, exon, intron, 5’UTR, 3’UTR, promoter

• Group by species & gene; sequences organized in categories; minimum of 2 sequences per category

• Muscle; MSA parameters; alignments with scores; alignment validation; align quality values; low quality sequences excluded

• Sequences subgroups; alignments subgroups

• Read gene annotations; extract gene regions; sequences, positions and orientations

• Diversity Analysis Module: polymorphism, syn & non-syn polymorphisms, linkage disequilibrium, codon bias

• Output: web-based output, alignments, Jalview, diversity analysis

• MySQL database; seq. manipulations; external programs

Page 25

Summary of the polymorphism data sets

                Glinka (2003) +    Glinka (2003) +    Orengo (2004)
                Ometto (2005)      Ometto (2005)
                African            European           European
Intronic        167                173                28
Intergenic      90                 93                 80
Total loci      257                266                108
# Alleles       11.7               11.8               12.7
bp block        30,683             33,292             28,721
bp spacer       79,317             87,379             47,590

Page 26

Conserved blocks - UCSC phastCons track

[Figure: UCSC genome browser view around eve on chr2R (positions 5489500 to 5491500). Tracks: FlyBase Protein-Coding Genes; FlyReg: Drosophila DNase I Footprint Database (footprint labels include Kr, bcd, gt, hb, ttk, prd, eve and Unspecified); 7 Flies, Mosquito and Honeybee Multiz Alignments & phastCons Scores (conservation rows: d_simulans, d_yakuba, d_ananassae, d_pseudoobscura, d_virilis, d_mojavensis, a_gambiae, a_mellifera); PhastCons Conserved Elements, with lod scores from 11 to 465]

Page 27

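Single nucleotide polymorphisms & fixed differences are reduced in conserved blocks

[Figure: bar chart of observed number (y-axis, 0 to 5,000) of polymorphisms and divergent sites in block vs. spacer. Bar values: 437 and 3345 for polymorphism, 386 and 4901 for divergence. Bar annotations: 66%, 80%]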

Page 28

Blocks have an excess of point mutations within species relative to divergence between species

                Block    Spacer
Polymorphism    437      3345
Divergence      386      4901

Chi-square: p < 5 × 10^-12

[Figure: bar chart of the polymorphism : divergence ratio (y-axis, 0 to 1.500) for block and spacer]
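The comparison on this slide can be sketched as a Pearson chi-square test on the 2x2 table of counts (437 and 3345 polymorphisms, 386 and 4901 fixed differences, in blocks and spacers respectively). This is a minimal illustration, not necessarily the exact procedure used in the talk: the function name is hypothetical, and it assumes one degree of freedom with no continuity correction.

```python
import math


def mk_style_chi_square(poly_block, poly_spacer, div_block, div_spacer):
    """Pearson chi-square for a 2x2 table of polymorphism vs. divergence.

    Assumes 1 degree of freedom and no continuity correction.
    Returns (chi2, p_value).
    """
    row_poly = poly_block + poly_spacer
    row_div = div_block + div_spacer
    col_block = poly_block + div_block
    col_spacer = poly_spacer + div_spacer
    total = row_poly + row_div

    # Shortcut formula for a 2x2 table: N * (ad - bc)^2 / (row and column totals).
    cross = poly_block * div_spacer - poly_spacer * div_block
    chi2 = total * cross ** 2 / (row_poly * row_div * col_block * col_spacer)

    # With 1 d.f., the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = mk_style_chi_square(437, 3345, 386, 4901)
print(f"block  poly:div ratio = {437 / 386:.3f}")
print(f"spacer poly:div ratio = {3345 / 4901:.3f}")
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

With the counts from the slide, the statistic comes out at about 48 and the p-value falls below 5 × 10^-12, in line with the reported bound. The polymorphism : divergence ratio is higher in blocks than in spacers, which is the direction predicted under constraint.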

Page 29

Blocks have an excess of rare SNPs relative to non-conserved spacers

[Figure: histogram of frequency (y-axis, 0 to 0.65) by derived allele frequency, DAF (x-axis, bins labeled 0.1 to 1.0), for block and spacer]

KS test: p < 6 × 10^-11

Chi-square test: p < 1 × 10^-11
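The frequency-spectrum comparison can be illustrated with a two-sample Kolmogorov-Smirnov statistic computed on derived allele frequencies. The sketch below is only an illustration under stated assumptions: the SNP counts are made up (they are not the talk's data), the function names are hypothetical, and only the statistic D is computed, not its p-value.

```python
from bisect import bisect_right


def derived_allele_frequency(derived_count, sample_size):
    """Fraction of sampled alleles that carry the derived state."""
    return derived_count / sample_size


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical (derived allele count, alleles sampled) pairs, for illustration only.
block_snps = [(1, 12), (1, 12), (2, 12), (1, 11), (3, 12)]
spacer_snps = [(1, 12), (4, 12), (6, 12), (9, 12), (11, 12)]

block_daf = [derived_allele_frequency(c, n) for c, n in block_snps]
spacer_daf = [derived_allele_frequency(c, n) for c, n in spacer_snps]

d = ks_statistic(block_daf, spacer_daf)
print(f"KS statistic D = {d:.2f}")
```

A large D with the block distribution shifted toward low frequencies is the pattern expected if mutations in blocks are held at low frequency by selection.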

Page 30

Outline of Talk

• Noncoding DNA and Drosophila as a system

• Conserved noncoding sequences are selectively constrained.

• A framework for predicting enhancers and transcription factor motifs.

• Spatial constraints on conserved noncoding sequences

Page 31

Conserved noncoding sequences are clustered in complex cis-regulatory regions

[Figure: comparison of the dpp 3’ cis-regulatory region between D. melanogaster and D. virilis; axis ranges 37350 to 43350 and 1020 to 8020; scale bar 1 Kb. Annotations: Muller & Basler (2000); Hepker et al. (1999)]

Page 32

Page 33

chi2 = 2040, d.f. = 30, p < 10^-6

Conserved noncoding sequences are clustered in Drosophila

Bergman et al. (2002) Genome Biology 3:0086.

Page 34

A molecular interpretation of conservation in complex cis-regulatory regions

[Figure: schematic of two enhancers (Enhancer 1, Enhancer 2). Legend: conserved noncoding sequence; spacer intervals; transcription factors]

Page 35

A cluster of conserved noncoding sequences in the apterous region predicts a brain-specific enhancer

Bergman et al. (2002) Genome Biology 3:0086.

[Figure: conservation plots across the apterous region in four panels (10k-19k, 20k-29k, 30k-40k, 41k-50k), each with pairwise comparisons mel-ere, mel-pse, mel-vir and mel-ano and a 75% mark on every row. Annotations: CNS cluster 1, HB, CNS cluster 2, CNS cluster 3, ap-RA, vlc-RA, brain enhancer, muscle enhancer. Legend: coding exon; conserved noncoding sequence; conserved regulatory sequence]

Page 36

Clusters of conserved noncoding sequences contain over-represented binding site motifs

MEME output:

Sequence name   Strand   Start   P-value    Site
38              +        9       1.03e-05   CATTCATA TTTTTATGAG GCTGTTCCTT
4               +        15      1.03e-05   TTTGTTGCTC TTTTTATGAG TTTTTTCCAT
3               +        15      1.03e-05   TTTGTTGCTC TTTTTATGAG TTTTTTCCAT
14              +        10      1.41e-05   GGACGCGCC TTTTTATTGG TGCACCTTCG
13              +        10      1.41e-05   GGACGCGCC TTTTTATTGG TGCACCTTCG
.
.

hunchback recognition motif, Stanojevic et al. (1989): TTTTTRNG
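To see how an inferred site relates to the published hunchback consensus, the IUPAC string TTTTTRNG can be expanded into a regular expression and scanned against the sequences from the MEME rows. This is a minimal sketch: the function name is hypothetical, the input strings are the flank, site and flank from the rows for sequences 38 and 14 joined together, and only the forward strand is searched.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(sequence, motif):
    """Return 0-based start positions of exact consensus matches (forward strand)."""
    pattern = "".join(IUPAC[code] for code in motif.upper())
    # A lookahead is used so that overlapping matches are all reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


HUNCHBACK = "TTTTTRNG"  # consensus from Stanojevic et al. (1989), as quoted on the slide

# Flank + site + flank, joined from the MEME rows for sequences 38 and 14.
seq_38 = "CATTCATA" + "TTTTTATGAG" + "GCTGTTCCTT"
seq_14 = "GGACGCGCC" + "TTTTTATTGG" + "TGCACCTTCG"

print(find_motif(seq_38, HUNCHBACK))  # [8]
print(find_motif(seq_14, HUNCHBACK))  # []
```

The TTTTTATGAG site matches the consensus exactly, at 0-based position 8, which corresponds to the start of 9 that MEME reports. The TTTTTATTGG site differs from the consensus at its last position, so an exact-match scan misses it. That is one reason to score sites with a position weight matrix, as MEME does, rather than with a strict consensus.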

Page 37

Enhancer prediction by clustering: conserved noncoding sequences vs. binding site prediction

Berman et al. (2002) Proc. Nat. Acad. Sci. 99:757

[Figure: two panels, each labeled Enhancer and Motif]

Page 38

Matching inferred motifs to functions & factors

Down, Bergman, Su & Hubbard (2007) PLoS Comp Biol 3:e7.

[Figure labels: pannier, serpent]

Page 39

Summary

• Drosophila presents an excellent model system for understanding the function of noncoding DNA

• Conserved noncoding sequences are under selective constraints & are not mutational cold spots

• Conserved noncoding sequences are clustered into higher order units in complex cis-regulatory regions

• Combining conservation and over-representation can produce high quality cis-regulatory predictions in Drosophila

Page 40

Acknowledgements

Marty Kreitman

David Huen, Michael Ashburner

Thomas Down - Wellcome Trust Sanger Institute

Sue Celniker, Gerry Rubin, Eddy Rubin

Nora Pierstorff - Cologne

Sonia Casillas - Barcelona