Assessing the impact of transposable element variation on mouse phenotypes and traits


Page 1: Assessing the impact of transposable element variation on mouse phenotypes and traits

Thomas Keane, WTSI 14th May, 2011

Assessing the impact of transposable element variation on mouse phenotypes and traits

Thomas Keane Vertebrate Resequencing Informatics Wellcome Trust Sanger Institute Cambridge, UK

Christoffer Nellåker and Chris Ponting MRC Functional Genomics Unit University of Oxford Oxford, UK

Page 2: Assessing the impact of transposable element variation on mouse phenotypes and traits

Transposable Elements (TEs)

Transposons are segments of DNA that can move within the genome
  - A minimal ‘genome’: ability to replicate and change location

Dominate the landscape of mammalian genomes
  - 38-45% of rodent and primate genomes
  - Genome size proportional to number of TEs

Class 1 (RNA intermediate) and Class 2 (DNA intermediate)

Potent genetic mutagens
  - Disrupt expression of genes
  - Genome reorganisation and evolution
  - Transduction of flanking sequence

Transposable elements (TEs) are active amongst laboratory mouse strains

Mouse Genomes Project: whole-genome sequencing of 17 key laboratory mouse strains
  - 13 classical laboratory strains and 4 wild-derived inbred strains
  - Average of ~25x Illumina sequencing per strain

Page 3: Assessing the impact of transposable element variation on mouse phenotypes and traits

Agouti Mouse Model

Dolinoy PNAS 2007;104:13056–13061

Page 4: Assessing the impact of transposable element variation on mouse phenotypes and traits

Mouse TEs

3 main classes of TEs in the mouse genome
  - Long interspersed nuclear elements (LINE)
  - Short interspersed nuclear elements (SINE)
  - Endogenous retrovirus superfamily (ERV)
      - ETn, IAP, MuLV, IS2, MaLR, VL30, RLTR

Key questions
  - What is the true extent and distribution of TEs in the germline of laboratory mouse strains?
  - What can we learn about the selective pressure acting on TEs maintained in the germline?
  - How much phenotypic variation, and which complex traits, can we associate with TEs?

Page 5: Assessing the impact of transposable element variation on mouse phenotypes and traits

TE Calling

Terminology
  - B6+: Present in the reference genome
  - B6-: Not present in the reference genome
  - TEV: Transposable element variant

Computational calling methods
  - B6+
      - SVMerge* pipeline: integrates calls from several read-pair based SV ‘deletion’ (!) callers (Kim Wong, WTSI)
  - B6-
      - RetroSeq** pipeline developed
          - Identifies discordant mate pairs and compares them to a library of known TE sequences
      - Size estimation
          - Full-length element (~5-8kb) vs. solo LTR (<1kb)
          - 30-40x physical coverage from long-fragment (~3kb) end reads (15 strains)
          - Test whether the insertion point is spanned by 3kb fragment read pairs
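The B6- idea above (find discordant mate pairs, then compare the mates to a library of known TE sequences) can be sketched roughly as follows. This is an illustration only and not RetroSeq's actual implementation: the `ReadPair` fields, the k-mer comparison, the thresholds and the toy sequences are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """One paired-end read (hypothetical fields, for illustration only)."""
    chrom: str                  # chromosome of the anchoring read
    pos: int                    # mapped position of the anchoring read
    mate_chrom: Optional[str]   # None if the mate is unmapped
    mate_pos: Optional[int]
    mate_seq: str               # sequence of the mate


def is_discordant(pair, max_insert=600):
    """Discordant: mate unmapped, on another chromosome, or too far away."""
    if pair.mate_chrom is None or pair.mate_pos is None:
        return True
    if pair.mate_chrom != pair.chrom:
        return True
    return abs(pair.mate_pos - pair.pos) > max_insert


def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_te_match(mate_seq, te_library, k=8, min_shared=3):
    """Name of the library entry sharing the most k-mers with the mate, or None."""
    mate_kmers = kmers(mate_seq, k)
    best_name, best_shared = None, 0
    for name, te_seq in te_library.items():
        shared = len(mate_kmers & kmers(te_seq, k))
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name if best_shared >= min_shared else None


def candidate_anchors(pairs, te_library, max_insert=600):
    """(chrom, pos, family) for discordant pairs whose mate resembles a TE."""
    hits = []
    for pair in pairs:
        if not is_discordant(pair, max_insert):
            continue
        family = best_te_match(pair.mate_seq, te_library)
        if family is not None:
            hits.append((pair.chrom, pair.pos, family))
    return hits


# Toy library and reads; the sequences are made up and are not real TE consensus.
te_library = {
    "IAP_toy": "ACGTTGCAAGGCTTACCGATAGGCTA",
    "SINE_toy": "TTTTGGGGCCCCAAAATTGGCCAAGG",
}
pairs = [
    ReadPair("chr1", 1000, None, None, "TGCAAGGCTTACCG"),      # mate unmapped, TE-like
    ReadPair("chr1", 2000, "chr1", 2300, "TGCAAGGCTTACCG"),    # concordant pair
    ReadPair("chr2", 500, "chr7", 900, "GATTACAGATTACA"),      # discordant, not TE-like
]
hits = candidate_anchors(pairs, te_library)
```

A real caller would read alignments from BAM files and use a proper aligner for the library comparison; clustering the anchors into breakpoints is left out here.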

*Wong K, Keane TM, Stalker J, Adams DJ (2010) SVMerge: Enhanced structural variant and breakpoint detection by integration of multiple detection methods and local assembly, Genome Biology, 11:R128

**RetroSeq available from https://github.com/tk2/RetroSeq

Page 6: Assessing the impact of transposable element variation on mouse phenotypes and traits

B6+ TEV Example

C57BL/6NJ strain has the ERV; it is absent in the DBA/2J strain
  - Read pairs spanning the element's flanks denote absence

[Figure: read alignments at the ERV locus in DBA/2J and C57BL/6NJ]
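The "spanning read pairs denote absence" signal can be sketched as a simple test: if a strain lacks an element that is present in the reference, pairs that flank the element map with an apparent span inflated by roughly the element length. The following is a minimal illustration under that assumption and is not the SVMerge pipeline; the function names, library insert size, tolerance and coordinates are hypothetical.

```python
def supports_absence(left_start, right_end, te_start, te_end,
                     library_insert=400, tolerance=150):
    """True if a pair flanks the reference TE and its mapped span is close to
    (library insert size + TE length), as expected when the TE is absent."""
    flanks = left_start < te_start and right_end > te_end
    apparent_span = right_end - left_start
    expected_span = library_insert + (te_end - te_start)
    return flanks and abs(apparent_span - expected_span) <= tolerance


def call_reference_te(pairs, te_start, te_end, min_support=2):
    """Call a B6+ element absent when enough flanking pairs support it."""
    support = sum(supports_absence(left, right, te_start, te_end)
                  for left, right in pairs)
    return "absent" if support >= min_support else "no evidence of absence"


# Hypothetical 6 kb reference element at 10,000-16,000 and toy read pairs,
# each given as (leftmost mapped base, rightmost mapped base).
te_start, te_end = 10_000, 16_000
dba_like = [(9_850, 16_260), (9_900, 16_350), (9_600, 9_990)]
b6_like = [(9_700, 10_100), (15_900, 16_300), (12_000, 12_400)]

dba_call = call_reference_te(dba_like, te_start, te_end)
b6_call = call_reference_te(b6_like, te_start, te_end)
```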

Page 7: Assessing the impact of transposable element variation on mouse phenotypes and traits

B6- TEV Example

NOD/ShiLtJ
  - Full-length (~8kb) IAP insertion
  - Not spanned by 3kb fragment reads

[Figure: 3kb fragment read alignments, zoomed into the breakpoint]
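The size test used in this example can be sketched as follows: a ~3kb fragment can only bridge the insertion point if the inserted element is shorter than the fragment, so the presence or absence of spanning long-fragment pairs gives a coarse size status. This is a simplified illustration, not the project's code; the margin, the support threshold and the coordinates are assumptions.

```python
def count_spanning(fragments, breakpoint, margin=50):
    """Count long-fragment pairs with one end mapped on each side of the breakpoint.

    Each fragment is (end of left read, start of right read) in reference
    coordinates; margin keeps reads that merely touch the breakpoint out.
    """
    return sum(1 for left_end, right_start in fragments
               if left_end <= breakpoint - margin
               and right_start >= breakpoint + margin)


def size_status(fragments, breakpoint, min_spanning=2):
    """'<3kb' if the insertion point is spanned (e.g. a solo LTR candidate),
    '>=3kb' otherwise (consistent with a longer, possibly full-length element)."""
    if count_spanning(fragments, breakpoint) >= min_spanning:
        return "<3kb"
    return ">=3kb"


# Hypothetical breakpoint and fragment ends.
breakpoint = 50_000
short_insertion = [(49_000, 51_500), (49_500, 51_900), (48_000, 49_800)]
long_insertion = [(48_000, 49_800), (50_200, 52_900)]

short_status = size_status(short_insertion, breakpoint)
long_status = size_status(long_insertion, breakpoint)
```

A lack of spanning pairs is only informative where long-fragment coverage is adequate, which is why the physical coverage of the 3kb libraries matters.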

Page 8: Assessing the impact of transposable element variation on mouse phenotypes and traits

TEV Catalog

103,798 TEVs detected
  - 28,951 SINEs
  - 40,074 LINEs
  - 34,773 ERVs

Evolutionary context
  - MP consensus tree based on strain distribution patterns of TEs
  - B6+: 44,401 insertions within the C57BL/6J lineage
  - B6-: 59,397 TEV insertions outside of the C57BL/6J lineage

TEs more frequent in wild strains
  - 13.8-22.4 vs. 4.2-6.3 per Mb

Notable expansion/contraction of certain classes
  - ERVs expanding relative to the other classes
  - IAPs active amongst ERVs
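As a quick consistency check, the per-class counts and the B6+/B6- counts on this slide both add up to the stated total. The short sketch below uses only the numbers given above; the variable names are arbitrary.

```python
# Counts as stated on the slide.
tev_by_class = {"SINE": 28_951, "LINE": 40_074, "ERV": 34_773}
tev_by_lineage = {"B6+": 44_401, "B6-": 59_397}
total_tevs = 103_798

# Both breakdowns should partition the same catalog.
class_total = sum(tev_by_class.values())
lineage_total = sum(tev_by_lineage.values())

# Share of each class and lineage category, as a percentage of all TEVs.
class_share = {name: round(100 * n / total_tevs, 1)
               for name, n in tev_by_class.items()}
lineage_share = {name: round(100 * n / total_tevs, 1)
                 for name, n in tev_by_lineage.items()}

for name, share in {**class_share, **lineage_share}.items():
    print(f"{name}: {share}%")
```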

[Figure: consensus tree of the strains, with TEV counts per branch and the percentage (0-100%) of SINE, LINE and ERV insertions on each branch]

Page 9: Assessing the impact of transposable element variation on mouse phenotypes and traits

Callset Validation

B6+
  - Manually annotated all of Chr19 across 8 strains (Flint group, Oxford)
  - PCR validation of 250 randomly selected calls across 8 strains

B6-
  - PCR validation of 109 calls across 8 strains (Binnaz Yalcin, Oxford)
  - Initially, the SINE false positive rate was found to be high
      - Further filtering of low-complexity sequence, microsatellites and simple repeats was required
      - Reduced the false positive rate from ~30% to 9%
  - False negative rate determined by examining the SDP (strain distribution pattern) from PCR data
  - Size status assignment accurate
      - >95% of SINEs assigned <3kb status
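One way to picture the extra filtering step is a check for short tandem repeats around a call. The slides do not say how low-complexity sequence, microsatellites and simple repeats were actually filtered, so the periodicity score, the 1-6 bp motif range and the 0.8 threshold below are assumptions chosen purely for illustration.

```python
def max_periodicity(seq, max_period=6):
    """Highest fraction of positions where seq[i] == seq[i + p], over p = 1..max_period.

    A value near 1.0 means the sequence is almost a perfect tandem repeat
    of some motif whose length divides p.
    """
    best = 0.0
    for period in range(1, max_period + 1):
        comparisons = len(seq) - period
        if comparisons <= 0:
            break
        matches = sum(seq[i] == seq[i + period] for i in range(comparisons))
        best = max(best, matches / comparisons)
    return best


def is_low_complexity(seq, threshold=0.8):
    """Flag homopolymers, microsatellites and other short tandem repeats."""
    return max_periodicity(seq.upper()) >= threshold


# Toy sequences (made up): a CA microsatellite, a poly-A run, and mixed sequence.
microsatellite = "CACACACACACACACACACA"
poly_a = "AAAAAAAAAAAAAAAA"
mixed = "ACGTTGCAAGGCTTACCGATAGGCTA"

flags = [is_low_complexity(s) for s in (microsatellite, poly_a, mixed)]
```

Calls whose flanking sequence is flagged this way would be candidates for removal before PCR validation; established repeat annotations would be the more usual source in practice.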

Page 10: Assessing the impact of transposable element variation on mouse phenotypes and traits

Structure of ERV Families

[Figure: panels by ERV family (MuLV, VL30, IS2, ETn, RLTR1B, RLTR45, IAP, RLTR10, MaLR), with axes from 0 to 100]

IAP Type I, 7.3 kb (full length)
  - 5’ LTR (~430 nt)
  - gag-pol genes (usually defective)
  - 3’ LTR

Solo LTR element
  - Formed by recombination of the flanking LTRs

Page 11: Assessing the impact of transposable element variation on mouse phenotypes and traits

Genomic Sequence Context

[Figure: cumulative percentage of TEVs (%) by GC bin (%), from (0-32) to (54-100), for LINE, SINE, ERV and MuLV TEVs alongside genomic LINE, SINE and ERV elements]

[Figure: ERV, LINE and SINE across Exon, Intron, Flank and Intergenic regions, with fold-change (<0.25 to >1.75) and P-value scales]

[Figure: LINE, ERV and SINE genomic and TEV densities against distance (bp), from 144 to 46,368]

Page 12: Assessing the impact of transposable element variation on mouse phenotypes and traits

5’ and 3’ Relative Densities

[Figure: LINE, ERV and SINE genomic and TEV densities against distance (bp), from 144 to 46,368, shown 5’ and 3’ and split into Sense and Anti-sense orientation]

Page 13: Assessing the impact of transposable element variation on mouse phenotypes and traits

Density and Orientation within Genes

Distinct anti-sense bias observed in all types
  - Significantly different bias in first introns between ERVs and SINEs

Orientation bias remains constant despite divergence of element
  - Biphasic selection process

Assuming no sense/anti-sense insertion bias
  - Implies that half of sense-orientated ERVs and one third of sense-orientated SINE/LINEs are deleterious
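The reasoning behind the last point can be written as a small calculation. If insertions land in either orientation with equal probability, and if anti-sense insertions are taken as a neutral baseline (an additional assumption made here for the sketch), then the shortfall of sense insertions relative to anti-sense ones estimates the fraction of sense insertions removed by selection. The counts below are hypothetical and chosen only to reproduce the stated fractions; they are not data from the study.

```python
def implied_deleterious_fraction(sense_count, antisense_count):
    """Fraction of sense insertions lost, assuming unbiased insertion and
    a neutral anti-sense class: 1 - sense / antisense (floored at zero)."""
    if antisense_count <= 0:
        raise ValueError("antisense_count must be positive")
    return max(0.0, 1.0 - sense_count / antisense_count)


# Hypothetical counts giving sense:anti-sense ratios of 1:2 and 2:3.
erv_fraction = implied_deleterious_fraction(sense_count=500, antisense_count=1000)
sine_line_fraction = implied_deleterious_fraction(sense_count=200, antisense_count=300)

print(f"ERV-like example: {erv_fraction:.2f} of sense insertions missing")
print(f"SINE/LINE-like example: {sine_line_fraction:.2f} of sense insertions missing")
```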

[Figure: orientation percentages for SINE, LINE and ERV by element divergence (%), in bins from 0-1 to 14-15, with the expected value marked; second panel grouped as First, Middle and Last]

Page 14: Assessing the impact of transposable element variation on mouse phenotypes and traits

QTLs associated with TEs

Table 3: QTLs associated with SVs

| Phenotype | Chr | SV start | SV stop | Ancestral event | Gene | SV overlap | LogP |
|---|---|---|---|---|---|---|---|
| Mean platelet volume | 1 | 175158884 | 175158885 | insertion | Fcer1a | upstream | 52.833 |
| OFT Total activity | 2 | 144402772 | 144402974 | SINE insertion | Sec23b | intron | 15.721 |
| Hippocampus cellular proliferation marker | 4 | 49690364 | 49690365 | SINE insertion | Grin3a | intron | 20.119 |
| Home cage activity | 4 | 108951264 | 108951265 | ERV insertion | Eps15 | upstream | 15.922 |
| T-cells: %CD3 | 4 | 130038389 | 130038390 | SINE insertion | Snrnp40 | intron | 12.129 |
| Wound healing | 7 | 90731819 | 90731820 | ERV insertion | Tmc3 | upstream | 22.216 |
| Red cells: mean cellular haemoglobin | 7 | 111398000 | 111480000 | insertion | Trim5 | exon | 13.016 |
| Red cells: mean cellular haemoglobin | 7 | 111504957 | 111505193 | deletion | Trim30b | UTR | 12.806 |
| Red cells: mean cellular volume | 8 | 87957244 | 87957245 | LINE insertion | 4921524J17Rik | upstream | 18.141 |
| Serum urea concentration | 11 | 115106122 | 115106250 | deletion | Tmem104 | UTR | 13.404 |
| Hippocampus cellular proliferation marker | 13 | 113783196 | 113783359 | deletion | Gm6320 | upstream | 17.456 |
| T-cells: CD4/CD8 ratio | 17 | 34483680 | 34483681 | deletion | H2-Ea | upstream | 82.858 |

Start and stop coordinates are given for build37 of the mouse genome, so that insertions into the reference are given as consecutive base pairs (columns headed SV start and SV stop). The part of the gene overlapped is reported in the column headed SV overlap. LogP is the negative logarithm of the P-value for association between the SV and the phenotype as assessed in outbred HS mice [22].

Yalcin et al, under review
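To read the LogP column of Table 3 as P-values, the negative logarithm can be inverted. The table note does not state the base of the logarithm; base 10 is assumed in the sketch below, and only a few rows of the table are used.

```python
def logp_to_p(logp):
    """Convert a negative log10 P-value back to a P-value (base 10 assumed)."""
    return 10.0 ** (-logp)


# LogP values for a few genes, copied from Table 3.
logp_by_gene = {
    "H2-Ea": 82.858,
    "Fcer1a": 52.833,
    "Tmc3": 22.216,
    "Eps15": 15.922,
}

p_by_gene = {gene: logp_to_p(logp) for gene, logp in logp_by_gene.items()}

# Strongest association first (smallest P-value).
ranked = sorted(p_by_gene, key=p_by_gene.get)
for gene in ranked:
    print(f"{gene}: P = {p_by_gene[gene]:.3e}")
```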

SINE deletion

Page 15: Assessing the impact of transposable element variation on mouse phenotypes and traits

Eps15 IAP Candidate

[Figure: panels for Whole Arena Total Distance (axis 0-16000) and Number Of Entries To Centre (axis 0-250), comparing Eps15 genotype groups including Eps15 -/-]

Eps15: epidermal growth factor receptor pathway substrate 15

Page 16: Assessing the impact of transposable element variation on mouse phenotypes and traits

Conclusions

Unprecedented catalog (>100k) of mouse TEV elements identified

False positive and negative rates are low

Wild derived strains contain significantly more TEs

Evolutionary context shows expansion of ERVs in mouse lineage

Distinct anti-sense bias for all elements within genes

Estimate that half of sense orientated ERVs and one third of SINE/LINEs are deleterious

Page 17: Assessing the impact of transposable element variation on mouse phenotypes and traits

Acknowledgements

Mouse TE Project
  - Christoffer Nellåker (Oxford)
  - Wayne Frankel (Jax)
  - Chris Ponting (Oxford)

Mouse Genomes Project
  - Sanger
      - Petr Danecek
      - Kim Wong
      - David Adams
      - Richard Durbin
      - Sanger Sequencing Teams
  - EBI
      - Ewan Birney
  - Wellcome Trust Centre Oxford
      - Jonathan Flint et al.
      - Binnaz Yalcin
      - Avigail Agam
      - Richard Mott
  - Jackson Lab
      - Laura Reinholdt
      - Leah Rae Donahue

Further Information
  - http://www.sanger.ac.uk/mousegenomes