Thomas Keane, WTSI 14th May, 2011
Assessing the impact of transposable element variation on mouse phenotypes and traits
Thomas Keane, Vertebrate Resequencing Informatics, Wellcome Trust Sanger Institute, Cambridge, UK
Christoffer Nellåker and Chris Ponting, MRC Functional Genomics Unit, University of Oxford, Oxford, UK
Transposable Elements (TEs)
Transposons are segments of DNA that can move within the genome
A minimal ‘genome’: the ability to replicate and change location
Dominate the landscape of mammalian genomes: 38-45% of rodent and primate genomes
Genome size is proportional to the number of TEs
Class 1 (RNA intermediate) and Class 2 (DNA intermediate)
Potent genetic mutagens: disrupt gene expression, drive genome reorganisation and evolution, and transduce flanking sequence
Transposable elements are active amongst laboratory mouse strains
Mouse Genomes Project: whole-genome sequencing of 17 key laboratory mouse strains (13 classical laboratory strains and 4 wild-derived inbred strains)
Average of ~25x Illumina sequencing per strain
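As a rough guide to what “~25x” means, mean sequencing depth follows the Lander-Waterman relation c = N·L/G (number of reads × read length / genome size). The sketch below assumes, purely for illustration, a ~2.7 Gb mouse genome and 100 bp reads; neither figure comes from the talk.

```python
import math

def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Lander-Waterman mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp

def reads_needed(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Number of reads required to reach a target mean depth, rounded up."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)

# Illustrative assumptions (not from the talk): ~2.7 Gb genome, 100 bp reads.
GENOME_BP = 2_700_000_000
READ_BP = 100

n = reads_needed(25, READ_BP, GENOME_BP)
print(n)                                     # 675000000
print(mean_coverage(n, READ_BP, GENOME_BP))  # 25.0
```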
Agouti Mouse Model
Dolinoy PNAS 2007;104:13056–13061
Mouse TEs
Three main classes of TEs in the mouse genome:
Long interspersed nuclear elements (LINEs)
Short interspersed nuclear elements (SINEs)
Endogenous retrovirus superfamily (ERVs): ETn, IAP, MuLV, IS2, MaLR, VL30, RLTR
Key questions
What is the true extent and distribution of TEs in the germline of laboratory mouse strains?
What can we learn about the selective pressure acting on TEs maintained in the germline?
How much phenotypic variation, and how many complex traits, can we associate with TEs?
TE Calling
Terminology
B6+: present in the reference genome (C57BL/6J)
B6-: not present in the reference
TEV: transposable element variant
Computational calling methods
B6+
SVMerge* pipeline: integrates calls from several read-pair-based SV ‘deletion’ callers (Kim Wong, WTSI); a B6+ element that is absent from a strain shows up as a deletion relative to the reference
B6-
RetroSeq** pipeline developed: identifies discordant mate pairs and compares the mates to a library of known TE sequences
Size estimation: full-length element (~5-8kb) vs. solo LTR (<1kb)
30-40x physical coverage from long-fragment (~3kb) end reads (15 strains)
Test whether the insertion point is spanned by 3kb fragment read pairs; only insertions much shorter than 3kb can be spanned
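The spanning test can be sketched as follows. A concordant pair from the ~3kb long-fragment library can only bridge an insertion point if the inserted sequence is much shorter than the fragment, so spanning pairs suggest a short insertion (for example a solo LTR), while their absence at 30-40x physical coverage suggests a full-length element. The pair representation, the 100 bp minimum overhang and the three-pair threshold are hypothetical choices, not the pipeline's actual parameters.

```python
from typing import Iterable, Tuple

# A concordant long-fragment pair, as its outermost mapped reference coordinates.
FragmentPair = Tuple[int, int]

def count_spanning(breakpoint: int, pairs: Iterable[FragmentPair],
                   min_overhang: int = 100) -> int:
    """Count pairs whose mapped span covers the breakpoint, anchored on both sides."""
    return sum(1 for start, end in pairs
               if start + min_overhang <= breakpoint <= end - min_overhang)

def size_status(breakpoint: int, pairs: Iterable[FragmentPair],
                min_spanning: int = 3) -> str:
    """'<3kb' if enough 3kb-fragment pairs span the insertion point, else '>3kb'."""
    return "<3kb" if count_spanning(breakpoint, pairs) >= min_spanning else ">3kb"

# Toy example: three pairs bridge position 10,000, one lies entirely to its left.
pairs = [(8_000, 10_900), (9_000, 11_800), (9_500, 12_300), (5_000, 7_900)]
print(size_status(10_000, pairs))      # <3kb
print(size_status(10_000, pairs[3:]))  # >3kb
```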
*Wong K, Keane TM, Stalker J, Adams DJ (2010) SVMerge: Enhanced structural variant and breakpoint detection by integration of multiple detection methods and local assembly, Genome Biology, 11:R128
**RetroSeq available from https://github.com/tk2/RetroSeq
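The discordant-mate idea behind the B6- calls can be illustrated with a toy sketch. It is not RetroSeq's code (that is available at the GitHub URL above): here a simple shared k-mer count stands in for aligning mates to the TE library, and the record layout, thresholds and sequences are all hypothetical. Pairs whose anchor maps uniquely to the reference but which are not concordant are assigned to a TE family via the other mate, and nearby anchors supporting the same family are clustered into candidate insertions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

K = 8  # k-mer size for the toy TE matcher (hypothetical)

@dataclass
class ReadPair:
    chrom: str
    anchor_pos: int   # position of the mate that maps uniquely to the reference
    anchor_mapq: int  # mapping quality of that mate
    proper: bool      # True if the pair aligned concordantly
    mate_seq: str     # sequence of the other mate

def kmers(seq: str, k: int = K) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_family(seq: str, te_kmers: Dict[str, Set[str]],
                min_shared: int = 3) -> Optional[str]:
    """TE family sharing the most k-mers with the mate (a stand-in for alignment)."""
    read_k = kmers(seq)
    scores = {fam: len(read_k & ks) for fam, ks in te_kmers.items()}
    fam = max(scores, key=scores.get, default=None)
    return fam if fam is not None and scores[fam] >= min_shared else None

def call_insertions(pairs: List[ReadPair], te_library: Dict[str, str],
                    min_mapq: int = 20, window: int = 500,
                    min_support: int = 2) -> List[Tuple[str, str, int, int, int]]:
    """Cluster discordant anchors whose mates hit the same TE family."""
    te_kmers = {fam: kmers(s) for fam, s in te_library.items()}
    anchors = defaultdict(list)
    for p in pairs:
        if p.proper or p.anchor_mapq < min_mapq:
            continue  # keep only discordant pairs with a confidently placed anchor
        fam = best_family(p.mate_seq, te_kmers)
        if fam is not None:
            anchors[(p.chrom, fam)].append(p.anchor_pos)

    calls = []
    for (chrom, fam), positions in sorted(anchors.items()):
        positions.sort()
        cluster = [positions[0]]
        for pos in positions[1:] + [None]:  # None flushes the final cluster
            if pos is not None and pos - cluster[-1] <= window:
                cluster.append(pos)
                continue
            if len(cluster) >= min_support:
                calls.append((chrom, fam, cluster[0], cluster[-1], len(cluster)))
            if pos is not None:
                cluster = [pos]
    return calls

# Hypothetical toy library and reads
LIBRARY = {"IAP": "GATTACAGATTACCCGGGTTTAAACCC",
           "B1": "TTTTGGGGCCCCAAAATTTTGGGGCCCC"}
mate = "GATTACAGATTACCCGG"
pairs = [
    ReadPair("chr1", 1_000, 60, False, mate),
    ReadPair("chr1", 1_100, 60, False, mate),
    ReadPair("chr1", 1_250, 60, False, mate),
    ReadPair("chr1", 1_050, 60, True, mate),   # concordant: ignored
    ReadPair("chr1", 1_080, 5, False, mate),   # low mapping quality: ignored
    ReadPair("chr1", 5_000, 60, False, mate),  # lone anchor: below support
]
print(call_insertions(pairs, LIBRARY))  # [('chr1', 'IAP', 1000, 1250, 3)]
```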
B6+ TEV Example
C57BL/6NJ carries the ERV; it is absent in DBA/2J. Read pairs spanning the flanks denote its absence.
[Figure panels: DBA/2J; C57BL/6NJ]
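The absence signal can be sketched as follows: in a strain lacking an element that is present in the reference, read pairs from the two flanks map further apart than the library insert size, by roughly the element length. The element coordinates, the 500 bp mean insert size and the 300 bp tolerance below are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class RefElement:
    start: int  # reference start of the B6+ element
    end: int    # reference end (exclusive)

    @property
    def length(self) -> int:
        return self.end - self.start

def supports_absence(pair_start: int, pair_end: int, element: RefElement,
                     mean_insert: int, tolerance: int) -> bool:
    """True if the pair brackets the element and its excess span is about the element length."""
    brackets = pair_start < element.start and pair_end > element.end
    excess = (pair_end - pair_start) - mean_insert
    return brackets and abs(excess - element.length) <= tolerance

erv = RefElement(5_000, 11_000)  # a hypothetical 6 kb element in the reference
pairs = [(4_700, 11_300), (4_650, 11_250), (4_800, 5_200)]
n = sum(supports_absence(s, e, erv, mean_insert=500, tolerance=300) for s, e in pairs)
print(n)  # 2
```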
B6- TEV Example
NOD/ShiLtJ: full-length (~8kb) IAP insertion, not spanned by 3kb fragment reads
[Figure panels: 3kb fragments; zoomed into breakpoint]
TEV Catalog
103,798 TEVs detected: 28,951 SINEs, 40,074 LINEs, 34,773 ERVs
Evolutionary context: maximum parsimony (MP) consensus tree based on the strain distribution patterns of TEs
B6+: 44,401 insertions within the C57BL/6J lineage
B6-: 59,397 TEV insertions outside of the C57BL/6J lineage
TEs are more frequent in wild-derived strains: 13.8-22.4 vs. 4.2-6.3 per Mb
Notable expansion/contraction of certain classes: ERVs are expanding relative to the other classes, and IAPs are active amongst ERVs
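The slide's counts can be cross-checked with a few lines of arithmetic. The class totals and the B6+/B6- totals both sum to 103,798, and the per-Mb ranges imply that wild-derived strains carry roughly 2-5 times the TE density of the other strains. Reading the second range as the classical strains is an interpretation of the slide.

```python
# Counts as reported on the slide.
by_class = {"SINE": 28_951, "LINE": 40_074, "ERV": 34_773}
by_lineage = {"B6+": 44_401, "B6-": 59_397}

total = sum(by_class.values())
assert total == sum(by_lineage.values()) == 103_798  # both breakdowns agree

for te_class, n in by_class.items():
    print(f"{te_class}: {100 * n / total:.1f}%")  # SINE: 27.9%, LINE: 38.6%, ERV: 33.5%

# Densities per Mb (assumed: first range wild-derived, second classical strains).
wild = (13.8, 22.4)
classical = (4.2, 6.3)
fold_low = wild[0] / classical[1]   # most conservative ratio
fold_high = wild[1] / classical[0]  # most extreme ratio
print(f"{fold_low:.1f}x to {fold_high:.1f}x")  # 2.2x to 5.3x
```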
[Figure: MP consensus tree of the sequenced strains built from TE strain distribution patterns, with per-branch breakdowns by class (ERV, LINE, SINE; 0-100% scale)]
Callset Validation
B6+
Manually annotated all of Chr19 across 8 strains (Flint group, Oxford)
PCR validation of 250 randomly selected calls across 8 strains
B6-
PCR validation of 109 calls across 8 strains (Binnaz Yalcin, Oxford)
The initial SINE false positive rate was high
Further filtering of low-complexity sequence, microsatellites and simple repeats was required, reducing the false positive rate from ~30% to 9%
False negative rate determined by examining the strain distribution patterns (SDPs) from the PCR data
Size status assignment is accurate: >95% of SINEs assigned <3kb status
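Comparing called and PCR-derived strain distribution patterns can be sketched as below. Each validated site maps to the set of strains carrying the TEV; the "false positive rate" here is the share of called strain genotypes not confirmed by PCR (strictly a false discovery rate), and the false negative rate is the share of PCR-confirmed carriers that were missed. These exact definitions, and the toy site and strain names, are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

def sdp_error_rates(called: Dict[str, Set[str]],
                    pcr: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Compare called vs. PCR strain distribution patterns at validated sites.

    Returns (fraction of called genotypes not confirmed by PCR,
             fraction of PCR-confirmed carriers that were not called).
    """
    tp = fp = fn = 0
    for site, carriers in pcr.items():
        predicted = called.get(site, set())
        tp += len(predicted & carriers)  # called and confirmed
        fp += len(predicted - carriers)  # called but not confirmed
        fn += len(carriers - predicted)  # confirmed but not called
    fp_rate = fp / (tp + fp) if tp + fp else 0.0
    fn_rate = fn / (tp + fn) if tp + fn else 0.0
    return fp_rate, fn_rate

# Toy data: site -> set of strains carrying the TEV
called = {"site1": {"A", "B"}, "site2": {"C"}}
pcr = {"site1": {"A"}, "site2": {"C", "D"}}
print(sdp_error_rates(called, pcr))
```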
Structure of ERV Families
[Figure: per-family bar charts (0-100 scale) for the ERV families MuLV, VL30, IS2, ETn, RLTR1B, RLTR45, IAP, RLTR10 and MaLR]
Diagram (IAP Type I, 7.3 kb full length): a 5’ LTR (~430 nt) and a 3’ LTR flank the gag-pol genes (usually defective)
Recombination between the flanking LTRs removes the internal sequence and one LTR, leaving a solo LTR element
Genomic Sequence Context
[Figure: (a) cumulative percentage of TEVs (%) across GC bins (%) from 0-32 to 54-100, for LINE, SINE, ERV and MuLV TEVs and for genomic LINE, SINE and ERV copies; (b) LINE, SINE and ERV TEVs vs. genomic copies by region (exon, intron, flank, intergenic); (c) density on a log scale (0.125-2) against distance (bp, bins from 144 to 46,368), with fold-change (0.25-1.75) and p-value (10^-4 to 10^-8) legends]
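The comparison of TEVs with genomic copies across GC bins can be sketched as below. The bin edges follow the axis labels as far as they can be recovered (0-32, 32-34, ..., 52-54, 54-100); treating each bin as half-open [lo, hi) is an assumption, and the fold change here is simply a TEV share divided by a genomic share, not necessarily the statistic used in the talk.

```python
from bisect import bisect_right
from typing import List, Optional

# Upper edges of the GC bins: (0-32), (32-34), ..., (52-54), (54-100).
GC_EDGES = list(range(32, 56, 2))  # [32, 34, ..., 54] -> 13 bins

def gc_bin(gc_percent: float) -> int:
    """Index of the GC bin; [lo, hi) boundary handling is an assumption."""
    return bisect_right(GC_EDGES, gc_percent)

def cumulative_percentage(counts: List[int]) -> List[float]:
    """Running total of per-bin counts, as a percentage of the overall total."""
    total = sum(counts)
    running, out = 0, []
    for c in counts:
        running += c
        out.append(100 * running / total)
    return out

def fold_change(tev_counts: List[int], genomic_counts: List[int]) -> List[Optional[float]]:
    """Per-bin share of TEVs divided by per-bin share of genomic copies."""
    tev_total, gen_total = sum(tev_counts), sum(genomic_counts)
    return [(t / tev_total) / (g / gen_total) if g else None
            for t, g in zip(tev_counts, genomic_counts)]

# Hypothetical GC% of the sequence surrounding five TEVs
counts = [0] * (len(GC_EDGES) + 1)
for gc in [30.5, 33.0, 41.2, 41.9, 60.0]:
    counts[gc_bin(gc)] += 1
print(cumulative_percentage(counts))
```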
5’ and 3’ Relative Densities
[Figure: cumulative percentage of TEVs by GC bin and by region (exon, intron, flank, intergenic) for LINE, SINE and ERV TEVs vs. genomic copies; relative density (0.125-2, log scale) against distance (bp, bins from 144 to 46,368) on the 5’ and 3’ sides, split into sense and anti-sense orientation]
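The distance axis on these slides appears to use Fibonacci-spaced bins (144, 233, 377, ..., 46,368 bp), which widen with distance. The sketch below rebuilds those edges, bins a distance, and labels an element's orientation relative to a gene. It assumes distances are measured from the gene's 5’ or 3’ end and that strands are encoded as '+'/'-'; the records are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import List

def fibonacci_edges(lo: int = 144, hi: int = 46_368) -> List[int]:
    """Fibonacci numbers in [lo, hi], used here as distance-bin edges (bp)."""
    edges, a, b = [], 1, 2
    while a <= hi:
        if a >= lo:
            edges.append(a)
        a, b = b, a + b
    return edges

EDGES = fibonacci_edges()

def distance_bin(distance_bp: int) -> int:
    """0 = closer than the first edge; i = between EDGES[i-1] and EDGES[i]."""
    return bisect_right(EDGES, distance_bp)

def orientation(te_strand: str, gene_strand: str) -> str:
    """'sense' if the element lies on the same strand as the gene, else 'anti-sense'."""
    return "sense" if te_strand == gene_strand else "anti-sense"

# Hypothetical (distance to gene end in bp, TE strand, gene strand) records
records = [(120, "+", "+"), (300, "-", "+"), (310, "+", "-"), (5_000, "+", "+")]
tally = Counter((distance_bin(d), orientation(t, g)) for d, t, g in records)
print(tally)
```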
Density and Orientation within Genes
Distinct anti-sense bias observed in all element types
Significantly different bias in first introns between ERVs and SINEs
Orientation bias remains constant despite divergence of the element, suggesting a biphasic selection process
Assuming no sense/anti-sense insertion bias, this implies that half of sense-oriented ERVs and one third of sense-oriented SINEs/LINEs are deleterious
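The deleterious-fraction estimate follows from a short argument. If sense and anti-sense insertions arise at equal rates, and (an extra assumption made explicit in this sketch) anti-sense insertions within genes are effectively neutral, then the observed sense:anti-sense ratio r equals the fraction of sense insertions that survive, so the deleterious fraction is d = 1 - r. The estimates above correspond to r = 1/2 for ERVs (d = 1/2, one sense element retained for every two anti-sense) and r = 2/3 for SINEs/LINEs (d = 1/3).

```python
def deleterious_fraction(n_sense: int, n_antisense: int) -> float:
    """Fraction of sense insertions removed by selection, assuming equal
    sense/anti-sense insertion rates and neutral anti-sense insertions."""
    return 1 - n_sense / n_antisense

def expected_sense_share(d: float) -> float:
    """Share of retained elements in sense orientation if a fraction d of
    sense insertions is purged (anti-sense retained at 1, relative units)."""
    retained_sense = 1 - d
    return retained_sense / (retained_sense + 1)

print(deleterious_fraction(1, 2))   # ERV-like ratio: half of sense insertions lost
print(expected_sense_share(0.5))    # ERVs: one third of retained elements in sense
print(expected_sense_share(1 / 3))  # SINEs/LINEs: about 40% in sense
```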
[Figure: ERV, LINE and SINE percentages (0-50% scale) against element divergence (%, 2% bins from 0-1 to 14-15), compared with expected values, and by intron position (first, middle, last)]
QTLs associated with TEs
Table 3: QTLs associated with SVs
Phenotype | Chr | SV start | SV stop | Ancestral event | Gene | SV overlap | LogP
Mean platelet volume | 1 | 175158884 | 175158885 | insertion | Fcer1a | upstream | 52.833
OFT Total activity | 2 | 144402772 | 144402974 | SINE insertion | Sec23b | intron | 15.721
Hippocampus cellular proliferation marker | 4 | 49690364 | 49690365 | SINE insertion | Grin3a | intron | 20.119
Home cage activity | 4 | 108951264 | 108951265 | ERV insertion | Eps15 | upstream | 15.922
T-cells: %CD3 | 4 | 130038389 | 130038390 | SINE insertion | Snrnp40 | intron | 12.129
Wound healing | 7 | 90731819 | 90731820 | ERV insertion | Tmc3 | upstream | 22.216
Red cells: mean cellular haemoglobin | 7 | 111398000 | 111480000 | insertion | Trim5 | exon | 13.016
Red cells: mean cellular haemoglobin | 7 | 111504957 | 111505193 | deletion | Trim30b | UTR | 12.806
Red cells: mean cellular volume | 8 | 87957244 | 87957245 | LINE insertion | 4921524J17Rik | upstream | 18.141
Serum urea concentration | 11 | 115106122 | 115106250 | deletion | Tmem104 | UTR | 13.404
Hippocampus cellular proliferation marker | 13 | 113783196 | 113783359 | deletion | Gm6320 | upstream | 17.456
T-cells: CD4/CD8 ratio | 17 | 34483680 | 34483681 | deletion | H2-Ea | upstream | 82.858
Start and stop coordinates are given for build 37 of the mouse genome; insertions relative to the reference are therefore given as consecutive base pairs (columns SV start and SV stop). The part of the gene overlapped is reported in the SV overlap column. LogP is the negative logarithm of the P-value for association between the SV and the phenotype, as assessed in outbred heterogeneous stock (HS) mice (ref. 22).
Yalcin et al., under review
SINE deletion
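The LogP column can be turned back into P-values and used to rank the associations. The sketch assumes LogP uses base-10 logarithms (the table note says only "negative logarithm"); it also picks out the rows whose event column names a TE class.

```python
# (phenotype, ancestral event, gene, LogP) rows from Table 3
ROWS = [
    ("Mean platelet volume", "insertion", "Fcer1a", 52.833),
    ("OFT Total activity", "SINE insertion", "Sec23b", 15.721),
    ("Hippocampus cellular proliferation marker", "SINE insertion", "Grin3a", 20.119),
    ("Home cage activity", "ERV insertion", "Eps15", 15.922),
    ("T-cells: %CD3", "SINE insertion", "Snrnp40", 12.129),
    ("Wound healing", "ERV insertion", "Tmc3", 22.216),
    ("Red cells: mean cellular haemoglobin", "insertion", "Trim5", 13.016),
    ("Red cells: mean cellular haemoglobin", "deletion", "Trim30b", 12.806),
    ("Red cells: mean cellular volume", "LINE insertion", "4921524J17Rik", 18.141),
    ("Serum urea concentration", "deletion", "Tmem104", 13.404),
    ("Hippocampus cellular proliferation marker", "deletion", "Gm6320", 17.456),
    ("T-cells: CD4/CD8 ratio", "deletion", "H2-Ea", 82.858),
]

def p_value(log_p: float) -> float:
    """Invert LogP, assuming base-10 logarithms: P = 10 ** -LogP."""
    return 10 ** -log_p

# Rows whose event column names a TE class (SINE, LINE or ERV)
te_rows = [r for r in ROWS if r[1].split()[0] in {"SINE", "LINE", "ERV"}]

for phenotype, event, gene, log_p in sorted(ROWS, key=lambda r: r[3], reverse=True)[:3]:
    print(f"{gene:<8} {phenotype:<40} P = {p_value(log_p):.2e}")
```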
Eps15 IAP Candidate
[Figure: two Eps15 panels (y-axes 150-450 and 0-20); Eps15/Eps15 vs. -/- mice compared on Whole Arena Total Distance (0-16,000) and Number Of Entries To Centre (0-250)]
Eps15: epidermal growth factor receptor pathway substrate 15
Conclusions
Unprecedented catalog of >100k mouse TEVs identified
False positive and false negative rates are low
Wild-derived strains contain significantly more TEs
Evolutionary context shows expansion of ERVs in the mouse lineage
Distinct anti-sense bias for all element classes within genes
Estimate that half of sense-oriented ERVs and one third of sense-oriented SINEs/LINEs are deleterious
Acknowledgements
Mouse TE Project: Christoffer Nellåker (Oxford), Wayne Frankel (Jax), Chris Ponting (Oxford)
Mouse Genomes Project, Sanger: Petr Danecek, Kim Wong, David Adams, Richard Durbin, Sanger Sequencing Teams
EBI: Ewan Birney
Wellcome Trust Centre, Oxford: Jonathan Flint et al., Binnaz Yalcin, Avigail Agam, Richard Mott
Jackson Lab: Laura Reinholdt, Leah Rae Donahue
Further information: http://www.sanger.ac.uk/mousegenomes