Diversity and survival strategies of LTR retrotransposons in the Arabidopsis genome

Diversity and survival strategies of LTR retrotransposons in the Arabidopsis genome Brooke Peterson-Burch Voytas Laboratory Iowa State University


Post on 13-Jan-2016



TRANSCRIPT

Page 1: Diversity and survival strategies of LTR retrotransposons in the  Arabidopsis  genome

Diversity and survival strategies of LTR retrotransposons in the Arabidopsis genome

Brooke Peterson-Burch

Voytas Laboratory

Iowa State University

Page 2

Beyond genes

Most DNA in eukaryotes doesn’t code for anything necessary for the survival and replication of the organism.

How did that sequence get there? Why isn’t it eliminated?

Genome sequences can teach us about genome evolution and the part that retroelements play

Page 3

What’s a retroelement?

A type of transposable element

An mRNA copy of the parental element ‘genome’ is reverse transcribed into DNA, and that DNA is inserted into a new location in the host genome

Transposition is replicative

Page 4

Retroelement genomes

[Figure: genome organizations of the retroelement groups. Retroviridae (HIV-1): LTR, gag (MA, CA, NC, p6), pol (PR, RT, RH, IN), env (SU, TM) and the accessory genes vif, vpr, tat, rev, vpu and nef, LTR. Retroposons: gag, EN, RT and RH, ending in a poly(A) tail (AAAn). Pseudoviridae: LTR, MA, CA, NC, PR, IN, RT, RH, LTR. Metaviridae: LTR, MA, CA, NC, PR, RT, RH, IN, LTR. Dirs: gag, RT, RH and a λ recombinase. BEL: gag, PR, RT, RH, IN.]

Page 5

Retro living…

[Figure: retroelement life cycle. The element is transcribed into mRNA, and the mRNA is translated; shown with the HIV-1 and Pseudoviridae genome maps.]

Page 6

Retroelement life cycle

[Figure: retroelement life cycle. Packaging into a particle; shown with the HIV-1 and Pseudoviridae genome maps.]

Only viruses escape the host cell

Page 7

Retroelement life cycle

[Figure: retroelement life cycle. Reverse transcription produces cDNA; shown with the HIV-1 and Pseudoviridae genome maps.]

Page 8

Retroelement life cycle

[Figure: retroelement life cycle. Integration: integrase (IN) inserts the cDNA into the host genome as a new copy of the element; shown with the HIV-1 and Pseudoviridae genome maps.]

Page 9

Retroelements play a major role in the structure and evolution of many genomes

Genome sequences provide a great resource for diversity, distribution, and element identification studies

Page 10

Retroelements and genomes

Genome data-mining can help answer questions about:

• Number of elements
• Types of elements
• Diversity
• Physical distribution
• Impact on the host
• Odd or interesting elements
• Evolutionary history
• Element sequence and domain characteristics

Page 11

Diversity of the Pseudoviridae

Page 12

A retroelement family tree

[Figure: retroelement family tree with the Retroposons, Pseudoviridae, BEL, Dirs, Retroviridae and Metaviridae]

Page 13

[Figure: phylogenetic tree of Pseudoviridae elements with node support values; scale bar 0.1. Elements shown include Melmoth, Tgmr, Tst1 (X66399), AtRE1, Evelknievel, Hopscotch, Retrofit, Lueckenbuesser, Osser, Endovir1-1, SIRE1, ToRTL1, Opie-2, PREM2, Art1, Tpv2-6, copia, RIRE1, BARE-1, Sto-4, Tnt1-94, Tto1, Panzee, Ta1-3, Tca5, 1731, Ty4, Ty1, Tca2, Ty5-6p and Mosqcopia, plus several sequences identified only by accession number. Group legend: Retroposons, Pseudoviridae, BEL, Dirs, Retroviridae, Metaviridae.]

A. thaliana captures all plant Pseudoviridae diversity

Page 14

Mapping proteases to HIV-1 structure helps explain patterns of conservation

[Figure: Pseudoviridae genome map (LTR, MA, CA, NC, PR, IN, RT, RH, LTR) with the PR (protease) domain highlighted]

Page 15

Integrase: what’s happening in the back?

[Figure: integrase comparison. All groups share the conserved zinc-finger (HHCC) and catalytic (DDE) residues, but the C-terminal regions differ: the (Meta/Retro)viridae share a GPF/Y motif in a common region, the Pseudoviridae carry a GKGY motif and a proline-rich region, and other elements carry GKGY and GPF/Y motifs. Integrase lengths are given for each element, and the presence or absence of a chromodomain and of an ILGD motif is marked. Elements shown: Del, Athila5-1, MMLV, SnRV, Tf1, Ty3-2, gypsy, HIV-1, osvaldo, RSV, WDSV, BARE-1, copia, Endovir1-1, Retrofit, Ty1, Ty5, Melmoth, 1731, Osser, Tnt1-94, Opie-2 and Mosqcopia.]

Page 16

[Figure: genome maps of Calypso, Endovir, SIRE-1, Athila4-6 and Cyclops-2 (scale 2,000 to 12,000 nt) showing gag, pol and a putative env ORF (env?); identity values of 24% and 29% are marked]

Putative env gene is conserved across species

Page 17

[Figure: phylogenetic tree (scale bar 0.1 changes) with Retroviridae, Metaviridae and Pseudoviridae clades and the putative retroviruses marked. Elements shown: HIV-1, RSV, MoMLV, Ty3, gypsy, Del1, Reina, Cyclops, Calypso, Fababean, Athila4-6, Grande, Tat4-1, Cinful-1, MAG, SURL, Ty1, copia, Tto1, Tnt1-94, Ta1-3, Art1, ToRTL1, Opie-2, Endovir1-1, SIRE-1, Tst1, Retrofit, Hopscotch, Evelknievel, Osser and Ty5-6p.]

Retroviruses independently evolved at least twice in plants

Page 18

Retrovirus env-like coding regions show a bipartite structural organization

[Figure: env-like ORFs of Endovir1-1 (668 aa), ToRTL1 (648 aa) and SIRE-1 (476 aa), with pairwise identities of 31% and 24%, drawn beside the HIV-1 genome map]

Page 19

Gag surprises…

[Figure: Gag comparison between the putative retrovirus group and the (Hemi/Pseudo)virus group, with conserved regions labeled A, B and C, shown on the Pseudoviridae genome map]

Gag is much larger in the retroviral lineage

Sequence and structural conservation is evident

Page 20

Diversity of the Pseudoviridae family: summary

Enzymatic regions appear to be highly constrained, other than the IN C-terminus.

Arabidopsis LTR retrotransposons are representative of plant elements in the family.

The putative retroviruses represent a uniquely evolving Pseudoviridae lineage bearing numerous changes in the retrotransposon genome.

Sub-lineage differences suggest areas on which to focus experimental efforts for functional studies.

Gag shows greater sequence conservation than previously thought.

Page 21

Summary continued…

Env-like coding regions have been evolutionarily conserved, indicating a functional role for the ORF

Features suggestive of viral env proteins have been identified in all LTR retrotransposon env-like ORFs

Putative env proteins have evolved in at least two independent plant LTR retrotransposon lineages, giving credence to the hypothesis that retroviruses evolved from retrotransposons

Page 22

Organization of the retroelement populations of the Arabidopsis genome

Page 23

Do retroelements of higher eukaryotes choose where they integrate?

Is yeast a good model?

Multicellular organism genome projects have noted that transposable element numbers are markedly increased near centromeres. This project quantitatively documents these anecdotal observations for the Arabidopsis genome.

Page 24

Completed genome?

[Figure: chromosome map (scale 10 to 90 Mb) with chromosomes labeled 2, 3, 4 and X]

Page 25

RetroMap: a graphical tool for simplifying whole-genome analysis of retroelements

Page 26

RetroMap features

RetroMap provides the following tools to work with genome data:

• Parse BLAST results
• Assign lineages or arbitrary groupings to retroelements
• View chromosomal locations
• Identify and extract LTRs
• Identify and extract full-length elements
• Assign ages to complete LTR retroelements
• Extract sequence(s) for hits
• Visualize hit open reading frames
• Generate information about neighboring annotated features (Arabidopsis thaliana only)
• Generate tab-delimited data files of retroelement information for direct import into statistical software packages
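As a rough illustration of the first feature (parsing BLAST results), the sketch below reads BLAST+ tabular output. RetroMap's own parser and input format are not described here, so this assumes the default 12-column `-outfmt 6` layout; the `BlastHit` class, the `parse_blast_tabular` function, the 1e-5 E-value cutoff and the example query name `SIRE1_RT` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BlastHit:
    query: str
    subject: str
    pident: float
    length: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bitscore: float

    @property
    def strand(self) -> str:
        # In BLAST+ tabular output a minus-strand subject hit has s_start > s_end.
        return "+" if self.s_start <= self.s_end else "-"


def parse_blast_tabular(lines: Iterable[str], max_evalue: float = 1e-5) -> Iterator[BlastHit]:
    """Yield hits from default 12-column BLAST+ tabular output (-outfmt 6).

    Column order: qseqid sseqid pident length mismatch gapopen
                  qstart qend sstart send evalue bitscore
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment lines of -outfmt 7
        f = line.split("\t")
        hit = BlastHit(
            query=f[0], subject=f[1], pident=float(f[2]), length=int(f[3]),
            q_start=int(f[6]), q_end=int(f[7]),
            s_start=int(f[8]), s_end=int(f[9]),
            evalue=float(f[10]), bitscore=float(f[11]),
        )
        if hit.evalue <= max_evalue:  # hypothetical significance cutoff
            yield hit


# Hypothetical tblastn-style lines: an RT protein probe against genome sequence.
example = [
    "SIRE1_RT\tchr1\t45.2\t180\t94\t3\t1\t180\t1200345\t1200884\t2e-30\t120.0",
    "SIRE1_RT\tchr3\t30.1\t150\t100\t8\t10\t160\t5400999\t5400550\t0.5\t30.2",
]
hits = list(parse_blast_tabular(example))
print(len(hits), hits[0].subject, hits[0].strand)  # 1 chr1 +
```

Keeping strand as a derived property rather than a stored field means minus-strand hits keep BLAST's original coordinate order, which later steps such as LTR extraction can normalize as needed.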

Page 27

Overview of how RetroMap generates retroelement data for a genome

Page 28

Starting probe sequences

[Figure: tree of the starting probe sequences (scale bar 0.1) with node support values. Elements shown: ta11, L1 (Hs), R2 (Dm), R1 (Dm), Jockey (Dm), Tca2 (Ca), Ty5 (Sp), copia (Dm), Art1 (At), Endovir1-1 (At), SIRE1 (Gm), Pao (Bm), BEL (Dm), Mazi (Dm), Roo (Dm), Prt1 (Pbla), Dirs1 (Dd), PAT (Pred), HIV-1, RSV, SnRV, MMLV, WDSV, Cer1 (Ce), Osvaldo (Db), Athila (At con), Ty3 (Sc), sushi (Fr) and Tf1 (Spom).]

Page 29

A. thaliana LTR retrotransposon genome overview

[Figure: trees of the Metaviridae (Tat, Athila and Metavirus sublineages; scale bar 0.2) and the Pseudoviridae (scale bar 0.1)]

Group | Full-length | Solo LTRs | RT only | A. thaliana DNA (%)
--- | ---: | ---: | ---: | ---:
Retroposon | -- | -- | 311 | 0.22
Pseudoviridae | 220 | 483 | 83 | 1.25
Metaviridae | 217 | 2803 | 143 | 3.16
Athila (Metaviridae sublineage) | 47 | -- | -- | 0.60
Tat (Metaviridae sublineage) | 48 | -- | -- | 0.50
Metavirus (Metaviridae sublineage) | 88 | -- | -- | 0.64
Totals | 437 | 3286 | 537 | 4.63
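The full-length and solo-LTR counts for the two LTR families can be condensed into solo-LTR to full-length ratios, a summary often used to gauge how much LTR-LTR recombination has removed internal sequence. This is plain arithmetic on the table's numbers; reading the ratio as a proxy for recombination is an interpretation added here, not a claim from the slides.

```python
# Full-length and solo-LTR counts transcribed from the genome-overview table.
COUNTS = {
    "Pseudoviridae": {"full_length": 220, "solo_ltrs": 483},
    "Metaviridae": {"full_length": 217, "solo_ltrs": 2803},
}


def solo_to_full_ratio(full_length: int, solo_ltrs: int) -> float:
    """Number of solo LTRs per full-length element."""
    if full_length <= 0:
        raise ValueError("need at least one full-length element")
    return solo_ltrs / full_length


for family, c in COUNTS.items():
    ratio = solo_to_full_ratio(c["full_length"], c["solo_ltrs"])
    print(f"{family}: {ratio:.1f} solo LTRs per full-length element")
# Pseudoviridae: 2.2 solo LTRs per full-length element
# Metaviridae: 12.9 solo LTRs per full-length element
```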

Page 30

A. thaliana retroelements consist of retroposons and only two LTR families

Pseudoviridae elements are significantly shorter than Metaviridae elements (p = 0.0001)

Page 31

Dating LTR retrotransposons

[Figure: an element with gag and pol between its two LTRs]

The two LTRs of an element are identical at the time of insertion.

Relative ages can therefore be estimated from the sequence divergence (genetic distance) between the two LTRs, e.g.:

$$T = \frac{d}{2k}, \qquad d = 1 - \frac{\%\ \text{identity}}{100}$$

where d is the genetic distance between the LTRs and k is the nucleotide substitution rate for the genome.
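The dating formula can be sketched in Python. The factor of 2 reflects that the two LTRs start out identical and then accumulate substitutions independently, so they diverge from each other at roughly twice the per-copy rate k. The sketch assumes the two LTRs are already aligned (gap columns are skipped) and uses raw identity with no correction for multiple hits; the function names and the rate of 1e-8 substitutions per site per year are hypothetical placeholders, not values from this study.

```python
def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Genetic distance d = 1 - (% identity / 100) between two aligned LTRs.

    Both strings must come from the same pairwise alignment (equal length).
    Columns with a gap ('-') in either LTR are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no aligned, ungapped positions")
    percent_identity = 100.0 * matches / compared
    return 1.0 - percent_identity / 100.0


def insertion_age(d: float, k: float) -> float:
    """T = d / (2k); with k in substitutions per site per year, T is in years."""
    if k <= 0:
        raise ValueError("substitution rate must be positive")
    return d / (2.0 * k)


d = ltr_divergence("ACGTACGTAC", "ACGTACGTTC")  # 9 of 10 positions identical
age = insertion_age(d, k=1e-8)                  # hypothetical rate
print(f"d = {d:.2f}, T = {age:.3g} years")      # d = 0.10, T = 5e+06 years
```

Because k is a genome-wide assumption, these ages are best compared relative to one another, which is how the slides use them.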

Page 32

Pseudoviridae elements are younger than Metaviridae elements, with the Athila sublineage being the oldest tested

Page 33

A. thaliana RT distributions

Page 34

Going solo

Homologous recombination between the two LTRs loops out and deletes the retroelement’s internal sequences, leaving a solo LTR in the host DNA

[Figure: a full-length element flanked by host DNA, and the solo LTR that remains after recombination]

Page 35

Where have they been?

Page 36

No family distribution is random

The Metaviridae sublineages Athila and Tat are found preferentially inside heterochromatic regions; other groups are not

Pseudoviridae and retroposon distributions are not significantly different

Solo LTRs show the same distributions as full-length members of their families
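The slides do not say which statistical test was used to decide whether a distribution is random. One simple, illustrative way to ask whether a family is over-represented in heterochromatin is a one-sided binomial test against the heterochromatic fraction of the assembly, assuming that under the null each insertion lands independently and uniformly. The function names, the counts and the 0.2 heterochromatin fraction below are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def heterochromatin_enrichment(n_inside: int, n_total: int,
                               het_fraction: float) -> tuple[float, float]:
    """Observed/expected ratio and one-sided p-value for elements in heterochromatin.

    Null model: each element independently lands in heterochromatin with
    probability het_fraction (the heterochromatic share of the assembly).
    """
    expected = n_total * het_fraction
    return n_inside / expected, binomial_upper_tail(n_inside, n_total, het_fraction)


# Hypothetical counts for illustration only; not data from this study.
ratio, p_value = heterochromatin_enrichment(n_inside=30, n_total=50, het_fraction=0.2)
print(f"observed/expected = {ratio:.1f}, one-sided p = {p_value:.1e}")
```

Working with `lgamma` keeps the test usable for family sizes in the thousands, where exact binomial coefficients would overflow a float.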

Page 37

Hypotheses

Retroelement lineages show ‘universal’ organizational characteristics at the family level

General retroelement abundance at centromeres is due to reduced elimination: the ‘graveyard scenario’

Metaviridae in Arabidopsis are targeted to heterochromatin

Page 38

Conclusions

Heterochromatic regions DO appear to act as graveyards, at least in the case of the Pseudoviridae (and presumably the retroposons)

Younger Pseudoviridae elements tend to be found outside of heterochromatin

Solo LTR distributions indicate that homologous recombination between LTRs is not greatly inhibited in heterochromatin

The Metaviridae lineages appear to use targeting in their interactions with the host genome

Page 39

Acknowledgements

So many people helped make this research happen; I couldn’t have done it without their support and input.

Special thanks go to the many members of the Voytas lab, past and present, undergrads too!

I’ve been lucky to have good collaborators who are interesting and fun to work with. These have included Dr. Nettleton, Dr. Wright, Dr. Laten from Loyola University, and always Dr. Voytas.

To the head honcho: no one can say it hasn’t been a crazy, crazy ride. Thanks. :o)

Page 40

Page 41

Basic hit redundancy elimination scheme

[Figure: hits to a query sequence illustrating cases 1 to 4]

1) Simple match: no overlap with the nearest hit, no compression

2) Overlap case(s): both hits are merged into one representing their combined maximum extent on the database sequence

3) Two non-overlapping hits that should be combined:
a) Left checks its boundary position on its query sequence and determines whether the other hit falls within that range. If so, merge.
b) Right repeats the procedure if Left failed to indicate a merge.

4) An example of a merge case that may lead to false positives
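The merge rules above can be sketched in Python. This is a minimal sketch under stated assumptions, not RetroMap's code: hits are plus-strand with 1-based inclusive coordinates, the `Hit` class and function names are hypothetical, and case 3 is read as "a hit may still extend by the query residues it left unmatched, so merge if the neighbouring hit falls within that reach". That same reach rule is what produces case 4: a hit from an unrelated neighbouring element can be joined if it happens to sit close enough.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    db_start: int  # 1-based inclusive coordinates on the database (genome) sequence
    db_end: int
    q_start: int   # 1-based inclusive coordinates on the query (probe) sequence
    q_end: int


def merge_hits(a: Hit, b: Hit) -> Hit:
    """Case 2: one hit spanning the combined maximum extent of both hits."""
    return Hit(min(a.db_start, b.db_start), max(a.db_end, b.db_end),
               min(a.q_start, b.q_start), max(a.q_end, b.q_end))


def should_merge(left: Hit, right: Hit, query_len: int) -> bool:
    if right.db_start <= left.db_end:  # case 2: overlap on the database sequence
        return True
    # Case 3a (assumed reading): Left could still extend rightward by the query
    # residues it did not match; merge if Right starts within that reach.
    if right.db_start <= left.db_end + (query_len - left.q_end):
        return True
    # Case 3b: Right repeats the check leftward using its own unmatched residues.
    return right.db_start - (right.q_start - 1) <= left.db_end


def eliminate_redundant_hits(hits: list[Hit], query_len: int) -> list[Hit]:
    merged: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.db_start):
        if merged and should_merge(merged[-1], hit, query_len):
            merged[-1] = merge_hits(merged[-1], hit)
        else:
            merged.append(hit)  # case 1: kept as a separate hit
    return merged


hits = [Hit(100, 300, 1, 200), Hit(250, 400, 150, 300),
        Hit(600, 700, 500, 600), Hit(5000, 5100, 1, 100)]
for h in eliminate_redundant_hits(hits, query_len=1000):
    print(h)
# Hit(db_start=100, db_end=700, q_start=1, q_end=600)
# Hit(db_start=5000, db_end=5100, q_start=1, q_end=100)
```

Sorting by database start lets each new hit be compared only with the most recent merged hit, so a single left-to-right pass is enough.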

Page 42

BLAST false-positive amplification problem

[Figure: hits from an RT query in BLAST round 1, some including LTR sequence, and the larger mixed set of RT and LTR hits returned in BLAST round 2]

Page 43

LTR prediction

• Works only for hits to sequence interior to the LTRs

[Figure: a hit in the genome sequence with 10 kb windows upstream and downstream, compared with Blast2Sequences]

• Blast2Sequences is used to detect repeats
• 10 kb of sequence upstream and 10 kb downstream of the hit are compared
• The innermost matching repeats are taken to be the LTRs
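The innermost-repeat rule can be sketched without BLAST. This stand-in swaps Blast2Sequences for exact shared k-mer seeds (a much cruder repeat detector that will miss diverged LTRs), assumes `upstream` ends where the hit begins and `downstream` starts where it ends, and treats LTRs as direct repeats on the same strand. `innermost_flanking_repeat` and the default k = 20 are hypothetical.

```python
from typing import Optional


def innermost_flanking_repeat(upstream: str, downstream: str,
                              k: int = 20) -> Optional[tuple[int, int, int, int]]:
    """Find the direct repeat shared by the two flanks that lies closest to the hit.

    upstream ends where the hit begins; downstream starts where the hit ends.
    Returns half-open (up_start, up_end, down_start, down_end) or None.
    """
    up, down = upstream.upper(), downstream.upper()
    first_pos: dict[str, int] = {}
    for j in range(len(down) - k + 1):
        # setdefault keeps the smallest position, i.e. the copy nearest the hit
        first_pos.setdefault(down[j:j + k], j)
    for i in range(len(up) - k, -1, -1):  # walk upstream from the hit outward
        j = first_pos.get(up[i:i + k])
        if j is None:
            continue
        us, ue, ds, de = i, i + k, j, j + k
        # Extend the exact seed match in both directions.
        while us > 0 and ds > 0 and up[us - 1] == down[ds - 1]:
            us -= 1
            ds -= 1
        while ue < len(up) and de < len(down) and up[ue] == down[de]:
            ue += 1
            de += 1
        return us, ue, ds, de
    return None


ltr = "ACGTTGCAAGGCTTAC"  # toy 16 bp "LTR"
upstream = "CCCCCCCCCC" + ltr + "GATTACA"    # host, 5' LTR, internal sequence before the hit
downstream = "TGTAATG" + ltr + "AAAAAAAAAA"  # internal sequence after the hit, 3' LTR, host
print(innermost_flanking_repeat(upstream, downstream, k=8))  # (10, 26, 7, 23)
```

Scanning outward from the hit and stopping at the first shared seed is what makes the result the innermost repeat; it also reproduces the failure modes on the next slide, since a tandem or nested copy closer to the hit will be picked first.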

Page 44

LTR identification errors

[Figure: cases that cause LTR prediction errors, each shown with the hit, its 10 kb flanking windows and the predicted element: tandem elements (Hit1, Hit2), nested elements, and degenerate elements or elements with simple internal repeats (pA)]

Page 45

Sample distribution data

Sample hit neighbors annotation data