ASSEMBLY ANNOTATION TOOLS FOR REPEATS

Nina Hoštáková, May 2018

PROFREP DANTE


REPEAT EXPLORER

PROFREP DANTE

NGS reads

automatic clusters

classification clusters

domains DB

domains classification

RE archive

PROFREP: PROFiles of REPeats

DANTE: Domain-based ANnotation Of Transposable Elements

Domains GFF
Repeats GFF
N GFF
Profiles BigWig
Domains sequences

Repeats distribution
Quantitative info
Annotation Tracks
Phylogenetic relations

PROFREP

Domains GFF
Repeats GFF
N GFF
Profiles BigWig

DNA

ProfRep Refiner
ProfRep Masker
GFF Region Selector

RE ARCHIVE

Data Preparation

Prepared Datasets

OR

Pisum sativum Terno (2015)

NGS reads

automatic clusters

classification clusters

Extract Data for ProfRep

ProfRep Reducing

cd-hit

Representative reads

Reduced clusters

RE ARCHIVE

Data preparation -PROFREP-

42 repeat|mobile_element|Class_I|LTR|Ty1/copia|SIRE
48 repeat|satellite|PisTR/B
134 organelle|plastid

>1f
CGTAATATACATACTTGCTAGCTAGTTGGATGCATCCAACTTGCAAGCTAGTTTGATG
>1r
GATTTGACGGACACACTAACTAGCTAGTTGCATCTAAGCGGGCACACTAACTAACTAT

>CL42 2
1624460f 63975r
>CL48 1
882044r
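The three data preparation inputs shown on this slide (a cluster classification table, read sequences, and a cluster membership list) can be tied together with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes that each cluster record is a `>CLn count` header line followed by one line of read names, which is a reading of the slide text rather than a documented format.

```python
def parse_classification(text):
    """Return {cluster number: classification string}."""
    table = {}
    for line in text.strip().splitlines():
        number, label = line.split(maxsplit=1)
        table[int(number)] = label.strip()
    return table


def parse_cluster_list(text):
    """Return {cluster name: [read names]} from '>CLn count' records.

    Assumed layout: one header line, then one line of read names.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    clusters = {}
    for header, members in zip(lines[0::2], lines[1::2]):
        name, count = header.lstrip(">").split()
        reads = members.split()
        if len(reads) != int(count):
            raise ValueError(f"{name}: expected {count} reads, found {len(reads)}")
        clusters[name] = reads
    return clusters


def classify_reads(clusters, classification):
    """Return {read name: classification} for clusters listed in the table."""
    labels = {}
    for name, reads in clusters.items():
        number = int(name[2:])  # 'CL42' -> 42
        if number in classification:
            for read in reads:
                labels[read] = classification[number]
    return labels


# Example data copied from the slide.
CLASSIFICATION_TEXT = """\
42 repeat|mobile_element|Class_I|LTR|Ty1/copia|SIRE
48 repeat|satellite|PisTR/B
134 organelle|plastid
"""
CLUSTER_TEXT = """\
>CL42 2
1624460f 63975r
>CL48 1
882044r
"""

classification = parse_classification(CLASSIFICATION_TEXT)
clusters = parse_cluster_list(CLUSTER_TEXT)
read_labels = classify_reads(clusters, classification)
```

With this mapping each read carries the annotation of its cluster, which is the information needed to label hits by repeat type later on.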

Principle -PROFREP-

[Figure: read hits shown along the DNA seq. Cluster labels: CL1 None/unclassified, Ty1/copia|Angela, CL3, chromovirus|Tekay, ALL. Read labels: 97f, 487f, 34f, 2854r, 150f, 28f, 2f, 70r, 3141f, 250f, 1502r, 1741f, 22r, 8r, 277f. Axes: bp, hits.]
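The bp and hits axes on this slide suggest a per-base count of read hits for each repeat classification, plus a combined ALL track. A minimal sketch of that counting step follows. It assumes hit coordinates (1-based, inclusive) are already available from some similarity search; the function name and the example hits are hypothetical and not taken from ProfRep itself.

```python
from collections import defaultdict


def build_profiles(seq_length, hits):
    """Count hits per base for every classification and for 'ALL'.

    hits: iterable of (start, end, classification), 1-based inclusive.
    Returns {classification: [count at base 1, count at base 2, ...]}.
    """
    profiles = defaultdict(lambda: [0] * seq_length)
    for start, end, label in hits:
        if not (1 <= start <= end <= seq_length):
            raise ValueError(f"hit {start}-{end} lies outside the sequence")
        for key in (label, "ALL"):
            track = profiles[key]
            for pos in range(start - 1, end):
                track[pos] += 1
    return dict(profiles)


# Hypothetical hits on a 10 bp sequence.
example_hits = [
    (1, 4, "Ty1/copia|Angela"),
    (3, 6, "Ty1/copia|Angela"),
    (5, 8, "chromovirus|Tekay"),
]
example_profiles = build_profiles(10, example_hits)
```

Keeping one list per classification mirrors the separate tracks on the slide; the ALL track is simply the sum over every classification.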

Principle -PROFREP-

1. PROFILES

2. GFF3

[Figure: profiles for Ty1/copia|Angela, chromovirus|Tekay and All. Labels: > Threshold hits, > Threshold bp, AvePID = 90 %. Coordinate ranges shown: 1 to 10000 and 1 to 6000.]
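The step from profiles to GFF3 can be sketched as follows, using the two conditions named on the slide: more than a threshold number of hits, sustained over more than a threshold number of bp. Function names, the `repeat_region` feature type and the `source` value are assumptions made for illustration. AvePID is left out because it needs per-hit identities that a plain hit count does not carry.

```python
def profile_to_regions(profile, threshold_hits, threshold_bp):
    """Return 1-based inclusive (start, end) runs where hits > threshold_hits
    and the run is longer than threshold_bp."""
    regions = []
    start = None
    for i, hits in enumerate(profile):
        if hits > threshold_hits:
            if start is None:
                start = i  # a run opens at 0-based index i
        elif start is not None:
            regions.append((start + 1, i))  # the run closed at index i - 1
            start = None
    if start is not None:  # a run that reaches the end of the sequence
        regions.append((start + 1, len(profile)))
    return [(s, e) for s, e in regions if e - s + 1 > threshold_bp]


def regions_to_gff3(seqid, label, regions, source="profile_sketch"):
    """Format regions as GFF3 text; the feature type is an assumed choice."""
    rows = ["##gff-version 3"]
    for s, e in regions:
        rows.append("\t".join(
            [seqid, source, "repeat_region", str(s), str(e),
             ".", ".", ".", f"Name={label}"]
        ))
    return "\n".join(rows)


# Hypothetical profile with three runs above the hit threshold.
example_profile = [0, 3, 3, 3, 0, 5, 0, 4, 4, 4, 4, 0]
example_regions = profile_to_regions(example_profile, threshold_hits=2, threshold_bp=2)
example_gff3 = regions_to_gff3("chr9", "chromovirus|Tekay", example_regions)
```

The single-base peak at position 6 clears the hit threshold but is dropped by the length threshold, which is the reason for having two separate conditions.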

[Figure: Theobroma cacao assembly (2014), chr 9, distributions. Track labels: DOMAINS [DANTE], REPEATS [ProfRep], Athila, Ogre/Tat, CACTA, satellite, GENES [Ensembl].]

[Figure: Theobroma cacao assembly (2014), chr 9, Chromovirus|Tekay element. Track labels: DOMAINS [DANTE], REPEATS [ProfRep], REFINED REPEATS [ProfRep], REPEATS ANNOTATION [Ensembl], PROFILE [ProfRep], N [ProfRep].]

DANTE Domains Finder

DNA

DANTE Domains Filter

TE Domains DB

Domains Classification

Primary Domains GFF

Filtered Domains GFF

Translated Domains Seqs

Viridiplantae

(Metazoa)

● ELEMENT TYPE

● DOMAIN TYPE

● QUALITY
  ➔ Identity
  ➔ Similarity
  ➔ Relative length
  ➔ Interruptions

Relaxed conditions

Stringent conditions
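The filtering criteria listed above (element type, domain type, and four quality measures under relaxed or stringent conditions) could be expressed as a simple predicate. The sketch below is illustrative only: the field names, the threshold values, and the `passes` function are hypothetical placeholders and are not DANTE's actual parameters or defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainHit:
    domain: str            # e.g. "RT"
    classification: str    # e.g. "Class_I|LTR|Ty3/gypsy|...|Ogre/Tat"
    identity: float        # fraction, 0..1
    similarity: float      # fraction, 0..1
    relative_length: float # alignment length relative to the database domain
    interruptions: int     # e.g. frameshifts or stop codons


@dataclass(frozen=True)
class Thresholds:
    min_identity: float
    min_similarity: float
    min_relative_length: float
    max_interruptions: int


# Placeholder values chosen for the example, NOT the tool's defaults.
RELAXED = Thresholds(0.30, 0.40, 0.70, 5)
STRINGENT = Thresholds(0.50, 0.60, 0.90, 1)


def passes(hit: DomainHit, t: Thresholds,
           element_type: Optional[str] = None,
           domain_type: Optional[str] = None) -> bool:
    """Apply the optional element/domain type filters, then the quality checks."""
    if element_type is not None and element_type not in hit.classification.split("|"):
        return False
    if domain_type is not None and hit.domain != domain_type:
        return False
    return (hit.identity >= t.min_identity
            and hit.similarity >= t.min_similarity
            and hit.relative_length >= t.min_relative_length
            and hit.interruptions <= t.max_interruptions)


example_hit = DomainHit(
    domain="RT",
    classification="Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat",
    identity=0.40, similarity=0.50, relative_length=0.80, interruptions=2,
)
```

Under these placeholder thresholds the example hit is kept with the relaxed settings and rejected with the stringent ones, which illustrates the trade-off between sensitivity and reliability that the two conditions represent.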

Principle -DANTE-

DNA x PROT similarity search

[Figure labels: DNA, Domains Database hits, Cluster, strands + and -, database hits, score threshold, best hit.]

Hits classifications

RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatV
RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatV
RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatV
RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatIV/Ogre
RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatIV/Ogre

Common classification level

RT Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat

8756 9241
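The common classification level in this example is the longest run of leading lineage levels shared by all hits. A small sketch reproduces the slide's result from its five hit classifications. The function name is hypothetical, and it assumes the first `|`-separated field is the domain name and the remaining fields are the lineage.

```python
def common_classification(hit_labels):
    """Return (domain, deepest lineage shared by all hits).

    Each label is 'DOMAIN|level1|level2|...'. Raises ValueError when the
    hits do not all belong to the same domain.
    """
    domains = {label.split("|", 1)[0] for label in hit_labels}
    if len(domains) != 1:
        raise ValueError(f"hits belong to different domains: {sorted(domains)}")
    lineages = [label.split("|")[1:] for label in hit_labels]
    shared = []
    for levels in zip(*lineages):  # compare the hits level by level
        if len(set(levels)) != 1:
            break                  # first disagreement ends the shared prefix
        shared.append(levels[0])
    return domains.pop(), "|".join(shared)


# The five hit classifications from the slide.
slide_hits = [
    "RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatV",
    "RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatV",
    "RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatV",
    "RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatIV/Ogre",
    "RT|Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatIV/Ogre",
]
domain, lineage = common_classification(slide_hits)
```

Because the hits disagree only at the last level (TatV versus TatIV/Ogre), the shared lineage stops at Ogre/Tat, matching the line shown on the slide.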

Principle -DANTE-

OTA|Ogre/Tat

Visualization -DANTE-

[Figure: Musa acuminata assembly (2016), chr 1, Ogre/Tat|TatV element. Track labels: REPEATS ANNOTATION [Banana Genome Hub], DOMAINS [DANTE]. Annotations: Class_I|LTR|Ty3/gypsy|non-chromovirus|OTA|Ogre/Tat|TatV, New element.]