TrypDB Analysis Workflow


Upload: gisela-lancaster

Post on 01-Jan-2016



TRANSCRIPT

Page 1: TrypDB Analysis Workflow

TrypDB Analysis Workflow

Common Analysis

T Cruzi Analysis

T Brucei Analysis

L Braziliensis Analysis

L Infantum Analysis

L Major Analysis

Mercator

Page 2: TrypDB Analysis Workflow

Common Analysis

Init Workflow Home Dir on Cluster

Init User/Group/Project

Copy PDB from Downloads

Make Data Dir

Mirror Common Data Dir to Cluster

Copy NRDB from Downloads

Make NRDB Short Defline

Make Mercator Data Dir

Init apiSiteFiles WebServices Dirs

Insert BlatAlignmentQuality Table with Xml
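
The slides do not say what "Make NRDB Short Defline" does beyond its name. A minimal sketch, assuming it trims each FASTA header in NRDB down to its first whitespace-delimited token so that downstream tools see short identifiers. The function names `short_defline` and `shorten_fasta` are hypothetical and not taken from the workflow.

```python
def short_defline(defline):
    """Keep only the first whitespace-delimited token of a FASTA header.

    Assumption: a "short defline" is the header reduced to its identifier.
    """
    if not defline.startswith(">"):
        raise ValueError("not a FASTA defline: %r" % defline)
    tokens = defline[1:].split()
    if not tokens:
        raise ValueError("empty FASTA defline")
    return ">" + tokens[0]


def shorten_fasta(lines):
    """Yield FASTA lines with headers shortened; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        yield short_defline(line) if line.startswith(">") else line
```

Passing an open file handle to `shorten_fasta` streams the database line by line, which matters for a file as large as NRDB.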

Page 3: TrypDB Analysis Workflow

Organism Analysis Workflow

Genome Analysis

Proteome Analysis

Mirror Data Dir to Cluster

Make Gff File

Run Full Record Dump

Init apiSiteFiles DownloadSite Organism Dir

Make Data Dir

Make and Format Download Files

Run Tuning Manager

Page 4: TrypDB Analysis Workflow

Genome Analysis

Extract Genome Seqs

Find Tandem Repeats

Load Tandem Repeats

Copy Genomic Seqs to Cluster

BLASTX NRDB

Filter Sequences

Load Low Complexity Seqs

Make Data Dir

Dump and Block Mixed Genome Seqs

tRNA Scan

Load ORFs

Make ORFs

Make and Block Candidate Assem Seqs

Make and Block DoTS Assemblies

Map Candidate Assem Seqs to Genome

Map DoTS Assemblies to Genome

Page 5: TrypDB Analysis Workflow

Proteome Analysis

Calculate Protein Seq

Calculate AASeq Attributes

Extract Protein Seqs

Filter Seqs

Load Low Complexity Seqs

Copy Protein Seqs to Cluster

BLASTP NRDB

Psipred

InterproScan

Run TMHMM

Load TMHMM

Run SignalP

Load SignalP

Epitopes

Find Seq Identity to NRDB

Load NRDB xrefs

BLASTP PDB

Make Data Dir

Update TaxonId for PDB ExternalAASequence

Page 6: TrypDB Analysis Workflow

BLAST

Make data dir

Start blast

Wait for cluster

Copy files from cluster

Extract IDs from Blast result

Load Subject subset

Load Result

Optional steps (runtime test)

filter by subject

Update TaxonId for Nrdb ExternalAASequence
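
The "Extract IDs from Blast result" step above can be illustrated with a short sketch. It assumes the BLAST results are tab-separated with the subject ID in the second column, as in BLAST+ tabular output. The slides do not state the actual result format, and `extract_subject_ids` is a hypothetical name.

```python
def extract_subject_ids(blast_lines):
    """Return subject IDs (second column) in first-seen order, without duplicates.

    Assumption: tab-separated results; blank lines and '#' comments are skipped.
    """
    seen = set()
    ordered = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError("expected at least two tab-separated columns: %r" % line)
        subject = fields[1]
        if subject not in seen:
            seen.add(subject)
            ordered.append(subject)
    return ordered
```

The resulting list is the kind of input a "Load Subject subset" step would need: only the subject sequences that were actually hit, rather than the whole database.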

Page 7: TrypDB Analysis Workflow

Psipred

fix protein IDs for psipred

create psipred Task dir

copy Data Dir to cluster

start psipred on cluster

wait for cluster

copy psipred Files from cluster

fix psipred File names

make Alg Inv

load psipred

run pfilt on nrdb

Make data dir

Page 8: TrypDB Analysis Workflow

Epitopes

Make Data Dir

Make Blast Dir

Format NCBI blast file

Create Epitopes map file

Load Epitopes map

Page 9: TrypDB Analysis Workflow

InterproScan

Make Data Dir

Make InterproScan Cluster Task Input Dir

Mirror InterproScan to Cluster

Start Cluster Task

Wait for Cluster Task

Mirror InterproScan From Cluster

Insert IprScan Results
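
The InterproScan steps above follow a pattern that recurs throughout the deck: mirror input to the cluster, start a task, wait for it, and mirror the results back. A minimal sketch of that pattern follows, with every cluster operation passed in as a callable. The function `run_cluster_task` and its parameters are hypothetical, and how the real workflow transfers files or polls the cluster is not shown in the slides.

```python
import time


def run_cluster_task(mirror_to, start, is_done, mirror_from,
                     poll_seconds=60.0, max_polls=None, sleep=time.sleep):
    """Run the mirror / start / wait / mirror-back sequence.

    All four operations are injected, so this function only fixes their order.
    max_polls limits the number of unsuccessful status checks (None = no limit).
    """
    mirror_to()                      # e.g. "Mirror InterproScan to Cluster"
    job = start()                    # e.g. "Start Cluster Task"
    polls = 0
    while not is_done(job):          # e.g. "Wait for Cluster Task"
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("cluster task %r did not finish" % (job,))
        sleep(poll_seconds)
    mirror_from(job)                 # e.g. "Mirror InterproScan From Cluster"
    return job
```

Injecting the callables keeps the ordering logic testable without a cluster, and the same function could serve the repeat-mask and Gf Client tasks on later slides.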

Page 10: TrypDB Analysis Workflow

Make and Block Candidate Assembly Seqs

Make Candidate Assembly Seqs

Extract Candidate Assembly Seqs

Make Cluster Task Input Dir

Mirror To Cluster

Start Cluster Task

Wait for Cluster Task

Mirror From Cluster

Make Data Dir

Page 11: TrypDB Analysis Workflow

Map Candidate Assembly Seqs to Genome

Extract Genomic Seqs into Separate Fasta Files

Make Data Dir

Make Gf Client Cluster Task Input Dir

Mirror Gf Client to Cluster

Mirror Gf Client From Cluster

Insert BLAT Alignment

Set best BLAT Alignment

Start GF Cluster Task

Wait for GF Cluster Task

Run Nib On Cluster
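
The slides do not define how "Set best BLAT Alignment" chooses a winner. The sketch below assumes the best alignment is simply the highest-scoring one per query sequence, with ties going to the first one seen. The `Alignment` tuple and `best_alignments` are hypothetical names, and the real step may well use other criteria such as percent identity or coverage.

```python
from collections import namedtuple

# Hypothetical record; real BLAT output carries many more fields.
Alignment = namedtuple("Alignment", "query target score")


def best_alignments(alignments):
    """Map each query ID to its highest-scoring alignment (first wins on ties)."""
    best = {}
    for aln in alignments:
        current = best.get(aln.query)
        if current is None or aln.score > current.score:
            best[aln.query] = aln
    return best
```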

Page 12: TrypDB Analysis Workflow

Cluster Transcripts by Genome Alignment

Put Unaligned Transcripts into One Cluster

Assemble Transcripts

Extract Assemblies

Make Data Dir

Make Repeat Mask Cluster Task Input Dir

Mirror Assembly Repeat Mask To Cluster

Start RM Task on Cluster

Wait for RM Cluster Task

Make and Block Assemblies
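
The first two steps on this slide, "Cluster Transcripts by Genome Alignment" and "Put Unaligned Transcripts into One Cluster", can be sketched as an interval-merging pass. The sketch assumes each aligned transcript has exactly one alignment, given as a hypothetical `(transcript_id, seq_id, start, end)` tuple with inclusive coordinates, and that transcripts overlapping on the same genomic sequence belong together. The slides do not give the actual clustering rule.

```python
def cluster_by_alignment(aligned, unaligned):
    """Group transcripts whose genome alignments overlap on the same sequence.

    aligned:   iterable of (transcript_id, seq_id, start, end), inclusive coords
    unaligned: iterable of transcript IDs with no genome alignment
    Returns a list of clusters (lists of IDs); all unaligned IDs share one cluster.
    """
    clusters = []
    current, current_seq, current_end = [], None, None
    for tid, seq, start, end in sorted(aligned, key=lambda a: (a[1], a[2], a[3])):
        if current and seq == current_seq and start <= current_end:
            current.append(tid)                  # overlaps the running cluster
            current_end = max(current_end, end)
        else:
            if current:
                clusters.append(current)
            current, current_seq, current_end = [tid], seq, end
    if current:
        clusters.append(current)
    leftovers = list(unaligned)
    if leftovers:
        clusters.append(leftovers)               # one shared cluster for the rest
    return clusters
```

Sorting by sequence and start position means each transcript only needs to be compared with the cluster currently being built.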

Page 13: TrypDB Analysis Workflow

Make Data Dir

Make Assembly Gf Client Cluster Task Input Dir

Mirror Assembly Gf Client to Cluster

Start GF Task on Cluster

Wait for GF Cluster Task

Mirror Gf Client From Cluster

Insert BLAT Alignment

Set best BLAT Alignment

Update Assembly Source Id

Map Assemblies to Genome

Page 14: TrypDB Analysis Workflow

Dump Mixed Genomic Sequences

Make Repeat Mask Cluster Task Input Dir

Mirror Repeat Mask To Cluster

Start Cluster Task

Wait for Cluster Task

Mirror Virtual Sequence Repeat Mask From Cluster

Make Data Dir

Dump and Block Mixed Genome Seqs

Move Blocked Seq File to Mercator Data Dir

Page 15: TrypDB Analysis Workflow

Mercator

Run MercatorMavid

Create External Database and Release for Synteny from Mercator

Insert Mercator Synteny Spans

Make Mercator Gff File

Correct Reading Frame in Mercator Gff file